For more than 150 years, the assumption that language is a singular event has hampered progress in explaining its evolution. Another obstacle was the failure to recognize that certain uniquely human social interactions are necessary for the evolution of language.
These problems have recently been addressed by recognizing that words had to evolve before grammar and by uncovering the nonverbal emotional and cognitive relationships between an infant and its caregiver. As I explain below, these relationships are known as intersubjectivity and joint attention.
Source: Julia Margaret Cameron / Wikipedia
Darwin argued that the theory of evolution, through the principle of natural selection, could explain the transition from animal communication to language. In his view, language “differed in degree and not in kind” from animal communication. What remains is to discover the degree: “the innumerable gradations” that separate them.
Some of these gradations have been discovered in recent years, but their nature suggests that language differs in kind from animal communication. Alfred Wallace, who with Darwin published the first article on the theory of natural selection, wondered how natural selection, which assumes that a new ability has survival value, could account for the “higher intelligence of man.” Wallace could not understand why natural selection would produce anything more than a slight increase in mental capacity over that of apes. Language, let alone mathematics or music, is hardly necessary for survival.
Because Wallace assumed language was a singular event, he didn’t realize that words had to evolve before grammar. If he had, he might have recognized how consistent a theory of the evolution of words would be with the principle of natural selection.
Before words could evolve, some of our ancestors had to become more cooperative than apes. This increased cooperation was necessary for intersubjectivity and joint attention to evolve. To see how the verbal and nonverbal components of language relate to each other, it helps to examine why chimpanzees, our closest living relatives, cannot learn language.
For chimpanzees, competition, not cooperation, is the norm. Chimpanzee mothers (like those of other apes) don’t allow anyone else to interact with their babies for about six months. In contrast, human mothers allow others (relatives and non-relatives, so-called “allomothers”) to interact with a newborn immediately after birth. This practice, known as cooperative breeding, seems to have started with Homo erectus, an ancestor who lived about 1.8 million years ago.
Cooperatively reared infants face two problems. In addition to discerning their mother’s emotions and learning to identify with her, they must do the same when interacting with each of their alloparents. Infants raised this way are therefore socially challenged in a way that young apes are not.
Chimpanzees are not only more competitive than humans; they also rarely exchange rewards, for example, trading a banana for grapes. Cooperative breeding changed that and made cooperation, rather than competition, the norm.
Human infants not only exchange physical rewards but also participate in exchanges in which the reward is social, for example, when they inform another person of the location of a missing object by pointing at it.
Infants who had good relationships with alloparents were more likely to survive than those who did not. This selection pressure helped foster the high degree of cooperation that is crucial for the development of intersubjectivity and joint attention.
Intersubjectivity refers to exchanges of affect between an infant and a caregiver that often manifest themselves in play. Peek-a-boo, a game observed in all cultures, is a good example. Joint attention refers to a relationship between an infant and a caregiver in which they share their attention to external objects, for example, when an infant points at a dog.
Intersubjectivity begins at birth, a consequence of cradling, which brings the infant’s eyes close to those of his mother. The bond they form is then amplified by the joint attention of the infant and his caregiver to objects of mutual interest.
The dynamics of intersubjectivity and joint attention are invisible to the untrained eye. What greater joy is there for parents than playing hide-and-seek with their baby, or seeing their baby show them something and then smile? Such games are necessary for the child to produce his first words, around his first birthday.
Hyper-cooperation, intersubjectivity, and joint attention together created a perfect storm for the transition from animal communication to words. Linguists have neglected this transition in favor of the passage from words to grammar, the most celebrated feature of language. It is easy to show that the transition from animal communication to words required greater structural changes than the transition from words to grammar, in particular, the shift from analog primate calls to discrete, digital speech. But grammar could not evolve without words.
The analog signals that animals use to communicate vary in intensity and frequency. In addition, the number of signals in a species’ repertoire rarely exceeds two dozen. In language, by contrast, differences in meaning are conveyed by the choice of discrete words, the number of which is enormous. A reader of this blog knows more than 50,000.
The shift from the analog emotional signals of animal communication to discrete words was a dramatic evolutionary change. Apart from the nature of the signal, emotional signals also differ fundamentally from words in that they are involuntary, unlearned, and one-way. Their only function is to influence the behavior of others, for example, by asserting dominance, demarcating a territory, expressing interest in mating, alerting others to a predator, or announcing the discovery of food.
Emotional signals are also immutable. Dogs can only bark, cats can only purr, birds can only sing, and lions can only roar.
Words are voluntary, learned, and arbitrary. Typically, words are also conversational: a speaker and a listener alternate roles while sharing information. Unlike emotional signals, whose form is fixed, the form of a word is arbitrary. A person can say tree, arbre, der Baum, el árbol, l’albero, or their equivalent in over 6,000 spoken languages, or use the gestures of dozens of sign languages.
In sum, the shift from animals’ emotional signals to words involved a greater change in the form of expression than the shift from words to grammar, which concerns only how words are organized: their order, inflection, and so on. The shift from animal communication to words marks the first time our ancestors communicated in a conversational, arbitrary fashion. This is not to minimize the importance of the passage from words to grammar, but only to clarify that it did not require a new form of expression.
Despite these evident facts about words, words remain the stepchildren of discussions about the evolution of language. As mentioned earlier, the vast literature on the evolution of language has focused on grammar, not words.
Source: Wugapodes / Wikimedia
This imbalance can be attributed to Chomsky and his students. For more than 70 years, they have sought to discover the nature and origins of grammar, possibly at the expense of words. As can be seen from a recent comment, Chomsky seems to be aware of this problem:
The minimal meaning-carrying elements of human languages… are radically different from anything known in animal communication systems. Their origin is entirely obscure, posing a serious problem for the evolution of human cognitive capacities, in particular of language.1
I recognize the importance of understanding grammar and why a theory of grammar would be the ultimate step in explaining the evolution of language. But neglecting the origin of words in the quest to understand the origin of grammar seems to me to put the cart before the horse. It’s like trying to understand molecules without understanding the nature of the elements and the atoms that define them. Just as the efforts of alchemists to transmute lead into gold have hampered our understanding of chemistry, ignorance of the origin of words hampers our understanding of language and its functions.
Focusing on words rather than grammar, however, reveals an interesting problem: linguists have yet to agree on the definition of a word. Conventionally, linguists treat all single-word utterances as words, whether they come from humans or from animals. Children’s utterances such as hi, no, up, ouch, over, bye, etc. are considered words. The same goes for the signs that apes learned in “language” experiments and for the signals animals use to communicate. For example, the alarm calls that vervet monkeys give for eagles, leopards, and snakes have been mistakenly called words.
What is needed is a definition that distinguishes such utterances from words, which are referential. This is why I define words as arbitrary utterances used in conversation: speakers use words to denote objects or events for the benefit of a listener, and vice versa.
This definition sets an important evolutionary boundary that preserves the essence of human language. It may violate deeply held cultural biases by excluding a tiny number of the utterances infants make, namely those that are not referential. But language as we know it would never have developed if such utterances were all a child could learn.
To recap, I have argued that the best way to make progress in understanding the evolution of language is to focus on the origins of words, not grammar. This effort must be both phylogenetic and ontogenetic.
From a phylogenetic point of view, it is important to ask: what psychological and environmental factors facilitated the shift from animal communication to words?
Ontogenetically, we must ask: how do the utterances of human infants become referential words?