Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs | Lex Fridman Podcast #426

Added: Apr 18, 2024

In this podcast episode with Edward Gibson, a psycholinguistics professor at MIT, the discussion centers on human language. Gibson's interest in language began during his school days, when he found grammar to be an intriguing puzzle. His background in mathematics and computer science led him to approach language as a structured system that could be analyzed and understood.

Fascination with Language

Gibson finds the beauty of human language in the generalizations that exist within and across different languages. He highlights the patterns of word order in languages, such as subject-verb-object or verb-subject-object, and how these patterns contribute to minimizing dependencies between words. This concept of minimizing dependencies is crucial for effective communication and understanding in language.

Dependencies in Language

Dependencies in language refer to the connections between words in a sentence. Each word is connected to another word, forming a tree-like structure where the root represents the main event or action in the sentence. Linguists generally agree that all sentences can be broken down into a tree structure, with each word hanging on to another based on their grammatical relationships.
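The tree structure described above can be sketched in a few lines of Python. This is only an illustration, not Gibson's own notation: the example sentence, the hand-assigned heads, and the helper names (find_root, dependents, is_tree) are all assumptions made for this sketch. Each word stores the index of the word it hangs on, and the root (the main verb) has no head.

```python
# Hypothetical example sentence with hand-assigned heads (illustrative only).
words = ["the", "dog", "chased", "the", "cat"]
# heads[i] is the index of the word that word i depends on; -1 marks the root.
heads = [1, 2, -1, 4, 2]


def find_root(heads):
    """Return the index of the single word that has no head."""
    roots = [i for i, h in enumerate(heads) if h == -1]
    if len(roots) != 1:
        raise ValueError("a dependency tree needs exactly one root")
    return roots[0]


def dependents(heads, head_index):
    """Return the indices of the words that hang directly on head_index."""
    return [i for i, h in enumerate(heads) if h == head_index]


def is_tree(heads):
    """Check that every word reaches the root by following heads, with no cycles."""
    root = find_root(heads)
    for start in range(len(heads)):
        seen = set()
        node = start
        while node != root:
            if node in seen:
                return False  # we came back to a word already visited: a cycle
            seen.add(node)
            node = heads[node]
    return True


root = find_root(heads)
print("root:", words[root])
print("dependents of root:", [words[i] for i in dependents(heads, root)])
print("well-formed tree:", is_tree(heads))
```

Storing one head index per word is a common compact encoding for dependency trees; it makes the "each word hangs on another word" idea literal.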

Morphology and Morphemes

Morphology deals with the study of morphemes, which are the smallest units of meaning in a language. In English, words typically consist of one or two morphemes, while languages like Finnish may have more complex morphology with multiple morphemes per word. Morphemes can be prefixes, suffixes, or infixes, altering the meaning or grammatical function of a word.

Evolution of Language

The evolution of language and its morphology is a complex process that is not fully understood. Language evolves through communication and interaction within a community. The stickiness of certain linguistic features may arise from their effectiveness in communication and the cultural norms of a particular group. The evolution of language is also influenced by contact between different language groups, leading to the adoption of useful elements from other languages.

Color Words

Gibson delves into the topic of color words and how different cultures have varying numbers of words to describe colors. He mentions that English has around 11 common color words that everyone knows, such as black, white, red, blue, green, yellow, purple, gray, and pink. However, he highlights that there are millions of distinctions in colors that can be seen by individuals with normal color vision. He explains that the evolution of language, especially in non-industrialized cultures, is influenced by the need to communicate effectively about certain topics, such as color.

Syntax and Grammar

Gibson then moves on to discuss syntax and grammar, particularly focusing on the differences between phrase structure grammar and dependency grammar. He explains that phrase structure grammar, proposed by Noam Chomsky, involves breaking down language into categories like S (sentence), NP (noun phrase), and VP (verb phrase). On the other hand, dependency grammar emphasizes the relationships and dependencies between words in a sentence. Gibson prefers dependency grammar as it makes the connections between words more transparent and highlights the lengths of dependencies in a sentence.

Movement in Language

One of the key points of disagreement between different linguistic theories is the concept of movement in language. Chomsky proposed the idea of movement, where words or phrases shift from one position to another in a sentence to create different structures, such as questions or passive voice. However, Gibson presents an alternative theory called lexical copying, where words have different forms for different sentence structures without the need for movement. He argues that lexical copying is more learnable and accounts for the variations in usage seen in languages like English.

Learnability and Universal Grammar

Gibson discusses the concept of learnability in language acquisition and how different theories of grammar impact the ability to learn a language. He mentions that Chomsky's theory of movement poses challenges in terms of learnability, as it makes it difficult to determine the underlying structure of a language. In contrast, the lexical copying theory offers a more bottom-up approach to learning language rules and accounts for the variations in usage seen in different contexts. This difference in theories also ties into the concept of Universal Grammar, where Chomsky argues that certain language structures are innate and universal across languages.

Formal Language Theory

The discussion also touches upon formal language theory, which involves studying the structure and complexity of languages, including human languages and computer programming languages. Gibson explains the hierarchy of formal languages, ranging from regular languages to context-free languages to context-sensitive languages. He clarifies that human languages fall under the context-free category, which allows for more complex structures like recursion and long-distance dependencies, unlike regular languages.
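To make the gap between regular and context-free languages concrete, here is a minimal sketch using two textbook toy languages. These examples do not come from the episode, and the function names are made up for illustration. The pattern a*b* is regular and can be checked with a plain regular expression. The language a^n b^n (the same number of a's followed by b's) is context-free but not regular, because recognizing it requires unbounded counting, which is the same kind of memory that nested structures in a sentence require.

```python
import re


def matches_a_star_b_star(s):
    """Regular language: any number of a's followed by any number of b's."""
    return re.fullmatch(r"a*b*", s) is not None


def matches_a_n_b_n(s):
    """Context-free language a^n b^n: n a's followed by exactly n b's.

    A counter stands in for the stack of a pushdown automaton: each 'a'
    opens a dependency and each 'b' closes the most recent open one.
    """
    open_count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False  # an 'a' after a 'b' breaks the pattern
            open_count += 1
        elif ch == "b":
            seen_b = True
            open_count -= 1
            if open_count < 0:
                return False  # more b's than a's so far
        else:
            return False  # only 'a' and 'b' are in the alphabet
    return open_count == 0


for s in ["aabb", "aaab", "abab"]:
    print(s, matches_a_star_b_star(s), matches_a_n_b_n(s))
```

The string "aaab" passes the regular check but fails the counting check, which is the distinction the hierarchy captures.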

Center-Embedded Recursion

Gibson discusses center-embedded recursion, where a clause is embedded in the middle of another clause rather than at its beginning or end. He explains that this type of recursion is hard to process because it increases the distance between words that depend on each other. He points out that center embedding is universally difficult across languages, as longer dependencies lead to confusion and hinder communication.

Purpose of Language and Audience Considerations

The conversation delves into the purpose of language and the role of the audience in linguistic optimization. Gibson reflects on the balance between optimizing language for ease of production and ensuring effective communication with the listener. While acknowledging the importance of tailoring language to the audience, Gibson maintains that the primary objective is to make language production easier for the speaker, ultimately facilitating understanding and conveying meaning.

Language Network in the Brain

Gibson explains that the human brain has a specialized language network that is activated when individuals engage in language-related tasks such as speaking, listening, reading, or writing. This network is consistent across individuals and remains stable over time. Through fMRI scans, researchers can pinpoint the areas of the brain responsible for language processing, showing that language comprehension and production activate specific regions.

Separation of Language and Thinking

One fascinating aspect discussed is the separation of language from thinking. Gibson mentions cases where individuals have suffered a stroke that specifically affects their language abilities while leaving other cognitive functions intact. These individuals, known as global aphasics, can perform various tasks like playing chess or driving a car despite their inability to communicate through language. This suggests that language is not a prerequisite for thinking and that the two processes can exist independently in the brain.

Inner Voice and Language Processing

The concept of an inner voice, where individuals hear themselves speaking in their minds, is explored. While some people report having a distinct inner voice when thinking or reading, others, like Gibson, do not experience this phenomenon. This raises questions about the role of the inner voice in language processing and its connection to the language network in the brain.

Large Language Models and Understanding

The conversation shifts towards large language models, such as those used in natural language processing tasks. These models excel at predicting and generating language based on vast amounts of training data. While they are proficient in formulating sentences and text that mimic human language, there are doubts about their understanding of deeper meanings. Gibson points out that large language models can be easily tricked or misled, indicating a gap between surface-level language processing and true comprehension of meaning.

Construction-Based Theories of Language

Gibson mentions construction-based theories of language, which focus on the usage and structure of language in real-world contexts. These theories emphasize the relationship between form and meaning in linguistic expressions. He suggests that large language models align closely with construction-based theories, as they excel at generating language based on patterns and structures observed in training data.

Future Directions in Language Research

As researchers continue to explore the capabilities of large language models and their implications for language understanding, there is a need to delve deeper into the mechanisms that underpin language processing in the human brain. Understanding the relationship between form, meaning, and thought is crucial for advancing our knowledge of language and cognition.

Legal Language Analysis

The focus then shifts to legal language, which Gibson has analyzed to identify the features that make legal language hard to understand. He found that legal language is characterized by high levels of center embedding, low-frequency words, and passive voice constructions. However, the study revealed that passive voice had no significant impact on comprehension, while low-frequency words slightly affected recall. The most significant factor affecting understanding was the presence of center embedding in legal texts.

Lawyers' Perception and Preferences

Contrary to the assumption that lawyers may prefer complex language to maintain their expertise, the study found that lawyers also struggled to understand and recall center-embedded legal language. When presented with simplified versions of legal texts without center embedding, both laypeople and lawyers preferred the clearer, more straightforward versions. This suggests that the complexity of legal language may not be intentional but rather a result of certain stylistic conventions or performative aspects associated with legal writing.

Language as a Communication System

Gibson continues by explaining that language is essentially a communication system that allows individuals to convey messages from one mind to another. He highlights the role of noisy channels in communication, where various forms of noise, such as background noise, speaker errors, and listener difficulties, can impact the transmission of information. This concept is rooted in the work of Claude Shannon, who pioneered information theory in the 1940s.

Dependency Length and Language Structure

The conversation then shifts to the concept of dependency length, which refers to the distance between words in a sentence that are dependent on each other. Gibson explains that languages tend to optimize dependency length for efficient communication, with word order and syntax playing a crucial role in this optimization process. He emphasizes that while some aspects of language structure may be influenced by the need to minimize noise in communication, the exact mechanisms are still subject to ongoing research and debate.
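As a rough sketch of how dependency length can be measured, the snippet below sums the distances between each word and the word it depends on, for two orderings of the same English sentence. The sentences, the hand-assigned heads, and the function name total_dependency_length are assumptions for illustration, not material from the episode or an established tool.

```python
def total_dependency_length(heads):
    """Sum of the distances between each word and its head.

    heads[i] is the index of the word that word i depends on; -1 marks the root,
    which has no head and contributes nothing.
    """
    return sum(abs(i - h) for i, h in enumerate(heads) if h != -1)


# Order A: "John threw out the old trash"
words_a = ["John", "threw", "out", "the", "old", "trash"]
heads_a = [1, -1, 1, 5, 5, 1]

# Order B: "John threw the old trash out"
words_b = ["John", "threw", "the", "old", "trash", "out"]
heads_b = [1, -1, 4, 4, 1, 1]

print("order A:", total_dependency_length(heads_a))
print("order B:", total_dependency_length(heads_b))
```

Under these hand-assigned heads, order A keeps "out" next to "threw" and has the smaller total. Making the object longer would stretch the dependency between "threw" and "out" further in order B.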

Cultural Influence on Language

The conversation delves into the influence of culture on language, with Gibson emphasizing the need to consider cultural factors when studying language diversity. He mentions the unique characteristics of isolate languages, such as the absence of exact number words in the Pirahã language. This lack of numerical vocabulary challenges traditional assumptions about language and highlights the intricate relationship between language, culture, and cognition.

Translation Challenges

One of the key challenges discussed by Gibson is the difficulty of translating between languages with distinct concepts. He illustrates this point by describing the limitations faced when trying to translate exact counting words from English to Pirahã. The absence of specific words for numbers in Pirahã hinders the ability to perform tasks that require precise counting, highlighting the importance of language in enabling certain cognitive functions.

Human Language vs. Other Forms of Communication

The discussion delves into the evolution of language and the role of language in shaping human cognition. Gibson challenges the notion that human language is inherently superior to other forms of communication, emphasizing the need for humility when considering the communication systems of other species. He suggests that there may be underlying signals in non-human communication systems that are yet to be fully understood.

Drawing parallels between communication with remote tribes and potential communication with intelligent alien civilizations, Gibson explores the possibility of establishing common languages with non-human species. He acknowledges the work being done to communicate with animals like whales and crows, highlighting the potential for finding common ground in communication across different species.
