How Do Computers Learn to Understand Language?
Have you ever typed a sentence into Google Translate and gotten something back that was almost right, but weird? Or asked Siri a question and gotten a reply that made no sense at all? If you have, you’ve run into one of the hardest problems in computer science: getting machines to understand human language.
Here’s the strange thing: computers are incredibly good at some things that humans find hard, like multiplying huge numbers or sorting through millions of records. But they are terrible at something almost every 8-year-old can do: understanding a simple sentence. Why is that? And what happens when you try to teach a computer to do what your brain does without even thinking?
The Basic Problem: Words Are Slippery
Imagine you read this sentence: “The bank was steep, so we had to walk carefully.” Your brain instantly knows that “bank” here means the side of a river, not a place where people keep money. How? You used the rest of the sentence as a clue. A computer, on the other hand, sees the word “bank” and has to choose among a dozen or more possible meanings.
Now imagine: “He hit the man with the baguette.” Did he use the baguette to hit the man? Or did he hit the man who happened to be holding a baguette? The sentence is genuinely ambiguous. Your brain might not even notice the ambiguity unless you stop to think about it.
These are the kinds of puzzles that computational linguistics tries to solve. The field has two main goals. One is practical: build systems that can translate languages, answer questions, summarize articles, and hold conversations. The other is scientific: understand how human language processing works well enough to model it on a computer.
Two Big Approaches: Rules vs. Statistics
For the first few decades of computational linguistics, researchers tried to teach computers language the way you might learn a foreign language in school: by giving them grammar rules and dictionaries. If a computer knows that “Thetis loves a mortal” is a sentence made up of a noun phrase (“Thetis”) followed by a verb phrase (“loves a mortal”), and that “loves” is a verb that needs both a subject and an object, then in theory it should be able to figure out who did what to whom.
This approach, called the rule-based or symbolic approach, had some success. In the early 1970s, a program called SHRDLU could understand commands about a world made of colored blocks on a table. You could say “Pick up the big red block” and it would figure out which block you meant and move it. But SHRDLU only worked in that tiny world. If you tried to talk about anything else, it was lost.
The problem was that real language doesn’t follow neat rules. People say things like “I’m gonna” instead of “I am going to.” They make mistakes, change their minds mid-sentence, and use words in creative ways. A rule-based system can’t handle this. It’s like trying to catch fish with a rigid net made of steel bars—the holes are either too big or too small.
By the 1990s, a completely different approach had taken over. Instead of giving computers grammar rules, researchers started feeding them enormous amounts of real language—millions of sentences from books, newspapers, and websites—and letting the computers figure out the patterns for themselves. This is called the statistical or corpus-based approach.
Here’s how it works in practice. If you want to build a program that can figure out whether “bank” means a river bank or a financial bank, you don’t write a rule saying “If the sentence contains the word ‘river,’ choose the first meaning.” Instead, you give the computer thousands of sentences that have been hand-labeled with the correct meaning. The computer then looks for patterns: what words tend to appear nearby when “bank” means a river? What about when it means a financial institution? Over time, it learns to predict the correct meaning based on context.
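To make this concrete, here is a toy Python sketch of the idea. The handful of hand-labeled sentences and the simple word-counting score are invented for illustration; a real system would train on thousands of examples and use more careful statistics.

```python
from collections import Counter

# A minimal sketch of corpus-based word sense disambiguation.
# The tiny hand-labeled "corpus" below is invented for illustration;
# a real system would train on thousands of labeled sentences.
labeled = [
    ("we fished from the bank of the river", "RIVER"),
    ("the bank of the stream was muddy", "RIVER"),
    ("she deposited money at the bank", "FINANCE"),
    ("the bank approved the loan", "FINANCE"),
]

# Count how often each context word appears with each sense.
counts = {"RIVER": Counter(), "FINANCE": Counter()}
for sentence, sense in labeled:
    for word in sentence.split():
        if word != "bank":
            counts[sense][word] += 1

def guess_sense(sentence):
    # Score each sense by how many of its typical context
    # words show up in the new sentence.
    scores = {
        sense: sum(counter[w] for w in sentence.split())
        for sense, counter in counts.items()
    }
    return max(scores, key=scores.get)

print(guess_sense("the bank was steep near the river"))  # RIVER
```

Even this toy version picks the river sense for “the bank was steep near the river,” because the words around “bank” look more like the river examples it was trained on.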
How to Teach a Computer Grammar
One of the most basic tasks in computational linguistics is parsing: figuring out the grammatical structure of a sentence. Consider “Thetis loves a mortal.” A parser needs to recognize that “Thetis” is a noun, “loves” is a verb, “a mortal” is a noun phrase, and that together they form a sentence.
In the rule-based approach, you’d write something like:
- A sentence can be made of a noun phrase followed by a verb phrase.
- A verb phrase can be made of a verb followed by a noun phrase.
- A noun phrase can be made of a determiner (“a,” “the”) followed by a noun.
- “Thetis” is a noun. “Loves” is a verb. “Mortal” is a noun.
The parser then works through the sentence, trying to apply the rules to build a tree-like structure. This is called a parse tree. If the sentence is ambiguous—like “I saw the man with the telescope”—the parser might produce multiple trees, and then have to decide which one is right.
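To see what this looks like in practice, here is a small sketch using the open-source NLTK toolkit, with the toy grammar from the list above written in NLTK’s notation:

```python
import nltk  # pip install nltk

# The toy grammar from the rules above, in NLTK's
# context-free grammar notation.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP
    NP -> Det N | PropN
    Det -> 'a' | 'the'
    N -> 'mortal'
    PropN -> 'Thetis'
    V -> 'loves'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse(["Thetis", "loves", "a", "mortal"]):
    tree.pretty_print()  # draws the parse tree as ASCII art
```

Run on “Thetis loves a mortal,” this parser finds exactly one tree; on an ambiguous sentence, the loop would print one tree per possible reading.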
But as we saw, real language doesn’t always follow neat rules. People say things like “Them dogs is hungry,” which violates standard grammar but is perfectly understandable. So modern parsers use probabilistic grammars. Instead of saying “A noun phrase must contain a determiner,” they say “A noun phrase contains a determiner 80% of the time.” They learn these probabilities from real data.
This part gets complicated, but here’s what it accomplishes: a probabilistic parser can handle messy, real-world language because it doesn’t demand perfect rules. It just looks for the most likely structure.
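Here is the same idea sketched as a probabilistic grammar in NLTK. The probabilities attached to the rules are invented for illustration; a real parser would estimate them by counting rule uses in a treebank, a large collection of hand-parsed sentences:

```python
import nltk

# A probabilistic version of the toy grammar: each rule carries a
# probability. The numbers are invented for illustration; a real
# parser would learn them from a treebank.
pcfg = nltk.PCFG.fromstring("""
    S -> NP VP        [1.0]
    VP -> V NP        [1.0]
    NP -> Det N       [0.8]
    NP -> PropN       [0.2]
    Det -> 'a'        [0.5]
    Det -> 'the'      [0.5]
    N -> 'mortal'     [1.0]
    PropN -> 'Thetis' [1.0]
    V -> 'loves'      [1.0]
""")

# The Viterbi parser returns the single most probable tree.
parser = nltk.ViterbiParser(pcfg)
for tree in parser.parse(["Thetis", "loves", "a", "mortal"]):
    print(tree, tree.prob())
```

The Viterbi parser multiplies rule probabilities down the tree and returns the single most likely structure, which is exactly the “most likely structure” idea described above.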
Meaning: The Really Hard Part
Parsing tells you the grammatical structure of a sentence, but it doesn’t tell you what the sentence means. Getting from structure to meaning is where computational linguistics faces its biggest challenges.
One major approach is to treat language as a kind of logic. When you say “Thetis loves a mortal,” according to this view, you’re really saying something like “There exists an x such that x is mortal and Thetis loves x.” This is a logical form of the sentence. If we can translate English sentences into logical forms, then we can use computers to reason with them: to check whether one sentence follows from another, to answer questions, and to detect contradictions.
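Here is a tiny Python sketch of why this matters: once a sentence is in logical form, checking whether it is true becomes a mechanical search. The miniature “world” below (who exists, who is mortal, who loves whom) is invented for illustration:

```python
# A minimal sketch of model checking for the logical form
# "there exists an x such that mortal(x) and loves(Thetis, x)".
# The tiny "world" below is invented for illustration.
entities = {"Thetis", "Peleus", "Zeus"}
mortal = {"Peleus"}                 # which entities are mortal
loves = {("Thetis", "Peleus")}      # who loves whom

def thetis_loves_a_mortal():
    # The existential quantifier becomes a search over entities.
    return any(x in mortal and ("Thetis", x) in loves
               for x in entities)

print(thetis_loves_a_mortal())  # True in this model
```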
This logic-based approach, pioneered by the philosopher Richard Montague around 1970, is elegant and powerful. But it has a problem: language is full of things that don’t fit neatly into logic. Consider “John believes that our universe is infinite.” John’s belief could be true or false, but the sentence doesn’t tell us which. Worse, “John designed a starship” doesn’t require that any starship actually exists—John might have designed one that was never built. Standard logic has trouble handling sentences like these.
Another challenge is anaphora: figuring out what pronouns refer to. In “John owns a donkey. He beats it,” your brain knows that “he” refers to John and “it” refers to the donkey. But how? The rule-based approach uses things like gender agreement (John is male, donkey is neuter) and sentence position. The statistical approach looks at patterns in thousands of similar sentences.
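A bare-bones version of the rule-based strategy fits in a few lines of Python. The tiny gender lexicon is invented for illustration; real resolvers add many more constraints, such as number agreement and syntactic position:

```python
# A minimal sketch of rule-based pronoun resolution: walk backwards
# from the pronoun and pick the most recent mention that agrees in
# gender. The tiny lexicon below is invented for illustration.
GENDER = {"John": "male", "Mary": "female", "donkey": "neuter"}
PRONOUN_GENDER = {"he": "male", "she": "female", "it": "neuter"}

def resolve(pronoun, preceding_mentions):
    # preceding_mentions lists candidate antecedents in the order
    # they appeared; later entries are closer to the pronoun.
    wanted = PRONOUN_GENDER[pronoun]
    for mention in reversed(preceding_mentions):
        if GENDER.get(mention) == wanted:
            return mention
    return None

mentions = ["John", "donkey"]   # "John owns a donkey."
print(resolve("he", mentions))  # John
print(resolve("it", mentions))  # donkey
```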
Then there’s ellipsis: missing material that the reader or listener has to fill in. “Shakespeare made up words, and so can you.” Here “so” stands in for “make up words.” That’s easy. But what about “Felix is under more pressure than I am”? Your brain effortlessly fills in “under pressure” after “I am.” And if I say “Felix is under more pressure from his boss than I am,” the missing material could mean either “than I am under pressure from Felix’s boss” or “than I am under pressure from my own boss.” Both readings are possible, and which one is right depends on context.
The Knowledge Problem
Here’s the deepest challenge. Understanding language requires knowing an enormous amount about the world. Consider this sentence from a famous example in the field:
“The city council refused the women a parade permit because they feared violence.”
Who is “they”? The city council? The women? Your answer depends on what you know about city councils and women and parade permits and violence. If you think city council members tend to be cautious and women tend to be peaceful, you might infer that “they” refers to the city council, who feared the women might cause violence. If you think women seeking a parade permit might be activists, you might infer that “they” refers to the women, who feared violence from opponents.
Now change “feared” to “advocated”: “The city council refused the women a parade permit because they advocated violence.” Now “they” almost certainly refers to the women. The same sentence, one word changed, completely different meaning—and the difference depends entirely on your knowledge of how the world works.
This is called the knowledge acquisition bottleneck. To truly understand language, a computer would need to know everything that an ordinary human knows: that water is wet, that people can get hurt, that restaurants have menus and waiters, that if you drop a glass it might break. And it would need to use this knowledge automatically, instantly, without being told explicitly.
Some researchers have tried to build this knowledge by hand. The Cyc project, started in 1984, has spent decades manually encoding millions of facts about the world into a computer. It knows that trees are plants, that plants need water, that water is a liquid, and so on. But even Cyc doesn’t know everything, and adding new knowledge is incredibly slow.
Other researchers have tried to get computers to read and learn on their own. You could give a computer all of Wikipedia and let it try to extract facts. But this runs into the same problem: the computer has to understand what it reads in order to learn from it.
Statistical Semantics: Cheating or Genius?
A completely different approach to meaning has emerged in recent years. Instead of trying to extract logical forms from sentences, why not just use the sentences themselves as the representation of meaning?
This sounds circular, but it works surprisingly well for some tasks. If you want to know whether “John is a fluent French speaker” entails “John speaks French,” you don’t need to represent both sentences in logic. You can just look at millions of similar sentence pairs and learn which word patterns tend to indicate entailment.
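As a crude illustration, here is a Python sketch of a surface-level entailment check based on word overlap. The stopword list and the threshold are invented for illustration; a real system would learn its patterns, including the link between “speaker” and “speaks,” from millions of labeled sentence pairs:

```python
# A crude surface-level entailment check: does the hypothesis
# reuse most of the content words of the premise?
STOPWORDS = {"a", "an", "the", "is", "of"}  # invented mini list

def content_words(sentence):
    return {w.lower() for w in sentence.split()} - STOPWORDS

def probably_entails(premise, hypothesis, threshold=0.6):
    # The threshold is invented for illustration; a real system
    # would tune it (and far richer features) on labeled pairs.
    p, h = content_words(premise), content_words(hypothesis)
    overlap = len(p & h) / len(h)
    return overlap >= threshold

# "speaks" and "speaker" don't match exactly, so only "john" and
# "french" overlap: 2 of 3 hypothesis words, which clears 0.6.
print(probably_entails("John is a fluent French speaker",
                       "John speaks French"))  # True
```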
This pattern-based approach is called statistical semantics, and it’s the reason Google Translate works as well as it does. Google doesn’t “understand” French in any deep sense. It has analyzed billions of French-English sentence pairs and learned probabilistic patterns: given a French phrase, which English phrase is its most likely translation?
The same approach powers modern question-answering systems. When you ask “Who killed President Lincoln?” the computer doesn’t look up the answer in a knowledge base. Instead, it searches billions of web pages for sentences that contain the right words in the right patterns. It finds “John Wilkes Booth assassinated Abraham Lincoln” and extracts the name.
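Here is a sketch of that pattern-matching idea in Python. The two snippets stand in for web search results, and the single hand-written pattern is invented for illustration; real systems learn thousands of such patterns automatically:

```python
import re

# A sketch of pattern-based answer extraction: turn the question
# into a regular expression and scan text for a matching sentence.
# The snippets below stand in for web search results.
snippets = [
    "Lincoln delivered the Gettysburg Address in 1863.",
    "John Wilkes Booth assassinated Abraham Lincoln in 1865.",
]

# "Who killed X?" -> look for "<Name> killed/assassinated/shot X".
pattern = re.compile(
    r"([A-Z][a-z]+(?: [A-Z][a-z]+)*) (?:killed|assassinated|shot) "
    r"(?:President )?(?:Abraham )?Lincoln"
)

for snippet in snippets:
    match = pattern.search(snippet)
    if match:
        print(match.group(1))  # John Wilkes Booth
```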
This approach has limitations. It can’t reason step by step. It can’t handle novel situations. It doesn’t really understand anything. But for many practical purposes, it works well enough.
Where We Are Now
The field of computational linguistics has made remarkable progress. Fifty years ago, computers could barely understand “Pick up the red block.” Today, you can have a conversation with your phone about the weather, translate a web page from Chinese to English, or ask a computer to summarize a long article.
But these systems are still far from human-like understanding. They work by recognizing patterns, not by reasoning. They can tell you that “John killed Bill” entails “Bill died,” but they can’t tell you why. According to a famous (and probably apocryphal) story, an early translation system rendered “The spirit is willing but the flesh is weak” into Russian and back to English as “The vodka is good but the meat is tough.” They can’t imagine, invent, or truly communicate.
The deepest question in computational linguistics remains unanswered: what would it mean for a computer to really understand language? Is it enough to manipulate words and patterns, or does understanding require something more—consciousness, experience, a body that interacts with the world? Nobody really knows. But trying to answer that question has taught us a tremendous amount about both computers and ourselves.
Key Terms
| Term | What it means |
|---|---|
| Parsing | The process of figuring out the grammatical structure of a sentence |
| Logical form | A precise, logic-like representation of what a sentence means |
| Anaphora | The relationship between a pronoun and whatever it refers to |
| Corpus | A large collection of real language examples used for training computer models |
| Probabilistic grammar | A grammar with probabilities attached to rules, learned from real data |
| Knowledge acquisition bottleneck | The difficulty of giving computers all the background knowledge humans have |
| Statistical semantics | Using patterns of word co-occurrence to determine meaning |
| Ellipsis | Missing words that the listener must fill in |
Key People
- Noam Chomsky — A linguist who argued that the human brain has an innate “language organ.” His idea of transformational grammar influenced early computational linguistics.
- Richard Montague — A philosopher who showed that large parts of natural language could be translated into formal logic. His work made “language is logic” look like a real possibility.
- Joseph Weizenbaum — Created ELIZA in the 1960s, a program that simulated a therapist by matching user inputs to pre-written responses. It fooled many people into thinking it understood them.
- Roger Schank — Argued that understanding language requires “scripts” (knowledge of typical situations like going to a restaurant) and that much of meaning can be reduced to a small set of action primitives.
Things to Think About
- When you read a sentence that has two possible meanings, how do you know which one is right? Try to catch yourself doing this—can you describe the process? Is it something you could teach a computer?
- If a program like Google Translate can translate between dozens of languages without understanding anything, does it count as “knowing” language? What would it take to convince you that a computer really understands what it’s saying?
- Imagine you had to write down every fact a computer would need to understand a simple story about two kids playing in a park. What facts would you need to include? How many pages would it take? What does this tell you about what’s happening in your own brain when you read that story?
- Some researchers think that true language understanding requires a body—you can’t really understand what “hot” means unless you’ve felt heat, or what “run” means unless you’ve run. Others think a computer could understand everything purely from text. Which view seems more plausible to you, and why?
Where This Shows Up
- Google Translate and similar tools work by finding statistical patterns across billions of translated documents. They don’t “know” languages, but they produce useful results.
- Siri, Alexa, and voice assistants combine speech recognition with pattern matching and limited reasoning. They understand only within narrow domains.
- Spam filters use statistical learning to recognize patterns in unwanted emails. They don’t understand the content, but they can spot suspicious language.
- Autocorrect and predictive text use probabilistic models of word sequences. When your phone guesses the next word, it’s using the same kind of statistical language model that powers translation systems (a tiny version is sketched after this list).
- Chatbots like the ones on customer service websites use a mix of pattern matching and pre-written responses, much like ELIZA did in the 1960s.
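To see how simple the core of a predictive-text model can be, here is a toy bigram predictor in Python. The training text is invented for illustration; the model in your phone is trained on vastly more data, but the counting idea is the same:

```python
from collections import Counter, defaultdict

# A minimal sketch of the bigram model behind predictive text:
# count which word follows which, then suggest the most frequent
# follower. The tiny training text is invented for illustration.
text = ("i am going to the store . i am going to school . "
        "i am happy .").split()

follows = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    # Return the most common word seen after `word` in training.
    candidates = follows[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("am"))     # going
print(predict_next("going"))  # to
```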