Language and thought

Language is the most overloaded concept in AI discourse. It is at once a medium humans think with, a social coordination device, a sequence-prediction problem, and the surface the largest AI models are trained on. Untangling these roles is essential.

Two traditions in linguistics

The Chomskyan tradition treats grammar as an innate, mostly universal cognitive faculty, distinct from general intelligence. On this view, children acquire syntax from relatively little data (the "poverty of the stimulus" argument) because their brains come pre-wired with a Universal Grammar.

The usage-based tradition (Tomasello, Bybee, Goldberg) treats language as learned bottom-up from communicative experience using general-purpose statistical and social-cognitive abilities. Recent corpus and developmental evidence has, on balance, shifted the field toward usage-based accounts, though the debate is far from over.

What LLMs are models of

A large language model is, mathematically, a conditional probability distribution over the next token given the previous tokens; chaining these predictions defines a distribution over whole token sequences. Trained on trillions of tokens, it ends up encoding a remarkable amount of world knowledge, syntactic structure, and reasoning shortcuts.
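The next-token framing can be made concrete with a toy model. The sketch below is not how a real LLM is built (those use neural networks conditioned on long contexts); it is a bigram counter that conditions on a single previous token, and the names train_bigram, next_token_distribution, and sequence_probability are hypothetical, chosen only for illustration. It shows the two ideas in the paragraph above: a conditional distribution over the next token, and a sequence probability obtained by multiplying those conditionals together.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each one-token context."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_distribution(counts, prev):
    """Return an estimate of P(next | prev) as a dict of relative frequencies."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {tok: c / total for tok, c in followers.items()}

def sequence_probability(counts, tokens):
    """Chain rule: multiply P(token_i | token_{i-1}) along the sequence.

    The first token is treated as given, so it contributes no factor.
    """
    p = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p *= next_token_distribution(counts, prev).get(nxt, 0.0)
    return p

# Toy corpus (an assumption for illustration, not real training data).
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

dist = next_token_distribution(model, "the")
prob = sequence_probability(model, ["the", "cat", "sat"])
print(dist)
print(prob)
```

In this corpus "the" is followed by "cat" twice and "mat" once, so the model assigns "cat" two thirds of the probability mass. A real LLM replaces the count table with a learned function of a much longer context, but the object being estimated is the same kind of conditional distribution.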

But it is a model of text, not of thought. The same architecture trained on different corpora produces different 'personalities' and different worlds. That is a clue that much of an LLM's behaviour comes from the data rather than from the architecture.
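The dependence on training data can be illustrated with a deliberately tiny sketch. The function most_likely_next and both corpora below are hypothetical, invented for this example; the "model" is just a frequency count over adjacent words, not an LLM. The point is only that identical code, given different text, predicts a different continuation for the same context.

```python
from collections import Counter

def most_likely_next(corpus, context):
    """Return the token that most often follows `context` in `corpus`, or None."""
    tokens = corpus.split()
    followers = Counter(
        nxt for prev, nxt in zip(tokens, tokens[1:]) if prev == context
    )
    return followers.most_common(1)[0][0] if followers else None

# Two toy corpora (assumptions for illustration only).
cooking = "heat the oven then grease the pan and fill the pan"
sailing = "raise the sail then trim the sail and steer the boat"

# Same procedure, same context word, different training text.
cooking_guess = most_likely_next(cooking, "the")
sailing_guess = most_likely_next(sailing, "the")
print(cooking_guess)
print(sailing_guess)
```

Nothing in the procedure changes between the two calls; only the text does. That is the sense in which a model of text inherits its "world" from its corpus.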

Thinking without language

Humans do much of their thinking non-linguistically. Spatial reasoning, motor planning, visual imagination, and emotional inference can be done with little or no inner speech. Some patients with severe aphasia, who have little functional language, can still solve complex problems.

This is one reason to suspect that language-only AI is missing something. Multimodal models add vision and audio; embodied agents add action. Whether that closes the gap or just papers over it is contested.

Where LLMs surprise — and where they fail

LLMs do better than expected on tasks that look like reasoning (mathematical word problems, code, analogies). They fail more often than expected on tasks that seem as if they should be easier: planning beyond a few steps, maintaining consistent goals over long contexts, and noticing when they are simply wrong.

These failure modes are not random; they trace the boundary between pattern recognition and structured cognition. That boundary is exactly where the AGI debate lives.

Key terms

Universal Grammar: Chomsky's proposal that grammar is an innate cognitive faculty.
Usage-based: A linguistic account in which language is learned from communicative experience.
Token: Typically a sub-word unit; the elementary symbol an LLM consumes and produces.
Multimodal: Trained on or operating over multiple modalities (text, image, audio, action).
Grounding: Connecting symbols to sensorimotor experience or world referents.

Connects to AGI

If thought is essentially linguistic, LLMs may already be close to general intelligence. If it isn't, they are powerful but partial. Many AGI roadmaps now assume the second view and are racing to add the missing pieces.