Large Language Models, pt. 1: What are LLMs?
Large Language Models, or LLMs, have been making news lately. The US government is treating AI as a national security issue, and AI capability is seen as a key axis of the contest between the US and China. Enthusiasts say that AI could be the harbinger of the end of labor, while skeptics question what practical utility these models have, what side effects they may come with, and whether LLMs are merely a toy with little practical application.
What are Large Language Models? #
“AI” is a complicated term with a lot of history. It’s had a few different meanings over time, but more recently it’s come to indicate technologies built on a technique called “machine learning”, which in turn rests on the underlying technology of “neural networks”. To summarize this very quickly: neural networks are a method of programming computers loosely based on our understanding of how brains work, with simulated neurons interacting. Instead of being programmed with if/else clauses, a neural network is trained by feeding it inputs, checking its outputs against the expected results, and reinforcing the strength of the neural connections that tend to yield better answers. This process yields a system that can pattern-match quite well, uncovering connections that humans don’t even notice. This is, naturally, one of the key technologies underlying recommendation algorithms, image recognition, and… LLMs.
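To make the “reinforcing connections” idea concrete, here is a toy sketch of a single simulated neuron and one training step, in the style of a perceptron update. It is purely illustrative; real networks contain millions or billions of these units and are trained with far more sophisticated machinery, and all of the names here are my own.

```python
# Toy neuron: a weighted sum of inputs pushed through a step activation.
def neuron(inputs, weights, bias):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 if total > 0 else 0.0

# One training step: nudge each weight in the direction that reduces error,
# i.e. "reinforce the connections that tend to yield better results".
def train_step(inputs, weights, bias, target, learning_rate=0.1):
    error = target - neuron(inputs, weights, bias)
    weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    bias = bias + learning_rate * error
    return weights, bias
```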
An LLM is a clever trick on top of this technology. At its core, an LLM (usually) predicts the next word in a sequence; more or less, “based on the preceding text, what is the next piece of text?”. Initially the technology wasn’t very good, but by vastly increasing the size of the neural networks and developing some key techniques, researchers made vast improvements. As a result, we now have machines that can generate text that is hard to distinguish from text written by a human.
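As a very rough sketch of that loop, imagine the network as a function that returns a probability for each possible next piece of text. The probabilities below are invented for illustration; a real model works over tokens and a vocabulary of tens of thousands of entries, not three hard-coded words.

```python
import random

# Stand-in for the neural network: returns made-up probabilities for what
# comes next. A real LLM scores every token in its vocabulary.
def predict_next_token(context):
    fake_distribution = {"mat": 0.6, "roof": 0.3, "moon": 0.1}
    tokens, weights = zip(*fake_distribution.items())
    return random.choices(tokens, weights=weights)[0]

# Generation is just: predict the next piece, append it, repeat.
def generate(prompt, max_new_tokens=3):
    pieces = prompt.split()
    for _ in range(max_new_tokens):
        pieces.append(predict_next_token(pieces))
    return " ".join(pieces)

print(generate("the cat sat on the"))
```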
Tokens and Inference #
One additional wrinkle: an LLM does not understand text the way humans do. Before reaching the neural network, the text is broken into pieces through a process called “tokenization”. Each token is assigned a number, and the neural network operates on those numbers. For example, the phrase “I like strawberries” might tokenize as [40, 1299, 106502]. Text gets translated into tokens on its way into the LLM, and tokens get translated back into text on the way out. Incidentally, this is why LLMs tend to fail at the “how many R’s are there in ‘strawberry’?” question; they never see the individual letters.
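To see tokenization in action, here is a small example using OpenAI’s tiktoken library (one tokenizer among many; the exact token IDs vary by model, so they won’t necessarily match the numbers above):

```python
# pip install tiktoken
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

token_ids = encoding.encode("I like strawberries")
print(token_ids)                   # a short list of integers
print(encoding.decode(token_ids))  # -> "I like strawberries"
```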
Tokens are useful for understanding how LLMs work. They are the fundamental building blocks of inputs and outputs, and they are also generally the basis for billing and rate limiting. A word is often a single token, but sometimes requires more. As a rule of thumb, 750 words equates to about 1,000 tokens.
The consumption of a Large Language Model, the use of that “next token prediction” engine, is generally called “inference”. There are various inference providers: most famously OpenAI and Anthropic, but also clouds like AWS, Google, and Alibaba, and global firms like Mistral, Z.AI, and Deepseek. Inference may be purchased in a couple of different ways. Most consumer-facing inference services charge a monthly fee with usage limits, while others charge by usage, based on the number of tokens in the inputs and outputs.
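As a back-of-the-envelope example of usage-based billing, the sketch below combines the 750-words-per-1,000-tokens rule of thumb with per-token prices. The prices are placeholders I made up; real rates vary widely by provider and model.

```python
# Hypothetical prices, in USD per million tokens; check your provider's rate card.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00
PRICE_PER_MILLION_OUTPUT_TOKENS = 15.00

def estimate_cost(input_words, output_words):
    # Rule of thumb: ~750 words is roughly 1,000 tokens.
    input_tokens = input_words * 1000 / 750
    output_tokens = output_words * 1000 / 750
    return (input_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS
            + output_tokens / 1_000_000 * PRICE_PER_MILLION_OUTPUT_TOKENS)

# A 1,500-word prompt and a 750-word response:
print(f"${estimate_cost(1500, 750):.4f}")  # about $0.02 at these made-up rates
```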
Historical Context and Sci-Fi #
The Turing test is a long-standing thought experiment in computer science. Given a computer program generating text and a human writing text, could a judge determine which was the human and which was the machine, just by communicating through written questions and answers? For decades, a computer that could pass as human with any consistency was purely theoretical, a “goal that might be possible someday, but probably not.” People used to try to make chatbots that could fool people into thinking they were human; today, developers program their chatbots to make sure users know that they’re not human. The Turing test is now irrelevant. It’s no longer a high bar.
“Sentience” is a word that gets thrown around in some corners. We’ve been primed for this by science fiction, where machines are often presented as sentient, frequently framed in the language of “civil rights”. It’s worth noting that sci-fi is typically more a commentary on the society in which it is written than on any specific future. The “tokenization” and “next token” explanations above should give a decent intuition for why “sentience” is not really a feasible characteristic for LLMs to have. At their core, they are mappings of token probabilities, produced by a system whose entire experience is predicting the next token. To quote Star Wars, “The ability to speak does not make you intelligent.” Incidentally, research has borne this out: while LLMs exhibit some interesting emergent properties, they don’t truly have the ability to reason under any reasonable definition. With LLMs, computers can now communicate in a way that we humans are primed to read as coming from a peer, so it is easy to project sentience onto them on an emotional level, but there is no evidence that LLMs have any more internal experience than predicting the next token. I’m not sure what sentience is, but I know it’s more than that.
Why are they Important? #
Historically, computers have been excellent at some kinds of cognitive tasks and terrible at others. As loose categories, computers are much better at “hard” skills than “soft” skills. If one can describe how to do something in detail, breaking it down into explicit instructions, then it can be made into a computer program. It’s easy for a computer to draw a perfect square, but hard for one to paint a watercolor portrait. It’s easy for a computer to read structured code or data, but hard for one to understand the tone of an email.
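To make the contrast concrete: the square really is a few lines of explicit instructions, while there is no comparably short recipe for the watercolor or for judging tone. A trivial sketch:

```python
# A "perfect" square is easy to specify as exact, step-by-step instructions.
def draw_square(size):
    for row in range(size):
        if row in (0, size - 1):
            print("#" * size)
        else:
            print("#" + " " * (size - 2) + "#")

draw_square(5)
# There is no similarly short list of exact instructions for
# "paint a watercolor portrait" or "judge the tone of this email".
```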
One of the core value propositions of neural networks is that they’re better able to do these “softer” tasks. This is important because:
- There is a massive amount of labor spent on language tasks, and automating even part of it could meaningfully improve productivity.
- LLMs may not need to be trained on any specific domain in order to achieve useful results, because language is generally interoperable across many domains.
- While LLMs are not capable of true reasoning, they can approximate cognition on demand and at low cost.
This means that we now have an approximation of general-purpose cognition, capable of dealing with novel situations, available on demand, in a format that does not require particular skill to use successfully. This is a fundamentally new world for the industry, in many ways.