How LLMs work — a non-PhD explainer

Tokens, embeddings, attention, and transformers, explained with zero math.

Elevatools Team · 2026-01-15 · 3 min
The pipeline

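  1. Tokenize your text into pieces. English averages roughly 3 to 4 characters per token, and the exact figure depends on the tokenizer and the language.
  2. Embed each token as a vector, a list of numbers that encodes how the token is used.
  3. Attention lets each token look at the tokens that came before it and weigh which ones matter most.
  4. Predict the next token by turning that context into a probability for every token in the vocabulary.
  5. Append the chosen token and repeat until the model emits a stop token or hits a length limit.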
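
The five steps above can be sketched as a toy loop. This is a minimal illustration, not how any real model is built: the whitespace tokenizer, the pseudo-random `embed` function, and the helper names (`causal_attention`, `predict_next`, `generate`) are all assumptions made for this example, and nothing here is trained, so the generated tokens are meaningless. It also assumes a causal mask, where each position only sees itself and earlier positions, as decoder-style LLMs do.

```python
import math
import random


def tokenize(text):
    # Step 1 (toy): split on whitespace. Real tokenizers use subword pieces.
    return text.lower().split()


def embed(token, dim=8):
    # Step 2 (toy): a fixed pseudo-random vector per token.
    # Real models learn these vectors during training.
    rng = random.Random(token)
    return [rng.uniform(-1, 1) for _ in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Turn arbitrary scores into probabilities that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(vectors):
    # Step 3 (toy): each position becomes a weighted mix of the
    # vectors at or before it. Weights come from dot-product similarity.
    dim = len(vectors[0])
    mixed = []
    for i, query in enumerate(vectors):
        visible = vectors[: i + 1]
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in visible])
        mixed.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(dim)]
        )
    return mixed


def predict_next(context_vector, vocab):
    # Step 4 (toy): score every vocabulary entry against the context
    # and return the most probable one along with all probabilities.
    probs = softmax([dot(context_vector, embed(tok)) for tok in vocab])
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs


def generate(prompt, vocab, max_new_tokens=3, stop_token="<eos>"):
    # Step 5: append the predicted token and repeat.
    tokens = tokenize(prompt)
    for _ in range(max_new_tokens):
        vectors = [embed(t) for t in tokens]
        context = causal_attention(vectors)[-1]  # last position predicts next
        next_token, _ = predict_next(context, vocab)
        if next_token == stop_token:
            break
        tokens.append(next_token)
    return tokens
```

Calling `generate("the cat sat", ["the", "cat", "sat", "on", "mat", "<eos>"])` returns the prompt tokens followed by up to three more. A real transformer stacks many attention layers with learned weights, but the loop has the same shape.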

Why it sometimes hallucinates

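The model predicts plausible text; it doesn’t “know” facts. When the prompt is ambiguous or the topic was rare in its training data, the most plausible continuation can be wrong and still sound confident. Clear prompts, tool calls, and retrieval of source documents reduce this, but they don’t eliminate it.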
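
As a rough illustration of the retrieval idea, the sketch below picks the passages most related to a question and puts them in the prompt, so the model has source text to draw on. Everything here is hypothetical: `score` and `build_grounded_prompt` are names invented for this example, and the word-overlap ranking is a simple stand-in for the similarity search a real system would more likely use.

```python
def score(question, passage):
    # Toy relevance: how many lowercase words the two texts share.
    question_words = set(question.lower().split())
    passage_words = set(passage.lower().split())
    return len(question_words & passage_words)


def build_grounded_prompt(question, passages, top_k=2):
    # Keep the top_k most relevant passages and place them ahead of
    # the question, with an instruction to stay within that context.
    ranked = sorted(passages, key=lambda p: score(question, p), reverse=True)
    context = "\n".join(f"- {p}" for p in ranked[:top_k])
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


passages = [
    "The Eiffel Tower is in Paris.",
    "Bananas are yellow.",
    "Paris is the capital of France.",
]
prompt = build_grounded_prompt("Where is the Eiffel Tower?", passages)
```

The resulting `prompt` string is what would be sent to the model. The unrelated passage about bananas is left out.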

Why temperature matters

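  • Temperature 0 = effectively deterministic: the model always picks the most likely token
  • Temperature 1 = creative: tokens are sampled from the model’s unmodified probabilities, so output varies
  • Temperature 2 = chaotic: the probabilities are flattened, so unlikely tokens get picked often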
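
Under the hood, temperature is usually applied by dividing the model's raw scores (logits) by the temperature before they are turned into probabilities. The sketch below shows that for three made-up scores. The function names and numbers are illustrative, and treating temperature 0 as "always pick the top token" is a common convention assumed here, since dividing by zero is undefined.

```python
import math
import random


def apply_temperature(logits, temperature):
    if temperature == 0:
        # Greedy decoding: all probability goes to the highest score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    # Divide scores by the temperature, then apply softmax.
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, temperature, rng=random):
    # Draw one token index according to the adjusted probabilities.
    probs = apply_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0, 1, 2):
    print(t, [round(p, 2) for p in apply_temperature(logits, t)])
```

Running it shows the top token's share shrinking as the temperature rises, which is why higher settings produce more surprising output.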

The future

Expect progress on four fronts: tool use, multimodality, longer context, and smaller, more efficient models. We’re in year 5 of a 30-year shift.
