Scramblings

Dev scratchpad. Digital garden

Human Typing Habits and Token Counts

May 8, 2026 | Reading Time: 3 min
gpt

Humans type for speed, tone, and habit. Tokenizers split text based on common patterns, and providers bill per token. That means ordinary habits like typos, shorthand, filler words, pasted IDs, and stray whitespace can change token counts without changing intent much.

I started noticing this on a tiny prompt: 5 words, 2 spelling mistakes, 13 tokens. I fixed the spelling and sent it again: 6 tokens, including the full stop.

Counts below use OpenAI’s tokenizer and Claude’s API-based tokenizer. In my usage, Claude generally produces more tokens than OpenAI on the same text. Counts here are for isolated strings. In real prompts, counts can shift slightly based on surrounding spaces, punctuation, and casing.

Typos

Swapped letters, dropped letters, doubled letters, nearby-key misses: all normal typing habits, all billable.

  • template: 1 token; tempalte: 3
  • loaded: 1; lodaed: 2 (Claude: 3)
  • assistant: 1; assitant: 2 (Claude: 3)
  • simple: 1; simpel: 2
  • like: 1; liek: 2
Same intent. Different split.

Common spellings compress. Rarer spellings fragment. In code, this can compound quickly: the same misspelled identifier, variable, or function name shows up in declarations, references, logs, errors, diffs, and so on.
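The fragmenting can be sketched with a toy greedy longest-match tokenizer. The vocabulary below is hand-picked for illustration; real BPE tokenizers learn their merges from data and behave differently in detail, but the effect on an out-of-vocabulary spelling is similar in spirit.

```python
# Toy greedy longest-match tokenizer over a tiny hand-picked vocabulary.
# Illustrative only: real tokenizers (tiktoken, Claude's) learn merges
# from data, but rare spellings fragment in the same spirit.

VOCAB = {
    "template", "temp", "al", "te",
    "like", "li", "ek",
}

def toy_tokenize(word: str) -> list[str]:
    """Greedily take the longest vocabulary piece at each position,
    falling back to a single character when nothing matches."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try longest match first
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown character: one piece each
            i += 1
    return pieces

print(toy_tokenize("template"))  # ['template'] - one piece
print(toy_tokenize("tempalte"))  # ['temp', 'al', 'te'] - three pieces
```

With this toy vocabulary, the common spelling survives as one piece while the swapped-letter version splits into three, mirroring the 1-vs-3 counts above.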

When I type for work (code, prompts, texts), my left hand is slightly faster than my right, which produces the occasional swapped letter. I never bothered to correct myself in Google searches or text messages. Now, apparently, that habit has a pricing model.

Word shapes

Word shapes matter too. Quick checks:

  • describe: 1; describer: 2; describers: 3
  • error: 1; errored: 2

A tiny suffix looks harmless to a human. Tokenizers may split it very differently.

Conversation habits

Human chat carries a lot of low-signal padding:

  • fillers: just, basically, actually, really
  • hedges: maybe, I think, kind of, sort of
  • wrappers: hey, please, thanks, sorry
  • tails: etc., or so, and all that (etc. is a mixed case: it can save tokens when it replaces a long, useless tail, but it mostly adds fog when it just hangs off the end of a sentence)
  • chat noise: lol, haha, ..., !!, ??
  • transcript filler: uh, um, you know, like

Tiny expressive habits count too:

  • Good: 1; Good...: 2
  • Yes: 1; Yes!!: 2
  • Ok/Okay: 1; Ok???/Okay???: 2
  • yes: 1; yesss: 3
  • really: 1; reeeally: 3

These help tone. They rarely help the task.
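One crude way to reclaim some of this padding is to strip known filler words before a prompt goes out. A minimal sketch, with an illustrative (not exhaustive) word list; note it is lossy for tone, which may be exactly what you don't want in a chat:

```python
# Rough filler stripper: drops common low-signal words before sending a
# prompt. The word set is an illustrative assumption; tune for your text.

FILLERS = {"just", "basically", "actually", "really", "lol", "haha",
           "uh", "um"}

def strip_fillers(text: str) -> str:
    # Compare each word lowercased and with edge punctuation removed,
    # but keep the original spelling of the words that survive.
    kept = [w for w in text.split()
            if w.lower().strip(".,!?") not in FILLERS]
    return " ".join(kept)

before = "ok so basically I just really need the error log, lol"
print(strip_fillers(before))  # -> "ok so I need the error log,"
```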

Shorter to type is not always cheaper

Humans optimize for keystrokes. Tokenizers optimize for common text. Those are not the same thing.

  • please: 1; pls: 2 on Claude
  • thanks: 1; thx: 2
  • without: 1; w/o: 2 (Claude: 3)

Most of the time, a standard dictionary word is one token, and it is almost always more explicit, clearer, and closer to the text models saw during training than a shorthand.
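A low-risk fix is to expand common shorthands back to dictionary words before sending. A sketch with a hand-picked mapping (the table and its entries are illustrative assumptions, not a standard list):

```python
# Toy shorthand expander: maps common keystroke-savers back to the full
# dictionary word, which usually tokenizes as a single piece.

SHORTHANDS = {"pls": "please", "thx": "thanks", "w/o": "without"}

def expand(text: str) -> str:
    # Whole-word replacement only; punctuation-attached forms ("thx,")
    # would need smarter matching than this sketch does.
    return " ".join(SHORTHANDS.get(w.lower(), w) for w in text.split())

print(expand("pls retry w/o cache"))  # -> "please retry without cache"
```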

Quiet token leaks

Some things are not conversational, but they show up in normal work and still inflate tokens:

  • UUIDs, hashes, timestamps, request IDs
    • e.g., the UUID 019d6ce9-7cfe-753a-b6d6-df719510c9e3: 24 (Claude: 26)
    • e.g., the RFC 3339 timestamp 2026-05-08T21:00:00+05:30: 16 (Claude: 17)
  • long URLs and file paths
  • leading and trailing spaces. Normal internal spacing is usually fine; boundary whitespace is where things get weird.
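For opaque identifiers, one option is to mask them with short placeholders before the prompt goes out, keeping a map so the response can be rehydrated. A sketch for UUIDs only; the `<id1>` placeholder format is made up for illustration:

```python
import re

# Sketch: swap long opaque identifiers for short placeholders before a
# prompt goes out, keeping a map so responses can be rehydrated.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE)

def mask_uuids(text: str) -> tuple[str, dict[str, str]]:
    mapping: dict[str, str] = {}
    def repl(m: re.Match) -> str:
        key = f"<id{len(mapping) + 1}>"  # placeholder format is arbitrary
        mapping[key] = m.group(0)
        return key
    return UUID_RE.sub(repl, text), mapping

masked, ids = mask_uuids(
    "request 019d6ce9-7cfe-753a-b6d6-df719510c9e3 failed")
print(masked)  # -> "request <id1> failed"
```

The same idea extends to hashes, long URLs, and file paths with additional patterns, at the cost of a rehydration step on the way back.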

Conclusion

The model may recover meaning from all of this. Billing does not.

Humans type by habit. Tokenizers bill by pattern.

Which is mildly annoying, because now even tempalte feels like a line item to rectify.