The Nature of Lisp, Code Generation and Wieldable Programming Power

Jun 25, 2025 | Reading Time: 7 min

The Article

  • A friend shared an article on The Nature of Lisp recently. I found the article itself to be superb. For folks who have not read it, I would highly recommend it!

  • The article is a gentle introduction to Lisp and its way of thinking in terms of “syntactic abstractions”, i.e., macros, DSLs, etc. Unlike a lot of Lisp intros/documentation out there, the article’s approach makes it easy for a non-Lisp tech person to actually get to know the thought process behind Lisp and what it actually looks like. It builds on a common reference point in the developer ecosystem and then evolves from there toward syntactic abstractions (mini languages within a language).

  • The essay starts with XML because virtually every working programmer has touched it - a safe, shared starting point. It then introduces the Ant build system, its evolution into a declarative syntax, and how XML relates. Ant’s XML build files are executable, so you immediately see “data as code” in a language most developers know.

  • It talks about XML as a hierarchical representation of tree-structured data. Given that abstract syntax trees (ASTs) of programming languages are also trees, any source code can, in principle, be represented as XML.

  • If we think of XML tag names as operations, it is very easy to visualize code as data. The Ant example is also a neat domain-specific language (DSL) built on a shared representation, i.e., XML. Basically, defining new tags is equivalent to defining new, arbitrary instructions/operations that do not exist in any general-purpose programming language.

  • The article then eases into

    • S-expressions i.e., tree structure as code syntax rather than XML.
    • Meta-programming/Code generation: programs that write programs.
    • The Lisp runtime model:
      • Symbols, lists, functions are all first-class data.
      • Evaluation rule: first element is operator, rest are operands.
      • Quoting (') turns code into inert data; macros turn data back into code (at compile time).
    • Example correlation: Lisp: (+ 3 4), XML: <add value1="3" value2="4"/>
  • Lisp ≈ executable tree structures written as s-expressions. The ability to define and reason with such new operations as needed, without violating the base principles of the language, is the defining feature of Lisp (see the sketch at the end of this section).

  • The article builds a bridge:

    • XML (everyone knows it)
    • Ant (XML that runs)
    • Lisp lists (same structure, nicer syntax)
    • Macros (lists that write lists)
    • DSLs (lists that become any language you need)
  • It’s a long but fun read :)
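
To make the XML ↔ s-expression correlation concrete, here is a minimal sketch in Clojure (a Lisp dialect) showing the same tree as inert data and as executable code:

    ;; XML:  <add value1="3" value2="4"/>
    ;; Lisp: (+ 3 4)

    (def as-data '(+ 3 4)) ; quoting turns code into a plain list
    (first as-data)        ;; => +  (a symbol, i.e., just data)
    (eval as-data)         ;; => 7  (hand the list back to the evaluator)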

Code That Writes Code — Levels, Power, Pitfalls

Below is my opinionated tour of the code-generation landscape. Think of it as “how many extra hands do I let the computer grow, and at what cost?”

Level 0: Plain Source

  • You type every byte yourself.
  • Good: unsurprising, debuggable with printf.
  • Bad: the second time you re-type the same 20-line struct, you feel the boilerplate tax.

Level 1: Textual Templates

  • printf with delusions of grandeur.
  • C pre-processor macros, sed scripts, Jinja files, cookie-cutters.
  • They are cheap until they meet edge cases; then regex demons break loose because the tool has zero understanding of the syntax it is spitting out.
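
To illustrate, here is a hypothetical string-splicing generator, sketched in Clojure: it has no idea of the JavaScript syntax it emits, so an apostrophe in the input silently produces code that no longer parses:

    ;; A textual template: pure string splicing, zero syntax awareness.
    (defn make-getter [field]
      (str "function get_" field "() { return this['" field "']; }"))

    (make-getter "name")
    ;; => "function get_name() { return this['name']; }"  (fine)
    (make-getter "o'brien")
    ;; => "... return this['o'brien']; ..."               (broken JavaScript)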

Level 2: Structured Generators

  • ORMs, Swagger/OpenAPI to client stubs, GUI builders that emit real ASTs or XML trees.
  • Upside: the output at least parses.
  • Downside: the moment your use-case varies 10% from the happy path you’re forced to either
    • Fork the generated file and forever lose upstream improvements, OR
    • Reverse-engineer the generator itself (try finding documentation).
  • E.g., ORMs seem neat for simple queries, but soon enough they start adding “magic dust” in between, and writing anything just beyond simple becomes a field of land mines. If a single hand-written SQL view replaces 300 lines of mystical ORM plumbing, I take the SQL.
  • E.g., for GUI templates, it is better to write components and then import and reuse them until they no longer satisfy your requirements. Where to fork and customize depends a lot on the use case, but if you generated the same component at the very start, human tendency is going to bring the point of forking much, much closer than it needs to be.
  • My rule: generate less, reuse more.
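
For contrast, here is a sketch of the structured approach in Clojure (the accessor generator is hypothetical): because it builds a tree rather than a string, the output is well-formed by construction:

    ;; A structured generator emits code as a tree, not as text.
    (defn getter-form [field]
      (list 'defn (symbol (str "get-" field)) '[record]
            (list 'get 'record (keyword field))))

    (getter-form "name")
    ;; => (defn get-name [record] (get record :name))
    ;; The output always parses -- but the moment you need something the
    ;; generator never anticipated, you are back to forking its output.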

Level 3: Language-Level Macros & Embedded DSLs

  • Now we’re in Lisp territory: programs that fabricate syntax-checked programs.
  • Rust’s macro_rules!, Elixir’s quote, Clojure’s defmacro live here.
  • Super-power: extend the host language as if you were its BDFL.
  • Super-danger: limitless freedom freezes teams.
  • This may be resolved by splitting teams into two layers:
    • L0 (kernel team): writes low-level, possibly unhygienic, domain-specific macros.
    • L1 (feature teams): consume those macros as if they were built-ins. No ad-hoc meta-wizardry beyond this ring.
  • BUT, as anyone who has lived with a codebase for any amount of time knows: if a language exposes n ways of doing things, all n ways will be present in the codebase at some point.
  • The tension between limitless macros and disciplined subset echoes debates around “type-level programming gone wild” in Haskell, or constexpr abuse in C++.
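
As a minimal sketch of the L0/L1 split, here is the classic unless macro in Clojure (Clojure actually ships when-not for this, but it is the smallest honest example): the kernel layer defines the construct once, and feature teams call it as if it were a built-in:

    ;; L0 (kernel team): define `unless`, expanding at compile time to `if`.
    (defmacro unless [test & body]
      `(if (not ~test) (do ~@body)))

    ;; L1 (feature teams): use it like any built-in form.
    (unless (zero? 2) (println "safe to divide"))

    ;; The expansion is ordinary, syntax-checked code:
    (macroexpand '(unless (zero? 2) (println "safe to divide")))
    ;; => (if (clojure.core/not (zero? 2)) (do (println "safe to divide")))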

Level 4: Visual / Low-Code Platforms

  • Zapier, Salesforce flows, Retool, etc.
  • Low-code is, in spirit, a visual DSL: these platforms dress DSLs in a drag-and-drop UI so non-devs can deploy workflows.
  • Great for internal CRUD dashboards; awful the day you need a for-loop with early return.
  • Vendor lock-in is not theory: try exporting a 300-step Salesforce flow to Git.

Level 5: Generative AI and Code That Writes Code

Large language models (LLMs) feel like the final, most chaotic rung on the “code writes code” ladder. They don’t really fit the programming-language power paradigm, and they are not deterministic engines. But they do give us the ability to generate code (maybe English can be considered a DSL in itself :)).

Let’s pin down what they actually give us, where they fail, and whether there is a discipline that could help us.

  • What LLMs really do

    • They predict the next token; they do not manipulate an AST the way a Lisp macro does.
    • Yet the emergent behavior looks macro-like: “Here’s a full CRUD REST service with tests.”
    • They seem like a C pre-processor trained on the entire internet: spectacular reach, but zero schema awareness.
  • Failure models

    • Issue: Verification Debt
      • Why: Model speaks with misplaced confidence.
      • Mitigation: Pair-program tests & static analysis before you read the diff.
    • Issue: Prompt Drift
      • Why: Tweak wording, silently break invariants.
      • Mitigation: Check prompts into Git; diff them like source.
    • Issue: Code Rot
      • Why: One afternoon => 5000 LOC you now maintain.
      • Mitigation: Review once and then review again before merge.
  • Can L0/L1 similar to Lisp be applied to generative AI specific code today?

    • L0 would be the people who actually deal with AI: prompt engineers, MCP maintainers, model evaluators, etc. Basic hygiene definitely needs to be maintained with respect to the deliverables from this layer: versioning, guard rails, ready-made rules and prompt templates, model presets, etc.
    • L1 would be everyone else. They should generally not bother with AI-related tweaks and can assume that they are getting the best possible output (and maybe leave the Gemini vs. OpenAI debate to the L0 folks).
    • Unfortunately, this does not work out well enough. This is mainly because LLMs are non-deterministic code generators: a good output from a prompt once doesn’t mean it will stay that way. Pinning model versions, temperature, and seeds as a “model preset” helps, but only to a limited extent.
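
As a sketch of what an L0 deliverable could look like, here is a hypothetical “model preset” in Clojure, versioned in Git like source. All names and fields are illustrative, not any vendor’s actual API:

    ;; Hypothetical L0-maintained model preset, checked into Git.
    (def code-review-preset
      {:model       "example-model-2025-06" ; pinned version, never "latest"
       :temperature 0.0                     ; minimize sampling variance
       :seed        42                      ; best-effort determinism only
       :prompt-template
       "Review the following diff for correctness and style:\n%s"})

    ;; L1 teams consume the preset instead of hand-tuning prompts inline.
    (defn review-prompt [diff]
      (format (:prompt-template code-review-preset) diff))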

Personal Lines-in-the-Sand

  • Hand-write the 20% that encodes domain knowledge.
  • Reuse code if available and live with it until you can’t; generate only what is pure boilerplate and can be re-generated at will (no manual edits after).
  • Adopt the L0/L1 model for any Turing-complete generator. If everybody can roll a macro, nobody can read the codebase.
  • Low-code is fine for ops dashboards; never for core transaction logic.
  • Treat LLM output as you would an intern’s PR: valuable, but never merge-on-green.

Take-aways

  • Unlimited macros (or unlimited LLM generations) are exhilarating and paralyzing.
  • Balancing these powers — knowing when to generate and when to hand-craft — will, ironically, remain a human judgment call for some time.

References / Further Reading