Software Design First Principles

Exploring the fundamental principles of software design through the lens of cognitive science and information theory. Understanding how limited cognitive bandwidth shapes architecture and why Context Engineering is the endgame of software engineering in the AI era.

In the world of software engineering, we’re surrounded by “golden rules”: SOLID principles, high cohesion and low coupling, DRY (Don’t Repeat Yourself), Clean Architecture…

These principles have been passed down from generation to generation of programmers like ancestral remedies. When we ask “why,” the answers are often empirical: “because it’s easier to maintain” or “because the masters said so.”

But if we strip away all the methodological clothing, what exactly are the First Principles of software design?

If software engineering is a science, it shouldn’t be built on experience alone; it should rest on immutable physical constraints.

Today, I want to attempt to derive the mathematical essence of software architecture from the perspectives of cognitive science and information theory. This isn’t just to explain the past, but to see clearly into the future of the AI era.

I. The Starting Point: The Physical Limits of Cognition

Let’s first acknowledge a frustrating fact for all programmers: our brains aren’t actually that powerful.

Whether you’re a junior developer or a top architect, our biological “hardware specs” are exactly the same: our Working Memory capacity is extremely limited. In 1956, cognitive psychologist George Miller proposed the famous “magical number 7±2”—the number of chunks (units of information) humans can process simultaneously. But subsequent research showed this estimate was too optimistic—Nelson Cowan and others revised it down to a more conservative 3-4 chunks. More crucially, Cowan distinguished between “activated memory” and “focus of attention”: the former is time-limited, while the latter is capacity-limited. In other words, even if information is still “remembered,” you can only process a very small number of chunks simultaneously.

This is our Cognitive Bandwidth, or in AI-era terminology—this is our brain’s Context Window.

But what about the software systems we build? Their complexity grows exponentially. The state space of a modern enterprise application far exceeds the carrying capacity of any human brain.

This leads to the core contradiction of software engineering:

Infinitely growing system complexity vs. extremely limited biological cognitive bandwidth.

Without solving this contradiction, any program exceeding 1,000 lines of code would collapse, having outgrown its author’s ability to keep it under control.

II. The Essence of Architecture: Information Filters

Since we can’t expand our brains (Context Window stays fixed), and we can’t stop business growth (system scale keeps increasing), what’s the only way out?

The answer is: decompose the problem.

We need to design a structure that divides complex systems into relatively independent parts. This way, when the brain deals with any specific problem, it only needs to focus on a self-contained small world—small enough to fit within those 3-4 chunks of capacity, yet complete enough to make correct judgments without requiring additional context.

This is the essence of architecture. Architects aren’t stacking modules; they’re doing information shielding, ensuring that each moment of cognitive focus produces a complete, valuable decision.

Let’s use this perspective to re-examine “Coupling.”

What is high coupling? If you read a textbook, it’ll tell you “excessive dependencies between modules.”

But from an information theory perspective, coupling is essentially “information leakage.”

  • Bad architecture: Module A knows Module B’s database table structure, knows B’s internal algorithms, and even knows B’s global variables.
    • This means: when you try to modify B, you must also load A into your Context Window because A depends on B’s details.
    • Result: your cognitive bandwidth is instantly overwhelmed, and chaos ensues.
  • Good architecture: Module A only knows Module B’s Interface, nothing more.
    • This means: the interface acts as a low-pass filter. It blocks the 99% of information that is implementation detail (high-frequency), letting through only the 1% that is contract information (low-frequency).
    • Result: when modifying B’s internal implementation, your brain doesn’t need to load A at all. The system is large, but what you’re facing at this moment is just a highly compressed local portion (both cases are sketched in code after this list).
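
A minimal TypeScript sketch of the two situations above (all names, such as `OrderStore` and `ReportService`, are hypothetical and chosen only for illustration):

```typescript
// --- Bad architecture: Module A depends on Module B's internals (information leakage) ---

// "Module B": a store whose table layout and cache are exposed to everyone.
class OrderStore {
  rows: { id: string; total_cents: number; deleted: 0 | 1 }[] = [];
  cache = new Map<string, number>(); // internal detail, visible to callers
}

// "Module A": a report service that reads B's rows and flags directly.
class ReportServiceBad {
  constructor(private store: OrderStore) {}

  revenue(): number {
    // Changing OrderStore's schema forces you to reload this code into your head too.
    return (
      this.store.rows
        .filter((r) => r.deleted === 0)
        .reduce((sum, r) => sum + r.total_cents, 0) / 100
    );
  }
}

// --- Good architecture: Module A only sees Module B's interface (the low-pass filter) ---

interface OrderReader {
  totalRevenue(): number; // the only information allowed to pass through
}

class ReportServiceGood {
  constructor(private orders: OrderReader) {}

  revenue(): number {
    return this.orders.totalRevenue(); // no knowledge of tables, caches, or algorithms
  }
}
```

Changing `OrderStore`’s rows or cache now touches nothing outside that class, as long as `totalRevenue()` keeps its contract.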

Therefore, the core action of architectural design is minimizing “Mutual Information” between modules.

We’re not cutting off connections—we’re cutting off unnecessary information flows.
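
For readers who want the formula behind the metaphor (standard information theory, used here as an analogy rather than as something we literally measure on a codebase), the mutual information between the observable states of modules A and B is

$$ I(A;B) = H(A) - H(A \mid B) = \sum_{a,b} p(a,b)\,\log\frac{p(a,b)}{p(a)\,p(b)} $$

A well-placed interface drives the mutual information between A and B’s internals toward zero: knowing B’s implementation tells you almost nothing new about A, and vice versa.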

III. Re-validation: Those Classic Patterns

When we put on “context engineering” glasses, we find that all classic architectural patterns are essentially doing the same thing: Context Compression.

In my previous article, we compared Clean Architecture and ECS in depth, two very different architectural styles. Now let’s re-examine them through the lens of “context compression”:

  • Clean Architecture segments context through interface boundaries: core business logic doesn’t need to know about specific databases, UI, or framework implementations. When writing business logic, your Context Window doesn’t need to load information about “whether this is MySQL or MongoDB.”
  • ECS segments context through data orthogonality: each System only cares about specific Component combinations, with no need to know whether this entity is “the Joker” or “Batman.”
  • Microservices forcibly isolate context using physical network boundaries. Network bandwidth limitations force developers to explicitly define “what information can be transmitted.”

Different means, same purpose: minimize the “global context” that needs to be loaded when “modifying locally.”
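
To make the first bullet concrete, here is a minimal Clean-Architecture-style sketch in TypeScript (the names `InvoiceRepository`, `closeMonth`, and `MongoInvoiceRepository` are hypothetical):

```typescript
// Port: the business rule depends only on this abstraction, never on a database driver.
interface InvoiceRepository {
  unpaidTotal(customerId: string): Promise<number>;
}

// Core use case: its Context Window never contains "MySQL vs MongoDB".
async function closeMonth(repo: InvoiceRepository, customerId: string): Promise<string> {
  const due = await repo.unpaidTotal(customerId);
  return due > 0 ? `send reminder for ${due}` : "nothing to do";
}

// Adapter at the edge: the database detail stays behind the interface.
class MongoInvoiceRepository implements InvoiceRepository {
  async unpaidTotal(customerId: string): Promise<number> {
    // ...run the actual MongoDB query here
    console.log(`querying Mongo for ${customerId}`);
    return 0;
  }
}

// Wiring happens at the composition root; swapping databases never touches closeMonth.
closeMonth(new MongoInvoiceRepository(), "customer-42").then(console.log);
```

Swapping MongoDB for MySQL means writing another adapter at the edge; `closeMonth`, and the Context Window needed to understand it, stays exactly the same size.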

We can explain even more patterns with this same perspective:

  • Event Sourcing: “current state” is compressed into a projection folded from the event stream; when reading, you only need to load the aggregated view, not the complete history (sketched after this list).
  • Actor Model: each Actor encapsulates its own state, communicating through message passing. When understanding one Actor, you don’t need to know the internal state of other Actors.
  • Pure functions in functional programming: output is determined by input, with no hidden global state. A function’s Context is compressed into its parameter list.
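
Here is a minimal sketch of the Event Sourcing bullet, again in TypeScript (the `AccountEvent` type and `balance` projection are hypothetical):

```typescript
// The event stream is the source of truth...
type AccountEvent =
  | { kind: "Deposited"; amount: number }
  | { kind: "Withdrawn"; amount: number };

// ...and "current state" is just a fold (projection) over that stream.
function balance(events: AccountEvent[]): number {
  return events.reduce(
    (total, e) => (e.kind === "Deposited" ? total + e.amount : total - e.amount),
    0
  );
}

// Readers load the compact projection, not the full history:
const projection = balance([
  { kind: "Deposited", amount: 100 },
  { kind: "Withdrawn", amount: 30 },
]); // 70
console.log(projection);
```

Note that `balance` is also a pure function in the sense of the last bullet: its entire context is its parameter list.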

Conclusion: all successful architectures help us make the most of those precious 3-4 chunks.

IV. The Endgame of the AI Era: Context Engineering

Why is revisiting first principles so important at this point in time?

Because our development model is undergoing a transition from Carbon (carbon-based humans) to Silicon (silicon-based machines).

The emergence of AI programmers (Agents) hasn’t changed this first principle. On the contrary, it has made the problem more quantifiable and more precise.

Models with the largest context windows, like Gemini and Llama 4, keep breaking records (1M, even 10M tokens). But “can fit” doesn’t equal “can use well.” Multiple studies in 2025 revealed a frustrating fact:

  • Context Rot (Chroma Research, July 2025): As input length increases, performance predictably degrades—even for simple tasks.
  • EMNLP 2025: Even when models can perfectly retrieve all relevant information, their reasoning performance still drops by 13.9%–85%. This means the problem isn’t “can’t find”—it’s “can’t use well.”
  • Maximum Effective Context Window: Researchers found that models’ “effective context window” is far smaller than vendors claim; some models start degrading significantly at 1,000 tokens, more than 99% below the claimed million-token windows.

This mirrors human brain performance exactly—in fact, cognitive neuroscience has revealed the common mechanism behind this: energy constraints.

UCL research shows that the brain’s information processing capacity is directly limited by metabolic energy supply. Even if you “see” all the information, the brain can only focus limited energy on a single attention focus. More interestingly, the “Neural Efficiency Hypothesis” suggests that more intelligent people actually consume less brain energy when completing cognitive tasks—efficient cognition isn’t about “using more resources,” but “doing more focused work with fewer resources.”

LLMs’ Transformer attention mechanisms seem to follow similar constraints: the longer the context, the more diluted the attention, and the worse the reasoning quality.
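
One way to see the dilution (a simplification, not a claim about any particular model): in scaled dot-product attention, the weights a query assigns across a context of length n must sum to 1,

$$ \alpha_{ij} = \frac{\exp\!\left(q_i \cdot k_j / \sqrt{d}\right)}{\sum_{j'=1}^{n} \exp\!\left(q_i \cdot k_{j'} / \sqrt{d}\right)}, \qquad \sum_{j=1}^{n} \alpha_{ij} = 1 $$

so as n grows, the average weight any single token can receive shrinks roughly like 1/n, and the relevant signal competes with ever more distractors for a fixed attention budget.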

Future software engineering will no longer distinguish between code for humans to read and code for AI to read, because in this respect humans and machines are isomorphic: we both hate High Context.

  • What is “legacy code” (or “shit mountain”)?

    It’s code with extremely high mutual information, hopelessly entangled. To get AI to correctly change one line of code, you have to stuff 100 files from the entire codebase into the Prompt. This is not only expensive but destined to fail.

  • What is “AI Ready” architecture?

    It’s systems with low mutual information and high cohesion. When AI tries to fix a module, you only need to feed it the current module’s code and interface definitions (minimal Context), and it can produce perfect, side-effect-free modifications.

This is the endgame of software engineering: Context Engineering.

Through architectural design, we pre-lay paths of low entropy and low context dependency within complex systems.

Whether it’s human thinking or AI reasoning walking these paths, both can reach the correct destination with minimal energy consumption.

Conclusion

Don’t be dazzled by flashy new technologies. The core challenge of software design has never changed:

Fighting infinitely growing entropy within limited cognitive bandwidth.

Architects are the information gatekeepers standing at the system’s entrance. Your mission isn’t to draw the most complex diagrams, but through design, to ensure that as systems grow ever larger, they can still be understood and controlled by a tiny mind (or Token Window).


Some might ask: since AI is also constrained by Context Windows, will architectural design itself eventually be handed over to AI?

Perhaps. But even on that day, the value of Context Engineering won’t disappear—it will only transfer.

When AI becomes the architect, you’ll still need to describe requirements, constraints, and expectations to it in some language. If your requirements are themselves a “high mutual information” chaos, the AI architect will—just like today’s human architects—get lost in endless context.

And before AI takes over completely, understanding and practicing Context Engineering will be the core competitive advantage of the programming profession. It not only helps you write better code but also makes you the person who can effectively “feed” AI and collaborate efficiently with AI in the era of human-machine partnership.

This is the “first principle” we truly need to master in the AI era.