Multi-Agent Systems: From Division of Labor to Division of Attention

Why do we need multiple AI agents when one LLM can do everything? The answer isn't about skills—it's about memory isolation and context sharding. This post explores how Multi-Agent systems solve the attention problem through context separation.

In the current discourse surrounding Multi-Agent systems, we are inundated with “role-playing”: Planners, Coders, Reviewers, Writers…

These roles are meticulously assigned like stations on an assembly line. Specialists do specialist work; this is how human teams collaborate, and specialized division of labor has become the default recipe for efficiency.

But if we look one layer deeper: What problem does human division of labor actually solve?


I. Overrated “Skills,” Underrated “Memory”

Let us first acknowledge a fact that makes many Agent frameworks awkward: AI is actually “omnipotent.”

Give an LLM a code interpreter, and it can write code; give it a browser, and it can research; give it a calculator, and it can do math. The so-called “Coder Agent” and “Researcher Agent” have no fundamental difference in underlying capability—they are the same large model, merely stuffed into different System Prompts.

This is distinct from human teams.

Humans require division of labor because of the rigid constraint of learning time: a single person cannot simultaneously master cardiac surgery and criminal defense. We use “skills” to slice responsibilities out of necessity—because the boundaries of skills are rigid and take years or decades to cross.

But for AI, skill boundaries are elastic. You only need to switch the prompt, and a “criminal lawyer” instantly becomes a “cardiac surgery consultant” (though, naturally, you wouldn’t let it operate on you).

So the question arises: Since skills can be switched instantly, why do we still need multiple Agents?

The answer lies elsewhere: Not in skills, but in memory.


II. Memory is Identity

Let me ask you a question: Who is more valuable to a company? A “veteran employee” who has been there for five years, or an “industry expert” who just joined?

From a skill stack perspective, the industry expert might be stronger—newer methodologies, broader horizons, a shinier resume.

But anyone who has worked in a real organization knows: The veteran is valuable not because they know more skills, but because they witnessed how the system grew.

They know that the seemingly stupid if-else block is there to patch a production incident from three years ago; they remember why that refactor was abandoned; they know which landmines to avoid with a specific client.

These things cannot be written into documentation or transferred through training. They are incompressible experiences.

Skills can be copy-pasted, but “lessons learned” cannot be transferred.

What defines an expert is never their Skill Set, but their Memory.

This analogy applies equally to Agents.


III. Context Sharding: The True Value of Multi-Agent Systems

Returning to our question: Since one LLM can do everything, why split it into multiple Agents?

The answer is: To isolate context.

“Context” here refers to the Agent’s Context Window—all the information it can “see” and “remember” while processing the current task.

In my previous article, “First Principles of Software Design,” I argued: Whether for the human brain or AI models, processing power is limited by the context window. Even with models claiming million-token windows, research shows that “effective context” is far smaller—attention dilutes, and noise accumulates.

This leads to the true value of multi-agent systems: Context Sharding.

Imagine:

  • Project Agent: Its memory contains only the codebase, PR history, architecture docs, and technical decisions.
  • Life Agent: Its memory contains only your schedule, family preferences, health data, and todos.

Their “capabilities” are identical—both can write code, browse the web, and plan. But their “attention focus” is physically isolated.

When you ask the Project Agent “How do I implement this?”, it retrieves project history and offers a solution matching the code style—without hallucinating your shopping list.

When you ask the Life Agent “Where should I take my daughter this weekend?”, it retrieves family preferences—without analyzing the “separation of concerns” of the playground using Clean Architecture terminology.

Isolating memory is isolating noise.
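This isolation can be made concrete with a minimal sketch. The `Agent` class below is hypothetical, not any framework's API: the point is only that both agents share one underlying capability (the model call) while each keeps a private memory shard that alone reaches the prompt.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical sketch: same underlying model, isolated memory shard."""
    name: str
    system_prompt: str
    memory: list[str] = field(default_factory=list)  # private context shard

    def remember(self, fact: str) -> None:
        self.memory.append(fact)

    def build_context(self, query: str) -> str:
        # Only this agent's shard is visible when the model is called.
        return "\n".join([self.system_prompt, *self.memory, f"User: {query}"])


project = Agent("project", "You maintain the codebase.")
life = Agent("life", "You manage the family calendar.")

project.remember("Abandoned the ORM refactor after the Q3 incident")
life.remember("Daughter likes the science museum")

# Identical capability, disjoint attention: the project agent's prompt
# never contains the life agent's memories, and vice versa.
assert "science museum" not in project.build_context("How do I implement this?")
```

The shards are isolated by construction, not by instruction: no prompt engineering is needed to keep the shopping list out of the code review.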


IV. From “Division of Labor” to “Division of Attention”

Human team collaboration is built on an assumption: each person's brain has limited capacity, and a single brain cannot be cloned or run concurrently.

So we need meetings to sync information, documents to pass context, and “liaisons” to translate between departments. The essence of organizational structure is an information routing scheme under limited bandwidth.

But AI is different. AI can be concurrent, cloned, and instantly switched.

So why do we need “multiple” Agents instead of one “Super Brain” holding all information?

The reason is the same as for humans: Attention is a scarce resource.

No matter how large the context window, when information volume exceeds a threshold, reasoning quality drops. Not because it “can’t find” the info, but because it “can’t use it well.” It’s like trying to have a deep conversation in a noisy bar—even with perfect hearing, you can’t focus.

The essence of Multi-Agent is not Division of Labor, but Division of Attention:

  • Slice the massive world of information into highly cohesive “context blocks.”
  • Have each Agent guard one block, ensuring the highest attention density within that block.
  • When cross-block collaboration is needed, exchange information via message passing (not shared memory).
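The slicing and guarding steps above can be sketched as a toy router that masks everything outside the chosen block. All names here are illustrative, and the keyword-based `route` function is a stand-in for real intent classification:

```python
# Hypothetical sketch: each agent guards one highly cohesive context block,
# and a router sends each query to exactly one block, so the model's
# attention is spent only on relevant information.
CONTEXT_BLOCKS = {
    "project": ["codebase map", "PR history", "architecture decisions"],
    "life": ["family schedule", "weekend preferences", "health notes"],
}


def route(query: str) -> str:
    # Naive keyword routing; a real system would classify intent properly.
    family_words = ("weekend", "family", "daughter")
    return "life" if any(w in query.lower() for w in family_words) else "project"


def visible_context(query: str) -> list[str]:
    # Only the guarded block reaches the model call; the rest is masked,
    # not deleted -- exactly the "Focus Mode" described above.
    return CONTEXT_BLOCKS[route(query)]


assert visible_context("Where should we go this weekend?") == CONTEXT_BLOCKS["life"]
assert "family schedule" not in visible_context("How do I implement this?")
```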

This is like the human brain’s “Focus Mode”: When you are deep in coding, information about “what’s for dinner” isn’t forgotten—it’s temporarily “masked.” The Multi-Agent architecture helps AI achieve the same state.


V. Chat: The Natural Collaboration Interface

If every Agent is a “Context Island,” how do they collaborate?

The answer: Through message passing, not shared memory.

In plain English: Chat. You can have a chat window with each Agent, or pull several Agents into a group for a specific task. They don’t share context, but they pass necessary information through you—the human—or through direct messages.

You might ask: What if I need an Agent to access another Agent’s memory?

The answer: You need to explicitly “carry” it.

This looks like a “regression”—why not just connect all memories? But this is the core of the design: Forcing explicit information transfer forces you to think about “what information is truly necessary.”

Just as microservices communicate through APIs instead of a shared database: the cost of crossing the boundary forces you to define clearly what may be passed and what shouldn't be.

Instant messaging becomes the bus for exchanging information between different context blocks.
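A minimal sketch of what "explicit carry" means, assuming a hypothetical mailbox-style agent: there is no shared memory at all, so the only way a fact crosses the boundary is that someone deliberately puts it in a message.

```python
class Agent:
    """Hypothetical sketch: private memory, message-passing only."""

    def __init__(self, name: str):
        self.name = name
        self.memory: list[str] = []  # private shard; never shared directly
        self.inbox: list[tuple[str, str]] = []  # the only cross-agent channel

    def send(self, other: "Agent", content: str) -> None:
        # Explicit carry: only what you choose to put in the message moves.
        other.inbox.append((self.name, content))


project = Agent("project")
life = Agent("life")
project.memory.append("release ships Friday")

# The life agent cannot read project.memory; the fact must be carried across
# as a message, which forces you to decide what is truly necessary.
project.send(life, "release ships Friday, keep Saturday free")

assert life.inbox == [("project", "release ships Friday, keep Saturday free")]
assert life.memory == []  # nothing leaked implicitly
```

Forcing the transfer through `send` is the design point: the sender must compress its context into exactly the information the receiver needs.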


VI. Conclusion: Your Team of “Digital Twins”

The personal AI assistant of the future will not be an omnipotent “god.”

It will be a team—composed of multiple versions of “yourself,” each remembering only one slice of your life.

  • The self who is hyper-focused on code.
  • The self who is a thoughtful father.
  • The self who is an imaginative researcher.

They share capabilities but guard different memories. They don’t need “meetings” to sync because necessary cross-domain information flows naturally through your conversations.

We don’t need to assemble a team of “strangers”—those stations labeled Planner, Reviewer, Writer.

What we need is ourselves in different states.

Let memory define the role, and let messages connect the avatars.