AI Engineering Agents

How To Be A World-Class Agentic Engineer

The only framework you need. Strip the dependencies. Own the context. Let the foundation companies do the rest.

Mar 4, 2026 · 8 min read


An AI agent is Claude or Codex given a task and left to work through it on its own — reading files, writing code, running commands, making decisions — without you typing every step. You set the goal. It figures out the path.

That's the promise. The reality for most people is frustration: the agent goes in circles, hallucinates libraries that don't exist, or stops halfway through and calls the job done.

The standard response is to install more stuff. Better harnesses. Smarter memory systems. Longer instruction files. More plugins.

That's exactly the wrong move.

What follows is drawn from @systematicls, a hedge fund quant who has been running agents in production (not toy projects) since the early days: his framework, distilled.

The Problem Is Your Enthusiasm

Most people trying to get more out of agents are the problem. Not the model. Not the harness. Not the terminal. You.

Think about who the most intensive users of agentic systems actually are: engineers at Anthropic and OpenAI, with unlimited compute and access to models that haven't shipped yet. If a third-party harness genuinely solved a real, lasting problem, they'd be using it. And then they'd build it directly into the product.

That's exactly what happened. Skills. Memory systems. Sub-agents. Planning. Stop-hooks. They all started as third-party experiments — and once they proved genuinely useful, they became core features of Claude Code and Codex.

If something truly is groundbreaking, it will be incorporated into the base products in due time. Trust me: the foundation-model companies are moving fast.

In Practice

Two developers. One has 14 plugins, a custom harness, and a 26,000-line CLAUDE.md. The other uses bare Claude Code with 40 lines of instructions and two skills files.

The second one ships faster. Because every plugin is another layer of context the agent has to parse before it can start thinking about your actual task.

The conclusion is uncomfortable: if you need an external dependency to do your best work, you're probably solving the wrong problem.

Context Is Everything. Literally.

An AI agent doesn't have infinite memory. It has a context window — think of it like a whiteboard. Everything it needs to know for the current task has to fit on that whiteboard. When the whiteboard is full, quality collapses.

Context bloat is when your agent's whiteboard is covered in things that have nothing to do with the task at hand.

Example

You ask your agent to write a short poem about a forest.

But it's carrying: memory management rules from 26 sessions ago, a warning about a process that hung 71 sessions back, notes about your auth system, API routing logic, and three database schemas.

It starts writing the poem with its whiteboard already half-full of irrelevant information. The poem is average. You blame the model. The model is fine. The whiteboard was dirty.

Think of it like briefing a very smart new employee. If you hand them 400 pages of company history before asking them to write one email, the email will suffer. Give them only what they need for this task. They'll do the job well.

Strip everything that isn't required for the current task. Your agents should enter each task with surgical precision.

Separate Research From Implementation

This is the single highest-leverage habit change you can make, and you can do it today.

Most people give their agent a vague brief and expect a finished product. The agent then has to research, decide, and build — all in one pass, all in one context window. By the time it starts building, half the whiteboard is full of options it considered and rejected.

❌ Vague

"Build an authentication system for the app."

✅ Precise

"Implement JWT authentication with bcrypt-12 password hashing and refresh token rotation with 7-day expiry."

The vague version forces your agent to research what auth systems exist, compare JWT vs session cookies vs OAuth, read documentation on bcrypt salt rounds — and by the time it builds, its context is polluted with every path it didn't take.

The precise version lets it start building immediately. Clean whiteboard. Clean output.

What If You Don't Know the Details Yet?

That's fine. Run a research agent first: "List the three best authentication approaches for a Node.js app with these constraints. Pros, cons, and your recommendation." Read the output. Pick one. Then start a fresh agent session with only the chosen approach in context. Two focused passes beat one confused one every time.

Think of it like a chef. You don't hand them a vague "make something good for 20 people." You decide the menu first — then they execute. The decision and the execution are separate jobs, done separately.
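The two-pass flow can be sketched in a few lines, with `Session` as a stand-in for spawning a fresh agent (in practice, starting a new Claude Code or Codex session):

```python
class Session:
    """Stand-in for a fresh agent session: starts with an empty context window."""
    def __init__(self) -> None:
        self.context: list[str] = []

    def run(self, prompt: str) -> str:
        self.context.append(prompt)
        return f"(model output for: {prompt})"

# Pass 1: research only. This session's context fills up with options,
# comparisons, and rejected paths, and is then thrown away.
research = Session()
report = research.run("List the three best auth approaches for a Node.js app. "
                      "Pros, cons, and your recommendation.")

# You read the report and pick one approach yourself.
chosen = "JWT with refresh token rotation"

# Pass 2: implementation starts on a clean whiteboard. Only the chosen
# approach enters its context; none of the rejected options do.
build = Session()
build.run(f"Implement {chosen}.")
assert len(build.context) == 1  # nothing but the decision made it in
```

The point of the sketch: the rejected options never exist in the second session's context, because it is a different session.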

Exploit Sycophancy. Don't Fight It.

Agents are built to agree with you. If you ask for something, they'll deliver it — even if they have to stretch the truth a little. Nobody would use a product that constantly tells them they're wrong. That's a design decision, not a flaw.

Most people treat this as a problem. The right move is to design around it.

The Three-Agent Bug System

Agent 1 — Bug Finder: Tell it you'll award +1 point for low-impact bugs, +10 for critical ones. It will aggressively surface every possible issue, including false positives. This is your superset of all potential bugs.

Agent 2 — Adversarial: Give it the opposite incentive. It earns points for disproving Agent 1's findings, but gets penalized if it's wrong. It'll push back on the list hard, but carefully.

Agent 3 — Referee: Adjudicates between the two. Tell it you have ground truth and it gets scored for accuracy. Whatever it decides, you spot-check.

The result is frighteningly accurate. You didn't fight sycophancy — you used it to create a self-correcting system.

The same principle applies everywhere. An agent told "find all edge cases" will find many false ones. An agent told "disprove these edge cases" will dismiss real ones. A third agent arbitrating between them lands close to truth.
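The incentive prompts and the referee's job can be sketched as follows. The prompt strings are paraphrases of the incentives above, not exact wording, and `referee` reduces the third agent's role to its core filtering step; in practice each role is a real model call.

```python
# Illustrative incentive prompts for the three roles.
FINDER_PROMPT = (
    "Review this code for bugs. You earn +1 point per low-impact bug "
    "and +10 per critical bug you surface."
)
ADVERSARY_PROMPT = (
    "You earn points for every finding below you can disprove, "
    "and lose points for any refutation that turns out to be wrong."
)
REFEREE_PROMPT = (
    "We hold the ground truth. Decide which findings are real; "
    "you are scored purely on accuracy."
)

def referee(findings: list[str], refuted: set[str]) -> list[str]:
    """The referee step at its core: keep the findings the adversarial
    agent could not knock down, then spot-check the survivors by hand."""
    return [f for f in findings if f not in refuted]
```

For example, if the finder reports three issues and the adversary disproves one, the referee's shortlist is the remaining two, and those are the ones you verify yourself.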

CLAUDE.md Is a Directory, Not a Manifesto

Your CLAUDE.md (or AGENTS.md, or whatever you call it) is where you tell your agent how to behave. Most people stuff it full of everything: personality, backstory, rules for every scenario they've ever encountered, preferences, warnings, notes to self.

It becomes a novel. The agent reads the novel before every task. The whiteboard is half-full before work begins.

❌ Manifesto

"You are a senior engineer who values clean code. Always use TypeScript. Never use var. Prefer functional patterns. When working on auth, remember that we had a bug in March where... When writing tests... When the user asks for a refactor..." [26,000 lines]

✅ Directory

"When working on auth → read SKILLS/auth.md
When writing tests → read SKILLS/testing.md
When refactoring → read RULES/refactor-preferences.md"

The directory version means the agent only loads what's relevant to the current task. Everything else stays off the whiteboard.
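The effect of the directory style can be mimicked in a few lines. This routing table is hypothetical and simply mirrors the example above; in practice the agent itself follows the pointers when it reads CLAUDE.md, so no code is actually needed.

```python
# Hypothetical routing table mirroring the directory-style CLAUDE.md above.
ROUTES = {
    "auth": "SKILLS/auth.md",
    "test": "SKILLS/testing.md",
    "refactor": "RULES/refactor-preferences.md",
}

def files_to_load(task: str) -> list[str]:
    """Only the files relevant to this task enter the context window;
    everything else stays off the whiteboard."""
    return [path for keyword, path in ROUTES.items() if keyword in task.lower()]
```

A task that touches auth loads one small file; a task that touches nothing in the table loads nothing at all.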

Maintenance Tip

As rules and skills accumulate, they start contradicting each other. Performance drops. When that happens, ask your agent to consolidate: "Review all rules and skills, remove contradictions, and ask me for my current preference where things conflict." It'll feel like magic again. Do this every few weeks.

Define the End Before You Start

Agents know how to begin tasks. They do not know how to end them. Left on their own, they'll implement stubs, stop at the first plausible finish line, and declare victory. The task isn't done — but the agent thinks it is.

The fix: write a contract before the session starts.

Example: TASK_CONTRACT.md

Task: Build user authentication flow

Done when:
- [ ] All 14 tests in /tests/auth.test.ts pass
- [ ] Login page screenshot shows error state correctly
- [ ] Password reset email sends in staging environment
- [ ] No TypeScript errors on build

You may NOT mark this task complete until all items above are checked.

Attach a stop-hook that prevents the agent from closing the session until every checkbox is ticked. The agent now has a clear, unambiguous finish line.
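The check such a stop-hook runs only needs to answer one question: are any checkboxes still open? A minimal sketch, assuming the contract uses Markdown task-list syntax as in the example above (how the hook is wired into your tool's settings varies by product):

```python
import re

def unchecked_items(contract_text: str) -> list[str]:
    """Return every 'Done when' item still marked '- [ ]'."""
    return re.findall(r"^- \[ \] (.+)$", contract_text, flags=re.MULTILINE)

def may_stop(contract_text: str) -> bool:
    """The session may only close once no unchecked items remain."""
    return not unchecked_items(contract_text)
```

If `may_stop` is false, the hook blocks the agent from ending the session and hands back the list of open items, so "done" is never a matter of the agent's opinion.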

Don't run 24-hour marathon sessions — they accumulate context from unrelated work and performance degrades. Instead: one session per contract. Short, focused, fresh whiteboard. An orchestration layer creates new contracts as work emerges and spawns a new session for each. This scales.
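The one-session-per-contract pattern is just a work queue. In this sketch, `run_fresh_session` is a stub for whatever spawns a new agent in your setup, and its return value (a hypothetical follow-up contract) is purely illustrative:

```python
from collections import deque

def run_fresh_session(contract: str) -> list[str]:
    """Stub: run one agent session against one contract and return any
    new contracts that emerged from the work. A real version would
    spawn an actual agent with a clean context window."""
    return ["Add password reset flow"] if contract == "Build auth flow" else []

def orchestrate(initial: list[str]) -> list[str]:
    """One short, focused session per contract. New work discovered
    mid-task is queued as a fresh contract instead of bloating the
    current session's context."""
    queue, completed = deque(initial), []
    while queue:
        contract = queue.popleft()
        queue.extend(run_fresh_session(contract))
        completed.append(contract)
    return completed
```

Each session starts with a clean whiteboard, and the backlog, not any single session, carries the long-running state.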

The Only Rule That Matters

Start bare-bones. Give the basic CLI a genuine chance before adding anything. Then add your preferences incrementally — each rule earned by friction with a real task, not downloaded from someone else's setup.

Think of it like hiring a brilliant new assistant. You don't hand them a 400-page manual on day one. You work together, correct things as they go wrong, build up a shared understanding over time. The manual writes itself from lived experience.

No agent today is perfect. You can delegate much of the design and implementation to the agents, but you must own the outcome.

The engineers getting extraordinary results from agents right now are not the ones with the most sophisticated setups. They're the ones who understood three things: precision beats volume, context is sacred, and the model is almost never the problem.

Strip it down. Get precise. Define the end. That's it.


Source: @systematicls on X · Hedge fund quant, agentic systems builder · Synthesis by Research Hub