AI Engineering Agents

How To Be A World-Class Agentic Engineer

The only framework you need. Strip the dependencies. Own the context. Let the foundation companies do the rest.

Mar 4, 2026 · 8 min read


An AI agent is Claude or Codex given a task and left to work through it on its own — reading files, writing code, running commands, making decisions — without you typing every step. You set the goal. It figures out the path.

That's the promise. The reality for most people is frustration: the agent goes in circles, hallucinates libraries that don't exist, or stops halfway through and calls the job done.

The standard response is to install more stuff. Better harnesses. Smarter memory systems. Longer instruction files. More plugins.

That's exactly the wrong move.

What follows is drawn from @systematicls, a hedge fund quant who has been running agents in production (not toy projects) since the early days: his framework, distilled.

The Problem Is Your Enthusiasm

Most people trying to get more out of agents are the problem. Not the model. Not the harness. Not the terminal. You.

Think about who the most intensive users of agentic systems actually are: engineers at Anthropic and OpenAI, with unlimited compute and access to models that haven't shipped yet. If a third-party harness genuinely solved a real, lasting problem, they'd be using it. And then they'd build it directly into the product.

That's exactly what happened. Skills. Memory systems. Sub-agents. Planning. Stop-hooks. They all started as third-party experiments — and once they proved genuinely useful, they became core features of Claude Code and Codex.

If something truly is groundbreaking, it will be incorporated into the base products in due time. Trust me: the foundation-model companies are moving fast.

In Practice

Two developers. One has 14 plugins, a custom harness, and a 26,000-line CLAUDE.md. The other uses bare Claude Code with 40 lines of instructions and two skills files.

The second one ships faster. Because every plugin is another layer of context the agent has to parse before it can start thinking about your actual task.

The conclusion is uncomfortable: if you need an external dependency to do your best work, you're probably solving the wrong problem.

Context Is Everything. Literally.

An AI agent doesn't have infinite memory. It has a context window — think of it like a whiteboard. Everything it needs to know for the current task has to fit on that whiteboard. When the whiteboard is full, quality collapses.

Context bloat is when your agent's whiteboard is covered in things that have nothing to do with the task at hand.

Example

You ask your agent to write a short poem about a forest.

But it's carrying: memory management rules from 26 sessions ago, a warning about a process that hung 71 sessions back, notes about your auth system, API routing logic, and three database schemas.

It starts writing the poem with its whiteboard already half-full of irrelevant information. The poem is average. You blame the model. The model is fine. The whiteboard was dirty.

Think of it like briefing a very smart new employee. If you hand them 400 pages of company history before asking them to write one email, the email will suffer. Give them only what they need for this task. They'll do the job well.

Strip everything that isn't required for the current task. Your agents should enter each task with surgical precision.

Separate Research From Implementation

This is the single highest-leverage habit change you can make, and you can do it today.

Most people give their agent a vague brief and expect a finished product. The agent then has to research, decide, and build — all in one pass, all in one context window. By the time it starts building, half the whiteboard is full of options it considered and rejected.

❌ Vague

"Build an authentication system for the app."

✅ Precise

"Implement JWT authentication with bcrypt-12 password hashing and refresh token rotation with 7-day expiry."

The vague version forces your agent to research what auth systems exist, compare JWT vs session cookies vs OAuth, read documentation on bcrypt salt rounds — and by the time it builds, its context is polluted with every path it didn't take.

The precise version lets it start building immediately. Clean whiteboard. Clean output.

What If You Don't Know the Details Yet?

That's fine. Run a research agent first: "List the three best authentication approaches for a Node.js app with these constraints. Pros, cons, and your recommendation." Read the output. Pick one. Then start a fresh agent session with only the chosen approach in context. Two focused passes beat one confused one every time.

Think of it like a chef. You don't hand them a vague "make something good for 20 people." You decide the menu first — then they execute. The decision and the execution are separate jobs, done separately.
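The two-pass flow can be sketched in a few lines, with `Session` as a stand-in for spawning a fresh agent (in practice, starting a new Claude Code or Codex session):

```python
class Session:
    """Stand-in for a fresh agent session: starts with an empty context window."""
    def __init__(self) -> None:
        self.context: list[str] = []

    def run(self, prompt: str) -> str:
        self.context.append(prompt)
        return f"(model output for: {prompt})"

# Pass 1: research only. This session's context fills up with options,
# comparisons, and rejected paths, and is then thrown away.
research = Session()
report = research.run("List the three best auth approaches for a Node.js app. "
                      "Pros, cons, and your recommendation.")

# You read the report and pick one approach yourself.
chosen = "JWT with refresh token rotation"

# Pass 2: implementation starts on a clean whiteboard. Only the chosen
# approach enters its context; none of the rejected options do.
build = Session()
build.run(f"Implement {chosen}.")
assert len(build.context) == 1  # nothing but the decision made it in
```

The point of the sketch: the rejected options never exist in the second session's context, because it is a different session.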

Exploit Sycophancy. Don't Fight It.

Agents are built to agree with you. If you ask for something, they'll deliver it — even if they have to stretch the truth a little. Nobody would use a product that constantly tells them they're wrong. That's a design decision, not a flaw.

Most people treat this as a problem. The right move is to design around it.

The Three-Agent Bug System

Agent 1 — Bug Finder: Tell it you'll award +1 point for low-impact bugs, +10 for critical ones. It will aggressively surface every possible issue, including false positives. This is your superset of all potential bugs.

Agent 2 — Adversarial: Give it the opposite incentive. It earns points for disproving Agent 1's findings, but gets penalized if it's wrong. It'll push back on the list hard, but carefully.

Agent 3 — Referee: Adjudicates between the two. Tell it you have ground truth and it gets scored for accuracy. Whatever it decides, you spot-check.

The result is frighteningly accurate. You didn't fight sycophancy — you used it to create a self-correcting system.

The same principle applies everywhere. An agent told "find all edge cases" will find many false ones. An agent told "disprove these edge cases" will dismiss real ones. A third agent arbitrating between them lands close to truth.
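The incentive prompts and the referee's job can be sketched as follows. The prompt strings are paraphrases of the incentives above, not exact wording, and `referee` reduces the third agent's role to its core filtering step; in practice each role is a real model call.

```python
# Illustrative incentive prompts for the three roles.
FINDER_PROMPT = (
    "Review this code for bugs. You earn +1 point per low-impact bug "
    "and +10 per critical bug you surface."
)
ADVERSARY_PROMPT = (
    "You earn points for every finding below you can disprove, "
    "and lose points for any refutation that turns out to be wrong."
)
REFEREE_PROMPT = (
    "We hold the ground truth. Decide which findings are real; "
    "you are scored purely on accuracy."
)

def referee(findings: list[str], refuted: set[str]) -> list[str]:
    """The referee step at its core: keep the findings the adversarial
    agent could not knock down, then spot-check the survivors by hand."""
    return [f for f in findings if f not in refuted]
```

For example, if the finder reports three issues and the adversary disproves one, the referee's shortlist is the remaining two, and those are the ones you verify yourself.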

CLAUDE.md Is a Directory, Not a Manifesto

Your CLAUDE.md (or AGENTS.md, or whatever you call it) is where you tell your agent how to behave. Most people stuff it full of everything: personality, backstory, rules for every scenario they've ever encountered, preferences, warnings, notes to self.

It becomes a novel. The agent reads the novel before every task. The whiteboard is half-full before work begins.

❌ Manifesto

"You are a senior engineer who values clean code. Always use TypeScript. Never use var. Prefer functional patterns. When working on auth, remember that we had a bug in March where... When writing tests... When the user asks for a refactor..." [26,000 lines]

✅ Directory

"When working on auth → read SKILLS/auth.md
When writing tests → read SKILLS/testing.md
When refactoring → read RULES/refactor-preferences.md"

The directory version means the agent only loads what's relevant to the current task. Everything else stays off the whiteboard.
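The effect of the directory style can be mimicked in a few lines. This routing table is hypothetical and simply mirrors the example above; in practice the agent itself follows the pointers when it reads CLAUDE.md, so no code is actually needed.

```python
# Hypothetical routing table mirroring the directory-style CLAUDE.md above.
ROUTES = {
    "auth": "SKILLS/auth.md",
    "test": "SKILLS/testing.md",
    "refactor": "RULES/refactor-preferences.md",
}

def files_to_load(task: str) -> list[str]:
    """Only the files relevant to this task enter the context window;
    everything else stays off the whiteboard."""
    return [path for keyword, path in ROUTES.items() if keyword in task.lower()]
```

A task that touches auth loads one small file; a task that touches nothing in the table loads nothing at all.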

Maintenance Tip

As rules and skills accumulate, they start contradicting each other. Performance drops. When that happens, ask your agent to consolidate: "Review all rules and skills, remove contradictions, and ask me for my current preference where things conflict." It'll feel like magic again. Do this every few weeks.

Define the End Before You Start

Agents know how to begin tasks. They do not know how to end them. Left on their own, they'll implement stubs, stop at the first plausible finish line, and declare victory. The task isn't done — but the agent thinks it is.

The fix: write a contract before the session starts.

Example: TASK_CONTRACT.md

Task: Build user authentication flow

Done when:
- [ ] All 14 tests in /tests/auth.test.ts pass
- [ ] Login page screenshot shows error state correctly
- [ ] Password reset email sends in staging environment
- [ ] No TypeScript errors on build

You may NOT mark this task complete until all items above are checked.

Attach a stop-hook that prevents the agent from closing the session until every checkbox is ticked. The agent now has a clear, unambiguous finish line.
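The check such a stop-hook runs only needs to answer one question: are any checkboxes still open? A minimal sketch, assuming the contract uses Markdown task-list syntax as in the example above (how the hook is wired into your tool's settings varies by product):

```python
import re

def unchecked_items(contract_text: str) -> list[str]:
    """Return every 'Done when' item still marked '- [ ]'."""
    return re.findall(r"^- \[ \] (.+)$", contract_text, flags=re.MULTILINE)

def may_stop(contract_text: str) -> bool:
    """The session may only close once no unchecked items remain."""
    return not unchecked_items(contract_text)
```

If `may_stop` is false, the hook blocks the agent from ending the session and hands back the list of open items, so "done" is never a matter of the agent's opinion.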

Don't run 24-hour marathon sessions — they accumulate context from unrelated work and performance degrades. Instead: one session per contract. Short, focused, fresh whiteboard. An orchestration layer creates new contracts as work emerges and spawns a new session for each. This scales.
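The one-session-per-contract pattern is just a work queue. In this sketch, `run_fresh_session` is a stub for whatever spawns a new agent in your setup, and its return value (a hypothetical follow-up contract) is purely illustrative:

```python
from collections import deque

def run_fresh_session(contract: str) -> list[str]:
    """Stub: run one agent session against one contract and return any
    new contracts that emerged from the work. A real version would
    spawn an actual agent with a clean context window."""
    return ["Add password reset flow"] if contract == "Build auth flow" else []

def orchestrate(initial: list[str]) -> list[str]:
    """One short, focused session per contract. New work discovered
    mid-task is queued as a fresh contract instead of bloating the
    current session's context."""
    queue, completed = deque(initial), []
    while queue:
        contract = queue.popleft()
        queue.extend(run_fresh_session(contract))
        completed.append(contract)
    return completed
```

Each session starts with a clean whiteboard, and the backlog, not any single session, carries the long-running state.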

The Only Rule That Matters

Start bare-bones. Give the basic CLI a genuine chance before adding anything. Then add your preferences incrementally — each rule earned by friction with a real task, not downloaded from someone else's setup.

Think of it like hiring a brilliant new assistant. You don't hand them a 400-page manual on day one. You work together, correct things as they go wrong, build up a shared understanding over time. The manual writes itself from lived experience.

No agent today is perfect. You can delegate much of the design and implementation to the agents, but you must own the outcome.

The engineers getting extraordinary results from agents right now are not the ones with the most sophisticated setups. They're the ones who understood three things: precision beats volume, context is sacred, and the model is almost never the problem.

Strip it down. Get precise. Define the end. That's it.


Source: @systematicls on X · Hedge fund quant, agentic systems builder · Synthesis by Research Hub