What we've learned after a lot of workshops, PoCs and projects
A pattern keeps showing up when we look back at the workshops, projects and operational teams we've worked with over the past couple of years. AI is being adopted at every layer. But most of the "velocity" is theatre. Individual engineers ship technical debt faster, and everyone downstream scrambles to deal with more of a problem that was already hard.
The tools are being used. The way of working with them is not.
Agents and AI are not faster autocomplete. They are not a better Stack Overflow either. Used properly, they change who does what, how decisions get made, and what a developer's, engineer's or operator's day actually looks like.
What actually changes when you go agentic
This is on the same scale as the move to cloud, or virtualization before it. It touches the whole delivery chain: how developers write code, how platform teams build cloud environments, how operations keeps workloads running.
It is not a subtle shift. It requires:
- A mindset built around quality, not quantity of output
- Real skill in prompt engineering, context design and output validation
- Workflows that assume agents are doing large parts of the execution
- Observability and review patterns that can keep up with the new pace
- An agentic-first default, with humans added into the loop where they matter
Teams that treat agentic AI as a productivity add-on will get marginal gains at best. Organisations that redesign around it see a different kind of result entirely.
The mindset shift
The uncomfortable part is that most of the work isn't technical. It's cultural.
For years we optimised for individual output. Lines of code, tickets closed, incidents resolved. That whole metric set quietly breaks the moment one engineer with an agent can produce a week of code in an afternoon. The bottleneck moves. It isn't typing speed anymore, or how much a single person can hold in their head. It's review capacity, test coverage, and whether anyone actually understands what just got merged.
The engineer's job changes too. Writing the code used to be most of the work. Now the work is framing the problem, setting the constraints, and checking the result. The typing is a commodity. Senior engineers become less like the best coders and more like editors with taste and a low tolerance for plausible-looking garbage.
Platform teams have to catch up fast. If agents are doing real work, they need sandboxes, scoped credentials, audit trails and cost ceilings. Operations can't stay reactive either. Agent-generated incidents compound quickly, and the team that built its muscle around responding to human-paced failures will struggle.
And then there's the awkward one: if agents write the boilerplate that juniors used to learn from, how do juniors build judgment? We don't have a clean answer to this. Neither does anyone else, honestly. Anyone claiming they do is selling something.
None of this is comfortable, and most of it can't be bought. You can buy the tools. You can't buy the habits.
"Context engineering... is to carefully and skillfully construct the right context to get great results from LLMs."
The organisations we see getting real value out of agentic AI aren't the ones with the biggest budgets or the loudest announcements. They're the ones where leadership accepted early that roles, metrics and the definition of "done" all had to move. Everyone else is measuring the old things, wondering why the numbers look fine but the work feels worse.
New skills for going agentic
The skills that mattered five years ago still matter. Reading stack traces, knowing your language cold, navigating a big codebase - all still useful. But there's a new set on top, and most teams are underestimating them.
Context engineering is the big one, and it's not the same as prompt engineering. Prompts are the surface. Context is what the agent can see: the files, the docs, the tools, the repo conventions, what "done" means on your team. Most bad agent output is a context problem. Teams blame the model and buy a more expensive one when the fix was handing the agent the README and three example PRs.
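To make that concrete, here's roughly what "handing the agent the README and three example PRs" looks like as code. A minimal sketch - the paths, and the idea of concatenating files into one prompt, are illustrative stand-ins for whatever context mechanism your tooling actually has:

```python
# Sketch: most bad agent output is a context problem, so fix the context.
# All paths are illustrative; the point is what the agent can see.
from pathlib import Path

def build_context(task: str) -> str:
    """Assemble what the agent can see: docs, conventions, examples of done."""
    parts = [
        Path("README.md").read_text(),              # how the project works
        Path("docs/conventions.md").read_text(),    # repo conventions
        Path("examples/pr_1041.diff").read_text(),  # what "done" looks like here
        Path("examples/pr_1055.diff").read_text(),  # a second worked example
        f"Task: {task}",
    ]
    return "\n\n---\n\n".join(parts)
```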
Writing specs an agent can execute. Humans are forgiving readers. Agents are not. "Fix the login bug" works for a teammate who's been around a year. An agent will hallucinate something confident and break two other things on the way past. Constraints, edge cases, out-of-scope - all of it has to be on the page now.
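One way to force all of that onto the page is to make the spec a structured object rather than a one-line ticket. A sketch - the field names are made up, not a standard:

```python
# Illustrative spec shape: the constraints made explicit and machine-readable.
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    goal: str
    constraints: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)
    acceptance: list[str] = field(default_factory=list)

spec = TaskSpec(
    goal="Fix: login fails when the session cookie has expired",
    constraints=["Touch only auth/session.py", "No new dependencies"],
    edge_cases=["Cookie expires mid-request", "Clock skew between nodes"],
    out_of_scope=["Refactoring the auth module", "Changing cookie TTLs"],
    acceptance=["test_expired_session passes", "No existing tests modified"],
)
```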
Reviewing output you don't trust. Agents are wrong in subtle, confident ways. A deleted test. A hallucinated API. A function that handles every case except the one that matters. Sharper review habits and automated checks aren't optional, and neither is the discipline not to rubber-stamp a long diff at 4pm on a Thursday.
Knowing when to stop the agent. They'll happily keep going. They'll refactor things you didn't ask about, "fix" tests by deleting them, spin for an hour on something impossible. Killing a run that's going sideways is a skill, and most people learn it the expensive way.
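The cheapest version of that skill is a budget you set before the run starts. A sketch - `agent_step` is a hypothetical stand-in for whatever your loop actually calls:

```python
import time

MAX_STEPS = 25       # illustrative numbers - tune to your workload
MAX_SECONDS = 600

def run_with_budget(agent_step, task):
    """Kill a run that is spinning instead of hoping someone notices."""
    start = time.monotonic()
    for step in range(MAX_STEPS):
        result = agent_step(task)   # hypothetical: one agent iteration
        if result.done:
            return result
        if time.monotonic() - start > MAX_SECONDS:
            raise TimeoutError(f"Killed after {step + 1} steps: over time budget")
    raise RuntimeError(f"Killed after {MAX_STEPS} steps: agent is spinning")
```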
Observability from day one. Every run traced, every tool call logged, every decision auditable. Teams that bolt this on later end up debugging by vibes.
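The minimum viable version of this is small. A sketch of "every tool call logged", using nothing beyond the standard library:

```python
# Wrap every tool so each call is attributed to a run and logged as JSON.
import functools, json, logging, time

log = logging.getLogger("agent.trace")

def traced(run_id: str):
    def wrap(tool):
        @functools.wraps(tool)
        def inner(*args, **kwargs):
            t0 = time.monotonic()
            result = tool(*args, **kwargs)
            log.info(json.dumps({
                "run_id": run_id,
                "tool": tool.__name__,
                "args": repr(args)[:200],   # truncated; redact secrets in real use
                "duration_s": round(time.monotonic() - t0, 3),
            }))
            return result
        return inner
    return wrap
```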
Taste. When the agent can produce ten plausible implementations in the time it used to take to write one, the thing that matters is whether you can tell which one is any good. Hard to teach, harder to measure, and the main thing separating real leverage from shipping mediocre code faster.
None of these are bolt-on skills. You can't send someone on a two-day course and tick the box. They build the way engineering skills always have: on real work, by getting burned, and paying attention to what actually happened.
Replace, don't repair workflows
"We shape our tools and thereafter our tools shape us."
Most teams try to slot agents into the workflows they already have. Same tickets, same review cadence, same change process, just with an agent somewhere in the middle. It almost always underperforms, and then people conclude agents aren't ready. They are. The workflow isn't.
Existing workflows - on both the development side and the infrastructure side - were designed around human constraints. A developer holds two or three tasks in their head at once. An ops engineer responds to an alert in minutes, not seconds. Change advisory boards meet weekly. Platform tickets queue up and get picked off in order. All of that is shaped by the fact that humans are slow, forgetful, and parallelise badly.
Agents break every one of those assumptions. They run ten things in parallel. They finish in minutes. They have no sense of scope unless you give them one, and they don't care whether it's 4pm on a Friday. Dropping that into a process built for humans is like putting a jet engine on a bicycle and wondering why the frame keeps snapping.
On the development side
Tickets have to be rewritten as specs. Constraints, edge cases, acceptance criteria, things explicitly out of scope. A one-line ticket is fine for a teammate; an agent needs all of it written down, for exactly the reasons covered earlier.
Review moves earlier. You review the plan before the agent runs, not just the diff after, because by the time the diff exists there may be thirty of them. Small PRs become the default - ten agent-generated changes are easier to reason about than one giant one.
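In code terms the gate moves from the diff to the plan. A sketch, with a hypothetical agent interface:

```python
# Review the cheap artifact (the plan) before paying for the expensive one.
def execute(task, agent, approve_plan):
    plan = agent.plan(task)        # hypothetical: fast, reviewable proposal
    if not approve_plan(plan):     # human judgment happens here, once
        return None
    return agent.run(plan)         # the thirty diffs happen after approval
```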
Standups change shape. "What are you working on" turns into "what are your agents working on, which runs failed, and what did we learn." The engineer's job shifts from typing code to framing problems and editing output with taste.
On the infrastructure and operations side
This is the side most teams underestimate. If agents are doing real work, the platform has to evolve to meet them. Sandboxes with scoped credentials. Cost ceilings that actually cut off runaway runs. Audit trails a human can read when something goes wrong at 2am. Kill switches you can hit without filing a ticket.
Operations can't stay reactive. Agent-generated incidents compound fast - one bad deploy at human pace is a problem, one bad deploy at agent pace, repeated across ten parallel runs, is a postmortem that takes a week. The muscle shifts from responding to incidents toward designing the review gates, rollback patterns and observability that stop incidents from compounding in the first place.
Change management has to follow. Weekly CABs don't work when agents are proposing infrastructure changes continuously. Either the process gets faster and more automated, or it becomes the bottleneck the whole organisation quietly routes around.
The uncomfortable part
A lot of the processes teams are proud of - on both sides - are there to compensate for humans being slow and forgetful. When the slowness goes away, the process goes from helpful to in-the-way. Keeping it out of habit is how you get the worst of both worlds: agents bottlenecked by rituals designed for a pace that no longer applies, and humans exhausted trying to hand-operate a flow of changes that was never meant for a human to hand-operate.
Repair the workflow and you get a faster version of what you already had. Replace it, across development and operations together, and you get something that actually works differently.
Platform and operational teams
For a long time the job was automation. Terraform, pipelines, scripted runbooks - anything that got humans out of the loop. That work mattered, and most platform teams are rightly proud of it. But automation has a ceiling. It does what you told it to do, and nothing the script didn't already think of.
Agentic is the step after. It changes the job in two directions at once. Platform and operations teams have to support agents that other teams are running, and they have to start using agents on their own work. Most teams are underestimating both sides, though the second one is the one that tends to get postponed indefinitely.
Supporting agents the rest of the company is running
If developers are shipping with agents, the platform underneath has to be built for it. Sandboxes with scoped credentials, not one shared service account that can reach everything. Cost ceilings that cut off a runaway loop before it bills you the price of a small car. Audit trails a human can actually read at 2am without grepping through JSON. Kill switches that don't need a ticket.
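A cost ceiling, for instance, doesn't need to be clever. It needs to be hard. A sketch:

```python
# Illustrative: the platform, not the agent, decides when a run has spent enough.
class CostCeilingExceeded(Exception):
    pass

class RunBudget:
    def __init__(self, ceiling_usd: float):
        self.ceiling = ceiling_usd
        self.spent = 0.0

    def charge(self, usd: float) -> None:
        """Called on every billable action; hard-stops the run past the ceiling."""
        self.spent += usd
        if self.spent > self.ceiling:
            raise CostCeilingExceeded(
                f"Run killed at ${self.spent:.2f} (ceiling ${self.ceiling:.2f})"
            )
```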
Access patterns have to change. Agents aren't humans and shouldn't be given human credentials. Short-lived scoped tokens, proper identity for non-human actors, per-run permissions - this isn't exotic stuff anymore. Any platform still handing out long-lived keys to agents is a breach that's already been written, just not yet filed.
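The shape of a per-run credential, sketched. `mint_token` stands in for whatever your identity provider actually offers (STS, workload identity and so on) - the fields are illustrative, not a spec:

```python
from datetime import datetime, timedelta, timezone

def mint_token(run_id: str, scopes: list[str], ttl_minutes: int = 15) -> dict:
    """Short-lived, scoped, tied to one run - nothing here outlives the work."""
    return {
        "sub": f"agent-run:{run_id}",   # a proper non-human identity
        "scopes": scopes,               # e.g. ["repo:read", "ci:trigger"]
        "exp": (datetime.now(timezone.utc)
                + timedelta(minutes=ttl_minutes)).isoformat(),
    }
```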
Observability stops being optional. If the platform can't tell you what an agent did last Tuesday, what it tried, what it touched, what it cost you, then you're debugging by vibes.
Operations has to evolve with it
Ops teams built their muscle around human-paced failures. Deploy, pager, incident, war room. That model bends quickly once agents are deploying continuously and a bad change can replicate across parallel runs before anyone's phone buzzes.
So the work shifts earlier. Review gates in the pipeline. Rollback patterns that can stop an agent mid-loop. Circuit breakers that don't need a human to pull them. That's engineering work, not reactive firefighting, which is partly why ops and platform teams are blurring into each other.
Change management needs the same treatment. Weekly CABs don't hold up when infrastructure changes are being proposed continuously. Most of the gating has to move to policy-as-code and automated checks, with human review saved for the things that actually warrant it. Anything else becomes the thing people quietly route around.
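Policy-as-code here can be almost embarrassingly simple and still beat a weekly meeting. A sketch, with made-up rules:

```python
# Automated checks decide; humans see only what the policy escalates.
def review_required(change: dict) -> bool:
    if change.get("destructive"):              # deletes data, drops infra
        return True
    if change.get("blast_radius", 0) > 1:      # touches more than one system
        return True
    if "prod-db" in change.get("targets", []):
        return True
    return False                               # everything else auto-applies

change = {"targets": ["staging-web"], "destructive": False, "blast_radius": 1}
assert review_required(change) is False        # no CAB, no queue
```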
And then - use it yourself
This is the part platform and ops teams keep putting off. You built the sandboxes. You wired up the observability. You hardened the credentials for everyone else. Now use agents on your own backlog.
Agents are genuinely good at the work platform teams lose hours to. Writing Terraform modules. Keeping runbooks current. Triaging alert floods. Correlating logs across three systems when something weird is happening. Generating boilerplate for new service templates. None of that is why anyone joined a platform team, and there's no prize for doing it slowly.
Operations has its own version. First-pass alert triage. Drafting the timeline on a postmortem so the human writes the interesting parts instead. Suggesting rollback steps. Spotting patterns across incidents that a tired on-call at 3am is going to miss.
The teams getting this right run agents on their own platform the same way they expect developers to run them on application code - with guardrails, observability and review. It's the muscle they already have. They're just pointing it at themselves.
The uncomfortable part
Automation got some platform and ops teams a long way. It also made it easy to confuse "we automated it" with "we're done." Agentic doesn't respect that comfort. The platform has to evolve. The operating model has to evolve. And the team itself has to start working differently, not just enabling everyone else to.
Platform teams that figure this out become the thing that lets the rest of the organisation move faster. The ones that don't end up as the quiet bottleneck.
Starting from agentic first
The mindset is the same one we went through with cloud. For years, new projects defaulted to on-prem and cloud was the thing you had to argue for. Then at some point the default flipped. New systems started cloud-first, and suddenly it was on-prem that needed the justification. Teams that flipped early pulled ahead. The ones that waited spent the next few years apologising for legacy choices in architecture reviews.
Agents are sitting at that same inflection now. Most organisations still default to "a human does this" and bolt agents on where it looks convenient. In our experience that's the wrong starting position.
Agentic-first really just means changing the first question you ask when something new lands on the roadmap. Instead of "how do we staff this," the question becomes "can an agent do this, and where does a human belong in the loop." That sounds like a small rhetorical change. It isn't.
Once you make that the default, a lot of downstream decisions quietly change shape. A new internal tool stops being designed around a person clicking through a UI and starts being designed as an API an agent can call, with a UI layered on for the people who still need one. Runbooks stop being prose written for a sleepy on-call at 3am and start looking more like specs an agent can execute, with humans approving the destructive bits. New services get stood up on the assumption that agents will maintain large parts of them, so the guardrails and the observability show up from day one instead of being bolted on after the first incident.
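A runbook in that shape might look like this - the step structure and the approval hook are illustrative:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    action: Callable[[], None]
    destructive: bool = False     # destructive steps wait for a human

def run_runbook(steps: list[Step], approve: Callable[[Step], bool]) -> None:
    """Execute a runbook, gating only the irreversible bits on approval."""
    for step in steps:
        if step.destructive and not approve(step):
            print(f"Skipped (not approved): {step.name}")
            continue
        step.action()
```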
Humans stay in the loop. The point is that the loop is deliberate. People belong where judgment actually matters - anything irreversible, anything that touches money or safety, anything the organisation would genuinely regret later if it breaks. They don't belong in the loop out of habit. A human rubber-stamping agent output at the end of a pipeline isn't adding safety. They're adding latency, and quietly convincing themselves they're still in control.
The teams that resist this tend to end up in the worst version of both worlds. They've adopted agents, but only in narrow pockets. The default is still human. So they pay the full cost of running agents - the platform work, the observability, the guardrails - without getting much of the leverage. Then they look at the results, shrug, and decide the technology isn't ready. The technology is fine. The starting position is the problem.
Cloud-first was never about running everything in the cloud. It was about which answer had to be justified. Agentic-first is the same kind of move. Start from "an agent does this," and pull the human back in wherever it genuinely matters - which, for the record, is a lot of places. Just fewer than most organisations currently assume.
So what does all of this mean
Well, it means different things to different people. I'll give you three takeaways.
First - Agentic is here and it's bigger than it looks. There's a reflex in most organisations to file this alongside the last few hype cycles and wait it out, and honestly I get it - a lot of the things called "transformational" over the last five years weren't. This one is. It changes who does the work, it changes how decisions get made, and it changes what a week inside an engineering team actually looks like. The last time the ground moved this much was cloud, and before that virtualization. Both took about a decade to fully land. This one is moving faster, and the teams working it out now are going to be a couple of years ahead of the ones still debating whether it's real.
Second - and this is the one most organisations get wrong: the hard part isn't the technology. The tools are sitting right there. A credit card and a weekend will get most teams to something that works. What they don't have, and can't buy, is the operating model around it. Scoping work for agents. Reviewing output you don't trust on instinct. Metrics that still mean something when an agent wrote half the diff. Deciding what "done" actually means now. That's culture, and habit, and the way people have worked for years, and those things don't move because there's a new budget line. They move because leadership accepts early that they have to, and then does the unglamorous work of changing how teams operate. Most implementations and programmes quietly stall right there.
Third - this isn't a developer problem. It's not really a platform problem either, or an ops problem. It's all of them happening at once, and the chain doesn't work if any single link stays still. A dev team going agentic on top of a platform still handing out long-lived credentials is a breach with a date on it. A platform team shipping gorgeous sandboxes while ops stays reactive just means a small incident becomes a large one while everyone watches. An ops team tightening review gates inside a company that still runs weekly CABs becomes a bottleneck people learn to route around. I've seen all three of those in the last six months, sometimes in the same organisation. No single team wins this one on their own, and that's actually the good news, because it means no single team has to carry it either.
"The autonomy slider — the choice on every task isn't human or agent, it's where on the slider you sit."
The organisations that do well with this won't be the ones with the biggest AI budget or the slickest launch announcement. They'll be the ones where dev, platform and ops started moving together, early, before the picture was clear. If you're waiting for the clear picture, or for someone else to publish the playbook first, my honest advice is to stop waiting. The playbook isn't coming. Everyone you respect in this space is figuring it out with the lights half on, same as you.
Some of us have a little more light to work by than someone starting from a blank page, but we are all very much learning as we go, just like we did with cloud and with virtualization.

