Stepan Samko | Consulting

Why AI Development Tools Must Be Execution-Aware

December 2025

From code readers to system observers


AI development tools have made real progress by getting better at one thing: understanding code.

Larger context windows. Repository-wide analysis. Better static reasoning. Smarter refactors. For many tasks – navigation, refactoring, feature scaffolding – this works extremely well.

And then you hit debugging.

Not syntax errors. Not missing imports.
The hard problems:

  - intermittent failures that never reproduce on demand
  - latency with no obvious cause in the source
  - behavior that changes between runs and environments

At that point, the tools stall – not because they lack intelligence, but because they lack visibility into execution.

This isn’t a missing feature.
It’s a missing design principle.

AI development tools must be execution-aware by default.

That means treating how code runs as a first-class input, not an afterthought layered on top of static analysis.


Static understanding hits a ceiling

Static analysis is powerful. It tells you:

  - how the code is structured
  - what depends on what
  - what a change would touch

But static analysis cannot tell you:

  - what actually executed for a given request
  - how long each step took
  - what failed first, and what failed because of it

Consider this code:

await db.query(...)
await fetch(...)
await cache.set(...)

Statically, it’s fine. Semantically, it’s fine.

Whether this is fast, slow, flaky, or broken depends on things that do not appear in the source:

  - how long the database query actually takes
  - how the network behaves for that fetch call
  - whether the cache write succeeds, and how often

The gap here isn’t subtle. Static understanding simply runs out of information.
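
To see what runtime adds, here is a minimal sketch of the same sequence instrumented with OpenTelemetry spans, using the @opentelemetry/api package. The service name, URL, and client stubs are hypothetical, for illustration only:

import { trace, Span } from "@opentelemetry/api";

const tracer = trace.getTracer("checkout"); // hypothetical service name

// Hypothetical stand-ins for the clients used in the snippet above.
declare const db: { query: (sql: string) => Promise<unknown> };
declare const cache: { set: (key: string, value: unknown) => Promise<void> };

async function handleRequest(userId: string): Promise<void> {
  // Each awaited call becomes a span. Duration, ordering, and failure
  // are recorded at runtime; none of it is derivable from the source.
  await tracer.startActiveSpan("db.query", async (span: Span) => {
    try {
      await db.query("SELECT * FROM orders WHERE user_id = $1");
    } finally {
      span.end();
    }
  });

  await tracer.startActiveSpan("fetch.pricing", async (span: Span) => {
    try {
      await fetch("https://pricing.internal/quote"); // hypothetical URL
    } finally {
      span.end();
    }
  });

  await tracer.startActiveSpan("cache.set", async (span: Span) => {
    try {
      await cache.set(`orders:${userId}`, { cachedAt: Date.now() });
    } finally {
      span.end();
    }
  });
}

Only once spans like these exist does a question like “what dominated this request?” have an answer.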


Debugging is about evidence, not intent

When something goes wrong, developers don’t start by rereading the entire codebase.

They ask questions like:

  - What actually happened on the failing request?
  - Where did the time go?
  - What failed first?

These are questions about observed behavior, not design intent.

Most AI tools invert this process:

  1. Read code
  2. Guess likely causes
  3. Ask the human to check behavior
  4. Wait for a summarized explanation
  5. Propose a fix

The human becomes the bridge between what happened and what the AI can reason about.

That bridge is slow, lossy, and fragile.


Execution signals already exist – but aren’t treated as inputs

Modern systems already produce rich execution data:

  - distributed traces
  - structured logs
  - metrics and error reports

The problem isn’t lack of data.

The problem is that AI tools usually see this data only after it’s been:

  - filtered by a human
  - summarized into prose
  - pasted into a prompt

By the time the model sees it, the data is no longer something it can interrogate. It’s something it can only react to.


A design principle: execution-aware by default

Instead of thinking in terms of features (“let’s add trace summaries”), it helps to frame this as a principle:

Execution signals are first-class inputs to AI development tools.

That implies a few concrete shifts:

  - tools query execution signals directly, instead of waiting for a human to relay them
  - interfaces expose traces as data to interrogate, not text to react to
  - debugging becomes an interactive loop rather than a single guess

This aligns with how debugging actually works: observe → hypothesize → narrow → confirm.
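
As a sketch of what “execution signals as first-class inputs” could look like, here is one hypothetical shape for a queryable interface. Every name below is an assumption for illustration, not an existing API:

// Hypothetical interface an execution-aware tool might expose to a model.
interface SpanRecord {
  traceId: string;
  name: string;
  durationMs: number;
  parentId?: string;
  error?: string;
}

interface ExecutionStore {
  // Observe: list recent trace ids for an operation.
  recentTraces(operation: string, limit: number): Promise<string[]>;
  // Narrow: fetch only the spans matching a predicate.
  findSpans(query: {
    traceId?: string;
    minDurationMs?: number;
    hasError?: boolean;
  }): Promise<SpanRecord[]>;
  // Confirm: retrieve one full span tree to verify a hypothesis.
  spanTree(traceId: string): Promise<SpanRecord[]>;
}

// The observe -> hypothesize -> narrow -> confirm loop, as code:
async function whatDominated(store: ExecutionStore, operation: string) {
  const [traceId] = await store.recentTraces(operation, 1);            // observe
  const slow = await store.findSpans({ traceId, minDurationMs: 100 }); // narrow
  return slow.sort((a, b) => b.durationMs - a.durationMs)[0];          // confirm
}

The point is not this particular interface; it is that the model asks, rather than the human relays.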


What execution-aware tools enable

Once tools can observe execution directly, several things change immediately:

  - behavioral questions get answered with evidence instead of guesses
  - problems get narrowed by querying, not by asking the human to check
  - proposed fixes can be validated against what the system actually did

These aren’t incremental improvements. They change how problems get narrowed and solved.


Why “just summarize traces” isn’t enough

A common response is:

“Why not just summarize traces for the AI?”

Because debugging is interactive.

Summaries:

  - are written before the question is known
  - discard the detail a specific question needs
  - cannot answer follow-ups

Observation allows:

  - asking a question, seeing the answer, and asking a better one
  - drilling into exactly the spot the evidence points to
  - confirming a hypothesis before proposing a fix

A debugger that only gives you summaries is frustrating. An AI tool constrained to summaries has the same limitation.


Addressing common objections

“Isn’t this just log analysis?”

No. Logs are flat and inconsistent. Execution signals like traces preserve structure, timing, and causality. They support questions like “what dominated this request?” rather than “what messages were printed?”

"Won’t this overwhelm the model?”

Only if you dump raw data into prompts. Execution-aware design means queryable interfaces, not streaming everything. The model pulls small, relevant slices – just like a human does.
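
As a small illustration of “small, relevant slices” (hypothetical types and sample values):

// Hypothetical: answer one narrow question instead of dumping the trace.
type SpanSlice = { name: string; durationMs: number };

function topContributors(spans: SpanSlice[], n = 3): SpanSlice[] {
  // Return the handful of spans that dominate the request,
  // not the thousands of spans the trace may contain.
  return [...spans].sort((a, b) => b.durationMs - a.durationMs).slice(0, n);
}

// Illustrative sample values only:
const allSpans: SpanSlice[] = [
  { name: "db.query", durationMs: 840 },
  { name: "fetch.pricing", durationMs: 120 },
  { name: "cache.set", durationMs: 4 },
];

console.log(topContributors(allSpans, 2)); // the two spans worth reasoning about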

“Isn’t this dangerous in production?”

It can be, which is why execution-aware doesn’t mean unrestricted access. Scope, redaction, and access controls still matter. The principle is about what counts as input, not about removing safeguards.

“Isn’t this just observability?”

Observability tools are built for humans to inspect dashboards. Execution-aware AI tools are built so machines can interrogate behavior directly.


A concrete example (but not the point)

One concrete implementation of this idea is otel-mcp, which exposes OpenTelemetry traces to AI agents during local development.

It’s intentionally narrow:

Its importance isn’t the tool itself, but what it demonstrates: execution data can be treated as something an AI tool queries, not something a human must explain.
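
For flavor, the kind of exchange this pattern enables might look like the following. These request and response shapes are hypothetical, not otel-mcp’s actual interface:

// Hypothetical shapes for an agent-facing trace query. Illustrative only.
interface TraceQueryRequest {
  tool: "find_slow_spans"; // hypothetical tool name
  args: { operation: string; minDurationMs: number };
}

interface TraceQueryResponse {
  spans: Array<{ name: string; durationMs: number; error?: string }>;
}

const request: TraceQueryRequest = {
  tool: "find_slow_spans",
  args: { operation: "POST /checkout", minDurationMs: 200 },
};

// The agent gets structured data it can interrogate further,
// not a prose summary a human prepared in advance.
const response: TraceQueryResponse = {
  spans: [{ name: "db.query", durationMs: 840 }],
};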


A shift in interaction models

You can visualize the difference like this:

flowchart LR
  A[Running System]
  B[Logs / Traces]
  C[Human]
  D[Prompt]
  E[AI]

  A --> B --> C --> D --> E

Today, the human is the interpreter.

flowchart LR
  A[Running System]
  B[Execution Signals]
  E[AI]
  C[Human Oversight]

  A --> B --> E
  E --> C

Execution-aware tools let AI observe behavior directly, while humans supervise, validate, and decide.


Why this matters long-term

As AI tools take on more responsibility – refactoring, optimizing, deploying – the cost of acting without execution awareness grows.

Without it:

  - changes are justified by plausibility rather than evidence
  - regressions surface only when a human happens to notice
  - every behavioral question routes through a person

With it:

  - changes can be checked against observed behavior
  - problems are narrowed with evidence before code is touched
  - humans supervise outcomes instead of relaying data

This is the difference between tools that talk about systems and tools that can work with them.


Choosing tools through this lens

As builders and users of AI dev tools, a useful question becomes:

Does this tool treat execution as a first-class input, or as something I have to explain?

That question applies regardless of language, framework, or vendor.

It’s a design stance – not a feature checkbox.


Closing thought

Static code understanding got AI tools into the room.

Execution awareness is what lets them stay useful once things get messy.

The next step in AI-assisted development isn’t bigger prompts or better guesses – it’s grounding reasoning in how systems actually behave.

Tools that can observe execution will feel fundamentally different from tools that can only read code.

Over time, that difference will matter more than almost any single feature.