Understanding deploys

See what each release did to your agents.

Every prompt change, model swap, and tool update shifts behavior in ways tests don't catch. Flowlines plots behavior against your deploy timeline, so a regression shows up as a step in the chart, not a flood of tickets a week later.

Book a demo

01 · Timeline

Behavior, version by version.

Fabricated-completion rate over time, with a marker on every deploy. The regression is the one place the line steps up.

[Chart: fabricated completion rate for refund-agent over time. Series: flag rate, with a marker at each deploy.]

v2.3 took fabricated completions from 1.1% to 4.6%. A prompt change dropped the tool-result check.

Compare                 v2.2    v2.3    Change
Fabricated completion   1.1%    4.6%    ▲ 318%
Context loss            2.4%    3.0%    ▲ 25%
Repeat loop             3.7%    3.3%    ▼ 11%
p95 latency             410ms   500ms   ▲ 22%

02 · Diff

What changed between two versions.

Pick any two deploys and Flowlines diffs every behavior and metric between them: what got worse, what got better, by how much. The release notes you wish your agent shipped with.

Every behavior and metric, side by side
Sample sessions from each side
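
To make the diff concrete, here is a minimal sketch of the arithmetic behind a version comparison, using the v2.2 and v2.3 figures shown on this page. It is an illustration, not the Flowlines API: the names `MetricDiff` and `diff_versions` are hypothetical, and it assumes lower is better for every metric listed and that each baseline value is non-zero.

```python
from dataclasses import dataclass


@dataclass
class MetricDiff:
    """One behavior or metric compared across two deploys (hypothetical type)."""
    name: str
    before: float
    after: float

    @property
    def relative_change_pct(self) -> float:
        # Relative change against the earlier version; assumes before != 0.
        return (self.after - self.before) / self.before * 100.0

    @property
    def verdict(self) -> str:
        # Assumption: a lower value is better for every metric in this sketch.
        if self.after > self.before:
            return "worse"
        if self.after < self.before:
            return "better"
        return "unchanged"


def diff_versions(before: dict[str, float], after: dict[str, float]) -> list[MetricDiff]:
    """Diff the metrics present in both versions, largest relative change first."""
    shared = before.keys() & after.keys()
    diffs = [MetricDiff(name, before[name], after[name]) for name in shared]
    return sorted(diffs, key=lambda d: abs(d.relative_change_pct), reverse=True)


# Figures taken from the v2.2 vs v2.3 comparison on this page.
V2_2 = {"fabricated_completion": 1.1, "context_loss": 2.4,
        "repeat_loop": 3.7, "p95_latency_ms": 410.0}
V2_3 = {"fabricated_completion": 4.6, "context_loss": 3.0,
        "repeat_loop": 3.3, "p95_latency_ms": 500.0}

for d in diff_versions(V2_2, V2_3):
    print(f"{d.name}: {d.before} -> {d.after} "
          f"({d.relative_change_pct:+.0f}%, {d.verdict})")
```

Sorting by the size of the relative change puts the fabricated-completion jump at the top, which matches how the comparison reads: the biggest movement first, improvements and regressions side by side.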

03 · Alerts

Get told the moment a deploy regresses.

Set a baseline and Flowlines watches every release against it. When a deploy pushes any behavior past your threshold, a Slack alert names the version, the behavior, and the size of the jump before your users find it.

Per-behavior regression thresholds
Slack, Linear, or a webhook-driven CI gate
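
As a rough sketch of how a per-behavior threshold could gate a release, the snippet below checks a deploy's observed rates against configured ceilings and turns the result into a CI exit code. Everything here is an assumption for illustration: `Regression`, `find_regressions`, and `ci_exit_code` are hypothetical names, not Flowlines functions, and the threshold is treated as an absolute ceiling on the rate. The 1.1%, 4.6%, and 2.0% figures come from the example alert on this page.

```python
from typing import NamedTuple


class Regression(NamedTuple):
    """A behavior that crossed its threshold in a given deploy (hypothetical type)."""
    version: str
    behavior: str
    baseline: float
    observed: float
    threshold: float

    def message(self) -> str:
        # Names the version, the behavior, and the size of the jump.
        return (f"{self.version} regressed {self.behavior}: "
                f"{self.baseline}% -> {self.observed}% "
                f"(threshold {self.threshold}%)")


def find_regressions(version: str,
                     baseline: dict[str, float],
                     observed: dict[str, float],
                     thresholds: dict[str, float]) -> list[Regression]:
    """Return every behavior whose observed rate exceeds its threshold.

    Assumption: thresholds are absolute ceilings on the rate, in percent.
    Behaviors without a configured threshold are not checked.
    """
    return [
        Regression(version, behavior, baseline[behavior], observed[behavior], limit)
        for behavior, limit in thresholds.items()
        if observed[behavior] > limit
    ]


def ci_exit_code(regressions: list[Regression]) -> int:
    """Non-zero exit fails the CI job, which blocks the rollout."""
    return 1 if regressions else 0


# Figures from the example alert on this page.
BASELINE = {"fabricated completion": 1.1}
THRESHOLDS = {"fabricated completion": 2.0}

found = find_regressions("v2.3", BASELINE, {"fabricated completion": 4.6}, THRESHOLDS)
for r in found:
    print(r.message())
print("exit code:", ci_exit_code(found))
```

In a CI job, the exit code would be passed to `sys.exit` so that a regressed deploy fails the pipeline, while a deploy that stays under every threshold passes through untouched.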

LIVE · v2.3 regressed fabricated completion.

1.1% → 4.6% · threshold 2.0% · deployed 41m ago

WATCH · v2.3 p95 latency up 22%.

410ms → 500ms · within threshold

PASS · v2.2 cleared its baseline.

All behaviors within threshold · safe to roll out

Catch the next regression before users do.

We'll connect a sample of your sessions and replay your last few deploys, so you see exactly what each one changed.

Book a demo