Understanding deploys

See what each release did to your agents.

Every prompt change, model swap, and tool update shifts behavior in ways tests don't catch. Flowlines plots behavior against your deploy timeline, so a regression shows up as a step in the chart, not a flood of tickets a week later.

Book a demo
01 · Timeline

Behavior, version by version.

Fabricated-completion rate over time, with a marker on every deploy. The regression is the one place the line steps up.

[Chart: fabricated completion rate for refund-agent across deploys v2.1 to v2.4. The y-axis is flag rate, 0% to 6%, with a marker at each deploy.]
v2.3 took fabricated completions from 1.1% to 4.6%. A prompt change dropped the tool-result check.
Compare                  v2.2     v2.3     Change
Fabricated completion    1.1%     4.6%     ▲ 318%
Context loss             2.4%     3.0%     ▲ 25%
Repeat loop              3.7%     3.3%     ▼ 11%
p95 latency              410ms    500ms    ▲ 22%
02 · Diff

What changed between two versions.

Pick any two deploys and Flowlines diffs every behavior and metric between them: what got worse, what got better, by how much. The release notes you wish your agent shipped with.

  • Every behavior and metric, side by side
  • Sample sessions from each side
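
As a rough illustration of the arithmetic behind a version diff, here is a minimal Python sketch. It is not Flowlines' implementation or API: `MetricDelta`, `diff_versions`, and the metric key names are hypothetical, and the only real data is the v2.2 and v2.3 figures quoted on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDelta:
    """One row of a version-to-version diff (hypothetical structure)."""
    name: str
    before: float
    after: float

    @property
    def relative_change(self) -> float:
        """Change relative to the earlier version, as a fraction (0.25 == +25%)."""
        if self.before == 0:
            # No meaningful ratio against a zero baseline.
            return float("inf") if self.after > 0 else 0.0
        return (self.after - self.before) / self.before


def diff_versions(before: dict[str, float], after: dict[str, float]) -> list[MetricDelta]:
    """Diff every metric present in both versions, largest increase first."""
    shared = before.keys() & after.keys()
    deltas = [MetricDelta(name, before[name], after[name]) for name in shared]
    return sorted(deltas, key=lambda d: d.relative_change, reverse=True)


# Figures from the v2.2 vs v2.3 comparison on this page (rates in %, latency in ms).
V2_2 = {"fabricated_completion": 1.1, "context_loss": 2.4,
        "repeat_loop": 3.7, "p95_latency_ms": 410}
V2_3 = {"fabricated_completion": 4.6, "context_loss": 3.0,
        "repeat_loop": 3.3, "p95_latency_ms": 500}

if __name__ == "__main__":
    for d in diff_versions(V2_2, V2_3):
        print(f"{d.name}: {d.before} -> {d.after} ({d.relative_change:+.0%})")
```

Sorting by relative change puts the biggest regression at the top, which matches how the comparison reads: fabricated completion first, repeat loop (the one improvement) last.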
03 · Alerts

Get told the moment a deploy regresses.

Set a baseline and Flowlines watches every release against it. When a deploy pushes any behavior past your threshold, a Slack alert names the version, the behavior, and the size of the jump before your users find it.

  • Per-behavior regression thresholds
  • Slack, Linear, or a CI gate via webhook
LIVE · v2.3 regressed fabricated completion.
1.1% → 4.6% · threshold 2.0% · deployed 41m ago

WATCH · v2.3 p95 latency up 22%.
410ms → 500ms · within threshold

PASS · v2.2 cleared its baseline.
All behaviors within threshold · safe to roll out
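
One way to picture per-behavior thresholds and a CI gate is the Python sketch below. The rules are assumptions inferred from the example alerts, not Flowlines' documented logic: a value past its threshold is LIVE, a value that rose but stayed within threshold is WATCH, and anything else is PASS. `Status`, `classify`, `ci_exit_code`, and the 600ms latency threshold are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    LIVE = "LIVE"    # past the threshold: alert
    WATCH = "WATCH"  # worse than baseline, still within threshold
    PASS = "PASS"    # at or below baseline


def classify(baseline: float, observed: float, threshold: float) -> Status:
    """Classify one lower-is-better behavior against its baseline and threshold."""
    if observed > threshold:
        return Status.LIVE
    if observed > baseline:
        return Status.WATCH
    return Status.PASS


def ci_exit_code(statuses: list[Status]) -> int:
    """Return a non-zero exit code if any behavior regressed past its threshold."""
    return 1 if any(s is Status.LIVE for s in statuses) else 0


if __name__ == "__main__":
    results = [
        # Fabricated completion: figures and 2.0% threshold are from this page.
        classify(baseline=1.1, observed=4.6, threshold=2.0),
        # p95 latency: the 600ms threshold is an assumed value for illustration.
        classify(baseline=410, observed=500, threshold=600),
    ]
    print([s.value for s in results])
    raise SystemExit(ci_exit_code(results))
```

Returning an exit code keeps the gate easy to wire into a pipeline step: the job fails only on a LIVE regression, while WATCH items stay visible without blocking the rollout.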

Catch the next regression before users do.

We'll connect a sample of your sessions and replay your last few deploys, so you see exactly what each one changed.

Book a demo