Glossary

The vocabulary of behavioral observability.

Plain definitions for the terms Flowlines uses. If a word on the site is unfamiliar, it's probably defined here.

Behavioral observability

Behavioral observability is the practice of measuring how an AI agent behaves across many sessions and users, not just whether individual calls succeeded.

Where execution observability inspects one LLM call (latency, tokens, status), behavioral observability looks at patterns across thousands of sessions: which failure signals are firing, which user cohorts are degrading, and whether a new prompt version helped or hurt. It sits on top of your existing traces.

How do I know if my agent is working in production? →

Behavioral signal

A behavioral signal is a named, tunable detector that fires when a session shows a specific failure or quality pattern, such as hallucination, cascade_failure, user_frustration, or cost_escalation.

Signals are the unit of detection in Flowlines. They run automatically on every ingested session and roll up into rates, baselines, and cohort breakdowns. Teams can promote recurring patterns from the discoveries queue into their own custom signals.

What signals does Flowlines detect out of the box? →

Agent drift

Agent drift is the gradual divergence of an agent's behavior from its original instructions, learned state, or expected outcomes over time.

Drift doesn't crash anything; it shows up as a sustained rise in a behavioral signal rate, a cohort sliding from happy to at-risk, or a regression after a prompt deploy. It is invisible at the single-trace level and detectable only by comparing behavior across rolling windows and versions.
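As a rough illustration of comparing behavior across rolling windows, the sketch below checks whether a signal's recent average rate has risen above its earlier baseline. The function name, window sizes, and threshold are hypothetical choices made for this example, not Flowlines' actual detection logic.

```python
from statistics import mean


def drift_detected(daily_rates, baseline_days=14, recent_days=7, threshold=0.05):
    """Return True when the recent mean signal rate exceeds the baseline mean
    by more than `threshold` (an absolute difference in rate).

    `daily_rates` is a chronological list of per-day signal rates (0.0 to 1.0).
    All parameter names and defaults are assumptions for illustration.
    """
    needed = baseline_days + recent_days
    if len(daily_rates) < needed:
        return False  # not enough history to compare two windows

    # Baseline window: the days immediately before the recent window.
    baseline = mean(daily_rates[-needed:-recent_days])
    # Recent window: the most recent `recent_days` days.
    recent = mean(daily_rates[-recent_days:])
    return recent - baseline > threshold


# Two weeks at a 2% hallucination rate, then a week at 10%.
rates = [0.02] * 14 + [0.10] * 7
print(drift_detected(rates))
```

No single day in the recent window looks alarming on its own; the rise only becomes visible when the two windows are compared.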

How do I detect agent drift over time? →

Silent failure

A silent failure is a session that returns a successful response (HTTP 200) while the answer is actually wrong, empty, looping, or off-policy, so error monitoring never flags it.

Silent failures are the most common and most damaging agent failures because they don't throw. Behavioral signals like hallucination, low_quality_response, and cascade_failure are how Flowlines surfaces them.
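To make the idea concrete, here is a minimal sketch of two of the simpler cases, an empty answer and a looping agent, hiding behind an HTTP 200. The function and its inputs are hypothetical; wrong or off-policy answers need richer detection than this and are not covered.

```python
def looks_like_silent_failure(status_code, agent_replies):
    """Flag a session that 'succeeded' but whose replies are empty or looping.

    `agent_replies` is the chronological list of agent messages in a session.
    Both the function and its heuristics are assumptions for illustration.
    """
    if status_code != 200:
        return False  # a loud failure; ordinary error monitoring catches it

    # Empty: no replies at all, or a blank final reply.
    if not agent_replies or not agent_replies[-1].strip():
        return True

    # Looping: the same reply appears twice in a row (ignoring case/whitespace).
    normalized = [reply.strip().lower() for reply in agent_replies]
    return any(a == b for a, b in zip(normalized, normalized[1:]))


print(looks_like_silent_failure(200, ["Let me check.", "Let me check."]))
```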

How do I catch silent failures in production? →

Behavioral cohort

A behavioral cohort is a group of users defined by observed behavior rather than demographics, for example happy, neutral, at-risk, active, dormant, or churning.

Flowlines computes cohorts automatically from activity and a CSAT proxy, and supports custom rule-based cohorts over any traced attribute. Filtering the rest of the app by cohort reveals which segment is driving a signal or a regression.

How do I segment my users by behavior? →

Power Score

Power Score is a per-user metric defined as CSAT multiplied by cadence. A high score marks users who are both highly active and highly satisfied; a low score despite high cadence marks users who are active but unhappy.

It separates your genuine power users from users who are merely frequent but frustrated, the segment most likely to churn. It's computed from behavior, with no survey required.
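The definition above (CSAT multiplied by cadence) can be sketched as follows. The field names, the 0-to-1 CSAT scale, the sessions-per-week cadence unit, and the cutoffs are all assumptions made for this example; the actual scales Flowlines uses are not stated here.

```python
from dataclasses import dataclass


@dataclass
class UserActivity:
    user_id: str
    csat_proxy: float         # assumed scale: 0.0 (unhappy) to 1.0 (happy)
    sessions_per_week: float  # assumed cadence unit


def power_score(user):
    """CSAT multiplied by cadence, per the glossary definition."""
    return user.csat_proxy * user.sessions_per_week


def frequent_but_frustrated(users, min_cadence=5.0, max_csat=0.4):
    """Users with high cadence but low satisfaction (hypothetical cutoffs)."""
    return [
        u.user_id
        for u in users
        if u.sessions_per_week >= min_cadence and u.csat_proxy <= max_csat
    ]


users = [
    UserActivity("ana", csat_proxy=0.9, sessions_per_week=10),  # power user
    UserActivity("ben", csat_proxy=0.2, sessions_per_week=10),  # active, unhappy
    UserActivity("cy", csat_proxy=0.9, sessions_per_week=1),    # happy, infrequent
]
for u in users:
    print(u.user_id, round(power_score(u), 2))
print(frequent_but_frustrated(users))
```

Ana and Ben have identical cadence, so an activity metric alone would treat them the same; multiplying by the satisfaction proxy is what pulls them apart.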

How do I measure satisfaction without CSAT surveys? →

Discovery

A discovery is a recurring behavioral pattern Flowlines surfaces automatically that doesn't yet match a defined signal, queued for you to promote into a tracked custom signal.

Discoveries are how the signal library grows from your own data: the system flags patterns (including feature requests phrased in users' own words) that recur often enough to matter, and you decide which become first-class signals.

How do I find feature requests buried in conversations? →

Version impact

Version impact is the measured before-and-after effect of a prompt or agent deploy on signal rates, success rate, cost, and latency, pinned to the deployed version.

Instead of guessing whether a new prompt helped, you see the deploy as a marked point on the timeline, along with the deltas it caused, on the same day it shipped and broken down by intent or cohort.
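A before-and-after comparison of this kind can be sketched as below for a single signal rate. The session shape (a dict with `ts` and `signals` keys) and the function name are hypothetical, chosen only to illustrate the calculation; a real comparison would also cover success rate, cost, and latency.

```python
def version_impact(sessions, deploy_ts, signal):
    """Compare how often `signal` fired before and after a deploy.

    Each session is assumed to be a dict with a numeric timestamp `ts` and a
    list of fired signal names under `signals` (an illustrative shape only).
    """
    before = [s for s in sessions if s["ts"] < deploy_ts]
    after = [s for s in sessions if s["ts"] >= deploy_ts]

    def rate(group):
        # Fraction of sessions in the group where the signal fired.
        if not group:
            return 0.0
        return sum(signal in s["signals"] for s in group) / len(group)

    return {
        "before": rate(before),
        "after": rate(after),
        "delta": rate(after) - rate(before),
    }


sessions = [
    {"ts": 1, "signals": ["hallucination"]},
    {"ts": 2, "signals": []},
    {"ts": 3, "signals": []},
    {"ts": 4, "signals": []},
    {"ts": 5, "signals": ["hallucination"]},
    {"ts": 6, "signals": ["hallucination", "user_frustration"]},
    {"ts": 7, "signals": []},
    {"ts": 8, "signals": []},
]
print(version_impact(sessions, deploy_ts=5, signal="hallucination"))
```

A positive delta means the signal fired more often after the deploy. Filtering `sessions` by intent or cohort before calling the function gives the per-segment breakdown.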

Is my new prompt working better than the old one? →

Trace ingestion (read-layer)

Trace ingestion is how Flowlines connects to the traces you already store (Langfuse, OpenTelemetry, or a JSON drop) and reads them, with no SDK shipped into your agent.

Because Flowlines reads from your existing trace store rather than instrumenting your code, there's no code change and no added latency. The first connection backfills the last 30 days of traces, so signals start firing within minutes.

How do I connect Flowlines to my agent? →

Memory recommendation (preview)

A memory recommendation is a drafted, structured memory field that would have prevented a recurring failure, surfaced with the evidence behind it for you to approve, reject, or auto-merge.

This closes the loop from detection to fix and is in design-partner preview. It integrates with whatever memory store you already use; every write is reversible.

How do I know what to fix next? →