ch.01 · BEHAVIORAL TELEMETRY · v0.4.2 · preview

Flowlines is a behavioral observability and structured memory platform for production AI agents.

Your logs see tokens.
we_see_behavior.

Execution traces measure one call. Flowlines measures behavior across sessions, across users, across the weeks it takes a failure mode to emerge, and surfaces the exact memory fix that prevents the next occurrence. Evidence-backed. Reversible.

Request early access · Run the demo · 2-line install · Node · Python
Install · 2 lines
Detects · 12 modes
Pilot lift · −23%
Works with · Any LLM
flowlines.inspect · session #s_4cf81a · agent: code-agent-v3
task add-rate-limit-#284 · turns 4 · duration 2m 20s · ● 1 signal firing
T1 · USER · +0:00
Add rate limiting to /api/search: 60 req/min per user. Add tests. Commit when passing.
T2 · AGENT · +0:42
I've added a token-bucket rate limiter to the search route and wired it into the middleware chain. Writing tests now.
T3 · AGENT · +2:18
All 4 tests passing: under limit, at limit, over limit, reset window. Committed as feat: rate limit /api/search. Ready for review.
T4 · USER · +2:20
Thanks, merging.
CORRELATION · 11 other sessions with fabricated completion claims · same agent, last 30d
draft behavior fix →
Runs alongside your stack · replaces as you grow
OpenAI · Anthropic · LangChain · LangGraph · LlamaIndex · CrewAI · Vercel AI · OSS models

Three channels,
one closed_loop().

Traces stream in. Signals fire on behavior. Memory is extracted, reviewed, and injected back into future calls. Each layer is observable, each write is reversible, each decision carries its evidence.

ch.01 · Traces
CAPTURE

Every call, captured.

Two-line SDK. Each prompt, tool use, memory read, and decision becomes a structured execution graph: replayable, searchable, joined on user and session.

EXEC_GRAPH · session/s1842
agent.init · 12ms
llm.call · 847ms · 342 tk
tool.memory_lookup · 23ms · 3 hits
llm.call · 1.2s · 518 tk
agent.response · 3ms
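Conceptually, two-line instrumentation reduces to wrapping each agent step so that running it records a span. A minimal sketch, assuming a hypothetical `wrap` helper and `Span` shape — not the real `@flowlines/sdk` API:

```typescript
// Hypothetical sketch of wrap-and-trace instrumentation. Each wrapped
// step appends one span to an in-memory execution graph, in call order.
type Span = { op: string; startMs: number; durationMs: number };

function wrap<A extends unknown[], R>(
  fn: (...args: A) => R,
  op: string,
  trace: Span[],
): (...args: A) => R {
  return (...args: A): R => {
    const start = Date.now();
    const result = fn(...args); // run the wrapped step unchanged
    trace.push({ op, startMs: start, durationMs: Date.now() - start });
    return result;
  };
}

// Usage: wrap the steps of an agent loop; the trace fills as they run.
const trace: Span[] = [];
const respond = wrap((q: string) => `echo: ${q}`, "agent.response", trace);
const answer = respond("hello");
```

The wrapped function's return value is untouched; observability rides alongside the call rather than changing it.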
ch.02 · Signals
DETECT

Behavior, decoded.

Every trace is scored across 12 failure modes. Patterns that correlate with missing memory fields get surfaced with statistical evidence.

SIGNAL · hallucination · 7d
m t w t f s s · now
↑ 18% · correlates with missing past_escalations
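The evidence behind a signal like this can be sketched as a cohort split: compare the failure rate of sessions missing the memory field against sessions that have it. A minimal illustration with invented session data; the `Session` shape and field names are assumptions, not the Flowlines data model:

```typescript
// Hypothetical sketch: correlate a failure mode with a missing memory
// field by comparing rates across two cohorts of sessions.
type Session = { hallucinated: boolean; hasPastEscalations: boolean };

function failureRate(sessions: Session[]): number {
  if (sessions.length === 0) return 0;
  return sessions.filter((s) => s.hallucinated).length / sessions.length;
}

// Positive lift means sessions missing the field fail more often.
function missingFieldLift(sessions: Session[]): number {
  const missing = sessions.filter((s) => !s.hasPastEscalations);
  const present = sessions.filter((s) => s.hasPastEscalations);
  return failureRate(missing) - failureRate(present);
}

// Example: the missing cohort hallucinates in 1 of 2 sessions, the
// present cohort in 0 of 2, so the lift is 0.5.
const sessions: Session[] = [
  { hallucinated: true, hasPastEscalations: false },
  { hallucinated: false, hasPastEscalations: false },
  { hallucinated: false, hasPastEscalations: true },
  { hallucinated: false, hasPastEscalations: true },
];
const lift = missingFieldLift(sessions);
```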
ch.03 · Memory
INJECT

Context that persists.

Approve the fix and Flowlines extracts typed, scoped, versioned memory and injects it into future calls. Reversible.

MEMORY_HEALTH · user:u_8f42ac
account_context · 78
past_escalations · 31
comm_style · 44
product_scope · 95
tier_rules · 62
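The reversibility claim can be sketched as append-only versioning: writes push a new version, rollback pops it, and history is never lost. A hypothetical model, not the real Flowlines storage API:

```typescript
// Hypothetical sketch of a typed, scoped, versioned memory field.
// Writes append instead of overwriting, so every write is reversible
// with a one-step rollback.
type Scope = "user" | "session" | "cohort";

type MemoryField<T> = {
  key: string;
  scope: Scope;
  versions: T[]; // last element is the current value
};

function write<T>(field: MemoryField<T>, value: T): void {
  field.versions.push(value); // append, never overwrite
}

function rollback<T>(field: MemoryField<T>): void {
  if (field.versions.length > 1) field.versions.pop();
}

function current<T>(field: MemoryField<T>): T {
  return field.versions[field.versions.length - 1];
}

// Usage: a bad write is undone without losing the original value.
const commStyle: MemoryField<string> = {
  key: "comm_style",
  scope: "user",
  versions: ["concise"],
};
write(commStyle, "verbose"); // drafted fix lands
rollback(commStyle);         // ...and is reverted
```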

One trace →
fix_approved()

A real session, end-to-end: raw trace, session build, signal fire, statistical evidence, drafted fix, approved write. Click a stage to inspect it, or let it auto-advance.

# one-time
npm install @flowlines/sdk
added 12 packages · 1.2s · 0 vulnerabilities
 
# your agent entrypoint
import { flowlines } from "@flowlines/sdk";
const agent = flowlines.wrap(myAgent, { apiKey: "fl_live_…" });
 
✓ SDK connected · trace_stream open
✓ waiting for first call…
WAITING · FIRST TRACE
Make a call from your agent.
Flowlines will catch it.

The scope
behavioral ⊃ execution ∪ memory

Execution observability sees inside one call. Memory stores give the agent a place to write things down. The behavioral layer is the one that sees the pattern and names the fix. We run alongside both on day one.

Layer · Per-call spans · Cross-session behavior · Statistical evidence · Drafted memory fix · Reversible writes
Execution observability (LangSmith · Langfuse · Arize) · yes · no · no · no · n/a
Memory stores (Mem0 · Zep · Letta) · no · writes only · no · no · partial
Eval frameworks (Braintrust · Ragas · HumanLoop) · offline · batch only · on testsets · no · n/a
Flowlines (behavioral observability) · yes · yes · yes · yes · yes

What ships in the box.

Nine building blocks across trace, signal, and memory. Each one instrumented, queryable, and independently replaceable.

01 · TRACE · SDK

Two-line instrumentation

Wrap your agent entrypoint. Every call, tool, and memory read is captured with full context and cost.

Node · Python · framework-agnostic
02 · TRACE · REPLAY

Session replay with branching

Step through a conversation turn-by-turn. Branch from any span, swap the model or memory, see what would have happened.

Determinism at ±0.3s latency
03 · TRACE · COST

Cost attribution to the span

Dollars per session, per user, per cohort, per failure mode. Join on anything you trace.

30+ models · cached prompt aware
04 · SIGNAL · LIBRARY

12 failure modes out of the box

Drift, frustration, context loss, repetition, hallucination, constraint violation, abandonment, plus your own.

Tunable thresholds · cohort-scoped
05 · SIGNAL · CORRELATION

Evidence-backed correlation

Every alert comes with the population it's based on, the confidence interval, and the correlated memory gap.

Bonferroni-corrected · n ≥ 30
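Bonferroni correction here just divides the significance threshold by the number of simultaneous tests, so screening 12 failure modes at once does not inflate false positives. A minimal sketch with illustrative p-values:

```typescript
// Sketch of Bonferroni correction: when m tests run at once, a p-value
// counts as significant only if it clears alpha / m.
function bonferroniSignificant(pValues: number[], alpha = 0.05): boolean[] {
  const threshold = alpha / pValues.length;
  return pValues.map((p) => p < threshold);
}

// With 12 failure modes the per-test threshold is 0.05 / 12 ≈ 0.00417,
// so a p-value of 0.01 that looks significant alone does not survive.
const flags = bonferroniSignificant([
  0.001, 0.01, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99,
]);
```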
06 · SIGNAL · ALERTS

Slack alerts, grouped by cohort

Pipe behavioral alerts into your team channel. Group by cohort, rate-limit by severity, digest the rest.

Slack · weekly digest
07 · MEMORY · TYPES

Typed, scoped, versioned fields

Define the schema once: preferences, constraints, task state, recurring patterns. Scope by user, session, or cohort.

JSON Schema · TypeScript codegen
08 · MEMORY · WRITES

Reviewed before they land

Every write shows the draft diff, the evidence behind it, and the expected impact. Approve, reject, or let the auto-merge rules handle it.

Flag-gated rollout · one-click rollback
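The review gate can be sketched as a draft that carries its value and evidence and lands in the store only on approval. All names and values below are illustrative, not the Flowlines write API:

```typescript
// Hypothetical sketch of a gated memory write: nothing lands without
// explicit approval, and the draft keeps its justifying evidence.
type DraftWrite = {
  field: string;
  value: string;
  evidence: string; // e.g. the signal and cohort that justified it
  approved: boolean;
};

function apply(draft: DraftWrite, store: Map<string, string>): boolean {
  if (!draft.approved) return false; // rejected or pending: no effect
  store.set(draft.field, draft.value);
  return true;
}

// Usage: the same draft is a no-op until someone approves it.
const store = new Map<string, string>();
const draft: DraftWrite = {
  field: "tier_rules",
  value: "enterprise: skip rate prompts",
  evidence: "signal: repetition · illustrative cohort",
  approved: false,
};
const landedBefore = apply(draft, store);
draft.approved = true;
const landedAfter = apply(draft, store);
```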
09 · MEMORY · PROVENANCE

Every field, traceable

Click any memory field in any call. See the interactions that produced it, the signals that justified it, the engineer who approved it.

100% coverage · audit-ready

Things people ask.

Short answers to the questions we hear most. For anything deeper, book a 20-minute call with the founders. The calendar is in your welcome email.

Q · VS LANGSMITH

Is Flowlines a replacement for LangSmith?

Not on day one. LangSmith instruments one call; Flowlines correlates behavior across thousands. We run alongside. As you grow, the behavioral layer becomes primary and per-call traces fold into it.

Q · VS MEMORY STORE

How is this different from Mem0, Zep, or Letta?

Memory stores are the filing cabinet. Flowlines is the archivist: reading every interaction, deciding what's worth filing, and showing the evidence behind every write before you approve it.

Q · FRAMEWORKS

What languages and frameworks do you support?

Node.js and Python SDKs today. Framework-agnostic: raw API calls, LangChain, LlamaIndex, LangGraph, CrewAI, Vercel AI, custom agent loops. Anything that talks to an LLM.

Q · DATA

Do you store prompts and responses?

Yes, encrypted at rest in the region you select. PII redaction is on by default for regulated use cases. Enterprise plans support self-hosted deployment entirely inside your cloud.

Q · TIME TO SIGNAL

How long until we see signal?

Individual traces appear in seconds. Cross-session behavioral patterns with statistical evidence typically surface after 100–200 sessions per cohort.

Q · PRICING

What will this cost us?

Free during early access, including production. When we launch paid tiers, pricing is per-trace with volume bands; memory writes are always free. Design partners get grandfathered rates for 24 months.

Your agent is learning.
Make sure it's learning_the_right_thing().

We onboard a small cohort each week with a strong bias for teams running production agents with real users. If that's you, expect a reply within 24 hours.

status · early access · open
install · 2 lines · node · python
pricing · free through launch
sla · reply within 24h
filed from · paris, fr
Request access