Flowlines is a behavioral observability and structured memory platform for production AI agents.
Your logs see tokens.
→ we_see_behavior.
Execution traces measure one call. Flowlines measures behavior across sessions, across users, across the weeks it takes a failure mode to emerge, and surfaces the exact memory fix that prevents the next occurrence. Evidence-backed. Reversible.
"/api/search: 60 req/min per user. Add tests. Commit when passing."
"feat: rate limit /api/search."
"Ready for review."
Three channels,
one closed_loop().
Traces stream in. Signals fire on behavior. Memory is extracted, reviewed, and injected back into future calls. Each layer is observable, each write is reversible, each decision carries its evidence.
Every call, captured.
Two-line SDK. Every prompt, tool call, memory read, and decision becomes a node in a structured execution graph: replayable, searchable, joined on user and session.
Behavior, decoded.
Every trace is scored across 12 failure modes. Patterns that correlate with missing memory fields get surfaced with statistical evidence.
past_escalations
Context that persists.
Approve the fix and Flowlines extracts typed, scoped, versioned memory and injects it into future calls. Reversible.
One trace →
fix_approved()
A real session, end-to-end: raw trace, session build, signal fire, statistical evidence, drafted fix, approved write. Click a stage to inspect it, or let it auto-advance.
Flowlines will catch it.
The scope
behavioral ⊃ execution ∪ memory
Execution observability sees inside one call. Memory stores give the agent a place to write things down. The behavioral layer is the one that sees the pattern and names the fix. We run alongside both on day one.
What ships in the box.
Nine building blocks across trace, signal, and memory. Each one instrumented, queryable, and independently replaceable.
Two-line instrumentation
Wrap your agent entrypoint. Every call, tool, and memory read is captured with full context and cost.
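The wrap-the-entrypoint pattern can be sketched in a few lines of Python. This is a minimal mock, not the published SDK: the decorator name, the span shape, and the in-memory trace log (standing in for the Flowlines ingestion endpoint) are all assumptions for illustration.

```python
import functools
import time
import uuid

TRACE_LOG = []  # stand-in for the real ingestion endpoint


def trace(agent_fn):
    """Hypothetical wrapper: record one span per agent call."""
    @functools.wraps(agent_fn)
    def wrapper(*args, **kwargs):
        span = {
            "id": str(uuid.uuid4()),
            "fn": agent_fn.__name__,
            "start": time.time(),
        }
        result = agent_fn(*args, **kwargs)
        span["end"] = time.time()
        span["result"] = result
        TRACE_LOG.append(span)
        return result
    return wrapper


@trace
def answer(question: str) -> str:
    # stand-in for a real agent loop that calls an LLM
    return f"echo: {question}"


answer("hello")
print(len(TRACE_LOG))  # one span captured
```

The decorator is the "two lines" in spirit: one import, one `@trace` above the entrypoint; everything inside the call is captured without touching agent logic.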
Session replay with branching
Step through a conversation turn-by-turn. Branch from any span, swap the model or memory, see what would have happened.
Cost attribution to the span
Dollars per session, per user, per cohort, per failure mode. Join on anything you trace.
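Joining cost on anything you trace reduces to a group-by over span attributes. A toy sketch, with invented span fields and costs:

```python
from collections import defaultdict

# Invented example spans; real spans would carry many more attributes.
spans = [
    {"user": "u1", "cohort": "free", "cost_usd": 0.004},
    {"user": "u1", "cohort": "free", "cost_usd": 0.010},
    {"user": "u2", "cohort": "pro",  "cost_usd": 0.021},
]


def cost_by(spans, key):
    """Sum dollar cost grouped by any traced attribute."""
    totals = defaultdict(float)
    for span in spans:
        totals[span[key]] += span["cost_usd"]
    return dict(totals)


print(cost_by(spans, "user"))
print(cost_by(spans, "cohort"))
```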
12 failure modes out of the box
Drift, frustration, context loss, repetition, hallucination, constraint violation, abandonment, plus your own.
Evidence-backed correlation
Every alert comes with the population it's based on, the confidence interval, and the correlated memory gap.
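To make "population, confidence interval, correlated memory gap" concrete, here is one way such an alert could be computed. The counts are invented, and a 95% Wald interval is used purely as an illustration; nothing here claims to be the statistics Flowlines actually runs.

```python
import math

# Invented populations: sessions with vs. without a `past_escalations` field.
with_field = {"n": 180, "failures": 9}
without_field = {"n": 140, "failures": 35}


def rate_and_ci(group, z=1.96):
    """Failure rate with a 95% Wald confidence interval."""
    p = group["failures"] / group["n"]
    half = z * math.sqrt(p * (1 - p) / group["n"])
    return p, (p - half, p + half)


p_with, ci_with = rate_and_ci(with_field)
p_without, ci_without = rate_and_ci(without_field)
print(f"with field:    {p_with:.1%}, CI ({ci_with[0]:.3f}, {ci_with[1]:.3f})")
print(f"without field: {p_without:.1%}, CI ({ci_without[0]:.3f}, {ci_without[1]:.3f})")
```

When the two intervals don't overlap, the missing field is a credible correlate of the failure mode rather than noise.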
Slack alerts, grouped by cohort
Pipe behavioral alerts into your team channel. Group by cohort, rate-limit by severity, digest the rest.
Typed, scoped, versioned fields
Define the schema once: preferences, constraints, task state, recurring patterns. Scope by user, session, or cohort.
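A typed, scoped, versioned field might look like the following sketch. The class and field names are illustrative, not the Flowlines schema; the point is that versions are immutable, so every write is reversible by construction.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Literal


@dataclass(frozen=True)
class MemoryField:
    key: str
    value: str
    scope: Literal["user", "session", "cohort"]
    scope_id: str
    version: int
    written_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def bump(self, new_value: str) -> "MemoryField":
        """Return a new version; the old one stays around for rollback."""
        return MemoryField(
            self.key, new_value, self.scope, self.scope_id, self.version + 1
        )


pref = MemoryField("tone_preference", "concise", "user", "u42", 1)
pref_v2 = pref.bump("detailed")
print(pref_v2.version)  # 2; pref is untouched at version 1
```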
Reviewed before they land
Every write shows the draft diff, the evidence behind it, and the expected impact. Approve, reject, or let the auto-merge rules handle it.
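An auto-merge rule like the one described could be as simple as a threshold gate: strong evidence lands automatically, everything else queues for a human. The thresholds and field names below are invented for illustration.

```python
def route_write(draft: dict) -> str:
    """Route a drafted memory write: auto-merge or hold for review.

    Thresholds are illustrative, not Flowlines defaults.
    """
    strong_evidence = (
        draft["sessions_observed"] >= 100 and draft["p_value"] < 0.01
    )
    if strong_evidence and draft["reversible"]:
        return "auto_merge"
    return "needs_review"


print(route_write(
    {"sessions_observed": 240, "p_value": 0.003, "reversible": True}
))  # auto_merge
print(route_write(
    {"sessions_observed": 40, "p_value": 0.04, "reversible": True}
))  # needs_review
```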
Every field, traceable
Click any memory field in any call. See the interactions that produced it, the signals that justified it, the engineer who approved it.
Where behavioral matters most.
Any agent that serves the same users over time benefits. These are the production domains where cross-session behavior and structured memory compound into real economic outcomes.
Catch the recurring vulnerability before the sixth time you ship it.
SQL injection, hardcoded secrets, unscoped permissions, learned across the developer's history, fixed in their style.
Detect the third contact before your customer gives up.
Repeat contacts, declining sentiment, resolution failure. The missing field surfaces before the escalation.
See frustration in the language before you see it in the churn.
Engagement drops, avoidance patterns, topic abandonment, correlated to the exact field that would reduce it.
Typed memory with full provenance, for regulated environments.
Every write traceable to the interaction that produced it. PII redaction by default. Self-hostable on your cloud.
Things people ask.
Short answers to the questions we hear most. For anything deeper, book a 20-minute call with the founders. The calendar is in your welcome email.
Is Flowlines a replacement for LangSmith?
Not on day one. LangSmith instruments one call; Flowlines correlates behavior across thousands. We run alongside. As you grow, the behavioral layer becomes primary and per-call traces fold into it.
How is this different from Mem0, Zep, or Letta?
Memory stores are the filing cabinet. Flowlines is the archivist: reading every interaction, deciding what's worth filing, and showing the evidence behind every write before you approve it.
What languages and frameworks do you support?
Node.js and Python SDKs today. Framework-agnostic: raw API calls, LangChain, LlamaIndex, LangGraph, CrewAI, Vercel AI, custom agent loops. Anything that talks to an LLM.
Do you store prompts and responses?
Yes, encrypted at rest in the region you select. PII redaction is on by default for regulated use cases. Enterprise plans support self-hosted deployment entirely inside your cloud.
How long until we see signal?
Individual traces appear in seconds. Cross-session behavioral patterns with statistical evidence typically surface after 100–200 sessions per cohort.
What will this cost us?
Free during early access, including production. When we launch paid tiers, pricing is per-trace with volume bands; memory writes are always free. Design partners get grandfathered rates for 24 months.
Your agent is learning.
Make sure it's learning_the_right_thing().
We onboard a small cohort each week with a strong bias for teams running production agents with real users. If that's you, expect a reply within 24 hours.