ENGINEERING • Product deep dive

Conversation Memory: How Falcon Builder Gives Your AI Workflows Persistent Context

Falcon Builder Team · March 24, 2026 · 7 min read

Most workflow automation tools treat every execution as a blank slate. An AI prompt fires, generates a response, and forgets everything. The next time the same user sends a message, the AI has no idea what was said before. If you're building a support bot, an intake assistant, or any workflow that handles conversations, this is a problem.

We built Conversation Memory to solve it. It's a persistent storage layer for multi-turn conversation history, designed specifically for workflow automation. Your AI nodes can remember what was said in previous executions, maintain context across hours or days, and automatically compress older messages to stay within LLM context limits.

This post walks through the architecture, how it works in practice, and the design decisions that shaped it.

The Problem: Stateless AI in a Stateful World

Consider an SMS support bot built with Falcon Builder. A customer texts “What are your hours?” — the workflow triggers, the AI Prompt node generates a response, and the bot replies. Simple enough. But then the customer follows up: “Do you offer weekend appointments?”

Without memory, the AI has no context. It doesn't know the user already asked about hours. It doesn't know what it said in its previous response. The follow-up question might reference something from the prior exchange, and the AI can't connect the dots. Every execution is isolated.

The traditional workaround is to store messages in your own database and manually reconstruct the conversation before each AI call. That works, but it requires multiple nodes, custom code, and careful attention to token limits. We wanted something that requires a single checkbox.

Two Ways to Use Memory

Conversation Memory is available in two forms, depending on how much control you want:

1. Built-in to AI Prompt (Zero Extra Nodes)

The AI Prompt node has a built-in Enable conversation memory toggle. When you turn it on and provide a Conversation ID, the node automatically:

  • Loads recent messages from that conversation before calling the LLM
  • Includes an optional summary of older messages for extended context
  • Stores both the user prompt and the assistant response after the call
  • Triggers auto-summarization when the unsummarized message count exceeds a threshold

This is the simplest path. You configure it once, and every subsequent execution automatically carries the full conversation context. No additional nodes required.

2. Standalone Conversation Memory Node

For workflows that need more control, the dedicated Conversation Memory node (🧠) gives you three explicit actions:

  • Retrieve — fetch the last N messages and an optional summary, storing them in a variable (e.g., {{memory.messages}}) for use in downstream nodes
  • Store — save a message with a specific role (user, assistant, or system) and content, supporting {{variable}} interpolation for dynamic content
  • Clear — permanently delete all messages and summaries for a conversation, useful for resetting threads

This is useful when you need to store messages at specific points in a workflow, retrieve memory for non-AI nodes (like populating an email body with conversation history), or clear conversations based on a condition.
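To make the three actions concrete, here is a minimal in-memory sketch of their semantics. The `ConversationMemoryStore` class and its method names are illustrative, not the product's actual API — in Falcon Builder you configure the node in the editor rather than writing code, and storage is backed by a database, not a dict.

```python
from collections import defaultdict

class ConversationMemoryStore:
    """Illustrative in-memory model of the Retrieve / Store / Clear actions."""

    def __init__(self):
        # Keyed by (workspace_id, workflow_id, conversation_id),
        # mirroring the node's scoping model.
        self._messages = defaultdict(list)

    def store(self, key, role, content):
        # Save a message with an explicit role, as the Store action does.
        assert role in ("user", "assistant", "system")
        self._messages[key].append({"role": role, "content": content})

    def retrieve(self, key, last_n=10):
        # Fetch the last N messages, oldest first, like the Retrieve action.
        return self._messages[key][-last_n:]

    def clear(self, key):
        # Permanently drop the thread, like the Clear action.
        self._messages.pop(key, None)
```

A downstream node would then read the retrieved list the same way it reads `{{memory.messages}}`.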

Conversation Scoping

Every conversation is uniquely identified by three keys: workspace ID, workflow ID, and conversation ID. The first two are automatic — they come from the workflow context. The third is where the design gets interesting.

The Conversation ID is a string you define, and it supports variable interpolation. This means you can scope conversations dynamically based on the trigger data:

  • {{trigger.from}} — for SMS bots, each phone number gets its own conversation thread automatically
  • {{$json.sessionId}} — for webhook-driven chatbots, scope by the session ID from the request body
  • {{trigger.sender}} — for email workflows, each sender maintains a separate conversation
  • A static string like global — for workflows where all users share the same thread (e.g., a team assistant)

This scoping model means a single workflow can handle thousands of independent conversations simultaneously without any additional configuration. The phone number +1-555-0123 has its own thread, +1-555-0456 has another, and they never cross.
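The interpolation step that makes this work can be sketched in a few lines. This is an assumption about how template resolution behaves, not Falcon Builder's actual implementation; the helper name `resolve_conversation_id` is hypothetical.

```python
import re

def resolve_conversation_id(template, context):
    """Replace {{path.to.value}} tokens with values from the trigger context."""
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]  # walk the dotted path, e.g. trigger -> from
        return str(value)
    # $ is allowed so tokens like {{$json.sessionId}} resolve too
    return re.sub(r"\{\{\s*([\w.$]+)\s*\}\}", lookup, template)
```

A static template like `global` contains no tokens, so it resolves to itself — every execution lands in the same shared thread.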

Auto-Summarization: Staying Within Token Limits

Long-running conversations hit a practical ceiling: LLM context windows. If a customer has exchanged 200 messages with your bot, you can't send all of them with every new prompt — you'd blow through the context limit and the API call would fail.

Conversation Memory solves this with auto-summarization. When the number of unsummarized messages exceeds a configurable threshold (default: 20), the system automatically:

  • Takes the older messages that fall outside the “recent” window
  • Sends them to the LLM with a summarization prompt
  • Stores the compressed summary as a separate record
  • Marks the original messages as summarized (they're retained, not deleted)

When the conversation is loaded for the next execution, the AI receives: the summary of older context, plus the most recent N messages in full. This gives the LLM a sense of the overall conversation trajectory while keeping the token count manageable.

Think of it like a colleague catching up on a long email thread — they read a summary of the early messages and the full detail of the recent ones.
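The threshold logic above can be sketched as follows. This is a simplified model under stated assumptions: `summarize` stands in for the real LLM call, and the `summarized` flag models the "retained, not deleted" marking described above.

```python
def maybe_summarize(messages, threshold=20, recent_window=10, summarize=None):
    """Compress older messages once the unsummarized count exceeds the threshold."""
    unsummarized = [m for m in messages if not m.get("summarized")]
    if len(unsummarized) <= threshold:
        return None  # below the threshold, nothing to do

    # Everything older than the recent window gets compressed;
    # the last `recent_window` messages stay in full.
    to_compress = unsummarized[:-recent_window]
    summary = summarize(to_compress)  # an LLM call in production
    for m in to_compress:
        m["summarized"] = True  # retained and flagged, not deleted
    return summary
```

With the defaults, a conversation that reaches 25 unsummarized messages gets its oldest 15 compressed, and the next execution loads that summary plus the 10 recent messages in full.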

Under the Hood

The storage layer is built on two database tables scoped to workspaces:

conversation_messages
  ├── id (UUID)
  ├── workspaceId
  ├── workflowId
  ├── conversationId
  ├── role (user | assistant | system)
  ├── content (text)
  ├── metadata (JSONB)
  └── createdAt

conversation_summaries
  ├── id (UUID)
  ├── workspaceId
  ├── workflowId
  ├── conversationId
  ├── summary (text)
  ├── messagesStart / messagesEnd (range covered)
  ├── messageCount
  └── createdAt

Composite indexes on (workspaceId, workflowId, conversationId) ensure fast lookups even with millions of messages across different conversations. The summary table tracks which message range it covers, so summaries can be chained as conversations grow very long.
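How chaining might work, as a sketch: each new summary record picks up immediately after the range the previous one covered, so the chain of `messagesStart` / `messagesEnd` ranges tiles the older history without gaps. The helper below is hypothetical, not the actual implementation.

```python
def chain_summary(summaries, messages, recent_window=10):
    """Compute the range the next summary record should cover."""
    # Start right after the last summarized message (or at 0 for the first summary).
    start = summaries[-1]["messagesEnd"] + 1 if summaries else 0
    # Leave the most recent window uncompressed.
    end = len(messages) - recent_window - 1
    if end < start:
        return None  # nothing new to summarize yet
    return {"messagesStart": start, "messagesEnd": end,
            "messageCount": end - start + 1}
```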

Here's the flow when an AI Prompt node with memory enabled executes:

Workflow triggers (e.g., incoming SMS)
    |
    v
AI Prompt node starts
    |
    +-- Resolve conversationId (e.g., {{trigger.from}} → "+15550123")
    +-- Load conversation history
    |     +-- Fetch latest summary (if exists)
    |     +-- Fetch last N messages (default: 10)
    |
    v
Build LLM messages array
    +-- System prompt (your configured prompt)
    +-- [Summary] "Previous context: ..."
    +-- [Message 1] { role: "user", content: "..." }
    +-- [Message 2] { role: "assistant", content: "..." }
    +-- ...
    +-- [Current] { role: "user", content: "{{trigger.body}}" }
    |
    v
Call LLM → get response
    |
    v
Store messages
    +-- Store user message (trigger body)
    +-- Store assistant message (LLM response)
    +-- Check if summarization threshold exceeded
    |     +-- If yes: summarize older messages
    |
    v
Return AI response to workflow
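The "build LLM messages array" step in the diagram above can be expressed as a short function. This is a sketch of the assembly order, not the product's code; the function name and the `Previous context:` prefix are illustrative (the prefix matches the diagram's example).

```python
def build_llm_messages(system_prompt, summary, history, current_user_text):
    """Assemble the messages array in the order shown in the flow diagram."""
    messages = [{"role": "system", "content": system_prompt}]
    if summary:
        # Compressed older context rides along as a second system message.
        messages.append({"role": "system",
                         "content": f"Previous context: {summary}"})
    messages.extend(history)  # last N stored messages, oldest first
    # Finally, the new user input from the trigger.
    messages.append({"role": "user", "content": current_user_text})
    return messages
```

The ordering matters: summary first, then full recent messages, then the current input, so the LLM reads the conversation in chronological order.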

Real-World Use Cases

SMS Support Bot

A medical clinic uses a Twilio SMS trigger to receive patient questions. The AI Prompt node has memory enabled with {{trigger.from}} as the Conversation ID. When a patient texts “What vaccines do you offer?” followed by “Which ones are covered by insurance?”, the AI knows “which ones” refers to vaccines because it has the full conversation history. Each patient's phone number maintains an independent thread.

Webhook Chatbot with Session Management

A SaaS product embeds a chat widget that calls a Falcon Builder webhook. The session ID from the widget becomes the Conversation ID, so the AI maintains context for the entire support session and can reference earlier messages. When the session expires, a scheduled workflow calls the Memory node's “clear” action to reset the conversation.

Email Follow-Up Agent

An agency uses an email trigger to process client requests. The AI Prompt uses {{trigger.sender}} as the Conversation ID. When a client sends a follow-up email referencing a previous request, the AI has the full email thread context and can generate a coherent reply without the client needing to restate their issue.

Design Decisions

A few choices we made and why:

  • Workspace-scoped, not global. Conversations are isolated by workspace. If you have multiple workspaces (Enterprise), conversations in one can't leak into another. This is critical for multi-tenant deployments.
  • Messages are retained after summarization. When messages are summarized, the originals aren't deleted. The summary is an overlay, not a replacement. This means you can always audit the full conversation history if needed.
  • The standalone node exists alongside the built-in toggle. We could have only offered the built-in AI Prompt integration. But workflows that need to store messages at specific points (e.g., after a human review step, or before branching logic) need explicit control. The standalone Conversation Memory node provides that without complicating the common case.
  • No TTL by default. Conversations persist until explicitly cleared. This is intentional — for support bots and ongoing relationships, you want history to survive indefinitely. If you need auto-expiry, a scheduled workflow can clear conversations older than a threshold using the “clear” action.
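The auto-expiry pattern from the last bullet can be sketched as the selection logic such a scheduled workflow would need: find conversations whose newest message is older than the cutoff, then call “clear” on each. The helper and its inputs are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def conversations_to_clear(conversations, max_age_days=30, now=None):
    """Pick conversation IDs whose latest message predates the age threshold.

    `conversations` maps conversation ID -> timestamp of its newest message.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [cid for cid, last_at in conversations.items() if last_at < cutoff]
```

A scheduled workflow would run this selection daily and invoke the Conversation Memory node's “clear” action for each returned ID.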

What's Next

Conversation Memory is the foundation for richer AI agent capabilities in Falcon Builder. We're exploring retrieval-augmented generation (RAG) integration where memory works alongside document context, multi-agent conversations where different AI Prompt nodes in the same workflow share a conversation thread, and analytics dashboards that surface conversation patterns and common user intents across your workflows.

The goal is to make every AI workflow in Falcon Builder feel like it's talking to a person who remembers — not a stateless function that starts over every time.

Try Conversation Memory

Conversation Memory is available on every Falcon Builder plan, including Free. Enable it in any AI Prompt node with a single toggle, or drop in the standalone Conversation Memory node for full control.