Agents
Agents are the core building block of Callem Studio. An agent is an AI-powered voice assistant configured with a system prompt, a voice, a language model, and optional integrations. The agent configuration is split across multiple tabs, each controlling a different aspect of its behavior.
Creating an Agent
- Navigate to Build > Agents
- Click New Agent (for prompt-based) or New Workflow Agent (for flow-based)
- Name your agent and start configuring
Organizing with Folders
As your agent library grows, you can group agents into folders for easier navigation.
- Create a folder: click the + button in the agent sidebar and select Folder
- Move an agent: drag and drop an agent onto a folder, or use the folder icon on hover to pick a destination
- Rename / delete: right-click (or hover) a folder header to rename or delete it
Agent Tabs
Each agent has the following configuration tabs:
| Tab | Purpose |
|---|---|
| Model | System prompt, LLM provider/model, temperature, knowledge bases, tools |
| Voice | TTS voice selection, STT configuration, VAD, interruptions, background sound |
| Call Analysis | Post-call analysis fields and prompt |
| Advanced | Safety features, privacy, voicemail detection, observability |
Model Tab
The Model tab controls the agent’s “brain” — what it says, how it reasons, and what data it has access to.
Welcome Message
The first thing the agent says when a call begins. Leave empty for the agent to wait for the caller to speak first.
System Prompt
The most critical configuration. It defines the agent’s identity, instructions, tone, and boundaries.
Example System Prompt
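As a sketch of the shape a production system prompt often takes; the business name, hours, and policies below are placeholder assumptions, not part of the product:

```text
You are Mia, a friendly receptionist for Harbor Dental (a fictional clinic).

Your job:
- Greet the caller and ask how you can help.
- Answer questions about opening hours (Mon-Fri, 9:00-17:00) using the linked knowledge base.
- Book, reschedule, or cancel appointments using the booking tool.

Rules:
- Keep answers short: one or two sentences, suitable for speech.
- Never invent prices or give medical advice; if unsure, offer to transfer to a human.
- Confirm names, dates, and phone numbers back to the caller before acting.
```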
LLM Configuration
| Parameter | Description | Default |
|---|---|---|
| Provider | The LLM provider (e.g. OpenAI, Anthropic, Azure) | — |
| Model | Specific model version (e.g. gpt-4o, gpt-4o-mini) | — |
| Temperature | Controls response randomness. 0 = deterministic, 1 = creative. Lower values are recommended for consistent voice agents. | 0.7 |
| Max Tokens | Maximum number of tokens the LLM can generate per response. Higher values allow longer answers but increase latency and cost. | 1024 |
For voice agents, a temperature between 0.3 and 0.5 usually gives the best balance between natural-sounding and reliable responses. Higher values can cause the agent to go off-script.
Knowledge Base
Link one or more knowledge bases to give the agent access to domain-specific content. When a caller asks a question, the agent retrieves the most relevant content from the linked knowledge bases and uses it to answer.
Retrieval Settings (available when at least one knowledge base is linked):
| Setting | Description | Default |
|---|---|---|
| Max documents | Maximum number of text chunks to retrieve per query (1-10). Higher = more context but more tokens. | 6 |
| Relevance threshold | Minimum similarity score (0-1) for a chunk to be included. Higher = fewer but more relevant matches. | 0.6 |
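The two settings combine as a filter-then-truncate step: drop chunks below the threshold, then keep at most the configured number of the best matches. A minimal sketch of that selection logic (the function name and chunk format are illustrative assumptions, not the product’s internals):

```python
def select_chunks(scored_chunks, max_documents=6, relevance_threshold=0.6):
    """Keep chunks scoring at or above the threshold, then return at most
    max_documents of them, highest-scoring first. Defaults mirror the table."""
    relevant = [c for c in scored_chunks if c["score"] >= relevance_threshold]
    relevant.sort(key=lambda c: c["score"], reverse=True)
    return relevant[:max_documents]

chunks = [
    {"text": "Pricing starts at $20/mo", "score": 0.91},
    {"text": "Founded in 2019",          "score": 0.42},
    {"text": "Refunds within 30 days",   "score": 0.77},
]
# With the defaults, only the 0.91 and 0.77 chunks survive the 0.6 cutoff.
print(select_chunks(chunks))
```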
Tools
Link tools (custom functions) that the agent can invoke during a call. When the LLM determines that a tool should be called based on the conversation context, it passes the extracted parameters to the tool’s endpoint and uses the result to continue the conversation.
Voice Tab
The Voice tab controls how the agent sounds and how it processes caller speech.
Text-to-Speech (TTS)
Choose the voice the agent uses to speak. The available voices are loaded dynamically from configured providers. You can filter voices by language, gender, and provider. Some voices offer additional settings (like model selection) that appear automatically when relevant.
Speech-to-Text (STT)
Configure how caller speech is transcribed:
| Setting | Description |
|---|---|
| Provider | The STT engine used for transcription |
| Language | Primary language for transcription accuracy |
Voice Activity Detection (Semantic VAD)
Semantic VAD uses the LLM to determine when the caller has finished speaking, rather than relying solely on silence detection. This produces more natural conversations, especially when callers pause mid-sentence.
| Setting | Description | Default |
|---|---|---|
| Enabled | Toggle semantic VAD on/off | Off |
| Config | Additional VAD configuration parameters | — |
Silence Confirmation Delay (End of Speech Threshold)
The silence duration (in seconds) before the system considers the caller has finished speaking. Lower values make the agent more responsive but may cut the caller off. Higher values are more patient but introduce latency.
| Setting | Value |
|---|---|
| Minimum | 0.5s |
| Maximum | 5s |
| Default | 1.5s |

| Range | Behavior |
|---|---|
| 0.5–1.0s | Very responsive, risks interrupting pauses |
| 1.0–2.0s | Good balance for most use cases |
| 2.0–5.0s | Very patient, best for elderly callers or complex topics |
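The threshold behaves as a simple silence timer: the agent treats the turn as finished only once the gap since the caller’s last speech reaches the configured value. A tiny sketch of that check (the function name is an assumption for illustration):

```python
def end_of_speech(last_voice_ts, now, threshold=1.5):
    """The turn ends once silence has lasted at least `threshold` seconds
    (1.5 s is the default from the table above)."""
    return (now - last_voice_ts) >= threshold

# Caller stopped talking at t=10.0; with the 1.5 s default the agent
# may start replying at t=11.5, not before.
print(end_of_speech(10.0, 11.0))  # False: still inside the pause window
print(end_of_speech(10.0, 11.5))  # True
```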
Interruption Config
Controls whether and how the caller can interrupt the agent while it’s speaking.
| Setting | Description | Min | Max | Default |
|---|---|---|---|---|
| Enabled | Allow caller to interrupt the agent | — | — | Off |
| Minimum words | Minimum words the caller must say before the interruption is registered | 1 | 10 | 3 |
| Minimum duration | Minimum duration (in seconds) of caller speech before interruption triggers | 1 | 60 | 15 |
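Assuming both thresholds must be met before an interruption registers (the table lists them separately and does not state how they combine, so the AND is an assumption), the gating logic can be sketched as:

```python
def should_interrupt(words_spoken, speech_seconds, min_words=3, min_duration=15):
    """Illustrative sketch: register an interruption only when the caller has
    said at least min_words AND spoken for at least min_duration seconds
    (defaults taken from the table above)."""
    return words_spoken >= min_words and speech_seconds >= min_duration

print(should_interrupt(words_spoken=5, speech_seconds=20))  # True
print(should_interrupt(words_spoken=2, speech_seconds=20))  # False: too few words
```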
Inactivity Timeout (Reminder Message)
Time in seconds before the agent sends a reminder message if the caller remains silent.
| Setting | Value |
|---|---|
| Minimum | 6s |
| Maximum | 30s |
| Default | 15s |
Voice Speed
Adjust the speaking speed of the agent’s voice. A value of 1.0 is normal speed. This setting is available when the TTS provider supports speed adjustment (displayed dynamically based on provider settings).
Background Sound
Add ambient background audio to make calls feel more natural (e.g. office ambiance, call center sounds).
| Setting | Description | Default |
|---|---|---|
| Enabled | Toggle background sound on/off | Off |
| Sound | Select from a library of pre-configured ambient sounds | — |
| Volume | Adjust the volume level (0–100%) | 50% |
Call Analysis Tab
Call Analysis is a post-conversation processing feature that uses an LLM to extract structured data from conversation transcripts after the conversation ends. It runs for both voice calls and chat sessions. This is useful for automatically categorizing conversations, extracting key information, and feeding data into your analytics dashboards.
How It Works
- After a call ends, the full transcript is sent to an LLM along with your analysis prompt
- The LLM extracts the requested fields based on the prompt and field definitions
- The extracted data is stored as `callAnalysis` on the call record
- This data can be used in analytics charts, alert rules, and API responses
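Using hypothetical field names (`summary`, `sentiment`, `issueResolved`, and `npsScore` are illustrative examples matching the field types below, not built-in fields), the stored data might look like:

```json
{
  "callAnalysis": {
    "summary": "Caller asked about pricing and booked a demo for Friday.",
    "sentiment": "positive",
    "issueResolved": true,
    "npsScore": 9
  }
}
```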
Analysis Fields
Define the fields you want to extract from each call. Each field has a type:
| Type | Description | Example |
|---|---|---|
| Text | Free-form text extraction | Summary, key takeaways, caller concerns |
| Selector | One value from a predefined list | Sentiment (positive, neutral, negative), Lead quality (hot, warm, cold) |
| Number | Numeric value extraction | NPS score, satisfaction rating (1-5) |
| Boolean | True/false determination | Appointment booked, issue resolved, callback requested |
Analysis Prompt
A custom prompt sent to the LLM along with the call transcript. Use it to give the LLM context about your business and instructions on how to analyze each field.
Minimum Duration
Calls shorter than this duration (in seconds) will not be analyzed. This avoids wasting LLM calls on hang-ups, wrong numbers, or very short interactions.
Advanced Tab
LLM Safety Features
| Feature | Description |
|---|---|
| Guardrails | Enables safety filters on LLM responses to prevent harmful, inappropriate, or off-topic content |
Privacy
| Feature | Description |
|---|---|
| Anonymization | Automatically anonymizes personal data (names, phone numbers, addresses) in stored conversations |
Voice Mail Detection
Automatically detects when a call reaches a voicemail or IVR system and hangs up to avoid wasting resources.
| Setting | Description | Default |
|---|---|---|
| Enabled | Toggle voicemail detection on/off | Off |
| Initial detection delay | Seconds to wait before first detection attempt | 5 |
| Detection retry interval | Seconds between each detection retry | 5 |
| Max detection retries | Maximum number of detection attempts before giving up | 60 |
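With the defaults above, a back-of-the-envelope worst case: the first attempt fires after the initial delay, and each retry follows at the retry interval. Whether the first attempt counts toward the retry limit is an assumption here; exact scheduling may differ:

```python
initial_delay = 5    # seconds before the first detection attempt
retry_interval = 5   # seconds between attempts
max_retries = 60     # attempts before giving up

# First attempt at t=5s, then one attempt every 5s; assuming the first
# attempt counts, the last fires about 5 + 59 * 5 = 300 seconds in.
last_attempt = initial_delay + (max_retries - 1) * retry_interval
print(last_attempt)  # 300
```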
Observability
Connect external observability platforms to trace and debug your agent’s LLM interactions:
| Provider | Fields |
|---|---|
| Langfuse | Public Key, Secret Key, Base URL |
| Langsmith | API URL, API Key |
Best Practices
Write specific, actionable prompts
Vague prompts like “be helpful” lead to inconsistent behavior. Describe exact scenarios: “When a caller asks about pricing, quote from the attached knowledge base. Never invent prices.”
Lower the temperature for production agents
A temperature of 0.3-0.5 gives reliable, consistent responses. Only use higher values for creative or exploratory use cases.
Use call analysis to measure quality
Define analysis fields like `sentiment`, `issueResolved`, and `callbackRequested` to automatically track call quality at scale.
Test with real conversations
After configuring your agent, call the assigned phone number yourself. Listen for unnatural pauses, incorrect responses, or missed instructions. Use the built-in chat or webcall features for faster iteration.
Start simple, iterate
Begin with a minimal prompt and a single knowledge base. Add tools, analysis fields, and advanced settings once the basic flow works well.