Agents

Agents are the core building block of Callem Studio. An agent is an AI-powered voice assistant configured with a system prompt, a voice, a language model, and optional integrations. The agent configuration is split across multiple tabs, each controlling a different aspect of its behavior.

Creating an Agent

  1. Navigate to Build > Agents
  2. Click New Agent (for prompt-based) or New Workflow Agent (for flow-based)
  3. Name your agent and start configuring

Organizing with Folders

As your agent library grows, you can group agents into folders for easier navigation.
  • Create a folder: click the + button in the agent sidebar and select Folder
  • Move an agent: drag and drop an agent onto a folder, or use the folder icon on hover to pick a destination
  • Rename / delete: right-click (or hover) a folder header to rename or delete it
Folders are flat (one level deep). Deleting a folder moves its agents back to the root. The collapsed/expanded state of each folder is remembered across sessions.

Agent Tabs

Each agent has the following configuration tabs:
Tab | Purpose
Model | System prompt, LLM provider/model, temperature, knowledge bases, tools
Voice | TTS voice selection, STT configuration, VAD, interruptions, background sound
Call Analysis | Post-call analysis fields and prompt
Advanced | Safety features, privacy, voicemail detection, observability

Model Tab

The Model tab controls the agent’s “brain” — what it says, how it reasons, and what data it has access to.

Welcome Message

The first thing the agent says when a call begins. Leave empty for the agent to wait for the caller to speak first.

System Prompt

The most critical configuration. It defines the agent’s identity, instructions, tone, and boundaries.
Write your system prompt as if you were briefing a real employee. Be specific about scenarios, responses, and what the agent should avoid. The more precise you are, the more reliable the agent’s behavior will be.
Example System Prompt
You are Sophie, a friendly receptionist at Acme Dental Clinic.
Your role is to answer incoming calls, schedule appointments, 
and answer questions about our services.

Rules:
- Always greet the caller warmly
- Ask for their name and reason for calling
- If they want an appointment, use the book_appointment tool
- Never provide medical advice
- If unsure, offer to transfer to the office manager

LLM Configuration

Parameter | Description | Default
Provider | The LLM provider (e.g. OpenAI, Anthropic, Azure) | -
Model | Specific model version (e.g. gpt-4o, gpt-4o-mini) | -
Temperature | Controls response randomness: 0 = deterministic, 1 = creative. Lower values are recommended for consistent voice agents. | 0.7
Max Tokens | Maximum number of tokens the LLM can generate per response. Higher values allow longer answers but increase latency and cost. | 1024

For voice agents, a temperature between 0.3 and 0.5 usually gives the best balance between natural-sounding and reliable responses. Higher values can cause the agent to go off-script.
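The parameter defaults and ranges above can be sketched as a small validation helper. This is an illustrative sketch, not an official Callem Studio API: the function name and dict schema (`provider`, `model`, `temperature`, `max_tokens`) are assumptions that simply mirror the table.

```python
# Hypothetical sketch of an agent's LLM configuration. Field names
# mirror the table above but are NOT an official Callem Studio schema.

def validate_llm_config(config: dict) -> dict:
    """Fill in the documented defaults and range-check tunable parameters."""
    cfg = {"temperature": 0.7, "max_tokens": 1024, **config}
    if not 0.0 <= cfg["temperature"] <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    if cfg["max_tokens"] < 1:
        raise ValueError("max_tokens must be positive")
    return cfg

agent_llm = validate_llm_config({
    "provider": "OpenAI",
    "model": "gpt-4o-mini",
    "temperature": 0.4,  # 0.3-0.5 recommended for voice agents
})
```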

Knowledge Base

Link one or more knowledge bases to give the agent access to domain-specific content. When a caller asks a question, the agent retrieves the most relevant content from the linked knowledge bases and uses it to answer.

Retrieval Settings (available when at least one knowledge base is linked):

Setting | Description | Default
Max documents | Maximum number of text chunks to retrieve per query (1-10). Higher = more context but more tokens. | 6
Relevance threshold | Minimum similarity score (0-1) for a chunk to be included. Higher = fewer but more relevant matches. | 0.6
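To see how the two retrieval settings interact, here is a minimal sketch. The chunk format and scoring are assumptions for illustration, not the actual Callem Studio retrieval pipeline: the threshold filters first, then the document cap limits how many survivors are passed to the LLM.

```python
# Illustrative sketch of knowledge-base retrieval filtering.
# Chunk shape ({"text", "score"}) is a hypothetical assumption.

def select_chunks(scored_chunks, max_documents=6, relevance_threshold=0.6):
    """Keep chunks scoring at or above the threshold, best first,
    capped at max_documents."""
    relevant = [c for c in scored_chunks if c["score"] >= relevance_threshold]
    relevant.sort(key=lambda c: c["score"], reverse=True)
    return relevant[:max_documents]

chunks = [
    {"text": "Opening hours ...", "score": 0.82},
    {"text": "Parking info ...",  "score": 0.55},  # below threshold, dropped
    {"text": "Pricing ...",       "score": 0.71},
]
context = select_chunks(chunks)  # two chunks survive the 0.6 threshold
```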

Tools

Link tools (custom functions) that the agent can invoke during a call. When the LLM determines that a tool should be called based on the conversation context, it passes the extracted parameters to the tool’s endpoint and uses the result to continue the conversation.
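The dispatch step described above can be sketched as follows. The `book_appointment` name comes from the example system prompt; the JSON call shape and routing code are hypothetical assumptions, since the actual tool protocol is defined by your tool's endpoint configuration.

```python
# Hypothetical sketch of the tool-calling loop: the LLM emits a tool
# name plus extracted parameters, and the runtime routes the call.
import json

def book_appointment(name: str, date: str) -> dict:
    # In production this would call your booking endpoint.
    return {"status": "confirmed", "name": name, "date": date}

TOOLS = {"book_appointment": book_appointment}

def handle_tool_call(tool_call_json: str) -> dict:
    """Route an LLM tool call (name + arguments) to its handler."""
    call = json.loads(tool_call_json)
    handler = TOOLS[call["name"]]
    return handler(**call["arguments"])

result = handle_tool_call(
    '{"name": "book_appointment",'
    ' "arguments": {"name": "Ada", "date": "2025-03-01"}}'
)
```

The tool's result is then fed back into the conversation so the agent can confirm the booking to the caller.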

Voice Tab

The Voice tab controls how the agent sounds and how it processes caller speech.

Text-to-Speech (TTS)

Choose the voice the agent uses to speak. The available voices are loaded dynamically from configured providers. You can filter voices by language, gender, and provider. Some voices offer additional settings (like model selection) that appear automatically when relevant.

Speech-to-Text (STT)

Configure how caller speech is transcribed:
Setting | Description
Provider | The STT engine used for transcription
Language | Primary language for transcription accuracy

Voice Activity Detection (Semantic VAD)

Semantic VAD uses the LLM to determine when the caller has finished speaking, rather than relying solely on silence detection. This produces more natural conversations, especially when callers pause mid-sentence.
Setting | Description | Default
Enabled | Toggle semantic VAD on/off | Off
Config | Additional VAD configuration parameters | -

Enable Semantic VAD for complex conversations where callers may pause to think. Disable it for simple, fast-paced interactions (e.g. appointment confirmations) where silence-based detection is sufficient.

Silence Confirmation Delay (End of Speech Threshold)

The silence duration (in seconds) before the system considers the caller has finished speaking. Lower values make the agent more responsive but may cut the caller off. Higher values are more patient but introduce latency.
Setting | Value
Minimum | 0.5s
Maximum | 5s
Default | 1.5s

Range | Behavior
0.5–1.0s | Very responsive, risks interrupting pauses
1.0–2.0s | Good balance for most use cases
2.0–5.0s | Very patient, best for elderly callers or complex topics
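The threshold behaves like a simple accumulator over silent audio frames. This is a minimal sketch under assumed 20 ms frames; the real frame size and detection internals are not documented here.

```python
# Minimal sketch of silence-based end-of-speech detection using the
# threshold above. Frame handling is simplified and hypothetical.

def speech_ended(silence_frames: int, frame_ms: int = 20,
                 threshold_s: float = 1.5) -> bool:
    """True once accumulated trailing silence reaches the threshold."""
    return silence_frames * frame_ms / 1000.0 >= threshold_s

# With 20 ms frames, 75 consecutive silent frames = 1.5 s of silence.
assert not speech_ended(74)
assert speech_ended(75)
```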

Interruption Config

Controls whether and how the caller can interrupt the agent while it’s speaking.
Setting | Description | Min | Max | Default
Enabled | Allow caller to interrupt the agent | - | - | Off
Minimum words | Minimum words the caller must say before the interruption is registered | 1 | 10 | 3
Minimum duration | Minimum duration (in seconds) of caller speech before interruption triggers | 1 | 60 | 15
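The two thresholds act as combined gates: both must pass before the agent stops speaking, which filters out short backchannels like "uh-huh". The function below is an illustrative assumption, not Callem Studio internals.

```python
# Illustrative check combining the two interruption gates.
# Defaults mirror the table above; the function itself is hypothetical.

def should_interrupt(words_spoken: int, speech_duration_s: float,
                     min_words: int = 3, min_duration_s: float = 15) -> bool:
    """Register an interruption only when the caller has said enough
    words for long enough."""
    return words_spoken >= min_words and speech_duration_s >= min_duration_s

should_interrupt(words_spoken=1, speech_duration_s=20)  # False: too few words
should_interrupt(words_spoken=5, speech_duration_s=20)  # True: both gates pass
```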

Inactivity Timeout (Reminder Message)

Time in seconds before the agent sends a reminder message if the caller remains silent.
Setting | Value
Minimum | 6s
Maximum | 30s
Default | 15s

Voice Speed

Adjust the speaking speed of the agent’s voice. 1.0 is normal speed. Available when the TTS provider supports speed adjustment (displayed dynamically based on provider settings).

Background Sound

Add ambient background audio to make calls feel more natural (e.g. office ambiance, call center sounds).
Setting | Description | Default
Enabled | Toggle background sound on/off | Off
Sound | Select from a library of pre-configured ambient sounds | -
Volume | Adjust the volume level (0–100%) | 50%

Call Analysis Tab

Call Analysis is a post-conversation processing feature that uses an LLM to extract structured data from conversation transcripts after the conversation ends. It runs for both voice calls and chat sessions. This is useful for automatically categorizing conversations, extracting key information, and feeding data into your analytics dashboards.

How It Works

  1. After a call ends, the full transcript is sent to an LLM along with your analysis prompt
  2. The LLM extracts the requested fields based on the prompt and field definitions
  3. The extracted data is stored as callAnalysis on the call record
  4. This data can be used in analytics charts, alert rules, and API responses
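Steps 1–2 above can be sketched as a function that packages the transcript, analysis prompt, and field definitions into a single LLM request. The payload shape is an illustrative assumption; the actual request format used by Callem Studio is not documented here.

```python
# Hypothetical sketch of assembling the post-call analysis request.
# The {"system", "user"} payload shape is an assumption.

def build_analysis_request(transcript: str, analysis_prompt: str,
                           fields: list[dict]) -> dict:
    """Combine the analysis prompt, field definitions, and transcript."""
    field_spec = "\n".join(
        f"- {f['name']} ({f['type']}): {f['description']}" for f in fields
    )
    return {
        "system": analysis_prompt,
        "user": f"Fields to extract:\n{field_spec}\n\nTranscript:\n{transcript}",
    }

request = build_analysis_request(
    transcript="Caller: Hi, I'd like to book a cleaning...",
    analysis_prompt="You analyze calls for Acme Dental Clinic.",
    fields=[
        {"name": "appointmentBooked", "type": "Boolean",
         "description": "Did the caller book an appointment?"},
        {"name": "sentiment", "type": "Selector",
         "description": "positive, neutral, or negative"},
    ],
)
```

The LLM's structured reply is then stored as `callAnalysis` on the call record (steps 3–4).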

Analysis Fields

Define the fields you want to extract from each call. Each field has a type:
Type | Description | Example
Text | Free-form text extraction | Summary, key takeaways, caller concerns
Selector | One value from a predefined list | Sentiment (positive, neutral, negative), Lead quality (hot, warm, cold)
Number | Numeric value extraction | NPS score, satisfaction rating (1-5)
Boolean | True/false determination | Appointment booked, issue resolved, callback requested

Field names cannot be changed after creation because they serve as identifiers in dashboards, filters, and alert rules. Choose descriptive, stable names.

Analysis Prompt

A custom prompt sent to the LLM along with the call transcript. Use it to give the LLM context about your business and instructions on how to analyze each field.

Minimum Duration

Calls shorter than this duration (in seconds) will not be analyzed. This avoids wasting LLM calls on hang-ups, wrong numbers, or very short interactions.
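The gate is straightforward; a tiny sketch, where the 30-second cutoff is just an example value, not a documented default:

```python
# Tiny sketch of the minimum-duration gate: calls below the cutoff
# skip analysis entirely. The 30 s cutoff is an example, not a default.

def should_analyze(call_duration_s: float, minimum_duration_s: float = 30) -> bool:
    return call_duration_s >= minimum_duration_s

should_analyze(8)   # hang-up or wrong number: skipped
should_analyze(95)  # full conversation: analyzed
```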

Advanced Tab

LLM Safety Features

Feature | Description
Guardrails | Enables safety filters on LLM responses to prevent harmful, inappropriate, or off-topic content

Privacy

Feature | Description
Anonymization | Automatically anonymizes personal data (names, phone numbers, addresses) in stored conversations

Voicemail Detection

Automatically detects when a call reaches a voicemail or IVR system and hangs up to avoid wasting resources.
Setting | Description | Default
Enabled | Toggle voicemail detection on/off | Off
Initial detection delay | Seconds to wait before first detection attempt | 5
Detection retry interval | Seconds between each detection retry | 5
Max detection retries | Maximum number of detection attempts before giving up | 60
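The three timing settings imply a simple detection schedule: a first attempt after the initial delay, then one attempt per retry interval, up to the retry cap. The sketch below shows only this timing math; the detection mechanism itself is not part of it, and the function name is an assumption.

```python
# Sketch of the voicemail-detection schedule implied by the settings
# above (timing only; hypothetical helper, not Callem Studio internals).

def detection_times(initial_delay: int = 5, retry_interval: int = 5,
                    max_retries: int = 60) -> list[int]:
    """Seconds into the call at which each detection attempt runs."""
    return [initial_delay + i * retry_interval for i in range(max_retries)]

times = detection_times(max_retries=4)  # [5, 10, 15, 20]
```

With the defaults, the last attempt runs 300 seconds into the call, after which detection gives up.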

Observability

Connect external observability platforms to trace and debug your agent’s LLM interactions:
Provider | Fields
Langfuse | Public Key, Secret Key, Base URL
Langsmith | API URL, API Key
This sends all LLM calls (prompts, responses, tool calls) to your observability platform for detailed tracing and debugging.

Best Practices

  • Vague prompts like “be helpful” lead to inconsistent behavior. Describe exact scenarios: “When a caller asks about pricing, quote from the attached knowledge base. Never invent prices.”
  • A temperature of 0.3–0.5 gives reliable, consistent responses. Only use higher values for creative or exploratory use cases.
  • Define analysis fields like sentiment, issueResolved, and callbackRequested to automatically track call quality at scale.
  • After configuring your agent, call the assigned phone number yourself. Listen for unnatural pauses, incorrect responses, or missed instructions. Use the built-in chat or webcall features for faster iteration.
  • Begin with a minimal prompt and a single knowledge base. Add tools, analysis fields, and advanced settings once the basic flow works well.