AI agents powered by large language models (LLMs) can respond to customer queries in natural language, interpret context, and generate human-like responses. These agents can process and synthesize large volumes of information from sources such as knowledge bases.
We benchmarked four customer service AI agents: Tidio Lyro, Microsoft Azure AI Chatbot, IBM watsonx Assistant, and Intercom Fin. Below is what we found, along with a broader list of tools worth knowing about.
We compared these four agents by establishing a benchmark based on an imaginary company’s customer service agent. The details of the methodology are below.
Based on the key findings of our benchmark, we recommend the following:
Our top recommendations
If data security is a priority, go with Tidio. When asked for a specific customer’s refund without login context, Tidio directed the user to their account rather than reading personal details in chat. None of the other tools did this by default.
Azure works well for public-facing data. Out of the box, Azure answered questions accurately but returned customer-specific information to anyone who asked, with no authentication required. It can be locked down, but that requires meaningful developer work. If you’re building on top of non-sensitive content (public FAQs, product docs), it’s a solid base.
Leading examples of AI agents in customer service
Tidio Lyro
Rather than building a general-purpose chatbot, Tidio made deliberate tradeoffs: Lyro is purpose-built for e-commerce and SMB support, not enterprise infrastructure. It runs on Anthropic’s Claude alongside Tidio’s own models, and its responses are readable and contextually grounded rather than template-like.
Setup takes under five minutes for basic use cases. The analytics dashboard shows resolution rates, conversation volume, and handoff triggers, helping teams quickly identify gaps in their knowledge base. It also handles multilingual queries without requiring you to provide translated content.
Two limitations worth noting: the free tier covers only 50 conversations, and the platform hasn’t yet been tuned for medical or financial use cases, where compliance requirements are stricter.
The Lyro AI Agent plan now starts at $39/month for 100 conversations, with pricing scaling with volume. The $0.50 per-conversation rate is still listed on the per-conversation page, but the plan structure has changed significantly: Lyro is billed separately from base Tidio plans and often doubles total costs.1
Microsoft Azure AI Chatbot
Azure’s chatbot offering is less a finished product and more a construction kit. You can build anything from a basic FAQ responder to a multi-modal assistant with voice recognition, image processing, and retrieval-augmented generation, but you’re doing most of that building yourself. Teams without developers who know the Bot Framework SDK will hit a wall early.
The pricing model reflects this: no per-user license, just consumption costs across Bot Service traffic, OpenAI tokens, and Cognitive Search queries. That can work out cheaper at scale, but it also means costs can spike fast if token usage suddenly increases and you haven’t set up budget alerts.
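To make the spike risk concrete, here is a minimal sketch of the kind of check a budget alert performs. It is a generic illustration, not Azure’s budgeting feature: the function name `is_spend_spike`, the 2x threshold, and the daily spend figures are all hypothetical.

```python
def is_spend_spike(daily_costs: list[float], today: float, factor: float = 2.0) -> bool:
    """Flag today's spend when it exceeds `factor` times the trailing daily average."""
    if not daily_costs:
        return False  # no history yet, nothing to compare against
    baseline = sum(daily_costs) / len(daily_costs)
    return today > factor * baseline


# Hypothetical daily spend in dollars for the previous week (average: 12.0).
last_week = [12.0, 11.5, 13.0, 12.5, 12.0, 11.0, 12.0]
print(is_spend_spike(last_week, today=13.0))  # False
print(is_spend_spike(last_week, today=40.0))  # True
```

In practice, a managed budget alert from your cloud provider is likely the better choice. The sketch only shows why a baseline matters when costs are consumption-based.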
Where Azure genuinely stands out is channel coverage. Deploy once, and your bot is available across Teams, Slack, web, mobile, and Facebook Messenger. SharePoint integration also enables the bot to answer questions based on internal documents, similar to how Microsoft Copilot works.
The data security gap is worth noting: the baseline version of Azure does not restrict customer data from appearing in chat responses. In the benchmark example below, Azure returned refund details and order information to a user who hadn’t logged in. If you’re deploying on sensitive data, plan for meaningful access-control work before launch.
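The kind of guard that closes this gap can be sketched in a few lines. This is an illustration under our own assumptions, not Azure’s API or any vendor’s implementation: `Session`, `REFUNDS`, and `answer_refund_query` are hypothetical names, and the in-memory dictionary stands in for a real customer database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Hypothetical chat session; customer_id stays None until the user logs in."""
    customer_id: Optional[str] = None


# Hypothetical in-memory store standing in for a real customer database.
REFUNDS = {
    "cust-001": {"item": "Portable SSD Drive", "amount": 129.99, "status": "Refund Issued"},
}


def answer_refund_query(session: Session) -> str:
    """Return refund details only when the session is authenticated."""
    if session.customer_id is None:
        # No login context: point to the account area instead of reading data in chat.
        return 'Please log in and check the "My Returns" section of your account.'
    refund = REFUNDS.get(session.customer_id)
    if refund is None:
        return "We could not find a refund on your account."
    return (
        f"Your refund of ${refund['amount']:.2f} for the {refund['item']} "
        f"is marked: {refund['status']}."
    )


print(answer_refund_query(Session()))                        # redirect, no personal data
print(answer_refund_query(Session(customer_id="cust-001")))  # refund details
```

The design choice is to check identity before the lookup, so an unauthenticated request never touches customer records at all.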
IBM watsonx Assistant
Watsonx Assistant is built for large organizations with existing contact center infrastructure that need an AI layer that integrates with those systems rather than replacing them.
The human handoff logic is more mature than most competitors’: when the bot can’t resolve an issue, it hands off to a live agent without requiring the customer to repeat themselves.
Two known limitations from user reports: response times of 15–20 seconds with no real-time streaming, and a tendency to repeat phrases across multi-turn conversations. Neither is a dealbreaker for internal or lower-volume deployments, but both matter in high-traffic consumer contexts.
Intercom’s Fin
Fin handles the long tail of support tickets well: the kind of repetitive, policy-based questions that drain a support team’s time. It pulls answers from multiple sources simultaneously and adjusts its tone to match your team’s voice rather than defaulting to a generic register.
The setup is genuinely simple, with no technical skills required for standard deployments. Custom actions (connecting to external systems) are optional add-ons.
The pricing is the main friction point. At $0.99 per resolved conversation, costs scale quickly as the AI handles more volume, which is the opposite of the cost curve you’d want. Third-party integrations like the Intercom AI Agent app offer similar functionality at $0.10 per conversation, which is worth evaluating if budget is a concern.
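To see how usage-based pricing scales, here is a minimal sketch that applies the two rates quoted above to a few monthly volumes. The volumes are hypothetical and chosen only for illustration. Note also that one rate is per resolved conversation and the other per conversation, so the comparison is approximate.

```python
def monthly_cost(conversations: int, price_per_conversation: float) -> float:
    """Linear usage-based cost: volume times unit price, rounded to cents."""
    return round(conversations * price_per_conversation, 2)


FIN_RATE = 0.99          # dollars per resolved conversation, as quoted above
THIRD_PARTY_RATE = 0.10  # dollars per conversation, as quoted above

# Hypothetical monthly volumes, for illustration only.
for volume in (500, 2_000, 10_000):
    fin = monthly_cost(volume, FIN_RATE)
    alt = monthly_cost(volume, THIRD_PARTY_RATE)
    print(f"{volume:>6} conversations: Fin ${fin:,.2f} vs third-party ${alt:,.2f}")
```

Because the cost is linear in volume, the gap between the two rates widens in absolute terms as the AI handles more conversations.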
Other examples of AI agents in customer service
Kore.ai Agent
Kore.ai’s Agent enhances agent efficiency with generative AI by automating workflows and offering real-time guidance:
- Next-best action suggestions to improve interactions and outcomes.
- Real-time adaptive coaching to enhance support representatives’ performance.
- Guided playbooks that help reps follow best practices for compliant service.
Pros:
- Configuring bots requires minimal knowledge of NLP and LLMs.
- Kore.ai provides extensive customization options through its SDK.
- Kore.ai is well-suited for enterprises, with out-of-the-box solutions for IT tasks (like ServiceNow integration).
Cons:
- The platform’s NLU may struggle with handling highly variable user inputs. A zero-shot learning approach is recommended to improve its ability to process unknown inputs more flexibly.
- While the platform offers customization through its SDK, it is difficult to create custom solutions.
Genesys Agent Copilot
Genesys Agent Copilot supports contact center reps by providing AI-powered guidance during and after customer interactions. It identifies customer intent, automatically retrieves relevant knowledge, and directs agents on the most appropriate next steps.
Key features:
- Capturing agent suggestions on knowledge improvements
- Transcribing conversations
- Providing custom scripting
- Presenting workflow process documents
- Suggesting wrap-up codes
- Writing a summary of the interaction
Pros:
- After an interaction, the generated summary can be reviewed, edited, and incorporated into the interaction notes.
- By automating parts of the process, such as knowledge lookup, script generation, and wrap-up code prediction, the platform significantly reduces average handle time (AHT).
Cons:
- It is difficult to integrate Genesys Cloud Agent Copilot with non-Genesys CRMs or contact center systems.
Ema’s Customer Support Agent
Ema’s agent supports enterprise-wide actions with 100+ LLMs, including GPT-4o, Gemini 1.5, Mistral, and Llama 3. Users can also bring their own model to the platform.
- With Ema, customers can deploy other pre-built AI agents to cover topics such as sales and marketing, legal and compliance, employee experience, and customer service.
- Common use cases include approving medical procedures, adjusting insurance claims, and drafting business proposals.
- The platform offers SOC 2, HIPAA, GDPR, and ISO 27001 certifications.
Salesforce Agentforce
Salesforce officially retired the Einstein Copilot brand and rebranded it as Agentforce (or “Agentforce Assistant”). The product is now part of the broader Agentforce platform, with updated UI, permissions, and documentation. Functionality is the same, but the branding is fully changed.3
Bland.ai
Bland.ai is an enterprise customer service platform for AI phone calls. The company offers a multi-prompt voice agent for phone call automation across various domains, including customer service and sales.
Users can also fine-tune a custom language model for their enterprise using prior conversation data.
It can be used in various sales operations procedures for handling:
- Standard order processing
- Inventory inquiries
- Billing inquiries
- Basic returns and exchanges
Ada AI Agent
Ada is an enterprise-wide AI-powered customer service agent that enables businesses to automatically resolve service issues across channels and languages. Ada can be expensive ($1-$3.50/ticket resolution).
Ada AI Agent:
- Performs actions in thousands of apps and databases.
- Ensures each answer is grounded in your knowledge base.
- Integrates past customer data with information sources to customize responses.
My AskAI
My AskAI is an AI assistant for support teams and a cost-effective option.
My AskAI integrates with Zendesk, offering similar functionality (and even more in some areas, such as enhanced knowledge integrations, better insights, and knowledge improvement features), while being 2-10x more affordable than solutions like Ada AI agents or Zendesk AI agents.
Customer service AI agent benchmark methodology
Measurement
We evaluated four industry leaders through their APIs or playgrounds, using a hold-out dataset of 100 questions randomly selected from the Bitext Gen AI Chatbot Customer Support Dataset.4
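A random draw like this can be made repeatable with a fixed seed. The sketch below is an illustration, not our exact procedure: the seed value, the function name `sample_holdout`, and the placeholder question list are assumptions.

```python
import random


def sample_holdout(questions: list[str], k: int = 100, seed: int = 42) -> list[str]:
    """Draw k distinct questions at random; a fixed seed makes the draw repeatable."""
    if k > len(questions):
        raise ValueError("dataset has fewer questions than the requested sample size")
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return rng.sample(questions, k)


# Placeholder data standing in for the real dataset's questions.
all_questions = [f"question {i}" for i in range(1_000)]
holdout = sample_holdout(all_questions)
print(len(holdout))  # 100
```

Sampling without replacement ensures that no question appears twice in the hold-out set.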
Dataset
We have created an imaginary company, TechStyle, with an e-commerce website and all its basic policies in place. We also established a small customer database. This information was provided to each AI agent vendor, after which we posed our questions.
Evaluation criteria
The overall score is the average of these three metrics:
- Accuracy: Does the response match TechStyle’s policies and customer data?
- Completeness: Does it fully address what the customer asked?
- Helpfulness: Is it professional, empathetic, and actionable?
In accuracy and completeness, we also rewarded vendors for safeguarding customer data instead of disclosing it in chat responses. The evaluation of the responses was conducted by large language models assessing their similarity to the accurate answers provided by the Bitext dataset and our customizations for TechStyle’s data.
Helpfulness assesses how well the AI creates a positive client experience by using a professional tone with polite, respectful language; demonstrating empathy by acknowledging frustration; providing clear, concise instructions; offering active support by anticipating needs; and ensuring clients feel heard by addressing their specific issues instead of giving generic responses.
Example evaluation
We posed the question “I expect a refund of $129.99 dollars, has it been processed?” to the chatbots without any prior information about our identity (no customer login was made). According to our ground truth, which we created with the help of LLMs, a 100/100 response would include all the relevant information, for example:
“Hello! We apologize for the delay in processing your refund. Your refund of $129.99 for the Portable SSD Drive has been processed and typically takes 5-10 business days to appear in your account. You can track the status in the ‘My Returns’ section, which will show: Return Initiated, Return in Transit, Return Received, Return Processed, or Refund Issued. You’ll also receive email updates at each stage. Is there anything else I can help you with?”
Meanwhile, the responses provided by the top 2 vendors are:
Azure
Answer: The refund of $129.99 for your Portable SSD Drive has been processed. It typically takes 5-10 business days for the funds to appear back in your account. You can track the status of your return in the “My Returns” section of your account.
Score: Azure received 95/100 for accuracy because it did not mention that customers receive email updates at each stage. It scored 95/100 for completeness because it did not list the possible return statuses. It scored 80/100 for helpfulness because it lacks a greeting, an acknowledgment or apology, and an offer of tracking or follow-up assistance.
Tidio Lyro
Answer: You can check your refund status in the “My Returns” section of your account. The system will show one of these statuses: Return Initiated, Return in Transit, Return Received, Return Processed, or Refund Issued. You’ll also receive email updates at each stage.
Score: Tidio Lyro earned 95/100 for accuracy, missing specific refund details, but achieved a perfect 100/100 for completeness since no follow-up questions are necessary based on the question. Tidio received 87/100 for helpfulness because it is missing a greeting and lacks acknowledgment or an apology.
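Applying the averaging rule from the evaluation criteria to the two sets of scores above gives a per-response overall score. This is a minimal sketch: it assumes an unweighted average computed per response, and the function name `overall_score` is ours.

```python
# Per-metric scores as reported in the example above.
scores = {
    "Azure": {"accuracy": 95, "completeness": 95, "helpfulness": 80},
    "Tidio Lyro": {"accuracy": 95, "completeness": 100, "helpfulness": 87},
}


def overall_score(metrics: dict[str, int]) -> float:
    """Unweighted average of the metric scores, rounded to one decimal place."""
    return round(sum(metrics.values()) / len(metrics), 1)


for vendor, metrics in scores.items():
    print(vendor, overall_score(metrics))
# Azure 90.0
# Tidio Lyro 94.0
```

On this single question, Tidio Lyro’s lead comes from completeness and helpfulness, since both vendors received the same accuracy score.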
Real-life AI agent use cases in customer service
1. Tidio’s Lyro
Gecko Hospitality, a recruitment services firm, employs Tidio’s Lyro AI agent alongside chat-flow automations to pre-qualify job applicants and handle routine inquiries 24/7. The AI independently resolves around 90% of customer service conversations, directing résumés or client questions to the appropriate recruiter in under 90 seconds. Within six months of implementation, this resulted in an additional 257 candidate leads while significantly decreasing manual review and response times, enabling recruiters to focus on more valuable interactions.5
2. Ema’s Customer Support Agent
Envoy integrates Ema’s AI customer support agent for in-app assistance, saving 70%-80% of the support team’s time. This AI-powered solution streamlines customer service tasks and enhances efficiency.6
3. Bland.ai
Bland.ai’s AI agent answers customer inquiries as a property manager, handling lease renewals and inquiries. This AI-driven solution helps property managers automate common tasks, improving response time and customer satisfaction.7
4. Ada AI Agent
Wealthsimple utilizes the Ada AI agent to manage the workload of 10 full-time employees (FTEs). Ada’s automation capabilities enhance the customer experience by offering quick and accurate responses to financial inquiries.8
5. IBM Watson Assistant
Humana, a leading healthcare provider, deployed IBM Watson Assistant to manage healthcare-related inquiries. This AI solution reduced response times by 60%, improving customer satisfaction and operational efficiency.9
6. Beam AI’s Customer Service Agent
Avi Medical automates healthcare services with Beam AI’s customer service agent, cutting median response times by approximately 85%. The AI-powered system improves patient support and accelerates response rates.10
7. Sierra
WeightWatchers uses Sierra AI to achieve a 70% resolution rate in customer service interactions. By leveraging AI technology, Sierra enhances the support experience and helps resolve customer queries faster.11
Key differences between chatbots and AI agents
Chatbots traditionally operate on rigid, rules-based systems, using decision trees and pre-scripted responses to simulate conversations. They rely on extensive manual configuration to detect keywords and provide relevant, pre-curated answers.
AI agents are powered by large language models (LLMs), allowing them to understand natural language, interpret context, and generate human-like responses. These agents can process and synthesize large volumes of information from sources such as knowledge bases.
AI agents also offer:
- Knowledge integrations (syncing with systems such as Zendesk).
- Generative actions (the capacity to act on behalf of the customer).
- Reasoning (the ability to review how the resolution engine determined what to do next).
- Guidance (telling your AI how to do a specific task).
- Automated resolution insights (the rate at which the AI agents resolve issues without escalation to human agents).
FAQ
Most teams don’t need to rip anything out. Tools like Tidio Lyro and Intercom Fin are designed to sit on top of what you already use (Zendesk, Salesforce, Intercom) and handle the repetitive tier-1 questions while your existing setup stays in place. The bigger question is whether your knowledge base is in good enough shape to train the AI on. A sparse or outdated help center will limit performance regardless of which tool you pick.
Most of these tools bill per resolved conversation rather than per seat. That sounds fair until volume picks up, and with AI handling more queries, volume tends to rise. Tidio, for example, bills Lyro AI conversations separately from your base plan, which can double your monthly cost once the AI starts doing meaningful work. Before committing to any tool, it’s worth running the math on your current monthly conversation volume, not just the starting price.
Every tool on this list has some form of handoff logic, but the quality varies. The better implementations (Tidio, Fin, and watsonx Assistant) transfer the conversation to a human agent with context intact, so the customer doesn’t have to repeat themselves. Weaker implementations just drop a “contact us” message. It’s worth testing the handoff specifically during any trial period, not just the AI’s answering ability.
Ideally, those go to your human team with the full context from the AI conversation already attached. The honest reality is that the 30–35% that reaches humans tends to be the harder, higher-stakes cases: billing disputes, complaints, edge cases the AI wasn’t trained on. That means your team’s work shifts rather than shrinks. Most support leaders report that this is actually a good thing; agents spend less time on password resets and more time on problems that benefit from a human response.
Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.
Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.
He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.
Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.