We spent the last quarter testing AI agents across coding, customer service, sales, research, and business workflows. Not reading vendor marketing, actually using these tools daily to see what delivers and what’s hype.
Despite talk about “autonomous AI,” most tools today are co-pilots, not autopilots. They handle research and automate repetitive tasks, but still require human decision-making.
Examples of popular agentic-style platforms and tools
- Tidio’s Lyro: SMB-centric agentic live chat
- Creatio: Enterprise workflow automation
- Cursor: AI code editing
- Otter.ai: AI note-taking
- OpenAI Frontier: Enterprise agent management and orchestration
- Kiro (AWS): Spec-driven agentic IDE and autonomous coding agent
- Averi: AI marketing content creation
- Make (Celonis): Scalable low-code automation
- Kompas AI: Deep research and report generation
- LangGraph: Production-grade framework for complex agentic workflows
- Beam AI: Document-heavy workflows
- Relevance AI: Embedded analytics + decision flows
- IBM watsonx Orchestrate: Enterprise-grade orchestration
What Is an AI Agent?
An AI agent loops. That’s the core difference from a chatbot.
Source: GitHub1
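As a minimal illustration of that loop, here is a stdlib-only Python sketch. `fake_model` and the `search` tool are stand-ins for a real LLM call and a real tool, not any particular framework's API:

```python
# Minimal sketch of the agent loop: the model proposes an action,
# the runtime executes it, and the observation feeds the next turn.

def fake_model(history):
    """Toy policy: look something up once, then finish."""
    if not any(step["action"] == "search" for step in history):
        return {"action": "search", "input": "agent definition"}
    return {"action": "finish", "input": history[-1]["observation"]}

TOOLS = {"search": lambda query: f"results for '{query}'"}

def run_agent(max_turns=5):
    history = []
    for _ in range(max_turns):
        decision = fake_model(history)
        if decision["action"] == "finish":
            return decision["input"]
        observation = TOOLS[decision["action"]](decision["input"])
        history.append({"action": decision["action"], "observation": observation})
    return None  # hit the turn limit without finishing

print(run_agent())  # prints: results for 'agent definition'
```

The loop, not the model, is what makes the system an agent: a chatbot returns one reply, while this runtime keeps feeding tool results back in until the policy decides it is done or the turn budget runs out.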
However, there is no strict definition of an “agent”; the term is used in several ways:
- Traditional AI defines agents as systems that interact with their environment.
- Some analytics firms define agents as fully autonomous systems that operate independently over extended periods, using tools such as functions or APIs to engage with their surroundings and make decisions based on context and goals.2
- Others use the term for more prescriptive implementations that follow predefined workflows.3
Several factors make an AI system count as more agentic: greater autonomy over longer horizons, tool use, persistent memory, and goal-directed reasoning and planning.
Here is a real-world example conversation from an open source software agent managing deployments at Humanlayer:4
Source: GitHub5
Capabilities of agentic AI systems
Adapted from: Cobus Greyling6
Read more: Enterprise AI agents, AI agent builders, large action models (LAMs), and agentic AI in cybersecurity.
Coding Agents
Cursor remains the most widely adopted among individual developers. It’s the baseline everyone compares against. In 2025-2026 Reddit threads, even people who prefer other tools mention Cursor as their reference point.
- Smooth IDE integration (feels like native VSCode)
- Fast context switching between files
- “Flow” prioritized over raw intelligence
- Cursor launched Composer 1.5, a proprietary agentic model with adaptive thinking scaled to task complexity, running at approximately 2x the speed of Claude Sonnet 4.5.
- The 2026 release also added parallel subagents for discrete subtasks, BugBot for automated PR-level code review,7 Cursor Blame (Enterprise) for per-line AI attribution, and image generation within the agent.
- Salesforce reported 30%+ velocity gains after deploying Cursor across 20,000 developers.8
Where it struggles:
- Cursor restructured its billing, moving from a request-based to a credit-based model with two separate usage pools: Auto + Composer (higher limits) and API usage. Plans now range from $20/month (Pro) through $60/month (Pro+, 3x usage) to $200/month (Ultra). Cost management has become more complex, not simpler, particularly for teams running heavy multi-file agent workflows.
- Less capable than Claude for architectural reasoning
- Can hallucinate on complex codebases
Claude Code crossed $500M in annualized run-rate revenue as of September 2025, approximately four months after full launch, making it one of the fastest-growing developer tools Anthropic has shipped. Enterprises represent 80% of Anthropic’s overall business.9
In January 2026, Anthropic launched Claude Cowork, a macOS desktop agent built on Claude Code’s foundations, designed for non-technical users. It uses folder-permission access, allowing Claude to read, write, and execute multi-step file tasks without command-line knowledge. Notably, Claude Code wrote the entire Cowork application in approximately 1.5 weeks via autonomous coding, a widely cited proof point for agentic software development.
On January 30, 2026, Anthropic added a plugin system to Cowork, enabling department-level automation via custom MCP integrations, sub-agents, and slash commands.10
Anthropic also launched interactive apps directly inside the Claude chat interface, including Slack, Canva, Figma, Box, and Clay, enabling Claude to take actions inside these platforms without leaving the conversation.11
GitHub Copilot underwent a major expansion in 2026, shifting from a code-suggestion tool to a full multi-agent development environment. The January 14 CLI update introduced specialized parallel agents:
- Explore (fast codebase Q&A without cluttering main context),
- Task (automated test and build execution with smart output summarization),
- Code-review (surfacing logic and security issues, not style preferences).

These agents run concurrently, compressing what previously required sequential handoffs into parallel execution.12
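The fan-out pattern these subagents use can be sketched with a thread pool: independent subtasks run concurrently and their summaries are joined at the end. The three functions below are illustrative stand-ins, not Copilot's actual agents:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in subagents; each would wrap its own model call and context.
def explore(repo):
    return f"map of {repo}"

def run_tasks(repo):
    return f"tests passed for {repo}"

def code_review(repo):
    return f"2 issues found in {repo}"

def run_parallel_agents(repo):
    """Fan out independent subagents, then join their summaries in order."""
    agents = [explore, run_tasks, code_review]
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = [pool.submit(agent, repo) for agent in agents]
        return [f.result() for f in futures]

print(run_parallel_agents("my-repo"))
```

Because each subagent keeps its own context, none of them clutters the main conversation, which is the same isolation benefit the Explore agent advertises.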
Emerging tools generating real discussion:
Kiro (AWS): Launched in preview in July 2025, Kiro is a spec-driven agentic IDE that converts natural language prompts into structured requirements, technical design documents, and sequenced implementation tasks. At AWS re:Invent in December 2025, Amazon unveiled an expanded Kiro autonomous agent capable of working independently for days with persistent cross-session context, supported by two companion agents: an AWS Security Agent (identifies vulnerabilities as code is written) and a DevOps Agent (performance testing and compatibility checking before code goes live).13
- In January 2026, Amazon mandated internal adoption of Kiro over Claude Code, with approximately 70% of its software engineers having used Kiro at least once. However, approximately 1,500 Amazon engineers signed an internal forum post supporting Claude Code, citing Kiro’s performance shortfalls as a productivity impediment. This created a visible conflict: AWS sales engineers who sell Claude Code via Amazon Bedrock cannot officially use it in their own production work.14
Business Workflow Agents
OpenAI Frontier: Enterprise Agent Management
OpenAI launched Frontier in 2026 as an open, end-to-end platform for enterprises to build, deploy, and manage AI agents across models from any vendor.
HP, Intuit, Oracle, State Farm, Thermo Fisher, and Uber are among the first adopters. Frontier is OpenAI’s direct answer to IBM watsonx Orchestrate, Relevance AI, and Salesforce Agentforce in enterprise agent orchestration.
Concurrently, OpenAI deprecated its Swarm framework and launched a unified, provider-agnostic Agents SDK supporting 100+ LLMs, signaling a consolidation from experimental tooling toward production-grade infrastructure.15
Key capabilities:
- Defined agent identity with explicit permissions and role-based guardrails for regulated environments.
- Built-in quality evaluation and feedback loops to help agents improve over time
- A shared business context layer connecting data warehouses, CRMs, and internal apps so agents understand enterprise-specific workflows
- A runtime deployable on-premise, on enterprise cloud, or OpenAI-hosted16
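The defined-identity idea can be sketched as an allowlist checked before every tool call, with each decision appended to an audit trail. All names below are hypothetical, not Frontier's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentIdentity:
    """An agent gets an explicit tool allowlist, like a human employee's role."""
    name: str
    allowed_tools: frozenset

def execute(identity, tool, payload, audit_log):
    """Run a tool call only if the identity permits it; log either way."""
    if tool not in identity.allowed_tools:
        audit_log.append((identity.name, tool, "DENIED"))
        raise PermissionError(f"{identity.name} may not call {tool}")
    audit_log.append((identity.name, tool, "OK"))
    return f"{tool}({payload})"

log = []
billing_agent = AgentIdentity("billing-agent", frozenset({"read_invoices"}))
execute(billing_agent, "read_invoices", "Q3", log)      # allowed and audited
try:
    execute(billing_agent, "delete_records", "*", log)  # denied and audited
except PermissionError:
    pass
print(log)
```

The point of the pattern is that the denial is enforced and recorded outside the model: even a misbehaving agent cannot call a tool its identity was never granted.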
IBM watsonx Orchestrate targets enterprise-grade orchestration with governance and security built in. Designed for regulated industries where audit trails and compliance matter.
Comes with enterprise overhead:
- Longer implementation timelines
- Higher cost
- Requires IBM ecosystem buy-in
Relevance AI combines embedded analytics with decision flows. Succeeds by deeply integrating with common enterprise platforms (Salesforce, Slack, Notion, Google Analytics).
Customer Service Agents
Tidio’s Lyro focuses on SMB live chat with agentic capabilities.
Real performance from users:
- Handles 70-80% of common questions without human intervention
- Gets better with feedback over the first few months
- Falls apart on nuanced questions requiring empathy
Not good for: Complex customer situations requiring judgment calls.
Salesforce Agentforce has emerged as an enterprise-grade customer service agent platform, reaching $500M+ in annual recurring revenue with 330% year-over-year growth. In a production deployment at UCSF Health, Agentforce Voice achieved 88% task coverage using simulation-based training, significantly above the 60-70% typical of traditional approaches.17
The broader pattern holds across platforms: customer service agents consistently perform well on high-volume, repetitive inquiries but struggle with tasks that require judgment, empathy, or multi-party context.
Research and Analysis
Kompas AI specializes in deep research and report generation.
What makes it different:
- Actually reads and synthesizes academic papers
- Maintains citations properly
- Continuous monitoring for new publications
- Integrates with arXiv, PubMed, SSRN
Trade-off:
- Slower than general-purpose AI
- Optimizes for accuracy over speed
- More expensive per query
Beam AI handles document-heavy workflows.
Otter.ai remains solid for meeting notes but hasn’t evolved much beyond transcription + basic summarization.
Use cases of AI agents
AI agents are used across many roles and industries. Below, we’ve listed some of the most common ways AI agents are being put to work:
- Developers
- SecOps assistants
- Human-like gaming characters
- Content creators
- Insurance assistants
- Human resources (HR) assistants
- Customer service assistants
- Research assistants
- Computer users
- AI agent builders
Note that some of these are agentic use cases; agentic AI encompasses and extends traditional AI agents by adding autonomy, memory, reasoning, and goal-directed behavior.
What Differentiates Actually Useful Agents
1. Autonomy vs. Control Trade-off
The biggest decision: How much independence do you actually want?
Co-pilot agents (Cursor, Otter, most business tools) maintain human oversight at key decisions. They handle research and execution but require approval before critical actions.
Strategic automation (n8n, Make) follows predefined workflows with minimal real-time decision-making. Predictable and reliable but can’t adapt when encountering unexpected scenarios.
Rule-based systems respond to triggers without contextual understanding. Not really “agentic” but valuable for straightforward automation.
Most companies in 2026 use Level 2-3 agents. Full autonomy (Level 4) creates more problems than it solves unless you’ve built extensive guardrails.
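The co-pilot trade-off can be sketched as an approval gate: routine steps run freely, while actions marked critical wait for a human callback. The action names and the `approve` callback below are illustrative:

```python
# Sketch of the co-pilot pattern: the agent executes routine steps
# on its own but pauses for human approval before anything critical.
CRITICAL_ACTIONS = {"deploy", "delete", "refund"}

def run_step(action, payload, approve):
    """`approve` is a callback standing in for a human reviewer."""
    if action in CRITICAL_ACTIONS and not approve(action, payload):
        return ("blocked", action)
    return ("done", action)

def no_approvals(action, payload):
    return False  # simulate a reviewer who rejects everything

print(run_step("summarize", "ticket-42", no_approvals))  # ('done', 'summarize')
print(run_step("deploy", "v2.1", no_approvals))          # ('blocked', 'deploy')
```

Moving an action in or out of `CRITICAL_ACTIONS` is exactly the autonomy dial described above: a Level 2 deployment marks most actions critical, a Level 3 deployment only the irreversible ones.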
2. Specialized vs. General-Purpose
Specialized agents embed deep domain knowledge. They understand industry workflows, terminology, and compliance requirements.
Higher success rates within their domain. Completely unsuitable for adjacent use cases.
Horizontal platforms (LangGraph, watsonx Orchestrate, Relevance AI) provide flexible frameworks for building custom agents. They sacrifice domain optimization for versatility.
LangGraph targets production-grade multi-agent workflows. Powerful for developers building complex systems, but it requires technical expertise.
Relevance AI targets business users with pre-built templates and easier configuration.
Research agents (Kompas AI) optimize for accuracy and thoroughness over speed. Slower but more reliable for knowledge work.
3. Integration Depth
Anthropic donated MCP to the Linux Foundation’s Agentic AI Foundation, making it a vendor-neutral open standard under the same independent governance model as Kubernetes and Node.js. MCP now has 10,000+ published servers and 97 million monthly SDK downloads, with first-class support across Claude, Cursor, GitHub Copilot, Gemini, VS Code, and ChatGPT.
Native platform integrations distinguish business-focused agents. Beam AI (documents), Relevance AI (analytics) succeed by deeply integrating with Salesforce, Slack, Notion, Google Analytics.
Value comes less from AI capabilities, more from seamless data flow.
API-first architectures (n8n, Make) enable custom integrations but require technical expertise. Support hundreds of pre-built connectors while allowing custom nodes.
Standalone tools (coding agents, cybersecurity agents) optimize for specific technical ecosystems rather than broad compatibility.
4. Security and Compliance
Production deployment requirements create major architectural differences.
Enterprise-grade agents (IBM watsonx, healthcare agents) prioritize:
- Security certifications (SOC 2, ISO 27001)
- Audit trails
- Compliance frameworks (GDPR, HIPAA)
- Role-based access control
- Data encryption
- Governance workflows
Infrastructure overhead increases costs but enables deployment in regulated industries.
Developer-centric tools (LangGraph, coding agents) focus on debugging, logging, and integration with version control systems. Serve technical users who implement their own security.
Consumer-focused tools often lack enterprise compliance features entirely.
The Governance Problem Nobody Solved Yet
Governance tooling is beginning to catch up. Several concrete solutions shipped:
- Cisco AI Agent Monitor for Splunk Observability Cloud: real-time tracking of agent workflow quality, cost per run, and behavioral anomalies, now entering public testing.18
- OpenAI Frontier: each agent gets a defined identity with explicit permissions, audit trails, and guardrails, modeled on how companies manage human employee access.19
- Agentic AI Foundation (AAIF): OpenAI, Anthropic, and Block co-founded this Linux Foundation-backed consortium in December 2025 to establish open, vendor-neutral governance standards for agentic AI. AWS, Google, Microsoft, Bloomberg, and Cloudflare joined as Platinum members. Anthropic donated MCP to the foundation, ensuring it remains an open industry standard rather than a proprietary protocol.20
What Works, What Doesn’t (Real Examples)
What Actually Works Today
Coding assistance at Level 3: Cursor + Claude Code combination used by thousands of developers. Cursor for flow and rapid iteration, Claude for hard problems.
Typical workflow:
- Use Cursor for 80% of coding (feature implementation, refactoring)
- When stuck, escalate to Claude Code for architectural reasoning
- Let agent run tests, iterate on failures
- Human reviews final output before merge
Sales outreach automation: AI agents qualify leads, book meetings, and send follow-ups. Companies report 2-3x increase in sales team productivity.
Klarna deployed sales agents handling initial outreach and qualification. Human reps focus on complex deals and relationship building.
Customer service for common questions: Agents handling 70-80% of routine inquiries during off-hours. Customer satisfaction scores improved because responses are instant instead of “we’ll get back to you tomorrow.”
Research synthesis: Academic researchers using agents to scan new papers, extract relevant sections, maintain citation databases. Saves hours of manual literature review.
What Doesn’t Work Yet
Fully autonomous deployment: Level 4 agents deploying code to production without human approval. Too risky for most companies. Even with extensive testing, edge cases cause problems.
Exception: Simple, well-bounded systems where failures are recoverable.
Complex customer situations: Agents fall apart when empathy, judgment, or nuanced understanding is required. “I understand you’re frustrated” from an agent feels hollow.
Multi-stakeholder decision-making: Agents can’t navigate office politics, understand unspoken context, or read between lines in business negotiations.
Creative strategy: Agents can execute tactics but don’t develop novel strategic approaches. They optimize within given parameters but don’t question the parameters themselves.
The Cost Reality
Everyone talks about agent capabilities. Few discuss economics.
Direct costs:
- Model API calls: $0.003-0.10 per 1K tokens (varies by model)
- Tool execution: APIs, data sources, integrations
- Infrastructure: Hosting, compute for self-hosted systems
Hidden costs:
- Context window usage accumulates fast with multi-turn conversations
- Failed execution attempts (agent tries, fails, retries; you pay for each attempt)
- Debugging and refinement time
- Governance and security infrastructure
- Training team to work effectively with agents
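To see how context accumulation compounds, here is a back-of-envelope model using a mid-range rate from the figures above. The session shape (50 turns, context growing 500 tokens per turn) is an assumption for illustration, not a measured workload:

```python
# Back-of-envelope cost model for per-token API pricing.
# The $0.01/1K rate sits inside the article's $0.003-0.10 range.
def call_cost(input_tokens, output_tokens, rate_per_1k):
    """Cost of one API call at a flat per-1K-token rate."""
    return (input_tokens + output_tokens) / 1000 * rate_per_1k

# A 50-turn agent session: each turn resends the growing context
# (2,000 base tokens + 500 per prior turn) and emits ~800 tokens.
total = sum(call_cost(2000 + 500 * turn, 800, 0.01) for turn in range(50))
print(f"${total:.2f}")
```

The quadratic-looking growth is the "hidden cost" in the list above: most of the spend goes to resending context, not to new output, which is why prompt caching cuts bills so sharply.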
Leading organizations treat agent cost optimization as a first-class architectural concern. They build economic models into agent design rather than retrofitting cost controls after deployment.
Example optimization strategies:
- Route simple queries to smaller, cheaper models
- Use prompt caching aggressively (90% cost reduction for repeated context)
- Implement circuit breakers to stop runaway agents
- Monitor token usage per task, optimize prompts
- Batch requests when latency isn’t critical
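The circuit-breaker strategy in the list above can be sketched as a hard token budget that halts a runaway retry loop. The numbers are illustrative:

```python
# Sketch of a token-budget circuit breaker: abort the agent loop
# once cumulative spend crosses a hard ceiling.
class BudgetExceeded(Exception):
    pass

class CircuitBreaker:
    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens):
        """Record spend; trip the breaker if the budget is blown."""
        self.used += tokens
        if self.used > self.max_tokens:
            raise BudgetExceeded(f"{self.used} > {self.max_tokens} tokens")

breaker = CircuitBreaker(max_tokens=10_000)
try:
    for step in range(100):      # a retry loop that never converges
        breaker.charge(1_500)    # tokens consumed per failed attempt
except BudgetExceeded as stop:
    print(f"halted: {stop}")
```

Without the breaker, the loop above would burn through 100 attempts; with it, the agent stops within a few retries, which is exactly the failure mode (pay-per-failed-attempt) flagged under hidden costs.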
If you are looking into the infrastructure that powers web-capable agentic AI, here are our latest benchmarks:
- Remote browsers: How browser infrastructure enables agents to interact with the web securely.
- Browser MCP benchmark: Top MCP servers for tool use and web access.
A structural shift is also underway in how vendors price agentic tools. Cursor’s move to a dual-pool credit system, and Anthropic’s bundling of Claude Code into Team plan seats, both reflect the market normalizing agentic AI as a line-item infrastructure cost rather than a per-query expense. Leading engineering organizations now model token spend at the workflow level, not per individual prompt.21
Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.
Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.
He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.
Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.