AIMultiple

Agentic AI Benchmarks: Proprietary & Open-Source AI Agents & Performance

Agentic AI includes agents that execute complex tasks with minimal human supervision. We evaluated the most popular AI agents, open-source AI agent frameworks, customer service AI agents, and the performance of popular LLMs as AI agents.

AI Agents Benchmark Results

We tested leading AI agents on a benchmark built around real workflow automation needs, including navigating complex interfaces, making precise edits, and completing multi-step processes.

Customer Service AI Agents

We evaluated four industry leaders through their APIs or playgrounds using a hold-out dataset of 100 questions randomly selected from the Bitext Gen AI Chatbot Customer Support Dataset. We created an imaginary company, TechStyle, an e-commerce site with standard policies, and set up a small customer database. We shared this information with each AI vendor before posing our questions.

AI Agent Performance Benchmark

Our benchmark includes five tasks of increasing difficulty and complexity. The tasks were designed for a human and test success rates on business-specific work. The goal of the benchmark is to evaluate document processing by AI agents. We used eighteen different large language models as AI agents.

Open-source web agents: WebVoyager accuracy benchmark

The WebVoyager benchmark evaluates web agents on 15 real-world websites, including Google, GitHub, and Wikipedia. It covers tasks such as searching, clicking, navigating, and submitting forms across 643 task instances. Accuracy is the share of tasks completed successfully, judged against standard outputs.
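As a rough illustration of the accuracy metric described above, the following Python sketch computes overall and per-site completion rates. The `TaskResult` record, its field names, and the sample data are assumptions made for this example; they are not part of WebVoyager or of our test harness.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical record: one row per benchmark task instance.
    site: str        # website the task ran on, e.g. "GitHub"
    completed: bool  # True if the agent's outcome matched the standard output


def accuracy(results):
    """Share of task instances completed successfully (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(r.completed for r in results) / len(results)


def accuracy_by_site(results):
    """Completion rate per website, to show where an agent struggles."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r.site].append(r)
    return {site: accuracy(rows) for site, rows in grouped.items()}


# Made-up sample data, for illustration only.
sample = [
    TaskResult("GitHub", True),
    TaskResult("GitHub", False),
    TaskResult("Wikipedia", True),
    TaskResult("Google", True),
]
print(accuracy(sample))
print(accuracy_by_site(sample))
```

Reporting the per-site breakdown alongside the overall figure helps separate an agent that is weak everywhere from one that fails on a single difficult site.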


Authorization for AI Agents: Permit.io, Descope & more

Agentic AI · Sep 26

I have been exploring agent identity and the authentication/authorization platforms that could support it, while also examining how standards like OAuth 2.0 and frameworks such as Keycloak might apply. Below, I list the best AI agent–specific platforms and features, categorized by their primary focus.

Agentic AI · Sep 25

How We Moved from LLM Scorers to Agentic Evals?

Evaluating LLM applications primarily focuses on testing an application end-to-end to ensure it performs consistently and reliably. We previously covered traditional text-based LLM evaluation methods like BLEU or ROUGE. Those classical reference-based NLP metrics are useful for tasks such as translation or summarization, where the goal is simply to match a reference output.
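To make the idea of a reference-based metric concrete, here is a minimal Python sketch of a unigram-overlap F1 score. It is written in the spirit of ROUGE-1 but is a simplified illustration, not the official ROUGE or BLEU implementation; the function name and the whitespace tokenization are assumptions made for this example.

```python
from collections import Counter


def unigram_overlap_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1: word overlap between output and reference."""
    # Assumption: lowercase whitespace tokenization, no stemming or stopwords.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())

    # Multiset intersection counts each shared word at most as often
    # as it appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand.values())  # share of output words in reference
    recall = overlap / sum(ref.values())      # share of reference words recovered
    return 2 * precision * recall / (precision + recall)


print(unigram_overlap_f1("the cat", "the cat sat on the mat"))
```

A score like this only rewards matching the reference wording, which is why it suits translation or summarization better than open-ended agent behavior.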

Agentic AI · Sep 22

40+ Agentic AI Use Cases with Real-life Examples

Autonomous generative AI agents execute complex tasks with little or no human supervision. Agentic AI differs from chatbots and co-pilots. Unlike traditional AI, particularly generative AI, which often requires human intervention in complex workflows, agentic AI can autonomously navigate and optimize processes thanks to its decision-making capabilities and goal-directed behavior.

Agentic AI · Sep 17

Agentic AI Architecture for Industrial Systems

Agentic AI allows natural language interaction with industrial systems, enabling users to query data and receive actionable insights. We will outline a reference architecture designed for industrial environments and describe how task-specific agents and tools can be orchestrated. We will also explore the current state of natural language interfaces (NLIs) in industrial systems.

Agentic AI · Sep 15

Top 8 Agentic CRM Platforms

Agentic CRM platforms are customer relationship management tools that can plan, execute, and adjust workflows autonomously without constant human supervision. Here are the main agentic CRM platforms, ranked by their actual capabilities, market presence, and real-world results.

Agentic AI · Sep 9

The 7 Layers of Agentic AI Stack

The rise of agentic AI has introduced a technology stack that extends well beyond simple calls to foundation-model APIs. Unlike traditional software stacks, where value often concentrates at the application tier, the agentic AI stack distributes value more unevenly. Some layers offer strong opportunities for differentiation and moat building, while others are rapidly becoming commoditized.

Agentic AI · Sep 9

Agentic Mesh: The Future of Scalable AI Collaboration

While much has been written about agent architectures, real-world production-grade implementations remain limited. Building on my earlier post about A2A fundamentals, this piece highlights the agentic AI mesh, a concept introduced in a recent McKinsey.

Agentic AI · Sep 5

AI Identities: The Role of Agentic Systems in Governance

Agentic AI systems are rapidly emerging in enterprise environments. To govern them safely, each agent needs to be recognized as a first-class identity with its own credentials, permissions, and audit trail.

Agentic AI · Sep 5

Top 10+ Agentic Orchestration Frameworks & Tools

79% of executives are already adopting AI agents, although 19% of firms struggle with coordination: they cannot manage agents across different applications. Agentic orchestration offers a solution. Explore what agentic orchestration is, its patterns, and the top frameworks that enable multi-agent collaboration.

Agentic AI · Aug 23

Optimizing Agentic Coding: How I use Claude Code

AI coding tools have become indispensable for many tasks. In our tests, popular AI coding tools like Cursor generated over 70% of the code required for tasks. Because AI agents are still relatively new, I want to share some useful patterns I have observed in my workflow.

Agentic AI · Aug 21

4 Agentic AI Design Patterns & Real-World Examples

Agentic AI design patterns enhance the autonomy of large language models (LLMs) like Llama, Claude, or GPT by leveraging tool use, decision-making, and problem-solving. They provide a structured approach to creating and managing autonomous agents across several use cases.

FAQ