AI Agents
AI agents are software systems that use reasoning, planning, and tools to assist or automate complex tasks. We compare the top open-source and commercial agents.
15 Threats to the Security of AI Agents
Even a few years ago, the unpredictability of large language models (LLMs) would have posed serious challenges. One notable early case involved ChatGPT’s search tool: researchers found that webpages containing hidden instructions (e.g., embedded prompt-injection text) could reliably cause the tool to produce biased or misleading outputs, even when contrary information was present.
Agentic AI for Cybersecurity: Use Cases & Examples
Agentic AI refers to AI systems that combine large language models (LLMs) and similar models with automated workflows, tool integration, and decision support. These systems assist security teams in SecOps and AppSec by analyzing alerts, automating routine tasks, and supporting investigative work. Agentic AI tools generally operate under human oversight.
Local AI Agents: Goose, Observer AI, AnythingLLM
Local AI agents are often described as offline, on-device, or fully local. We spent three days mapping the ecosystem of local AI agents that run autonomously on personal hardware without depending on external APIs or cloud services.
Best 7 AI Test Agents for QA
We evaluated AI testing platforms with embedded AI agents; most turned out to be Selenium or Playwright tooling dressed up with marketing. A few were capable of writing and maintaining test cases or performing visual testing, though even these tools have notable limitations. From these, we selected 7 platforms and categorized them by primary focus area.
Mobile AI Agents Tested Across 65 Real-World Tasks
We spent 3 days benchmarking four mobile AI agents (DroidRun, Mobile-Agent, AutoDroid, and AppAgent) across 65 real-world tasks on an Android emulator, covering activities such as calendar management, contact creation, photo capture, audio recording, and file operations.
AI Agents: Operator vs Browser Use vs Project Mariner
AI agents are increasingly marketed as end-to-end digital workers, but real-world performance can vary widely depending on the task, tools, and execution environment. To understand what these systems can genuinely deliver today, we conducted hands-on benchmarking across practical business scenarios.