Ekrem Sarı

AI Researcher
31 Articles
Ekrem is an AI Researcher at AIMultiple, focusing on intelligent automation, GPUs, AI Agents, and LLMOps for RAG frameworks.

Professional Experience

As an Assessor at Yandex, he evaluated search results using proprietary frameworks and automated protocols. He carried out QA testing through data annotation, relevance scoring, and user intent mapping across 10,000+ queries per month. He also conducted technical assessments, including performance monitoring and spam detection using ML feedback loops.

Research Interests

At AIMultiple, his research is centered on the MLOps lifecycle and the performance and benchmarking of end-to-end AI systems. He contributes to a wide range of projects, including Retrieval-Augmented Generation (RAG) optimization, extensive Large Language Model (LLM) benchmarking, and the design of agentic AI frameworks. Ekrem specializes in developing data-driven methodologies to measure and improve AI technology performance across critical operational metrics like accuracy, efficiency, API cost, and scalability.

His analysis covers the entire technology stack, from foundational components like embedding models and vector databases to the high-performance GPU and cloud infrastructure required for deploying AI agents.

Education

Ekrem holds a bachelor's degree from Hacettepe Üniversitesi and a master's degree from Başkent Üniversitesi.

Latest Articles from Ekrem

Enterprise Software · Jun 13

Email Archiving Software Benchmark

We provisioned a Microsoft 365 tenant, populated it with a 10,000-mail synthetic corpus and 1,700 attachments across 8 file-type subtypes, then benchmarked NinjaOne SaaS Archiver, Barracuda Cloud Archiving Service, Acronis Cyber Protect Cloud Email Archiving, and MailPiler on the same tenant against 10 dimensions covering ingestion, search, attachment recall, export, immutability, legal hold, audit, encryption,

AI · Jun 11

Text-to-SQL: Comparison of LLM Accuracy

I have relied on SQL for data analysis for 18 years, beginning in my days as a consultant. Translating natural-language questions into SQL makes data more accessible, allowing anyone, even those without technical skills, to work directly with databases.

AI · Jun 10

Top 20+ Agentic RAG Frameworks

Agentic RAG enhances traditional RAG by boosting LLM performance and enabling greater specialization. We conducted a benchmark to assess its performance on routing between multiple databases and generating queries. Explore agentic RAG frameworks and libraries, key differences from standard RAG, benefits, and challenges to unlock their full potential.

AI · Jun 10

Benchmark of 40+ LLMs in Finance: Claude Fable 5 & GPT-5

We evaluated 40+ LLMs on 238 hard questions from the FinanceReasoning benchmark (Tang et al.) to identify which models excel at complex financial reasoning tasks such as statement analysis, forecasting, and ratio calculations.

Cybersecurity · Jun 10

DLP Software Benchmark

We benchmarked Acronis DeviceLock DLP and ManageEngine DLP Plus on identical Windows Server 2022 VMs with 28 scenarios: 23 data leak tests (including 12 adversarial evasion files), 3 agent security tests, and 2 tests under high CPU and memory consumption.

AI · Jun 3

RAG Observability Tools Benchmark

We benchmarked four RAG observability platforms on a 7-node LangGraph pipeline across three practical dimensions: latency overhead, integration effort, and platform trade-offs. For the latency overhead metrics, mean is the average latency across 150 measured graph.invoke() calls, with LLM-judge evaluations running after the timer stops, and median is the 50th-percentile latency.

AI · Jun 3

RAG Frameworks: LangChain vs LangGraph vs LlamaIndex

We benchmarked 5 RAG frameworks (LangChain, LangGraph, LlamaIndex, Haystack, and DSPy) by building the same agentic RAG workflow in each with standardized components: the same model (GPT-4.1-mini), embeddings (BGE-small), retriever (Qdrant), and tool (Tavily web search). Holding these components fixed isolates each framework's own overhead and token efficiency.

Agentic AI · May 25

Agentic Search in 2026: Benchmark 8 Search APIs for Agents

Agentic search bridges the gap between traditional search engines and AI search capabilities. Search APIs are the first layer of an agentic tool, so their performance caps the quality of everything downstream.

Cybersecurity · May 22

Backup software benchmark: Acronis vs NinjaOne vs Comet vs MSP360

We benchmarked Acronis Cyber Protect Cloud Backup, Comet Backup, MSP360 Managed Backup, and NinjaOne Backup on identical AWS infrastructure. Each vendor ran a file-mode backup of the same 625,946-file / 50 GB workload and a full image backup of the system disk, then restored the 15 GB medium subdirectory.

AI · May 20

Cloud GPU Rental Price Index

On-demand rates for the newest-generation cloud GPUs (B200, B300, MI300X, RTX 5090) roughly doubled over the past year, while rates for mainstream cards (H100, H200, A100) stayed within a tight band. We compile the GPU index monthly from 58 providers and 17 GPU models, covering on-demand, spot, and 1-year reserved tiers.