Ekrem Sarı
Ekrem is an AI Researcher at AIMultiple, focusing on intelligent automation, GPUs, AI Agents, and LLMOps for RAG frameworks.
Professional Experience
During his tenure as an Assessor at Yandex, he evaluated search results using proprietary frameworks and automated protocols. He implemented QA testing through data annotation, relevance scoring, and user intent mapping across 10,000+ queries monthly, while conducting technical assessments, including performance monitoring and spam detection using ML feedback loops.
Research Interest
At AIMultiple, his research is centered on the MLOps lifecycle and the performance and benchmarking of end-to-end AI systems. He contributes to a wide range of projects, including Retrieval-Augmented Generation (RAG) optimization, extensive Large Language Model (LLM) benchmarking, and the design of agentic AI frameworks. Ekrem specializes in developing data-driven methodologies to measure and improve AI technology performance across critical operational metrics like accuracy, efficiency, API cost, and scalability. His analysis covers the entire technology stack, from foundational components like embedding models and vector databases to the high-performance GPU and cloud infrastructure required for deploying AI agents.
Education
Ekrem holds a bachelor's degree from Hacettepe Üniversitesi and a master's degree from Başkent Üniversitesi.
Latest Articles from Ekrem
Agentic Search in 2026: Benchmark 8 Search APIs for Agents
Agentic search plays a crucial role in bridging the gap between traditional search engines and AI search capabilities. Search APIs are the first layer of an agentic tool, where performance caps the quality of everything downstream.
Benchmark of 40+ LLMs in Finance: Gemini 3.5 Flash, Claude Opus 4.7 & Grok 4.3
We evaluated 40+ LLMs in finance on 238 hard questions from the FinanceReasoning benchmark (Tang et al.) to identify which models excel at complex financial reasoning tasks like statement analysis, forecasting, and ratio calculations.
Backup software benchmark: Acronis vs NinjaOne vs Comet vs MSP360
We benchmarked Acronis Cyber Protect Cloud Backup, Comet Backup, MSP360 Managed Backup, and NinjaOne Backup on identical AWS infrastructure. Each vendor ran a file-mode backup of the same 625,946-file / 50 GB workload and a full image backup of the system disk, then restored the 15 GB medium subdirectory.
Cloud GPU Rental Price Index
On-demand rates for the newest-generation cloud GPUs (B200, B300, MI300X, RTX 5090) roughly doubled over the past year, while mainstream cards (H100, H200, A100) held a tight band. We compile the GPU index monthly from 58 providers and 17 GPU models, covering on-demand, spot, and 1-year reserved tiers.
Multimodal Embedding Models: Apple vs Meta vs OpenAI
Multimodal embedding models excel at identifying objects but struggle with relationships: current models have difficulty distinguishing “phone on a map” from “map on a phone.” We benchmarked 7 leading models across MS-COCO and Winoground to measure this specific limitation. To ensure a fair comparison, we evaluated every model under identical conditions using NVIDIA A40 hardware and bfloat16 precision.
Top 20+ Agentic RAG Frameworks
Agentic RAG enhances traditional RAG by boosting LLM performance and enabling greater specialization. We conducted a benchmark to assess its performance on routing between multiple databases and generating queries. Explore agentic RAG frameworks and libraries, key differences from standard RAG, benefits, and challenges to unlock their full potential.
Cloud GPU Pricing, Performance & Provider Comparison
Cloud GPU list prices for the same model can differ several times over from one provider to another. We curated the lowest rate, provider, market range, and median for 40+ GPU configurations across all three pricing tiers, plus a throughput-per-dollar benchmark on 10 models.
Reranker Benchmark: Top 8 Models Compared
We benchmarked 8 reranker models on ~145k English Amazon reviews to measure how much a reranking stage improves dense retrieval. We retrieved top-100 candidates with multilingual-e5-base, reranked them with each model, and evaluated the top-10 results against 300 queries, each referencing concrete details from its source review. The best reranker lifted Hit@1 from 62.
Hybrid RAG: Boosting RAG Accuracy
Dense vector search is excellent at capturing semantic intent, but it often struggles with queries that demand high keyword accuracy. To quantify this gap, we benchmarked a standard dense-only retriever against a hybrid RAG system that incorporates SPLADE sparse vectors.
Top 60+ Cloud GPU Providers in 2026
Cloud GPU providers fall into three tiers. Hyperscalers run broad cloud platforms with GPU rental as one product among many. Specialist neoclouds focus on GPU and AI infrastructure as their core product. Community marketplaces aggregate inventory from many small operators, often at the floor of the published price spread.