Ekrem Sarı
Ekrem is an AI Researcher at AIMultiple, focusing on intelligent automation, GPUs, AI Agents, and LLMOps for RAG frameworks.
Professional Experience
During his tenure as an Assessor at Yandex, he evaluated search results using proprietary frameworks and automated protocols. He implemented QA testing through data annotation, relevance scoring, and user intent mapping across 10,000+ queries monthly, while conducting technical assessments, including performance monitoring and spam detection using ML feedback loops.
Research Interest
At AIMultiple, his research is centered on the MLOps lifecycle and the performance and benchmarking of end-to-end AI systems. He contributes to a wide range of projects, including Retrieval-Augmented Generation (RAG) optimization, extensive Large Language Model (LLM) benchmarking, and the design of agentic AI frameworks. Ekrem specializes in developing data-driven methodologies to measure and improve AI technology performance across critical operational metrics like accuracy, efficiency, API cost, and scalability. His analysis covers the entire technology stack, from foundational components like embedding models and vector databases to the high-performance GPU and cloud infrastructure required for deploying AI agents.
Education
Ekrem holds a bachelor's degree from Hacettepe Üniversitesi and a master's degree from Başkent Üniversitesi.
Latest Articles from Ekrem
DLP Software Benchmark
We benchmarked Acronis DeviceLock DLP and ManageEngine DLP Plus on identical Windows Server 2022 VMs with 28 scenarios: 23 data leak tests (including 12 adversarial evasion files), 3 agent security tests, and 2 tests under high CPU and memory consumption.
Embedding Models: OpenAI vs Gemini vs Voyage
We benchmarked 15 English text-embedding models and a BM25 baseline on over 500 manually curated queries across three retrieval domains: legal contracts (CUAD), customer support (IBM TechQA), and healthcare (MedRAG PubMed). Voyage-3.5 ranks first overall. Perplexity Embed V1 0.6b reaches the upper-mid tier at the lowest price point in our benchmark.
Open Source Embedding Models Benchmark for RAG
We benchmarked 14 open-source embedding models, self-hosted on a single H100, across 500+ manually curated retrieval queries spanning legal contracts, customer support tech notes, and medical abstracts. NVIDIA Llama-Embed-Nemotron-8B leads in accuracy. On cost, Google’s EmbeddingGemma-300m runs roughly 4x cheaper than Nemotron at the cost of a small accuracy loss.
Graph Database Benchmark: Neo4j vs FalkorDB vs Memgraph
We benchmarked Neo4j, FalkorDB, and Memgraph on a synthetic graph derived from 120,000 Amazon product reviews (381K nodes, 804K edges).
LLM Inference Engines: vLLM vs LMDeploy vs SGLang
We benchmarked 3 leading LLM inference engines on NVIDIA H100: vLLM, LMDeploy, and SGLang. Each engine processed identical workloads: 1,000 ShareGPT prompts using Llama 3.1 8B-Instruct to isolate the true performance impact of their architectural choices and optimization strategies.
Top Vector Database for RAG: Qdrant vs Weaviate vs Pinecone
Vector databases power the retrieval layer in RAG workflows by storing document and query embeddings as high‑dimensional vectors. They enable fast similarity searches based on vector distances.
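The retrieval step described above can be sketched in a few lines. This is a toy illustration, not any particular database's API: the four-dimensional vectors and document names are made up (production embedding models emit hundreds of dimensions), and a real vector database would use approximate nearest-neighbor indexes rather than a brute-force scan.

```python
import math

# Hypothetical document embeddings; real models emit 768+ dimensions.
DOCS = {
    "refund policy": [0.1, 0.9, 0.0, 0.2],
    "gpu pricing":   [0.8, 0.1, 0.3, 0.0],
    "rate limits":   [0.0, 0.2, 0.9, 0.4],
}

def cosine(a, b):
    """Cosine similarity: dot product of the two vectors over their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, docs, k=2):
    """Brute-force scan: rank every stored vector by similarity to the query."""
    ranked = sorted(docs, key=lambda name: cosine(query_vec, docs[name]), reverse=True)
    return ranked[:k]

query = [0.1, 0.8, 0.1, 0.1]  # embedding of a hypothetical user query
print(top_k(query, DOCS))
```

Vector databases such as Qdrant, Weaviate, and Pinecone replace the brute-force scan with indexes (commonly HNSW graphs) so that similarity search stays fast at millions of vectors.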
Benchmark of 39 LLMs in Finance: Claude Opus 4.7, Gemini 3.1 Pro & More
We evaluated 39 LLMs on 238 hard questions from the FinanceReasoning benchmark (Tang et al.) to identify which models excel at complex financial reasoning tasks like statement analysis, forecasting, and ratio calculations.
Top 20+ Agentic RAG Frameworks
Agentic RAG enhances traditional RAG by boosting LLM performance and enabling greater specialization. We benchmarked its performance at routing requests between multiple databases and generating queries. Explore agentic RAG frameworks and libraries, their key differences from standard RAG, and their benefits and challenges.
Text-to-SQL: Comparison of LLM Accuracy
I have relied on SQL for data analysis for 18 years, beginning in my days as a consultant. Translating natural-language questions into SQL makes data more accessible, allowing anyone, even those without technical skills, to work directly with databases.
Hybrid RAG: Boosting RAG Accuracy
Dense vector search is excellent at capturing semantic intent, but it often struggles with queries that demand high keyword accuracy. To quantify this gap, we benchmarked a standard dense-only retriever against a hybrid RAG system that incorporates SPLADE sparse vectors.
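One common way to merge the dense and sparse result lists in a hybrid retriever is reciprocal rank fusion (RRF). The article teaser does not state which fusion method was benchmarked, so this is an illustrative sketch with hypothetical document IDs; the constant k=60 is the value conventionally used in the RRF literature.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each doc scores sum(1 / (k + rank)) across lists,
    so documents ranked highly by several retrievers rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits  = ["d3", "d1", "d7"]  # semantic-similarity order (hypothetical)
sparse_hits = ["d7", "d3", "d9"]  # keyword/SPLADE order (hypothetical)
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```

Because RRF operates on ranks rather than raw scores, it avoids having to normalize dense cosine similarities against sparse term-weight scores, which live on different scales.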