Services
Contact Us
No results found.
Ekrem Sarı

Ekrem Sarı

AI Researcher
26 Articles
Stay up-to-date on B2B Tech

Ekrem is an AI Researcher at AIMultiple, focusing on intelligent automation, GPUs, AI Agents, and LLMOps for RAG frameworks.

Professional Experience

During his tenure as an Assessor at Yandex, he evaluated search results using proprietary frameworks and automated protocols. He implemented QA testing through data annotation, relevance scoring, and user intent mapping across 10,000+ queries monthly, while conducting technical assessments, including performance monitoring and spam detection using ML feedback loops.

Research Interest

At AIMultiple, his research is centered on the MLOps lifecycle and the performance and benchmarking of end-to-end AI systems. He contributes to a wide range of projects, including Retrieval-Augmented Generation (RAG) optimization, extensive Large Language Model (LLM) benchmarking, and the design of agentic AI frameworks. Ekrem specializes in developing data-driven methodologies to measure and improve AI technology performance across critical operational metrics like accuracy, efficiency, API cost, and scalability.

His analysis covers the entire technology stack, from foundational components like embedding models and vector databases to the high-performance GPU and cloud infrastructure required for deploying AI agents.

Education

Ekrem holds a bachelor's degree from Hacettepe Üniversitesi and a master's degree from Başkent Üniversitesi.

Latest Articles from Ekrem

AIApr 16

Hybrid RAG: Boosting RAG Accuracy

Dense vector search is excellent at capturing semantic intent, but it often struggles with queries that demand high keyword accuracy. To quantify this gap, we benchmarked a standard dense-only retriever against a hybrid RAG system that incorporates SPLADE sparse vectors.

AIApr 15

Reranker Benchmark: Top 8 Models Compared

We benchmarked 8 reranker models on ~145k English Amazon reviews to measure how much a reranking stage improves dense retrieval. We retrieved top-100 candidates with multilingual-e5-base, reranked them with each model, and evaluated the top-10 results against 300 queries, each referencing concrete details from its source review. The best reranker lifted Hit@1 from 62.

AIApr 15

Compare Relational Foundation Models

We benchmarked SAP-RPT-1-OSS against gradient boosting (LightGBM, CatBoost) on 17 tabular datasets spanning the semantic-numeral spectrum, small/high-semantic tables, mixed business datasets, and large low-semantic numerical datasets. Our goal is to measure where a relational LLM’s pretrained semantic priors may provide advantages over traditional tree models and where they face challenges under scale or low-semantic structure.

AIApr 15

Multimodal Embedding Models: Apple vs Meta vs OpenAI

Multimodal embedding models excel at identifying objects but struggle with relationships. Current models struggle to distinguish “phone on a map” from “map on a phone.” We benchmarked 7 leading models across MS-COCO and Winoground to measure this specific limitation. To ensure a fair comparison, we evaluated every model under identical conditions using NVIDIA A40 hardware and bfloat16 precision.

AIApr 15

Top 10 Multilingual Embedding Models for RAG

We benchmarked 10 multilingual embedding models on ~606k Amazon reviews across 6 languages (German, English, Spanish, French, Japanese, Chinese). We generated 1,800 queries (300 per language), each referencing concrete details from its source review.

AIApr 15

LLM Quantization: BF16 vs FP8 vs INT4

We benchmarked Qwen3-32B at 4 precision levels (BF16, FP8, GPTQ-Int8, GPTQ-Int4) on a single NVIDIA H100 80GB GPU. Each configuration was evaluated on 2 benchmarks (~12.2K questions) covering knowledge and code generation, plus 2,000+ inference runs to measure throughput. Int4 is 2.

AIApr 15

GPU Concurrency Benchmark: H100 vs H200 vs B200 vs MI300X

I have spent the last 20 years focusing on system-level computational performance optimization. We benchmarked the latest NVIDIA GPUs, including the NVIDIA’s H100, H200, and B200, and AMD’s MI300X, for concurrency scaling analysis. Using the vLLM framework with the gpt-oss-20b model, we tested how these GPUs handle concurrent requests, from 1 to 512.

AIApr 15

Multi-GPU Benchmark: B200 vs H200 vs H100 vs MI300X

For over two decades, optimizing compute performance has been a cornerstone of my work. We benchmarked NVIDIA’s B200, H200, H100, and AMD’s MI300X to assess how well they scale for Large Language Model (LLM) inference. Using the vLLM framework with the meta-llama/Llama-3.1-8B-Instruct model, we ran tests on 1, 2, 4, and 8 GPUs.

AIMar 27

Graph RAG vs Vector RAG Benchmark

Vector RAG retrieves documents by semantic similarity. Graph RAG adds a knowledge graph on top of it, extracts entities and relationships from your documents, stores them in a graph database, and uses graph traversal alongside vector search at query time.

AIMar 23

RAG Observability Tools Benchmark

We benchmarked four RAG observability platforms on a 7-node LangGraph pipeline across three practical dimensions: latency overhead, integration effort, and platform trade-offs. Latency overhead metrics Metrics explained: Mean is the average latency across 150 measured graph.invoke() calls. LLM-judge evaluations run after the timer stops. Median is the 50th percentile latency.