Ekrem Sarı

AI Researcher

32 Articles

Ekrem is an AI Researcher and Data Analyst at AIMultiple. He designs and runs hands-on benchmarks for AI and LLM systems.

Professional Experience

At AIMultiple, Ekrem benchmarks end-to-end AI systems and builds the data workflows and dashboards used to track benchmark and product metrics. His benchmarks cover embedding and reranker models, vector and graph databases, inference engines, quantization, GPU concurrency and multi-GPU scaling, cloud GPU pricing and providers, text-to-SQL, and RAG and agentic RAG frameworks.

Before AIMultiple, he worked as an Assessor at Yandex, where he evaluated search quality and labeled large volumes of data against detailed guidelines to support ranking and model quality.

Research Interest

Ekrem's work focuses on the MLOps and LLMOps lifecycle and on measuring the performance of AI systems. He compares models, frameworks, and infrastructure on metrics such as accuracy, throughput, API cost, and scalability, across the stack from embedding models and vector databases to GPU and cloud infrastructure. His MSc thesis automates systematic literature reviews with a RAG-based pipeline.

Education

Ekrem holds a BA from Hacettepe University and is completing an MSc at Başkent University.

Latest Articles from Ekrem

Benchmark

Jul 17

Top 20+ Agentic RAG Frameworks

Agentic RAG enhances traditional RAG by boosting LLM performance and enabling greater specialization. We conducted a benchmark to assess its performance on routing between multiple databases and generating queries. Explore agentic RAG frameworks and libraries, key differences from standard RAG, benefits, and challenges to unlock their full potential. We used our agentic RAG benchmark methodology…

Benchmark

Jul 16

Cloud GPU Pricing, Performance & Provider Comparison

Cloud GPU list prices for the same model can differ several times over from one provider to another. We curated the lowest rate, provider, market range, and median for 40+ GPU configurations across all three pricing tiers, plus a throughput-per-dollar benchmark on 10 models. See the most cost-effective GPU for your workload across 13 hyperscaler…

Feature Comparison

Jul 16

Cloud GPU Rental Price Index

On-demand rates for the newest-generation cloud GPUs (B200, B300, MI300X, RTX 5090) roughly doubled over the past year, while mainstream cards (H100, H200, A100) held a tight band. We compile the GPU index monthly from 67 providers and 17 GPU models, covering on-demand, spot, and 1-year reserved tiers. The chart shows the monthly median posted…

Insight

Jul 12

LLM VRAM Calculator for Self-Hosting

Self-hosting an LLM means running inference on hardware the operator controls rather than via a third-party API, which changes the cost, data control, and privacy profile. Whether a model runs at all depends on memory. The calculator estimates the VRAM or unified memory a model needs to run locally, based on the model, its precision,…

Benchmark

Jul 10

Benchmark of 40+ LLMs in Finance: Claude Fable 5 & GPT-5.6 Sol

We evaluated 40+ LLMs in finance on 238 hard questions from the FinanceReasoning benchmark (Tang et al.) to identify which models excel at complex financial reasoning tasks like statement analysis, forecasting, and ratio calculations. This subset targets the most challenging financial-reasoning tasks, assessing complex,…

Benchmark

Jul 3

Open Source Embedding Models Benchmark for RAG

We benchmarked 14 open-source embedding models, self-hosted on a single H100, across 500+ manually curated retrieval queries spanning legal contracts, customer support tech notes, and medical abstracts. NVIDIA Llama-Embed-Nemotron-8B leads in accuracy. Google’s EmbeddingGemma-300m runs roughly 4x cheaper than Nemotron, with a small accuracy loss. nDCG@3: Normalized discounted cumulative gain…

Benchmark

Jul 2

Compare Relational Foundation Models

We benchmarked SAP-RPT-1-OSS against gradient boosting (LightGBM, CatBoost) on 17 tabular datasets spanning the semantic-numeral spectrum: small/high-semantic tables, mixed business datasets, and large low-semantic numerical datasets. Our goal is to measure where a relational LLM’s pretrained semantic priors may provide advantages over traditional tree models and where they face challenges under scale or low-semantic structure.…

Enterprise Software

Benchmark

Jul 2

Email Archiving Software Benchmark

We provisioned a Microsoft 365 tenant, populated it with a 10,000-mail synthetic corpus and 1,700 attachments across 8 file-type subtypes, then benchmarked NinjaOne SaaS Archiver, Barracuda Cloud Archiving Service, Acronis Cyber Protect Cloud Email Archiving, and MailPiler on the same tenant against 10 dimensions covering ingestion, search, attachment recall, export, immutability, legal hold, audit, encryption,…

Enterprise Software

Benchmark

Jul 2

Top Serverless Functions: Vercel vs Azure vs AWS

Serverless functions enable developers to run code without having to manage a server. This allows them to focus on writing and deploying applications while infrastructure scaling and maintenance are handled automatically in the background. In this benchmark, we evaluated 7 popular cloud service providers following our methodology to test their serverless function performance. We measured…

Benchmark

Jul 2

Multimodal Embedding Models: Apple vs Meta vs OpenAI

Multimodal embedding models excel at identifying objects but struggle with relationships. Current models struggle to distinguish “phone on a map” from “map on a phone.” We benchmarked 7 leading models across MS-COCO and Winoground to measure this specific limitation. To ensure a fair comparison, we evaluated every model under identical conditions using NVIDIA A40 hardware…
