RELC-Bench (Retrieval on Long Context Benchmark) measures a model’s ability to find and extract a specific numeric value from one or more documents in its context. It tests whether the model can retrieve a specific fact it has just seen in the input.
Results
Methodology
Question format
A natural-language question asking for one numeric metric. Example:
Q: What was the Revenue for Q1 2026 Adobe (ADBE)?
Expected: $6.40 billion
Data source
The script parses the Takeaways section of each Motley Fool earnings transcript and extracts every numeric metric. For each metric, it verifies that the number appears verbatim in the transcript body after the Takeaways section (the actual conference call text), so the model has to read the real conversation rather than the summary bullet. The summary bullets are then removed from the transcript text.
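The verbatim check described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual script: the regular expression `NUMBER_RE` and the function name `verbatim_metrics` are assumptions, and the real extraction pattern may differ.

```python
import re

# Hypothetical pattern (assumption): optional "$", digits with optional
# thousands separators and decimals, optional "%" suffix.
NUMBER_RE = re.compile(r"\$?\d+(?:,\d{3})*(?:\.\d+)?%?")


def verbatim_metrics(takeaways: str, body: str) -> list[str]:
    """Return numeric strings from the Takeaways text that also occur
    verbatim in the conference call body, in order of first appearance."""
    kept: list[str] = []
    for token in NUMBER_RE.findall(takeaways):
        # Plain substring test; a stricter check could require word boundaries.
        if token in body and token not in kept:
            kept.append(token)
    return kept


takeaways = "- Revenue: $6.40 billion, up 12% year over year\n- Operating margin: 47.4%"
body = "We delivered revenue of $6.40 billion this quarter, representing 12% growth."
kept = verbatim_metrics(takeaways, body)
print(kept)
```

In this made-up example the operating margin figure is dropped because it never appears in the call text, so no question would be generated from it.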
Scoring rule
- Each item has a list of target values; the first is the primary target (the headline answer to the question)
- Score = 1.0 if the primary target matches any number in the prediction
- Score = 0.0 otherwise
- Refusals (“I don’t know”) score 0.0
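A minimal sketch of the scoring rule, under stated assumptions: numbers are compared after stripping thousands separators and converting to floats, so "6.4" and "6.40" count as the same value. The source does not say whether matching is numeric or string-based, and the names `score` and `_numbers` are hypothetical.

```python
import re

# Assumption: a "number" is digits with optional thousands separators and decimals.
NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def _numbers(text: str) -> set[float]:
    """Extract all numbers from text as floats, ignoring thousands separators."""
    return {float(tok.replace(",", "")) for tok in NUMBER_RE.findall(text)}


def score(prediction: str, targets: list[str]) -> float:
    """Return 1.0 if the primary target (targets[0]) matches any number
    in the prediction, else 0.0. Only the primary target is scored."""
    primary = _numbers(targets[0])
    if not primary:
        return 0.0
    return 1.0 if primary & _numbers(prediction) else 0.0


targets = ["$6.40 billion", "6.4"]
print(score("Adobe reported revenue of $6.40 billion.", targets))
print(score("I don't know", targets))
```

A refusal contains no numbers, so it falls through to 0.0 without special handling.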
What good performance looks like
Phase 1: ≥ 85% (the model reliably finds metrics in a single document).
Phase 2: ≥ 90% (the model navigates to the target in a haystack without being distracted).
Scores that stay flat across target positions indicate true long-context capability; scores that decline with depth indicate the “lost in the middle” effect.
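One way to check for position invariance is to group item scores by where the target sits in the context and compare the per-bucket averages. The sketch below assumes each result carries a relative depth in [0, 1]; that field, the bucket count, and the function name `accuracy_by_depth` are illustrative assumptions, not part of the published benchmark.

```python
from collections import defaultdict


def accuracy_by_depth(
    results: list[tuple[float, float]], buckets: int = 5
) -> dict[int, float]:
    """Average score per depth bucket.

    results: (depth, score) pairs, where depth is the assumed relative
    position of the target in the context (0.0 = start, 1.0 = end).
    """
    grouped: dict[int, list[float]] = defaultdict(list)
    for depth, item_score in results:
        # Clamp so that depth == 1.0 lands in the last bucket.
        idx = min(int(depth * buckets), buckets - 1)
        grouped[idx].append(item_score)
    return {idx: sum(s) / len(s) for idx, s in sorted(grouped.items())}


# Made-up scores for illustration only.
results = [(0.05, 1.0), (0.1, 1.0), (0.5, 0.0), (0.55, 1.0), (0.95, 1.0), (1.0, 0.0)]
by_depth = accuracy_by_depth(results)
print(by_depth)
```

Roughly equal averages across buckets would suggest position-invariant retrieval, while a dip in the middle buckets would point to the “lost in the middle” pattern.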
Item count
100 direct-recall items spread across 14 transcripts.
Cite this research
@misc{alper2026,
author = {Alper, Şevval and Kalelioğlu, Berk},
title = {{RELC-Bench: Retrieval on Long Context Benchmark}},
year = {2026},
month = may,
howpublished = {\url{https://aimultiple.com/ai-memory}},
note = {AIMultiple. Retrieved May 26, 2026}
}