RELC-Bench: Retrieval on Long Context Benchmark

Şevval Alper
updated on May 26, 2026

RELC-Bench (Retrieval on Long Context Benchmark) measures a model’s ability to find and extract a specific numeric value from one or more documents in its context window. In other words, it tests whether the model can recall and retrieve a specific fact it has just seen in the input.

Results

Methodology

Question format

A natural-language question asking for one numeric metric. Example:

Q: What was the Revenue for Q1 2026 Adobe (ADBE)?
Expected: $6.40 billion
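One way to picture an item is as a record that pairs the question with its accepted target values, the first of which is the primary target described in the scoring rule. The sketch below assumes the question is filled in from a template inferred from the single example above; the `RelcItem` class, its field names, and the template are hypothetical, not the benchmark's published schema.

```python
from dataclasses import dataclass, field


@dataclass
class RelcItem:
    """Hypothetical RELC-Bench item: one question plus its accepted target values."""

    company: str
    ticker: str
    period: str
    metric: str
    # targets[0] is the primary target, i.e. the headline answer to the question.
    targets: list[str] = field(default_factory=list)

    @property
    def question(self) -> str:
        # Assumed template, reverse-engineered from the published example question.
        return f"What was the {self.metric} for {self.period} {self.company} ({self.ticker})?"

    @property
    def primary_target(self) -> str:
        return self.targets[0]


item = RelcItem("Adobe", "ADBE", "Q1 2026", "Revenue", ["$6.40 billion"])
print(item.question)        # What was the Revenue for Q1 2026 Adobe (ADBE)?
print(item.primary_target)  # $6.40 billion
```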

Data source

A generation script parses the Takeaways section of each Motley Fool earnings-call transcript and extracts every numeric metric. For each metric, the script verifies that the number also appears verbatim in the transcript body after the Takeaways section (the actual conference-call text). The summary bullets themselves are removed from the texts, so the model has to read the real conversation instead of a summary bullet.
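The verbatim check can be sketched as follows, assuming the Takeaways bullets and the call body have already been split apart (how the real script locates the Takeaways section is not described here). The function names and the number pattern are hypothetical. Comparing whole number tokens rather than raw substrings avoids counting `6.4` as present merely because `6.45` appears in the body.

```python
import re

# Hypothetical pattern for numeric tokens such as "$6.40", "11%", or "1,234".
# The lookbehind skips digits glued to letters, e.g. the "1" in "Q1".
NUMBER_RE = re.compile(r"(?<!\w)\$?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?%?")


def extract_numbers(text: str) -> list[str]:
    """Return numeric tokens in order of appearance, exactly as written."""
    return NUMBER_RE.findall(text)


def verified_metrics(takeaways: list[str], body: str) -> list[tuple[str, str]]:
    """Keep (bullet, number) pairs whose number also appears verbatim in the call body."""
    body_tokens = set(extract_numbers(body))
    kept = []
    for bullet in takeaways:
        for number in extract_numbers(bullet):
            if number in body_tokens:
                kept.append((bullet, number))
    return kept


bullets = ["Revenue reached $6.40 billion, up 11%.", "EPS was $5.08."]
body = "Total revenue was $6.40 billion in the quarter, up 11% year over year."
print(verified_metrics(bullets, body))
```

Here the EPS bullet is dropped because `$5.08` never appears in the body, while both numbers from the revenue bullet survive.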

Scoring rule

  • Each item has a list of target values; the first is the primary target (the headline answer to the question)
  • Score = 1.0 if the primary target matches any number in the prediction
  • Score = 0.0 otherwise
  • Refusals (“I don’t know”) score 0.0
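A minimal scorer consistent with these rules might look like the sketch below. The page does not define what counts as a match, so this version assumes numeric equality after ignoring currency symbols and commas and expanding a trailing scale word (thousand, million, billion, trillion); the function names, the scale table, and the `rel_tol` tolerance are all assumptions.

```python
import math
import re

# Assumed scale words; the benchmark's actual matching rules are not published here.
SCALES = {"thousand": 1e3, "million": 1e6, "billion": 1e9, "trillion": 1e12}
NUM_RE = re.compile(
    r"(\d[\d,]*(?:\.\d+)?)(?:\s*(thousand|million|billion|trillion)\b)?",
    re.IGNORECASE,
)


def parse_numbers(text: str) -> list[float]:
    """Return every number in `text` as a float, applying a trailing scale word if present."""
    values = []
    for digits, scale in NUM_RE.findall(text):
        value = float(digits.replace(",", ""))
        if scale:
            value *= SCALES[scale.lower()]
        values.append(value)
    return values


def score(primary_target: str, prediction: str, rel_tol: float = 1e-6) -> float:
    """1.0 if the primary target's value matches any number in the prediction, else 0.0."""
    targets = parse_numbers(primary_target)
    if not targets:
        return 0.0
    target = targets[0]
    # A refusal such as "I don't know" yields no numbers, so any() is False and it scores 0.0.
    hit = any(math.isclose(target, p, rel_tol=rel_tol) for p in parse_numbers(prediction))
    return 1.0 if hit else 0.0


print(score("$6.40 billion", "Adobe reported revenue of $6.4 billion in Q1 2026."))
print(score("$6.40 billion", "I don't know."))
```

Under this assumption a refusal needs no special case. A prediction of `$6.40` with no scale word would also score 0.0 here, a deliberately strict choice that the real scorer may or may not share.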

What good performance looks like

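Phase 1 (single document): ≥ 85% means the model reliably finds metrics within one transcript.
Phase 2 (haystack): ≥ 90% means the model navigates to the target document in a haystack of other documents without being distracted.
Position-invariant scores, which stay flat wherever the target sits in the context, indicate true long-context capability; scores that decline at certain depths indicate the “lost in the middle” effect.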
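The phase thresholds and the depth analysis can be sketched as below. The page does not describe how the haystack is assembled, so `build_haystack` assumes the target transcript is concatenated with distractor transcripts at a relative depth between 0.0 (start of context) and 1.0 (end); all function names, the blank-line separator, and the best-minus-worst spread metric are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Pass marks quoted in the text above, expressed as fractions of items scored 1.0.
THRESHOLDS = {"phase1": 0.85, "phase2": 0.90}


def meets_threshold(phase: str, accuracy: float) -> bool:
    return accuracy >= THRESHOLDS[phase]


def build_haystack(target_doc: str, distractors: list[str], depth: float) -> str:
    """Insert target_doc among distractors at relative depth (0.0 = first, 1.0 = last)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must lie in [0.0, 1.0]")
    index = round(depth * len(distractors))
    docs = distractors[:index] + [target_doc] + distractors[index:]
    return "\n\n".join(docs)


def accuracy_by_depth(results: list[tuple[float, float]]) -> dict[float, float]:
    """Average per-item scores grouped by the depth at which the target was placed."""
    buckets: dict[float, list[float]] = defaultdict(list)
    for depth, score in results:
        buckets[depth].append(score)
    return {depth: mean(scores) for depth, scores in sorted(buckets.items())}


def depth_spread(by_depth: dict[float, float]) -> float:
    """Best minus worst depth accuracy; a value near 0.0 suggests position invariance."""
    return max(by_depth.values()) - min(by_depth.values())


by_depth = accuracy_by_depth([(0.0, 1.0), (0.0, 1.0), (0.5, 0.0), (0.5, 1.0), (1.0, 1.0)])
print(by_depth)                # {0.0: 1.0, 0.5: 0.5, 1.0: 1.0}
print(depth_spread(by_depth))  # 0.5
```

In this toy run the mid-context items score lower than those at either end, which is the pattern the text describes as “lost in the middle.”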

Item count

100 direct-recall items spread across 14 transcripts.

Cite this research

Şevval Alper and Berk Kalelioğlu (2026) - "RELC-Bench: Retrieval on Long Context Benchmark". Published online at AIMultiple.com. Retrieved May 26, 2026, from: https://aimultiple.com/ai-memory [Online Resource]

Alper, Ş., & Kalelioğlu, B. (2026, May 26). RELC-Bench: Retrieval on Long Context Benchmark. AIMultiple. https://aimultiple.com/ai-memory

@misc{alper2026,
  author       = {Alper, Şevval and Kalelioğlu, Berk},
  title        = {{RELC-Bench: Retrieval on Long Context Benchmark}},
  year         = {2026},
  month        = may,
  howpublished = {\url{https://aimultiple.com/ai-memory}},
  note         = {AIMultiple. Retrieved May 26, 2026}
}
Şevval Alper
AI Researcher
Şevval is an AIMultiple AI researcher specializing in LLMs, AI agents and quantum technologies.
Technically reviewed by
Berk Kalelioğlu
AI Researcher
Berk is an AI Researcher at AIMultiple, focusing on agentic AI systems and language models.