AI Memory

AI memory allows models and agents to recall past interactions, adapt over time, and reason more effectively. We examined the most popular LLMs' ability to store information in both long-term and short-term memory, along with their context window capabilities.

VELC-Bench: Verification on Long Context Benchmark

AI Memory

Benchmark

Jul 22

VELC-Bench measures a model’s ability to locate a specific metric in context, compare its value to a claim, and confirm or reject that claim. This tests fine-grained value matching under long-context conditions: the model must both retrieve the value and perform a precise comparison. The models are tested in the following context windows: claude-fable-5 scores 90.0% on verify…
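As a rough illustration of the verification task described above (locate a metric, compare it to a claim, confirm or reject), here is a minimal Python sketch. It is not the benchmark's actual harness: the function names `find_metric` and `verify_claim`, the regex-based lookup, and the tolerance parameter are all assumptions made for illustration.

```python
import re
from typing import Optional


def find_metric(context: str, metric: str) -> Optional[float]:
    """Return the first number that follows `metric` in the context, if any.

    Hypothetical helper: a real long-context item would be answered by the
    model itself, not by a regex.
    """
    pattern = re.escape(metric) + r"\D*?(-?\d+(?:\.\d+)?)"
    match = re.search(pattern, context)
    return float(match.group(1)) if match else None


def verify_claim(context: str, metric: str, claimed: float, tol: float = 1e-9) -> bool:
    """Confirm the claim only if the metric is found and matches within `tol`."""
    value = find_metric(context, metric)
    return value is not None and abs(value - claimed) <= tol


# Made-up context for illustration only.
context = "Q3 report: latency_ms = 12.5, error_rate = 0.03"
confirmed = verify_claim(context, "error_rate", 0.03)   # claim matches the context
rejected = verify_claim(context, "latency_ms", 12.4)    # claim is off by 0.1
```

The sketch separates retrieval from comparison because the excerpt names both as required steps; a claim about a metric that cannot be found is rejected rather than confirmed.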

AI Memory

Benchmark

Jul 7

RELC-Bench: Retrieval on Long Context Benchmark

RELC-Bench (Retrieval on Long Context Benchmark) measures a model’s ability to find and extract a specific numeric value from one or more documents within its context. It tests whether the model can remember and retrieve a specific fact it has just seen in the input. claude-fable-5 scores 97.0% on the 100 direct-recall items,…
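To make the idea of a percentage score over numeric-retrieval items concrete, the following Python sketch grades free-text answers against expected values. The grading rule is an assumption: the source does not say how answers are parsed or matched, and `extract_number` and `accuracy` are hypothetical names.

```python
import re
from typing import Optional, Sequence


def extract_number(answer: str) -> Optional[float]:
    """Pull the first number out of a free-text answer (thousands commas removed)."""
    match = re.search(r"-?\d+(?:\.\d+)?", answer.replace(",", ""))
    return float(match.group()) if match else None


def accuracy(answers: Sequence[str], expected: Sequence[float]) -> float:
    """Percentage of answers whose extracted number equals the expected value.

    Assumed rule: exact numeric match; an answer with no number counts as wrong.
    """
    if not expected or len(answers) != len(expected):
        raise ValueError("answers and expected must be non-empty and equal in length")
    correct = sum(1 for a, e in zip(answers, expected) if extract_number(a) == e)
    return 100.0 * correct / len(expected)


# Made-up answers for illustration only.
score = accuracy(
    ["The value is 42.", "1,250 units", "unknown", "about 7.5"],
    [42, 1250, 3, 7.0],
)
```

Under this assumed rule, a score such as 97.0% on 100 items would correspond to 97 exact matches.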