AI Memory
AI memory allows models and agents to recall past interactions, adapt over time, and reason more effectively. We examined the most popular LLMs' ability to store information in both long-term and short-term memory, along with their context window capabilities.
VELC-Bench: Verification on Long Context Benchmark
VELC-Bench measures a model’s ability to locate a specific metric in its context, compare that metric’s value to a given claim, and then confirm or reject the claim. It tests fine-grained value matching under long-context conditions: the model must both retrieve the correct value and compare it precisely.
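The scoring side of a verification task like this can be sketched as follows. This is a minimal illustration, not VELC-Bench’s actual harness: the `VerificationItem` fields, the `confirm`/`reject` answer labels, the filler text, and the exact-match default are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class VerificationItem:
    """Hypothetical VELC-style item: a long context plus a claim about one metric."""
    context: str          # long input the model must search
    metric: str           # name of the metric the claim refers to
    claimed_value: float  # value asserted by the claim
    true_value: float     # value actually stated in the context


def expected_verdict(item: VerificationItem, rel_tol: float = 0.0) -> str:
    """Ground-truth label: 'confirm' if the claim matches the context, else 'reject'."""
    # With rel_tol=0.0 and abs_tol=0.0, math.isclose requires exact equality,
    # reflecting the "precise comparison" the task description calls for.
    if math.isclose(item.claimed_value, item.true_value, rel_tol=rel_tol, abs_tol=0.0):
        return "confirm"
    return "reject"


def score(item: VerificationItem, model_answer: str) -> bool:
    """Assumes the model is instructed to begin its answer with 'confirm' or 'reject'."""
    return model_answer.strip().lower().startswith(expected_verdict(item))


# Usage: the claim misstates the metric buried in a long context,
# so only an answer starting with "reject" earns credit.
filler = " ".join(f"Paragraph {i} discusses unrelated setup details." for i in range(500))
item = VerificationItem(
    context=filler + " The reported F1 score is 0.82. " + filler,
    metric="F1 score",
    claimed_value=0.28,
    true_value=0.82,
)
print(expected_verdict(item))                          # reject
print(score(item, "Reject: the context says 0.82."))  # True
```

Passing a nonzero `rel_tol` would turn the exact comparison into a tolerance-based one; whichever rule a benchmark uses, it needs to be fixed in advance, since it decides whether near-miss values count as matches.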
RELC-Bench: Retrieval on Long Context Benchmark
RELC-Bench measures a model’s ability to find and extract a specific numeric value from one or more documents in its context. It tests whether the model can recall and retrieve a specific fact it has just seen in the input.
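A retrieval item of this kind might be built and graded as below. This is a hypothetical sketch, not RELC-Bench’s published method: `build_context`, the filler documents, the planted fact, and the first-number extraction rule are assumptions chosen to keep the example small.

```python
import random
import re
from typing import List, Optional

# Matches integers or decimals, optionally with thousands separators ("4,250.5").
# The leading \b stops digits inside tokens such as "Q3" from matching.
NUMBER = re.compile(r"-?\b\d+(?:,\d{3})*(?:\.\d+)?")


def build_context(fact: str, n_docs: int, seed: int = 0) -> str:
    """Hypothetical item builder: hide one fact among n_docs filler documents."""
    rng = random.Random(seed)
    docs: List[str] = [f"Document {i}: routine meeting notes." for i in range(n_docs)]
    docs.insert(rng.randrange(n_docs + 1), fact)  # random position, reproducible via seed
    return "\n\n".join(docs)


def extract_number(answer: str) -> Optional[float]:
    """Return the first number in the model's answer, or None if there is none."""
    match = NUMBER.search(answer)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def score_retrieval(answer: str, expected: float) -> bool:
    """Exact-match grading of the extracted value against the planted value."""
    value = extract_number(answer)
    return value is not None and value == expected


context = build_context("Document X: Q3 revenue was 4,250 units.", n_docs=1000, seed=42)
print("4,250 units" in context)                                 # True
print(score_retrieval("Q3 revenue was 4,250 units.", 4250.0))  # True
print(score_retrieval("The documents do not say.", 4250.0))    # False
```

Taking the first number in the answer is a deliberately simple rule: a model that restates other figures before the target value would be scored as wrong, so a real grader may need a stricter answer format.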