LLM-Compressive:
Longitudinal Evaluation of LLMs via Data Compression

Paper  /  Code  /  知乎

Outline

  1. Intro
  2. Issues
  3. Benchmark Performance
  4. Context Length Performance

1. Intro:

LLM-Compressive evaluates LLMs via data compression on data collected every month from 2017 to 2024.

Current data sources include Code, Wikipedia, Math, arXiv, BBC News, Images, and Audio.

All y-axes represent compression ratio (%, lower is better). Models that maintain a constant compression ratio over time demonstrate good generalization, while models whose ratio degrades over time demonstrate overfitting.
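
As a rough illustration (not the repository's exact pipeline), the compression ratio for an LLM can be estimated from the total negative log2-likelihood it assigns to the data, treated as an arithmetic-coding length; the model name below is only an example.

```python
# Minimal sketch: estimate compressed-size / original-size (%) for a text
# using a Hugging Face causal LM. Assumes the ratio is bits from the model's
# log-likelihood (arithmetic-coding length) over raw UTF-8 bytes.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # illustrative choice; any causal LM works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def compression_ratio(text: str) -> float:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == input_ids, HF returns the mean cross-entropy (nats)
        # over the predicted tokens (sequence length minus one).
        loss = model(input_ids, labels=input_ids).loss.item()
    total_bits = loss * (input_ids.shape[1] - 1) / math.log(2)
    compressed_bytes = total_bits / 8
    original_bytes = len(text.encode("utf-8"))
    return 100.0 * compressed_bytes / original_bytes

print(f"{compression_ratio('The quick brown fox jumps over the lazy dog.'):.1f}%")
```

A flat curve of this ratio across monthly slices indicates the model generalizes to data beyond its training cutoff, while a rising curve indicates overfitting to earlier data.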

2. Issues:

If you run into problems or want to request results for a new model, please head to our project page and open an issue.

3. Benchmark Performance:

4. Context Length Performance: