Eremie HQ

Global Football Data Lake

An open 14-year football dataset, ML-ready.

★ OPEN SOURCE · 2025

A unified, deduplicated dataset of 367,530 matches across 102 leagues with 182K players and ~10M appearances, enriched with Glicko-2 team ratings and odds, published in CSV and Parquet with full schema docs.

Stack

Python · pandas · Polars · PyArrow · Parquet · Glicko-2

Engineering highlights

  • Normalised multiple sources (API-Football, Football-Data, The Odds API) into one relational schema with full entity relationships.
  • Computed Glicko-2 strength ratings by chronologically replaying all match history through a Plackett-Luce variant.
  • Published as CSV (1.1 GB) and Parquet (201 MB) for use in match prediction, goals modelling, and market research.
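
The chronological replay behind the ratings can be sketched roughly as follows. This is a minimal illustration using the textbook Glicko-2 update, not the project's actual code: it does not attempt the Plackett-Luce variant mentioned above, and it assumes each match is its own rating period. The names `Rating`, `update` and `replay`, the tuple layout of a match, and the value of `TAU` are all hypothetical.

```python
import math
from dataclasses import dataclass

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales
TAU = 0.5         # volatility constraint (assumed value, not from the project)


@dataclass(frozen=True)
class Rating:
    r: float = 1500.0   # rating
    rd: float = 350.0   # rating deviation
    vol: float = 0.06   # volatility


def _g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def update(player, results, tau=TAU):
    """One Glicko-2 rating period.

    results: non-empty list of (opponent Rating, score), score in {1, 0.5, 0}.
    """
    mu, phi = (player.r - 1500.0) / SCALE, player.rd / SCALE
    v_inv = total = 0.0
    for opp, score in results:
        g = _g(opp.rd / SCALE)
        e = 1.0 / (1.0 + math.exp(-g * (mu - (opp.r - 1500.0) / SCALE)))
        v_inv += g * g * e * (1.0 - e)
        total += g * (score - e)
    v = 1.0 / v_inv
    delta = v * total

    # Solve for the new volatility with the Illinois root-finding iteration.
    a = math.log(player.vol ** 2)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta ** 2 - phi ** 2 - v - ex)
        return num / (2.0 * (phi ** 2 + v + ex) ** 2) - (x - a) / tau ** 2

    lo = a
    if delta ** 2 > phi ** 2 + v:
        hi = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        hi = a - k * tau
    f_lo, f_hi = f(lo), f(hi)
    while abs(hi - lo) > 1e-6:
        mid = lo + (lo - hi) * f_lo / (f_hi - f_lo)
        f_mid = f(mid)
        if f_mid * f_hi < 0:
            lo, f_lo = hi, f_hi
        else:
            f_lo /= 2.0
        hi, f_hi = mid, f_mid
    vol = math.exp(lo / 2.0)

    phi_star = math.sqrt(phi ** 2 + vol ** 2)
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * total
    return Rating(SCALE * mu_new + 1500.0, SCALE * phi_new, vol)


def replay(matches):
    """matches: iterable of (iso_date, home, away, home_goals, away_goals)."""
    ratings = {}
    for _, home, away, hg, ag in sorted(matches, key=lambda m: m[0]):
        h = ratings.get(home, Rating())
        w = ratings.get(away, Rating())
        score = 1.0 if hg > ag else 0.0 if hg < ag else 0.5
        # Both sides are updated from the pre-match snapshots.
        ratings[home] = update(h, [(w, score)])
        ratings[away] = update(w, [(h, 1.0 - score)])
    return ratings
```

Sorting by ISO date string before the loop is what makes the replay chronological, so each rating only ever reflects matches played earlier. As a sanity check, `update` should reproduce the worked example in Glickman's Glicko-2 description (a 1500/200/0.06 player facing three opponents ends near 1464.06 with RD 151.52).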
Eremie Gillowei · Preston, UK
eremiehq.com