Global Football Data Lake
An open 14-year football dataset, ML-ready.
OPEN SOURCE · 2025
A unified, deduplicated dataset of 367,530 matches across 102 leagues, covering 182K players and ~10M appearances. The data is enriched with Glicko-2 team ratings and odds, and is published in CSV and Parquet with full schema docs.
Stack
Python, pandas, Polars, PyArrow, Parquet, Glicko-2
Engineering highlights
- Normalised multiple sources (API-Football, Football-Data, The Odds API) into one relational schema with full entity relationships.
- Computed Glicko-2 strength ratings by chronologically replaying all match history through a Plackett-Luce variant.
- Published as CSV (1.1 GB) and Parquet (201 MB) for match prediction, goals modelling, and market research.
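
The chronological replay behind the ratings can be sketched roughly as follows. This is a minimal illustration of the standard Glicko-2 update as described in Glickman's published system notes, not the project's actual code: it does not attempt the Plackett-Luce variant mentioned above, whose details are not given here. The names `Rating`, `update`, and `replay`, the `(date, home, away, home_score)` tuple layout, the default values (1500 / 350 / 0.06, tau = 0.5), and the choice to treat every match as its own rating period are all assumptions made for the example.

```python
import math
from dataclasses import dataclass

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales
TAU = 0.5         # system constant limiting volatility change (assumed value)


@dataclass(frozen=True)
class Rating:
    r: float = 1500.0   # rating on the familiar Glicko scale
    rd: float = 350.0   # rating deviation
    vol: float = 0.06   # volatility


def _g(phi: float) -> float:
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def update(player: Rating, games, tau: float = TAU, eps: float = 1e-6) -> Rating:
    """Apply one rating period. `games` is a list of (opponent Rating, score),
    where score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    mu, phi = (player.r - 1500.0) / SCALE, player.rd / SCALE
    if not games:  # no games: only the deviation grows
        return Rating(player.r, SCALE * math.sqrt(phi ** 2 + player.vol ** 2), player.vol)

    v_inv = diff = 0.0
    for opp, score in games:
        mu_j, phi_j = (opp.r - 1500.0) / SCALE, opp.rd / SCALE
        g = _g(phi_j)
        e = 1.0 / (1.0 + math.exp(-g * (mu - mu_j)))  # expected score
        v_inv += g * g * e * (1.0 - e)
        diff += g * (score - e)
    v = 1.0 / v_inv
    delta = v * diff

    # Solve for the new volatility with the Illinois root-finding variant.
    a = math.log(player.vol ** 2)

    def f(x: float) -> float:
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex)
                / (2.0 * (phi ** 2 + v + ex) ** 2)) - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    vol = math.exp(A / 2.0)

    phi_star = math.sqrt(phi ** 2 + vol ** 2)
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * diff
    return Rating(SCALE * mu_new + 1500.0, SCALE * phi_new, vol)


def replay(matches) -> dict:
    """Replay (date, home, away, home_score) tuples in date order.
    Assumption: each match is treated as a one-game rating period, and both
    teams are updated from their pre-match ratings."""
    ratings = {}
    for _, home, away, score in sorted(matches, key=lambda m: m[0]):
        h = ratings.get(home, Rating())
        a = ratings.get(away, Rating())
        ratings[home] = update(h, [(a, score)])
        ratings[away] = update(a, [(h, 1.0 - score)])
    return ratings
```

Calling `replay([("2024-08-17", "A", "B", 1.0)])` returns a dict mapping each team name to its post-match `Rating`. Glickman's notes suggest rating periods that contain several games per competitor, so a production replay would more likely batch matches by week or matchday before calling `update`.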