Türkçe: README.tr.md
Python tool to download football match data from SofaScore public HTTP APIs, store it locally (JSON and CSV), and browse it through a terminal UI or a web dashboard.
This project is not affiliated with SofaScore. Use reasonable request rates and comply with applicable terms and laws.
- Leagues — Configure tournament IDs; optional remote search when adding leagues (web).
- Seasons & schedule — Fetch season lists and match lists; filter by league, season, date.
- Match details — Statistics, lineups, incidents, H2H, and related JSON slices; optional parallel fetching with progress and cancel (web).
- Web UI — Dashboard, leagues, schedule (with fetch wizard), match view, stats, settings (env-backed, tabs, backup/restore/clear), real-time scraper status (SSE).
- Terminal UI — Interactive menu for the same operations without the browser.
- Automation — Headless flags for CI/scripts (`--update-all`, `--fetch-mode`, `--league-id`, `--csv-export`, paths).
- Export — Processed "all matches" CSV and API export endpoints.
- Python 3.10+ (3.11+ recommended).
- Git — required for the one-line `curl | bash` installer (clones this repo); optional if you already extracted or cloned the project manually.
- Network access to SofaScore.
Official repository: github.com/tunjayoff/sofascore_scraper.
Linux / macOS / Git Bash
Already cloned:

```bash
chmod +x scripts/install.sh   # once
./scripts/install.sh
```

One-liner (clones tunjayoff/sofascore_scraper, creates `.venv`, installs dependencies, copies `.env`):

```bash
curl -fsSL https://raw.githubusercontent.com/tunjayoff/sofascore_scraper/main/scripts/install.sh | bash
```

- Optional first argument: target folder name (default `sofascore_scraper`), or set `SOFASCORE_SCRAPER_DIR`.
- To use another fork as the default clone source: `export SOFASCORE_SCRAPER_REPO=https://github.com/YOU/fork.git` before the `curl | bash`, or pass a full git URL as the first argument to `bash -s`:

```bash
curl ... | bash -s -- https://github.com/YOU/fork.git [folder]
```

- Override the built-in default URL only if needed: `SOFASCORE_SCRAPER_DEFAULT_REPO`.
Windows — PowerShell (clone is automatic if you are not already inside the repo):

```powershell
Set-ExecutionPolicy -Scope CurrentUser RemoteSigned   # if needed, once
Invoke-RestMethod https://raw.githubusercontent.com/tunjayoff/sofascore_scraper/main/scripts/install.ps1 | Invoke-Expression
```

Or after cloning:

```powershell
.\scripts\install.ps1
```

Explicit clone URL / folder:

```powershell
.\scripts\install.ps1 -RepoUrl https://github.com/tunjayoff/sofascore_scraper.git -InstallDir sofascore_scraper
```

From CMD: `scripts\install.bat`. Environment overrides: `SOFASCORE_SCRAPER_REPO`, `SOFASCORE_SCRAPER_DIR`, `SOFASCORE_SCRAPER_DEFAULT_REPO`.
Prerequisites: Git (for the one-liner / clone path), Python 3.10+ on PATH. The scripts print clear errors if git, python, venv, or pip install fails (e.g. missing python3-venv on Debian/Ubuntu).
```bash
git clone https://github.com/tunjayoff/sofascore_scraper.git
cd sofascore_scraper
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Copy environment defaults and adjust:

```bash
cp .env.example .env
```

See `.env.example` for all keys. Common ones:
| Variable | Purpose |
|---|---|
| `DATA_DIR` | Root folder for stored data (default `data`). The web app reads this via `ConfigManager`. |
| `LANGUAGE` | `en` or `tr` for UI strings. |
| `MAX_CONCURRENT` | Cap on parallel detail requests. |
| `USE_PROXY` / `PROXY_URL` | Optional HTTP proxy. |
| `FETCH_ONLY_FINISHED` | Prefer finished matches when fetching lists. |
| `RATE_LIMIT_*` / `SERVER_ERROR_*` | Circuit-breaker thresholds when many errors occur. |
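A minimal `.env` using the keys above might look like this (the values are illustrative, not recommendations):

```
DATA_DIR=data
LANGUAGE=en
MAX_CONCURRENT=4
USE_PROXY=false
FETCH_ONLY_FINISHED=true
```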
Tuning for the web UI (timeouts, retries, logging) is exposed under Settings; writing settings updates .env.
One line per league: numeric SofaScore unique tournament ID and a display name (format created/maintained by the app). The ID appears in SofaScore tournament URLs (e.g. .../premier-league/17 → 17).
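Pulling the trailing ID out of a tournament URL is easy to automate; a small sketch (a hypothetical helper, not part of this project):

```python
import re

def tournament_id(url: str) -> int:
    """Return the trailing numeric tournament ID from a SofaScore tournament URL."""
    m = re.search(r"/(\d+)/?$", url)
    if m is None:
        raise ValueError(f"no trailing numeric ID in: {url!r}")
    return int(m.group(1))

# tournament_id(".../premier-league/17") == 17
```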
CLI override:

```bash
python main.py --config /path/to/leagues.txt --data-dir /path/to/data
```

`--config` / `--data-dir` apply to interactive and headless modes. The web server loads the singleton `ConfigManager` from the project `.env` (`DATA_DIR`, etc.); align paths so the web UI and CLI see the same data if you use both.
Web (recommended for most users)
- Finish Installation and Configuration (`pip install`, `cp .env.example .env`). Optionally set `LANGUAGE=en` or `tr`, and `DATA_DIR` if you want data somewhere other than `./data`.
- Start the server: `python main.py --web` and open `http://127.0.0.1:8000`.
- Leagues — Add at least one tournament: use search (SofaScore) or enter the numeric tournament ID from the SofaScore URL. Save.
- Schedule — Choose a league (and season if needed). Use Fetch (wizard) for a guided run: pick leagues, seasons, and whether you want a full sync or details only. Or use the buttons for broader one-shot updates.
- While data is downloading, a progress card shows status; you can usually still navigate the site. If something stays stuck, check Settings → performance (concurrency, timeouts) and the logs.
- Click a match row to open the match page (stats, lineups, etc.). If details are missing, use the actions on that page or run another details fetch from Schedule.
- Stats summarises disk usage and coverage; Settings edits `.env` (tabs for general, network, performance, data tools). Use backup/restore before risky clean operations.
Terminal menu
Run `python main.py` and work through the numbered menus: manage leagues, refresh seasons, fetch match lists, fetch details, run stats, or export CSV. The flow matches the web conceptually but without the wizard; use the prompts to choose leagues and options.
Tips
- First-time full fetch for a big league can take a long time; start with one league and a few recent seasons from the wizard.
- If you hit rate limits or many errors, lower `MAX_CONCURRENT` and raise waits slightly in Settings; avoid `--ignore-rate-limit` unless you know what you are doing.
- For the same dataset in web and CLI/headless, keep `DATA_DIR` in `.env` aligned with `--data-dir` when you use the command line.
```bash
python main.py        # terminal UI
python main.py --web  # web dashboard
```

Default URL: `http://127.0.0.1:8000` (the server binds `0.0.0.0:8000`). Health check: `GET /health`.
Background jobs report status via GET /api/scrape/status and GET /api/scrape/stream (SSE). Heavy API work runs off the asyncio event loop so the UI stays responsive during long fetches.
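The SSE stream uses the standard `field: value` wire format; a minimal event-block parser (generic SSE handling, not tied to this app's event schema, which is not specified here) could look like:

```python
def parse_sse_event(block: str) -> dict[str, str]:
    """Parse one Server-Sent Events block of "field: value" lines into a dict."""
    event: dict[str, str] = {}
    for line in block.splitlines():
        if not line or line.startswith(":"):  # skip blank lines and SSE comments
            continue
        field, _, value = line.partition(":")
        event[field.strip()] = value.lstrip()
    return event
```

Real SSE clients also concatenate repeated `data:` lines and track `id:`; this sketch keeps only the last value per field.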
At least one of `--update-all` or `--csv-export` is required with `--headless`; otherwise the process exits with code `2`.
| Flag | Meaning |
|---|---|
| `--headless` | No terminal menu |
| `--update-all` | Run a fetch pipeline |
| `--fetch-mode full` | Seasons + match lists + details (default) |
| `--fetch-mode details` | Match details only (uses existing schedule/summary CSVs) |
| `--league-id ID` | Limit `--update-all` to one configured league |
| `--csv-export` | Build/export the processed CSV dataset |
| `--ignore-rate-limit` | Disable the circuit breaker (use with care) |
Examples:

```bash
python main.py --headless --update-all
python main.py --headless --update-all --fetch-mode details --league-id 52
python main.py --headless --csv-export --data-dir ./data
```

Exit codes: `0` success (or `APP_EXIT_CODE` if set by the scraper), `1` unexpected error, `2` headless with no action requested.
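Headless mode pairs naturally with a scheduler; for example, a nightly crontab entry might look like this (the paths are placeholders for your checkout and virtualenv):

```
0 3 * * * cd /path/to/sofascore_scraper && .venv/bin/python main.py --headless --update-all >> fetch.log 2>&1
```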
Run `python main.py --help` for the full flag reference.

Typical structure:
```
data/
├── seasons/          # Season metadata per league
├── matches/          # Match list / summary CSVs by league & season
├── match_details/    # Per-match JSON folders (basic, stats, lineups, …)
│   └── processed/    # Aggregated CSV exports
└── datasets/         # Reserved / auxiliary
```
Exact paths may vary slightly by league naming and migrations.
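As a sketch of how you might take stock of a tree shaped like the layout above (the subfolder names are taken from that listing and may differ on your machine):

```python
from pathlib import Path

# Subfolder names follow the layout shown above; adjust if your tree differs.
SUBDIRS = ("seasons", "matches", "match_details", "datasets")

def inventory(data_dir: str) -> dict[str, int]:
    """Count files under each known subfolder (0 if a folder is absent)."""
    root = Path(data_dir)
    return {
        name: sum(1 for p in (root / name).rglob("*") if p.is_file())
        if (root / name).exists() else 0
        for name in SUBDIRS
    }
```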
All routes are prefixed with /api unless noted.
- Leagues: list, create, delete, search (local / remote), seasons, refresh seasons, missing-details.
- Matches: paginated schedule, single-match JSON, on-demand fetch for one match.
- Scraper: `POST /api/fetch` (body: `mode` = `full` or `details`, optional league and wizard `selections`), cancel, status, SSE stream.
- Dashboard / stats / settings: JSON for the web UI; settings mirror `.env` keys.
- Data: backup zip, clear scopes, CSV export.
OpenAPI: GET /docs when the server is running.
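As an illustration, a request to start a fetch job could be assembled like this; the `mode` field comes from the endpoint list above, while any other body fields (league selection, wizard `selections`) are omitted because their exact shape is app-defined — check `GET /docs` for the real schema:

```python
import json
import urllib.request

def build_fetch_request(mode: str = "full",
                        base: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build (but do not send) a POST /api/fetch request with a JSON body."""
    if mode not in ("full", "details"):
        raise ValueError(f"unknown mode: {mode!r}")
    return urllib.request.Request(
        f"{base}/api/fetch",
        data=json.dumps({"mode": mode}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(build_fetch_request("details"))` while the server is running.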
Run the web app with auto-reload (as started by `main.py --web`):

```bash
uvicorn src.web.app:app --reload --host 0.0.0.0 --port 8000
```

Contributions are welcome. You can help in several ways:
- Bug reports — Open an issue with steps to reproduce, expected vs actual behaviour, OS/Python version, and relevant `.env` flags (redact secrets).
- Feature ideas — Suggest use cases and constraints; maintainers may triage and discuss scope in the issue.
- Pull requests — Fork the repo, use a focused branch, keep changes small and on-topic, and describe what and why in the PR. Match existing code style; avoid drive-by refactors. If you touch user-visible text, consider updating `locales/en.json` and `locales/tr.json`.
- Docs & translations — Improvements to these READMEs or locale strings are appreciated.
There is no separate contributor agreement beyond the MIT license on your submissions. Be respectful in issues and reviews. If you are unsure whether an idea fits, open an issue first.
MIT — see LICENSE.