A structured study of ML model fragility across 5 independent failure axes on an e-commerce dataset.

Central thesis: a model can be statistically better while being systemically less reliable.
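
One way to read the thesis is to compare a model's average score with its spread across repeated runs. The sketch below is only an illustration: the per-seed accuracies are invented for this example and are not results from the study, and `summarize` is a hypothetical helper, not part of the repository.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    return mean(scores), stdev(scores)


# Hypothetical per-seed accuracies, invented purely for illustration.
simple_model = [0.80, 0.81, 0.80, 0.79, 0.80]
complex_model = [0.90, 0.74, 0.88, 0.76, 0.87]

simple_mean, simple_sd = summarize(simple_model)
complex_mean, complex_sd = summarize(complex_model)

# A higher mean ("statistically better") can coexist with a much larger
# spread across seeds ("systemically less reliable").
print(f"simple : mean={simple_mean:.3f} sd={simple_sd:.3f}")
print(f"complex: mean={complex_mean:.3f} sd={complex_sd:.3f}")
```

In this made-up case the second model wins on the mean but varies far more from seed to seed, which is the kind of trade-off the axes below are meant to expose.
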
| # | Axis | What varies | Frozen |
|---|---|---|---|
| 1 | Capacity | Model tier (LR → DT → RF → XGB → LGB) | Everything else |
| 2 | Fidelity | Label noise / feature noise levels | Model, features, seed |
| 3 | Stability | Random seeds / bootstrap draws | Model, data, features |
| 4 | Representation | PCA variance / top-k features | Model, noise, seed |
| 5 | Temporal | Train/test time window | Model, features, noise |
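
To make the "vary one thing, freeze the rest" idea concrete, here is a minimal sketch of how a label-noise sweep (Axis 2) could be set up for binary labels. It is an assumption about the general technique, not the repository's implementation: `flip_labels`, the noise rates, and the toy labels are all hypothetical.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of binary (0/1) labels with a fixed fraction flipped.

    Hypothetical helper: the seed is held constant so that only the
    noise level varies between runs.
    """
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


# Toy labels, for illustration only.
clean = [0, 1] * 50

# Sweep the noise level while the seed (and everything else) stays fixed.
for rate in (0.0, 0.1, 0.2):
    noisy = flip_labels(clean, rate, seed=42)
    changed = sum(a != b for a, b in zip(clean, noisy))
    print(f"noise_rate={rate:.1f} flipped={changed}")
```

Fixing the seed means any change in downstream metrics can be attributed to the noise level rather than to which rows happened to be flipped.
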
```bash
# 1. Clone
git clone https://dagshub.com/Y-R-A-V-R-5/FragileML.git
cd FragileML

# 2. Install dependencies
pip install -r requirements.txt

# 3. Add your dataset
# Place dataset.csv at: data/dataset.csv
# (file is gitignored, not tracked in the repo)

# 4. (Optional) connect DagsHub for experiment tracking
dagshub login
```

```bash
# Axis 1: full model sweep (default)
python scripts/run_axis.py

# Axis 2: label noise (default) or feature noise or both
python scripts/run_axis.py --axis fidelity --task C4
python scripts/run_axis.py --axis fidelity --fidelity-mode feature_noise
python scripts/run_axis.py --axis fidelity --fidelity-mode both --models LR RF XGB

# Axis 3: bootstrap (default) or seed variation or both
python scripts/run_axis.py --axis stability --task C4
python scripts/run_axis.py --axis stability --stability-mode seed
python scripts/run_axis.py --axis stability --stability-mode both

# Axis 4: PCA (default) or top-k or both
python scripts/run_axis.py --axis representation --task C4
python scripts/run_axis.py --axis representation --repr-mode topk
python scripts/run_axis.py --axis representation --repr-mode both

# Axis 5: temporal drift
python scripts/run_axis.py --axis temporal --task C4

# See plan without running
python scripts/run_axis.py --dry-run
python scripts/run_axis.py --axis stability --dry-run

# Track to DagsHub
python scripts/run_axis.py --track
python scripts/run_axis.py --axis fidelity --fidelity-mode both --track
```

| Alias | Full name | Tier |
|---|---|---|
| LR | Logistic Regression | 1 (linear) |
| LinReg | Linear Regression | 1 (linear) |
| DT | Decision Tree | 2 (single tree) |
| RF | Random Forest | 3 (ensemble) |
| XGB | XGBoost | 4 (boosting) |
| LGB | LightGBM | 4 (boosting) |
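
The alias table can be mirrored as a small lookup, for example to validate a list such as `LR RF XGB` and order it by tier. This is a sketch under assumptions: `MODEL_TABLE` and `resolve_models` are hypothetical names, and the real registry in `config/models.yaml` may be structured differently.

```python
# Alias -> (full name, tier), copied from the table above.
MODEL_TABLE = {
    "LR": ("Logistic Regression", 1),
    "LinReg": ("Linear Regression", 1),
    "DT": ("Decision Tree", 2),
    "RF": ("Random Forest", 3),
    "XGB": ("XGBoost", 4),
    "LGB": ("LightGBM", 4),
}


def resolve_models(aliases):
    """Validate aliases and return them ordered by tier (hypothetical helper)."""
    unknown = [a for a in aliases if a not in MODEL_TABLE]
    if unknown:
        raise ValueError(f"Unknown model alias(es): {', '.join(unknown)}")
    # sorted() is stable, so aliases within a tier keep their input order.
    return sorted(aliases, key=lambda a: MODEL_TABLE[a][1])


for alias in resolve_models(["XGB", "LR", "RF"]):
    full_name, tier = MODEL_TABLE[alias]
    print(f"{alias:<7}{full_name:<22}tier {tier}")
```
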
The dataset (data/dataset.csv) is not tracked in git (244 MB).
Place your dataset at data/dataset.csv before running any experiments.
Column schema is described in config/data.yaml.
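
Because the dataset is not in the repository, a quick check before launching experiments can save a failed run. The following is a minimal sketch; `find_dataset` is a hypothetical helper and not part of the project's code.

```python
from pathlib import Path


def find_dataset(root="."):
    """Return the path to data/dataset.csv under root, or None if absent."""
    path = Path(root) / "data" / "dataset.csv"
    return path if path.is_file() else None


dataset = find_dataset()
if dataset is None:
    print("Missing data/dataset.csv; place the file there before running experiments.")
else:
    size_mb = dataset.stat().st_size / 1e6
    print(f"Found {dataset} ({size_mb:.0f} MB)")
```

Run it from the repository root so that the relative `data/` path resolves correctly.
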
```text
FragileML/
├── config/
│   ├── data.yaml        # Dataset schema, tasks, preprocessing
│   ├── models.yaml      # Model registry (4 tiers, 6 models)
│   └── axes/            # One config per axis
├── src/
│   ├── axes/            # 5 axis runners + BaseAxis
│   ├── data/            # Loading, splitting, noise, representation
│   ├── metrics/         # Performance, stability, calibration, reporter
│   ├── models/          # Registry, evaluator
│   └── utils/           # Constants, logger, seed
├── scripts/
│   └── run_axis.py      # Main entry point
├── data/                # Dataset goes here (gitignored)
└── artifacts/           # Experiment outputs (gitignored)
```