A comprehensive, hands-on curriculum for mastering Machine Learning with Python
From mathematical foundations to production deployment
Getting Started • Learning Path • Architecture • Technologies
- Why This Project?
- Project Goals
- System Architecture
- Learning Path
- Technology Stack
- Project Timeline
- Getting Started
- Project Structure
- Current Progress
Learning machine learning is challenging for several reasons:
| Challenge | Impact | Our Solution |
|---|---|---|
| Fragmented Resources | Learners jump between tutorials without cohesion | Unified, progressive curriculum |
| Theory-Practice Gap | Math concepts don't connect to code | Every concept has implementation |
| No Production Focus | Tutorials don't cover real-world deployment | End-to-end projects with deployment |
| Outdated Content | Many resources use deprecated libraries | Modern stack (PyTorch 2.0+, Python 3.12) |
| Missing Testing | No emphasis on code quality | TDD approach with pytest |
This study guide provides a structured, progressive path from Python basics to deploying production ML systems:
flowchart LR
subgraph Foundation["Foundation"]
A[NumPy]
B[Pandas]
C[Visualization]
end
subgraph Classical["Classical ML"]
D[Supervised]
E[Unsupervised]
end
subgraph Deep["Deep Learning"]
F[Neural Nets]
G[CNNs]
H[RNNs]
end
subgraph Applied["Applied"]
I[NLP]
J[Computer Vision]
K[Projects]
end
Foundation --> Classical
Classical --> Deep
Deep --> Applied
style Foundation fill:#1a1a2e,stroke:#00d4ff,color:#fff
style Classical fill:#1a1a2e,stroke:#00ff88,color:#fff
style Deep fill:#1a1a2e,stroke:#ff6b6b,color:#fff
style Applied fill:#1a1a2e,stroke:#ffd93d,color:#fff
mindmap
root((ML Mastery))
Foundations
NumPy Arrays
Pandas DataFrames
Data Visualization
Statistics
Classical ML
Regression
Classification
Clustering
Dimensionality Reduction
Deep Learning
Neural Networks
CNNs
RNNs/LSTMs
Transformers
Applications
NLP
Computer Vision
Time Series
Deployment
| Metric | Target | Measurement |
|---|---|---|
| Notebooks Completed | 50+ | Interactive Jupyter notebooks |
| Unit Tests | 90%+ coverage | pytest with coverage reports |
| Projects Built | 5+ end-to-end | From data to deployment |
| Code Quality | 100% type-hinted | mypy + pylint passing |
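As a rough illustration of the type-hinting and testing targets in the table above, the sketch below pairs a small annotated helper with a pytest-style test. The names `normalize` and `test_normalize_bounds` are hypothetical and are not taken from the repository.

```python
from __future__ import annotations


def normalize(values: list[float]) -> list[float]:
    """Scale values linearly into the range [0, 1].

    Hypothetical helper, used only to illustrate type hints plus a test.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_normalize_bounds() -> None:
    """pytest collects functions named test_*; plain asserts are enough."""
    assert normalize([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
    assert normalize([]) == []
    assert normalize([3.0, 3.0]) == [0.0, 0.0]
```

Placed in a file under `tests/`, a test like this would be collected by `python -m pytest tests/ -v`, and `mypy` could check the annotations on `normalize`.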
flowchart TB
subgraph Input["Input Layer"]
direction TB
NB["Jupyter Notebooks"]
DATA["Datasets"]
CFG["Configs"]
end
subgraph Core["Core Processing"]
direction TB
SRC["Source Code"]
UTILS["Utilities"]
MODELS["Models"]
VIZ["Visualization"]
end
subgraph Quality["Quality Assurance"]
direction TB
TESTS["Tests"]
LINT["Linting"]
DOCS["Documentation"]
end
subgraph Deploy["Deployment"]
direction TB
DOCKER["Docker"]
API["API"]
end
Input --> Core
Core --> Quality
Quality --> Deploy
style Input fill:#2d3436,stroke:#00cec9,color:#fff
style Core fill:#2d3436,stroke:#6c5ce7,color:#fff
style Quality fill:#2d3436,stroke:#00b894,color:#fff
style Deploy fill:#2d3436,stroke:#e17055,color:#fff
flowchart TD
subgraph Root["python-ML-learn"]
direction TB
subgraph Learning["Learning Modules"]
F01["01-fundamentals/"]
F02["02-supervised-learning/"]
F03["03-unsupervised-learning/"]
F04["04-deep-learning/"]
F05["05-nlp/"]
F06["06-computer-vision/"]
F07["07-projects/"]
end
subgraph Source["Source Code"]
SRC_UTILS["src/utils/"]
SRC_MODELS["src/models/"]
SRC_DATA["src/data_processing/"]
SRC_VIZ["src/visualization/"]
end
subgraph Support["Support"]
TESTS["tests/"]
DOCS["docs/"]
DOCKER["docker/"]
MEMORY["memory-bank/"]
end
end
style Root fill:#1e272e,stroke:#fff,color:#fff
style Learning fill:#2d3436,stroke:#74b9ff,color:#fff
style Source fill:#2d3436,stroke:#a29bfe,color:#fff
style Support fill:#2d3436,stroke:#55efc4,color:#fff
flowchart LR
subgraph Data["Data Pipeline"]
RAW["Raw Data"]
CLEAN["Cleaned Data"]
FEAT["Features"]
end
subgraph Model["Model Pipeline"]
TRAIN["Training"]
VAL["Validation"]
TEST["Testing"]
end
subgraph Output["Output"]
PRED["Predictions"]
METRICS["Metrics"]
VIZ["Visualizations"]
end
RAW --> CLEAN
CLEAN --> FEAT
FEAT --> TRAIN
TRAIN --> VAL
VAL --> TEST
TEST --> PRED
TEST --> METRICS
METRICS --> VIZ
style Data fill:#2d3436,stroke:#00cec9,color:#fff
style Model fill:#2d3436,stroke:#6c5ce7,color:#fff
style Output fill:#2d3436,stroke:#fdcb6e,color:#fff
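The training, validation, and testing stages in the diagram above depend on partitioning the feature data first. A minimal NumPy sketch of such a split follows; the function name `train_val_test_split`, the 70/15/15 proportions, and the toy data are assumptions for illustration, not the repository's actual pipeline.

```python
import numpy as np


def train_val_test_split(X, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut the data into train / validation / test parts."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_frac))
    n_val = int(round(len(X) * val_frac))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (
        (X[train_idx], y[train_idx]),
        (X[val_idx], y[val_idx]),
        (X[test_idx], y[test_idx]),
    )


# Toy data standing in for the "Features" stage of the diagram.
X = np.arange(200, dtype=float).reshape(100, 2)
y = (X[:, 0] > 100).astype(int)

train, val, test = train_val_test_split(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 70 15 15
```

Shuffling a single index array and slicing it keeps the three parts disjoint, so no row used for validation or testing leaks into training.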
flowchart TB
subgraph P1["Phase 1: Foundation"]
direction LR
P1A["Week 1-2"]
P1B["Infrastructure<br/>& Setup"]
P1A --> P1B
end
subgraph P2["Phase 2: Fundamentals"]
direction LR
P2A["Week 3-5"]
P2B["NumPy, Pandas<br/>Statistics, Viz"]
P2A --> P2B
end
subgraph P3["Phase 3: Supervised"]
direction LR
P3A["Week 6-8"]
P3B["Regression<br/>Classification"]
P3A --> P3B
end
subgraph P4["Phase 4: Unsupervised"]
direction LR
P4A["Week 9-10"]
P4B["Clustering<br/>PCA, t-SNE"]
P4A --> P4B
end
subgraph P5["Phase 5: Deep Learning"]
direction LR
P5A["Week 11-13"]
P5B["Neural Nets<br/>CNN, RNN"]
P5A --> P5B
end
subgraph P6["Phase 6-9: Advanced"]
direction LR
P6A["Week 14-26"]
P6B["NLP, CV<br/>Projects, MLOps"]
P6A --> P6B
end
P1 --> P2 --> P3 --> P4 --> P5 --> P6
style P1 fill:#1e3a5f,stroke:#3498db,color:#fff
style P2 fill:#1e3a5f,stroke:#2ecc71,color:#fff
style P3 fill:#1e3a5f,stroke:#9b59b6,color:#fff
style P4 fill:#1e3a5f,stroke:#e74c3c,color:#fff
style P5 fill:#1e3a5f,stroke:#f39c12,color:#fff
style P6 fill:#1e3a5f,stroke:#1abc9c,color:#fff
Phase 1: Foundation (Weeks 1-2)
| Topic | Description | Deliverable |
|---|---|---|
| Project Structure | Modular src layout | Folder hierarchy |
| Development Environment | VS Code + extensions | .vscode/settings.json |
| Docker Setup | Reproducible environment | Dockerfile, docker-compose.yml |
| Testing Framework | pytest configuration | conftest.py, pytest.ini |
Status: ✅ Complete
Phase 2: Core ML Fundamentals (Weeks 3-5)
| Topic | Key Concepts | Notebook |
|---|---|---|
| NumPy | Arrays, broadcasting, linear algebra | 01_numpy_fundamentals.ipynb |
| Pandas | DataFrames, cleaning, aggregation | 02_pandas_data_manipulation.ipynb |
| Visualization | matplotlib, seaborn, plotly | 03_data_visualization.ipynb |
| Statistics | Distributions, hypothesis testing | 04_statistics_for_ml.ipynb |
| Scikit-learn Intro | Pipelines, preprocessing, models | 05_sklearn_introduction.ipynb |
Status: ✅ Complete (5 notebooks, 114 tests)
Phase 3: Supervised Learning (Weeks 6-8)
| Algorithm | Mathematical Foundation | Implementation |
|---|---|---|
| Linear Regression | | From scratch + sklearn |
| Logistic Regression | Sigmoid, cross-entropy | Binary & multiclass |
| Decision Trees | Gini impurity, entropy | Visualization included |
| Random Forests | Bagging, feature importance | Hyperparameter tuning |
| SVM | Kernel trick, margin maximization | Multiple kernels |
| Gradient Boosting | Sequential ensembles | XGBoost, LightGBM |
Status: ✅ Complete (5 notebooks, 15 tests)
Phase 4: Unsupervised Learning (Weeks 9-10)
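To make the "from scratch" idea for linear regression concrete, here is a minimal ordinary least squares sketch using only NumPy. The function names and the noise-free toy data are assumptions for illustration; the notebook's own implementation may differ.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares: return (intercept, coefficients)."""
    # Prepend a column of ones so the intercept is learned as a weight.
    X_design = np.column_stack([np.ones(len(X)), X])
    # lstsq minimizes ||X_design @ w - y||^2 and is more stable than
    # explicitly inverting X^T X.
    w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return w[0], w[1:]


def predict(X, intercept, coef):
    """Apply the fitted linear model to new rows."""
    return intercept + X @ coef


# Noise-free toy data generated from y = 1 + 2*x1 - 3*x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

intercept, coef = fit_linear_regression(X, y)
print(np.round(intercept, 6), np.round(coef, 6))
```

Because the toy data has no noise, the recovered intercept and coefficients should match the generating values up to floating-point error. The same data could be passed to scikit-learn's `LinearRegression` as a cross-check.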
| Algorithm | Purpose | Implementation |
|---|---|---|
| K-Means | Centroid-based clustering | From scratch + sklearn |
| Hierarchical | Agglomerative clustering | Dendrograms |
| DBSCAN | Density-based clustering | Parameter tuning |
| PCA | Dimensionality reduction | From scratch + sklearn |
| t-SNE | Visualization | Perplexity tuning |
| Anomaly Detection | Outlier detection | Isolation Forest, LOF, One-Class SVM |
Status: ✅ Complete (3 notebooks, 37 tests)
Phase 5-9: Advanced Topics (Weeks 11-26)
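As one possible reading of "PCA from scratch" in the table above, the sketch below computes principal components with NumPy's SVD. The function name `pca` and the synthetic data are illustrative assumptions, not code from the notebooks.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components using the SVD."""
    # Center each feature; PCA is defined on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance explained by each direction is proportional to S**2.
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    projected = X_centered @ components.T
    return projected, components, explained_ratio[:n_components]


# Synthetic 2-D points lying almost exactly on the line y = 2x.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.01 * rng.normal(size=200)])

Z, components, ratio = pca(X, n_components=1)
print(Z.shape)          # (200, 1)
print(ratio[0] > 0.99)  # True
```

Since the points lie nearly on a line, a single component captures almost all of the variance, which is the behavior dimensionality reduction relies on.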
| Phase | Topics | Hours |
|---|---|---|
| 5. Deep Learning | Neural nets, CNN, RNN, PyTorch | 90 |
| 6. NLP | Embeddings, BERT, Transformers | 70 |
| 7. Computer Vision | Object detection, segmentation | 70 |
| 8. Projects | End-to-end ML systems | 100+ |
| 9. MLOps | Deployment, monitoring, CI/CD | 40 |
flowchart TB
subgraph Languages["Languages & Runtime"]
PY["Python 3.12+"]
JUP["Jupyter"]
end
subgraph DataScience["Data Science"]
NP["NumPy"]
PD["Pandas"]
SP["SciPy"]
end
subgraph Visualization["Visualization"]
MPL["Matplotlib"]
SNS["Seaborn"]
PLT["Plotly"]
end
subgraph ML["Machine Learning"]
SK["scikit-learn"]
XG["XGBoost"]
LG["LightGBM"]
end
subgraph DL["Deep Learning"]
PT["PyTorch"]
TF["TensorFlow"]
HF["Transformers"]
end
subgraph DevOps["DevOps"]
DOC["Docker"]
GIT["Git"]
TEST["pytest"]
end
Languages --> DataScience
Languages --> Visualization
DataScience --> ML
ML --> DL
DL --> DevOps
style Languages fill:#2c3e50,stroke:#3498db,color:#fff
style DataScience fill:#2c3e50,stroke:#2ecc71,color:#fff
style Visualization fill:#2c3e50,stroke:#9b59b6,color:#fff
style ML fill:#2c3e50,stroke:#e74c3c,color:#fff
style DL fill:#2c3e50,stroke:#f39c12,color:#fff
style DevOps fill:#2c3e50,stroke:#1abc9c,color:#fff
| Technology | Version | Purpose | Why Chosen |
|---|---|---|---|
| Python | 3.12+ | Core language | Industry standard, rich ecosystem |
| NumPy | 2.4+ | Numerical computing | 10-100x faster than pure Python, vectorization |
| Pandas | 2.0+ | Data manipulation | Intuitive DataFrame API, SQL-like operations |
| scikit-learn | 1.3+ | Classical ML | Consistent API, comprehensive algorithms |
| PyTorch | 2.0+ | Deep learning | Dynamic graphs, Pythonic, research-friendly |
| TensorFlow | 2.13+ | Deep learning | Production-ready, TensorBoard, Keras API |
| Matplotlib | 3.7+ | Plotting | Highly customizable, publication quality |
| Seaborn | 0.12+ | Statistical viz | Beautiful defaults, statistical plots |
| Docker | Latest | Containerization | Reproducible environments |
| pytest | 7.4+ | Testing | Simple syntax, powerful fixtures |
flowchart LR
subgraph NumPy["NumPy Ecosystem"]
ARR["ndarray<br/>N-dimensional arrays"]
UFUNC["ufuncs<br/>Element-wise ops"]
LINALG["linalg<br/>Matrix operations"]
RAND["random<br/>Statistical sampling"]
end
subgraph Benefits["Why NumPy?"]
SPEED["10-100x Faster"]
MEM["Memory Efficient"]
BROAD["Broadcasting"]
INTER["Interoperability"]
end
NumPy --> Benefits
style NumPy fill:#2c3e50,stroke:#013243,color:#fff
style Benefits fill:#2c3e50,stroke:#4dabf7,color:#fff
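The pieces named in the diagram above (`ndarray`, ufuncs, `linalg`, `random`, and broadcasting) can be seen together in a few lines. This is a small illustrative tour with made-up values, not an excerpt from the notebooks.

```python
import numpy as np

# ndarray: a 3x2 array of floats stored in one contiguous buffer.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# ufunc: np.sqrt is applied element-wise without a Python loop.
roots = np.sqrt(data)

# Broadcasting: the per-column mean has shape (2,) and is stretched
# across all three rows during the subtraction.
centered = data - data.mean(axis=0)

# linalg: solve the 2x2 linear system A @ x = b.
A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

# random: a seeded generator gives reproducible samples.
rng = np.random.default_rng(42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)

print(centered)
print(x)  # [1. 2.]
```

Each operation works on whole arrays at once, which is what lets NumPy hand the inner loops to compiled code.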
Definition: NumPy is the fundamental package for scientific computing in Python.
Motivation: Python lists are slow for numerical operations. NumPy provides:
- Contiguous memory allocation
- Vectorized operations (no Python loops)
- C-level execution speed
Mechanism:
import numpy as np

# Python list (slow); timings are approximate and vary by machine
result = [x ** 2 for x in range(1000000)]  # ~200ms

# NumPy array (fast)
arr = np.arange(1000000)
result = arr ** 2  # ~2ms (100x faster!)

Impact: Enables processing of large datasets that would be impractical with pure Python.
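The timings quoted above depend on hardware and library versions, so it can help to measure them locally. The sketch below uses the standard-library `timeit` module; the function names are hypothetical, and the printed numbers will differ from machine to machine.

```python
import timeit

import numpy as np

N = 1_000_000


def squares_list():
    # Pure-Python list comprehension: one interpreter step per element.
    return [x ** 2 for x in range(N)]


def squares_numpy():
    # Vectorized: the loop runs in compiled code inside NumPy.
    # int64 is requested explicitly so the squares cannot overflow
    # on platforms where the default integer is 32-bit.
    return np.arange(N, dtype=np.int64) ** 2


# Take the best of a few runs to reduce noise from other processes.
t_list = min(timeit.repeat(squares_list, number=1, repeat=3))
t_numpy = min(timeit.repeat(squares_numpy, number=1, repeat=3))

print(f"list:    {t_list * 1e3:.1f} ms")
print(f"numpy:   {t_numpy * 1e3:.1f} ms")
print(f"speedup: {t_list / t_numpy:.0f}x")
```

Taking the minimum of several runs is a common way to approximate the best-case cost, since slower runs mostly reflect interference from other processes.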
gantt
title ML Study Guide - 26 Week Timeline
dateFormat YYYY-MM-DD
section Phase 1
Infrastructure Setup :done, p1, 2025-12-16, 2w
section Phase 2
NumPy Fundamentals :done, p2a, after p1, 3d
Pandas & Data :active, p2b, after p2a, 1w
Visualization : p2c, after p2b, 5d
Statistics : p2d, after p2c, 4d
Feature Engineering : p2e, after p2d, 5d
section Phase 3
Linear Regression : p3a, after p2e, 5d
Logistic Regression : p3b, after p3a, 5d
Decision Trees : p3c, after p3b, 6d
SVM & Boosting : p3d, after p3c, 1w
section Phase 4
Clustering : p4a, after p3d, 1w
Dimensionality Reduction: p4b, after p4a, 1w
section Phase 5
Neural Networks : p5a, after p4b, 2w
CNN & RNN : p5b, after p5a, 2w
section Phase 6-9
NLP : p6, after p5b, 3w
Computer Vision : p7, after p6, 3w
Projects : p8, after p7, 4w
MLOps : p9, after p8, 2w
| Milestone | Target | Status | Progress |
|---|---|---|---|
| M1: Infrastructure | Week 2 | ✅ Complete | ████████████ 100% |
| M2: Fundamentals | Week 5 | ✅ Complete | ████████████ 100% |
| M3: Supervised | Week 8 | ✅ Complete | ████████████ 100% |
| M4: Unsupervised | Week 10 | ✅ Complete | ████████████ 100% |
| M5: Deep Learning | Week 13 | ✅ Complete | ████████████ 100% |
| M6: NLP | Week 16 | ✅ Complete | ████████████ 100% |
| M7: Computer Vision | Week 19 | ✅ Complete | ████████████ 100% |
| M8: Projects | Week 24 | ✅ Complete | ████████████ 100% |
| M9: MLOps | Week 26 | ✅ Complete | ████████████ 100% |
| Requirement | Version | Check Command |
|---|---|---|
| Python | 3.12+ | python --version |
| pip | Latest | pip --version |
| Git | Latest | git --version |
| Docker (optional) | Latest | docker --version |
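The version checks in the table can also be scripted from Python. The sketch below is one possible approach using only the standard library; the minimum version is an assumption taken from the technology stack table, and the package list is an illustrative subset rather than the contents of `requirements.txt`.

```python
import sys
from importlib import metadata

# Assumed minimum, taken from the technology stack table; adjust as needed.
MIN_PYTHON = (3, 12)
# Distribution names as published on PyPI (illustrative subset).
PACKAGES = ["numpy", "pandas", "scikit-learn", "torch", "pytest"]


def python_ok(min_version=MIN_PYTHON):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= min_version


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    status = "ok" if python_ok() else "too old"
    print(f"Python {sys.version.split()[0]}: {status}")
    for name, version in installed_versions().items():
        print(f"{name:15s} {version or 'not installed'}")
```

Running the script before installing dependencies shows which packages are still missing from the active environment.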
# 1. Clone the repository
git clone https://github.com/yourusername/python-ML-learn.git
cd python-ML-learn
# 2. Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Start Jupyter Lab
jupyter lab

# Build and run with Docker Compose
cd docker
docker-compose up -d
# Access Jupyter Lab at http://localhost:8888

# Run tests to verify setup
python -m pytest tests/ -v
# Expected output: All tests passing

python-ML-learn/
├── 01-fundamentals/                  # NumPy, Pandas, Visualization (5 notebooks)
│   ├── 01_numpy_fundamentals.ipynb
│   ├── 02_pandas_data_manipulation.ipynb
│   ├── 03_data_visualization.ipynb
│   ├── 04_statistics_for_ml.ipynb
│   └── 05_sklearn_introduction.ipynb
├── 02-supervised-learning/           # Regression, Classification (5 notebooks)
│   ├── 01_linear_regression.ipynb
│   ├── 02_logistic_regression.ipynb
│   ├── 03_decision_trees_random_forests.ipynb
│   ├── 04_svm.ipynb
│   └── 05_gradient_boosting.ipynb
├── 03-unsupervised-learning/         # Clustering, PCA (3 notebooks)
│   ├── 01_clustering.ipynb
│   ├── 02_dimensionality_reduction.ipynb
│   └── 03_anomaly_detection.ipynb
├── 04-deep-learning/                 # Neural Networks, CNN, RNN (5 notebooks)
│   ├── 01_neural_network_fundamentals.ipynb
│   ├── 02_pytorch_introduction.ipynb
│   ├── 03_convolutional_neural_networks.ipynb
│   ├── 04_recurrent_neural_networks.ipynb
│   └── 05_training_techniques.ipynb
├── 05-nlp/                           # Text Processing, Transformers (5 notebooks)
│   ├── 01_text_preprocessing.ipynb
│   ├── 02_text_vectorization.ipynb
│   ├── 03_word_embeddings.ipynb
│   ├── 04_text_classification.ipynb
│   └── 05_transformers_introduction.ipynb
├── 06-computer-vision/               # Object Detection, Segmentation (5 notebooks)
│   ├── 01_image_fundamentals.ipynb
│   ├── 02_cnn_architectures.ipynb
│   ├── 03_transfer_learning.ipynb
│   ├── 04_object_detection.ipynb
│   └── 05_image_segmentation.ipynb
├── 07-projects/                      # End-to-End Projects (5 notebooks)
│   ├── 01_house_price_prediction.ipynb
│   ├── 02_customer_churn_prediction.ipynb
│   ├── 03_image_classification_app.ipynb
│   ├── 04_sentiment_analysis_pipeline.ipynb
│   └── 05_recommendation_system.ipynb
├── 08-mlops/                         # MLOps & Production (5 notebooks)
│   ├── 01_model_serving_fastapi.ipynb
│   ├── 02_docker_containerization.ipynb
│   ├── 03_experiment_tracking.ipynb
│   ├── 04_cicd_pipelines.ipynb
│   └── 05_model_monitoring.ipynb
│
├── src/                              # Source Code
│   ├── utils/                        # Utility functions
│   │   ├── timer.py                  # Performance timing
│   │   ├── numpy_helpers.py          # NumPy utilities
│   │   ├── pandas_helpers.py         # Pandas utilities
│   │   ├── stats_helpers.py          # Statistical functions
│   │   ├── sklearn_helpers.py        # Scikit-learn utilities
│   │   └── visualization_helpers.py  # Plotting utilities
│   ├── ml_core/                      # ML helper modules
│   │   ├── supervised.py             # Supervised learning helpers
│   │   ├── unsupervised.py           # Unsupervised learning helpers
│   │   ├── deep_learning.py          # Deep learning helpers
│   │   ├── nlp.py                    # NLP helpers
│   │   └── computer_vision.py        # Computer vision helpers
│   ├── models/                       # ML model implementations
│   ├── data_processing/              # Data pipelines
│   └── visualization/                # Plotting utilities
│
├── tests/                            # Test Suite
│   ├── unit/                         # Unit tests
│   └── integration/                  # Integration tests
│
├── docs/                             # Documentation
│   └── project-plan.md               # Detailed project plan
│
├── memory-bank/                      # Project Memory
│   ├── change-log.md                 # Version history
│   └── architecture-decisions/       # ADRs
│
├── docker/                           # Docker Configuration
│   ├── Dockerfile
│   └── docker-compose.yml
│
├── data/                             # Datasets
│   ├── raw/                          # Original data
│   └── processed/                    # Cleaned data
│
├── configs/                          # Configuration files
├── requirements.txt                  # Python dependencies
└── README.md                         # This file
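The tree lists `src/utils/timer.py` as a performance-timing utility. Its contents are not shown here, so the following is only a guess at what such a helper might look like: a context manager built on `time.perf_counter`. The name `timer` and its interface are assumptions.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label="block"):
    """Time the enclosed block and print the elapsed wall-clock time.

    Hypothetical sketch; the real src/utils/timer.py may look different.
    """
    result = {"elapsed": None}
    start = time.perf_counter()
    try:
        # The dict is handed to the caller so the measurement can be read
        # after the block finishes.
        yield result
    finally:
        result["elapsed"] = time.perf_counter() - start
        print(f"{label}: {result['elapsed'] * 1e3:.2f} ms")


with timer("sum of squares") as t:
    total = sum(i * i for i in range(100_000))

print(total)  # 333328333350000
```

Using `try`/`finally` means the elapsed time is still recorded and printed when the timed block raises an exception.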
All 9 phases of the Machine Learning curriculum have been completed!
pie title Project Completion by Phase
"Phase 1 - Infrastructure" : 100
"Phase 2 - Fundamentals" : 100
"Phase 3 - Supervised" : 100
"Phase 4 - Unsupervised" : 100
"Phase 5 - Deep Learning" : 100
"Phase 6 - NLP" : 100
"Phase 7 - Computer Vision" : 100
"Phase 8 - Projects" : 100
"Phase 9 - MLOps" : 100
| Module | Tests | Coverage | Status |
|---|---|---|---|
| utils/timer.py | 14 | 95% | ✅ |
| utils/numpy_helpers.py | 24 | 100% | ✅ |
| utils/pandas_helpers.py | 21 | 100% | ✅ |
| utils/stats_helpers.py | 33 | 100% | ✅ |
| utils/sklearn_helpers.py | 29 | 100% | ✅ |
| utils/visualization_helpers.py | 31 | 100% | ✅ |
| ml_core/supervised.py | 15 | 100% | ✅ |
| ml_core/unsupervised.py | 37 | 100% | ✅ |
| ml_core/deep_learning.py | 42 | 100% | ✅ |
| ml_core/nlp.py | 53 | 100% | ✅ |
| ml_core/computer_vision.py | 63 | 100% | ✅ |
Total Tests: 362 passing ✅
| Date | Version | Changes |
|---|---|---|
| 2025-12-22 | v2.0.0 | Phase 9: MLOps & Production (model serving, Docker, CI/CD, monitoring) |
| 2025-12-22 | v1.12.0 | Phase 8: End-to-End Projects (5 comprehensive ML projects) |
| 2025-07-09 | v1.11.0 | Phase 7: Computer Vision (image fundamentals, CNN, detection, segmentation) |
| 2025-07-08 | v1.10.0 | Phase 6: NLP (text preprocessing, embeddings, transformers) |
| 2025-07-08 | v1.9.0 | Phase 5: Deep learning (PyTorch, CNN, RNN, training techniques) |
| 2025-07-08 | v1.8.0 | Phase 4: Unsupervised learning (clustering, PCA, anomaly detection) |
| 2025-07-08 | v1.7.0 | Phase 3: Supervised learning (regression, classification, SVM, boosting) |
| 2025-07-08 | v1.6.0 | Phase 2: Fundamentals complete (5 notebooks, helper modules) |
| 2025-07-08 | v1.0.0 | Initial project structure, Docker setup |
- Understand the Math: Don't skip mathematical intuition
- Code from Scratch: Implement algorithms before using libraries
- Visualize Everything: Use plots to understand behavior
- Read Comments: Code is heavily documented
- Practice Daily: Consistency is key
- Write Tests: Verify your implementations
flowchart LR
subgraph Daily["Daily (2-3 hours)"]
D1["Theory<br/>30 min"]
D2["Coding<br/>90 min"]
D3["Review<br/>30 min"]
end
subgraph Weekly["Weekly"]
W1["1-2 Notebooks"]
W2["Unit Tests"]
W3["Mini Project"]
end
Daily --> Weekly
style Daily fill:#2c3e50,stroke:#3498db,color:#fff
style Weekly fill:#2c3e50,stroke:#2ecc71,color:#fff
| Resource | Link | Description |
|---|---|---|
| NumPy | numpy.org | Array computing |
| Pandas | pandas.pydata.org | Data analysis |
| scikit-learn | scikit-learn.org | Machine learning |
| PyTorch | pytorch.org | Deep learning |
| TensorFlow | tensorflow.org | Deep learning |
- Kaggle Learn - Free micro-courses
- Papers With Code - Research implementations
- 3Blue1Brown - Visual math explanations
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Write tests for new code
- Submit a pull request
MIT License - Feel free to use for personal learning.
Made with ❤️ for Machine Learning Enthusiasts
⭐ Star this repo if you find it helpful!