A deep learning project that uses self-supervised contrastive learning on audio spectrograms to classify music and recommend similar songs based on learned embeddings.
Detailed documentation and guides have been moved to the docs/ directory:
- Quickstart Guide: Get up and running quickly.
- Audio Augmentation Guide: Specific guide for the audio augmentation pipeline.
- Augmentation Summary: Technical details on the augmentation strategies.
This project implements a self-supervised learning approach to learn meaningful audio representations without requiring labeled data. By training a CNN encoder with contrastive learning on augmented spectrograms, the model learns to identify similar songs and can be used for:
- Music Classification: Categorize songs by genre, mood, or style
- Song Recommendation: Find similar songs based on audio features
- Audio Similarity Search: Retrieve songs that sound alike
The system leverages the power of contrastive learning to create robust audio embeddings that capture the essence of musical content.
Music Classification by spectogram/
│
├── README.md                # This file
├── docs/                    # Documentation and guides
├── configs/                 # Configuration files (YAML)
│   ├── model_config.yaml    # CNN architecture settings
│   ├── training_config.yaml # Training hyperparameters
│   └── data_config.yaml     # Data processing settings
│
├── notebooks/               # Jupyter Notebooks
│   └── Music_Classification_Training_Colab.ipynb # Main training pipeline
│
├── CNN/                     # Core source code
│   ├── models/              # Encoder & Projection Head
│   ├── augmentation/        # Audio augmentations
│   ├── data/                # Dataset & Dataloaders
│   ├── training/            # Training logic
│   └── recommendation/      # Recommendation engine
│
├── AudioToSpectogram/       # Preprocessing scripts
│   ├── output/              # Generated spectrograms (gitignored)
│   ├── output_mel/          # Generated Mel-spectrograms (gitignored)
│   └── fma_small_dataset/   # Dataset directory (gitignored)
│
└── requirements.txt         # Project dependencies
# Clone the repository
git clone <repository-url>
cd "Music Classification by spectogram"
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
# Install dependencies
pip install -r requirements.txt

The project is fully configurable via YAML files in the configs/ directory. Key settings include:
- configs/data_config.yaml: Controls audio processing.
  - Default: 3.0 s duration, 128 mel bands, 22050 Hz sample rate.
  - Dataset: Points to AudioToSpectogram/fma_small_dataset.
- configs/model_config.yaml: Defines the CNN architecture.
  - Default: 4-block CNN encoder (64, 128, 256, 512 filters).
- configs/training_config.yaml: Sets training hyperparameters.
  - Default: 200 epochs, batch size 64, Adam optimizer (lr=0.001), NT-Xent loss (temp=0.5).
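As a quick illustration, the configs can be read with PyYAML. This is a minimal sketch; the key names in the last line are hypothetical, so check the actual files for the real schema:

```python
# Minimal sketch of loading the YAML configs (assumes PyYAML is installed).
import yaml

def load_config(path: str) -> dict:
    """Read a YAML config file into a plain dict."""
    with open(path, "r") as f:
        return yaml.safe_load(f)

data_cfg = load_config("configs/data_config.yaml")
model_cfg = load_config("configs/model_config.yaml")
train_cfg = load_config("configs/training_config.yaml")

# Hypothetical key names -- consult the actual files for the real schema.
print(data_cfg.get("sample_rate", 22050), train_cfg.get("batch_size", 64))
```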
The primary entry point for training and experimentation is the Jupyter Notebook:
notebooks/Music_Classification_Training_Colab.ipynb
This notebook covers the entire pipeline:
- Data Loading: Loads audio/spectrograms using settings from data_config.yaml.
- Augmentation: Visualizes and applies audio augmentations.
- Model Initialization: Builds the CNN encoder defined in model_config.yaml.
- Training: Runs the contrastive learning loop using training_config.yaml.
- Evaluation: Visualizes training curves and embeddings.
To run it:
jupyter notebook notebooks/Music_Classification_Training_Colab.ipynb

The end-to-end pipeline consists of the following stages:

- Waveform Augmentation: Apply pitch, tempo, noise, etc. to raw audio.
- Spectrogram Conversion: Convert augmented audio to mel-spectrograms (see the sketch after this list).
- Spectrogram Augmentation: Apply masking and warping to the spectrogram.
- CNN Encoder: Extract high-level audio features.
- Contrastive Learning: Train with self-supervised contrastive loss.
- Embeddings & Search: Generate embeddings and find similar songs.
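To make the Spectrogram Conversion stage concrete, here is a minimal sketch using torchaudio with the data_config.yaml defaults (22050 Hz, 128 mel bands, 3.0 s clips). The n_fft and hop_length values are assumptions, not taken from the configs:

```python
# Waveform -> log-mel-spectrogram, matching the documented defaults.
import torch
import torchaudio

SAMPLE_RATE = 22050
CLIP_SECONDS = 3.0

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE,
    n_fft=2048,       # assumed FFT size
    hop_length=512,   # assumed hop length
    n_mels=128,
)
to_db = torchaudio.transforms.AmplitudeToDB()

waveform = torch.randn(1, int(SAMPLE_RATE * CLIP_SECONDS))  # stand-in clip
spec = to_db(mel(waveform))  # shape: (1, 128, ~130 frames)
```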
- No labels required: Learn from raw audio data
- Contrastive learning: SimCLR-based approach with an NT-Xent loss (sketched below)
- Data efficiency: Learn robust representations with limited data
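For reference, here is a minimal sketch of the NT-Xent loss used in SimCLR-style training, with the temperature 0.5 default from training_config.yaml. This is illustrative, not the project's own implementation:

```python
# NT-Xent (normalized temperature-scaled cross-entropy) over two views.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: (N, D) projections of two augmented views of the same batch."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / temperature                       # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    # The positive for row i is its counterpart in the other view.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```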
The system implements a comprehensive augmentation pipeline to ensure robust feature learning:
Stage 1: Waveform Augmentations (Raw Audio). Three are randomly selected per sample; a gain/noise sketch follows this list:
- Pitch Shift: ±1-3 semitones (pitch invariance)
- Tempo Stretch: ±5-12% speed change (tempo invariance)
- Gain Adjustment: ±3-6 dB volume change
- Parametric EQ: Low-pass, high-pass, or bandpass filtering
- Dynamic Range Compression: Reduces dynamic range
- Environmental Noise: Adds background noise (SNR 10-30 dB)
- Convolutional Reverb: Simulates room acoustics
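As an illustration, here is a sketch of two of these augmentations, gain adjustment and noise mixing at a target SNR, in plain NumPy. The project's actual augmentation code may differ:

```python
import numpy as np

def apply_gain(x: np.ndarray, gain_db: float) -> np.ndarray:
    """Scale the waveform by gain_db decibels (doc default: ±3-6 dB)."""
    return x * 10.0 ** (gain_db / 20.0)

def add_noise_at_snr(x: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into x so the result has the requested SNR (doc: 10-30 dB)."""
    p_signal = np.mean(x ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return x + scale * noise

rng = np.random.default_rng(0)
clip = rng.standard_normal(22050 * 3)  # stand-in 3 s clip at 22050 Hz
noisy = add_noise_at_snr(apply_gain(clip, -3.0), rng.standard_normal(clip.size), snr_db=20.0)
```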
Stage 2: Spectrogram Augmentations (Mel-Spectrogram). Two are randomly selected per sample; a masking sketch follows below:
- Time Masking: Masks random time steps (SpecAugment)
- Frequency Masking: Masks random frequency bands (SpecAugment)
- Time Warping: Deforms the time axis for robustness
Note: No axis flips or color jittering are used to preserve musical structure.
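Here is a minimal sketch of the two masking augmentations in the spirit of SpecAugment. torchaudio ships equivalent TimeMasking/FrequencyMasking transforms; this version just spells out the idea:

```python
import torch

def mask_axis(spec: torch.Tensor, axis: int, max_width: int) -> torch.Tensor:
    """Zero out one random band along `axis` (1 = frequency, 2 = time)."""
    spec = spec.clone()
    size = spec.size(axis)
    width = int(torch.randint(1, max_width + 1, (1,)))
    start = int(torch.randint(0, max(size - width, 1), (1,)))
    if axis == 1:
        spec[:, start:start + width, :] = 0.0
    else:
        spec[:, :, start:start + width] = 0.0
    return spec

spec = torch.randn(1, 128, 130)               # (channels, mel bands, frames)
spec = mask_axis(spec, axis=1, max_width=16)  # frequency masking
spec = mask_axis(spec, axis=2, max_width=24)  # time masking
```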
- Deep convolutional architecture optimized for spectrograms (a minimal sketch follows this list)
- Learns hierarchical audio features
- Produces compact, discriminative embeddings
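A minimal encoder sketch matching the model_config.yaml defaults (64, 128, 256, 512 filters). Kernel sizes, pooling, and the embedding dimension are assumptions, not the project's exact architecture:

```python
import torch
import torch.nn as nn

class SpectrogramEncoder(nn.Module):
    """Four conv blocks followed by global pooling and a linear embedding head."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        blocks, in_ch = [], 1
        for out_ch in (64, 128, 256, 512):
            blocks += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(512, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.pool(self.features(x)).flatten(1)
        return self.head(h)

encoder = SpectrogramEncoder()
emb = encoder(torch.randn(8, 1, 128, 130))  # -> (8, 128) embeddings
```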
- Fast cosine similarity search (sketched below)
- Scalable to large music databases
- Real-time song recommendations
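A minimal sketch of cosine-similarity retrieval over a bank of song embeddings; the project's recommendation module may expose a different interface:

```python
import torch
import torch.nn.functional as F

def top_k_similar(query: torch.Tensor, bank: torch.Tensor, k: int = 5):
    """query: (D,), bank: (N, D). Returns (scores, indices) of the k nearest songs."""
    q = F.normalize(query.unsqueeze(0), dim=1)
    b = F.normalize(bank, dim=1)
    scores = (q @ b.t()).squeeze(0)  # cosine similarity per song
    return torch.topk(scores, k)

bank = torch.randn(1000, 128)                    # embeddings for 1000 songs
scores, idx = top_k_similar(bank[0], bank, k=5)  # the query itself ranks first
```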
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Audio conversion code adapted from AudioToSpectogram
- SimCLR paper: A Simple Framework for Contrastive Learning of Visual Representations
- SpecAugment: SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
For questions or feedback, please open an issue on GitHub.
Happy Music Coding! 🎵🎶
