This codebase provides a framework for generating new dance routines using motion matching on the AIST++ dataset.
Follow these steps to set up the environment and prepare the data:
- Set up the Python Environment: Set up your environment with Python 3.12. If you are using Conda, it's recommended to create a new environment:

  ```bash
  conda create -n dance_identity python=3.12
  conda activate dance_identity
  ```
- Install Dependencies: Install all required packages from `requirements.txt`:

  ```bash
  pip install -r requirements.txt
  ```
- Prepare the Data and Models:
  - Ensure your AIST++ dataset is located inside the `data/motions/` folder. This step should already be done; just confirm that `data/motions/` is populated.
  - Download the SMPL models (Male, Female, and Neutral) from the official SMPL website and place them inside the `models/smpl/` folder. Ensure the following 6 files are present in the directory so the scripts can properly load all genders and metadata:
    - `SMPL_FEMALE.npz`
    - `SMPL_FEMALE.pkl`
    - `SMPL_MALE.npz`
    - `SMPL_MALE.pkl`
    - `SMPL_NEUTRAL.npz`
    - `SMPL_NEUTRAL.pkl`
- Create the Motion Index: Create the index of moves by running:

  ```bash
  python create_index.py
  ```
- Create the Motion Codebook: Create the discretized feature codebook by running:

  ```bash
  python create_codebook.py
  ```
- Create the Plausibilities Graph: Create the graph of transition plausibilities by running:

  ```bash
  python create_plausibilities.py
  ```
Once your setup is complete and the index is created, you can generate new dance samples in three ways:
Mode A (Autonomous Exploration): Generate a random sequence of a given length:

```bash
python generate_sample.py --num_frames 1000
```

Mode B (Guided Generation): Guide generation through specific codebook regions using a DNA sequence (a comma-separated string of region IDs):

```bash
python generate_sample.py --input_dna "114, 12, 125, 140, 57"
```

Mode C (Text Guided Generation): Guide generation using a string of text. The text is trimmed and converted into bytes, which map directly to codebook regions:

```bash
python generate_sample.py --input_text "I love dance"
```
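Under the hood, Mode C's text-to-DNA mapping amounts to something like the sketch below (assuming trimming means `str.strip()` and UTF-8 encoding; `generate_sample.py` may differ in details):

```python
# Illustrative sketch of the Mode C text-to-DNA mapping; the function name
# is hypothetical, and the trimming/encoding choices are assumptions.
def text_to_dna(text: str) -> list[int]:
    # Each byte of the trimmed text maps directly to a codebook region (0-255).
    return list(text.strip().encode("utf-8"))

print(text_to_dna("I love dance"))
# [73, 32, 108, 111, 118, 101, 32, 100, 97, 110, 99, 101]
```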
All modes support an optional `--gender` argument (`neutral`, `male`, or `female`) to change the body model used during rendering. For example:

```bash
python generate_sample.py --input_text "I love dance" --gender female
```

The generated samples will be saved in the `results/` directory. Each generated routine produces three files:
- A move sequence file (`.pkl`)
- A video visualization (`.mp4`)
- A run data file (`.json`)
A benchmarking tool (`run_analysis.py`) has been added to evaluate the output against theoretical and art-science expectations.
```bash
# Run all analysis regimes (this may take a while depending on hardware)
python run_analysis.py

# Run a specific subset for debugging/iteration
# There are 3 parts total, split by data collection needs
python run_analysis.py --part 1
```

The script measures stylistic uniqueness, pathfinding capabilities, and physical correctness on 10k-frame datasets and offline indexes.
- Metrics data are saved as text logs: `results/metrics/part_1.txt`, `results/metrics/part_2.txt`, etc.
- Plots (KDE distributions, path density histograms, UpSet-style intersection bars, etc.) are exported to `results/plots/`.
Note that evaluation can be slow: the full run can take as long as 4 hours on a consumer-level CPU.
This repository contains code for a website about this project. All website code is in the `webapp` folder. To interact with the website code, switch into the `webapp` directory:

```bash
cd webapp
```

Read `webapp/README.md` for instructions on how to set up and run the website locally.
To build the searchable database of motions, the system first creates a unified index mapping all valid frames (`create_index.py`). It then processes these frames into abstract stylistic tokens (`create_codebook.py`) in three stages (a sketch follows the list):
- Feature Extraction: It reconstructs 3D representations using the SMPL body model and extracts local behavior features (joint poses, root velocities, and foot contacts) while discarding world-space position variations to make the data translation-invariant.
- Temporal Windowing: A sliding window captures chunks of movement (default 20 frames) representing the "stylistic future" of each frame.
- Dimensionality Reduction and Quantization: The high-dimensional temporal features are compressed into a 64D space using PCA, and subsequently assigned discrete region values (0 to 255) using K-Means clustering.
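The sketch below illustrates the windowing and quantization stages using scikit-learn. The per-frame feature array, the intermediate file name, and the `tokens` key are hypothetical stand-ins; `create_codebook.py` may organize this differently.

```python
# Minimal sketch of temporal windowing + PCA + K-Means quantization.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

W = 20  # sliding window length (the default mentioned above)

# features: (num_frames, feat_dim) local behavior features per frame
features = np.load("features.npy")  # hypothetical intermediate array

# Temporal windowing: each frame's "stylistic future" is the next W frames.
num_windows = len(features) - W + 1
windows = np.stack([features[i:i + W].ravel() for i in range(num_windows)])

# Compress to 64D with PCA, then assign discrete regions 0-255 with K-Means.
reduced = PCA(n_components=64).fit_transform(windows)
labels = KMeans(n_clusters=256, random_state=0).fit_predict(reduced)

# The final W-1 frames lack a full window and are marked -1.
tokens = np.full(len(features), -1, dtype=np.int64)
tokens[:num_windows] = labels
np.savez("data/index/codebook.npz", tokens=tokens)  # key name is an assumption
```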
Output:

- The combined motion index is saved to `data/index/motion_index.npz`, containing the concatenated SMPL `poses` and `trans` for all dataset frames, along with `file_indices` and `frame_indices` to map each frame back to its original source file and local frame number.
- The codebook is saved to `data/index/codebook.npz`. Every valid frame receives an integer token `0-255` reflecting its motion behavior cluster. The final `W-1` (default 19) frames of any isolated clip are marked with a token of `-1`, signifying that they lack sufficient future frames to construct a full behavioral window. The downstream engine ignores these `-1` frames as selectable transition targets. A sketch showing how to inspect both files follows.
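Both files can be inspected directly with NumPy. The `motion_index.npz` keys follow the description above, while the codebook key name (`tokens`) is an assumption:

```python
# Inspect the generated artifacts; the "tokens" key is an assumption
# and may differ from what create_codebook.py actually writes.
import numpy as np

index = np.load("data/index/motion_index.npz")
print(index["poses"].shape, index["trans"].shape)             # SMPL pose/translation per frame
print(index["file_indices"][:5], index["frame_indices"][:5])  # map back to source clips

codebook = np.load("data/index/codebook.npz")
tokens = codebook["tokens"]
print((tokens == -1).sum(), "frames lack a full behavioral window")
```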
The plausibility graph is generated using `create_plausibilities.py`. This graph encodes the physical plausibility of transitions between codebook regions. Each node represents a codebook region, and edges are weighted by the cost of transitioning between regions based on motion continuity metrics. This graph is critical for ensuring smooth transitions during guided generation (Mode B) and is saved as `data/index/plausibility_graph.pkl`.
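For intuition, here is a minimal sketch of routing bridge regions over such a graph with Dijkstra's algorithm. The nested-dict format `{region: {neighbor: transition_cost}}` is an assumption about the pickled structure, not its confirmed layout:

```python
# Hedged sketch of bridge-region routing; graph layout is assumed.
import heapq
import pickle

with open("data/index/plausibility_graph.pkl", "rb") as f:
    graph = pickle.load(f)

def bridge_path(graph, start, goal):
    """Return the lowest-cost chain of regions from start to goal,
    or None if the goal is unreachable."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path  # interior entries are the injected bridge regions
        if node in visited:
            continue
        visited.add(node)
        for neighbor, edge_cost in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + edge_cost, neighbor, path + [neighbor]))
    return None
```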
The sample generation step (generate_sample.py) operates in three modes using motion matching and an offline plausibility graph:
- Mode A (Autonomous Exploration): The engine starts at a random valid frame and plays the motion. To maintain temporal consistency, it locks playback for a minimum of 30 frames. After this period, it searches the entire database for the lowest-cost transition to a new sequence, continuously creating a novel, unbounded dance routine (see the sketch after this list).
- Mode B (Guided Generation): The engine follows a user-provided "DNA" target sequence, represented as a list of codebook regions. It transitions seamlessly to the requested regions. If a direct transition to the next requested region is physically implausible within the search window, the system performs a safe jump and uses Dijkstra's algorithm on the precomputed plausibility graph (`create_plausibilities.py`) to inject bridge regions, dynamically routing the choreography back to the user's intended DNA sequence.
- Mode C (Text Guided Generation): The engine follows a target sequence generated by trimming the provided text and converting it into a string of bytes. Each byte maps directly to a codebook region (0-255), providing a new way to interactively guide choreography through words.
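Below is a hedged sketch of the Mode A loop described above. The feature array, clip-end mask, and nearest-neighbor cost are illustrative stand-ins for the engine's internal motion-matching search, not `generate_sample.py`'s actual API:

```python
# Hypothetical sketch of Mode A: locked playback + lowest-cost transitions.
import numpy as np

MIN_LOCK = 30  # minimum playback lock, in frames

def generate_autonomous(features, is_clip_end, num_frames, seed=0):
    """features: (N, D) per-frame matching features.
    is_clip_end: (N,) bool mask for frames with no valid successor
    (analogous to the codebook's -1 tokens)."""
    rng = np.random.default_rng(seed)
    valid = np.flatnonzero(~is_clip_end)
    frame = int(rng.choice(valid))
    routine = [frame]
    while len(routine) < num_frames:
        # Locked playback: advance linearly for MIN_LOCK frames.
        steps = 0
        while steps < MIN_LOCK and not is_clip_end[frame]:
            frame += 1
            routine.append(frame)
            steps += 1
            if len(routine) >= num_frames:
                return routine
        # Then search the whole database for the lowest-cost transition,
        # approximated here as the nearest neighbor in feature space.
        costs = np.linalg.norm(features[valid] - features[frame], axis=1)
        costs[valid == frame] = np.inf  # disallow a self-transition
        frame = int(valid[np.argmin(costs)])
        routine.append(frame)
    return routine
```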
In all modes, the final output includes a 3D rendered video, raw pose data, and a `run_data.json` file logging the exact sequence of regions executed.