Introduction
This is an ongoing project in collaboration with Jules Françoise, in which we are trying to teach a neural network to generate beat-synchronous dance movements for a given song, and to match movement patterns to musical patterns. We have created a database of groove movements synchronized to songs as training data.
Rather than taking a supervised approach, we treat this as an unsupervised learning problem: for each song, we extract audio descriptors and train a multi-modal neural network on both the audio descriptors and the mocap joint rotations.
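Concretely, the two streams can be paired frame by frame once both are resampled to a common rate. A minimal sketch of that pairing follows; the file names, dimensions, and stacking are illustrative placeholders, not our exact pipeline:

```python
import numpy as np

# Illustrative only: file names and dimensions are placeholders.
# Both streams are assumed resampled to the same frame rate beforehand.
audio = np.load("song_descriptors.npy")      # (T, D_audio) per-frame audio descriptors
motion = np.load("mocap_rotations.npy")      # (T, D_pose)  per-frame joint rotations

T = min(len(audio), len(motion))             # align lengths after resampling
frames = np.hstack([motion[:T], audio[:T]])  # one multimodal vector per frame
```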
I will be updating this page as we make progress...
The Approach
Preliminary Results - April 2017
As submitted to the Workshop on Machine Learning for Creativity: PDF.
Learning and Generating Movement Patterns
FCRBM - Labeled Mocap Segments - No Audio
Hidden Units: 150 | Factors: 400 | Order: 6 | Frame Rate: 30
16-Dimensional, One-hot-encoded Labels
Pattern 4 • Pattern 5 • Pattern 6 • Pattern 7 • Pattern 8 • Pattern 9 • Pattern 10 • Pattern 11 • Pattern 12 • Pattern 13 • Pattern 14 • Pattern 15
* The rest of the labels (1, 2, 3, and 16) either represented non-moving portions of the mocap sequence, e.g., the beginning, or did not cause the model to learn any patterns.
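For the curious, here is a minimal sketch (not our training code) of the factored, label-gated interaction that gives the FCRBM its name, using the dimensions listed above; the visible size and all weight values are placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Dimensions from the configuration above; the visible size (one frame of
# joint rotations) is a placeholder guess.
n_vis, n_hid, n_fac, n_lab = 60, 150, 400, 16

rng = np.random.default_rng(0)
Wv = rng.normal(0, 0.01, (n_vis, n_fac))  # visible-to-factor weights
Wh = rng.normal(0, 0.01, (n_hid, n_fac))  # hidden-to-factor weights
Wy = rng.normal(0, 0.01, (n_lab, n_fac))  # label-to-factor weights
b_h = np.zeros(n_hid)

def hidden_probabilities(v, y):
    """Three-way factored interaction: the one-hot label y gates each factor
    multiplicatively, so every label selects its own effective
    visible-to-hidden mapping without storing 16 separate weight matrices."""
    factors = (v @ Wv) * (y @ Wy)         # per-factor products
    return sigmoid(factors @ Wh.T + b_h)  # project factors up to hidden units

# Example: gate the network with pattern 4 (index 3 of the 16-dim one-hot).
y = np.zeros(n_lab); y[3] = 1.0
v = rng.normal(size=n_vis)                # one (fake) frame of joint rotations
h = hidden_probabilities(v, y)
```

The factoring keeps the parameter count linear in the number of factors rather than multiplicative across the three interacting groups of units; the full FCRBM also conditions dynamic biases on the previous `order` frames (6 here), which is omitted above for brevity.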
Dancing with Training Songs
FCRBM - Cooked Features
Hidden Units: 500 | Factors: 500 | Order: 30 | Frame Rate: 60
Audio Features (84 dimensions):
low-level features (RMS level, Bark bands),
spectral features (energy in low/middle/high frequency bands, spectral centroid, spread, skewness, kurtosis, rolloff, crest, flux, and complexity),
timbral features (mel-frequency cepstral coefficients, tristimulus),
melodic features (pitch, pitch salience and confidence, inharmonicity, dissonance).
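This descriptor set (Bark bands, tristimulus, inharmonicity, dissonance) matches what an audio analysis toolkit such as Essentia computes, though that is an inference on my part. As a rough illustration only, a comparable (smaller) subset can be extracted with librosa, aligned to the 60 fps mocap frame rate:

```python
import numpy as np
import librosa

# Rough illustration: extracts a subset of the descriptors listed above.
y, sr = librosa.load("song.wav", sr=44100)
hop = sr // 60  # one analysis frame per mocap frame at 60 fps

feats = np.vstack([
    librosa.feature.rms(y=y, hop_length=hop),                        # low-level
    librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop),   # spectral
    librosa.feature.spectral_bandwidth(y=y, sr=sr, hop_length=hop),  # ~spread
    librosa.feature.spectral_rolloff(y=y, sr=sr, hop_length=hop),
    librosa.feature.mfcc(y=y, sr=sr, hop_length=hop, n_mfcc=13),     # timbral
]).T  # -> (frames, 17), one row per 60 fps mocap frame
```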
Based on audio track 1: Output 1 • Output 2 • Output 3
Based on audio track 2: Output 4 • Output 5 • Output 6
Based on audio track 3: Output 7 • Output 8 • Output 9
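Generation from a conditional RBM of this kind proceeds one frame at a time. The sketch below shows the general shape of such a rollout; `model.sample_frame` is a hypothetical stand-in for one sampling step of the trained FCRBM, not an actual API:

```python
import numpy as np

def rollout(model, audio_feats, seed_motion, order=30):
    """Autoregressive generation: each new pose is sampled conditioned on the
    current audio feature vector and the previous `order` motion frames.
    `model.sample_frame` is a hypothetical stand-in for a Gibbs step."""
    frames = list(seed_motion[-order:])           # seed with real mocap frames
    for feat in audio_feats:                      # one feature vector per frame
        history = np.concatenate(frames[-order:])
        frames.append(model.sample_frame(history=history, audio=feat))
    return np.stack(frames[order:])               # one generated pose per audio frame
```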
Dancing with Unheard Songs
FCRBM - Cooked Features
Hidden Units: 500 | Factors: 500 | Order: 30 | Frame Rate: 60
Audio Features: the same 84-dimensional feature set as in the previous section.
Output 1 • Output 2 • Output 3 • Output 4 • Output 5 • Output 6
Publications
- Omid Alemi, Jules Françoise, and Philippe Pasquier. "GrooveNet: Real-Time Music-Driven Dance Movement Generation using Artificial Neural Networks". Accepted to the Workshop on Machine Learning for Creativity, 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Halifax, Nova Scotia, Canada, 2017. PDF.