Echo Nest & Columbia University Release Free Million Song Dataset
March 2011

A free Million Song Dataset has been released in an effort to empower the creativity of music software developers. To get the massive project off the ground, The Echo Nest provided the music analysis and metadata, Columbia University’s LabROSA (Laboratory for the Recognition and Organization of Speech and Audio) did the research, Infochimps is handling the hosting, and the National Science Foundation contributed some funding. What does all this mean for music?

Software and app developers, academic researchers, and data scientists can now use the Million Song Dataset to test theories and build algorithms for music recommendation, cultural analysis, and countless other purposes.

The dataset brings together information ranging from loudness and length to release date and energy level. No actual music is included in the song data, but the dataset does include a mapping to 7digital’s library of 30-second samples, allowing researchers to test their technologies with real songs.

Interested parties can visit the project’s site for the code, instructions, benchmark results for example tasks (like automatic song tagging and artist recognition), an artist mapping to Yahoo’s user ratings, and demonstrations of how to fetch audio snippets from 7digital and represent artists on a world map using the data, as well as a forum and FAQ.