I’m wondering whether I can use cGPT for a particular use case, and if so, how I can go about feeding training data to it?
What I am trying to accomplish: I want to be able to supply cGPT with a music file (.ogg or .mp3) and get the song's BPM to an accuracy of 0.001 BPM. Huge bonus points if it can also print out the second (down to 0.001 s) at which the BPM changes in a song.
No, nowhere near as accessible, but you can still learn it off the internet. Depends on how much effort you want to put into this project, really. The kind of thing you’re trying to do is pretty involved and will take a lot of trial and error, time, and effort to get working well. People have put in a lot of effort to make it easier but it’s not a trivial task.
If you’re really interested, I’d recommend looking into simple neural network tutorials on YouTube, specifically through TensorFlow or (if you have institutional access) MATLAB.
In my experience, at least for digitally produced music that has a constant tempo and a 4/4 measure, the DJ software will get it perfectly right more than 95% of the time. In those few cases where it fails, it seems to me that it's most often caused by bad/weird/artsy/interesting mixing choices in the production, where e.g. the bass notes are more prominent than the kick drum, confusing the algorithm with an irregular kind of waveform. I guess manually EQ'ing the audio file itself to make the drums more prominent than the bass notes, then letting the software analyse the BPM once again, could be a solution.
For non-quantized recordings with musically organic tempo changes, it's definitely a much different story...
A change of BPM (beats per minute) from one value to another cannot be measured arbitrarily precisely. At 60 BPM there is only one beat per second; you want 0.001 s resolution, which is one thousandth of a beat. A 1 kHz tone only completes one full wavelength in that time.
It also depends on how long the samples are. A 0.2 second sample is hardly going to give a BPM at all.
Maybe you can get down to fractions of a BPM with a high initial BPM and long samples, but that is about it.
Then there is the actually bigger question of whether it is even relevant. Why would it matter if it is 60 or 60.001 BPM?
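To put rough numbers on the point above, here's a back-of-envelope sketch. The model (my assumption, not any specific detection algorithm) is that BPM is measured as 60 × beats / duration, and that the onset times are only known to within some jitter `dt`; the jitter then propagates into a BPM uncertainty.

```python
# Back-of-envelope: how precisely can BPM be read off a clip?
# Assumed model: BPM = 60 * beats / duration, with the clip's start/end
# onsets each uncertain by +/- dt seconds (dt is an assumed value).

def bpm_resolution(bpm, duration_s, dt=0.010):
    """Approximate BPM uncertainty from +/- dt onset-timing jitter."""
    beats = bpm * duration_s / 60.0
    # Duration error of ~2*dt (both endpoints) propagated to the BPM:
    return 60.0 * beats * 2 * dt / duration_s ** 2

# A 0.2 s clip at 60 BPM contains only 0.2 of a beat:
print(bpm_resolution(60, 0.2))     # about +/-6 BPM, i.e. no useful BPM
# A full 3-minute track at 120 BPM, still with 10 ms jitter:
print(bpm_resolution(120, 180.0))  # about +/-0.013 BPM
```

Even the full-track case lands around 0.01 BPM, an order of magnitude short of the 0.001 BPM the OP is after, unless the timing jitter is far below 10 ms.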
This application of deep learning would apply to music suitable for playing DDR/ITG/Stepmania/Stepmaniax/PIU etc.; essentially music gaming:
Most music that would be reasonably fun to play falls within 110-240BPM and runs between 2.5 and 7 minutes long. A song coded at 110BPM but with a true BPM of 110.001 will drift by roughly 2ms over the course of the song. Music games are predicated on timing precision down to 15ms as a minimum. I, myself, hit notes within a rough range of 6ms at my best (and I'm barely top 100 in the world).
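That ~2 ms figure checks out arithmetically. A sketch, assuming a 4-minute song (the song length is my assumption; beat k lands at k*60/bpm seconds):

```python
# Sanity-check the drift: chart coded at 110 BPM, true tempo 110.001 BPM.
coded, true = 110.0, 110.001
song_len = 240.0                       # assumed 4-minute song
beats = int(true * song_len / 60)      # ~440 beats in the song
drift = beats * (60 / coded - 60 / true)  # cumulative offset at the last beat
print(f"{drift * 1000:.2f} ms")        # -> 2.18 ms
```

Per beat the error is only ~5 microseconds; it's the accumulation over hundreds of beats that pushes it into the milliseconds that rhythm-game players can feel.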
You can produce the audio with arbitrary temporal precision; the issue is that this precision is simply impossible to reconstruct, given the low number of "virtual sample points" per unit time (the beats themselves, as far as the BPM is concerned). The same goes for the wavelength of the actual sound discussed above, which puts up yet another limit, where just measuring the frequency becomes less and less accurate, or outright impossible.
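Viewed spectrally, the scale of the problem is easy to show. A plain DFT over a window of T seconds has frequency bins 1/T Hz apart (interpolation and phase-based tricks can beat a single bin, so treat this as a coarse bound, not a hard floor):

```python
# Coarse single-bin bound: to tell 60.000 from 60.001 BPM in a beat
# spectrum, you must resolve a frequency difference of 0.001/60 Hz.
delta_bpm = 0.001
delta_hz = delta_bpm / 60.0       # 0.001 BPM expressed in Hz
window_s = 1.0 / delta_hz         # DFT window needed for one bin of that width
print(window_s, window_s / 3600)  # -> 60000.0 seconds, ~16.7 hours of audio
```

So a naive frequency-domain approach would need many hours of perfectly constant tempo to separate those two values, which is why 0.001 BPM precision on a few-minute song is so hard to justify from the signal alone.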