This post is the first in a series describing the process of building a tool for transcribing guitar music. Transcription is a natural fit for machine learning, so we need a dataset for training and validation. The hardest part of data preparation is obtaining scores that are aligned with the audio. To the best of my knowledge, the approach I present here is original.
We need a dataset of audio onsets, each labelled with the notes that begin at that onset.
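Concretely, a single training example might look something like the sketch below. This is purely illustrative: the class name, field names, and window length are my own assumptions, not a fixed schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class OnsetExample:
    """One labelled onset: a short audio window and the notes starting in it."""
    onset_time: float     # onset position within the recording, in seconds
    audio: List[float]    # mono samples around the onset (here, a 100 ms window)
    midi_notes: List[int] # MIDI pitches of the notes that begin at this onset

# e.g. an open-position E minor chord strummed at t = 12.5 s
example = OnsetExample(
    onset_time=12.5,
    audio=[0.0] * 4410,   # placeholder samples (100 ms at 44.1 kHz)
    midi_notes=[40, 47, 52, 55, 59, 64],  # E2 B2 E3 G3 B3 E4
)
```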
Since no such dataset exists, we must build it ourselves. Yes, we could use synthesised audio, but real-world recordings should result in better generalisation.
Now, having acquired hundreds of fingerstyle pieces from various artists in both audio and score format, how do we match the two together? Audio-to-score alignment is an active area of research, and in its more general form it also appears in speech recognition systems, where it is known as forced alignment.
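A common baseline for this kind of alignment is dynamic time warping (DTW) over feature sequences extracted from the audio and the score. This is not necessarily the approach the series ends up using; the sketch below just shows the core idea with a plain-Python DTW over toy feature vectors.

```python
import math

def dtw_path(seq_a, seq_b):
    """Dynamic time warping between two feature sequences.
    Returns the optimal alignment path as a list of (i, j) index pairs."""
    def dist(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # cost[i][j] = minimal accumulated distance aligning seq_a[:i+1] with seq_b[:j+1]
    cost = [[INF] * m for _ in range(n)]
    cost[0][0] = dist(seq_a[0], seq_b[0])
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                cost[i - 1][j] if i > 0 else INF,                 # advance seq_a only
                cost[i][j - 1] if j > 0 else INF,                 # advance seq_b only
                cost[i - 1][j - 1] if i > 0 and j > 0 else INF,   # advance both
            )
            cost[i][j] = dist(seq_a[i], seq_b[j]) + best

    # backtrack from the end to recover the optimal path
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))
        if i > 0:
            candidates.append((cost[i - 1][j], (i - 1, j)))
        if j > 0:
            candidates.append((cost[i][j - 1], (i, j - 1)))
        _, (i, j) = min(candidates)
        path.append((i, j))
    return path[::-1]

# toy example: b is a time-stretched copy of a, so a perfect alignment exists
a = [[0.0], [1.0], [2.0], [1.0]]
b = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]]
path = dtw_path(a, b)
```

In practice you would run something like this over chroma or onset-strength features computed from the recording and from audio synthesised from the score, then read note timings off the warping path.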