Connectionist Temporal Classification (CTC) with Theano
This will be the first time I’m trying to present code I’ve written in an ipython notebook. The style’s different, but I think I’ll permanently switch to this method of presentation for code-intensive posts from now on. A nifty little tool that makes doing this so convenient is ipy2wp. It uses WordPress’ xml-rpc to post the HTML directly to the platform.
In any case, I’ve started working with the NUS School of Computing speech recognition group, and they’ve been using deep neural networks for classification of audio frames to phonemes. This requires a preprocessing step that aligns the audio frames to phonemes in order to reduce this to a simple classification problem.
CTC describes a way to compute the probability of a sequence of phonemes for a sequence of audio frames, accounting for all possible alignments. We can then define an objective function to maximise the probability of the phoneme sequence given the audio frame sequence from training data.