When it comes to classification, and machine learning in general, a method based on Support Vector Machines (SVMs) is often at the head of the pack. In this post we'll look at what SVMs do and how they work, and as usual there's some example code. However, even a simple PHP-only SVM implementation is a little long, so this time the complete source is available separately in a zip file.
So far, when looking at text, we've broken it down into words, albeit with varying degrees of preprocessing, and used the word as our token or term. However, there is a lot of mileage in comparing other units of text. One example is the letter n-gram, an overlapping sequence of n consecutive characters, which can prove effective in applications such as language identification and matching text that contains spelling errors.
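To make the idea concrete, here is a minimal sketch of extracting letter n-grams and comparing two strings by them. It is not taken from this post's code: the choice of trigrams (n = 3), the single-space padding at each end so that word boundaries produce n-grams too, and the use of cosine similarity over n-gram counts are all assumptions made for illustration. The helper names `letter_ngrams` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def letter_ngrams(text, n=3):
    """Return a Counter of overlapping letter n-grams in text.

    Assumption: the text is lowercased and padded with one space on each
    side, so the start and end of the string contribute n-grams as well.
    """
    padded = " " + text.lower() + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (Counters)."""
    # Looking up a missing key in a Counter returns 0 without inserting it.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(sorted(letter_ngrams("text")))  # [' te', 'ext', 'tex', 'xt ']
    # A misspelling still shares 5 of its 9 trigrams with the correct word.
    score = cosine_similarity(letter_ngrams("retrieval"), letter_ngrams("retreival"))
    print(round(score, 3))  # 0.556
```

Compared as whole words, "retrieval" and "retreival" are simply two different terms and have nothing in common, whereas their trigram vectors still overlap substantially. This is the kind of robustness that makes letter n-grams useful.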