In the last post we had a simple stepping algorithm, and a gradient descent implementation, for fitting a line to a set of points with one variable and one 'outcome'. As I mentioned though, it's fairly straightforward to extend that to multiple variables, and even to curves, rather than just straight lines.
I've had a couple of emails recently about the excellent Stanford Machine Learning and AI online classes, so I thought I'd put up the odd post or two on some of the techniques they cover, and what they might look like in PHP.
The web is a great place for people to express their opinions, on just about any subject. Even the professionally opinionated, like movie reviewers, have blogs where the public can comment and respond with what they think, and there are a number of sites that deal in nothing more than this. The ability to automatically extract people's opinions from all this raw text can be a very powerful one, and it's a well studied area - no doubt because of the commercial possibilities.
When it comes to classification, and machine learning in general, at the head of the pack there's often a Support Vector Machine based method. In this post we'll look at what SVMs do and how they work, and as usual there's a some example code. However, even a simple PHP only SVM implementation is a little bit long, so this time the complete source is available separately in a zip file.
Classification techniques are used for spam filters, author identification, intrusion detection and a host of other applications. They can be used to help organise data into a structure, or to add tags to allow users to find documents. While the latest classification algorithms are at the cutting edge of machine learning, there are still thousands of systems using simpler algorithms to great effect.