PHP/ir

Information Retrieval and other interesting topics

Monte Carlo Simulations

In: probability

07 Jul 2010

Monte Carlo simulations are a handy tool for looking at situations that have some aspect of uncertainty, by modelling them with a pseudo-random element and conducting a large number of trials. There isn’t a hard and fast Monte Carlo algorithm, but the process generally goes: start with a situation you wish to model, write a program to describe it that includes a random input, run that program many times, and look at the results.

Bayesian Opinion Mining

In: classification, probability

01 Jan 2010

The web is a great place for people to express their opinions, on just about any subject. Even the professionally opinionated, like movie reviewers, have blogs where the public can comment and respond with what they think, and there are a number of sites that deal in nothing more than this. The ability to automatically extract people's opinions from all this raw text can be a very powerful one, and it's a well studied area - no doubt because of the commercial possibilities.

Alternative Term Weighting

In: ranking, probability

11 Nov 2009

The term weighting and ranking function is at the core of any information retrieval system. The vector space model with the cosine similarity is maybe the best known and most widely used, but there are plenty of alternatives. We're looking at two here, the BM25 function based around a probabilistic model, and a function based around language modeling.