Revisiting the Dirichlet-Multinomial after 10 years

I recently shared the following with my research list:
(See the latest research at localmaxradio.com/labs)

It’s been over a decade since I started building what is now the Foursquare City Guide venue ratings. This is a system applied to restaurants, cafes, parks, and the like to find out what’s good. One question that came up was “how many ratings do we need before we have enough information to know the quality of a place?”

In the pursuit of an answer, I made use of the Dirichlet-multinomial probability distribution. The idea was that given a bunch of count data, I could come up with a prior distribution that applies to each row (or venue) before seeing its ratings. This prior tells us how much we should let the data inform our beliefs.

In Bayesian terms, this is an “informative prior”.
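To make this concrete, here is a minimal sketch with made-up numbers (not Foursquare’s actual parameters): the Dirichlet prior acts like pseudo-counts added to a venue’s observed ratings, and the prior’s total is roughly the number of ratings needed before the data starts to dominate.

```python
import numpy as np

# Hypothetical prior fit across all venues: pseudo-counts for (bad, ok, great).
# These numbers are made up for illustration.
alpha = np.array([2.0, 3.0, 5.0])

# Observed ratings for a single venue.
counts = np.array([1, 0, 4])

# The Dirichlet is conjugate to the multinomial, so the posterior is just
# the prior pseudo-counts plus the observed counts.
posterior = alpha + counts
posterior_mean = posterior / posterior.sum()   # estimated rating distribution
prior_strength = alpha.sum()                   # roughly, how many ratings the prior is "worth"

print(posterior_mean)   # [0.2, 0.2, 0.6]
print(prior_strength)   # 10.0
```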

I found this simple model to be useful in many other situations. A good example is the work on calculating the seasonality of different food items, which I presented at RecSys in 2014.

Because computation time is expensive, I wanted a method to calculate these priors quickly. So I found a way to compress the data efficiently, which simplifies the computation, and I ended up implementing it in a blazing fast Python script.
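As a rough sketch of the kind of compression involved (my own toy version, not necessarily the paper’s exact code): the Dirichlet-multinomial gradient only needs to know, for each category, how many rows have at least j ratings, so the per-iteration cost scales with the largest count rather than with the number of venues.

```python
import numpy as np

def compress(counts):
    """counts: (rows, categories) array of rating counts per venue."""
    rows, K = counts.shape
    max_n = counts.max()
    totals = counts.sum(axis=1)
    # count_ge[k][j] = number of rows with more than j ratings in category k
    count_ge = np.array([[(counts[:, k] > j).sum() for j in range(max_n)]
                         for k in range(K)])
    # total_ge[j] = number of rows with more than j ratings in total
    total_ge = np.array([(totals > j).sum() for j in range(totals.max())])
    return count_ge, total_ge

def gradient(alpha, count_ge, total_ge):
    """Gradient of the Dirichlet-multinomial log-likelihood in alpha,
    computed from the compressed tallies instead of the raw rows."""
    K, max_n = count_ge.shape
    a0 = alpha.sum()
    j_tot = np.arange(len(total_ge))
    common = (total_ge / (a0 + j_tot)).sum()
    j = np.arange(max_n)
    return np.array([(count_ge[k] / (alpha[k] + j)).sum() - common
                     for k in range(K)])
```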

To make these ideas available quickly, I shared the work on arXiv and gave a talk about it at the New York Machine Learning meetup. Almost ten years later I still get contacted regarding this work, so I decided to make some updates to the code and the paper, addressing a few mistakes and making the language more precise. The remaster!

I also created an addendum detailing the transformation of equations for Newton’s method. There, I describe how the Hessian matrix can be inverted efficiently in this particular situation. An interesting twist to this algorithm is doing it in “log space,” which makes sense since Dirichlet parameters are always positive.
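Here is a sketch of the idea, in my own rendering (the addendum’s derivation may differ in details): the Hessian of the Dirichlet-multinomial log-likelihood is a diagonal matrix plus a constant matrix, so the Sherman-Morrison formula gives the Newton step in O(K) without ever forming or inverting a full K-by-K matrix.

```python
import numpy as np
from scipy.special import digamma, polygamma

def trigamma(x):
    return polygamma(1, x)

def newton_step(alpha, counts):
    """One Newton update for the Dirichlet-multinomial MLE.

    counts: (rows, categories) array of per-venue rating counts.
    alpha:  (categories,) current Dirichlet parameters.
    """
    totals = counts.sum(axis=1)
    a0 = alpha.sum()

    # Gradient of the log-likelihood with respect to alpha.
    g = (digamma(alpha + counts) - digamma(alpha)).sum(axis=0) \
        + (digamma(a0) - digamma(a0 + totals)).sum()

    # The Hessian has the form diag(q) + z * (matrix of all ones).
    q = (trigamma(alpha + counts) - trigamma(alpha)).sum(axis=0)
    z = (trigamma(a0) - trigamma(a0 + totals)).sum()

    # Sherman-Morrison: solve (diag(q) + z * 1 1^T) step = g in O(K).
    b = (g / q).sum() / (1.0 / z + (1.0 / q).sum())
    step = (g - b) / q

    # Newton update; in practice you iterate this to convergence,
    # guarding alpha > 0 (or working in log space as in the addendum).
    return alpha - step
```

Optimizing in log space, i.e. over theta = log(alpha), keeps the parameters positive automatically, and the chain-rule-adjusted Hessian stays diagonal-plus-rank-one, so the same inversion trick carries over.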

I love sharing insights about these subjects, and I hope you enjoyed this one. Have a great summer!