Filmaster.com, a social network for film lovers, has recently presented a new movie recommendations engine. The algorithm that generates recommendations is open source and has been released under AGPLv3 license.
The new algorithm works by fetching the current ratings from database, processing them using a teaching algorithm and then generates recommendations for each user/film pair and eventually stores them in the relational database (PostreSQL). The first and the last part of the process is obviously Filmaster-specific. The teaching algorithm, on the other side, is universal and can be easily used in any external open source project.
The recommendation engine has been designed and implemented based on the best practices of the Nexflix contest participants. The correctness of recommendations (computed using the recommendation engines test, is — according to Filmaster — almost 20% better than in case of the previous algorithm used. Precisely, the RMSE value based on ~250 thousands ratings is as follows:
- 1.55 for the old algorithm
- 1.30 for the new one
Here is how the programmers document how the algorithm works.
Movies have a number of different features, each of those can also be rated (e.g. the scripts or the level of violence, etc). Obviously every user has a different attitude to those features. One may enjoy violence, the other might not stand it. So we can also rate users’ preferences for each feature. Here we get our
M matrices where
f_num dimension is the number of features we consider. Notice that in
R' every guessed rating
(u,m) is now of form: sum of
U[u][f]*M[f][m] for every feature
f. So the higher the preference for feature
f is and the higher is its level in a movie, the higher guessed rating will be. We also need to scale it so it would be in a
<1,10> range. So all that needs to be done is to find the possibly best
None of the existing SVD algorithms seemed to fit, so teaching approach with a heuristics has been used instead. We start with both matrices
U,M filled with the same unsignificantly small real number in each entry. We will now modify it separately for each feature. We begin with teaching the first feature and then after a certain number of teaching cycles we proceed to the second etc. In a teaching cycle we compute the actual
R' matrix and then take all nonempty entries
R matrix and for each of them we use a given formula:
err = lrate * (R[u][m]-R'[u][m]) U[u][f] += err*M[f][m] M[f][m] += err*U[u][f]
lrate is a constant real number and
f is a number of presently computed feature.
We notice that when our guessed rating is too low, err will be positive so the entries in
M will increase. In the other case,
err is negative and entries decrease. During the next teaching cycles the err absolute value decreases, as our guess ratings are getting closer to the real ones. In the end going through more cycles does not have significant effect so we proceed to the next feature. This simple formula happens to give quite satisfactory results.
The algorithm was implemented solely by Jakub Tlałka, currently a mathematics and computer science student on University of Warsaw and part-time Filmaster developer.
Despite the fact that Filmaster is coded in Python using the Django Web Framework, the recommendation engine has been implemented in C++ for performance reasons. It outperforms the previous one (written in Python) by an order of magnitude.
You can test the film recommendation engine yourself by rating at least 20 movies on Filmaster (an account is required - Open ID and Facebook Connect can be used as well) and then switching to the film recommendations page to see the suggestions.
Alternatively, you can use the open source code to apply the algorithm on your own collection of data in any open source project you’re currently working on. If you do so or if you’d like to help enhancing the current alorithm, please leave a comment under this article and write to Filmaster project maintainers at email@example.com.
The author of this article is one of the founders of the Filmaster project, but not the author of the new algorithm.