Over the past few years I have become pretty obsessed with modern
board games. This may sound unusual to the uninitiated, but modern
board games have come a long way in complexity, theme, and overall
entertainment. I also find that board games are a great way to
unplug these days and enjoy the company and conversation that come
with people gathering together to play a game. It seemed only
natural to combine my interest in board games with my interest in data analysis.
The motivation behind this objective similarity service is that, while browsing,
I often find myself wanting a list of similar games. I picture this being
something like a new section in the side banner listing other games with
similar characteristics. I knew a 'recommendation list' would potentially be
more useful, but without direct, speedy access to the underlying database and
to user-ranked lists, gathering the data needed for a recommendation system
would be very time-consuming. Given that constraint, I opted for an objective
similarity system that returns a list of games that are similar based on
game characteristics rather than opinion. This is that attempt.
It's important to mention what this isn't: a recommendation system. Some
results may not immediately make sense as related games, but within the
confines of the game characteristics used, the similarities may be more
apparent. This isn't to say that the system is always correct (far from it);
the results may simply not align with expectations because one might think
of the games differently than the model does.
The method used here is quite straightforward when all is said and done.
This system uses Locality Sensitive Hashing to determine the similarity between
games. In particular, it takes into consideration the following game
characteristics (as defined by BoardGameGeek):
- minimum age
- minimum playtime
- maximum playtime
- maximum player count
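To make the idea more concrete, here is a minimal sketch of one common LSH
variant, random-hyperplane hashing, applied to the four characteristics above.
The post doesn't say which LSH family, feature scaling, or table parameters
were actually used, so the z-score standardization, the hypothetical
`HyperplaneLSH` class, its `n_planes`/`n_tables` parameters, and the made-up
sample games are all assumptions for illustration only.

```python
import numpy as np
from collections import defaultdict

# Hypothetical sample data, NOT real BGG values:
# (minimum age, minimum playtime, maximum playtime, maximum player count)
GAMES = {
    "Game A": (10, 30, 60, 4),
    "Game A (reprint)": (10, 30, 60, 4),
    "Game B": (14, 120, 180, 2),
    "Game C": (8, 15, 30, 6),
    "Game D": (12, 45, 90, 5),
}


class HyperplaneLSH:
    """Random-hyperplane LSH over standardized game features (an assumed scheme)."""

    def __init__(self, games, n_planes=6, n_tables=4, seed=0):
        self.names = list(games)
        raw = np.array([games[n] for n in self.names], dtype=float)
        # Standardize each column so minutes don't dominate ages/player counts.
        std = raw.std(axis=0)
        self.vectors = (raw - raw.mean(axis=0)) / np.where(std == 0, 1.0, std)
        rng = np.random.default_rng(seed)
        # One independent set of random hyperplane normals per hash table.
        self.planes = rng.standard_normal((n_tables, n_planes, raw.shape[1]))
        # Each table maps a bit signature (tuple of booleans) to game indices.
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        for i, v in enumerate(self.vectors):
            for t, key in enumerate(self._signatures(v)):
                self.tables[t][key].append(i)

    def _signatures(self, v):
        # Which side of each hyperplane v falls on, giving one key per table.
        bits = self.planes @ v > 0  # shape: (n_tables, n_planes)
        return [tuple(row) for row in bits]

    def similar(self, name, k=5):
        i = self.names.index(name)
        v = self.vectors[i]
        # Candidates are games sharing a bucket with the query in any table.
        candidates = set()
        for t, key in enumerate(self._signatures(v)):
            candidates.update(self.tables[t].get(key, []))
        candidates.discard(i)

        def cosine(j):
            u = self.vectors[j]
            denom = np.linalg.norm(u) * np.linalg.norm(v)
            return float(u @ v / denom) if denom else 0.0

        # Only the (usually small) candidate set is ranked exactly.
        ranked = sorted(candidates, key=cosine, reverse=True)
        return [self.names[j] for j in ranked[:k]]


if __name__ == "__main__":
    index = HyperplaneLSH(GAMES)
    print(index.similar("Game A"))
```

Because "Game A (reprint)" has identical characteristics to "Game A", it lands
in the same bucket in every table and comes back as the top match. The point
of hashing is that only games sharing a bucket get compared exactly, which
avoids scoring every pair among thousands of games; using several tables
trades a little extra work for fewer missed neighbours.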
For more information on Locality Sensitive Hashing, I'd recommend this
A key thing to note is that the hash table was built with data pulled on
2018-06-27 and contained the top 6,000 or so games (ranked according to BGG).
This means that the system will only be able to return similar games if they
were in the top 6,000 games at the time.
I hope to compile a more thorough walkthrough of how this
system was built.
I also hope to look into using the board game description along with some Natural
Language Processing (NLP) techniques as another approach to determining game