Background: The topic engine analyzes each book an author recommends and assigns it different topics based on that data. Those topics are then used to power the upcoming recommendation engine, search, and bookshelves.
What are we doing?
We’ve been working to improve this system and realized we need to redo its logic. That is going to take some additional time, but it is worth it: it will simplify the system while giving us the ability to “easily” add additional data sources down the road.
Simplification is also key in keeping Ben and Marton sane :). I view simplification like this scene in Peter Pan: every time you simplify, a developer and the project owner come back to life.
Right now we have 3 pools of data we use to assign topics to a book.
- NLP Pool - Machine generated.
- Admin Pool - When I add/change something (human curation).
The changes we are implementing over the coming weeks will add a new pool using data from the Library of Congress (not available for all books). We are also going to vastly improve the logic of how it all works.
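For the technically curious, here is a minimal sketch in Python of how topics from several pools might be combined for a single book. Everything in it is an assumption made for illustration: the pool keys, the `POOL_ORDER` list, the `merge_topics` function, the made-up sample topics, and especially the rule that pools are simply read in a fixed order with duplicates dropped. The real logic is still being rebuilt and may work quite differently.

```python
from typing import Dict, List

# Hypothetical pool order, chosen only for this example.
# The real system's precedence rules are not described in this post.
POOL_ORDER = ["admin", "library_of_congress", "nlp"]


def merge_topics(pools: Dict[str, List[str]]) -> List[str]:
    """Combine topics from every available pool, dropping duplicates.

    A pool that is missing for a book (for example, no Library of
    Congress data) is skipped. Duplicates are detected case-insensitively,
    and the first spelling encountered is the one that is kept.
    """
    merged: List[str] = []
    seen = set()
    for pool_name in POOL_ORDER:
        for topic in pools.get(pool_name, []):
            key = topic.lower()
            if key not in seen:
                seen.add(key)
                merged.append(topic)
    return merged


# Made-up example data: this book has no Library of Congress entry.
book_pools = {
    "nlp": ["Pirates", "Adventure"],
    "admin": ["Childhood", "adventure"],
}
print(merge_topics(book_pools))  # ['Childhood', 'adventure', 'Pirates']
```

The point of the sketch is that a new data source only needs a new entry in the pool order, which is the kind of simplicity we are aiming for.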
What will the end result be?
- More accurate topics for each book. This system runs the bookshelves, which means they will get even more accurate.
- The new recommendation system we are launching soon will use this data as well, so it will be more accurate too.
- The system is being simplified, which means less upkeep for us behind the scenes.