I went through 1,800 Wikipedia topics to decide whether they belong on Shepherd. Our machine learning system assigns topics to books, but I ultimately have to decide whether each topic is interesting or useful. I’ve cleared every topic assigned to five or more books, and I review this list weekly to keep it up to date. I also banned thousands of topics that don’t belong in the system. There are a lot of false positives around author names, book titles, songs, bands, TV shows, and other weird stuff.
I went through 1,400 Library of Congress (LOC) topics to map them to our Wikipedia topics. That was time-consuming, because sometimes you get a fragment like “Presidency of,” and you have to research where it should be assigned. Or you get a vague concept like “loss,” and you have to decide where it falls. I’ve cleared every LOC topic assigned to five or more books, and I review this every week to keep it clean. I’m now working to clear everything assigned to four or more books (450 LOC topics). There are 12,000 LOC topics assigned to at least one book, but I don’t think I’ll be able to review all of them. I’ll probably stop at three books.
The most time-consuming work is reviewing candidate bookshelves and publishing the good ones. I went through 800 pending bookshelves, published 452, deleted some, and have ~250 waiting on a feature or further review.
For example, I had a pending bookshelf on “chemistry,” but it was also picking up books that use the word in the sense of love and relationships. That shelf will have to wait until we have genre data to help fill it out.
What is the result of this work?
- The recommendation system improves.
- The bookshelves improve.
- The search feature improves.
- People get more ways to find books they’re interested in.