Research Article: “Analyzing Social Book Reading Behavior on Goodreads and How it Predicts Amazon Best Sellers” (Preprint)
The following article (preprint) will be published in 2019 in Influence and Behavior Analysis in Social Networks and Social Media (Springer).
Analyzing Social Book Reading Behavior on Goodreads and How it Predicts Amazon Best Sellers
Suman Kalyan Maity
Kellogg School of Management/Northwestern University
Microsoft Research India
Indian Institute of Technology Kharagpur
September 19, 2018
A book’s success or popularity depends on various parameters, both extrinsic and intrinsic. In this paper, we study how book reading characteristics might influence the popularity of a book. Towards this objective, we perform a cross-platform study of Goodreads entities and attempt to establish the connection between various Goodreads entities and popular books (“Amazon best sellers”). We analyze the collective reading behavior on the Goodreads platform and quantify various characteristic features of the Goodreads entities to identify differences between these Amazon best sellers (ABS) and other, non-best-selling books.
We then develop a prediction model that uses these characteristic features to predict whether a book will become a best seller one month (or 15 days) after its publication.
On a balanced set, we achieve a very high average accuracy of 88.72% for the one-month prediction (85.66% for the 15-day prediction), where the competing class consists of books randomly selected from the Goodreads dataset. Our method, which is based primarily on features derived from user posts and genre-related characteristic properties, achieves an improvement of 16.4% over baseline methods based on traditional popularity factors (ratings, reviews).
We also evaluate our model against two more competitive sets of books: a) books that are both highly rated and have received a large number of reviews but are not best sellers (HRHR), and b) Goodreads Choice Awards Nominated books that are not best sellers (GCAN). For ABS vs. GCAN, we achieve a very high average accuracy of 87.1% as well as a high ROC. For ABS vs. HRHR, our model yields a high average accuracy of 86.22%.
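To make the evaluation terms in the abstract concrete, here is a minimal Python sketch of how a balanced two-class set, accuracy, and the area under the ROC curve could be computed. It is an illustration only, not the authors' code: the function names, the 0.5 threshold, and the toy labels and scores are all assumptions made for the example, and the paper's actual features and classifier are not reproduced here.

```python
import random


def balanced_sample(positives, negatives, seed=0):
    """Downsample both classes to the size of the smaller one (hypothetical helper)."""
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)


def accuracy(labels, scores, threshold=0.5):
    """Fraction of items whose thresholded score matches the true label."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive item scores higher than a random negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up data: 1 = best seller, 0 = non-best seller; scores are invented
# classifier outputs, not results from the paper.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print("accuracy:", accuracy(labels, scores))
print("ROC AUC:", roc_auc(labels, scores))
```

On a balanced set, a classifier that guesses at random has an expected accuracy of 50%, which is why accuracy is a meaningful headline number in that setting.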
Direct to Full Text Article
25 pages; PDF.