We’ll admit it’s practically impossible to count every book that has ever been written, but for Google Books to catalogue the world’s supply of printed knowledge, the company needs an estimate of the number of books it has to scan. So Google set out to produce one.
Well, I own a few hundred of those 129,864,880 books.
Google admits its definition of a book is imperfect, but it’s workable and similar to what ISBNs are supposed to represent. ISBNs, or International Standard Book Numbers, are designed to be unique identifiers for books. Because they’ve only been around for 30 to 40 years and are used mostly in Western countries, they can’t serve as the sole basis for a count. That’s why Google took data from the Library of Congress and other sources to find as many books as possible: one billion raw records, by the company’s count.
Here’s where Google’s engineering talent comes into play. The company ran a range of algorithms to detect and discard duplicates, drawing on more than 150 pieces of metadata about the world’s books to decide whether each record was unique or a duplicate of another. That analysis yielded 210 million unique books.
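To give a feel for what deduplicating catalogue records involves, here is a minimal sketch in Python. It is not Google’s method, which the post does not describe beyond its use of more than 150 metadata attributes. The field names (`title`, `author`, `year`, `source`), the helper functions and the sample records are all hypothetical, and the matching rule (an exact match on a normalized key) is a deliberately simple stand-in.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def record_key(record):
    # Hypothetical key built from three fields only; a real system
    # would weigh far more attributes and tolerate fuzzier matches.
    return (
        normalize(record["title"]),
        normalize(record["author"]),
        record.get("year"),
    )


def deduplicate(records):
    """Group records that share a key and keep the first from each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record_key(record)].append(record)
    return [group[0] for group in groups.values()]


# Made-up sample records from two imaginary catalogues.
records = [
    {"title": "Moby-Dick", "author": "Herman Melville", "year": 1851, "source": "library_a"},
    {"title": "moby dick", "author": "Herman  Melville.", "year": 1851, "source": "library_b"},
    {"title": "Walden", "author": "Henry David Thoreau", "year": 1854, "source": "library_a"},
]

unique = deduplicate(records)
print(len(records), "raw records,", len(unique), "unique")
```

The two Moby-Dick entries differ only in case, punctuation and spacing, so they collapse into one record. Handling messier cases, such as misspelled titles, missing years or differing editions, is presumably where the extra metadata earns its keep.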
Next, Google subtracted the millions of microforms, audio recordings, maps, t-shirts, turkey probes (yes, turkey probes) and videos with ISBNs, arriving at a much more reasonable number of 146 million. Finally, the company removed 16 million government document volumes from its estimate, getting to the 129.8 million count it announced today. Of course, publishers are issuing new books even as this post is being typed, so the company is constantly recalculating the book count. Check out the detailed Google Books blog post for the full breakdown.
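As a quick sanity check, the filtering steps can be tallied in a few lines of Python. This sketch uses only the rounded figures quoted in the post, in millions; the variable names are ours.

```python
# Figures quoted in the post, in millions (rounded as reported there).
unique_records = 210           # after discarding duplicates
after_non_books_removed = 146  # after dropping microforms, maps, videos, etc.
government_documents = 16      # government document volumes removed last

non_books_removed = unique_records - after_non_books_removed
estimate = after_non_books_removed - government_documents

print(f"Non-book items removed: about {non_books_removed} million")
print(f"Estimated books: about {estimate} million")
```

The rounded inputs give about 130 million, in line with the 129,864,880 figure once rounding is taken into account.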