Sunday, August 26, 2007

NY Times Says... "Scan This Book!"




This article really interests me, mainly because it shows that other people REALLY want the same thing I want. (I feel less crazy when people ask me why I'm scanning, now that the NY Times says it's OK :) It's a year old, but WAY relevant today: it explores the massive idea of the Internet becoming "the universal library".

Seriously though, this article is really interesting in a few areas; I'll call it 'The Pandora's Box Article'. The author blows the lid open on several issues, and follows what I see as some noteworthy academic trends in thinking (however flawed or naive the ideas are, they have major backing- it's the current discourse). I love some of the thoughts presented, but I hate much of the tone and the ideologically revealing vocabulary:

New York Times, May 14, 2006
"Scan This Book!"

"The universal library should include a copy of every painting, photograph, film and piece of music produced by all artists, present and past. Still more, it should include all radio and television broadcasts. Commercials too. And how can we forget the Web? The grand library naturally needs a copy of the billions of dead Web pages no longer online and the tens of millions of blog posts now gone — the ephemeral literature of our time. In short, the entire works of humankind, from the beginning of recorded history, in all languages, available to all people, all the time."


One of my favorite stories is the one about the Tower of Babel.

Regardless of the usual techno-hubris oozing out of this article, the author does hit on most of the big current thoughts in book/library digitization. Let me rip through them (though each topic deserves deeper discussion):

1. Scanning the Library of Libraries
This is the Tower of Babel thing. The aim is admirable, but so much of the vocabulary used in this kind of description leads me to believe many people truly aren't comprehending the consequences, or the scale. Many of them feel there will be 'an end' to it all, but media breeds media- exponentially- and it seems to have been that way forever. Also, every act of cataloguing leads to some sort of revision, shaped by the scope of its creators' understanding. Therefore, once something like 'the universal library' exists, it will make itself obsolete by revealing its own inadequacy.
Oh btw- we gotta' hire a LOT of people in China to scan everything at a cut rate so we can accomplish this humanitarian goal. Let's go, chop chop. :|

2. What Happens When Books Connect
Yeah, hyperlinks yadda yadda. But machine thinking sucks for deciding the relationships... (AI?)

3. Books: The Liquid Version
Search is good, and byte streams beat paper in the usefulness category- hands down. I find this discussion really interesting, insomuch as it touches on the boundaries that frame a work, which the internet has blurred. This is right in line with thoughts I've had about the beginning of the conceptual breakdown of the filesystem metaphor, whose hierarchy is now being challenged by various technologies (relational database design, the database storage of the BeOS filesystem, the Mac OS X 'Smart Folders', which are merely a relational collection, etc.). This stuff isn't new; expect a blog post in the future on this topic...

4. The Triumph of the Copy
Gutenberg, yeah! Then came copyright. These stats are really cool, and make me feel less crazy about my esoteric tastes:
"In the world of books, the indefinite extension of copyright has had a perverse effect. It has created a vast collection of works that have been abandoned by publishers, a continent of books left permanently in the dark. In most cases, the original publisher simply doesn't find it profitable to keep these books in print. In other cases, the publishing company doesn't know whether it even owns the work, since author contracts in the past were not as explicit as they are now. The size of this abandoned library is shocking: about 75 percent of all books in the world's libraries are orphaned. Only about 15 percent of all books are in the public domain. A luckier 10 percent are still in print. The rest, the bulk of our universal library, is dark."

5. The Moral Imperative to Scanning
Google's scan plan,

6. The Case Against Google
Oy vey, another case? The skinny is interesting. Google needs 'Good' (trusted) information to make its search engine (and AdSense) better. So, when they scanned some 70% of the world's copyright-protected books for themselves, the world was too focused on how they accomplished this MASSIVE feat using nitty-gritty methodology (and with astonishing speed- most of it done in 9 months or so). Then when they made the book search another 'Public Beta', everyone freaked out, and the mainstream (and market) focused on copyright/humanitarian issues with content production.
What most people missed is that this massive archive was created TO MAKE THEIR SEARCH INDEXES BETTER. It's a taxonomy and statistics algorithms thing. Interestingly enough, they found the copyright loophole- nobody said they couldn't use copyrighted materials privately to make their information business make money! A very interesting trick that most people still don't understand.

7. When Business Models Collide
Oy vey- this is a HUGE topic: DRM, Copyright, DMCA, RIAA, MPAA, Hollywood, and the Publishers (he leaves out the Telcos, who vie for a massive stake in content).

8. Search Changes Everything
This is basically the discussion of an idea I think is really dirty: the idea that the search engine really helps us 'Find What we are looking for'. Kind of, but this again is where I'll cite Babel and promise to elaborate in another post- with one of my favorite stories from my misadventures in library science...

Look, I may sound negative, but I REALLY LIKE this author- this article puts him right in the middle of the things I think and care about in media, software, networks, and in general, information culture.

I've made a fistful of blog posts spawned from thoughts in this one- like I said at the start, this is 'The Pandora's Box Article' for me...
