Digitizing Books and Media: About This Blog

NOTE:

This blog has moved permanently to a new home! Please go to:

http://BOOKS.DOTIKE.NET/

Please update your bookmarks!

--

I spent a few years with the majority of my books in boxes, and lived off my hard drives. Not only did I get very comfortable reading works on-screen, but I became addicted to how malleable digitized content is. A simple text search is an astoundingly powerful thing.

Now I want MY BOOKSHELF searchable, hyper-linked, hackable.

--
This blog seemed like the most appropriate place to collect my notes and document my progress as I begin a very long-term project: digitizing my book collection. Eventually, I would like to extend the project so that every piece of media I own is catalogued and its content searchable in useful ways. Many people have told me this is a fairly ambitious undertaking. After hacking around the idea a bit, I get it.

--
This project has 3 main components (which I'll treat separately):

1) Book Scanning
The act of physically scanning the books is itself a challenge. My aim here is completeness, while balancing the various losses in the translation from physical to digitized form. Since this is my personal book collection, I am also focusing on processes that do minimal damage.
There are a great many tools and a great deal of information, of all shapes and sizes, worth exploring, but at the end of the day the job simply has to get done...

2) Book Processing
This encompasses storage, OCR processing, data storage formatting, and presentation formatting. My aim here is to extract as much of the content's value mechanically as possible (e.g. I don't plan to proofread it all in my lifetime). However, I do wish to treat some texts, even some fragments of texts, with incredible care.
Cumulative 'polishing' is my game plan here: to keep it simple to update and clean the data as I use it.

3) Media Indexing
This is what I like to call 'The Amazon Challenge': creating a usable catalog of the book media, which naturally leads to cataloging *all* my media (just as it led Amazon to catalog nearly every product imaginable). I've got between 3 and 4TB of data CDs and DVDs stretching back through 15 years of my life, with all sorts of things I'd love to be able to find in there...
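
As a rough illustration of what "searchable" could mean once books have been through OCR, here is a minimal sketch in Python of a word-to-page index. It is not a description of any tool used in this project: the function names (tokenize, build_index, search), the (book, page) key layout, and the sample text are all assumptions made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of (book, page) keys it appears on.

    `pages` is a dict such as {("Some Book", 12): "ocr text ..."}.
    """
    index = defaultdict(set)
    for key, text in pages.items():
        for word in tokenize(text):
            index[word].add(key)
    return index


def search(index, query):
    """Return the sorted keys of pages that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical sample data standing in for real OCR output.
pages = {
    ("Book A", 1): "A simple text search is a powerful thing.",
    ("Book A", 2): "Scanning books without damaging them.",
    ("Book B", 7): "Search the catalog of scanned books.",
}
index = build_index(pages)
print(search(index, "search"))
print(search(index, "scanned books"))
```

The index matches exact words only, so "scanning" and "scanned" are treated as different terms. Raw OCR output would likely need spelling cleanup or fuzzy matching on top of something like this.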

Each aspect of this project has its own particularities, and after hacking around on this for a month or so and spending a great deal of time researching its various aspects, I'm excited to start getting somewhere.

--
People have already repeatedly asked me: why the heck are you doing this?
First off, because it's fun.
I spend a lot of time in my books; I'm pretty nerdy and a media junkie. My tastes get more and more esoteric as I get older, and I find mainstream internet companies and services understandably disappointing. For example, Amazon's suggestions can't figure me out at all, no matter how involved I get in their site (or how many books I buy).

At this point in my life, I find myself referencing a wide array of my own materials all the time, and I am constantly frustrated by how much I miss. I believe I won't have time to re-read much of what's on my bookshelves in the rest of my life, yet so many works have relevant components I re-visit all the time. The more I reference works, the more I find other things in the materials that I really wanted perhaps a month earlier...

So with that, I see this as a way to make computing machines serve me better. I hope the notes presented here can help others who have similar aims and projects!

If you wish to contact me, please leave a comment on a particular post, and I'll try to get back to you!

Labels: 3 parts - Book Scanning, Book Processing, Media Indexing

Sunday, August 26, 2007