Monday, August 27, 2007

Scanning Books, (quell technolust)




(manufacturer demo image [Remember, all demos are rigged])

Clouds and thunder:
Thus the superior man
Brings order out of confusion.
- again, The I Ching, China, around 2850 BC

This is a followup to my previous post on Book Scanning, where I started with a cheap little flatbed scanner. Now I'm going to focus on Speed, and go over what I've found out about Fully-Automated scanning solutions.

Don't get too excited about this post: none of these results are practical, but all of it is interesting and worth a mention. I'd simply be a fool not to at least explore every possibility at this stage... In the end, I found that most mechanical answers just don't seem to be all they're cracked up to be- robots will not make my problems go away, and it seems they could distract me from my real goal- scanning my book collection. I could also be very wrong with any observations below; I'm weighing my practical experiences against whatever information I can find online. I HAVE NOT TRIED ANY OF THE SYSTEMS DISCUSSED BELOW! I'm just trying to apply common sense. Let's get on with it...

My little engine that could, a Canon CanoScan LiDE 500F:

To recap, good things about my current cheap flatbed setup are:
  • Simplicity. The scanner is powered completely by USB even, so only 1 cable!

  • More than adequate quality for scanning reflective materials

  • Canon Software is easy to set up for preset output, (300 DPI, full color)

  • It's a great start, for almost no cost.

Scanning problems I need to solve as I continue are:
  • Speed: currently between 8 and 14 seconds per scan; this could be much better.

  • A typical flatbed scanner is not ideal for fragile or rare materials (small-ish hardcover books are great on flatbeds btw, very durable!)

  • Large volumes have 'spine shadow' problems, and also focus problems if content lifts off the flatbed near the spine.
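
To put the 8-14 seconds per scan in perspective, here's a rough sketch of the pure scanner time for one book. The 300-page book length and the one-page-per-scan default are assumptions for illustration, not measurements from my setup, and the time spent turning pages and repositioning the book isn't counted at all.

```python
def scan_minutes(pages, seconds_per_scan, pages_per_scan=1):
    """Return minutes of pure scanner time for one book.

    Page turning and repositioning are ignored; only the scan itself is timed.
    """
    # Ceiling division: a leftover odd page still needs its own scan.
    scans = -(-pages // pages_per_scan)
    return scans * seconds_per_scan / 60


# 300 pages is a hypothetical book length, not a figure from this post.
HYPOTHETICAL_PAGES = 300

for seconds in (8, 14):  # the fast and slow ends of the range quoted above
    minutes = scan_minutes(HYPOTHETICAL_PAGES, seconds)
    print(f"{seconds} s/scan: {minutes:.0f} min for {HYPOTHETICAL_PAGES} pages")
```

Passing pages_per_scan=2 models scanning an open two-page spread in a single pass, which only works when both pages fit on the glass.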

The Cut-Spine, Sheetfed Scanner Method

To get it out of the way: one solution to take seriously is to cut the spines off the books and use auto-page-feed scanners.

This approach is out of the question for me, as I plan to keep my books intact. However, it's the first approach people have brought up with me- based on Google's widely publicized successes implementing this technique for their Google Books project.
For smaller-scale projects, any Kinkos can cut off the spine for $1 per cut, using big paper-ream cutters.
Once the pages are loose, there are TONS of 'Sheetfed' or 'Document' scanners on the market, which take reams of paper to scan. Prices range from $150 to $10k+, but from what I've seen, models in the $300 range seem to be more than adequate.

But the benefits in speed here are trumped by the value of my actual books. It made sense for Google, the vast destruction was an investment in making their search engine indexing better- (taxonomy and trust, right...)- but actually reading, or loaning out, my book collection is far more valuable than destroying it. And then there are my autographed copies, stuff written by friends and acquaintances... all priceless to me. I'd thought of re-packing the loose pages into some binding which I could shelve, but that's just silly- it's messy enough having a lot of books... Add to this the hassle of carting books back and forth to Kinkos, a good workout for any hacker nerd like me, but I'd rather be skateboarding :)

I'm not much for the 'rare book' scene, nor am I an archivist. I mean I really go after some relatively rare books, but I happily prefer 'reference copies', (rare stuff that's banged up with use or has notation). So this issue isn't about monetary value, it's about the fact that I simply want to keep using the books themselves- and preserve them within reason as I proceed. I must balance speed gains with a less violent solution...

Mechanical Page Turner Systems
So my thinking next is, if I somehow can go full-auto with the scanning, I can spend time on the other aspects of this project... It seems my thought is not a new one, people have been trying to make good automated page-turners for A LONG TIME.

For example, take this mid-20th century example from engineering at MIT, (which cites some Da Vinci invention I can't find anywhere):
"Automatic page turner could help musicians and the disabled"
The article is about mechanical engineering professor Ernesto E. Blanco, and his fantastic page turner- (which seems not to exist as a product, and designs are not available). So it either is very special purpose, as it was created for turning sheet music, *or* it outright sucks.

Lego my ego, (is that a Beach Boys song?)

Some people's perseverance and ingenuity blow my mind: a man made a fully auto, page turning, book-scanner- out of Legos. Yes, Legos. Reading the webpage about it nearly made me weep in awe.

Seriously folks, please go look at the webpage of MURANUSHI Takayuki, featuring his book scanner. I almost want to go to Japan, just to buy this guy a drink.

Now for me, I'm not about to dive into any Lego engineering, so let's move on to some commercial automated offerings.

With page-turning on the brain, this looks pretty darned cool- and has loads of buzz about it on the net:

Atiz BookDrive, a portable automatic book scanner

OK, let's look at the specs from their site:

For my project, it seems their machine meets my needs about as well (or poorly) as my cheap-o flatbed, except this thing is fully automated. Just imagine, put in a book, and press 'Go!'!
  • It can't handle my many fragile rare books.

  • It requires Windows (I'll accept Mac, but would love UNIX/X11/Shell compatibility)

  • Cruddy advertising rambles regarding Speed, (look at the PDF for full specs!)

    • 1 page at 300 DPI (color?) takes 32.5 sec., according to their docs.

    • I can't tell if this is for both open pages at once, (16 sec. per page then?)

  • Can't find many user reports regarding page jams, practical experiences

  • Internet Rumored Pricing, (have to email for quote) $35,000

  • Suffers from software/hardware integration problems. (I'll explain more about why they should be as separate as possible, once I start posting about Book Processing- the software part...)
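
Since the docs leave it unclear whether 32.5 seconds covers one page or both open pages, here's a small sketch that works out the throughput both ways and sets it next to my flatbed's 8-14 seconds per scan. It assumes the flatbed captures one page per scan and ignores all human handling time, so treat the numbers as raw capture rates only.

```python
def pages_per_hour(seconds_per_capture, pages_per_capture=1):
    """Convert a per-capture time into pages captured per hour."""
    return 3600 / seconds_per_capture * pages_per_capture


# Timings come from this post; the per-capture page counts are assumptions.
rates = {
    "flatbed, 8 s per one-page scan": pages_per_hour(8),
    "flatbed, 14 s per one-page scan": pages_per_hour(14),
    "BookDrive, 32.5 s per single page": pages_per_hour(32.5),
    "BookDrive, 32.5 s per two-page spread": pages_per_hour(32.5, 2),
}

for label, rate in rates.items():
    print(f"{label}: about {rate:.0f} pages/hour")
```

Under these assumptions, even the generous two-page reading of the BookDrive spec (about 222 pages/hour) lands a little below the flatbed at its slowest (about 257 pages/hour). The difference is that the flatbed needs a human turning pages the whole time.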

I don't think this is a good solution for me at all. Perhaps it's great for a research library, with massive volumes of similar-sized textbooks to scan, and appropriate budgets for this kind of thing- but I still consider it an 'experimental' technology, until some independent source shows some real progress with it.
Someone said to me I'm just whining because I can't afford one. While that truly isn't my attitude, the price to value to risk ratio here is just outside the scope of sanity for me. (If this product really worked well, I may as well spend the next year or two working my knuckles to the bone, eating rice every day- and buy one, it will take me years to scan all my books anyhow).

Mucho Macho Product, the Orgasmatron of book scanning:

The Kirtas-Tech "APT 1200" and "APT 2400" models. (Just like Atiz, awesome animations on their site)

From their website literature:
"The APT BookScan 1200™ is based on a disruptive digital imaging technology initially developed at Xerox PARC and protected by several patents which have already been issued and some pending."
The APT BookScan 1200 and 2400 cost $75,000 and $189,000, respectively, (again, web-rumors, you have to email their sales reps for a quote).
Disruptive digital imaging technology?!? Sounds kind of awesome- but does this mean it will eat my books, like paper jamming in a printer? What if my cat jumps in there while it's on?
I'm scared. However, this machine WOULD LOOK AWESOME in my living room. I love sexy industrial design. But seriously, I simply can't take this thing seriously.

Let's step aside with the mechanical page-turning idea:

On the topic of mechanical page turning, a good friend said something like:
"I don't think it's really possible- at least not in a way which helps people scan books faster than a flatbed. Pages stick together and are inconsistent, different thicknesses, etc... Heck, the design of books is a somewhat poor user interface to begin with."

After handling a lot of books, and thinking a lot about handling them, I tend to think he's right- ESPECIALLY after watching one of the big-machines in action on YouTube, (important: note the human hand required to keep pages down for this particular book, what's the point?):

(I don't have $70k for this, but run with me here...) For the $70-189k price tag, I could just hire a small army of high-school kids in my Brooklyn neighborhood to do the flatbed scanning as an after-school job, and likely get WAY more speed and value for my scanned dollar, while investing in my neighborhood economy. I'm certain the high-school kids on my block would do a far better job than any state-of-the-art mechanical device.
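
For the curious, the back-of-the-envelope math on that small army can be sketched like this. The $10/hour wage is purely a hypothetical number for illustration, the prices are the web-rumored ones quoted earlier, and the 14 seconds per scan is the slow end of my flatbed with no allowance for page turning or breaks.

```python
def labor_hours(budget_dollars, hourly_wage):
    """Hours of paid scanning a given budget would buy."""
    return budget_dollars / hourly_wage


def scans_for_budget(budget_dollars, hourly_wage, seconds_per_scan):
    """Whole scans completed if every paid hour is spent scanning."""
    hours = labor_hours(budget_dollars, hourly_wage)
    return int(hours * 3600 / seconds_per_scan)


HYPOTHETICAL_WAGE = 10.0  # dollars per hour, an assumption, not a real quote
SLOW_SCAN_SECONDS = 14    # slow end of the flatbed range from this post

for price in (75_000, 189_000):  # rumored APT 1200 / APT 2400 prices
    hours = labor_hours(price, HYPOTHETICAL_WAGE)
    scans = scans_for_budget(price, HYPOTHETICAL_WAGE, SLOW_SCAN_SECONDS)
    print(f"${price:,}: {hours:,.0f} hours of labor, up to {scans:,} scans")
```

Real throughput would be lower once handling time is counted, but changing HYPOTHETICAL_WAGE or SLOW_SCAN_SECONDS lets you rerun the comparison with your own numbers.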

Dontcha' love rigged demos? I couldn't resist this one :)

Copy-Stand-ish solutions:

There are tons of other high-end solutions for book scanning, but right off the bat they don't look like they solve my problems- as they are aimed at high-end conservation/archivist applications. Many of them look like fancy photo copy-stands, and look good for massive reprography applications as well. They also seem to go by the moniker "Face-Up Publication Scanners". One has to manually turn the pages, (good), but these all-in-one units are again prohibitively expensive (think $12,000 range).

I won't get too deep into this tangent, but here's a look at some well received models, for the record:

"Bookeye Color Planetary Scanner"

Minolta PS 7000 Digital Publication Scanner

Time to move beyond all this technolust and focus on solving the problem- finding better tools for Non-Destructive Home Book Scanning.

I'll focus in on more practical solutions, in my next posts regarding Scanning Books!

P.S. Regarding the Kraftwerk images in this post, I dearly love early Kraftwerk, (pre Autobahn album, 1975). Their work after 1975 ties directly into fueling what I feel is a totally annoying ideology for the Western world, schizophrenic "Techno-Worship/Nihilism"- the ancestral ideals behind today's "Cult of the Algorithm". But this is a personal rant, and not really relevant in this blog about my book scanning... (or is it?) I'm merely stating, I don't believe in technology.