Blacklight Used to Extract Meaning from Cursive Script, Allowing Scanned Documents to be Searched
In 1973, a fire broke out at the National Personnel Records Center in St. Louis, destroying 16 to 18 million military service records from 1912 to 1964. Had these records been digitized, they’d have been safe, but not necessarily any more accessible. Scanned PDF images, the low-cost, high-speed method for digitizing documents, can be duplicated and stored in many places. But nothing in them can be found unless a human being searches through the handwritten text by eye. The 1940 U.S. Census, for example, consists of 3.6 million PDF images. Commercial services like Ancestry.com employ thousands of human workers who manually extract the meaning of a small, profitable subset of these images so they can be searched by computer, says Kenton McHenry of the National Center for Supercomputing Applications (NCSA). To read further, please visit http://psc.edu/index.php/88-biannual-report-and-science-book/biannual-report/937-breaking-out-of-the-digital-document-graveyard.