As you’ve doubtless noticed by now, I love digitization projects that allow, nay, desperately need random nerds such as ourselves to root around in historical archives. This one is easily my favorite because it gives you the chance to read entire issues of the weekly magazine Charles Dickens owned and edited for 20 years.
The magazine saw the serial debut of some of Dickens’ most famous works, like Great Expectations and A Tale of Two Cities, but that’s just the tip of the iceberg. The journals contain articles about society and politics, exposés of appalling conditions in factories and prisons, dispatches from the front of mid-19th-century conflicts, and the literary stylings of luminaries like Wilkie Collins, Elizabeth Barrett Browning and Elizabeth Gaskell. Every Christmas, Dickens would collaborate with some of them to create seasonal plays and stories.
The journal began in 1850 as Household Words. Dickens owned half of it, and his associates Forster and Wills owned another quarter between them. The remaining quarter belonged to his publishers, Bradbury and Evans. That 25% stake was enough to give them a say in the magazine, so when they and Dickens had a falling out in 1859, the author decided to start a new magazine over which he would have complete creative control.
He took Bradbury and Evans to court (the Court of Chancery, no less, the systemic maelström at the center of Bleak House) to win the rights to the trade name “Household Words,” but he wasted no time getting the new venture off the ground. All the Year Round debuted on April 30, 1859. One month later Dickens won his case and folded Household Words into All the Year Round. He continued to edit the magazine until his death in 1870.
There are 1,101 issues of Dickens’ weeklies, totaling 33,000 pages. Since 2006, a valiant team of three people at the University of Buckingham has been scanning and digitizing them all so that they can be read and searched on the Internet. Progress is slow, because even OCR text generated from high-resolution scans is replete with errors and extraneous data, and it takes a very long time for a small team to copy-edit that many words. They’ve only managed to go through about 15% of the archive thus far.
Enter the Online Text Correction (OTC) Project.
Although the image files were created using a state-of-the-art scanning device, the quality of the original journal pages varied and some contained paper folds, smudge marks, transparency, etc., and as a result the text files contain a number of errors that vary from file to file. This is the main problem that we are trying to correct. A secondary, relatively trivial problem is that the text files contain unwanted information and styling, which can be corrected at the same time as the actual mistakes.
We have decided to make a magazine, typically 24 pages long, the smallest unit of contribution, and as a result we will have 1,101 units of work at the end of the day. So if we find around 1,000 volunteers to take on 1 or 2 magazines each, we will reach the target between us. We reckon that it will take about 10 minutes to review and correct each page of a typical magazine (24 pages × 10 minutes = 240 minutes, or 4 hours’ work).
I love this approach. It means that you get to read an entire issue cover to cover, fixing OCR and formatting errors. You get all the pleasure of curling up with a stack of old Dickens magazines while at the same time helping ensure they will be available at the click of a mouse in perpetuity. The goal is to get the entire archive online in time to launch the new Dickens Journals Online website by February 7, 2012, the bicentennial of the birth of Dickens.
To help them accomplish this laudable goal while immersing yourself in Victorian society, register on the OTC Project website. Once you’re logged in, select an uncorrected issue from the Magazine Index and dig in. The project site explains in detail how they want the text formatted and corrected.