This was my initial part 2 in a series about a project to read/import a large collection of home-made optical media. The canonical page for this material is now here, Part 1 was Imaging DVD-Rs: Overview and Step 1, the summary page for the whole project is imaging discs.

Last time we prepared for the import by gathering all our discs together and organising storage for them in two senses: real-world (i.e. spindles) and a more future-proof digital storage system for the data, in my case, a NAS. This time we're actually going to read some discs. I suggest doing a quick first pass over your collection to image all the trouble-free discs (and identify the ones that are going to be harder to read). We will return to the troublesome ones in a later part.

For reading home-made optical discs, you could simply use cp:

cp /dev/sr0 disc-image.iso

This has the attraction of being a very simple solution but I don't recommend it, because of a lack of options for error handling. Instead I recommend using GNU ddrescue. It is designed to be fault tolerant and retries bad sectors in various ways to try and coax every last byte out of the medium. Crucially, a partially imported disc image can be further added to by subsequent runs of ddrescue, even on a separate computer.

For the first import, I recommend the suggested options from the ddrescue manual:

ddrescue -n -b2048 /dev/cdrom cdimage.iso cdimage.log

This will create a cdimage.iso file, hopefully containing your data, and a map file cdimage.log, describing what ddrescue managed to achieve. You should archive both!

This will either complete reasonably quickly (within one to two minutes), or will run potentially indefinitely. Once you've got a feel for how long a successful extraction takes, I'd recommend terminating any attempt that lasts much longer than that, and putting those discs to one side in a "needs attention" pile, to be re-attempted later. If ddrescue does finish, it will tell you if it couldn't read any of the disc. If so, put that disc in the "needs attention" pile too.

commercially-made discs

Above, I wrote that I recommend this approach for home-made data discs. Broadly, I am assuming that such discs use a limited set of options and features available to disc authors: they'll either be single session, or multisession but you aren't interested in any files that are masked by later sessions; they won't be mixed mode (no Audio tracks); there won't be anything unusual or important stored in the disc metadata, title, or subcodes; etcetera.

This is not always the case for commercial discs, or audio CDs or video DVDs. For those, you may wish to recover more information than is available to you via ddrescue. These aren't my focus right now, so I don't have much advice on how to handle them, although I might in the future.

labelling and storing images

If your discs are labelled as poorly or inconsistently as mine, it might not be obvious what filename to give each disc image. For my project I decided to append a new label to all imported discs, something like "blahX", where X is an incrementing number. So, for a fourth disc being imported with the label "my files", the image name would be my_files.blah5.iso. If you are keeping the physical discs after importing them, You could also mark the disc with "blah5".

where are we now

You should now have a pile of discs that you have successfully imported, a corresponding collection of disc image files/ddrescue log file pairs, and possibly a pile of "needs attention" discs.

In future parts, we will look at how to explore what's actually on the discs we have imaged: how to handle partially read or corrupted disc images; how to map the files on a disc to the sectors you have read, to identify which files are corrupted; and how to try to coax successful reads out of troublesome discs.


Comments