PDF Recovery Overview

How it Works
The utility consists of a single class, "PdfRecovery", with a method, "RecoverPdf", that accepts the Compound File Binary file as a byte array (in our case, the contents of the Oracle long-raw field). The OpenMCDF component is used to access the file contents in the byte array. The code loops through all available memory streams in the file, searching each for the PDF start-of-file and end-of-file markers (%PDF and %%EOF). If both are found, the code removes all bytes before the %PDF and after the final %%EOF and returns the result as a byte array. The resulting byte array is then saved to a file.
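The truncation step described above can be sketched in a few lines. This is a minimal illustration in Python, not the utility's actual code: the function name "extract_pdf" is hypothetical, and it assumes the bytes of one stream have already been read out of the compound file.

```python
from typing import Optional

PDF_START = b"%PDF"   # start-of-file marker
PDF_END = b"%%EOF"    # end-of-file marker

def extract_pdf(stream_bytes: bytes) -> Optional[bytes]:
    """Return the bytes from the first %PDF through the final %%EOF, or None.

    Hypothetical helper: illustrates the trimming logic only.
    """
    start = stream_bytes.find(PDF_START)
    if start == -1:
        return None  # no PDF header in this stream
    # rfind locates the LAST %%EOF, since a PDF may contain several
    end = stream_bytes.rfind(PDF_END)
    if end < start:
        return None  # no end marker after the header
    return stream_bytes[start:end + len(PDF_END)]
```

Searching for the last %%EOF rather than the first matters because a PDF that has been updated incrementally can contain more than one end-of-file marker.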

A second method, "RecoverPdfWithStatus", returns a "PdfRecoveryStatus" object that contains the PDF document as a byte array (same as above), along with the PDF version and the name of the memory stream in which the file was found.
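As a rough picture of what such a status object carries, here is a Python sketch. Everything in it is an assumption made for illustration: the class and field names are hypothetical, the streams are modelled as a plain dictionary of name to bytes, and the version is read from the "%PDF-x.y" header, which may or may not be how the utility determines it.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class RecoveryStatus:            # hypothetical stand-in for PdfRecoveryStatus
    pdf_bytes: bytes             # the trimmed PDF document
    pdf_version: Optional[str]   # e.g. "1.4", or None if the header is unusual
    stream_name: str             # name of the stream the PDF was found in

# A PDF header looks like "%PDF-1.4"; capture the version digits.
_VERSION = re.compile(rb"%PDF-(\d+\.\d+)")

def recover_with_status(streams: Dict[str, bytes]) -> Optional[RecoveryStatus]:
    """Scan every stream and report the first one that holds a PDF."""
    for name, data in streams.items():
        start = data.find(b"%PDF")
        end = data.rfind(b"%%EOF")
        if start == -1 or end < start:
            continue  # this stream does not contain a complete PDF
        pdf = data[start:end + len(b"%%EOF")]
        match = _VERSION.match(pdf)
        version = match.group(1).decode("ascii") if match else None
        return RecoveryStatus(pdf, version, name)
    return None
```

Returning the stream name alongside the document makes it possible to tally which names actually occur across a batch of files.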

Memory Stream Names

OpenMCDF provides a function for accessing memory streams by name, but we were unable to use it for two reasons. First, we've found documents stored under at least three different stream names and have been unable to determine which name should be used for a given file.

The second problem was that the stream names encoded by the Oracle Forms application were frequently preceded by non-printable values that couldn't be used in a function call. For example, we found the stream name " CompObj", in which the apparent space before the "C" is actually the value 0001h, which proved impossible to pass to the OpenMCDF function that returns a stream by name.
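The naming problem is easier to see with the hidden character made visible. The Python sketch below is purely illustrative and does not use OpenMCDF: "display_name" and "names_containing_pdf" are hypothetical helpers, and the streams are again modelled as a dictionary of name to bytes. It shows why checking the contents of every stream sidesteps the need to reproduce a name exactly.

```python
from typing import Dict, List

def display_name(raw_name: str) -> str:
    """Render non-printable characters as \\uXXXX escapes for logging."""
    return "".join(
        ch if ch.isprintable() else f"\\u{ord(ch):04x}"
        for ch in raw_name
    )

def names_containing_pdf(streams: Dict[str, bytes]) -> List[str]:
    """Pick streams by content, never by name, so odd names do not matter."""
    return [name for name, data in streams.items() if b"%PDF" in data]

# The leading character is 0001h, not a space, although it may display as one.
name = "\x01CompObj"
print(len(name))            # 8 characters, not 7
print(display_name(name))   # shows the hidden prefix as an escape
```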

Last edited Jul 21, 2012 at 5:41 PM by DaddyUnit, version 1

