Project Description
This project creates and shares code for recovering PDF documents that were uploaded to and stored in Oracle databases by "Oracle Forms" applications.

Background

The Problem
We needed to extract thousands of Adobe Acrobat PDF files that had been uploaded to "long/raw" fields in an Oracle database through an Oracle Forms application, and move them to BLOB fields in a new table for access by a new web-based application. The PDF files are stored in the database in an unfamiliar format. Simple binary extracts produced unusable files that cannot be read with Adobe Acrobat. The PDF files can be accessed individually through the Oracle Forms application, where they open in Adobe Acrobat, but we had no way to automate this process.

Oracle Unable to Provide a Solution
The Oracle Forms application allowed users to upload Microsoft Word, Microsoft Excel, and PDF documents, and the files were somehow saved using OLE containers. Oracle provided us code to recover the Microsoft documents from the database and save them to a file system where they could be accessed normally through their Microsoft applications. Unfortunately, when the same code was used to recover PDF documents, the resulting files could not be opened or viewed with Adobe Acrobat applications.

The Solution
After several attempts to analyze the data structure of the files and build several different decoders (some of which were previously posted on this site), we discovered that the files were saved in Microsoft's Compound File Binary format (MS-CFB). We were able to build a simple extraction utility using the OpenMCDF component.
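
As a rough illustration of how a raw extract can be recognized as MS-CFB, the sketch below looks for the 8-byte compound file signature and reads a few header fields. It is not the project's extraction utility, which uses OpenMCDF. The function name describe_blob is hypothetical, and searching the whole blob rather than only offset 0 is an assumption, made in case Oracle Forms stores extra bytes ahead of the container.

```python
import struct

# 8-byte signature that opens every MS-CFB (compound file) header.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Marker that opens a plain PDF file.
PDF_MAGIC = b"%PDF-"


def describe_blob(data: bytes) -> dict:
    """Report where the CFB and PDF signatures first appear in a raw extract.

    Hypothetical helper for illustration only. Offsets are -1 when the
    signature is not found anywhere in the blob.
    """
    info = {
        "cfb_offset": data.find(CFB_MAGIC),
        "pdf_offset": data.find(PDF_MAGIC),
    }
    start = info["cfb_offset"]
    if start >= 0 and len(data) >= start + 32:
        # After the 8-byte signature and a 16-byte CLSID come four
        # little-endian 16-bit fields: minor version, major version,
        # byte-order mark, and sector shift.
        _minor, major, _byte_order, sector_shift = struct.unpack_from(
            "<4H", data, start + 24
        )
        info["major_version"] = major
        info["sector_size"] = 1 << sector_shift
    return info
```

A blob that reports a CFB offset but fails to open as a PDF matches the situation described above. Finding the PDF marker inside such a blob does not mean the PDF bytes are contiguous, because a compound file can split a stream across sectors, which is why a real CFB reader is needed for the actual extraction.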

Previous Content
Previous content on the site included a good deal of byte-by-byte analysis of the data structure and thoughts on how to extract the PDF files from it. With the discovery of the MS-CFB format and the OpenMCDF component, much of that content is no longer relevant and has been removed.

Contents

  • PDF Recovery Overview
  • Oracle Forms Code: Fragments of code in our Oracle Forms application used to upload and download PDF files, plus the code used to recover Microsoft Word and Excel files from the database to the file system.
  • Links to other resources

Last edited Jul 21, 2012 at 5:58 PM by DaddyUnit, version 19