Batch Extract Text from PDF

Batch Extract Text lets you extract text from multiple PDF documents at once. For each document, the batch process outputs a separate text file containing the text of that document.

Note: If a document does not contain text (for example, a scanned document or an image), it must be OCR'd first so that there is text to extract.
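The behavior described above (one text file per PDF) can be sketched outside the application in a few lines of Python. This is only an illustration, not how the application is implemented: the function name batch_extract and the extract_text callable are hypothetical, and skipping documents with no text is an assumption made to mirror the OCR note.

```python
from pathlib import Path
from typing import Callable, Iterable


def batch_extract(pdf_paths: Iterable[Path], out_dir: Path,
                  extract_text: Callable[[Path], str]) -> list[Path]:
    """Write one .txt file per PDF and return the files that were written.

    extract_text is a hypothetical callable supplied by the caller; it stands
    in for whatever PDF library actually reads the document.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in pdf_paths:
        text = extract_text(pdf)
        if not text.strip():
            # Assumption: nothing to extract usually means a scanned
            # document that needs OCR first, so it is skipped here.
            continue
        target = out_dir / (pdf.stem + ".txt")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

Passing the extractor in as a parameter keeps the sketch independent of any particular PDF library.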

 

How to Extract Text from a Batch of PDFs

  1. On the toolbar, go to the Batch Tab > Convert to > Text
  2. Set the options for the batch process. Additional details for each of the settings are available below.
    • Using the File List, select the files that need to be processed
    • Set the destination settings for the processed batch files
    • If needed, set any open passwords to be attempted when processing files
  3. Once all of the settings are complete, click on Start... to begin the batch process.

 

Batch Extract Text Settings

File List

Add Files - Displays a file chooser to add individual files to the list.

Add Folder - Displays a file chooser that adds the contents of a directory to the list.

Set Default Batch Directory - When checked, all files from the default batch directory will be added to the File List each time a batch dialog is opened.

Include Subfolders - When checked, any supported file types found within subfolders of the chosen default batch directory will also be included.
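As a rough illustration of what Include Subfolders changes, the sketch below gathers files from a directory either at the top level only or recursively. The function name collect_pdfs is hypothetical, and treating only .pdf as a supported file type is an assumption.

```python
from pathlib import Path


def collect_pdfs(folder: Path, include_subfolders: bool = False) -> list[Path]:
    """Return the PDF files in folder, optionally descending into subfolders."""
    # "**/*" walks the whole tree; "*" looks at the top level only.
    pattern = "**/*" if include_subfolders else "*"
    return sorted(
        p for p in folder.glob(pattern)
        # Assumption: only .pdf counts as a supported file type here.
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```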

- Removes the selected file(s) from the list.

- Moves the selected file(s) up the list.

- Moves the selected file(s) down the list.

- Moves the selected file(s) to the top of the list.

- Moves the selected file(s) to the bottom of the list.

Save Files To

Destination Folder

Use Source Folder - When this option is selected, each output file is saved in the same folder as its source PDF document.

Destination Folder - This option allows you to set a destination folder in which to place all of the processed files. You can type the destination manually or click on the "..." button to open a directory chooser to set the destination folder.

  • Preserve Folder Structure: When checked, the output files will be placed within a new folder (within the specified destination folder) using the file's parent directory name.
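The three destination choices can be summarized with a small sketch. The function name output_dir and its parameters are hypothetical; passing None as the destination stands in for the Use Source Folder option.

```python
from pathlib import Path
from typing import Optional


def output_dir(source: Path, destination: Optional[Path],
               preserve_structure: bool = False) -> Path:
    """Pick the folder an output file is written to."""
    if destination is None:
        # Use Source Folder: write next to the original PDF.
        return source.parent
    if preserve_structure:
        # Preserve Folder Structure: add a folder named after the
        # source file's parent directory inside the destination.
        return destination / source.parent.name
    return destination
```

For example, with a source of in/reports/a.pdf and a destination of out, the sketch returns out/reports when preserve_structure is set and out otherwise.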

File Name Pattern

Use Source Filename - Saves each output file using the name of its source document. If a file with the same name already exists in the directory, a number will be appended to the output file name to avoid duplicate file names.
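A minimal sketch of the duplicate-name handling follows. The exact format of the appended number is not documented here, so the underscore separator and the starting value of 1 are assumptions, as is the function name unique_path.

```python
from pathlib import Path


def unique_path(target: Path) -> Path:
    """Return target, or a numbered variant if target already exists."""
    if not target.exists():
        return target
    n = 1
    while True:
        # Assumption: the number is appended as "_1", "_2", ...
        candidate = target.with_name(f"{target.stem}_{n}{target.suffix}")
        if not candidate.exists():
            return candidate
        n += 1
```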

New Filename - When this option is selected, enter a new filename to use for the output files. An incremental counter, starting at zero, is appended to the name entered in this field for each document. Custom variables may also be used to further distinguish each of the output files. The available variables are:

  • $filename - The file name (no extension) that the document was opened from
  • $counter - An automatically incrementing number
  • $day - The day of the month
  • $month - The current month, using two digits
  • $year - The current year, using four digits
  • $shortyear - The current year, using two digits
  • $second - The current second
  • $minute - The current minute
  • $hour - The current hour, 1-12
  • $ampm - AM or PM
  • $longhour - The current hour, 0-23
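To show how such a pattern might expand, here is a sketch built on Python's string.Template, which happens to use the same $name syntax. It is not the application's implementation: the function name expand_pattern is hypothetical, and the zero-padding of the day, minute, and second values is an assumption, since only the month and year widths are stated above.

```python
from datetime import datetime
from string import Template


def expand_pattern(pattern: str, filename: str, counter: int,
                   now: datetime) -> str:
    """Replace the $variables listed above in a file name pattern."""
    values = {
        "filename": filename,
        "counter": str(counter),
        "day": f"{now.day:02d}",        # padding is an assumption
        "month": f"{now.month:02d}",
        "year": f"{now.year:04d}",
        "shortyear": f"{now.year % 100:02d}",
        "second": f"{now.second:02d}",  # padding is an assumption
        "minute": f"{now.minute:02d}",  # padding is an assumption
        "hour": str(now.hour % 12 or 12),   # 1-12
        "ampm": "AM" if now.hour < 12 else "PM",
        "longhour": str(now.hour),          # 0-23
    }
    # safe_substitute leaves unknown $names untouched instead of raising.
    return Template(pattern).safe_substitute(values)


stamp = datetime(2024, 3, 5, 14, 7, 9)
print(expand_pattern("$filename-$year$month$day-$counter", "report", 0, stamp))
# prints "report-20240305-0"
```

One limitation of this sketch: string.Template treats an underscore directly after a variable as part of the name, so a pattern such as $filename_$counter would need to be written as ${filename}_$counter here.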

Overwrite Files - When set, if a file with the same name already exists in the directory it will be overwritten with the newly output document.

Note: This cannot be undone. Make sure that all of your settings are correct before starting the batch process.

Passwords to try when opening documents

To set a password, click in the password field or on the Edit button, then enter the password you want to use. You can enter up to four passwords to try on password-protected PDFs during the batch process.

Note: The passwords entered here will only be used for this batch process and will not be stored anywhere else. Passwords will have to be entered for each new batch process.
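The password attempts can be pictured as a simple loop over the candidates. Everything in the sketch below is hypothetical: the WrongPassword exception, the open_doc callable, and the choice to try opening without a password first are assumptions for illustration, not documented behavior.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

MAX_PASSWORDS = 4  # the dialog accepts up to four passwords


class WrongPassword(Exception):
    """Hypothetical error raised by the opener when a password is rejected."""


def open_with_passwords(open_doc: Callable[[Optional[str]], T],
                        passwords: Sequence[str]) -> Optional[T]:
    """Try no password, then each candidate; return None if all fail."""
    for password in [None, *passwords[:MAX_PASSWORDS]]:
        try:
            return open_doc(password)
        except WrongPassword:
            continue  # move on to the next candidate
    return None
```

Because the candidates live only in the argument list, nothing is stored between runs, which matches the note that passwords must be re-entered for each new batch process.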