pdf2csv Convert | Help

Getting Started

PDF2CSV Convert is a single-step financial data translator that extracts financial transactions from downloaded PDF statements and converts them into CSV format suitable for spreadsheets or finance applications. PDF2CSV Convert+ adds PDF+, MoneyThumb’s integrated text recognition module, to handle scanned PDF statements. See the section on Working with Scanned Documents and PDF+ for information regarding PDF+.

Use PDF2CSV Convert to import transaction data from statements downloaded from your bank or brokerage into a spreadsheet program such as Microsoft Excel® to edit or process.

To get started, download your PDF file from your bank and verify your desired date format with the Settings button.

Then choose Open PDF File from the File menu to run the conversion and create your CSV file in a single step.

Installation

  • Microsoft Windows® full install
    • Download PDF2CSV.exe for Windows, save the file to your computer, and run the installation program by double-clicking the file.
    • If you do not have Java installed, it will be downloaded automatically during the installation.
  • Mac OS X® full install
    • Download PDF2CSV.dmg for Mac OS X and save the file to your computer. Locate the file in the download area, open it by double-clicking, then run installer.app by double-clicking it.
    • If you do not have Java installed, it will be downloaded automatically during the installation.
  • Portable Installation
    • Download PDF2CSV.zip and save the file to your computer.
      • If running on Mac OS X, the file will be unzipped automatically as part of the download, and PDF2CSVportable.jar should be in your user download folder.
      • If running on other operating systems, use a zip utility such as WinZip to extract PDF2CSVportable.jar from PDF2CSV.zip to a suitable folder such as C:\Program Files\MoneyThumb.
    • Make sure you have Java installed on your computer. If you do not have Java already installed, download it for free at www.java.com.

Entering License Information

On Microsoft Windows, the easiest way to enter the license is to copy the license file PDF2CSV.lic from the product confirmation e-mail to the folder where you installed PDF2CSV Convert, for example C:\Program Files\MoneyThumb\PDF2CSV.

Otherwise, enter the license by copying the license string (CTRL-C) from the confirmation e-mail and pasting it (CTRL-V) into the license dialog. To enter the license string manually from within the program, select the License button and paste (or type) the full license code into the dialog.

After you enter your license, your license email will be shown in the program title bar, and in About.

Running PDF2CSV Convert

If you have the Windows version installed, run the program from the Start menu or run PDF2CSV.exe. If you are running the portable version, double-click PDF2CSVportable.jar.


Settings Dialog

Use the Settings button to bring up the Settings dialog:

pdf2csv Convert Settings

PDF Settings

PDF Password

If your PDF statements have a password that you need to enter in order to view them, then use the setting for Set PDF Password. The password is not saved for security reasons, so you need to enter a password each time you start PDF2CSV Convert. However, if you are converting multiple statements that require the same password, the password will be applied to multiple conversions in the same session.

PDF Page Range

If your PDF statement has multiple accounts (such as a checking and a saving account) you can restrict the pages converted so that only a single account is processed. Enter values into the dialog for Only Convert Pages from … to to specify a page range. The first value is the page number of the first page to convert, the second value is the page number of the last page to convert. All other pages will be ignored.

Year

Many banks show only the month and day for individual transactions, with the year appearing elsewhere on the statement. PDF2CSV Convert determines the calendar year from other dates in the statement, but if the year is not determined correctly, use this setting to override it.

The Spacing Factor is for PDF Statements that have extremely wide or narrow text. PDF2CSV Convert will automatically determine a good value, but if your initial conversion has extra spaces where they shouldn’t be, or no spaces where they should be, this is a way to override the calculated value.

Page Width is used when the PDF Statement has two columns for transactions but has extra text outside the viewable area that is causing the second column to be unrecognized. Use this value to override the page width manually, for example to 8.5. The value is always in inches.

Transaction descriptions need alphabetic characters is normally on. Turn this off if the converter is not recognizing transactions that only have a number as the transaction description or payee.

Allow dates without any separator can be used for banks that use a month-day format of ‘mmdd’, without any space, dash, or slash between the month and day. Setting this option may cause non-date values such as check numbers to be interpreted as dates, so use with caution.
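To see why this option calls for caution, the following Python sketch checks whether a four-digit value could be read as a month-day pair. The function name could_be_mmdd and the default year are hypothetical, and this is not a description of how PDF2CSV Convert works internally; it only illustrates the ambiguity between ‘mmdd’ dates and check numbers.

```python
from datetime import date


def could_be_mmdd(token: str, year: int = 2024) -> bool:
    """Return True if a 4-digit token is a valid month-day pair in `year`."""
    if len(token) != 4 or not (token.isascii() and token.isdigit()):
        return False
    month, day = int(token[:2]), int(token[2:])
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True


# A date and a check number can look identical once separators are optional.
print(could_be_mmdd("0315"))  # readable as March 15
print(could_be_mmdd("1024"))  # readable as October 24, or check number 1024
print(could_be_mmdd("1342"))  # month 13 does not exist
```

Any check number whose first two digits fall between 01 and 12 and whose last two digits form a valid day is indistinguishable from a date by format alone.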

Process statement as a single currency column

This option is rarely needed. It applies when your PDF statement has several columns of currency values but the columns do not distinguish debits from credits, which would be evident when converting the statement in Preview mode. Most statements with multiple columns have one column for credits, one for debits, and perhaps another for balances, and this is what the converter expects. If your bank’s statements use columns that do not separate debits from credits, turn on this option. The only known bank that does this is PNC, where one column is used for checking withdrawals and another for check card debits. When this option is on, all currency values are assumed to be in a single column, and the converter relies on the section names or plus/minus signs to distinguish credits from debits. If your statement has only one column of currency values, this option makes no difference.

Always run PDF+ text recognition (OCR)

Use this option when your statement may already have text from a previous OCR process. See the section on PDF+ for additional information.


Date Formats

PDF2CSV Convert can read and write dates either in US format (month-day-year) or European format (day-month-year). Use the Settings menu to select the date format for reading and writing. The read choice reflects what is used in your input PDF file, and the write choice is what format should be created in your CSV file. In most cases the read and write settings will be the same, but it is possible to read dates in one format and write them in the other.
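As an illustration of reading dates in one format and writing them in the other, here is a minimal Python sketch. The slash separator, the four-digit year, and the function name convert_date are assumptions chosen for the example; the formats PDF2CSV Convert actually accepts and emits may differ.

```python
from datetime import datetime

US_FORMAT = "%m/%d/%Y"        # month-day-year
EUROPEAN_FORMAT = "%d/%m/%Y"  # day-month-year


def convert_date(text: str, read_format: str, write_format: str) -> str:
    """Parse `text` using the read format and re-emit it in the write format."""
    return datetime.strptime(text, read_format).strftime(write_format)


# March 4 read as a US date and written as a European date.
print(convert_date("03/04/2024", US_FORMAT, EUROPEAN_FORMAT))  # 04/03/2024
```

The example also shows why the read setting matters: the string 03/04/2024 is a valid date in both formats, so the wrong setting silently swaps month and day instead of failing.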

Automation

To run PDF2CSV Convert from the command line or a script, invoke PDF2CSV Convert on Windows as:

PDF2CSV inputfile.pdf

Or for the portable version

PDF2CSVportable.jar inputfile.pdf

There is no need to specify an output file name; PDF2CSV Convert will use the same name with a CSV extension. The log will be written to a file with the same name and a .log extension, or to ERROR.log if the input file name is invalid.

Note that if the output file already exists, it will be overwritten. And if your input file name has any spaces in it, remember to use quotes – for example:

PDF2CSV "input file.pdf"

To convert all the files in a folder, use syntax such as:

PDF2CSV "c:\parent\pdf folder.dir"

where “c:\parent\pdf folder” is the full pathname of the folder with your PDF files. Note that you need to append “.dir” to the folder name. The log file will be created in the parent folder.
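The command lines above can also be driven from a script. The following Python sketch is one way to do it, under several assumptions: the executable is reachable as PDF2CSV on the system path, the output extension is lowercase .csv, and the helper names (expected_outputs, folder_argument, convert) are hypothetical. The converter's exit code is not documented here, so the sketch does not rely on it.

```python
import subprocess
from pathlib import Path


def expected_outputs(pdf_path: str) -> tuple[Path, Path]:
    """Return the CSV and log paths described above for one input file."""
    pdf = Path(pdf_path)
    # Assumption: lowercase ".csv"; the log uses the same name with ".log".
    return pdf.with_suffix(".csv"), pdf.with_suffix(".log")


def folder_argument(folder: str) -> str:
    """Append ".dir" to a folder path, as required for folder conversion."""
    return folder.rstrip("\\/") + ".dir"


def convert(pdf_path: str, executable: str = "PDF2CSV") -> tuple[Path, Path]:
    """Run the converter on one file and return the expected output paths."""
    # Passing arguments as a list avoids manual quoting of names with spaces.
    subprocess.run([executable, pdf_path])
    return expected_outputs(pdf_path)


# Show which files a conversion of "input file.pdf" is expected to produce.
print(expected_outputs("input file.pdf"))
```

Because an existing output file is overwritten, a script that needs to keep earlier results should copy or rename the CSV file before calling convert again.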

Working with Scanned Documents and PDF+

This section is only applicable if you purchased PDF2CSV Convert+ or the PDF+ AddOn. Running with PDF+ is virtually identical to running the normal version. In most cases the converter will recognize that the PDF statement does not contain readable text, and will automatically invoke the text recognition module. The main noticeable difference is that files needing text recognition take much longer to convert than files that do not.

When scanning documents to be processed by PDF2CSV Convert+, it is best to scan at a resolution of 300 dpi (dots per inch). Most scanners offer this as a setting. The cleaner and crisper the scan, the better the recognition will be. A speck of dirt in the wrong place, for example one that makes a keyword like ‘Debits’ unintelligible, can throw off the entire conversion.

After text recognition and conversion, the converter will also automatically refine any transaction date or amount values that appear to be incorrect with Pin Point recognition, and redo the conversion. The number of values refined will be shown in the converter log. If there are still values that appear to be incorrect, the converter log will list the number of lines that should be manually corrected. They will be identified in the CSV file with an extra column containing a double question mark – “??”.
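Rows that need manual correction can be located with a short script. The Python sketch below assumes the “??” marker sits in the last column of a flagged row; the function name flagged_rows and the sample data are hypothetical and only illustrate the idea.

```python
import csv
import io


def flagged_rows(csv_text: str, marker: str = "??") -> list[int]:
    """Return 1-based row numbers whose last column equals the marker."""
    reader = csv.reader(io.StringIO(csv_text))
    return [
        number
        for number, row in enumerate(reader, start=1)
        if row and row[-1].strip() == marker
    ]


# Hypothetical converter output: the third row carries the extra "??" column.
sample = (
    "Date,Description,Amount\n"
    "01/05/2024,COFFEE SHOP,-4.50\n"
    "01/06/2024,PAYROLL,1200.00,??\n"
)
print(flagged_rows(sample))  # [3]
```

To check a real file, read its contents into a string first, for example with pathlib.Path("statement.csv").read_text(), and pass the result to flagged_rows.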

If you are converting scanned documents that had text recognition done by other Optical Character Recognition (OCR) software, you can choose either to use the text from the previous OCR software or to use MoneyThumb’s integrated PDF+ with Pin Point text recognition. The Settings option Always run PDF+ text recognition (OCR) tells the converter to always use PDF+ text recognition. If that box is unchecked, PDF2CSV Convert+ will process those files using the searchable text created by your previous OCR. Always running with text recognition will take substantially longer, but will also generally give more accurate results than using the results from other OCR software. That is especially true when compared to free OCR software that may have come with your scanner.

Lastly, a few banks create PDF statements from images, and a very few create PDF statements with internal encryption. Bank statements created from images should automatically be processed by PDF+, since no readable text will be found. Statements with internal encryption will generate unusable text, so they can only be processed correctly by turning on the Settings option Always run PDF+ text recognition (OCR). You can recognize these statements by copying text from your PDF reader and pasting it into any editing program: you will see random characters rather than the text you copied.

Troubleshooting


Error Message: “No text found in the PDF file”

This error generally means that the PDF file is an image file, not a text-based (or searchable) PDF file. Image PDFs are created by scanning; a small minority of banks also create PDF statements from images rather than text. You can verify this by trying to select a line of text while viewing the PDF file in Adobe Acrobat. Depending on the type of PDF file, the selection area will either snap to a line of text (text-based PDF) or just be a rectangle following the cursor (image PDF).

Text Versus Image PDF Comparison

In either case you will need PDF2CSV Convert+ with text recognition in order to process this file. You could also use other OCR software, although MoneyThumb’s PDF+ is unique in being optimized for recognizing financial transactions.

Error Message: “No transactions found in the PDF file”

This error can be caused by anything from PDF2CSV Convert not working correctly with your bank’s PDF file to a PDF file with internal encryption or images that make it impossible to convert.

A quick test is to verify whether the text in the statement is extractable. Open the statement with Adobe Acrobat, select the text for a transaction, and use Edit, Copy to copy the text to the clipboard. Open any kind of text or document editor (e.g. Notepad, Word, TextEdit, Pages) and paste the text into the program. If the text does not paste correctly, then it is an image or is somehow encrypted, and the statement can only be processed with PDF2CSV Convert+. You may need to turn on the Settings option Always run PDF+ text recognition (OCR).

If text was processed, there should be a line in the log like “Found 100 lines with a date, 90 lines with a currency value, 80 lines with both.” If this line is missing or the number of lines found is much lower than expected, then the statement has spacing, date, or currency formats that are not being recognized. You would need to send the file to MoneyThumb for further investigation.
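When checking many logs, the counts in this line can be pulled out with a regular expression. The Python sketch below assumes the log wording matches the example sentence above exactly; the function name parse_counts is hypothetical, and real log text may differ.

```python
import re
from typing import Optional

# Pattern built from the example log line quoted above (an assumption).
LOG_PATTERN = re.compile(
    r"Found (\d+) lines with a date, "
    r"(\d+) lines with a currency value, "
    r"(\d+) lines with both"
)


def parse_counts(log_text: str) -> Optional[dict]:
    """Return the three line counts from the log, or None if the line is missing."""
    match = LOG_PATTERN.search(log_text)
    if match is None:
        return None
    dates, amounts, both = (int(group) for group in match.groups())
    return {"date": dates, "currency": amounts, "both": both}


example = "Found 100 lines with a date, 90 lines with a currency value, 80 lines with both."
print(parse_counts(example))  # {'date': 100, 'currency': 90, 'both': 80}
```

A result of None, or a "both" count far below the number of transactions on the statement, corresponds to the unrecognized-format case described above.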

If date and currency values were found, then the formatting of the statement is likely causing a problem. If your statement has multiple sections for different accounts, that can sometimes cause confusion. It can often be corrected by only processing pages for one section of the file at a time. Enter values for Only convert PDF pages from .. to .. in the lower right of the Settings menu.

If transactions are still not being found, you would need to send a test file to MoneyThumb for further investigation. We can send you a procedure to remove your personal information from the PDF statement.

Warning Message: “No separate credit/debit sections found. Verify plus/minus sign of amounts”

If the PDF2CSV Convert log ends with this message, then PDF2CSV Convert was unable to find distinct sections for credits and debits in the PDF statement. If your PDF statement has plus and minus signs, then you should check that they are correct when processing the CSV file.

Warning Message: “Credit/debit columns not identified. Verify plus/minus sign of amounts”

This warning is similar to the warning above regarding credit/debit sections. Here PDF2CSV Convert did find separate columns for credits and debits but could not determine which is which. You should therefore ensure that the debit and credit columns are identified correctly when processing the CSV file. If the columns in your statement do not distinguish credits and debits, then you should use the Settings option for Process statement as a single currency column.

Transactions have an incorrect year

Most bank statements don’t have the year on individual transactions but have the year in the statement date. This is normally picked up by PDF2CSV Convert. However, if the statement date is not present or other dates are found in the statement, all the transactions may be given an incorrect year. To override the year value found in the statement, specify a year in the Settings menu, using the Year value on the lower right.

Saving the PDF2CSV Convert Log

After PDF2CSV Convert has run, you may wish to save the log information to a file. Select the File menu, and the Save Log menu entry. This will bring up a File Save dialog. Simply specify a file name and select Save.