Web API - Convert a PDF file
Extract transactions from a bank statement PDF file
- URL: /webapi/pdfconvert/makecsv
- Method: POST
- URL Params: None
- Data Params:
The data parameters are passed via a multipart form. An asterisk indicates a required parameter.
license* – license code
product* – licensed product abbreviation, such as 'pdfserver' or 'pdfservertp'
pdf-filename* – file to upload for conversion
webhook – URL to call when the conversion is complete; useful when some conversions use OCR. For example 'www.mysite.com/done.php'. The page is called with the same JSON-formatted data that is returned on a completed conversion.
readUSdates – flag – true for US Date format (M-D-Y), false for D-M-Y
writeUSdates – for csv output, similar to readUSdates
language – statement language – ISO abbreviations: en, es, fr, de, ln, pt, it
accounttype – one of Bank, CCard, Invst
doocrfile – flag to run OCR if the statement does not reconcile. Default (0) is to run OCR if statement is primarily an image. Set to 1 to always run OCR if the statement does not reconcile.
neverocr – flag to never run OCR. Default (0) is to run OCR if the file is an image file, or uses an encrypted font. Set to 1 to never run OCR.
nosepdates – flag to treat a number like 0102 as a date (January 2nd) rather than a plain number, such as a check number. Normally false; only 3 known banks use this convention.
accountseqno – account number in the statement to process. 1 for first, etc. To process all accounts in a statement use -1.
monthseqno – statement number in the document to process. 1 for first, etc. To process all statements in the document use -1.
firstpage – first page number to process (if not page 1).
lastpage – last page number to process (if not last page in document).
combinelines – flag to combine description lines into one long description.
outputtype – transaction output format. Choices are “csv” or “json”. Default is csv.
logtype – type of log to create. Choices are “txt”, “htm” or “none”. Default is htm.
fixedcolumns – flag as to whether standard CSV columns are used. If not set, then the columns will be in the same order found in the PDF file. Setting outputtype to json automatically sets this flag as
readempty – flag to read empty accounts and consider them reconciled if balances are found.
finduntrue – flag to evaluate credits for being untrue revenue. When used json output will contain an 'untrue' value for each transaction.
alloweddiff – value for allowing small balance differences to be ignored. Default is 0. A value such as 100 would mark the statement as reconciled if the balances were accurate within $100.00.
- Success Response:
- Code: 200
Content: { JSON Structure with conversion results} see below
- OR
- Code: 202 ACCEPTED
Content: { error : “Additional information.” }
- Error Response:
- Code: 400 Bad Request
Content: { error : “User doesn’t exist” }
- OR
- Code: 401 UNAUTHORIZED
Content: { error : “Invalid license code” }
- OR
- Code: 500 INTERNAL SERVER ERROR
Content: { error : “You are unauthorized to make this request.” }
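As an illustration of the request described above, here is a minimal Python sketch. It is not an official client. The host in BASE_URL is a placeholder (this page gives only the path), the helper names are hypothetical, and the third-party requests library is assumed for the upload. Option values are sent as plain strings, so the caller chooses how to spell flags (for example "1" or "true").

```python
from pathlib import Path

# Placeholder host: this page documents only the path, not the server name.
BASE_URL = "https://api.example.invalid"
ENDPOINT = "/webapi/pdfconvert/makecsv"


def build_form_fields(license_code, product, **options):
    """Return the non-file form fields, skipping options left as None."""
    if not license_code or not product:
        raise ValueError("license and product are required")
    fields = {"license": license_code, "product": product}
    for name, value in options.items():
        if value is not None:
            fields[name] = str(value)  # sent as text; flag spelling is up to the caller
    return fields


def classify_status(status_code):
    """Map the documented HTTP codes to a coarse outcome."""
    if status_code == 200:
        return "done"      # body carries the conversion results
    if status_code == 202:
        return "pending"   # accepted; results arrive later
    if status_code in (400, 401, 500):
        return "error"     # body carries an "error" message
    return "unknown"


def submit_pdf(pdf_path, license_code, product, **options):
    """POST the PDF as a multipart form; returns (outcome, response)."""
    import requests  # third-party; imported here so the helpers above run without it

    fields = build_form_fields(license_code, product, **options)
    path = Path(pdf_path)
    with path.open("rb") as fh:
        files = {"pdf-filename": (path.name, fh, "application/pdf")}
        resp = requests.post(BASE_URL + ENDPOINT, data=fields, files=files, timeout=120)
    return classify_status(resp.status_code), resp
```

A call such as submit_pdf("statement.pdf", "MY-LICENSE", "pdfserver", outputtype="json") would then return "done", "pending" or "error" along with the raw response.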
Response logic:
A simple conversion of a text-based PDF returns a 200 response with all the data. However, if you also submit scanned PDFs, conversions that require OCR return a 202 code indicating that the conversion has been accepted. The recommended solution is to always use the webhook argument and always process the JSON results received from the webhook. Alternatively, you can poll for the OCR conversion to complete by making calls to webapi/pdfconvert/check (see below). If you poll, it is critical to check for error conditions indicating that the conversion has failed, so that polling does not go on forever. A timer with a maximum value of 30 minutes is also recommended.
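The polling approach can be sketched as follows. This is only an outline under stated assumptions. The check argument is a caller-supplied function that calls webapi/pdfconvert/check and returns a (status_code, payload) pair. The sketch also assumes that 202 means "still pending" and 200 means "finished", and the 15-second interval is arbitrary. Any other status is treated as a failure so that polling stops.

```python
import time


class ConversionFailed(Exception):
    """Raised when the check call reports an error or the time limit passes."""


def poll_until_done(check, max_seconds=30 * 60, interval=15,
                    sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until the conversion finishes.

    check is a hypothetical caller-supplied function returning
    (status_code, payload); how it reaches the check endpoint is up to the caller.
    """
    deadline = clock() + max_seconds
    while True:
        status, payload = check()
        if status == 200:
            return payload  # conversion finished
        if status != 202:
            # Assumption: anything other than "pending" means the conversion failed.
            raise ConversionFailed(f"check returned {status}: {payload!r}")
        if clock() + interval > deadline:
            raise ConversionFailed("gave up after the maximum polling time")
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting; in normal use the defaults apply.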
Output data:
The output is an array of results, with one structure of results for each account in each statement processed. The array will be of at least length one, even if no transactions were found.
- numtransactions – number of transactions found
- reconciled – flag if statement was reconciled
- numcredits – number of credits found
- numdebits – number of debits found
- totcredits_bd – printable total of credits
- totdebits_bd – printable total of debits
- startbalance_bd – starting balance printed on the statement
- endbalance_bd – ending balance printed on the statement
- endbalancecalc_bd – calculated balance (if did not reconcile)
- accountnumber – account number processed
- bankurl – url of the bank, a proxy for the bank name
- translations[] (if json transaction output was set)
- Each transaction has: date, description, amount, memo, checknumber, type, ocrmissinginfo, ocrsuspect
- Most of those are self-evident. The type is only present if there was a transaction type on the PDF statement. The OCR fields are only used on OCR conversions: ocrmissinginfo indicates the line was missing a data field, and ocrsuspect is a low-confidence numeric value (the first place to look if the statement did not reconcile).
- csvrows[] (if csv transaction output was set)
- Rows of csv data with a header and one row for each transaction
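To show how the output array might be consumed, here is a small Python sketch using only the field names listed above. It rests on two assumptions that this page does not confirm: that reconciled is truthy (for example true or 1) when the statement reconciled, and that each csvrows entry is one line of CSV text with the header first. The function names are illustrative.

```python
import csv
import io


def unreconciled_accounts(results):
    """Return the account numbers whose statement did not reconcile."""
    # Assumption: "reconciled" is truthy when the statement reconciled.
    return [r.get("accountnumber") for r in results if not r.get("reconciled")]


def parse_csvrows(result):
    """Turn one result's csvrows into a list of dicts keyed by the CSV header."""
    rows = result.get("csvrows") or []
    # Assumption: each entry is a single line of CSV text, header line first.
    reader = csv.DictReader(io.StringIO("\n".join(rows)))
    return list(reader)
```

Because the array always has at least one entry, even when no transactions were found, callers can loop over it without a special case for empty output and check numtransactions per entry instead.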