Web API - Convert a PDF file

Extract transactions from a bank statement PDF file

  • URL: /webapi/pdfconvert/makecsv
  • Method: POST
  • URL Params: None
  • Data Params:
    The data parameters are passed as a multipart form. An asterisk indicates a required parameter.
    license* – license code
    product* – licensed product abbreviation, such as 'pdfserver' or 'pdfservertp'
    pdf-filename* – file to upload for conversion
    webhook – url to call when conversion is complete, useful when some conversions are using OCR. For example 'www.mysite.com/done.php'. The page is called with the same JSON formatted data that is returned on a completed conversion.
    readUSdates – flag – true for US Date format (M-D-Y), false for D-M-Y
    writeUSdates – for csv output, similar to readUSdates
    language – statement language – ISO abbreviations: en, es, fr, de, ln, pt, it
    accounttype – one of Bank, CCard, Invst
    doocrfile – flag to run OCR if the statement does not reconcile. Default (0) is to run OCR if statement is primarily an image. Set to 1 to always run OCR if the statement does not reconcile.
    neverocr – flag to never run OCR. Default (0) is to run OCR if the file is an image file, or uses an encrypted font. Set to 1 to never run OCR.
    nosepdates – flag to treat a number like 0102 as a date (January 2nd) rather than a plain number such as a check number. Normally false; only 3 known banks use this convention.
    accountseqno – account number in the statement to process. 1 for first, etc. To process all accounts in a statement use -1.
    monthseqno – statement number in the document to process. 1 for first, etc. To process all statements in the document use -1.
    firstpage – first page number to process (if not page 1).
    lastpage – last page number to process (if not last page in document).
    combinelines – flag to combine description lines into one long description.
    outputtype – transaction output format. Choices are “csv” or “json”. Default is csv.
    logtype – type of log to create. Choices are “txt”, “htm” or “none”. Default is htm.
    fixedcolumns – flag as to whether standard CSV columns are used. If not set, the columns will be in the same order as found in the PDF file. Setting outputtype to json automatically sets this flag.
    readempty – flag to read empty accounts and consider them reconciled if balances are found.
    finduntrue – flag to evaluate credits for being untrue revenue. When used json output will contain an 'untrue' value for each transaction.
    alloweddiff – value for allowing small balance differences to be ignored. Default is 0. A value such as 100 would mark the statement as reconciled if the balances were accurate within $100.00.
  • Success Response:
    • Code: 200
      Content: { JSON Structure with conversion results} see below
  • OR
    • Code: 202 ACCEPTED
      Content: { error : “Additional information.” }
  • Error Response:
    • Code: 400 Bad Request
      Content: { error : “User doesn’t exist” }
  • OR
    • Code: 401 UNAUTHORIZED
      Content: { error : “Invalid license code” }
  • OR
    • Code: 500 INTERNAL SERVER ERROR
      Content: { error : “You are unauthorized to make this request.” }
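The parameter list above can be checked on the client before the form is posted. The following is a minimal sketch, not part of the API: the helper name buildConvertFields and its error messages are hypothetical, and only the required names and the documented choices for outputtype, logtype and accounttype are validated.

```javascript
// Hypothetical helper: validates options against the documented parameters
// and returns fields ready to append to a multipart form.
const REQUIRED = ["license", "product", "pdf-filename"];
const CHOICES = {
  outputtype: ["csv", "json"],
  logtype: ["txt", "htm", "none"],
  accounttype: ["Bank", "CCard", "Invst"],
};

function buildConvertFields(options) {
  for (const name of REQUIRED) {
    if (options[name] === undefined || options[name] === "") {
      throw new Error("Missing required parameter: " + name);
    }
  }
  const fields = {};
  for (const [name, value] of Object.entries(options)) {
    const allowed = CHOICES[name];
    if (allowed && !allowed.includes(value)) {
      throw new Error("Invalid value for " + name + ": " + value);
    }
    // The file is passed through untouched; every other field is sent as text.
    fields[name] = name === "pdf-filename" ? value : String(value);
  }
  return fields;
}
```

Each entry of the returned object can then be appended to a FormData instance (or to the PHP $data array equivalent) before calling makecsv.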

Response logic:

A simple conversion of a text-based PDF returns a 200 response with all the data. However, if you also submit scanned PDFs, conversions that require OCR return a 202 code, indicating that the conversion has been accepted but is not yet complete. The recommended solution is to always supply the webhook argument and always process the JSON results received by the webhook. Alternatively, you can poll for completion of the OCR conversion by calling webapi/pdfconvert/check (see below). If you poll, it is critical to check for error conditions indicating that the conversion has failed, so that polling does not go on forever. A timer with a maximum value of 30 minutes is also recommended.
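The response logic described here can be sketched as a small dispatch function. This is an illustration only: the function name classifyConvertResponse and the state labels are hypothetical, and the 202 body is kept as raw text because its exact shape is not specified in this section.

```javascript
// Hypothetical helper: decide what to do with a makecsv response.
function classifyConvertResponse(status, bodyText) {
  if (status === 200) {
    // Conversion finished; the body holds the JSON results.
    return { state: "complete", data: JSON.parse(bodyText) };
  }
  if (status === 202) {
    // Accepted for OCR; results arrive later via the webhook (or polling).
    return { state: "pending", info: bodyText.trim() };
  }
  // Anything else (400, 401, 500, ...) is treated as a failure.
  return { state: "failed", status: status, info: bodyText };
}
```

A caller that always supplies the webhook argument only needs to act on the "complete" and "failed" states and can simply wait when the state is "pending".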

Output data:

The output is an array of results, with one structure of results for each account in each statement processed. The array will be of at least length one, even if no transactions were found.

    • numtransactions – number of transactions found
    • reconciled – flag if statement was reconciled
    • numcredits – number of credits found
    • numdebits – number of debits found
    • totcredits_bd – printable total of credits
    • totdebits_bd – printable total of debits
    • startbalance_bd – starting balance printed on the statement
    • endbalance_bd – ending balance printed on the statement
    • endbalancecalc_bd – calculated balance (if did not reconcile)
    • accountnumber – account number processed
    • bankurl – url of the bank, a proxy for the bank name
    • transactions[] (if json transaction output was set)
      • Each transaction has: date, description, amount, memo, checknumber, type, ocrmissinginfo, ocrsuspect
      • Most of those are self-evident. The type is only present if there was a transaction type on the PDF statement. The OCR fields are only used on OCR conversions. ocrmissinginfo indicates the line was missing a data field and ocrsuspect is a low confidence numeric value (the first place to look if the statement did not reconcile)
    • csvrows[] (if csv transaction output was set)
      • Rows of csv data with a header and one row for each transaction

Data fields returned in JSON format

results - Array of results, one for each statement converted in this conversion

numtransactions - number of transactions found. If negative, the value is an HTTP-like error code.

reconciled - whether statement reconciled against balances or totals

donegate - whether statement values need to be negated in order for the statement to balance, typically true for credit card statements

balanceValid - whether the balance values are valid (a 0.00 balance could be accurate)

numcredits - number of credits found

numdebits - number of debits found

totcredits_bd - total value of credits

totdebits_bd - total value of debits

startbalance_bd - start balance found in statement

endbalance_bd - end balance found in statement

endbalancecalc_bd - end balance calculated. Could be different from endbalance_bd if the statement did not reconcile

accountnumber - account number

nmonths - number of monthly statements in the PDF

monthseq - index number of statement processed (0 is first)

naccounts - number of accounts found in the statement

accountseq - index number of account processed (0 is first)

isOCR - true if processed with OCR

isImage - true if statement looks like a scanned document

doctype - 0 for bank/credit card statement, 1 for tax form

typos - number of typos fixed by PDF+ PinPoint

suspectcount - number of currency values with a low confidence digit

minresolution - if a scanned image, minimum resolution of pages

maxresolution - if a scanned image, maximum resolution of pages

firstdate - earliest date found in a transaction

lastdate - latest date found in a transaction

startdate - start date of statement (defaults to firstdate if not found)

enddate - end date of statement (defaults to lastdate if not found)

bankname - name of bank statement is from, if looked up

bankurl - url of bank, generally the most reliable way to identify the bank

accountowner - owner address block from statement

address1 - first line of address

city - city

state - state

postalcode - postal code

filepath - full filename of the file just processed

fraudscore - Thumbprint fraud score from 0 to 1000 indicating how likely it is that this file was altered. 0 is good, 1000 is bad. -1 means no score

tpreasons - array of integers, each number being a reason for a thumbprint fraudscore being greater than zero

transactions - list of transactions

date - date of transaction

description - description of payee for transaction

amount - amount of transaction

memo - subsequent transaction description lines (in a single line of text)

checknumber - check number of check

type - transaction type if the PDF statement had such a column

ocrmissinginfo - for OCR, a flag indicating an incomplete transaction, missing either a date or an amount

ocrsuspect - for OCR, a flag indicating the currency value has one or more low confidence characters
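The fields above are enough to triage each converted account. The sketch below is an illustration under stated assumptions: the helper name summarizeResults and the shape of its return value are hypothetical, and a negative numtransactions is passed through unchanged as the error code.

```javascript
// Hypothetical helper: one summary entry per element of the results array.
function summarizeResults(response) {
  return response.results.map(function (r) {
    if (r.numtransactions < 0) {
      // A negative count is an HTTP-like error code, not a transaction count.
      return { account: r.accountnumber, errorCode: r.numtransactions };
    }
    // OCR-flagged rows are the first place to look when a statement
    // does not reconcile.
    const flagged = (r.transactions || []).filter(function (t) {
      return t.ocrsuspect || t.ocrmissinginfo;
    });
    // Difference between calculated and printed end balance, rounded to cents.
    const diff = Math.round((r.endbalancecalc_bd - r.endbalance_bd) * 100) / 100;
    return {
      account: r.accountnumber,
      reconciled: r.reconciled,
      difference: diff,
      flagged: flagged,
    };
  });
}
```

Entries with reconciled set to false and a non-empty flagged list are the natural candidates for manual review.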

Sample Call

Using curl


curl -v -F product=pdfserver -F license="__your_license_code__" -F pdf-filename=@testfile.PDF -F readUSdates=true https://online.moneythumb.com/webapi/pdfconvert/makecsv

Using curl from PHP


$url = 'https://online.moneythumb.com/webapi/pdfconvert/makecsv';
$data['product'] = 'pdfserver';
$data['license'] = '3XBGaPRHE6KrJ5GYHB....GHRS';

// Attach the PDF as a file upload
$pdf = curl_file_create('pdfiles/statement.PDF', 'application/pdf');
$data['pdf-filename'] = $pdf;
$data['readUSdates'] = 'true';
$data['outputtype'] = 'json';
$data['logtype'] = 'txt';

$request_headers[] = 'Content-Type: multipart/form-data';
$curl_request = curl_init();
curl_setopt($curl_request, CURLOPT_URL, $url);
curl_setopt($curl_request, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_request, CURLOPT_TIMEOUT, 30);
curl_setopt($curl_request, CURLOPT_HTTPHEADER, $request_headers);
curl_setopt($curl_request, CURLOPT_POST, true);
curl_setopt($curl_request, CURLOPT_POSTFIELDS, $data);

$result = curl_exec($curl_request);
if (curl_errno($curl_request)) {
    // Transport-level failure: report it and stop before decoding
    print curl_error($curl_request);
    curl_close($curl_request);
    exit(1);
}
curl_close($curl_request);

// Decode the JSON body into an associative array
$json_results = json_decode($result, true);
echo 'Number transactions found ' . $json_results['results'][0]['numtransactions'] . '<br>';
echo 'Csv is at ' . $json_results['csvurl'] . '<br>';
echo 'Log is at ' . $json_results['logurl'] . '<br>';

Using jQuery

First create formdata in some combination of a form and/or function calls


var url='https://online.moneythumb.com/webapi/pdfconvert/makecsv';
var form = document.getElementById('convert-form');
var formdata = new FormData(form);
formdata.append("readUSdates", true);

Then make the ajax call


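$.ajax({
	type: 'POST',
	url: url,
	data: formdata,
	processData: false,  // let the browser encode the FormData
	contentType: false,  // let the browser set the multipart boundary
	dataType: 'json',
	success: function(data){
		// results is an array; index 0 is the first account/statement processed
		console.log('Number of transactions: ' + data.results[0].numtransactions);
		// ...
	},
	error: function(jqXHR, textStatus, errorThrown){
		// ...
	}
});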

Example JSON Output

{"results":[{"numtransactions":135,"reconciled":true,"donegate":false,"balanceValid":true,
"numcredits":5,"numdebits":130,"startbalance_bd":3086.29,
"endbalance_bd":1756.05,"endbalancecalc_bd":1756.05,
"totcredits_bd":31519.54,"totdebits_bd":32849.78000000001,"accountnumber":"001234567899",
"bankname":"","bankurl":"Chase.com","accountowner":["SOME NAME","SOME ADDRESS","CITY ST 12345"],
"nsfcount":0,"trackingid":"Chase checking",
"firstdate":{"iLocalMillis":1524441600000,"iChronology":{"iBase":{"iMinDaysInFirstWeek":4}}},
"lastdate":{"iLocalMillis":1526860800000,"iChronology":{"iBase":{"iMinDaysInFirstWeek":4}}},
"startdate":{"iLocalMillis":1524441600000,"iChronology":{"iBase":{"iMinDaysInFirstWeek":4}}},
"enddate":{"iLocalMillis":1526860800000,"iChronology":{"iBase":{"iMinDaysInFirstWeek":4}}},
"filepath":"C:\\Ralph\\MoneyThumb\\Chase checking.pdf",
"nmonths":1,"monthseq":0,"naccounts":1,"accountseq":0,"isOCR":false,"isImage":false,
"doctype":0,"typos":0,"address1":"SOME ADDRESS","company":"SOME NAME", "city":"CITY",
"state":"ST","postalcode":"12345","suspectcount":0,"remaining":10000,"imgfiles":[],
"taxforms":[],"transactions":[
{"date":{"iLocalMillis":1524441600000,"iChronology":{"iBase":{"iMinDaysInFirstWeek":4}}}, "description":"Card Purchase 04/18 Viva Italiano Pacifica CA Card 4215","amount":-16.88,"checknumber":0,"ocrmissinginfo":false,"ocrsuspect":false},
{"date":{"iLocalMillis":1524441600000,"iChronology":{"iBase":{"iMinDaysInFirstWeek":4}}}, "description":"Card Purchase 04/20 Sand Dollar Restaura Stinson Bea CA Card","amount":-29.92,"memo":"4215","checknumber":0,"ocrmissinginfo":false,"ocrsuspect":false},
{"date":{"iLocalMillis":1524441600000,"iChronology":{"iBase":{"iMinDaysInFirstWeek":4}}}, "description":"Card Purchase 04/21 P Town Cafe Pacifica CA Card 4215","amount":-5.80,"checknumber":0,"ocrmissinginfo":false,"ocrsuspect":false},
{"date":{"iLocalMillis":1524441600000,"iChronology":{"iBase":{"iMinDaysInFirstWeek":4}}}, "description":"Card Purchase With Pin 04/21 Oreilly Auto PA Pacifica CA Card 4215","amount":-32.46,"checknumber":0,"ocrmissinginfo":false,"ocrsuspect":false},
{"date":{"iLocalMillis":1524441600000,"iChronology":{"iBase":{"iMinDaysInFirstWeek":4}}}, "description":"Non-Chase ATM Withdraw 04/21 *Coastside Pacifica CA Card 4215","amount":-102.00,"checknumber":0,"ocrmissinginfo":false,"ocrsuspect":false},
{"date":{"iLocalMillis":1524441600000,"iChronology":{"iBase":{"iMinDaysInFirstWeek":4}}}, "description":"Card Purchase 04/21 Granucci's Pacifica CA Card 4215","amount":-16.20,"checknumber":0,"ocrmissinginfo":false,"ocrsuspect":false},
.....
{"date":{"iLocalMillis":1526860800000,"iChronology":{"iBase":{"iMinDaysInFirstWeek":4}}}, "description":"Interest Payment","amount":0.03,"checknumber":0,"ocrmissinginfo":false,"ocrsuspect":false}]}], "logurl":"https://online.moneythumb.com/results/M-IXBJJbN-5211652930396749203.htm", "csvurl":"https://online.moneythumb.com/results/M-IXBJJbN-5211652930396749203.csv", "error":"", "tid":"M-IXBJJbN-5211652930396749203", "outputurl":"","statementid":20768, "convertedstatements":0,"statementseq":0}
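Dates in the example output are objects rather than strings. The sketch below shows one way to turn them into calendar dates. It assumes that iLocalMillis holds milliseconds since 1970-01-01 for midnight of the statement's local date, which is consistent with the sample values but is not stated in this section. The function name localMillisToIsoDate is hypothetical.

```javascript
// Hypothetical helper: convert a date object from the JSON output
// (for example firstdate or a transaction's date) to YYYY-MM-DD.
function localMillisToIsoDate(dateField) {
  // Interpret the local millis as UTC so no time-zone shift is applied.
  return new Date(dateField.iLocalMillis).toISOString().slice(0, 10);
}

// Values taken from the example output: firstdate and lastdate.
console.log(localMillisToIsoDate({ iLocalMillis: 1524441600000 })); // 2018-04-23
console.log(localMillisToIsoDate({ iLocalMillis: 1526860800000 })); // 2018-05-21
```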

Check for completion of an OCR conversion

This API call has been deprecated and will be removed in future API versions.
Instead, use a webhook to be notified when an OCR conversion is complete.

  • URL: /webapi/pdfconvert/check
  • Method: GET
  • URL Params: None
  • Data Params:
    resfile* – name of the results file (.res) being checked, as returned in the error content of the 202 response from makecsv

Using jQuery

Add a block like the following to the error handling of makecsv


error: function(jqXHR, textStatus, errorThrown)
{
	if (jqXHR.status == 202)  // test for OCR processing
	{
		resultsFile = jqXHR.responseText;  // name of the results file to poll for
		if (resultsFile.endsWith(".res"))
		{
			startTime = new Date().getTime();
			if (pdfInterval)
				window.clearInterval(pdfInterval);
			pdfInterval = window.setInterval(checkStatus, 10000); // check status every 10 seconds
		}
		// for large image files, an email will be sent on completion
		else if (resultsFile.endsWith(".email"))
		{
			// ... provide user notification
		}
	}
}

// Called every 10 seconds by the interval timer started in the error handler above
function checkStatus()
{
	$.ajax(
	{
		type: "GET",
		url: "webapi/pdfconvert/check",
		data: {"resfile": resultsFile},
		dataType: "json",
		success: function (data)
		{
			window.clearInterval(pdfInterval); // stop checking
			// ... conversion is complete
		},
		error: function(jqXHR, textStatus, errorThrown)
		{
			if (jqXHR.status != 206)  // ANY other return than 206 indicates a fatal server error
			{
				window.clearInterval(pdfInterval); // no further polling should be done on a server error
				if (jqXHR.status == 410)   // 410 could be returned if server reboots
					showalert("error", "Connection to server lost",
						"Please retry the conversion", null);
				else showalert("Error " + jqXHR.status + " : " + jqXHR.statusText, "has-error");
				console.log(jqXHR);
				return;  // nothing more to do after a fatal error
			}

			var complete = jqXHR.responseText.split(" ");
			if (typeof complete[1] !== 'undefined')
			{
				// ... complete[1] is the percentage completed
			}
			var curTime = new Date().getTime();
			if (curTime - startTime > 1800000)  // time out after 30 minutes
			{
				window.clearInterval(pdfInterval);
				showalert("Error " + jqXHR.status + " : " + jqXHR.statusText, "has-error");
			}
		}
	});
}
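The polling rules in the sample above can be separated from the jQuery plumbing so they are easier to test. This is a sketch under assumptions: the names nextPollAction and parseProgress are hypothetical, a 200 status is taken to mean the conversion is complete, and the progress text is assumed to carry the percentage as its second space-separated token, as the sample's use of complete[1] suggests.

```javascript
const MAX_POLL_MS = 30 * 60 * 1000; // recommended 30 minute ceiling

// Hypothetical helper: decide what the poller should do next.
function nextPollAction(status, elapsedMs) {
  if (status === 200) return "done";       // conversion complete
  if (status !== 206) return "failed";     // any status other than 206 is fatal
  if (elapsedMs > MAX_POLL_MS) return "timeout";
  return "continue";                       // still processing, poll again
}

// Hypothetical helper: extract the percentage from the 206 response text.
function parseProgress(responseText) {
  const parts = responseText.split(" ");
  const pct = parseFloat(parts[1]);
  return Number.isNaN(pct) ? null : pct;
}
```

Any result other than "continue" should clear the interval timer, so that a failed or stalled conversion never leaves the client polling indefinitely.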
