Each file sent to Parble is processed and parsed into various pieces of information. All the gathered information is in turn packaged within a single JSON response and ultimately sent back to the user. Regardless of the size of the file uploaded, reading and understanding JSON responses can be a tedious task.
This page explains the structure of our JSON responses, using a dummy JSON response structure followed by a line-by-line description.
The ID is the universally unique identifier (UUID) of the file.
It is formatted as a string of 24 alphanumeric characters, e.g.: 1234a56b789c0de123456fg7
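As a quick sanity check on the client side, the ID format described above can be tested with a regular expression. This is only a minimal sketch: the function name `looks_like_file_id` is hypothetical, and it assumes the 24 characters are plain ASCII letters and digits.

```python
import re

# Assumption: the ID is exactly 24 ASCII letters or digits, as described above.
ID_PATTERN = re.compile(r"[A-Za-z0-9]{24}")


def looks_like_file_id(candidate: str) -> bool:
    """Return True if the string has the 24-character alphanumeric shape."""
    return ID_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    print(looks_like_file_id("1234a56b789c0de123456fg7"))
    print(looks_like_file_id("not-an-id"))
```

A check like this only verifies the shape of the string; it does not confirm that the file exists.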
Timing information for the whole file.
The upload time is the timestamp of when the machine started processing the file.
It is formatted as a timestamp with six fractional-second digits (YYYY-MM-DDTHH:MM:SS.MMMMMM), e.g.: 2021-10-12T15:48:09.688000
The done time is the timestamp of when the machine finished processing the file.
It is formatted as a timestamp with six fractional-second digits (YYYY-MM-DDTHH:MM:SS.MMMMMM), e.g.: 2021-10-12T15:48:09.688000
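Because both timestamps use the same format, the processing duration can be derived by parsing them and subtracting. The sketch below is illustrative only: the function name `processing_duration` is hypothetical, and the done value in the example is made up for the purpose of the demonstration.

```python
from datetime import datetime, timedelta


def processing_duration(upload: str, done: str) -> timedelta:
    """Return the time elapsed between the upload and done timestamps."""
    # datetime.fromisoformat accepts the YYYY-MM-DDTHH:MM:SS.ffffff layout.
    return datetime.fromisoformat(done) - datetime.fromisoformat(upload)


if __name__ == "__main__":
    # The second timestamp is an invented value, not taken from a real response.
    elapsed = processing_duration(
        "2021-10-12T15:48:09.688000",
        "2021-10-12T15:48:12.188000",
    )
    print(elapsed.total_seconds())
```

The timestamps shown on this page carry no time zone suffix, so the parsed values are naive datetimes; the sketch makes no assumption about which zone they are expressed in.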
The original file name, including the extension.
It is formatted as a string of characters, e.g.: invoice.eml
The indicator that shows whether all the predictions within the file are automated (do not need human review).
It is formatted as a boolean value (true meaning automated; false meaning not automated), e.g.: true
The total number of pages contained in the file, including all attachments.
It is formatted as an integer starting at 1, e.g.: 4
An array containing the information for all the recognized documents within the file.
A file could, for example, be an email, in which case the documents could be the email body and the attached PDF files. Another example is a PDF file containing 2 unique receipts, which are then the 2 recognized documents.
The original file name of the document, including the extension.
It is formatted as a string of characters, e.g.: invoice.pdf
The indicator that shows whether all the predictions within the document are automated (do not need human review), including the predictions of the document classification, the header fields and the table items.
It is formatted as a boolean value (true meaning automated; false meaning not automated), e.g.: true
Full information on the classification of the document.
Please check the Classification section.
Full information on the predictions made for each header field. The object keys are the technical names of the header fields, and each key holds the information related to that field.
Please check the Fields section.
The container of information about all the recognized tables within the file. This is a dictionary whose keys are the table names, e.g.: TaxTotal or line_items
Please check the Fields section.
Contains the full information about the classification of the document: the predicted document type, the confidence of the prediction, and the starting and ending pages within the full file.
A boolean indicating whether the classification of the document is automated (does not need human review). E.g.: true
A string with the predicted document type. E.g.: invoice
The confidence level of the document type prediction (a value between 0 and 100). E.g.: 95
The page number (starting at 0) of the first page of the document within the file. E.g.: 0
The page number (starting at 0) of the last page of the document within the file. E.g.: 2
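Since both page numbers start at 0 and the last page is included in the document, the page count of a document is the difference plus one. The helper below is a minimal sketch; the function and parameter names are hypothetical and do not correspond to keys in the response.

```python
def document_page_count(first_page: int, last_page: int) -> int:
    """Number of pages covered by a document, both bounds being zero-based and inclusive."""
    if first_page < 0 or last_page < first_page:
        raise ValueError("expected 0 <= first_page <= last_page")
    return last_page - first_page + 1


def one_based_range(first_page: int, last_page: int) -> tuple:
    """Convert zero-based page numbers to the one-based numbers a reader would see."""
    return (first_page + 1, last_page + 1)


if __name__ == "__main__":
    # With the example values above (first page 0, last page 2).
    print(document_page_count(0, 2))
    print(one_based_range(0, 2))
```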
For each field detected, we have the following pieces of information inside the object: automated, confidence, page, coordinates, text, and value.
A boolean indicating whether the prediction of the field is automated (does not need human review). E.g.: true
The confidence level of the field prediction, expressed as a percentage. E.g.: 95
The page number (starting at 0) where the field was detected. E.g.: 1
An array of numbers containing the relative coordinates of the rectangle where the field was detected, in the form [x1, y1, x2, y2]. If no coordinates are associated, it defaults to [0, -1, 0, -1]. E.g.: [0.2724, 0.1374, 0.3505, 0.1465]
The string with the value extracted exactly as read from the document. E.g.: April 21,2023
The value extracted from the document, formatted into the proper data type (number, boolean, string, date, etc.). E.g.: 2023-04-21
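To illustrate how a field object might be consumed, the sketch below uses the six keys listed above (automated, confidence, page, coordinates, text, value). It rests on assumptions: the helper names and the page dimensions are hypothetical, and the relative coordinates are assumed to be fractions of the page width (x) and page height (y).

```python
# Default returned when no coordinates are associated with the field.
NO_COORDINATES = [0, -1, 0, -1]


def to_pixel_box(coordinates, page_width, page_height):
    """Scale [x1, y1, x2, y2] to pixels, or return None when the default is present.

    Assumption: x values are fractions of the page width and y values are
    fractions of the page height.
    """
    if list(coordinates) == NO_COORDINATES:
        return None
    x1, y1, x2, y2 = coordinates
    return (
        round(x1 * page_width),
        round(y1 * page_height),
        round(x2 * page_width),
        round(y2 * page_height),
    )


def needs_review(field: dict) -> bool:
    """A field needs human review when its prediction is not automated."""
    return not field["automated"]


if __name__ == "__main__":
    # Field object built from the example values given on this page.
    field = {
        "automated": True,
        "confidence": 95,
        "page": 1,
        "coordinates": [0.2724, 0.1374, 0.3505, 0.1465],
        "text": "April 21,2023",
        "value": "2023-04-21",
    }
    # 1240 x 1754 is an arbitrary page size chosen for the example.
    print(needs_review(field), to_pixel_box(field["coordinates"], 1240, 1754))
```

Checking for the default coordinates before scaling avoids drawing a meaningless rectangle for fields that were predicted without a location.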