Get started with Parble

Transform the way you process documents and extract valuable data with advanced Intelligent Document Processing.

JSON output schema

Each file sent to Parble is processed and parsed into various pieces of information. All the gathered information is in turn packaged within a single JSON response and ultimately sent back to the user. Regardless of the size of the file uploaded, reading and understanding JSON responses can be a tedious task.

This page explains the structure of our JSON responses. It starts with a dummy JSON response structure and then describes each element in turn.

JSON response structure

- id:
- timings:
  - upload:
  - done:
- filename:
- automated:
- number_of_pages:
- documents:
  - 0:
    - filename:
    - automated:
    - classification:
      - start_page:
      - end_page:
      - document_type:
      - confidence:
      - automated:
    - header_fields:
      - FIELD_NAME:
        - page:
        - coordinates:
        - text:
        - value:
        - confidence:
        - automated:
    - tables:
      - TABLE_NAME:
        - 0:
          - FIELD_NAME:
            - page:
            - coordinates:
            - text:
            - value:
            - confidence:
            - automated:
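To make the outline more concrete, here is a minimal sketch of a dummy response written as a Python dictionary. Every value is a placeholder taken from the examples on this page, and `FIELD_NAME` is the same placeholder key used in the outline. Representing the rows of a table as a list is an assumption based on the `0:` index shown above.

```python
import json

# Placeholder field object, using the example values from this page.
sample_field = {
    "page": 1,
    "coordinates": [0.2724, 0.1374, 0.3505, 0.1465],
    "text": "April 21,2023",
    "value": "2023-04-21",
    "confidence": 95,
    "automated": True,
}

# Dummy response mirroring the outline. Nothing here comes from a real file.
sample_response = {
    "id": "1234a56b789c0de123456fg7",
    "timings": {
        "upload": "2021-10-12T15:48:09.688000",
        "done": "2021-10-12T15:48:09.688000",
    },
    "filename": "invoice.eml",
    "automated": True,
    "number_of_pages": 4,
    "documents": [
        {
            "filename": "invoice.pdf",
            "automated": True,
            "classification": {
                "start_page": 0,
                "end_page": 2,
                "document_type": "invoice",
                "confidence": 95,
                "automated": True,
            },
            "header_fields": {"FIELD_NAME": sample_field},
            # Assumption: each table is a list of rows, each row maps
            # field names to field objects.
            "tables": {"line_items": [{"FIELD_NAME": sample_field}]},
        }
    ],
}

# Round-trip through JSON to confirm the structure is serializable.
parsed = json.loads(json.dumps(sample_response))
first_doc = parsed["documents"][0]
print(first_doc["classification"]["document_type"])
print(first_doc["header_fields"]["FIELD_NAME"]["value"])
```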

Elements explained

ID
The ID is the universally unique identifier (UUID) of the file.
It is formatted as a string of 24 alphanumeric characters, e.g.: 1234a56b789c0de123456fg7
Timings
The information on the timings of the whole file.
Upload time
The upload time refers to the timestamp of when the machine started processing the file.
It is formatted as a timestamp (YYYY-MM-DDTHH:MM:SS.MMMMMM), e.g.: 2021-10-12T15:48:09.688000
Done time
The done time refers to the timestamp of when the machine finished processing the file.
It is formatted as a timestamp (YYYY-MM-DDTHH:MM:SS.MMMMMM), e.g.: 2021-10-12T15:48:09.688000
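Since both timings share the same timestamp format, the processing duration can be derived from them. The sketch below assumes the timestamps carry no time zone suffix, as in the examples above. The helper name `processing_seconds` and the `done` value used in the usage lines are hypothetical.

```python
from datetime import datetime


def processing_seconds(timings):
    """Return the seconds elapsed between the upload and done timestamps."""
    started = datetime.fromisoformat(timings["upload"])
    finished = datetime.fromisoformat(timings["done"])
    return (finished - started).total_seconds()


# Hypothetical timings: the done value is made up for illustration.
example_timings = {
    "upload": "2021-10-12T15:48:09.688000",
    "done": "2021-10-12T15:48:12.188000",
}
print(processing_seconds(example_timings))  # 2.5
```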
Filename
Is the original file name including the extension.
It is formatted as a string of characters, e.g.: invoice.eml
Automated
Is the indicator that shows whether all the predictions within the file are automated (do not need human review).
It is formatted as a boolean value (true meaning automated; false meaning not automated), e.g.: true
Number of pages
Is the total number of pages contained in the file, including all attachments.
It is formatted as an integer starting at 1, e.g.: 4
Documents
Is an array containing the information for all the recognized documents comprised within the file.
A file could for example be an email, and all the documents could be the email body and the attached PDF files. Another example is the file being a PDF file of 2 unique receipts, which will then be the 2 recognized documents.
Filename
Is the original file name including the extension.
It is formatted as a string of characters, e.g.: invoice.pdf
Automated
Is the indicator that shows whether all the predictions within the document are automated (do not need human review), including the predictions of the document classification, the header fields and the table items.
It is formatted as a boolean value (true meaning automated; false meaning not automated), e.g.: true
Classification
Full information on the classification of the document.
See the Document classification section.
Header fields
Full information on the predictions made for each header field. The object keys are the technical names of the header fields, and each key holds the information related to that field.
See the Field information section.
Tables
The container of information about all the recognized tables comprised within the file. This is a dictionary whose keys are the table names, e.g.: TaxTotal or line_items
See the Field information section.
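A common use of the document-level `automated` flag is to route documents to human review. The sketch below shows one way to do that. It assumes the response has already been parsed into a Python dictionary; the helper name `documents_needing_review` and the sample data are hypothetical.

```python
def documents_needing_review(response):
    """Return (index, filename) pairs for documents that are not automated."""
    return [
        (index, doc.get("filename"))
        for index, doc in enumerate(response.get("documents", []))
        # A missing flag is treated as "needs review" to stay on the safe side.
        if not doc.get("automated", False)
    ]


# Hypothetical parsed response with one automated and one non-automated document.
example_response = {
    "filename": "invoice.eml",
    "automated": False,
    "documents": [
        {"filename": "invoice.pdf", "automated": True},
        {"filename": "receipt.pdf", "automated": False},
    ],
}
print(documents_needing_review(example_response))
```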
Document classification
Contains the full information about the classification of the document. Includes the predicted type of document, confidence of the prediction and the starting and ending page within the full file.
Automated
Is a boolean serving as the indicator that shows whether the classification of the document is automated (does not need human review). E.g.: true
Document type
Is a string with the predicted document type. E.g.: invoice
Confidence
Is the confidence level of the document type prediction (value between 0 and 100). E.g.: 95
Start page
Is the page number (starting at 0) of the first page of the document within the file. E.g.: 0
End page
Is the page number (starting at 0) of the last page of the document within the file. E.g.: 2
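Because both page numbers start at 0 and the end page is the last page of the document, the page range is inclusive on both ends. The sketch below works that out for the example values above (start page 0, end page 2). The helper names are hypothetical.

```python
def page_indices(classification):
    """Return the zero-based page indices covered by a document."""
    start = classification["start_page"]
    end = classification["end_page"]
    # end_page is the last page of the document, so it is included.
    return list(range(start, end + 1))


def page_count(classification):
    """Return how many pages of the file belong to the document."""
    return classification["end_page"] - classification["start_page"] + 1


example_classification = {"start_page": 0, "end_page": 2}
print(page_indices(example_classification))  # [0, 1, 2]
print(page_count(example_classification))    # 3
```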
Field information
For each field detected, we have the following pieces of information inside the object: automated, confidence, page, coordinates, text, and value.
Automated
Is a boolean serving as the indicator that shows whether the prediction of the field is automated (does not need human review). E.g.: true
Confidence
Is the confidence level of the field prediction, expressed as a percentage. E.g.: 95
Page
Is the page number (starting at 0) where the field was detected. E.g.: 1
Coordinates
Is an array of numbers containing the relative coordinates of the rectangle where the field was detected, in the form [x1, y1, x2, y2]. If no coordinates are associated with the field, it defaults to [0, -1, 0, -1]. E.g.: [0.2724, 0.1374, 0.3505, 0.1465]
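To draw or crop the detected rectangle, the relative coordinates need to be scaled to the size of the rendered page. The sketch below assumes that "relative" means fractions of the page width (x) and page height (y); this page does not state that explicitly. The helper name `to_pixel_box` and the page size are hypothetical.

```python
# Default returned when a field has no associated coordinates.
NO_COORDINATES = [0, -1, 0, -1]


def to_pixel_box(coordinates, page_width, page_height):
    """Scale relative [x1, y1, x2, y2] coordinates to pixels.

    Returns None when the field carries the "no coordinates" default.
    """
    if list(coordinates) == NO_COORDINATES:
        return None
    x1, y1, x2, y2 = coordinates
    return (
        round(x1 * page_width),
        round(y1 * page_height),
        round(x2 * page_width),
        round(y2 * page_height),
    )


# Hypothetical page rendered at 10000 x 10000 pixels.
print(to_pixel_box([0.2724, 0.1374, 0.3505, 0.1465], 10000, 10000))
print(to_pixel_box([0, -1, 0, -1], 10000, 10000))  # None
```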
Text
Is the string with the value extracted exactly as read from the document. E.g.: April 21,2023
Value
Is the value extracted from the document, converted into the proper data type (number, boolean, string, date, etc.). E.g.: 2023-04-21
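When only the extracted values matter, the field objects can be reduced to plain name-to-value mappings. The sketch below is one way to do this for header fields and table rows. It assumes each table is a list of rows, and that each row maps field names to field objects. The helper names and the field names `invoice_date` and `description` are hypothetical.

```python
def header_values(document):
    """Map each header field name to its typed value."""
    fields = document.get("header_fields", {})
    return {name: field.get("value") for name, field in fields.items()}


def table_rows(document, table_name):
    """Return the rows of one table as dicts of field name to typed value."""
    rows = document.get("tables", {}).get(table_name, [])
    return [
        {name: cell.get("value") for name, cell in row.items()}
        for row in rows
    ]


# Hypothetical document; only the keys used by the helpers are filled in.
example_document = {
    "header_fields": {
        "invoice_date": {"text": "April 21,2023", "value": "2023-04-21"},
    },
    "tables": {
        "line_items": [
            {"description": {"text": "Widget", "value": "Widget"}},
        ],
    },
}
print(header_values(example_document))
print(table_rows(example_document, "line_items"))
```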
