Parble

Get started with Parble

Transform the way you process documents and extract valuable data with advanced Intelligent Document Processing.

JSON output schema

Each file sent to Parble is processed and parsed into various pieces of information. All of this information is then packaged into a single JSON response and sent back to the user. Whatever the size of the uploaded file, reading and understanding these JSON responses can be tedious.

This page explains the structure of our JSON responses: a dummy JSON response structure, followed by a description of each of its elements.

JSON response structure

Elements explained

  • ID

    The ID is the unique identifier of the file.

    It is formatted as a string of 24 alphanumeric characters, e.g.: 1234a56b789c0de123456fg7
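    As a quick sanity check before storing an ID or building a request with it, a client can verify that the value matches the format described above. The following Python sketch is illustrative only: the function name is hypothetical, and the check encodes nothing beyond the 24-alphanumeric-character format stated on this page.

```python
import re

# Exactly 24 ASCII letters or digits, per the format described above.
ID_PATTERN = re.compile(r"[0-9A-Za-z]{24}")


def is_valid_file_id(file_id: str) -> bool:
    """Return True if file_id has the shape of a Parble file ID (hypothetical helper)."""
    return ID_PATTERN.fullmatch(file_id) is not None


print(is_valid_file_id("1234a56b789c0de123456fg7"))  # True
print(is_valid_file_id("1234-a56b"))  # False
```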

  • Timings

    Timing information for the whole file.

    Upload time

    The upload time refers to the timestamp of when the machine started processing the file.

    It is formatted as an ISO 8601 timestamp (YYYY-MM-DDTHH:MM:SS.ffffff, where ffffff is the microseconds), e.g.: 2021-10-12T15:48:09.688000

    Done time

    The done time refers to the timestamp of when the machine finished processing the file.

    It is formatted as an ISO 8601 timestamp (YYYY-MM-DDTHH:MM:SS.ffffff, where ffffff is the microseconds), e.g.: 2021-10-12T15:48:09.688000
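    Because both timestamps use the same format, the processing time of a file can be computed from their difference. The following Python sketch assumes the two values have already been read from the response as strings (the exact key names are not shown on this page), and the done time in the usage line is a made-up value. The format carries no time-zone offset, so both values are parsed as naive datetimes.

```python
from datetime import datetime


def processing_seconds(upload_time: str, done_time: str) -> float:
    """Return the seconds elapsed between the upload time and the done time."""
    # fromisoformat parses YYYY-MM-DDTHH:MM:SS.ffffff (Python 3.7+); with no
    # offset in the string, both values become naive datetimes.
    start = datetime.fromisoformat(upload_time)
    end = datetime.fromisoformat(done_time)
    return (end - start).total_seconds()


# The done time here is a made-up value, 2.5 seconds after the upload time.
print(processing_seconds("2021-10-12T15:48:09.688000", "2021-10-12T15:48:12.188000"))  # 2.5
```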

  • Filename

    The original file name, including the extension.

    It is formatted as a string of characters, e.g.: invoice.eml

  • Automated

    Indicates whether all the predictions within the file are automated, i.e. do not need human review.

    It is formatted as a boolean value (true meaning automated; false meaning not automated), e.g.: true

  • Number of pages

    The total number of pages in the file, including all attachments.

    It is formatted as an integer (a count, so at least 1), e.g.: 4

  • Documents

    An array containing the information on all the documents recognized within the file.

    For example, a file could be an email whose documents are the email body and the attached PDF files. Or the file could be a single PDF containing 2 separate receipts, which are then recognized as 2 documents.

    Filename

    The original file name, including the extension.

    It is formatted as a string of characters, e.g.: invoice.pdf

    Automated

    Indicates whether all the predictions within the document are automated (i.e. do not need human review), including the predictions of the document classification, the header fields and the table items.

    It is formatted as a boolean value (true meaning automated; false meaning not automated), e.g.: true
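    The Automated flags make it easy to route only the documents that need attention to a human reviewer. The following Python sketch lists the file names of the documents whose Automated flag is false. The key names (documents, filename, automated) and the example response are assumptions for illustration, not the documented spelling of Parble's keys.

```python
from typing import Any, Dict, List

# Hypothetical key names: the real response may spell them differently.
DOCUMENTS_KEY = "documents"
FILENAME_KEY = "filename"
AUTOMATED_KEY = "automated"


def documents_needing_review(response: Dict[str, Any]) -> List[str]:
    """Return the file names of documents whose Automated flag is false."""
    return [
        doc[FILENAME_KEY]
        for doc in response.get(DOCUMENTS_KEY, [])
        if not doc[AUTOMATED_KEY]
    ]


# Made-up response: an email whose body is automated but whose attachment is not.
example_response = {
    "automated": False,
    "documents": [
        {"filename": "invoice.eml", "automated": True},
        {"filename": "invoice.pdf", "automated": False},
    ],
}

print(documents_needing_review(example_response))  # ['invoice.pdf']
```

    Since the file-level Automated flag covers all the predictions in the file, this list should be empty whenever that flag is true.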

    Classification

    Full information on the classification of the document.

    See the Document classification section.

    Header fields

    Full information on the predictions made for each header field. The keys of this object are the technical names of the header fields, and each key (header field) holds the information related to that field.

    See the Field information section.

    Tables

    The container of information about all the tables recognized within the document. This is an object whose keys are the table names, e.g.: TaxTotal or line_items

    See the Field information section.

  • Document classification

    Contains the full information about the classification of the document: the predicted document type, the confidence of the prediction, and the first and last pages of the document within the file.

    Automated

    A boolean indicating whether the classification of the document is automated (does not need human review). E.g.: true

    Document type

    A string with the predicted document type. E.g.: invoice

    Confidence

    The confidence level of the document type prediction, a value between 0 and 100. E.g.: 95

    Start page

    The zero-based number of the first page of the document within the file. E.g.: 0

    End page

    The zero-based number of the last page of the document within the file. E.g.: 2
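    Since the start and end pages are zero-based and the end page is the last page of the document, a document with a start page of 0 and an end page of 2 covers the first three pages of the file. The following Python sketch converts the two values into 1-based page numbers and a page count; the function name is hypothetical.

```python
from typing import Tuple


def page_range(start_page: int, end_page: int) -> Tuple[int, int, int]:
    """Convert zero-based, inclusive start/end pages to 1-based numbers and a count."""
    if end_page < start_page:
        raise ValueError("end page must not come before start page")
    first = start_page + 1  # 1-based number of the document's first page
    last = end_page + 1  # 1-based number of the document's last page
    return first, last, last - first + 1


print(page_range(0, 2))  # (1, 3, 3)
```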

  • Field information

    For each detected field, the object contains the following pieces of information: automated, confidence, page, coordinates, text, and value.

    Automated

    A boolean indicating whether the prediction of the field is automated (does not need human review). E.g.: true

    Confidence

    The confidence level of the field prediction, as a percentage (0 to 100). E.g.: 95

    Page

    The zero-based number of the page on which the field was detected. E.g.: 1

    Coordinates

    An array of numbers containing the relative coordinates of the rectangle in which the field was detected, in the form [x1, y1, x2, y2]. If no coordinates are associated with the field, it defaults to [0, -1, 0, -1]. E.g.: [0.2724, 0.1374, 0.3505, 0.1465]

    Text

    The string extracted exactly as read from the document. E.g.: April 21,2023

    Value

    The extracted value converted to the proper data type (number, boolean, string, date, etc.). E.g.: 2023-04-21
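    To highlight a field on a rendered page, its relative coordinates have to be scaled to the pixel size of that page. The following Python sketch assumes that x values are fractions of the page width and y values are fractions of the page height, measured from the same corner as the rendered image; this page does not state the origin, so check it against your own rendering. The function name and the 2480 by 3508 page size (an A4 page at 300 DPI) are illustrative.

```python
from typing import Optional, Sequence, Tuple

# Default value for a field that has no associated coordinates.
NO_COORDINATES = [0, -1, 0, -1]


def to_pixel_box(
    coords: Sequence[float], width: int, height: int
) -> Optional[Tuple[int, int, int, int]]:
    """Scale relative [x1, y1, x2, y2] coordinates to pixel coordinates.

    Assumes x values are fractions of the page width and y values fractions of
    the page height. Returns None for fields without coordinates.
    """
    if list(coords) == NO_COORDINATES:
        return None
    x1, y1, x2, y2 = coords
    return (round(x1 * width), round(y1 * height), round(x2 * width), round(y2 * height))


# Hypothetical page size: an A4 page rendered at 300 DPI.
print(to_pixel_box([0.2724, 0.1374, 0.3505, 0.1465], 2480, 3508))  # (676, 482, 869, 514)
print(to_pixel_box([0, -1, 0, -1], 2480, 3508))  # None
```

    Returning None for the default [0, -1, 0, -1] lets callers skip drawing fields that have no location instead of drawing a degenerate box.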