The AI reads document files such as PDF, PNG, and JPG and converts their contents into text.
Beta version: The platform is still under development and may be less stable than a final release. Access and performance might be limited: the platform might crash, some features might not work properly, or some data might be lost.
The OCR model consists of two main components: a text detector and a text recognizer. The text detector locates text in an image; it is trained on a variety of documents annotated with bounding boxes that mark localized text areas. The text recognizer then transcribes each localized text area into actual text characters. The recognizer is an encoder-decoder Transformer model trained with a standard cross-entropy loss. We use our internal synthetic data libraries with various data augmentations applied to approximate the distributions of real-world documents.
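The two-stage flow above can be sketched as follows. This is a minimal illustration of the detect-then-recognize pipeline, not the actual model code; `detect_text` and `recognize_text` are hypothetical placeholders standing in for the trained detector and recognizer.

```python
from dataclasses import dataclass

@dataclass
class TextRegion:
    # bbox is four (x, y) corner points: top-left, top-right,
    # bottom-right, bottom-left.
    bbox: list
    text: str

def detect_text(image):
    # Placeholder for the text detector: in the real model this predicts
    # a quadrilateral box around every localized text area in the image.
    return [[(0, 0), (100, 0), (100, 20), (0, 20)]]

def recognize_text(image, bbox):
    # Placeholder for the encoder-decoder Transformer recognizer: in the
    # real model this transcribes the cropped region into characters.
    return "sample text"

def run_ocr(image):
    # Two-stage pipeline: detect text areas first, then transcribe each one.
    return [TextRegion(bbox, recognize_text(image, bbox))
            for bbox in detect_text(image)]
```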
We use CER (Character Error Rate) to evaluate the OCR General Document model. Generally, a lower CER means better performance.
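CER is commonly computed as the character-level edit distance (insertions, deletions, and substitutions) between the recognized text and the reference text, divided by the length of the reference. A minimal sketch of that standard definition:

```python
def levenshtein(a: str, b: str) -> int:
    # Character-level edit distance via dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    # Character Error Rate: edit distance normalized by reference length.
    return levenshtein(reference, hypothesis) / len(reference)
```

For example, recognizing "hallo" against the reference "hello" is one substitution out of five characters, giving a CER of 0.2.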
The model receives raw images as input. The API accepts an HTTP POST request with a multipart/form-data body containing the file name and file path. Input files must be in PNG, JPEG, JPG, or PDF format. Users can upload multiple files in one request, along with the optional parameter described below; this parameter can significantly affect the model's performance. Please note that the ACP OCR General Document allows a maximum of 15 pages per inference request.
{
    "box_threshold": box_threshold  # float (Optional)
}
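A multipart/form-data POST of this shape might be assembled as below with the `requests` library. The endpoint URL and the API-key header name are hypothetical placeholders; substitute the actual ACP OCR General Document endpoint and your own credentials.

```python
import requests  # third-party HTTP client

# Hypothetical endpoint; replace with the actual service URL.
URL = "https://api.example.com/ocr/general-document"

def build_request(named_files, box_threshold=None):
    # named_files: list of (filename, file_bytes) pairs in PNG, JPEG,
    # JPG, or PDF format; at most 15 pages in total per request.
    files = [("file", (name, content)) for name, content in named_files]
    data = {} if box_threshold is None else {"box_threshold": box_threshold}
    req = requests.Request("POST", URL, files=files, data=data,
                           headers={"Apikey": "<your-api-key>"})
    return req.prepare()

# Sending the prepared request:
# response = requests.Session().send(build_request([...], box_threshold=0.5))
```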
The inputs can be raw images or document files. However, users can apply the preprocessing methods described below to improve the model's performance:
The output of the OCR General Document model is the extracted text together with its bounding boxes.
The API response would be in the following JSON format:
[
    {
        "filename": <file name>,
        "status": <status>,
        "results": [
            {
                "page": <page number>,
                "data": [
                    {
                        "bbox": [
                            <the top left coordinates (x, y)>,
                            <the top right coordinates (x, y)>,
                            <the bottom right coordinates (x, y)>,
                            <the bottom left coordinates (x, y)>
                        ],
                        "text": <Message>
                    }
                ],
                "fullText": <full message>
            },
            {
                ....
            }
        ]
    }
]
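A response of this shape can be walked with the standard `json` module. The field values in the sample below (file name, status string, coordinates, text) are illustrative only; check your API's actual status conventions.

```python
import json

# Illustrative response matching the documented format.
raw = '''[
  {
    "filename": "invoice.pdf",
    "status": "success",
    "results": [
      {
        "page": 1,
        "data": [
          {"bbox": [[10, 12], [180, 12], [180, 40], [10, 40]],
           "text": "Invoice No. 1234"}
        ],
        "fullText": "Invoice No. 1234"
      }
    ]
  }
]'''

def extract_pages(response_text):
    # Flatten the nested response into (filename, page, fullText) tuples.
    out = []
    for doc in json.loads(response_text):
        for page in doc.get("results", []):
            out.append((doc["filename"], page["page"], page["fullText"]))
    return out

for filename, page, text in extract_pages(raw):
    print(f"{filename} p.{page}: {text}")
```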