The AI reads images of receipts and converts them into text.
Beta version: The platform is still under development and may be less stable than the final release. Access and performance may be limited. For example, the platform might crash, some features might not work properly, or some data might be lost.
The OCR receipt model consists of two components, an OCR engine and a receipt parser, which work independently. The input is a receipt image, which is first fed to the OCR engine to transcribe the text and its bounding boxes. The bounding boxes and text are then passed to the receipt parser, which predicts the required field values from the transcribed text.
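The two-stage flow described above can be sketched as follows. This is only an illustration of how the stages connect, not the actual implementation: the `OcrLine` type, the bounding-box layout, `read_receipt`, and the two stand-in functions are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class OcrLine:
    """One transcribed text line and its bounding box (hypothetical layout)."""
    text: str
    box: Tuple[int, int, int, int]  # assumed order: x_min, y_min, x_max, y_max


# Each stage is an independent callable, mirroring the two components.
OcrEngine = Callable[[bytes], List[OcrLine]]
ReceiptParser = Callable[[List[OcrLine]], Dict[str, str]]


def read_receipt(image: bytes, ocr_engine: OcrEngine,
                 parser: ReceiptParser) -> Dict[str, str]:
    """Run stage 1 (OCR), then feed its text and boxes to stage 2 (parser)."""
    lines = ocr_engine(image)
    return parser(lines)


def fake_ocr_engine(image: bytes) -> List[OcrLine]:
    """Stand-in for the real OCR engine: returns fixed lines."""
    return [
        OcrLine("19:00", (10, 20, 60, 40)),
        OcrLine("TOTAL 1234.00", (10, 200, 180, 220)),
    ]


def fake_parser(lines: List[OcrLine]) -> Dict[str, str]:
    """Stand-in for the real parser: picks the amount on the TOTAL line."""
    fields: Dict[str, str] = {}
    for line in lines:
        if line.text.startswith("TOTAL"):
            fields["grand_total"] = line.text.split()[-1]
    return fields


result = read_receipt(b"", fake_ocr_engine, fake_parser)
```

Because the parser only sees the OCR output, either stage can be replaced or tested on its own.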
Although the system can read a clear, well-aligned receipt without issue, under some conditions the model may fail to predict receipt fields. In other words, the model's capability is limited to images that a human can read clearly. Receipt images that are hard for a human to read are very likely to result in inaccurate predictions. Additionally, the current version supports only five fields: grand total, date, time, shop tax ID, and receipt ID.
The model is evaluated using accuracy, which measures the correctness of the prediction for each field. For each field, a prediction is counted as correct ONLY IF the prediction and the label for that field match EXACTLY.
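The exact-match rule can be illustrated with a minimal sketch. The function name `field_accuracy` and the sample data are hypothetical, and the sketch assumes predictions and labels are compared as raw strings with no normalization, which the description above implies but does not spell out.

```python
from typing import Dict, List


def field_accuracy(predictions: List[Dict[str, str]],
                   labels: List[Dict[str, str]],
                   fields: List[str]) -> Dict[str, float]:
    """Per-field accuracy: a prediction is correct only on an exact match."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one labeled receipt is required")

    accuracy: Dict[str, float] = {}
    for field in fields:
        # Note: a field missing from both prediction and label also counts
        # as a match here (None == None); this is an assumption.
        correct = sum(
            1 for pred, label in zip(predictions, labels)
            if pred.get(field) == label.get(field)
        )
        accuracy[field] = correct / len(labels)
    return accuracy


preds = [
    {"grand_total": "1234.0", "receipt_time": "19:00"},
    {"grand_total": "50.0", "receipt_time": "08:30"},
]
golds = [
    {"grand_total": "1234.0", "receipt_time": "19:00"},
    {"grand_total": "50.00", "receipt_time": "08:30"},
]
scores = field_accuracy(preds, golds, ["grand_total", "receipt_time"])
```

In this sample, "50.0" and "50.00" are numerically equal but count as a miss under exact matching, so `grand_total` scores lower than `receipt_time`.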
Send the input image as multipart form data under the key name “files”. The following Python snippet shows an example request to the API:
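import os

import requests

def predict(url: str, img_path: str) -> dict:
    # Open the image in binary mode; the file is closed once the request is sent.
    with open(img_path, 'rb') as f:
        # Multipart form data under the key "files": (filename, file object, MIME type).
        files = [
            ('files', (os.path.basename(img_path), f, 'image/jpeg'))
        ]
        response = requests.post(url, files=files)
    return response.json()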
When using the receipt parsing model, we recommend that integrators check whether:
When these conditions are met, the model operates within its intended environment and is expected to perform at its best.
The output is a structured JSON containing 6 main parts:
An example output is shown below.
[
  {
    "shop_tax_id": "0101111000111",
    "grand_total": "1234.0",
    "receipt_id": "E000000000A1234",
    "receipt_date": "31-12-2566",
    "receipt_time": "19:00",
    "receipt_timestamp": "2023-12-31T19:00:00+00:00"
  }
]
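Since every value in the example output is a string, a client will usually want to convert some of them to typed values. The sketch below is one possible way to do that, assuming that `grand_total` is always a decimal string and `receipt_timestamp` is always ISO 8601, as in the example; the helper name `summarize` is hypothetical, and the handling of missing or empty fields is not covered because the example does not show it.

```python
import json
from datetime import datetime
from decimal import Decimal

# Sample response body, copied from the field names and values shown above.
sample_body = """
[
  {
    "shop_tax_id": "0101111000111",
    "grand_total": "1234.0",
    "receipt_id": "E000000000A1234",
    "receipt_date": "31-12-2566",
    "receipt_time": "19:00",
    "receipt_timestamp": "2023-12-31T19:00:00+00:00"
  }
]
"""


def summarize(receipt: dict) -> dict:
    """Convert string fields to typed values (assumes both fields are present)."""
    return {
        "receipt_id": receipt["receipt_id"],
        # Decimal avoids floating-point rounding on monetary amounts.
        "total": Decimal(receipt["grand_total"]),
        # fromisoformat parses the "+00:00" offset into a timezone-aware datetime.
        "timestamp": datetime.fromisoformat(receipt["receipt_timestamp"]),
    }


receipts = json.loads(sample_body)       # the response is a list of receipts
summary = summarize(receipts[0])
```

Note that in the example `receipt_date` carries the year 2566 while `receipt_timestamp` carries 2023, which suggests the date is kept as printed on the receipt (Buddhist Era) and the timestamp is normalized to the Gregorian calendar. That reading is an inference from the single example, so this sketch relies on `receipt_timestamp` only.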