The AI reads ID card images and converts them into structured Thai and English data.
Beta version: The platform is in the development stage before the final version and may be less stable than usual. Platform access and usage efficiency might be limited. For example, the platform might crash, some features might not work properly, or some data might be lost.
The ID card extraction model is similar to a conventional OCR pipeline, which includes a detector and a recognizer. For ID card extraction, however, an additional post-processing model is required. Overall, the input image is sent to the detector, the recognizer, and the post-processing model, in that order, resulting in key-value pairs of the information present on the ID card, returned in JSON format.
Detector: the detector is the first model in the pipeline. Its goal is to identify the text regions present in the given input image, so its output is a set of bounding boxes covering the text areas. Each bounding box is represented as four coordinates, one for each corner of the box, as illustrated below.
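For illustration only, a single detected text region might look like the following; the exact coordinate order and units are assumptions, since the API does not expose raw detector output:
```
# Hypothetical detector output for one text region: four (x, y) corner
# points in pixel coordinates (the corner order shown is an assumption).
bounding_box = [
    (112, 40),   # top-left
    (305, 40),   # top-right
    (305, 68),   # bottom-right
    (112, 68),   # bottom-left
]
```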
Recognizer: after the detector identifies the text regions in the form of bounding boxes, the recognizer reads the images cropped from these boxes and transcribes them into computer-readable text. At this stage, the model knows where the text is in the input image (from the detector) and what text resides at each location (from the recognizer). However, although all the text in the input ID card image has been extracted, we still can't tell which text in each box corresponds to which field (name, address, date of birth, etc.).
Post-processing: finally, the transcribed text is passed to the post-processing model, which predicts which field each text from the recognizer belongs to. For instance, if the recognized text is "Mr. John", the model should classify it as "English first name". In the end, the model outputs the necessary fields present on the ID card.
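To make the pipeline concrete, here is a minimal sketch of the three stages chained together. The `detect`, `recognize`, and `classify_field` callables are hypothetical stand-ins for the internal models, not part of the public API:
```
from typing import Callable, Dict, List, Tuple

Box = List[Tuple[int, int]]  # a text region as four (x, y) corner points


def extract_id_card(
    image,
    detect: Callable,          # detector model: image -> List[Box]
    recognize: Callable,       # recognizer model: (image, Box) -> str
    classify_field: Callable,  # post-processing model: (str, Box) -> field name
) -> Dict[str, str]:
    """Sketch of the extraction pipeline (not the actual implementation)."""
    # 1. Detector: find every text region in the image.
    boxes: List[Box] = detect(image)
    fields: Dict[str, str] = {}
    for box in boxes:
        # 2. Recognizer: transcribe the text cropped from the box.
        text = recognize(image, box)
        # 3. Post-processing: decide which field the text belongs to,
        #    e.g. "Mr. John" -> "en_fname".
        fields[classify_field(text, box)] = text
    # Key-value pairs, ready to be serialized as JSON.
    return fields
```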
Although the model is robust to some extent, there are several constraints on the input image that must be avoided:
The rule of thumb for any machine learning model is to never provide inputs that are illegible to humans. If your image contains so many scratches, or reflections or shadows so strong, that a human cannot read it, the model will perform poorly.
Currently, the model can only extract one card at a time. If there is more than one ID card in an image, the results can be wrong, so we encourage users not to submit images that contain more than one card.
The current version of the model only supports the ID card template shown in the figure below. If your ID card is in an older format, the model may return poor results.
We use an edit distance metric to evaluate the OCR ID card extraction task. Edit distance counts the character edits required to change one string (the prediction) into another (the label); a lower edit distance generally means better performance. A sketch of this metric appears after the list below.
For reporting the model performance, we use:
Per-field average edit distance
Average edit distance computed over all extracted fields
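To make the metric concrete, below is a minimal sketch of a standard Levenshtein edit distance; this is textbook dynamic programming, not VISAI's evaluation code:
```
def edit_distance(prediction: str, label: str) -> int:
    """Count character insertions, deletions, and substitutions needed
    to turn `prediction` into `label` (standard Levenshtein distance)."""
    m, n = len(prediction), len(label)
    # dp[i][j] = edit distance between prediction[:i] and label[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if prediction[i - 1] == label[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dp[m][n]


edit_distance("Mr. Jonh", "Mr. John")  # -> 2
```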
The API endpoint receives an HTTP POST request in multipart form data format. The returned output is a JSON object specifying each piece of extracted information.
The input should be images in JPEG, JPG, or PNG format, sent to the API as multipart form data. Users can also specify whether each image is the front or back side of the ID card; the number of specified sides must strictly match the number of images sent. If the user does not specify the sides, all images default to the front side. A code example for Python can be found below:
```
from typing import Any, Dict, List

import requests

# Base URL of the VISAI IDCARD API (fill in the URL you were given).
BASE_URL = "..."


def extract_id_visai(img_paths: List[str], sides: List[str]) -> Dict[str, Any]:
    """Use VISAI IDCARD API to extract information from given ID cards

    Arguments:
        img_paths: List[str]
            Paths to the image files to extract
        sides: List[str]
            Side of each image. Can be either "front" or "back" only
    """
    assert len(img_paths) == len(sides), "Number of sides and image paths should be equal"
    files = [
        ("files", ("file", open(_img, "rb"), "image/png"))
        for _img in img_paths
    ]
    # sides are joined into one string using "," as a delimiter
    data = {"side": ",".join(sides)}
    r = requests.post(f"{BASE_URL}/predict", files=files, data=data)
    # close all opened files
    for _, (_, f, _) in files:
        f.close()
    return r.json()


idcard_imgs = ["/path/to/image1.jpg", "/path/to/image2.png"]
results = extract_id_visai(idcard_imgs, ["front", "back"])
```
It is recommended that the input image have the following properties for the best results:
Clear and not blurred; all text is clearly readable to humans.
Minimal scratch presence. If scratches are present, they shouldn't be thick enough to cover any text on the ID card.
Make sure that no shadow or reflection is present on the card. If unavoidable, make sure the shadow/reflection doesn't cover any text on the ID card.
Although we handle misoriented input images, we still recommend not using images rotated by more than 30 degrees.
Avoid having any text or signatures overlapping the text on the card, as overlapping text can confuse the model.
Make sure that the image contains only one ID card. This is very important.
The ID card should cover a large portion of the input image; the model performs better when the input image is mostly covered by the ID card.
For the front side, the JSON contains 5 main keys:
fields - The dictionary containing the key-value pairs of each field on the ID card. The keys include:
idnum - ID card number
th_name - Thai first name and last name
en_fname - English first name
en_lname - English last name
religion - Religion
address_1 - First line of address
address_2 - Second line of address
dob_{th,en} - Date of birth in TH/EN
doi_{th,en} - Date of issuance in TH/EN
doe_{th,en} - Date of expiry in TH/EN
serial_id - Serial ID of the ID card, displayed under the face section on the card
postprocess_fields - The dictionary containing additional information obtained from post-processing for further use. It contains 2 sub-fields:
address - contains the post-processed information about the address on the ID card. This field contains the following keys:
district - District name in Thai
subdistrict - Subdistrict name in Thai
province - Province name in Thai
road - Road name, if exists
soi - Soi name, if exists
moo - Moo number, if exists
homenum - Home number
subdistrict_en - Subdistrict name in English
subdistrict_id - Official subdistrict ID
district_en - District name in English
district_id - Official district ID
province_en - Province name in English
province_id - Official province ID
postal_code - Postal code of the specified address
person - contains the post-processed information about the person on the ID card. This field also contains the following keys:
title_th - Title of Thai name
title_en - Title of English name
th_fname - First name in Thai
th_lname - Last name in Thai
en_fname - First name in English
en_lname - Last name in English
gender - Biological gender according to the title
face - The base64 string of the extracted face on the ID Card
rotation - the rotation in degrees required to straighten the image, in case the image isn't oriented at 0 degrees
process_time - time in seconds the model required to process the request. (This time excludes the time required for the API request to be sent/received.)
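As a usage sketch, the fields above can be read from the parsed response like this, continuing from the `extract_id_visai` helper defined earlier. The exact nesting for multi-image requests and the face image encoding are assumptions:
```
import base64

# Assuming a single front-side image was sent and `result` is its parsed
# JSON response (the nesting for multi-image responses may differ).
result = extract_id_visai(["/path/to/image1.jpg"], ["front"])

idnum = result["fields"]["idnum"]                             # ID card number
en_name = result["postprocess_fields"]["person"]["en_fname"]  # English first name
province = result["postprocess_fields"]["address"]["province"]

# The extracted face is a base64 string; decode it back to image bytes.
# (The .jpg extension is an assumption about the image encoding.)
with open("face.jpg", "wb") as fp:
    fp.write(base64.b64decode(result["face"]))

print(f"Processed in {result['process_time']}s, rotation: {result['rotation']} degrees")
```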