Example for SDK Pre-annotation of an OCR Task

W.P_McNeill · January 23, 2025, 6:06pm

I am trying to use the Python SDK to create an OCR project. I have images of document pages, and I would like to associate labels (e.g. “header” and “paragraph”) with rectangular regions on the page images. (I would also like to annotate these regions with the text they contain, but let’s focus on the region labels for the moment.)

I can create a page-image labeling project with a labeling interface that looks like this:

<View>
    <Image name='image' value='$image'/>
    <RectangleLabels name='label' toName='image'>
        <Label value='header' background='red'/>
        <Label value='paragraph' background='green'/>
    </RectangleLabels>
</View>
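Because the `from_name` and `to_name` of every pre-annotated region have to match the `name` and `toName` attributes in this config, and the task's `data` key has to match the `$image` variable, it can help to read those values straight out of the XML instead of hard-coding them. A minimal standard-library sketch; `rectangle_control_info` is a hypothetical helper, not part of the Label Studio SDK.

```python
import xml.etree.ElementTree as ET

LABEL_CONFIG = """
<View>
    <Image name='image' value='$image'/>
    <RectangleLabels name='label' toName='image'>
        <Label value='header' background='red'/>
        <Label value='paragraph' background='green'/>
    </RectangleLabels>
</View>
"""


def rectangle_control_info(label_config: str) -> dict:
    """Hypothetical helper: collect the names a rectanglelabels region must reference."""
    root = ET.fromstring(label_config.strip())
    control = root.find(".//RectangleLabels")
    if control is None:
        raise ValueError("config has no <RectangleLabels> tag")
    to_name = control.get("toName")
    # Find the <Image> tag the control points at; its value ('$image') names the data key.
    image = next((el for el in root.iter("Image") if el.get("name") == to_name), None)
    if image is None:
        raise ValueError(f"no <Image name='{to_name}'> in config")
    return {
        "from_name": control.get("name"),
        "to_name": to_name,
        "data_key": image.get("value", "").lstrip("$"),
        "labels": [lab.get("value") for lab in control.findall("Label")],
    }
```

For the config above this yields `from_name` "label", `to_name` "image", data key "image", and the labels "header" and "paragraph".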

Now suppose I start off with hypotheses about where some of the labeled regions are (maybe produced by a machine learning model). I want to create the project with these hypotheses included so that Label Studio annotators can approve, delete, or modify them. How do I do this?

I’ve tried a bunch of things, but here is the most successful so far. In Python I create a list of dictionaries that look like this:

{
    "data": image_url,
    "predictions": [
        {
            "original_width": 612,
            "original_height": 792,
            "image_rotation": 0,
            "value": {
                "x": 33.702920016927635,
                "y": 11.798561151079138,
                "width": 35.19255184088024,
                "height": 3.4532374100719405,
                "rotation": 0,
                "rectanglelabels": [
                    "header"
                ]
            },
            "id": "12345",
            "from_name": "label",
            "to_name": "image",
            "type": "rectanglelabels"
        }
    ]
}
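For comparison, here is a sketch of the task layout described in Label Studio's "Import pre-annotated data" documentation, as I understand it: `data` is a dict whose key matches the `$image` variable in the config, and each prediction wraps its regions in a `result` list (optionally next to `model_version`). Region coordinates are percentages of the original image size. The helper names, the `model_version` string, and the pixel-box input format are assumptions for illustration.

```python
import uuid


def pixel_box_to_region(x, y, w, h, label, img_w, img_h,
                        from_name="label", to_name="image"):
    """Convert a pixel-space box (top-left x, y, width, height) into one
    rectanglelabels region; x/y/width/height are stored as percentages."""
    return {
        "id": uuid.uuid4().hex[:10],  # a unique string per region
        "from_name": from_name,
        "to_name": to_name,
        "type": "rectanglelabels",
        "original_width": img_w,
        "original_height": img_h,
        "image_rotation": 0,
        "value": {
            "x": 100.0 * x / img_w,
            "y": 100.0 * y / img_h,
            "width": 100.0 * w / img_w,
            "height": 100.0 * h / img_h,
            "rotation": 0,
            "rectanglelabels": [label],
        },
    }


def make_task(image_url, regions, model_version="my-model-v1"):
    """Hypothetical helper: wrap regions as one prediction on one task."""
    return {
        "data": {"image": image_url},  # key must match value='$image' in the config
        "predictions": [
            {
                "model_version": model_version,
                "result": regions,  # regions are nested here, not directly in predictions
            }
        ],
    }
```

A page of 612 × 792 pixels with a header box at pixel (206.25, 93.44) would then become `make_task(url, [pixel_box_to_region(206.25, 93.44, 215.38, 27.35, "header", 612, 792)])`.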

I load the tasks using the SDK like so:

project = label_studio_client.projects.create(**project_info)
tasks = [...] # List of dictionaries like the one above
label_studio_client.projects.import_tasks(project.id, request=tasks)
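One way to catch nesting mistakes before calling `import_tasks` is a small pre-flight check over the task list. This is a hypothetical helper, not an SDK function; it encodes the assumption that, per Label Studio's documented pre-annotation format, `data` must be a dict containing the image key and every prediction must hold its regions in a `result` list whose `from_name`/`to_name` match the labeling config.

```python
def task_problems(task, data_key="image", from_name="label", to_name="image"):
    """Return human-readable problems with one task dict (empty list if none found)."""
    problems = []
    data = task.get("data")
    if not isinstance(data, dict):
        problems.append("'data' should be a dict like {'image': url}, got "
                        f"{type(data).__name__}")
    elif data_key not in data:
        problems.append(f"'data' has no '{data_key}' key (config uses ${data_key})")
    for i, pred in enumerate(task.get("predictions", [])):
        result = pred.get("result")
        if not isinstance(result, list):
            problems.append(f"predictions[{i}] has no 'result' list; "
                            "regions must be nested inside it")
            continue
        for j, region in enumerate(result):
            if region.get("from_name") != from_name or region.get("to_name") != to_name:
                problems.append(f"predictions[{i}].result[{j}] from_name/to_name "
                                f"should be {from_name!r}/{to_name!r}")
    return problems
```

Running it over the dictionary shown earlier in this post reports two problems (a string `data` and regions placed directly in `predictions`); the idea is to call `import_tasks` only once every task comes back with an empty list.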

When I do this, none of the pre-annotations show up in the Label Studio UI. When I look at the source of the tasks in the Label Studio UI, I see the predictions list where I created it, plus a “predictions” key at the same level as the “data” key holding an empty list. Presumably I’m passing the arguments to import_tasks incorrectly.

I think I am following the “1. Import tasks in Label Studio JSON format” example in the import pre-annotations example notebook exactly, but apparently I am not.

I get the impression this should be easy, but I’m having a hard time understanding the documentation. There seem to be multiple ways of doing the same thing (e.g. label_studio_client.projects.import_tasks vs. label_studio_client.predictions.create, or whether I should be creating “annotations” or “predictions”), none of which work for me. I can’t find examples of what I want to do, so I’m doing a lot of unsuccessful trial and error and reverse engineering.
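On the `import_tasks` vs. `predictions.create` question: as far as I can tell they are alternatives rather than steps to combine. `import_tasks` can carry predictions inline in each new task, while `predictions.create` attaches a prediction to a task that already exists; Label Studio's docs describe editable pre-annotations as predictions rather than annotations. A sketch of the second route, assuming the 1.x SDK accepts `client.predictions.create(task=..., result=..., model_version=...)`; `attach_predictions` is a hypothetical name, and the client is passed in so the helper can be tried with a stand-in object.

```python
from typing import Any, Dict, List


def attach_predictions(client: Any,
                       regions_by_task: Dict[int, List[dict]],
                       model_version: str = "my-model-v1") -> int:
    """Create one prediction per existing task id; return how many were created."""
    created = 0
    for task_id, regions in regions_by_task.items():
        if not regions:
            continue  # nothing to pre-annotate on this page
        # Assumed SDK 1.x keywords; `result` is the same list of region dicts
        # that would sit under predictions[0]["result"] in an imported task.
        client.predictions.create(task=task_id, result=regions,
                                  model_version=model_version)
        created += 1
    return created
```

With the real SDK this would be called as `attach_predictions(label_studio_client, {task_id: regions, ...})`, using ids of tasks already imported into the project.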

Can someone give me a simple example of the correct Python SDK call with arguments that I can copy?
