Label Studio not presenting the labels and OCR text

Hi there,

I’m new to Label Studio and have been assigned a project to extract data from invoices with different layouts using LayoutLM.
Since the invoices contain thousands of words and have to be labeled in BIO format, meaning every single word needs a label, I wanted to speed up the process.

I created a project in Label Studio Community with a storage that provides the images, and deployed an ML backend that runs the OCR and labels every word as O. I will then correct the entities I want to extract.

I’m running into a strange issue. When I call predict from Label Studio, the backend returns a 200 and a prediction shows up in its own tab, but the ocr_text is not passed to the frontend and the labels are not shown either.
Here is the part of my backend code that builds the response to the request.

predictions.append({
                    "id": str(id_counter),
                    "model_version": "ocr-bbox-initial-o-v1",
                    "type": "rectanglelabels",
                    "from_name": "bbox", # Assuming your RectangleLabels name is "bbox"
                    "to_name": "image",
                    "original_width": img_width,
                    "original_height": img_height,
                    "image_rotation": 0,
                    "value": {
                        "rotation": 0,
                        "x": x_normalized,
                        "y": y_normalized,
                        "width": w_normalized,
                        "height": h_normalized,
                        "rectanglelabels": ["O"]
                    }
                })
                id_counter += 1

        response = {
            "data": {"image": task["data"].get("image"),
                     "ocr_text": ocr_text},
            "results": [
                {
                    "model_version": "ocr-bbox-initial-o-v1",
                    "result": predictions
                }
            ]
        }

Here is the config I have in the project

<View>
  <Image name="image" value="$image" zoom="true" zoomControl="false"
         rotateControl="true" width="100%" height="100%"
         maxHeight="auto" maxWidth="auto"/>

  <TextArea name="ocr_text" toName="image" value="$ocr_text"
            editable="true"
            perRegion="false"
            required="false"
            maxSubmissions="1"
            rows="5"
            placeholder="Recognized Text"
            selectable="true"
            granularity="word"
            useModel="true" />

  <Labels name="ner" toName="ocr_text" useModel="true">
    <Label value="B-NIF" background="#ff5733"/>
    <Label value="B-FORN" background="#33ff57"/>
    <Label value="B-DATA" background="#3375ff"/>
    <Label value="B-VALOR" background="#ff33a1"/>
    <Label value="B-NUMFAT" background="#f3ff33"/>
    <Label value="I-NIF" background="#ff5733"/>
    <Label value="I-FORN" background="#33ff57"/>
    <Label value="I-DATA" background="#3375ff"/>
    <Label value="I-VALOR" background="#ff33a1"/>
    <Label value="I-NUMFAT" background="#f3ff33"/>
    <Label value="O" background="#aaa"/>
  </Labels>
</View>

Any advice here?
Thanks

Hi there!
You’re not getting the text back because you’re not passing it correctly in the prediction. Take a look at this example, which uses a different model; the part at the bottom where we upload the data to the frontend should still apply to you. Look especially at how we upload both the bbox and the text: each one is uploaded as a separate result.
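
As a rough sketch of what "each one as a separate result" could look like for a single OCR word: the helper below returns two items, one for the box and one for the text. The function name word_to_results and the tag names bbox, transcription and image are placeholders, not something taken from your project; whatever you use has to match the names in your labeling config. Wrapping the text in a list is also an assumption about what the TextArea result expects.

```python
def word_to_results(word, left, top, width, height, img_w, img_h, idx):
    """Build two separate result items for one OCR word (hypothetical helper)."""
    # Image coordinates are sent as percentages (0-100) of the image size.
    x = left / img_w * 100
    y = top / img_h * 100
    w = width / img_w * 100
    h = height / img_h * 100

    box_item = {
        "id": f"box{idx}",
        "type": "rectanglelabels",
        "from_name": "bbox",        # placeholder: name of the RectangleLabels tag
        "to_name": "image",         # placeholder: name of the Image tag
        "original_width": img_w,
        "original_height": img_h,
        "value": {
            "x": x, "y": y, "width": w, "height": h,
            "rotation": 0,
            "rectanglelabels": ["O"],
        },
    }
    text_item = {
        "id": f"text{idx}",
        "type": "textarea",
        "from_name": "transcription",  # placeholder: name of the TextArea tag
        "to_name": "image",
        "value": {"text": [word]},     # assumption: text is a list of strings
    }
    return [box_item, text_item]


items = word_to_results("Invoice", 50, 20, 100, 10, img_w=1000, img_h=500, idx=1)
for item in items:
    print(item["type"], item["from_name"], item["value"])
```

Both items would then go into the same result list of the prediction.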

Let me know if you need more help!

I’m trying to do this interactively. When I open an image for the first time, Label Studio calls predict for that image. It looks like you are doing it via the API. I’m triggering the backend when I open an image, and Label Studio is not complaining about the format of the JSON, but it is not doing anything with it either.

My full backend code is here:

import pytesseract
import json
from PIL import Image
import os
from flask import Flask, request, jsonify

class OCRNERModel:
    def setup(self):
        """Required setup endpoint for Label Studio"""
        print("ML Backend is set up successfully.")
        return jsonify({"status": "ok"})

    def predict(self, request_data, **kwargs):
        tasks = request_data.get("tasks", [])
        if not tasks:
            return jsonify({"error": "No tasks found in request."}), 400

        task = tasks[0]
        image_path = task["data"].get("image")

        # Convert relative URL to local path
        if image_path and image_path.startswith("/data/local-files/?d="):
            image_path = image_path.replace("/data/local-files/?d=", "/")

        if not image_path or not os.path.exists(image_path):
            return jsonify({"error": f"Image file not found: {image_path}"}), 404

        try:
            img = Image.open(image_path)
            ocr_data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
            ocr_text = pytesseract.image_to_string(img).strip()
        except Exception as e:
            return jsonify({"error": f"OCR extraction failed: {str(e)}"}), 500

        predictions = []
        id_counter = 1
        img_width, img_height = img.size
        for i in range(len(ocr_data['text'])):
            text = ocr_data['text'][i].strip()
            if text:
                x_pixel = ocr_data['left'][i]
                y_pixel = ocr_data['top'][i]
                w_pixel = ocr_data['width'][i]
                h_pixel = ocr_data['height'][i]

                x_normalized = (x_pixel / img_width) * 100
                y_normalized = (y_pixel / img_height) * 100
                w_normalized = (w_pixel / img_width) * 100
                h_normalized = (h_pixel / img_height) * 100

                predictions.append({
                    "id": str(id_counter),
                    "model_version": "ocr-bbox-initial-o-v1",
                    "type": "rectanglelabels",
                    "from_name": "bbox", # Assuming your RectangleLabels name is "bbox"
                    "to_name": "image",
                    "original_width": img_width,
                    "original_height": img_height,
                    "image_rotation": 0,
                    "value": {
                        "rotation": 0,
                        "x": x_normalized,
                        "y": y_normalized,
                        "width": w_normalized,
                        "height": h_normalized,
                        "rectanglelabels": ["O"]
                    }
                })
                id_counter += 1

        response = {
            "data": {"image": task["data"].get("image"),
                     "ocr_text": ocr_text},
            "results": [
                {
                    "model_version": "ocr-bbox-initial-o-v1",
                    "result": predictions
                }
            ]
        }

        print("Sending Response:", json.dumps(response, indent=2))
        return jsonify(response)

# Initialize Flask app and model instance
app = Flask(__name__)
model = OCRNERModel()

# Setup endpoint to configure the model backend
@app.route('/setup', methods=['POST'])
def setup():
    """ Required setup endpoint for Label Studio """
    return jsonify({"status": "ok"})

# Predict endpoint that Label Studio will call
@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    print(f"Received data: {json.dumps(data, indent=2)}")
    return model.predict(data) # Directly call model.predict and return its jsonify result

# Health check endpoint for Label Studio to verify the backend is up
@app.route('/health', methods=['GET'])
def health():
    """ Health check endpoint required by Label Studio """
    return jsonify({"status": "ok"})

# Run the Flask application
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8081)

Right, but the reason it’s not showing up is that it’s not in the JSON in the right way. You need to add the text to the results dictionary.
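
A minimal sketch of that shape, assuming the response layout already used in the code above (a top-level results list whose entries carry a result list). build_response is a made-up helper name and the tag names are placeholders. The point is only that the OCR text travels as an item inside result rather than under a separate data key.

```python
import json


def build_response(result_items, model_version="ocr-bbox-v1"):
    """Wrap the result items for one task in the envelope returned by /predict."""
    return {
        "results": [
            {
                "model_version": model_version,
                "result": list(result_items),
            }
        ]
    }


# The text is just another item in the result list.
text_item = {
    "id": "T1",
    "type": "textarea",
    "from_name": "transcription",   # placeholder: must match the TextArea name
    "to_name": "image",             # placeholder: must match the Image name
    "value": {"text": ["Invoice"]}, # assumption: text is a list of strings
}

response = build_response([text_item])
print(json.dumps(response, indent=2))
```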

I’m almost there. All the labels are now set to the O category, but the words are not being brought into the textarea.

import pytesseract
import json
from PIL import Image
import os
from flask import Flask, request, jsonify

class OCRNERModel:
    def setup(self):
        """Required setup endpoint for Label Studio"""
        print("ML Backend is set up successfully.")
        return jsonify({"status": "ok"})

    def predict(self, request_data, **kwargs):
        tasks = request_data.get("tasks", [])
        if not tasks:
            return jsonify({"error": "No tasks found in request."}), 400

        task = tasks[0]
        image_path = task["data"].get("image")

        # Convert relative URL to local path
        if image_path and image_path.startswith("/data/local-files/?d="):
            image_path = image_path.replace("/data/local-files/?d=", "/")

        if not image_path or not os.path.exists(image_path):
            return jsonify({"error": f"Image file not found: {image_path}"}), 404

        try:
            img = Image.open(image_path)
            ocr_data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
            ocr_text = pytesseract.image_to_string(img).strip()
        except Exception as e:
            return jsonify({"error": f"OCR extraction failed: {str(e)}"}), 500

        predictions = []
        id_counter = 1
        img_width, img_height = img.size
        for i in range(len(ocr_data['text'])):
            text = ocr_data['text'][i].strip()
            if text:
                x_pixel = ocr_data['left'][i]
                y_pixel = ocr_data['top'][i]
                w_pixel = ocr_data['width'][i]
                h_pixel = ocr_data['height'][i]

                x_normalized = (x_pixel / img_width) * 100
                y_normalized = (y_pixel / img_height) * 100
                w_normalized = (w_pixel / img_width) * 100
                h_normalized = (h_pixel / img_height) * 100

                predictions.append({
                    "id": str(id_counter),
                    "type": "rectanglelabels",
                    "from_name": "label",
                    "to_name": "image",
                      "value": {
                        "x": x_normalized,
                        "y": y_normalized,
                        "width": w_normalized,
                        "height": h_normalized,
                        "rotation": 0,
                        "rectanglelabels": ["O"],
                    },
                })

                predictions.append({
                    "id": f"T{id_counter}",
                    "type": "textarea",
                    "from_name": "transcription",
                    "to_name": "image",
                    "value": {
                        "x": x_normalized,
                        "y": y_normalized,
                        "width": w_normalized,
                        "height": h_normalized,
                        "rotation": 0,
                        "text": text,  
                    },
                    "score": 1.0  
                })

                id_counter += 1

        response = {
            "data": {
                "image": task["data"].get("image"),
                "ocr_text": ocr_text
            },
            "results": [
                {
                    "model_version": "ocr-bbox-v1",
                    "result": predictions
                }
            ]
        }

        print("Sending Response:", json.dumps(response, indent=2))
        return jsonify(response)

# Initialize Flask app and model instance
app = Flask(__name__)
model = OCRNERModel()

# Setup endpoint to configure the model backend
@app.route('/setup', methods=['POST'])
def setup():
    """ Required setup endpoint for Label Studio """
    return jsonify({"status": "ok"})

# Predict endpoint that Label Studio will call
@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    print(f"Received data: {json.dumps(data, indent=2)}")
    return model.predict(data) # Directly call model.predict and return its jsonify result

# Health check endpoint for Label Studio to verify the backend is up
@app.route('/health', methods=['GET'])
def health():
    """ Health check endpoint required by Label Studio """
    return jsonify({"status": "ok"})

# Run the Flask application
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8081)

What am I missing?

You’re close! When you add the prediction for the text area, it needs to match the TextArea tag, so you don’t need any of the bounding box information. It should look something like this:

predictions.append({
    "id": f"T{id_counter}",
    "type": "textarea",
    "from_name": "transcription",
    "to_name": "image",
    "value": {
        "text": text,
    },
    "score": 1.0
})
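
One way to check the "needs to match the tag" part is to compare the names used in the predictions with the names declared in the labeling config. The sketch below does that with the standard library only. find_name_mismatches is a hypothetical helper, and the short config string is a cut-down stand-in, not a full config. It only checks that each name exists somewhere in the config; it does not check that the tag type fits.

```python
import xml.etree.ElementTree as ET


def find_name_mismatches(config_xml, result_items):
    """Return messages for items whose from_name/to_name are not declared in the config."""
    root = ET.fromstring(config_xml)
    # Collect every name attribute in the config (Image, TextArea, Labels, ...).
    declared = {el.get("name") for el in root.iter() if el.get("name")}
    problems = []
    for item in result_items:
        for key in ("from_name", "to_name"):
            if item.get(key) not in declared:
                problems.append(
                    f"item {item.get('id')}: {key}={item.get(key)!r} is not in the config"
                )
    return problems


config = """
<View>
  <Image name="image" value="$image"/>
  <TextArea name="ocr_text" toName="image"/>
</View>
"""

items = [
    {"id": "T1", "type": "textarea", "from_name": "transcription",
     "to_name": "image", "value": {"text": ["Invoice"]}},
    {"id": "T2", "type": "textarea", "from_name": "ocr_text",
     "to_name": "image", "value": {"text": ["Invoice"]}},
]

problems = find_name_mismatches(config, items)
for line in problems:
    print(line)
```

In the config posted at the top of the thread the TextArea is named ocr_text and there is no RectangleLabels tag, so names like transcription, label or bbox would be flagged by a check like this, which may be why nothing is rendered.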

I adjusted the code and it is still not filling the textareas, but when I open the task’s JSON I can see the prediction there. Here is my code:

import pytesseract
import json
from PIL import Image
import os
from flask import Flask, request, jsonify

class OCRNERModel:
    def setup(self):
        """Required setup endpoint for Label Studio"""
        print("ML Backend is set up successfully.")
        return jsonify({"status": "ok"})

    def predict(self, request_data, **kwargs):
        tasks = request_data.get("tasks", [])
        if not tasks:
            return jsonify({"error": "No tasks found in request."}), 400

        task = tasks[0]
        image_path = task["data"].get("image")

        # Convert relative URL to local path
        if image_path and image_path.startswith("/data/local-files/?d="):
            image_path = image_path.replace("/data/local-files/?d=", "/")

        if not image_path or not os.path.exists(image_path):
            return jsonify({"error": f"Image file not found: {image_path}"}), 404

        try:
            img = Image.open(image_path)
            ocr_data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
            ocr_text = pytesseract.image_to_string(img).strip()
        except Exception as e:
            return jsonify({"error": f"OCR extraction failed: {str(e)}"}), 500

        predictions = []
        id_counter = 1
        img_width, img_height = img.size
        for i in range(len(ocr_data['text'])):
            text = ocr_data['text'][i].strip()
            if text:
                x_pixel = ocr_data['left'][i]
                y_pixel = ocr_data['top'][i]
                w_pixel = ocr_data['width'][i]
                h_pixel = ocr_data['height'][i]

                x_normalized = (x_pixel / img_width) * 100
                y_normalized = (y_pixel / img_height) * 100
                w_normalized = (w_pixel / img_width) * 100
                h_normalized = (h_pixel / img_height) * 100

                predictions.append({
                    "id": str(id_counter),
                    "type": "rectanglelabels",
                    "from_name": "label",
                    "to_name": "image",
                      "value": {
                        "x": x_normalized,
                        "y": y_normalized,
                        "width": w_normalized,
                        "height": h_normalized,
                        "rotation": 0,
                        "rectanglelabels": ["O"],
                    },
                })

                predictions.append({
                    "id": f"T{id_counter}",
                    "type": "textarea",
                    "from_name": "transcription",
                    "to_name": "image",
                    "value": {
                        "text": text,  
                    }
                })

                id_counter += 1

        response = {
            "data": {
                "image": task["data"].get("image"),
                "ocr_text": ocr_text
            },
            "results": [
                {
                    "model_version": "ocr-bbox-v1",
                    "result": predictions
                }
            ]
        }

        print("Sending Response:", json.dumps(response, indent=2))
        return jsonify(response)

# Initialize Flask app and model instance
app = Flask(__name__)
model = OCRNERModel()

# Setup endpoint to configure the model backend
@app.route('/setup', methods=['POST'])
def setup():
    """ Required setup endpoint for Label Studio """
    return jsonify({"status": "ok"})

# Predict endpoint that Label Studio will call
@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    print(f"Received data: {json.dumps(data, indent=2)}")
    return model.predict(data) # Directly call model.predict and return its jsonify result

# Health check endpoint for Label Studio to verify the backend is up
@app.route('/health', methods=['GET'])
def health():
    """ Health check endpoint required by Label Studio """
    return jsonify({"status": "ok"})

# Run the Flask application
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8081)