Hello - I have followed the instructions to configure a Tesseract engine for my annotation project here: https://docs.humansignal.com/tutorials/tesseract
However, the Autodetect labels are not automatically displaying the text as shown in the tutorial. I am using the same config as the one on the instructions page above. I tested the Tesseract connection and it returns a 200 response. Any pointers to resolve the issue would be appreciated.
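For reference, this is roughly how I checked the backend outside of the Label Studio UI (a minimal sketch, assuming the Tesseract backend from the tutorial is running locally on port 9090 and exposes the usual label-studio-ml-backend /health endpoint; the URL is just my local setup):

import requests

# Assumed local address of the Tesseract ML backend container; adjust to your setup.
BACKEND_URL = "http://localhost:9090"

# label-studio-ml-backend serves a /health endpoint; this returns 200 for me,
# matching the "200" I see when validating the connection in the Label Studio UI.
resp = requests.get(f"{BACKEND_URL}/health", timeout=10)
print(resp.status_code, resp.text)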
As a temporary workaround, I am manually preprocessing my documents using Tesseract and importing the resulting JSON files into Label Studio as described here: https://labelstud.io/blog/improve-ocr-quality-for-receipt-processing-with-tesseract-and-label-studio/.
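For completeness, this is roughly the preprocessing script I run for that workaround (a minimal sketch along the lines of the blog post, assuming pytesseract is installed and the image is available locally; the file path and the "Label1" label are placeholders tied to my config below, not taken from the article):

# Run OCR with pytesseract and convert the word boxes into Label Studio
# predictions matching the labeling config below (names "bbox" / "transcription").
import json
import pytesseract
from PIL import Image
from pytesseract import Output

IMAGE_PATH = "ReceiptSwiss.jpg"   # placeholder path to a local copy of the sample receipt
IMAGE_URL = "https://upload.wikimedia.org/wikipedia/commons/0/0b/ReceiptSwiss.jpg"

image = Image.open(IMAGE_PATH)
width, height = image.size
data = pytesseract.image_to_data(image, output_type=Output.DICT)

results = []
for i, text in enumerate(data["text"]):
    if not text.strip():
        continue
    # Convert pixel boxes to the percentage coordinates Label Studio expects.
    value = {
        "x": 100.0 * data["left"][i] / width,
        "y": 100.0 * data["top"][i] / height,
        "width": 100.0 * data["width"][i] / width,
        "height": 100.0 * data["height"][i] / height,
        "rotation": 0,
    }
    region_id = f"word-{i}"
    results.append({
        "id": region_id,
        "from_name": "bbox",
        "to_name": "image",
        "type": "rectanglelabels",
        "original_width": width,
        "original_height": height,
        "value": {**value, "rectanglelabels": ["Label1"]},  # placeholder label
    })
    results.append({
        "id": region_id,
        "from_name": "transcription",
        "to_name": "image",
        "type": "textarea",
        "original_width": width,
        "original_height": height,
        "value": {**value, "text": [text]},
    })

task = {"data": {"ocr": IMAGE_URL}, "predictions": [{"result": results}]}
with open("tasks.json", "w") as f:
    json.dump([task], f, indent=2)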
Share your setup!
What version of Label Studio you're using (for example, 1.10.0).
1.12.0
How you installed Label Studio (for example, pip, brew, Docker, etc.).
Docker
Your labeling configuration.
<View>
  <Image name="image" value="$ocr" zoom="true" zoomControl="false"
         rotateControl="true" width="100%" height="100%"
         maxHeight="auto" maxWidth="auto"/>
  <RectangleLabels name="bbox" toName="image" strokeWidth="1" smart="true">
    <Label value="Label1" background="green"/>
    <Label value="Label2" background="blue"/>
    <Label value="Label3" background="red"/>
  </RectangleLabels>
  <TextArea name="transcription" toName="image"
            editable="true" perRegion="true" required="false"
            maxSubmissions="1" rows="5" placeholder="Recognized Text"
            displayMode="region-list"/>
</View>
Sample data to help us reproduce the issue.
You can use the sample receipt from Wikimedia Commons here: https://upload.wikimedia.org/wikipedia/commons/0/0b/ReceiptSwiss.jpg