How to Merge Multiple Annotation IDs into a Single ID with Unified Choices

Hello,

I am using Label Studio to create image annotations with the content-based image retrieval template.

Currently, I have multiple annotation IDs assigned to a single image ID. I would like to merge the choices from these multiple annotation IDs into one annotation ID, and delete the duplicate annotation IDs. In other words, I want each image ID to have only one annotation ID with all the merged choices.

Since the images are imported from local storage, each task stores the storage filename. I want to keep that storage filename.

Could you please guide me on how to achieve this?

Thank you.

What rule will you follow when merging the choices?

Here are some draft ideas and some draft SDK code.

To merge multiple annotation IDs into a single annotation ID for each image ID and delete the duplicate annotation IDs while preserving the storage filename, you can follow these steps:

  1. Fetch Annotations: Retrieve all annotations for each image.
  2. Merge Annotations: Combine the choices from multiple annotations into a single annotation.
  3. Update Annotations: Update the single annotation with the merged choices.
  4. Delete Duplicate Annotations: Remove the duplicate annotations.

Here’s a Python script using the Label Studio SDK to achieve this:

from label_studio_sdk import Client

# Initialize the Label Studio client
ls = Client(url="http://localhost:8080", api_key="your_api_key")

# Replace with your project ID
project_id = 1

# Fetch the project once, then all of its tasks
project = ls.get_project(project_id)
tasks = project.get_tasks()

for task in tasks:
    # Sort by ID so the earliest annotation is the one that is kept
    annotations = sorted(task['annotations'], key=lambda a: a['id'])
    if len(annotations) < 2:
        continue  # zero or one annotation: nothing to merge

    # Merge choices per control tag (from_name), without duplicates.
    # Each merged region keeps from_name, to_name and type so the
    # updated result stays a valid annotation result.
    merged = {}
    for annotation in annotations:
        for region in annotation.get('result') or []:
            if region.get('type') != 'choices':
                continue  # skipped annotations and non-choice regions
            target = merged.setdefault(
                region['from_name'], {**region, 'value': {'choices': []}}
            )
            for choice in region['value']['choices']:
                if choice not in target['value']['choices']:
                    target['value']['choices'].append(choice)

    # Keep any non-choice regions of the first annotation unchanged
    first = annotations[0]
    other_regions = [r for r in first.get('result') or [] if r.get('type') != 'choices']

    # Update the first annotation; the fields are passed as keyword arguments
    project.update_annotation(first['id'], result=other_regions + list(merged.values()))

    # Delete the remaining duplicate annotations
    for annotation in annotations[1:]:
        project.delete_annotation(annotation['id'])

print("Annotations merged and duplicates deleted successfully.")

Explanation:

  1. Initialize the Label Studio Client: Connect to your Label Studio instance using the API key.
  2. Fetch Tasks: Retrieve all tasks in the specified project.
  3. Merge Annotations: For each task, merge the choices from all annotations into a single set.
  4. Update Annotations: Update the first annotation with the merged choices.
  5. Delete Duplicates: Delete the remaining duplicate annotations.
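
To make the merge step concrete, here is a minimal offline sketch that applies the rule to plain dictionaries shaped like the annotations the SDK returns. The function name merge_choice_regions, the sample IDs, and the tag names "similar" and "query" are made up for illustration, and the sketch assumes every choices region carries from_name, type, and value.choices. It makes no API calls, so it can be run without a Label Studio instance.

```python
def merge_choice_regions(annotations):
    """Merge the choices of one task's annotations (expects at least one).

    Returns (kept_id, merged_result, duplicate_ids), where kept_id is the
    lowest annotation ID and merged_result holds one region per control tag.
    """
    ordered = sorted(annotations, key=lambda a: a["id"])
    kept, duplicates = ordered[0], ordered[1:]
    merged = {}  # from_name -> region with the merged choices
    for annotation in ordered:
        for region in annotation.get("result") or []:
            if region.get("type") != "choices":
                continue
            target = merged.setdefault(
                region["from_name"], {**region, "value": {"choices": []}}
            )
            for choice in region["value"]["choices"]:
                if choice not in target["value"]["choices"]:
                    target["value"]["choices"].append(choice)
    return kept["id"], list(merged.values()), [a["id"] for a in duplicates]


# Hypothetical sample: two annotations on the same task, listed out of order
sample = [
    {"id": 12, "result": [{"from_name": "similar", "to_name": "query",
                           "type": "choices", "value": {"choices": ["image2"]}}]},
    {"id": 11, "result": [{"from_name": "similar", "to_name": "query",
                           "type": "choices",
                           "value": {"choices": ["image1", "image2"]}}]},
]
kept_id, merged_result, duplicate_ids = merge_choice_regions(sample)
print(kept_id, merged_result[0]["value"]["choices"], duplicate_ids)
```

Here annotation 11 is kept, its choices become image1 and image2, and annotation 12 is the one marked for deletion.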

Notes:

  • Ensure you replace "http://localhost:8080" and "your_api_key" with your Label Studio instance URL and API key.
  • This script assumes that the choices are stored under value.choices in the result field of each annotation. Adjust the script if your annotation structure is different.
  • Test the script with a small subset of your data to ensure it works as expected before running it on the entire dataset.
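
One cautious way to follow the last note is a dry run that only reports which tasks would be touched, without updating or deleting anything. The sketch below assumes tasks are dictionaries with id and annotations keys, as in the script above; the function name tasks_with_multiple_annotations and the sample data are hypothetical.

```python
def tasks_with_multiple_annotations(tasks):
    """Map task ID -> sorted annotation IDs for tasks with more than one annotation."""
    report = {}
    for task in tasks:
        ids = sorted(a["id"] for a in task.get("annotations") or [])
        if len(ids) > 1:
            report[task["id"]] = ids
    return report


# Hypothetical sample: only task 1 has duplicates
sample_tasks = [
    {"id": 1, "annotations": [{"id": 12}, {"id": 11}]},
    {"id": 2, "annotations": [{"id": 13}]},
    {"id": 3, "annotations": []},
]
report = tasks_with_multiple_annotations(sample_tasks)
for task_id, annotation_ids in report.items():
    print(f"task {task_id}: keep {annotation_ids[0]}, delete {annotation_ids[1:]}")
```

With real data, you could pass it the list returned by get_tasks() and review the report before running the merge.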

By following these steps, you should be able to merge the choices from multiple annotation IDs into a single annotation ID for each image ID and delete the duplicate annotation IDs. Because the script only changes annotations and never modifies the tasks themselves, the storage filename in each task's data is left untouched.

Thank you for the code.

The merge rule in the code is to unify the choices, without duplicates, into the annotation result of the earliest annotation ID and to delete the others so that only one annotation ID remains. Although I was the only one doing the annotations, several items unintentionally ended up with multiple annotations. I made the initial post because I wanted to make the contents of the choices easier to check.

I will test it to see if it works well in my case.