To filter tasks annotated by a specific user using the Label Studio SDK, you can use the Column.annotators field. This field represents the list of annotator IDs who have annotated each task. Here’s how you can achieve this:
from label_studio_sdk import Client
from label_studio_sdk.data_manager import Filters, Column, Operator, Type
# Initialize the Label Studio SDK Client
LABEL_STUDIO_URL = 'https://your-label-studio-url.com' # Replace with your Label Studio URL
API_KEY = 'your-api-key' # Replace with your API key
ls = Client(url=LABEL_STUDIO_URL, api_key=API_KEY)
ls.check_connection()
# Get the project
project_id = YOUR_PROJECT_ID # Replace with your project ID
project = ls.get_project(project_id)
# Get the user ID for 'userhandle1'
users = ls.get_users()
user_id = None
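for user in users:
    # Match on either the email or the first name
    if user.email == 'userhandle1' or user.first_name == 'userhandle1':
        user_id = user.id
        break
if user_id is None:
    # Stop here: without a user ID there is nothing to filter on
    raise SystemExit('User not found')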
# Create filter to get tasks where 'annotators' contains the user_id
filters = Filters.create(
    Filters.AND,
    [
        Filters.item(
            Column.annotators,
            Operator.CONTAINS,
            Type.Number,
            Filters.value(user_id)
        )
    ]
)
# Retrieve tasks with the filter
tasks = project.get_tasks(filters=filters)
# Output the task IDs or any other desired information
for task in tasks:
    print(f"Task ID: {task['id']}")
Explanation:
Retrieve User ID:
We fetch all users using ls.get_users().
We search for the user with the email or first name matching 'userhandle1' to get their user_id.
Create Filter:
We use the Column.annotators column, which contains a list of annotators’ IDs for each task.
We use Operator.CONTAINS to check if the annotators list includes the user_id.
Type.Number is used because user_id is a numeric value.
Retrieve Tasks:
We apply the filter using project.get_tasks(filters=filters) to get all tasks annotated by the specific user.
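The user lookup described above can be pulled into a small helper so it can be checked without a running server. This is only a sketch: find_user_id is a hypothetical name, not part of the SDK, and the SimpleNamespace objects are stand-ins that assume user objects expose id, email and first_name as in the snippet above.

```python
from types import SimpleNamespace


def find_user_id(users, handle):
    """Return the id of the first user whose email or first name equals handle."""
    for user in users:
        if user.email == handle or user.first_name == handle:
            return user.id
    # No match: let the caller decide how to handle a missing user
    return None


# Stand-in user objects (assumed shape, not real output of ls.get_users())
sample_users = [
    SimpleNamespace(id=3, email='someone@example.com', first_name='someone'),
    SimpleNamespace(id=4, email='other@example.com', first_name='userhandle1'),
]

print(find_user_id(sample_users, 'userhandle1'))
```

With a live client, the same helper would be called as find_user_id(ls.get_users(), 'userhandle1').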
Now, the Label Studio UI offers two different user filters:
Annotated by
Updated by
Using Column.annotators seems to cover only the former. I’m trying to understand how to filter on the latter.
The tasks returned by project.get_tasks(...) include a field 'updated_by': [{'user_id': 4}], but I haven’t figured out how to target it with a filter.
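Until a server-side filter turns up, one possible workaround is to fetch the tasks and filter on that field client-side. A minimal sketch, assuming every task dict carries 'updated_by' as a list of {'user_id': ...} entries like the one shown above; tasks_updated_by is a hypothetical helper, not an SDK function, and the sample data is made up for illustration.

```python
def tasks_updated_by(tasks, user_id):
    """Return the tasks whose 'updated_by' list mentions user_id."""
    matching = []
    for task in tasks:
        # 'updated_by' may be missing or None on tasks nobody has updated
        updaters = task.get('updated_by') or []
        if any(entry.get('user_id') == user_id for entry in updaters):
            matching.append(task)
    return matching


# Stand-in data shaped like the field observed above (not real project output)
sample_tasks = [
    {'id': 1, 'updated_by': [{'user_id': 4}]},
    {'id': 2, 'updated_by': [{'user_id': 7}]},
    {'id': 3, 'updated_by': []},
    {'id': 4},
]

updated_ids = [task['id'] for task in tasks_updated_by(sample_tasks, 4)]
print(updated_ids)
```

In the script above this would be applied as tasks_updated_by(project.get_tasks(), user_id). It downloads every task first, so it is a fallback for smaller projects rather than a replacement for a proper filter.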