Custom filtering

Label Studio v1.22.0

Hi,

I am experimenting with adding custom fields to Label Studio. The custom fields (e.g. meta_info) appear in the table, and I can change the column type (e.g. to “txt” or “str”), but in the Filters dropdown the type is stuck on “data” and filtering raises an error. Is this a new, incomplete feature?

The file paths for the images seem to be base64 encoded when displayed to the user, even when using pre-signed URLs. When I batch import images using JSON, I am unable to filter by “contains” on the file path (image [data]). Instead, I need to access the database directly, filter on the file path with SQL to get the task id, and then filter on id in the UI. This is inefficient but works. So, I am trying to create custom fields that contain the parts of the file paths I would like to filter on.

[
  {
    "data": {
      "image": "s3://redacted-path/my-image.jpg",
      "meta_info": "redacted-path/my-image.jpg"
    }
  }
]

Thank you for any assistance

Why the column “type” can be txt/str, but Filters are stuck on data (and error)

A column's display type and a filter's value type are not handled by the same system in Label Studio.

  • In the backend, Data Manager filter type is validated against a fixed enum (Number, Datetime, Boolean, String, List, Unknown) in label_studio/data_manager/prepare_params.py. Anything else (like data) is not a valid filter type and can error.

    Source: prepare_params.py filter Type enum

  • Separately, Data Manager columns are generated by get_all_columns(), and task.data.* fields are typed based on the labeling config / inferred data types. Anything not recognized as a media type ends up as String for display.

    Source: get_all_columns() column type mapping

So yes: this looks like an incomplete, non-unified UX. Changing a column’s display type doesn’t necessarily produce a compatible filter definition. The reliable path is to ensure your filter payload uses one of the supported filter type values (usually String for your meta_info).

How to filter on your custom field correctly (recommended)

Use a Data Manager filter that targets task.data.meta_info and set the filter type to String.

For example, via a Data Manager View payload sent over REST (I have not verified whether SDK v2 provides a helper for “views”, so check before relying on it):

{
  "project": 123,
  "data": {
    "title": "Images by path fragment",
    "filters": {
      "conjunction": "and",
      "items": [
        {
          "filter": "filter:tasks:data.meta_info",
          "operator": "contains",
          "type": "String",
          "value": "redacted-path/"
        }
      ]
    }
  }
}
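
The same payload can also be assembled programmatically, which helps when you need one view per path fragment. The following is a minimal sketch, not an official helper: the function names data_field_filter and build_view_payload are hypothetical, and the set of allowed types simply mirrors the enum values listed earlier in this answer.

```python
import json

# Filter types named earlier in this answer as accepted by the backend.
ALLOWED_FILTER_TYPES = {"Number", "Datetime", "Boolean", "String", "List", "Unknown"}


def data_field_filter(field, operator, value, filter_type="String"):
    """Build one filter item targeting task.data.<field> (hypothetical helper)."""
    if filter_type not in ALLOWED_FILTER_TYPES:
        raise ValueError(f"unsupported filter type: {filter_type!r}")
    return {
        "filter": f"filter:tasks:data.{field}",
        "operator": operator,
        "type": filter_type,
        "value": value,
    }


def build_view_payload(project_id, title, items, conjunction="and"):
    """Wrap filter items in the view payload shape shown above (hypothetical helper)."""
    return {
        "project": project_id,
        "data": {
            "title": title,
            "filters": {"conjunction": conjunction, "items": list(items)},
        },
    }


payload = build_view_payload(
    123,
    "Images by path fragment",
    [data_field_filter("meta_info", "contains", "redacted-path/")],
)
print(json.dumps(payload, indent=2))
```

Passing filter_type="data" raises a ValueError here, which mirrors the kind of type mismatch the Filters dropdown runs into. Sending the payload to your instance is left out on purpose, since the endpoint and authentication scheme should be confirmed against the API reference for your version.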

The key detail is the filter name format for task.data fields: filter:tasks:data.<field>.

Why you see base64-ish image paths (even with presigned URLs)

If your task’s data.image value is something like:

/tasks/<some_id>/presign/?fileuri=<BASE64...>

that fileuri is typically a base64-encoded storage URI (gs://, s3://, etc.). This often happens when tasks were created by copying/reusing the output of /api/tasks (which may contain presigned/proxy URLs) instead of importing the original storage URIs.
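
To check what such a URL actually points at, you can decode the fileuri parameter yourself. This is a minimal sketch, assuming the value is plain base64 of the storage URI (either the standard or the URL-safe alphabet); decode_fileuri is a hypothetical helper name, not part of Label Studio.

```python
import base64
from urllib.parse import urlparse, parse_qs


def decode_fileuri(url: str) -> str:
    """Return the storage URI hidden in a ...?fileuri=<BASE64> URL (hypothetical helper)."""
    values = parse_qs(urlparse(url).query).get("fileuri")
    if not values:
        raise ValueError("URL has no fileuri query parameter")
    # parse_qs turns '+' into a space, so undo that before decoding.
    token = values[0].replace(" ", "+")
    # Normalise the URL-safe alphabet to the standard one.
    token = token.replace("-", "+").replace("_", "/")
    # Restore any padding that was stripped from the end.
    token += "=" * (-len(token) % 4)
    return base64.b64decode(token).decode("utf-8")


# Round trip with a made-up task id and the example URI from the question.
encoded = base64.urlsafe_b64encode(b"s3://redacted-path/my-image.jpg").decode("ascii")
print(decode_fileuri(f"/tasks/1/presign/?fileuri={encoded}"))
```

If the decoded value is the s3:// URI you imported, the stored data is intact and only the displayed URL is encoded.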

To avoid generating tasks with these /presign/?fileuri=... values, request unresolved URIs when you read tasks from the API for reuse:

  • /api/tasks?project=<id>&resolve_uri=false

Practical best practice for your use case (fast filtering)

Keep data.image as the real media pointer, and add additional short searchable fields (like you’re doing) such as:

  • meta_info (full key)
  • image_prefix (folder/prefix)
  • filename (basename)
  • dataset, source_id, etc.

Then filter on those fields using filter:tasks:data.<field> + type: String.
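
These extra fields can be derived from the storage URI when you generate the import JSON. The following is a minimal sketch under stated assumptions: build_task is a hypothetical helper, meta_info is taken to be the URI without its scheme (as in your example), and image_prefix and filename are illustrative field names that you can rename freely.

```python
import json
import posixpath


def build_task(uri: str) -> dict:
    """Build one import task with searchable path fields (hypothetical helper)."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"expected a URI with a scheme, got {uri!r}")
    return {
        "data": {
            "image": uri,                               # real media pointer, unchanged
            "meta_info": rest,                          # full path without the scheme
            "image_prefix": posixpath.dirname(rest),    # folder/prefix part
            "filename": posixpath.basename(rest),       # basename
        }
    }


tasks = [build_task(uri) for uri in ["s3://redacted-path/my-image.jpg"]]
print(json.dumps(tasks, indent=2))
```

Keeping each field short and single-purpose means a “contains” or “equal” filter on image_prefix or filename does not have to match against the whole path.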

Clarifying questions (so I can give the exact payload / workaround for your UI)

  1. Which edition/version are you on (Community vs Enterprise, and version)?
  2. When you open a task JSON, is data.image currently s3://... or /tasks/.../presign/?fileuri=...?
  3. What is the exact error message when you try filtering your meta_info field in the UI?

Further information:

  1. Label Studio v1.22.0 Community (self-hosted). “Advanced filters” are technically an Enterprise feature; we are being frugal.
  2. “/tasks/1685903/resolve/?fileuri=”. I have played with “Use pre-signed URLs (On).”
  3. See the runtime error below:

Runtime error
operator does not exist: text -> unknown
LINE 1: SELECT ("task"."data" -> 'meta_info') FROM "task" WHERE "tas...
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.

Workaround:

In the database, I can:

SELECT 
  MIN(id) AS lowest_id,
  MAX(id) AS highest_id
FROM public.task
WHERE data LIKE '%/my_filter/%';

or

SELECT * FROM public.task
WHERE data LIKE '%/folder/my_image.jpg%'
ORDER BY id ASC;

or, for the custom filter columns:

SELECT 
  id,
  data,
  data::jsonb->'meta_info' AS meta_info
FROM public.task
WHERE data::jsonb ? 'meta_info'
  AND data LIKE '%my_filtered_dir/%'
ORDER BY id ASC;

And then filter on the IDs in the UI to annotate the images I am interested in. But it would be nice to skip this step and fix the filter.
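
Until the data filter works, the ID range returned by the first query could be turned into a reusable filter definition rather than typed into the UI each time. This is only a sketch: id_range_filter is a hypothetical helper, and the filter name filter:tasks:id and the “in” operator taking a min/max value are assumptions that should be checked against the Data Manager API reference for v1.22.0.

```python
import json


def id_range_filter(lowest_id: int, highest_id: int) -> dict:
    """Build a filters block selecting tasks by ID range (hypothetical helper).

    Assumes the Number filter accepts operator "in" with a min/max value.
    """
    if lowest_id > highest_id:
        raise ValueError("lowest_id must not exceed highest_id")
    return {
        "conjunction": "and",
        "items": [
            {
                "filter": "filter:tasks:id",
                "operator": "in",
                "type": "Number",
                "value": {"min": lowest_id, "max": highest_id},
            }
        ],
    }


# Example values standing in for lowest_id / highest_id from the SQL query.
print(json.dumps(id_range_filter(1000, 1250), indent=2))
```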