TLDExtract Registered Domain is Empty using Custom S3 Domain

Anna_Kiefer · March 7, 2025, 5:01pm

I’m trying to integrate Label Studio with a custom S3-compatible storage endpoint hosted at http://s3.company.internal, but the system isn’t parsing the domain name correctly in label_studio/io_storages/s3/utils.py (develop branch of HumanSignal/label-studio on GitHub). In the linked line, registered_domain comes back as an empty string, which then causes the next line to fail with the “unrecognized S3 domain” exception.

Here’s the code:

self.s3_endpoint = 'http://s3.company.internal'

parsed = urlparse(self.s3_endpoint)
parsed
ParseResult(scheme='http', netloc='s3.company.internal', path='', params='', query='', fragment='')

extracted_lib = extractor.extract_urllib(parsed)
extracted_lib
ExtractResult(subdomain='s3.company', domain='internal', suffix='', is_private=False)
extracted_lib.registered_domain
''

I’ve verified that the URL is correct, and other tools can access it without issues.
I’ve attempted to troubleshoot by calling urlparse and the extractor directly. The result shows that while the domain (internal) and subdomain (s3.company) are extracted, the suffix is empty, and so registered_domain is empty too.

Has anyone faced similar issues with custom S3 domains or endpoints? Any advice on how to make Label Studio correctly handle custom domains like s3.company.internal?

I am using Label Studio 1.16.0. I’d prefer to keep the custom domain as it follows a similar format to all of my other endpoints.

Jo_Booth · March 7, 2025, 11:09pm

Hi Anna,

Thanks for writing in with this error - and great work digging into the issue! One thing to note is that if the tldextract code is being reached at all, this does imply that an exception is already being raised when Label Studio is trying to interact with the storage endpoint. If you’re able to see the logs of the running Label Studio instance, you’d be able to read the details of the full logged exception on stderr:

label_studio/io_storages/s3/utils.py (develop branch, github.com/HumanSignal/label-studio)

          raise a new one with the previous context suppressed. See also: https://peps.python.org/pep-0409/
          """
          
          def wrapper(self, *args, **kwargs):
              try:
                  return func(self, *args, **kwargs)
              except Exception as e:
                  if self.s3_endpoint and (
                      domain := extractor.extract_urllib(urlparse(self.s3_endpoint)).registered_domain.lower()
                  ) not in [trusted_domain.lower() for trusted_domain in settings.S3_TRUSTED_STORAGE_DOMAINS]:
                      logger.error(f'Exception from unrecognized S3 domain: {e}', exc_info=True)
                      raise S3StorageError(
                          f'Debugging info is not available for s3 endpoints on domain: {domain}. '
                          'Please contact your Label Studio devops team if you require detailed error reporting for this domain.'
                      ) from None
                  else:
                      raise e
          
          return wrapper

By reading these logs, it should be possible to figure out what’s causing the exception and remedy the issue.
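
To illustrate how the excerpt above behaves, here is a simplified standalone sketch rather than Label Studio's actual code. The names run_masked, failing_call, ListHandler, and the TRUSTED_DOMAINS value are assumptions made for the example, and the check on self.s3_endpoint is left out. It shows that with an empty registered_domain the caller only sees the generic S3StorageError, while the original exception goes to the logger.

```python
import logging

logger = logging.getLogger("s3_wrapper_sketch")

# Assumed stand-in for settings.S3_TRUSTED_STORAGE_DOMAINS.
TRUSTED_DOMAINS = ["amazonaws.com"]


class S3StorageError(Exception):
    """Stand-in for Label Studio's exception of the same name."""


def run_masked(func, registered_domain):
    """Call func; for untrusted domains, log the real error and raise a generic one."""
    try:
        return func()
    except Exception as e:
        if registered_domain.lower() not in [d.lower() for d in TRUSTED_DOMAINS]:
            # The full traceback is only written to the log.
            logger.error("Exception from unrecognized S3 domain: %s", e, exc_info=True)
            # "from None" suppresses the original exception context (PEP 409).
            raise S3StorageError(
                f"Debugging info is not available for s3 endpoints on domain: {registered_domain}."
            ) from None
        raise


def failing_call():
    # Hypothetical root cause, used only to trigger the wrapper.
    raise ConnectionError("could not reach endpoint")


class ListHandler(logging.Handler):
    """Collects log messages in memory so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
logger.propagate = False  # keep the demo output quiet

try:
    # An empty registered_domain is never in the trusted list.
    run_masked(failing_call, registered_domain="")
except S3StorageError as err:
    masked = err

print(masked)            # generic message with an empty domain
print(handler.messages)  # the real cause, visible only in the logs
```

This is why the server logs are the place to look: the message shown to the caller deliberately carries no detail about the underlying failure.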

As far as providing a general in-UI solution for this scenario, I think the ideal approach would be to expose an environment variable setting in base.py for tldextract’s extra suffixes feature (see the README of john-kurkowski/tldextract on GitHub). Similarly, if you’re comfortable modifying Label Studio’s code and building it, you could add internal to an extra_suffixes kwarg on this line: label-studio/label_studio/io_storages/s3/utils.py at 827e3a6edc66b4c3425be1fde5291674808ca732 (HumanSignal/label-studio on GitHub).

In the meantime, we’ll track implementing extra suffixes support as a feature request on our side.

Cheers,
Jo
