Error checks on network calls, try/catch, etc #440

Joshfindit · 2025-06-11T23:13:07Z

Joshfindit
Jun 11, 2025

Related to #429 but happened to me in a few other situations (accessing from a VPS without a proxy)

I think it would be helpful to add checks and catches to any network calls:

Bad response code
Empty reply (this was the case for "no element found")
Response contents start with <!DOCTYPE html> when the expected reply is JSON (this is likely in the case of a text/HTML warning from YT when they make assumptions that it's someone writing a script)
Parsing response body fails due to invalid structure

If it's welcome, can I create a long-running feature branch for network error handling? I'll wrap the calls one at a time, keep it synced with main, and you can merge my progress incrementally.

jdepoix · 2025-06-12T07:38:54Z

jdepoix
Jun 12, 2025
Maintainer

Hi @Joshfindit,

thank you for the suggestion!

Could you provide some examples, were you feel like additional errors would provide value? There's already quite a few specific exception types in ._errors.py that are aimed at providing users with additional information to understand/handle the error. To touch on your points specifically with that in mind:

Bad response code

We already do response.raise_for_status() on all responses, which will raise a requests exception if the status code is != 2XX. I think this usually provides the most detailed information on unexpected network errors. I usually don't see any value in wrapping errors where we are not able to provide additional information. But if you have specific cases in mind where we could provide useful information, I am definitely up for it!

Empty reply (this was the case for "no element found")

I already added a check for this in v1.1.0

Response contents start with when the expected reply is JSON (this is likely in the case of a text/HTML warning from YT when they make assumptions that it's someone writing a script)

Have you encountered an example of this happening? From my testing the Innertube API will return a JSON error response if something went wrong. We are already checking that the returned JSON contains the fields we are expecting and throwing exceptions accordingly. So what cases do you see unchecked here?

Parsing response body fails due to invalid structure

As mentioned above, I think this is already happening sufficiently, but let me know if you feel like there are unchecked cases!

Generally speaking it's up to you whether you do a long running, or individual branches on your fork, but it's definitely easier for me if you create PRs for individual changes when adding multiple new error types (unless they depend on each other of course).

I converted this into a discussion, as it's not really about a specific issue!

0 replies

Joshfindit · 2025-06-12T14:35:44Z

Joshfindit
Jun 12, 2025
Author

Thanks for the detailed response and for converting this to a discussion. You're absolutely right that there's already solid error handling in many places. Let me be more specific about the gaps I've identified:

XML Parsing the response body:

The primary uncovered case is in _TranscriptParser.parse() in _transcripts.py. This line has no error handling:

for xml_element in ElementTree.fromstring(raw_data)

Specific scenarios that would crash:

Empty transcript response ("") → xml.etree.ElementTree.ParseError
YouTube returns HTML error page with 200 status → ParseError
Malformed XML from network corruption → ParseError
YouTube API format changes → ParseError

Current flow that can crash:

# In Transcript.fetch()
response = self._http_client.get(self._url)
snippets = _TranscriptParser(preserve_formatting=preserve_formatting).parse(
    _raise_http_errors(response, self.video_id).text,  # ← HTTP-level check only
)

In all those cases, wrapping it in something that gives an error message like "YouTube returned an unexpected response body. This may be due to your IP being blocked or to some other measure" would help users understand what's going on so they can review the documentation.

_raise_http_errors validates HTTP status, but not content format. If YouTube returns a 200 response with invalid XML, it crashes with an unhandled ParseError instead of raising one of your custom exceptions.

Proposed Solution:

Following your existing _raise_http_errors pattern, I'd suggest adding a _raise_content_errors function:

def _raise_content_errors(raw_data: str, video_id: str, content_type: str = "xml") -> str:
    """
    Validates content format and raises appropriate exceptions for invalid data.
    
    :param raw_data: The raw content to validate
    :param video_id: Video ID for error context
    :param content_type: Type of content expected ("xml", "json")
    :return: The validated raw_data
    """
    if not raw_data or not raw_data.strip():
        raise YouTubeDataUnparsable(video_id, "Empty response content")
    
    if content_type == "xml":
        try:
            # Validate XML can be parsed
            ElementTree.fromstring(raw_data)
        except Exception as xml_error:
            raise YouTubeDataUnparsable(video_id, f"Invalid XML content") from xml_error
    
    elif content_type == "json":
        try:
            import json
            json.loads(raw_data)
        except json.JSONDecodeError as json_error:
            raise YouTubeDataUnparsable(video_id, f"Invalid JSON content") from json_error
        except Exception as json_error:
            raise YouTubeDataUnparsable(video_id, f"JSON parsing failed") from json_error
    
    return raw_data

With this function, it can be used anywhere _raise_http_errors is used, including the new InnerTube calls:

def _fetch_innertube_data(self, video_id: str, api_key: str) -> Dict:
    response = self._http_client.post(
        INNERTUBE_API_URL.format(api_key=api_key),
        json={
            "context": INNERTUBE_CONTEXT,
            "videoId": video_id,
        },
    )
    validated_content = _raise_content_errors(
        _raise_http_errors(response, video_id).text,
        video_id,
        "json"
    )
    return json.loads(validated_content)

This maintains your existing error handling philosophy while catching the XML parsing edge cases.
Why This Matters

The InnerTube refactor (great work btw!) made caption metadata discovery more robust, but the final transcript fetching still uses the old XML endpoints and parsing. Any malformed XML crashes the entire application instead of providing users with a helpful YouTubeDataUnparsable exception.

Happy to implement this as a focused PR if you think it adds value!

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Error checks on network calls, try/catch, etc #440

Uh oh!

{{title}}

Uh oh!

Replies: 2 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Uh oh!

Error checks on network calls, try/catch, etc #440

Uh oh!

Joshfindit Jun 11, 2025

Replies: 2 comments

Uh oh!

jdepoix Jun 12, 2025 Maintainer

Uh oh!

Joshfindit Jun 12, 2025 Author

Joshfindit
Jun 11, 2025

jdepoix
Jun 12, 2025
Maintainer

Joshfindit
Jun 12, 2025
Author