pydantic
diff --git a/docs/input.md b/docs/input.md
Lines changed: 21 additions & 8 deletions
diff --git a/docs/models/google.md b/docs/models/google.md
Lines changed: 0 additions & 7 deletions
diff --git a/pydantic_ai_slim/pydantic_ai/messages.py b/pydantic_ai_slim/pydantic_ai/messages.py
Lines changed: 43 additions & 13 deletions
diff --git a/pydantic_ai_slim/pydantic_ai/models/__init__.py b/pydantic_ai_slim/pydantic_ai/models/__init__.py
Lines changed: 89 additions & 2 deletions
diff --git a/pydantic_ai_slim/pydantic_ai/models/anthropic.py b/pydantic_ai_slim/pydantic_ai/models/anthropic.py
Lines changed: 3 additions & 11 deletions
diff --git a/pydantic_ai_slim/pydantic_ai/models/bedrock.py b/pydantic_ai_slim/pydantic_ai/models/bedrock.py
Lines changed: 23 additions & 15 deletions
@@ -2,6 +2,7 @@
 
 Some LLMs are now capable of understanding audio, video, image and document content.
 
+
 ## Image Input
 
 !!! info
@@ -64,14 +65,6 @@ You can provide video input using either [`VideoUrl`][pydantic_ai.VideoUrl] or [
 !!! info
     Some models do not support document input. Please check the model's documentation to confirm whether it supports document input.
 
-!!! warning
-    When using Gemini models, the document content will always be sent as binary data, regardless of whether you use `DocumentUrl` or `BinaryContent`. This is due to differences in how Vertex AI and Google AI handle document inputs.
-
-    For more details, see [this discussion](https://discuss.ai.google.dev/t/i-am-using-google-generative-ai-model-gemini-1-5-pro-for-image-analysis-but-getting-error/34866/4).
-
-    If you are unsatisfied with this behavior, please let us know by opening an issue on
-    [GitHub](https://github.com/pydantic/pydantic-ai/issues).
-
 You can provide document input using either [`DocumentUrl`][pydantic_ai.DocumentUrl] or [`BinaryContent`][pydantic_ai.BinaryContent]. The process is similar to the examples above.
 
 If you have a direct URL for the document, you can use [`DocumentUrl`][pydantic_ai.DocumentUrl]:
@@ -109,3 +102,23 @@ result = agent.run_sync(
 print(result.output)
 # > The document discusses...
 ```
+
+## User-side download vs. direct file URL
+
+As a general rule, when you provide a URL using any of `ImageUrl`, `AudioUrl`, `VideoUrl` or `DocumentUrl`, PydanticAI downloads the file content and then sends it as part of the API request.
+
+The situation is different for certain models:
+
+- [`AnthropicModel`][pydantic_ai.models.anthropic.AnthropicModel]: if you provide a PDF document via `DocumentUrl`, the URL is sent directly in the API request, so no download happens on the user side.
+
+- [`GeminiModel`][pydantic_ai.models.gemini.GeminiModel] and [`GoogleModel`][pydantic_ai.models.google.GoogleModel] on Vertex AI: any URL provided using `ImageUrl`, `AudioUrl`, `VideoUrl`, or `DocumentUrl` is sent as-is in the API request and no data is downloaded beforehand.
+
+    See the [Gemini API docs for Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#filedata) to learn more about supported URLs, formats and limitations:
+
+    - Cloud Storage bucket URIs (with protocol `gs://`)
+    - Public HTTP(S) URLs
+    - Public YouTube video URLs (maximum one URL per request)
+
+    However, because of crawling restrictions, Gemini may be unable to access certain URLs. In that case, you can instruct PydanticAI to download the file content and send it instead of the URL by setting the boolean flag `force_download` to `True`. This attribute is available on all objects that inherit from [`FileUrl`][pydantic_ai.messages.FileUrl].
+
+- [`GeminiModel`][pydantic_ai.models.gemini.GeminiModel] and [`GoogleModel`][pydantic_ai.models.google.GoogleModel] on GLA: YouTube video URLs are sent directly in the request to the model.
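The routing rules in the new "User-side download vs. direct file URL" section can be summarised as a small decision function. This is a minimal sketch that paraphrases the documentation text only, not the library's code: `UrlInput`, `is_sent_as_url` and the `backend` labels are hypothetical names. It also assumes that `force_download` is honoured only on Vertex AI, and that any combination the section does not list follows the general "download first" rule.

```python
from dataclasses import dataclass

# Prefixes taken from the `is_youtube` property added in messages.py.
YOUTUBE_PREFIXES = ('https://youtu.be/', 'https://youtube.com/', 'https://www.youtube.com/')


@dataclass
class UrlInput:
    """Hypothetical stand-in for a FileUrl-like object (not the library's class)."""

    url: str
    media_type: str
    force_download: bool = False


def is_sent_as_url(item: UrlInput, backend: str) -> bool:
    """Return True if the URL is passed to the provider, False if the content is downloaded first.

    `backend` is a hypothetical label: 'anthropic', 'google-vertex', 'google-gla', or anything else.
    """
    if backend == 'anthropic':
        # Documented exception: PDF documents are referenced by URL.
        return item.media_type == 'application/pdf'
    if backend == 'google-vertex':
        # Any URL is passed through unless the caller forces a download (assumed Vertex-only).
        return not item.force_download
    if backend == 'google-gla':
        # Only YouTube video URLs are passed through.
        return item.url.startswith(YOUTUBE_PREFIXES)
    # General rule: download the file and send its content.
    return False
```

Under these assumptions, an image URL on Vertex AI is sent as-is, and setting `force_download=True` on the same item switches it to the download path.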
@@ -161,13 +161,6 @@ See the [Gemini API docs](https://ai.google.dev/gemini-api/docs/safety-settings)
 
 `GoogleModel` supports multi-modal input, including documents, images, audio, and video. See the [input documentation](../input.md) for details and examples.
 
-!!! warning
-    When using Gemini models, document content is always sent as binary data, regardless of whether you use `DocumentUrl` or `BinaryContent`.
-    This is due to differences in how Vertex AI and Google AI handle document inputs.
-
-    See [this discussion](https://discuss.ai.google.dev/t/i-am-using-google-generative-ai-model-gemini-1-5-pro-for-image-analysis-but-getting-error/34866/4)
-    for more details.
-
 ## Model settings
 
 You can use the [`GoogleModelSettings`][pydantic_ai.models.google.GoogleModelSettings] class to customize the model request.
 
@@ -2,6 +2,7 @@
 
 import base64
 import uuid
+from abc import ABC, abstractmethod
 from collections.abc import Sequence
 from dataclasses import dataclass, field, replace
 from datetime import datetime
@@ -80,8 +81,35 @@ def otel_event(self, _settings: InstrumentationSettings) -> Event:
 
 
 @dataclass(repr=False)
-class VideoUrl:
-    """A URL to an video."""
+class FileUrl(ABC):
+    """Abstract base class for any URL-based file."""
+
+    url: str
+    """The URL of the file."""
+
+    force_download: bool = False
+    """If the model supports it:
+
+    * If True, the file is downloaded and the data is sent to the model as bytes.
+    * If False, the URL is sent directly to the model and no download is performed.
+    """
+
+    @property
+    @abstractmethod
+    def media_type(self) -> str:
+        """Return the media type of the file, based on the url."""
+
+    @property
+    @abstractmethod
+    def format(self) -> str:
+        """The file format."""
+
+    __repr__ = _utils.dataclasses_no_defaults_repr
+
+
+@dataclass(repr=False)
+class VideoUrl(FileUrl):
+    """A URL to a video."""
 
     url: str
     """The URL of the video."""
@@ -108,9 +136,19 @@ def media_type(self) -> VideoMediaType:
             return 'video/x-ms-wmv'
         elif self.url.endswith('.three_gp'):
             return 'video/3gpp'
+        # Assume that YouTube videos are mp4 because there would be no extension
+        # to infer from. This should not be a problem, as Gemini disregards media
+        # type for YouTube URLs.
+        elif self.is_youtube:
+            return 'video/mp4'
         else:
             raise ValueError(f'Unknown video file extension: {self.url}')
 
+    @property
+    def is_youtube(self) -> bool:
+        """True if the URL has a YouTube domain."""
+        return self.url.startswith(('https://youtu.be/', 'https://youtube.com/', 'https://www.youtube.com/'))
+
     @property
     def format(self) -> VideoFormat:
         """The file format of the video.
@@ -119,11 +157,9 @@ def format(self) -> VideoFormat:
         """
         return _video_format_lookup[self.media_type]
 
-    __repr__ = _utils.dataclasses_no_defaults_repr
-
 
 @dataclass(repr=False)
-class AudioUrl:
+class AudioUrl(FileUrl):
     """A URL to an audio file."""
 
     url: str
@@ -147,11 +183,9 @@ def format(self) -> AudioFormat:
         """The file format of the audio file."""
         return _audio_format_lookup[self.media_type]
 
-    __repr__ = _utils.dataclasses_no_defaults_repr
-
 
 @dataclass(repr=False)
-class ImageUrl:
+class ImageUrl(FileUrl):
     """A URL to an image."""
 
     url: str
@@ -182,11 +216,9 @@ def format(self) -> ImageFormat:
         """
         return _image_format_lookup[self.media_type]
 
-    __repr__ = _utils.dataclasses_no_defaults_repr
-
 
 @dataclass(repr=False)
-class DocumentUrl:
+class DocumentUrl(FileUrl):
     """The URL of the document."""
 
     url: str
@@ -215,8 +247,6 @@ def format(self) -> DocumentFormat:
         except KeyError as e:
             raise ValueError(f'Unknown document media type: {media_type}') from e
 
-    __repr__ = _utils.dataclasses_no_defaults_repr
-
 
 @dataclass(repr=False)
 class BinaryContent:
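
The YouTube handling added to `VideoUrl` above can be exercised in isolation. The following is a standalone sketch, not the library class: `video_media_type` is a hypothetical free function, and its extension table is a trimmed, illustrative subset (only the `.three_gp` entry is visible in the hunk). It keeps the diff's ordering, where extension checks run before the YouTube fallback.

```python
# Prefixes copied from the `is_youtube` property in the diff.
YOUTUBE_PREFIXES = ('https://youtu.be/', 'https://youtube.com/', 'https://www.youtube.com/')

# Illustrative subset of the extension mapping (assumption: the real table is longer).
EXTENSION_MEDIA_TYPES = {
    '.mp4': 'video/mp4',
    '.three_gp': 'video/3gpp',
}


def is_youtube(url: str) -> bool:
    """True if the URL starts with one of the known YouTube prefixes."""
    return url.startswith(YOUTUBE_PREFIXES)


def video_media_type(url: str) -> str:
    """Infer a media type from the extension, falling back to mp4 for YouTube URLs."""
    for extension, media_type in EXTENSION_MEDIA_TYPES.items():
        if url.endswith(extension):
            return media_type
    if is_youtube(url):
        # YouTube URLs carry no file extension, so mp4 is assumed.
        return 'video/mp4'
    raise ValueError(f'Unknown video file extension: {url}')
```

Because the check is a plain prefix match, URLs such as `http://www.youtube.com/...` or `https://m.youtube.com/...` are not recognised as YouTube and would raise `ValueError` in this sketch.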
 
@@ -6,21 +6,23 @@
 
 from __future__ import annotations as _annotations
 
+import base64
 from abc import ABC, abstractmethod
 from collections.abc import AsyncIterator, Iterator
 from contextlib import asynccontextmanager, contextmanager
 from dataclasses import dataclass, field, replace
 from datetime import datetime
 from functools import cache, cached_property
+from typing import Generic, TypeVar, overload
 
 import httpx
-from typing_extensions import Literal, TypeAliasType
+from typing_extensions import Literal, TypeAliasType, TypedDict
 
 from pydantic_ai.profiles import DEFAULT_PROFILE, ModelProfile, ModelProfileSpec
 
 from .._parts_manager import ModelResponsePartsManager
 from ..exceptions import UserError
-from ..messages import ModelMessage, ModelRequest, ModelResponse, ModelResponseStreamEvent
+from ..messages import FileUrl, ModelMessage, ModelRequest, ModelResponse, ModelResponseStreamEvent, VideoUrl
 from ..profiles._json_schema import JsonSchemaTransformer
 from ..settings import ModelSettings
 from ..tools import ToolDefinition
@@ -611,6 +613,91 @@ def _cached_async_http_transport() -> httpx.AsyncHTTPTransport:
     return httpx.AsyncHTTPTransport()
 
 
+DataT = TypeVar('DataT', str, bytes)
+
+
+class DownloadedItem(TypedDict, Generic[DataT]):
+    """The downloaded data and its type."""
+
+    data: DataT
+    """The downloaded data."""
+
+    data_type: str
+    """The type of data that was downloaded.
+
+    Extracted from header "content-type", but defaults to the media type inferred from the file URL if content-type is "application/octet-stream".
+    """
+
+
+@overload
+async def download_item(
+    item: FileUrl,
+    data_format: Literal['bytes'],
+    type_format: Literal['mime', 'extension'] = 'mime',
+) -> DownloadedItem[bytes]: ...
+
+
+@overload
+async def download_item(
+    item: FileUrl,
+    data_format: Literal['base64', 'base64_uri', 'text'],
+    type_format: Literal['mime', 'extension'] = 'mime',
+) -> DownloadedItem[str]: ...
+
+
+async def download_item(
+    item: FileUrl,
+    data_format: Literal['bytes', 'base64', 'base64_uri', 'text'] = 'bytes',
+    type_format: Literal['mime', 'extension'] = 'mime',
+) -> DownloadedItem[str] | DownloadedItem[bytes]:
+    """Download an item by URL and return the content as a bytes object or a (base64-encoded) string.
+
+    Args:
+        item: The item to download.
+        data_format: The format to return the content in:
+            - `bytes`: The raw bytes of the content.
+            - `base64`: The base64-encoded content.
+            - `base64_uri`: The base64-encoded content as a data URI.
+            - `text`: The content as a string.
+        type_format: The format to return the media type in:
+            - `mime`: The media type as a MIME type.
+            - `extension`: The media type as an extension.
+
+    Raises:
+        UserError: If the URL points to a YouTube video or its protocol is gs://.
+    """
+    if item.url.startswith('gs://'):
+        raise UserError('Downloading from protocol "gs://" is not supported.')
+    elif isinstance(item, VideoUrl) and item.is_youtube:
+        raise UserError('Downloading YouTube videos is not supported.')
+
+    client = cached_async_http_client()
+    response = await client.get(item.url, follow_redirects=True)
+    response.raise_for_status()
+
+    if content_type := response.headers.get('content-type'):
+        content_type = content_type.split(';')[0]
+        if content_type == 'application/octet-stream':
+            content_type = None
+
+    media_type = content_type or item.media_type
+
+    data_type = media_type
+    if type_format == 'extension':
+        data_type = data_type.split('/')[1]
+
+    data = response.content
+    if data_format in ('base64', 'base64_uri'):
+        data = base64.b64encode(data).decode('utf-8')
+        if data_format == 'base64_uri':
+            data = f'data:{media_type};base64,{data}'
+        return DownloadedItem[str](data=data, data_type=data_type)
+    elif data_format == 'text':
+        return DownloadedItem[str](data=data.decode('utf-8'), data_type=data_type)
+    else:
+        return DownloadedItem[bytes](data=data, data_type=data_type)
+
+
 @cache
 def get_user_agent() -> str:
     """Get the user agent string for the HTTP client."""
 
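The media-type resolution and encoding steps inside `download_item` can be checked without any network access. This sketch lifts that logic into two hypothetical helpers, `resolve_data_type` and `to_data_uri`. The names and the split into two functions are assumptions made for illustration, while the branching mirrors the hunk above.

```python
import base64
from typing import Optional


def resolve_data_type(content_type_header: Optional[str], inferred_media_type: str, type_format: str = 'mime') -> str:
    """Pick the media type from the response header, falling back to the type inferred from the URL."""
    content_type = None
    if content_type_header:
        # Drop parameters such as "; charset=utf-8".
        content_type = content_type_header.split(';')[0]
        if content_type == 'application/octet-stream':
            # A generic binary type carries no information, so ignore it.
            content_type = None
    media_type = content_type or inferred_media_type
    if type_format == 'extension':
        # Keep only the subtype, e.g. "application/pdf" -> "pdf".
        return media_type.split('/')[1]
    return media_type


def to_data_uri(data: bytes, media_type: str) -> str:
    """Encode raw bytes as a base64 data URI, as the 'base64_uri' format does."""
    encoded = base64.b64encode(data).decode('utf-8')
    return f'data:{media_type};base64,{encoded}'
```

Note that the `'extension'` format is simply the MIME subtype, so a type such as `video/x-ms-wmv` yields `x-ms-wmv` rather than `wmv`.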
@@ -31,14 +31,7 @@
 from ..providers import Provider, infer_provider
 from ..settings import ModelSettings
 from ..tools import ToolDefinition
-from . import (
-    Model,
-    ModelRequestParameters,
-    StreamedResponse,
-    cached_async_http_client,
-    check_allow_model_requests,
-    get_user_agent,
-)
+from . import Model, ModelRequestParameters, StreamedResponse, check_allow_model_requests, download_item, get_user_agent
 
 try:
     from anthropic import NOT_GIVEN, APIStatusError, AsyncAnthropic, AsyncStream
@@ -372,11 +365,10 @@ async def _map_user_prompt(
                     if item.media_type == 'application/pdf':
                         yield BetaBase64PDFBlockParam(source={'url': item.url, 'type': 'url'}, type='document')
                     elif item.media_type == 'text/plain':
-                        response = await cached_async_http_client().get(item.url)
-                        response.raise_for_status()
+                        downloaded_item = await download_item(item, data_format='text')
                         yield BetaBase64PDFBlockParam(
                             source=BetaPlainTextSourceParam(
-                                data=response.text, media_type=item.media_type, type='text'
+                                data=downloaded_item['data'], media_type=item.media_type, type='text'
                             ),
                             type='document',
                         )
 
@@ -32,12 +32,7 @@
     UserPromptPart,
     VideoUrl,
 )
-from pydantic_ai.models import (
-    Model,
-    ModelRequestParameters,
-    StreamedResponse,
-    cached_async_http_client,
-)
+from pydantic_ai.models import Model, ModelRequestParameters, StreamedResponse, download_item
 from pydantic_ai.profiles import ModelProfileSpec
 from pydantic_ai.providers import Provider, infer_provider
 from pydantic_ai.providers.bedrock import BedrockModelProfile
@@ -55,6 +50,7 @@
         ConverseResponseTypeDef,
         ConverseStreamMetadataEventTypeDef,
         ConverseStreamOutputTypeDef,
+        DocumentBlockTypeDef,
         GuardrailConfigurationTypeDef,
         ImageBlockTypeDef,
         InferenceConfigurationTypeDef,
@@ -507,25 +503,37 @@ async def _map_user_prompt(part: UserPromptPart, document_count: Iterator[int])
                     else:
                         raise NotImplementedError('Binary content is not supported yet.')
                 elif isinstance(item, (ImageUrl, DocumentUrl, VideoUrl)):
-                    response = await cached_async_http_client().get(item.url)
-                    response.raise_for_status()
+                    downloaded_item = await download_item(item, data_format='bytes', type_format='extension')
+                    format = downloaded_item['data_type']
                     if item.kind == 'image-url':
                         format = item.media_type.split('/')[1]
                         assert format in ('jpeg', 'png', 'gif', 'webp'), f'Unsupported image format: {format}'
-                        image: ImageBlockTypeDef = {'format': format, 'source': {'bytes': response.content}}
+                        image: ImageBlockTypeDef = {'format': format, 'source': {'bytes': downloaded_item['data']}}
                         content.append({'image': image})
 
                     elif item.kind == 'document-url':
                         name = f'Document {next(document_count)}'
-                        data = response.content
-                        content.append({'document': {'name': name, 'format': item.format, 'source': {'bytes': data}}})
+                        document: DocumentBlockTypeDef = {
+                            'name': name,
+                            'format': item.format,
+                            'source': {'bytes': downloaded_item['data']},
+                        }
+                        content.append({'document': document})
 
                     elif item.kind == 'video-url':  # pragma: no branch
                         format = item.media_type.split('/')[1]
-                        assert format in ('mkv', 'mov', 'mp4', 'webm', 'flv', 'mpeg', 'mpg', 'wmv', 'three_gp'), (
-                            f'Unsupported video format: {format}'
-                        )
-                        video: VideoBlockTypeDef = {'format': format, 'source': {'bytes': response.content}}
+                        assert format in (
+                            'mkv',
+                            'mov',
+                            'mp4',
+                            'webm',
+                            'flv',
+                            'mpeg',
+                            'mpg',
+                            'wmv',
+                            'three_gp',
+                        ), f'Unsupported video format: {format}'
+                        video: VideoBlockTypeDef = {'format': format, 'source': {'bytes': downloaded_item['data']}}
                         content.append({'video': video})
                 elif isinstance(item, AudioUrl):  # pragma: no cover
                     raise NotImplementedError('Audio is not supported yet.')