diff --git a/peps/pep-0694.rst b/peps/pep-0694.rst index db2ef5fdb24..c07a09feada 100644 --- a/peps/pep-0694.rst +++ b/peps/pep-0694.rst @@ -1,6 +1,6 @@ PEP: 694 Title: Upload 2.0 API for Python Package Indexes -Author: Barry Warsaw , Donald Stufft +Author: Barry Warsaw , Donald Stufft , Ee Durbin Discussions-To: https://discuss.python.org/t/pep-694-upload-2-0-api-for-python-package-repositories/16879 Status: Draft Type: Standards Track @@ -14,22 +14,24 @@ Post-History: `27-Jun-2022 `__; +* "staging" a release, which can be used to test uploads before publicly publishing them, + without the need for `test.pypi.org `__; * artifacts which can be overwritten and replaced, until a session is published; -* asynchronous and "chunked", resumable file uploads, for more efficient use of network bandwidth; - * detailed status on the state of artifact uploads; * new project creation without requiring the uploading of an artifact. +* a protocol to extend the supported upload mechanisms in the future without requiring a full PEP; + these can be standardized and recommended for all indexes, or be index-specific; + Once this new upload API is adopted, the existing legacy API can be deprecated, however this PEP does not propose a deprecation schedule for the legacy API. @@ -49,10 +51,10 @@ In addition, there are a number of major issues with the legacy API: * It is fully synchronous, which forces requests to be held open both for the upload itself, and while the index processes the uploaded file to determine success or failure. -* It does not support any mechanism for resuming an upload. With the largest default file size on - PyPI being around 1GB in size, requiring the entire upload to complete successfully means - bandwidth is wasted when such uploads experience a network interruption while the request is in - progress. +* It does not support any mechanism for parallelizing or resuming an upload. With the largest + default file size on PyPI being around 1GB in size, requiring the entire upload to complete + successfully means bandwidth is wasted when such uploads experience a network interruption while + the request is in progress. * The atomic unit of operation is a single file. This is problematic when a release logically includes an sdist and multiple binary wheels, leading to race conditions where consumers get @@ -77,9 +79,12 @@ In addition, there are a number of major issues with the legacy API: * Creation of new projects requires the uploading of at least one file, leading to "stub" uploads to claim a project namespace. -The new upload API proposed in this PEP solves all of these problems, providing for a much more -flexible, bandwidth friendly approach, with better error reporting, a better release testing -experience, and atomic and simultaneous publishing of all release artifacts. +The new upload API proposed in this PEP provides an immediate solution to many of these problems, +and defines a flexible mechanism for future support of the other problems by extension. +Indexes implementing this API will provide better error reporting, +better release testing experience, +and atomic and simultaneous publishing of all release artifacts. +In the future indexes can implement resumable and parallel uploads via extensions. Legacy API @@ -96,8 +101,9 @@ The existing upload API lives at a base URL. For PyPI, that URL is currently ``https://upload.pypi.org/legacy/``. Clients performing uploads specify the API they want to call by adding an ``:action`` URL parameter with a value of ``file_upload``. 
[#fn-action]_ -The legacy API also has a ``protocol_version`` parameter, in theory allowing new versions of the API -to be defined. In practice this has never happened, and the value is always ``1``. +The legacy API also has a ``protocol_version`` parameter, +in theory allowing new versions of the API to be defined. +In practice this has never happened, and the value is always ``1``. Thus, the effective upload API on PyPI is: ``https://upload.pypi.org/legacy/?:action=file_upload&protocol_version=1``. @@ -108,8 +114,8 @@ Encoding The data to be submitted is submitted as a ``POST`` request with the content type of ``multipart/form-data``. This reflects the legacy API's historical nature, which was originally -designed not as an API, but rather as a web form on the initial PyPI implementation, with client code -written to programmatically submit that form. +designed not as an API, but rather as a web form on the initial PyPI implementation, +with client code written to programmatically submit that form. Content @@ -118,8 +124,8 @@ Content Roughly speaking, the metadata contained within the package is submitted as parts where the content disposition is ``form-data``, and the metadata key is the name of the field. The names of these various pieces of metadata are not documented, and they sometimes, but not always match the names -used in the ``METADATA`` files for package artifacts. The case rarely matches, and the ``form-data`` -to ``METADATA`` conversion is inconsistent. +used in the ``METADATA`` files for package artifacts. +The case rarely matches, and the ``form-data`` to ``METADATA`` conversion is inconsistent. The upload artifact file itself is sent as a ``application/octet-stream`` part with the name of ``content``, and if there is a PGP signature attached, then it will be included as a @@ -139,11 +145,6 @@ methods. Upload 2.0 API Specification ============================ -This PEP draws inspiration from the `Resumable Uploads for HTTP `_ internet draft, -however there are significant differences. This is largely due to the unique nature of Python -package releases (i.e. metadata, multiple related artifacts, etc.), and the support for an upload -session and release stages. Where it makes sense to adopt details of the draft, this PEP does so. - This PEP traces the root cause of most of the issues with the existing API to be roughly two things: - The metadata is submitted alongside the file, rather than being parsed from the @@ -155,11 +156,17 @@ This PEP traces the root cause of most of the issues with the existing API to be To address these issues, this PEP proposes a multi-request workflow, which at a high level involves these steps: -#. Initiate an upload session, creating a release stage. -#. Upload the file(s) to that stage as part of the upload session. -#. Complete the upload session, publishing or discarding the stage. -#. Optionally check the status of an upload session. +#. Initiate an :ref:`Publishing Session `, creating a release stage. +#. Initiate :ref:`File Upload Session(s) ` to that stage + as part of the Publishing Session. +#. Negotiate the specific :ref:`File Upload Mechanism ` to use + between client and server. +#. Execute File Upload Mechanism for the File Upload Session(s) using the negotiated mechanism(s). +#. Complete the File Upload Session(s), marking them as completed or canceled. +#. Complete the Publishing Session, publishing or discarding the stage. +#. Optionally check the status of a Publishing Session. +.. 
_versioning: Versioning ---------- @@ -170,6 +177,53 @@ PEP does not modify the legacy API in any way. The API proposed in this PEP therefor has the version number ``2.0``. +Content Types +------------- + +Like :pep:`691`, this PEP proposes that all requests and responses from this upload API will have a +standard content type that describes what the content is, what version of the API it represents, +and what serialization format has been used. + +This standard request content type applies to all requests *except* for requests to execute +a File Upload Mechanism, which will be specified by the documentation for that mechanism. + +The structure of the ``Content-Type`` header for all other requests is: + +.. code-block:: text + + application/vnd.pypi.upload.$version+$format + +Since minor API version differences should never be disruptive, only the major version is included +in the content type; the version number is prefixed with a ``v``. + +The major API version specified in the ``.meta.api-version`` JSON key of client requests +**MUST** match the ``Content-Type`` header for major version. + +Unlike :pep:`691`, this PEP does not change the existing *legacy* ``1.0`` upload API in any way, +so servers are required to host the new API described in this PEP at a different endpoint than the +existing upload API. + +Since JSON is the only defined request format defined in this PEP, all non-file-upload requests +defined in this PEP **MUST** include a ``Content-Type`` header value of: + +- ``application/vnd.pypi.upload.v2+json``. + +Similar to :pep:`691`, this PEP also standardizes on using server-driven content negotiation to +allow clients to request different versions or serialization formats, which includes the ``format`` +part of the content type. However, since this PEP expects the existing legacy ``1.0`` upload API +to exist at a different endpoint, and this PEP currently only provides for JSON serialization, this +mechanism is not particularly useful. +Clients only have a single version and serialization they can request. +However clients **SHOULD** be prepared to handle content negotiation gracefully +in the case that additional formats or versions are added in the future. + +Unless otherwise specified, all HTTP requests and responses in this document are assumed to include +the HTTP header: + +.. code-block:: text + + Content-Type: application/vnd.pypi.upload.v2+json + Root Endpoint ------------- @@ -178,18 +232,56 @@ All URLs described here are relative to the "root endpoint", which may be locate the url structure of a domain. For example, the root endpoint could be ``https://upload.example.com/``, or ``https://example.com/upload/``. -Specifically for PyPI, this PEP proposes to implement the root endpoint at -``https://upload.pypi.org/2.0``. This root URL will be considered provisional while the feature is -being tested, and will be blessed as permanent after sufficient testing with live projects. +The choice of the root endpoint is left up to the index operator. +.. _session-errors: -.. _session-create: +Errors +------ -Create an Upload Session -~~~~~~~~~~~~~~~~~~~~~~~~ +All error responses that contain content look like: -A release starts by creating a new upload session. To create the session, a client submits a ``POST`` request -to the root URL, with a payload that looks like: +.. code-block:: json + + { + "meta": { + "api-version": "2.0" + }, + "message": "...", + "errors": [ + { + "source": "...", + "message": "..." 
+ } + ] + } + +Besides the standard ``meta`` key, this has the following top level keys: + +``message`` + A singular message that encapsulates all errors that may have happened on this + request. + +``errors`` + An array of specific errors, each of which contains a ``source`` key, which is a string that + indicates what the source of the error is, and a ``message`` key for that specific error. + +The ``message`` and ``source`` strings do not have any specific meaning, and are intended for human +interpretation to aid in diagnosing underlying issue. + + +.. _publishing-session: + +Publishing Session +------------------ + +.. _publishing-session-create: + +Create a Publishing Session +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A release starts by creating a new Publishing Session. To create the session, a client submits a +``POST`` request to the root URL like: .. code-block:: json @@ -216,31 +308,34 @@ The request includes the following top-level keys: The version of the project that this session is attempting to add files to. ``nonce`` (**optional**) - An additional client-side string input to the :ref:`"session token" ` - algorithm. Details are provided below, but if this key is omitted, it is equivalent - to passing the empty string. + An additional client-side string input to the + :ref:`"Publishing Session Token" ` algorithm. + Details are provided below, but if this key is omitted, + it is equivalent to passing the empty string. Upon successful session creation, the server returns a ``201 Created`` response. If an error occurs, the appropriate ``4xx`` code will be returned, as described in the :ref:`session-errors` section. -If a session is created for a project which has no previous release, then the index **MAY** reserve -the project name before the session is published, however it **MUST NOT** be possible to navigate to -that project using the "regular" (i.e. :ref:`unstaged `) access protocols, *until* -the stage is published. If this first-release stage gets canceled, then the index **SHOULD** delete -the project record, as if it were never uploaded. +If a session is created for a project which has no previous release, +then the index **MAY** reserve the project name before the session is published, +however it **MUST NOT** be possible to navigate to that project using +the "regular" (i.e. :ref:`unstaged `) access protocols, +*until* the stage is published. +If this first-release stage gets canceled, +then the index **SHOULD** delete the project record, as if it were never uploaded. -The session is owned by the user that created it, and all subsequent requests **MUST** be performed -with the same credentials, otherwise a ``403 Forbidden`` will be returned on those subsequent -requests. +The session is owned by the user that created it, +and all subsequent requests **MUST** be performed with the same credentials, +otherwise a ``403 Forbidden`` will be returned on those subsequent requests. -.. _session-response: +.. _publishing-session-response: Response Body +++++++++++++ -The successful response includes the following JSON content: +The successful response includes the following content: .. 
code-block:: json @@ -253,6 +348,7 @@ The successful response includes the following JSON content: "upload": "...", "session": "...", }, + "mechanisms": ["http-post-bytes"], "session-token": "", "valid-for": 604800, "status": "pending", @@ -267,14 +363,18 @@ Besides the ``meta`` key, which has the same format as the request JSON, the suc the following keys: ``links`` - A dictionary mapping :ref:`keys to URLs ` related to this session, the details of - which are provided below. + A dictionary mapping :ref:`keys to URLs ` related to this session, + the details of which are provided below. + +``mechanisms`` + A list of file-upload mechanisms supported by the server, sorted in server-preferred order. + At least one value is required. ``session-token`` - If the index supports :ref:`previewing staged releases `, this key will contain - the unique :ref:`"session token" ` that can be provided to installers in order to - preview the staged release before it's published. If the index does *not* support stage - previewing, this key **MUST** be omitted. + If the index supports :ref:`previewing staged releases `, + this key will contain the unique :ref:`"session token" ` + that can be provided to installers in order to preview the staged release before it's published. + If the index does *not* support stage previewing, this key **MUST** be omitted. ``valid-for`` An integer representing how long, in seconds, until the server itself will expire this session, @@ -283,8 +383,9 @@ the following keys: :ref:`extended `. The session **SHOULD** live at least this much longer unless the client itself has canceled or published the session. Servers **MAY** choose to *increase* this time, but should never *decrease* it, except naturally through the passage of - time. Clients can query the :ref:`session status ` to get time remaining in the - session. + time. + Clients can query the :ref:`session status ` + to get time remaining in the session. ``status`` A string that contains one of ``pending``, ``published``, ``error``, or ``canceled``, @@ -292,7 +393,7 @@ the following keys: ``files`` A mapping containing the filenames that have been uploaded to this session, to a mapping - containing details about each :ref:`file referenced in this session `. + containing details about each :ref:`file referenced in this session `. ``notices`` An optional key that points to an array of human-readable informational notices that the server @@ -300,16 +401,16 @@ the following keys: to any particular file in the session. -.. _session-links: +.. _publishing-session-links: -Session Links -+++++++++++++ +Publishing Session Links +++++++++++++++++++++++++ For the ``links`` key in the success JSON, the following sub-keys are valid: ``upload`` - The endpoint session clients will use to initiate :ref:`uploads ` for each file to - be included in this session. + The endpoint session clients will use to initiate a :ref:`File Upload Session ` + for each file to be included in this session. ``stage`` The endpoint where this staged release can be :ref:`previewed ` prior to @@ -317,36 +418,39 @@ For the ``links`` key in the success JSON, the following sub-keys are valid: the index does not support previewing staged releases, this key **MUST** be omitted. 
``session`` - The endpoint where actions for this session can be performed, including :ref:`publishing this - session `, :ref:`canceling and discarding the session `, - :ref:`querying the current session status `, and :ref:`requesting an extension - of the session lifetime ` (*if* the server supports it). + The endpoint where actions for this session can be performed, + including :ref:`publishing this session `, + :ref:`canceling and discarding the session `, + :ref:`querying the current session status `, + and :ref:`requesting an extension of the session lifetime ` + (*if* the server supports it). -.. _session-files: +.. _publishing-session-files: -Session Files -+++++++++++++ +Publishing Session Files +++++++++++++++++++++++++ The ``files`` key contains a mapping from the names of the files uploaded in this session to a sub-mapping with the following keys: ``status`` - A string with valid values ``partial``, ``pending``, ``complete``, and ``error``. If a file - upload has not seen an ``Upload-Complete: ?1`` header, then ``partial`` will be returned. If - ``Upload-Complete: ?1`` resulted in a ``202 Accepted``, then ``pending`` will be returned until - asynchronous processing of the last chunk and the full file has been completed. If a ``201 - Created`` was returned, or the last chunk processing is finished, ``complete`` will be returned. - If there was an error during upload, then clients should not assume the file is in any usable - state, ``error`` will be returned and it's best to :ref:`cancel or delete ` - the file and start over. This action would remove the file name from the ``files`` key of the - :ref:`session status response body `. + A string with valid values + ``pending``, ``processing``, ``complete``, ``error``, and ``canceled``. + If there was an error during upload, + then clients should not assume the file is in any usable state, + ``error`` will be returned and it's best to + :ref:`cancel or delete ` the file and start over. + This action would remove the file name from the ``files`` key of the + :ref:`session status response body `. ``link`` - The *absolute* URL that the client should use to reference this specific file. This URL is used - to retrieve, replace, or delete the :ref:`referenced file `. If a ``nonce`` was - provided, this URL **MUST** be obfuscated with a non-guessable token as described in the - :ref:`session token ` section. + The *absolute* URL that the client should use to reference this specific file. + This URL is used to retrieve, replace, or delete + the :ref:`referenced file `. + If a ``nonce`` was provided, this URL **MUST** be obfuscated + with a non-guessable token as described in the + :ref:`Publishing Session Token ` section. ``notices`` An optional key with similar format and semantics as the ``notices`` session key, except that @@ -354,21 +458,19 @@ sub-mapping with the following keys: If a second session is created for the same name-version pair while a session for that pair is in the ``pending`` state, then the server **MUST** return the JSON status response for the already -existing session, along with the ``200 Ok`` status code rather than creating a new, empty session. +existing session, along with the ``200 OK`` status code rather than creating a new, empty session. -.. _file-uploads: +.. 
_publishing-session-completion: -File Upload -~~~~~~~~~~~ +Complete a Publishing Session +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -After creating the session, the ``upload`` endpoint from the response's :ref:`session links -` mapping is used to begin the upload of new files into that session. Clients -**MUST** use the provided ``upload`` URL and **MUST NOT** assume there is any pattern or commonality -to those URLs from one session to the next. +To complete a session and publish the files that have been included in it, a client issues a +``POST`` request to the ``session`` :ref:`link ` +given in the :ref:`session creation response body `. -To initiate a file upload, a client first sends a ``POST`` request to the ``upload`` URL. The -request body has the following JSON format: +The request looks like: .. code-block:: json @@ -376,275 +478,331 @@ request body has the following JSON format: "meta": { "api-version": "2.0" }, - "filename": "foo-1.0.tar.gz", - "size": 1000, - "hashes": {"sha256": "...", "blake2b": "..."}, - "metadata": "..." + "action": "publish", } -Besides the standard ``meta`` key, the request JSON has the following additional keys: - -``filename`` (**required**) - The name of the file being uploaded. - -``size`` (**required**) - The size in bytes of the file being uploaded. - -``hashes`` (**required**) - A mapping of hash names to hex-encoded digests. Each of these digests are the checksums of the - file being uploaded when hashed by the algorithm identified in the name. +If the server is able to immediately complete the Publishing Session, it may do so and return a +``201 Created`` response. If it is unable to immediately complete the Publishing Session +(for instance, if it needs to do validation that may take longer than reasonable in a single HTTP +request), then it may return a ``202 Accepted`` response. - By default, any hash algorithm available in `hashlib - `_ can be used as a key for the hashes - dictionary [#fn-hash]_. At least one secure algorithm from ``hashlib.algorithms_guaranteed`` - **MUST** always be included. This PEP specifically recommends ``sha256``. +In either case, the server should include a ``Location`` header pointing back to +the Publishing Session status URL, +and if the server returned a ``202 Accepted``, +the client may poll that URL to watch for the status to change. - Multiple hashes may be passed at a time, but all hashes provided **MUST** be valid for the file. +If an error occurs, the appropriate ``4xx`` code should be returned, as described in the +:ref:`session-errors` section. -``metadata`` (**optional**) - If given, this is a string value containing the file's `core metadata - `_. +.. _publishing-session-cancellation: -Servers **MAY** use the data provided in this request to do some sanity checking prior to allowing -the file to be uploaded. These checks may include, but are not limited to: +Cancellation +~~~~~~~~~~~~ -- checking if the ``filename`` already exists in a published release; +To cancel a Publishing Session, a client issues a ``DELETE`` request to +the ``session`` :ref:`link ` +given in the :ref:`session creation response body `. +The server then marks the session as canceled, and **SHOULD** purge any data that was uploaded +as part of that session. +Future attempts to access that session URL or any of the Publishing Session URLs +**MUST** return a ``404 Not Found``. 
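+
+The following non-normative sketch illustrates one way a client might publish or cancel a
+Publishing Session using only the Python standard library.
+It assumes ``session_url`` is the ``session`` link returned when the session was created,
+and that a hypothetical bearer token is used for authentication;
+error handling and polling of a ``202 Accepted`` response are elided.
+
+.. code-block:: python
+
+    import json
+    import urllib.request
+
+    CONTENT_TYPE = "application/vnd.pypi.upload.v2+json"
+
+    def publish_session(session_url: str, token: str) -> int:
+        """Publish the session by POSTing the ``publish`` action to the ``session`` link."""
+        body = json.dumps({"meta": {"api-version": "2.0"}, "action": "publish"}).encode()
+        request = urllib.request.Request(
+            session_url,
+            data=body,
+            method="POST",
+            headers={"Content-Type": CONTENT_TYPE, "Authorization": f"Bearer {token}"},
+        )
+        with urllib.request.urlopen(request) as response:
+            # 201 Created: published immediately; 202 Accepted: poll the URL in the
+            # Location header until the session status changes.
+            return response.status
+
+    def cancel_session(session_url: str, token: str) -> int:
+        """Cancel the session (discarding its stage) with a DELETE to the ``session`` link."""
+        request = urllib.request.Request(
+            session_url,
+            method="DELETE",
+            headers={"Authorization": f"Bearer {token}"},
+        )
+        with urllib.request.urlopen(request) as response:
+            # Any 2xx status indicates the session was canceled.
+            return response.status
+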
-- checking if the ``size`` would exceed any project or file quota; +To prevent dangling sessions, servers may also choose to cancel timed-out sessions on their own +accord. It is recommended that servers expunge their sessions after no less than a week, but each +server may choose their own schedule. Servers **MAY** support client-directed :ref:`session +extensions `. -- checking if the contents of the ``metadata``, if provided, are valid. -If the server determines that upload should proceed, it will return a ``201 Created`` response, with -an empty body, and a ``Location`` header pointing to the URL that the file content should be -uploaded to. The :ref:`status ` of the session will also include the filename in -the ``files`` mapping, with the above ``Location`` URL included in under the ``link`` sub-key. -If the server determines the upload cannot proceed, it **MUST** return a ``409 Conflict``. The -server **MAY** allow parallel uploads of files, but is not required to. +.. _publishing-session-token: +Publishing Session Token +~~~~~~~~~~~~~~~~~~~~~~~~ -.. IMPORTANT:: +When creating a Publishing Session, clients can provide a ``nonce`` in the +:ref:`initial session creation request `. +This nonce is a string with arbitrary content. The ``nonce`` is +optional, and if omitted, is equivalent to providing an empty string. - The `IETF draft `_ calls this the URL of the `upload resource - `_, and this PEP uses that nomenclature as well. +In order to support previewing of staged uploads, the package ``name`` and ``version``, along with +this ``nonce`` are used as input into a hashing algorithm to produce a unique "session token". +This session token is valid for the life of the session +(i.e., until it is completed, either by cancellation or publishing), +and can be provided to supporting installers to gain access to the staged release. + +The use of the ``nonce`` allows clients to decide whether they want to +obscure the visibility of their staged releases or not, +and there can be good reasons for either choice. +For example, if a CI system wants to upload some wheels for a new release, +and wants to allow independent validation of a stage before it's published, +the client may opt for not including a nonce. +On the other hand, if a client would like to pre-seed a release which it publishes atomically +at the time of a public announcement, +that client will likely opt for providing a nonce. -.. _ietf-upload-resource: https://www.ietf.org/archive/id/draft-ietf-httpbis-resumable-upload-05.html#name-upload-creation-2 +The `SHA256 algorithm `_ is used to +turn these inputs into a unique token, in the order ``name``, ``version``, ``nonce``, using the +following Python code as an example: +.. code-block:: python -.. _upload-contents: + from hashlib import sha256 -Upload File Contents -++++++++++++++++++++ + def gentoken(name: bytes, version: bytes, nonce: bytes = b''): + h = sha256() + h.update(name) + h.update(version) + h.update(nonce) + return h.hexdigest() -The actual file contents are uploaded by issuing a ``POST`` request to the upload resource URL -[#fn-location]_. The client may either upload the entire file in a single request, or it may opt -for "chunked" upload where the file contents are split into multiple requests, as described below. +It should be evident that if no ``nonce`` is provided in the +:ref:`session creation request `, +then the session token is easily guessable from the package name and version number alone. 
+Clients can elect to omit the ``nonce`` (or set it to the empty string themselves) +if they want to allow previewing from anybody without access to the session token. +By providing a non-empty ``nonce``, +clients can elect for security-through-obscurity, +but this does not protect staged files behind any kind of authentication. -.. IMPORTANT:: - The protocol defined in this PEP differs from the `IETF draft `_ in a few ways: +File Upload Session +------------------- - * For chunked uploads, the `second and subsequent chunks `_ are uploaded - using a ``POST`` request instead of ``PATCH`` requests. Similarly, this PEP uses - ``application/octet-stream`` for the ``Content-Type`` headers for all chunks. +.. _file-upload-session: - * No ``Upload-Draft-Interop-Version`` header is required. +Create a File Upload Session +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - * Some of the server responses are different. +After creating a Publishing Session, the ``upload`` endpoint from the response's +:ref:`session links ` mapping +is used to begin the upload of new files into that session. +Clients **MUST** use the provided ``upload`` URL and +**MUST NOT** assume there is any pattern or commonality to those URLs from one session to the next. -.. _ietf-upload-append: https://www.ietf.org/archive/id/draft-ietf-httpbis-resumable-upload-05.html#name-upload-append-2 +To initiate a file upload, a client first sends a ``POST`` request to the ``upload`` URL. +The request looks like: +.. code-block:: json -When uploading the entire file in a single request, the request **MUST** include the following -headers (e.g. for a 100,000 byte file): + { + "meta": { + "api-version": "2.0" + }, + "filename": "foo-1.0.tar.gz", + "size": 1000, + "hashes": {"sha256": "...", "blake2b": "..."}, + "metadata": "...", + "mechanism": "http-post-bytes" + } -.. code-block:: email - Content-Length: 100000 - Content-Type: application/octet-stream - Upload-Length: 100000 - Upload-Complete: ?1 +Besides the standard ``meta`` key, the request JSON has the following additional keys: -The body of this request contains all 100,000 bytes of the unencoded raw binary data. +``filename`` (**required**) + The name of the file being uploaded. -``Content-Length`` - The number of file bytes contained in the body of *this* request. +``size`` (**required**) + The size in bytes of the file being uploaded. -``Content-Type`` - **MUST** be ``application/octet-stream``. +``hashes`` (**required**) + A mapping of hash names to hex-encoded digests. Each of these digests are the checksums of the + file being uploaded when hashed by the algorithm identified in the name. -``Upload-Length`` - Indicates the total number of bytes that will be uploaded for this file. For single-request - uploads this will always be equal to ``Content-Length``, but these values will likely differ for - chunked uploads. This value **MUST** equal the number of bytes given in the ``size`` field of - the file upload initiation request. + By default, any hash algorithm available in `hashlib + `_ can be used as a key for the hashes + dictionary [#fn-hash]_. At least one secure algorithm from ``hashlib.algorithms_guaranteed`` + **MUST** always be included. This PEP specifically recommends ``sha256``. -``Upload-Complete`` - A flag indicating whether more chunks are coming for this file. For single-request uploads, the - value of this header **MUST** be ``?1``. + Multiple hashes may be passed at a time, but all hashes provided **MUST** be valid for the file. 
-If the upload completes successfully, the server **MUST** respond with a ``201 Created`` status. -The response body has no content. +``mechanism`` (**required**) + The file-upload mechanisms the client intends to use for this file. + This mechanism **SHOULD** be chosen from the list of mechanisms advertised in the + :ref:`Publishing Session response body `. + A client **MAY** send a mechanism that is not advertised in cases where server operators have + documented a new or up-coming mechanism that is available for use on a "pre-release" basis. -If this single-request upload fails, the entire file must be resent in another single HTTP request. -This is the recommended, preferred format for file uploads since fewer requests are required. +``metadata`` (**optional**) + If given, this is a string value containing the file's `core metadata + `_. -As an example, if the client was to upload a 100,000 byte file, the headers would look like: +Servers **MAY** use the data provided in this request to do some sanity checking prior to allowing +the file to be uploaded. These checks may include, but are not limited to: -.. code-block:: email +- checking if the ``filename`` already exists in a published release; - Content-Length: 100000 - Content-Type: application/octet-stream - Upload-Length: 100000 - Upload-Complete: ?1 +- checking if the ``size`` would exceed any project or file quota; -Clients can opt to upload the file in multiple chunks. Because the upload resource URL provided in -the metadata response will be unique per file, clients **MUST** use the given upload resource URL -for all chunks. Clients upload file chunks by sending multiple ``POST`` requests to this URL, with -one request per chunk. +- checking if the contents of the ``metadata``, if provided, are valid. -For chunked uploads, the ``Content-Length`` is equal to the size in bytes of the chunk that is -currently being sent. The client **MUST** include a ``Upload-Offset`` header which indicates the -byte offset that the content included in this chunk's request starts at, and an ``Upload-Complete`` -header with the value ``?0``. For the first chunk, the ``Upload-Offset`` header **MUST** be set to -``0``. As with single-request uploads, the ``Content-Type`` header is ``application/octet-stream`` -and the body is the raw, unencoded bytes of the chunk. +If the server determines that upload should proceed, it will return a ``202 Accepted`` response, +with the response body below. +The :ref:`status ` of the session will also include +the filename in the ``files`` mapping. +If the server determines the upload cannot proceed, +it **MUST** return a ``409 Conflict``. +The server **MAY** allow parallel uploads of files, but is not required to. +If the server cannot proceed with an upload because +the ``mechanism`` supplied by the client is not supported +it **MUST** return a ``422 Unprocessable Entity``. -For example, if uploading a 100,000 byte file in 1000 byte chunks, the first chunk's request headers -would be: +.. _file-upload-session-response: -.. code-block:: email +Response Body ++++++++++++++ - Content-Length: 1000 - Content-Type: application/octet-stream - Upload-Offset: 0 - Upload-Length: 100000 - Upload-Complete: ?0 +The successful response includes the following: -For the second chunk representing bytes 1000 through 1999, include the following headers: +.. code-block:: json -.. code-block:: email + { + "meta": { + "api-version": "2.0" + }, + "links": { + "publishing-session": "...", + "file-upload-session": "..." 
+ }, + "status": "pending", + "valid-for": 3600, + "mechanism": { + "identifier": "http-post-bytes", + "file_url": "...", + "attestations_url": "..." + } + } - Content-Length: 1000 - Content-Type: application/octet-stream - Upload-Offset: 1000 - Upload-Length: 100000 - Upload-Complete: ?0 -These requests would continue sequentially until the last chunk is ready to be uploaded. +Besides the ``meta`` key, which has the same format as the request JSON, the success response has +the following keys: -For each successful chunk, the server **MUST** respond with a ``202 Accepted`` header, except for -the final chunk, which **MUST** be either: +``links`` + A dictionary mapping :ref:`keys to URLs ` related to this session, + the details of which are provided below. -* ``201 Created`` if the server accepts and processes the last chunk synchronously, completing the - file upload. -* ``202 Accepted`` if the server accepts the last chunk, but must process it asynchronously. In - this case, the client should query the :ref:`session status ` periodically until - the uploaded :ref:`file status ` transitions to ``complete``. +``mechanism`` + A mapping containing the necessary details for the supported mechanism + as negotiated by the client and server. + This mapping **MUST** contain a key ``identifier`` which maps to + the identifier string for the File Upload Mechanism. -The final chunk of data **MUST** include the ``Upload-Complete: ?1`` header, since at that point the -entire file has been uploaded. +.. _file-upload-session-links: -With both chunked and non-chunked uploads, once completed successfully, the file **MUST NOT** be -publicly visible in the repository, but merely staged until the upload session is :ref:`completed -`. If the server supports :ref:`previews `, the file **MUST** be -visible at the ``stage`` :ref:`URL `. Partially uploaded chunked files **SHOULD -NOT** be visible at the ``stage`` URL. +File Upload Session Links ++++++++++++++++++++++++++ -The following constraints are placed on uploads regardless of whether they are single chunk or -multiple chunks: +For the ``links`` key in the success JSON, the following sub-keys are valid: -- A client **MUST NOT** perform multiple ``POST`` requests in parallel for the same file to avoid - race conditions and data loss or corruption. +``publishing-session`` + The endpoint where actions for the parent Publishing Session can be performed. -- If the offset provided in ``Upload-Offset`` is not ``0`` and does not correctly specify the byte - offset of the next chunk in an incomplete upload, then the server **MUST** respond with a ``409 - Conflict``. This means that a client **MUST NOT** upload chunks out of order. +``file-upload-session`` + The endpoint where actions for this file-upload-session can be performed. + including :ref:`canceling and discarding the File Upload Session `, + :ref:`querying the current File Upload Session status `, + and :ref:`requesting an extension of the File Upload Session lifetime ` + (*if* the server supports it). -- Once a file upload has completed successfully, you may initiate another upload for that file, - which **once completed**, will replace that file. This is possible until the entire session is - completed, at which point no further file uploads (either creating or replacing a session file) - are accepted. I.e. once a session is published, the files included in that release are immutable - [#fn-immutable]_. +.. 
_file-upload-session-completion: +Complete a File Upload Session +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Resume an Upload -++++++++++++++++ +To complete a File Upload Session, which indicates that the file upload mechanism has been executed +and did not produce an error, a client issues a ``POST`` to the ``file-upload-session`` link in the +File Upload Session creation response body. -To resume an upload, you first have to know how much of the file's contents the server has already -received. If this is not already known, a client can make a ``HEAD`` request to the upload resource -URL. +The request looks like: -The server **MUST** respond with a ``204 No Content`` response, with an ``Upload-Offset`` header -that indicates what offset the client should continue uploading from. If the server has not received -any data, then this would be ``0``, if it has received 1007 bytes then it would be ``1007``. For -this example, the full response headers would look like: +.. code-block:: json -.. code-block:: email + { + "meta": { + "api-version": "2.0" + }, + "action": "complete" + } - Upload-Offset: 1007 - Upload-Complete: ?0 - Cache-Control: no-store +If the server is able to immediately complete the File Upload Session, it may do so and return a +``201 Created`` response and set the status of the File Upload Session to ``complete``. +If it is unable to immediately complete the File Upload Session +(for instance, if it needs to do validation that may take longer than reasonable in a single HTTP +request), then it may return a ``202 Accepted`` response +and set the status of the File Upload Session to ``processing``. +In either case, the server should include a ``Location`` header pointing back to the File Upload +Session status URL, and if the server returned a ``202 Accepted``, the client may poll that URL to +watch for the status to change. -Once the client has retrieved the offset that they need to start from, they can upload the rest of -the file as described above, either in a single request containing all of the remaining bytes, or in -multiple chunks as per the above protocol. +If an error occurs, the appropriate ``4xx`` code should be returned, as described in the +:ref:`session-errors` section. -.. _cancel-an-upload: +.. _file-upload-session-cancelation: -Canceling and Deleting File Uploads -+++++++++++++++++++++++++++++++++++ +Cancellation and Deletion +~~~~~~~~~~~~~~~~~~~~~~~~~ -A client can cancel an in-progress upload for a file, or delete a file that has been completely -uploaded. In both cases, the client performs this by issuing a ``DELETE`` request to the upload -resource URL of the file they want to delete. +A client can cancel an in-progress File Upload Session, or delete a file that has been +completely uploaded. In both cases, the client performs this by issuing a ``DELETE`` request to +the File Upload Session URL of the file they want to delete. A successful deletion request **MUST** response with a ``204 No Content``. -Once canceled or deleted, a client **MUST NOT** assume that the previous upload resource URL can be reused. +Once canceled or deleted, a client **MUST NOT** assume that +the previous File Upload Session resource +or associated file upload mechanisms +can be reused. Replacing a Partially or Fully Uploaded File -++++++++++++++++++++++++++++++++++++++++++++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To replace a session file, the file upload **MUST** have been previously completed, canceled, or deleted.
It is not possible to replace a file if the upload for that file is in-progress. -To replace a session file, clients should :ref:`cancel and delete the in-progress upload -` by issuing a ``DELETE`` to the upload resource URL for the file they want to -replace. After this, the new file upload can be initiated by beginning the entire :ref:`file upload -` sequence over again. This means providing the metadata request again to retrieve a -new upload resource URL. Client **MUST NOT** assume that the previous upload resource URL can be -reused after deletion. +To replace a session file, clients should +:ref:`cancel and delete the in-progress upload ` by +issuing a ``DELETE`` to the upload resource URL for the file they want to replace. +After this, the new file upload can be initiated by beginning +the entire :ref:`file upload ` sequence over again. +This means providing the metadata request again to retrieve a new upload resource URL. +Clients **MUST NOT** assume that the previous upload resource URL can be reused after deletion. .. _session-status: Session Status -~~~~~~~~~~~~~~ +-------------- -At any time, a client can query the status of the session by issuing a ``GET`` request to the -``session`` :ref:`link ` given in the :ref:`session creation response body -`. +At any time, a client can query the status of a session by issuing a ``GET`` request to the +``publishing-session`` :ref:`link ` +or ``file-upload-session`` :ref:`link ` +given in the :ref:`session creation response body ` +or :ref:`File Upload Session creation response body `, +respectively. -The server will respond to this ``GET`` request with the same :ref:`response ` -that they got when they initially created the upload session, except with any changes to ``status``, -``valid-for``, or ``files`` reflected. +The server will respond to this ``GET`` request with the same +:ref:`Publishing Session creation response body ` +or :ref:`File Upload Session creation response body `, +that they got when they initially created the Publishing Session or File Upload Session, +except with any changes to ``status``, ``valid-for``, or ``files`` reflected. .. _session-extension: Session Extension -~~~~~~~~~~~~~~~~~ +----------------- Servers **MAY** allow clients to extend sessions, but the overall lifetime and number of extensions allowed is left to the server. To extend a session, a client issues a ``POST`` request to the -``session`` :ref:`link ` given in the :ref:`session creation response body -`. +``publishing-session`` :ref:`link ` +or ``file-upload-session`` :ref:`link ` +given in the :ref:`Publishing Session creation response body ` +or :ref:`File Upload Session creation response body `, +respectively. -The JSON body of this request looks like: +The request looks like: .. code-block:: json @@ -652,144 +810,37 @@ The JSON body of this request looks like: "meta": { "api-version": "2.0" }, - ":action": "extend", + "action": "extend", "extend-for": 3600 } The number of seconds specified is just a suggestion to the server for the number of additional seconds to extend the current session. For example, if the client wants to extend the current session for another hour, ``extend-for`` would be ``3600``. Upon successful extension, the server -will respond with the same :ref:`response ` that they got when they initially -created the upload session, except with any changes to ``status``, ``valid-for``, or ``files`` -reflected. 
+will respond with the same +:ref:`Publishing Session creation response body ` +or :ref:`File Upload Session creation response body `, +that they got when they initially created the Publishing Session or File Upload Session, +except with any changes to ``status``, ``valid-for``, or ``files`` reflected. If the server refuses to extend the session for the requested number of seconds, it still returns a success response, and the ``valid-for`` key will simply include the number of seconds remaining in the current session. - -.. _session-cancellation: - -Session Cancellation -~~~~~~~~~~~~~~~~~~~~ - -To cancel an entire session, a client issues a ``DELETE`` request to the ``session`` :ref:`link -` given in the :ref:`session creation response body `. The server -then marks the session as canceled, and **SHOULD** purge any data that was uploaded as part of that -session. Future attempts to access that session URL or any of the upload session URLs **MUST** -return a ``404 Not Found``. - -To prevent dangling sessions, servers may also choose to cancel timed-out sessions on their own -accord. It is recommended that servers expunge their sessions after no less than a week, but each -server may choose their own schedule. Servers **MAY** support client-directed :ref:`session -extensions `. - - -.. _publish-session: - -Session Completion -~~~~~~~~~~~~~~~~~~ - -To complete a session and publish the files that have been included in it, a client issues a -``POST`` request to the ``session`` :ref:`link ` given in the :ref:`session creation -response body `. - -The JSON body of this request looks like: - -.. code-block:: json - - { - "meta": { - "api-version": "2.0" - }, - ":action": "publish", - } - - -If the server is able to immediately complete the session, it may do so and return a ``201 Created`` -response. If it is unable to immediately complete the session (for instance, if it needs to do -processing that may take longer than reasonable in a single HTTP request), then it may return a -``202 Accepted`` response. - -In either case, the server should include a ``Location`` header pointing back to the session status -URL, and if the server returned a ``202 Accepted``, the client may poll that URL to watch for the -status to change. - -If a session is published that has no staged files, the operation is effectively a no-op, except -where a new project name is being reserved. In this case, the new project is created, reserved, and -owned by the user that created the session. - -If an error occurs, the appropriate ``4xx`` code should be returned, as described in the -:ref:`session-errors` section. - - -.. _session-token: - -Session Token -~~~~~~~~~~~~~ - -When creating a session, clients can provide a ``nonce`` in the :ref:`initial session creation -request ` . This nonce is a string with arbitrary content. The ``nonce`` is -optional, and if omitted, is equivalent to providing an empty string. - -In order to support previewing of staged uploads, the package ``name`` and ``version``, along with -this ``nonce`` are used as input into a hashing algorithm to produce a unique "session token". This -session token is valid for the life of the session (i.e., until it is completed, either by -cancellation or publishing), and can be provided to supporting installers to gain access to the -staged release. - -The use of the ``nonce`` allows clients to decide whether they want to obscure the visibility of -their staged releases or not, and there can be good reasons for either choice. 
For example, if a CI -system wants to upload some wheels for a new release, and wants to allow independent validation of a -stage before it's published, the client may opt for not including a nonce. On the other hand, if a -client would like to pre-seed a release which it publishes atomically at the time of a public -announcement, that client will likely opt for providing a nonce. - -The `SHA256 algorithm `_ is used to -turn these inputs into a unique token, in the order ``name``, ``version``, ``nonce``, using the -following Python code as an example: - -.. code-block:: python - - from hashlib import sha256 - - def gentoken(name: bytes, version: bytes, nonce: bytes = b''): - h = sha256() - h.update(name) - h.update(version) - h.update(nonce) - return h.hexdigest() - -It should be evident that if no ``nonce`` is provided in the :ref:`session creation request -`, then the preview token is easily guessable from the package name and version -number alone. Clients can elect to omit the ``nonce`` (or set it to the empty string themselves) if -they want to allow previewing from anybody without access to the preview token. By providing a -non-empty ``nonce``, clients can elect for security-through-obscurity, but this does not protect -staged files behind any kind of authentication. - - .. _staged-preview: Stage Previews -~~~~~~~~~~~~~~ +-------------- The ability to preview staged releases before they are published is an important feature of this PEP, enabling an additional level of last-mile testing before the release is available to the public. Indexes **MAY** provide this functionality through the URL provided in the ``stage`` -sub-key of the :ref:`links key ` returned when the session is created. The ``stage`` -URL can be passed to installers such as ``pip`` by setting the `--extra-index-url +sub-key of the :ref:`links key ` returned when +the Publishing Session is created. +The ``stage`` URL can be passed to installers such as ``pip`` by setting the `--extra-index-url `__ flag to this value. Multiple stages can even be previewed by repeating this flag with multiple values. -In the future, it may be valuable to include something like a ``Stage-Token`` header to the `Simple -Repository API `_ -requests or the :pep:`691` JSON-based Simple API, with the value from the ``session-token`` sub-key -of the JSON response to the session creation request. Multiple ``Stage-Token`` headers could be -allowed, and installers could support enabling stage previews by adding a ``--staged `` or -similarly named option to set the ``Stage-Token`` header at the command line. This feature is not -currently support, nor proposed by this PEP, though it could be proposed by a separate PEP in the -future. - In either case, the index will return views that expose the staged releases to the installer tool, making them available to download and install into virtual environments built for that last-mile testing. The former option allows for existing installers to preview staged releases with no @@ -797,125 +848,103 @@ changes, although perhaps in a less user-friendly way. The latter option can be experience, but the details of this are left to installer tool maintainers. -.. _session-errors: +.. _file-upload-mechanisms: -Errors ------- +File Upload Mechanisms +---------------------- -All error responses that contain content will have a body that looks like: +Servers **MUST** implement :ref:`required file upload mechanisms `. +Such mechanisms serve as a fallback if no server specific implementations exist. -.. 
code-block:: json +Each major version of the Upload API **MUST** specify at least one required File Upload Mechanism. - { - "meta": { - "api-version": "2.0" - }, - "message": "...", - "errors": [ - { - "source": "...", - "message": "..." - } - ] - } +New required mechanisms **MUST NOT** be added +and existing required mechanisms **MUST NOT** be removed +without an update to the :ref:`major version `. -Besides the standard ``meta`` key, this has the following top level keys: +.. _required-file-upload-mechanisms: -``message`` - A singular message that encapsulates all errors that may have happened on this - request. +Required File Upload Mechanisms +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -``errors`` - An array of specific errors, each of which contains a ``source`` key, which is a string that - indicates what the source of the error is, and a ``message`` key for that specific error. +``http-post-bytes`` ++++++++++++++++++++ -The ``message`` and ``source`` strings do not have any specific meaning, and are intended for human -interpretation to aid in diagnosing underlying issue. +A client executes this mechanism by submitting a ``POST`` request to the ``file_url`` +returned in the ``http-post-bytes`` map of the ``mechanism`` map of the +:ref:`File Upload Session creation response body ` like: +.. code-block:: text -Content Types -------------- + Content-Type: application/octet-stream -Like :pep:`691`, this PEP proposes that all requests and responses from this upload API will have a -standard content type that describes what the content is, what version of the API it represents, and -what serialization format has been used. + -This standard request content type applies to all requests *except* for :ref:`file upload requests -` which, since they contain only binary data, is always ``application/octet-stream``. +Servers **MAY** support uploading of digital attestations for files (see :pep:`740`). +This support will be indicated by inclusion of an ``attestations_url`` key in the +``http-post-bytes`` map of the ``mechanism`` map of the +:ref:`File Upload Session creation response body `. +Attestations **MUST** be uploaded to the ``attestations_url`` before +:ref:`File Upload Session completion `. -The structure of the ``Content-Type`` header for all other requests is: +To upload an attestation, a client submits a ``POST`` request to the ``attestations_url`` +containing a JSON array of :pep:`attestation objects <740#attestation-objects>` like: .. code-block:: text - application/vnd.pypi.upload.$version+$format - -Since minor API version differences should never be disruptive, only the major version is included -in the content type; the version number is prefixed with a ``v``. - -Unlike :pep:`691`, this PEP does not change the existing *legacy* ``1.0`` upload API in any way, so -servers are required to host the new API described in this PEP at a different endpoint than the -existing upload API. - -Since JSON is the only defined request format defined in this PEP, all non-file-upload requests -defined in this PEP **MUST** include a ``Content-Type`` header value of: - -- ``application/vnd.pypi.upload.v2+json``. + Content-Type: application/json -As with :pep:`691`, a special "meta" version is supported named ``latest``, the purpose of which is -to allow clients to request the latest version implemented by the server, without having to know -ahead of time what that version is. It is recommended however, that clients be explicit about what -versions they support. 
+ [{"version": 1, "verification_material": {...}, "envelope": {...}},...] -Similar to :pep:`691`, this PEP also standardizes on using server-driven content negotiation to -allow clients to request different versions or serialization formats, which includes the ``format`` -part of the content type. However, since this PEP expects the existing legacy ``1.0`` upload API to -exist at a different endpoint, and this PEP currently only provides for JSON serialization, this -mechanism is not particularly useful. Clients only have a single version and serialization they can -request. However clients **SHOULD** be prepared to handle content negotiation gracefully in the case -that additional formats or versions are added in the future. +.. _server-specific-file-upload-mechanisms: -FAQ -=== +Server Specific File Upload Mechanisms +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Does this mean PyPI is planning to drop support for the existing upload API? ----------------------------------------------------------------------------- +A given server **MAY** implement an arbitrary number of server specific mechanisms +and is responsible for documenting their usage. -At this time PyPI does not have any specific plans to drop support for the existing upload API. +A server specific implementation file upload mechanism identifier has three parts: -Unlike with :pep:`691` there are significant benefits to doing so, so it is likely that support for -the legacy upload API to be (responsibly) deprecated and removed at some point in the future. Such -future deprecation planning is explicitly out of scope for *this* PEP. +.. code-block:: text + -- -Is this Resumable Upload protocol based on anything? ----------------------------------------------------- +Server specific implementations **MUST** use ``vnd`` as their ``prefix``. +The ``operator identifier`` **SHOULD** clearly identify the server operator, +be unique from other well known indexes, +and contain only alphanumeric characters ``[a-z0-9]``. +The ``implementation identifier`` **SHOULD** concisely describe the underlying implementation +and contain only alphanumeric characters ``[a-z0-9]`` and ``-``. -Yes! +For example: -It's actually based on the protocol specified in an `active internet draft `_, where the -authors took what they learned implementing `tus `_ to provide the idea of -resumable uploads in a wholly generic, standards based way. +====================================== ================ ========================================================================= +File Upload Mechanism string Server Operator Mechanism description +====================================== ================ ========================================================================= +``vnd-pypi-s3multipart-presigned`` PyPI S3 multipart upload via pre-signed URL +``vnd-pypi-http-fetch`` PyPI File delivered by instructing server to fetch from a URL via HTTP request +``vnd-acmecorp-http-fetch`` Acme Corp File delivered by instructing server to fetch from a URL via HTTP request +``vnd-acmecorp-postal`` Acme Corp File delivered via postal mail +``vnd-madscience-quantumentanglement`` Mad Science Labs Upload via quantum entanglement +====================================== ================ ========================================================================= -.. 
+If a server intends to match the behavior of another server's implementation, it **MAY** respond
+with that implementation's file upload mechanism name.

-This PEP deviates from that spec in several ways, as described in the body of the proposal. This
-decision was made for a few reasons:

-- The ``104 Upload Resumption Supported`` is the only part of that draft which does not rely
-  entirely on things that are already supported in the existing standards, since it was adding a new
-  informational status.
+FAQ
+===

-- Many clients and web frameworks don't support ``1xx`` informational responses in a very good way,
-  if at all, adding it would complicate implementation for very little benefit.
+Does this mean PyPI is planning to drop support for the existing upload API?
+------------------------------------------------------------------------------

-- The purpose of the ``104 Upload Resumption Supported`` support is to allow clients to determine
-  that an arbitrary endpoint that they're interacting with supports resumable uploads. Since this
-  PEP is mandating support for that in servers, clients can just assume that the server they are
-  interacting with supports it, which makes using it unneeded.
+At this time PyPI does not have any specific plans to drop support for the existing upload API.

-- In theory, if the support for ``1xx`` responses got resolved and the draft gets accepted with it
-  in, we can add that in at a later date without changing the overall flow of the API.
+Unlike with :pep:`691`, there are significant benefits to doing so, so it is likely that support for
+the legacy upload API will be (responsibly) deprecated and removed at some point in the future.
+Such future deprecation planning is explicitly out of scope for *this* PEP.


 Can I use the upload 2.0 API to reserve a project name?
@@ -924,9 +953,11 @@ Can I use the upload 2.0 API to reserve a project name?
 Yes! If you're not ready to upload files to make a release, you can still reserve a project name
 (assuming of course that the name doesn't already exist).

-To do this, :ref:`create a new session `, then :ref:`publish the session
-` without uploading any files. While the ``version`` key is required in the JSON
-body of the create session request, you can simply use the placeholder version number ``"0.0.0"``.
+To do this,
+:ref:`create a new Publishing Session `,
+then :ref:`publish the session ` without uploading any files.
+While the ``version`` key is required in the JSON body of the create session request,
+you can simply use the placeholder version number ``"0.0.0"``.

 The user that created the session will become the owner of the new project.

@@ -945,90 +976,6 @@ However, the ability to preview stages before they're published does complicate
 this proposal. We could defer this feature for later, although if we do, we should still keep the
 optional ``nonce`` for token generation, in order to be easily future proof.
-
-Multipart Uploads vs tus
-------------------------
-
-This PEP currently bases the actual uploading of files on an `internet draft `_
-(originally designed by `tus.io `__) that supports resumable file uploads.
-
-That protocol requires a few things:
-
-- That if clients don't upload the entire file in one shot, that they have to submit the chunks
-  serially, and in the correct order, with all but the final chunk having a ``Upload-Complete: ?0``
-  header.
- -- Resumption of an upload is essentially just querying the server to see how much data they've - gotten, then sending the remaining bytes (either as a single request, or in chunks). - -- The upload implicitly is completed when the server successfully gets all of the data from the - client. - -This has the benefit that if a client doesn't care about resuming their download, it can essentially -ignore the protocol. Clients can just ``POST`` the file to the file upload URL, and if it doesn't -succeed, they can just ``POST`` the whole file again. - -The other benefit is that even if clients do want to support resumption, unless they *need* to -resume the download, they can still just ``POST`` the file. - -Another, possibly theoretical benefit is that for hashing the uploaded files, the serial chunks -requirement means that the server can maintain hashing state between requests, update it for each -request, then write that file back to storage. Unfortunately this isn't actually possible to do with -Python's `hashlib `__ standard library module. -There are some libraries third party libraries, such as `Rehash -`__ that do implement the necessary APIs, but they don't -support every hash that ``hashlib`` does (e.g. ``blake2`` or ``sha3`` at the time of writing). - -We might also need to reconstitute the download for processing anyways to do things like extract -metadata, etc from it, which would make it a moot point. - -The downside is that there is no ability to parallelize the upload of a single file because each -chunk has to be submitted serially. - -AWS S3 has a similar API, and most blob stores have copied it either wholesale or something like it -which they call multipart uploading. - -The basic flow for a multipart upload is: - -#. Initiate a multipart upload to get an upload ID. -#. Break your file up into chunks, and upload each one of them individually. -#. Once all chunks have been uploaded, finalize the upload. This is the step where any errors would - occur. - -Such multipart uploads do not directly support resuming an upload, but it allows clients to control -the "blast radius" of failure by adjusting the size of each part they upload, and if any of the -parts fail, they only have to resend those specific parts. The trade-off is that it allows for more -parallelism when uploading a single file, allowing clients to maximize their bandwidth using -multiple threads to send the file data. - -We wouldn't need an explicit step (1), because our session would implicitly initiate a multipart -upload for each file. - -There are downsides to this though: - -- Clients have to do more work on every request to have something resembling resumable uploads. They - would *have* to break the file up into multiple parts rather than just making a single POST - request, and only needing to deal with the complexity if something fails. - -- Clients that don't care about resumption at all still have to deal with the third explicit step, - though they could just upload the file all as a single part. (S3 works around this by having - another API for one shot uploads, but the PEP authors place a high value on having a single API - for uploading any individual file.) - -- Verifying hashes gets somewhat more complicated. AWS implements hashing multipart uploads by - hashing each part, then the overall hash is just a hash of those hashes, not of the content - itself. 
Since PyPI needs to know the actual hash of the file itself anyway, we would have to - reconstitute the file, read its content, and hash it once it's been fully uploaded, though it - could still use the hash of hashes trick for checksumming the upload itself. - -The PEP authors lean towards ``tus`` style resumable uploads, due to them being simpler to use, -easier to imp;lement, and more consistent, with the main downside being that multi-threaded -performance is theoretically left on the table. - -One other possible benefit of the S3 style multipart uploads is that you don't have to try and do -any sort of protection against parallel uploads, since they're just supported. That alone might -erase most of the server side implementation simplification. - .. rubric:: Footnotes .. [#fn-action] Obsolete ``:action`` values ``submit``, ``submit_pkg_info``, and ``doc_upload`` are @@ -1046,10 +993,6 @@ erase most of the server side implementation simplification. .. [#fn-immutable] Published files may still be yanked (i.e. :pep:`592`) or `deleted `__ as normal. -.. [#fn-location] Or the URL given in the ``Location`` header in the response to the file upload - initiation request, i.e. the metadata upload request; both of these links **MUST** - be the same. - Copyright =========