diff --git a/peps/pep-0694.rst b/peps/pep-0694.rst index db2ef5fdb24..5f9fc36248d 100644 --- a/peps/pep-0694.rst +++ b/peps/pep-0694.rst @@ -24,7 +24,7 @@ with standardization, the upload API provides additional useful features such as * artifacts which can be overwritten and replaced, until a session is published; -* asynchronous and "chunked", resumable file uploads, for more efficient use of network bandwidth; +* flexible file upload mechanisms for index operators; * detailed status on the state of artifact uploads; @@ -49,10 +49,10 @@ In addition, there are a number of major issues with the legacy API: * It is fully synchronous, which forces requests to be held open both for the upload itself, and while the index processes the uploaded file to determine success or failure. -* It does not support any mechanism for resuming an upload. With the largest default file size on - PyPI being around 1GB in size, requiring the entire upload to complete successfully means - bandwidth is wasted when such uploads experience a network interruption while the request is in - progress. +* It does not support any mechanism for parallelizing or resuming an upload. With the largest + default file size on PyPI being around 1GB in size, requiring the entire upload to complete + successfully means bandwidth is wasted when such uploads experience a network interruption while + the request is in progress. * The atomic unit of operation is a single file. This is problematic when a release logically includes an sdist and multiple binary wheels, leading to race conditions where consumers get @@ -78,7 +78,7 @@ In addition, there are a number of major issues with the legacy API: to claim a project namespace. 
The new upload API proposed in this PEP solves all of these problems, providing for a much more -flexible, bandwidth friendly approach, with better error reporting, a better release testing +flexible approach, with better error reporting, a better release testing experience, and atomic and simultaneous publishing of all release artifacts. @@ -139,11 +139,6 @@ methods. Upload 2.0 API Specification ============================ -This PEP draws inspiration from the `Resumable Uploads for HTTP `_ internet draft, -however there are significant differences. This is largely due to the unique nature of Python -package releases (i.e. metadata, multiple related artifacts, etc.), and the support for an upload -session and release stages. Where it makes sense to adopt details of the draft, this PEP does so. - This PEP traces the root cause of most of the issues with the existing API to be roughly two things: - The metadata is submitted alongside the file, rather than being parsed from the @@ -156,7 +151,9 @@ To address these issues, this PEP proposes a multi-request workflow, which at a these steps: #. Initiate an upload session, creating a release stage. -#. Upload the file(s) to that stage as part of the upload session. +#. Initiate file-upload session(s) to that stage as part of the upload session. +#. Execute file upload mechanism for the file-upload session(s). +#. Complete the file-upload session(s), marking them as executed or canceled. #. Complete the upload session, publishing or discarding the stage. #. Optionally check the status of an upload session. @@ -253,6 +250,7 @@ The successful response includes the following JSON content: "upload": "...", "session": "...", }, + "mechanisms": ["http-post-application-octet-stream"], "session-token": "", "valid-for": 604800, "status": "pending", @@ -270,6 +268,9 @@ the following keys: A dictionary mapping :ref:`keys to URLs ` related to this session, the details of which are provided below. 
+``mechanisms`` + A list of file-upload mechanisms supported by the server. + ``session-token`` If the index supports :ref:`previewing staged releases `, this key will contain the unique :ref:`"session token" ` that can be provided to installers in order to @@ -308,8 +309,8 @@ Session Links For the ``links`` key in the success JSON, the following sub-keys are valid: ``upload`` - The endpoint session clients will use to initiate :ref:`uploads ` for each file to - be included in this session. + The endpoint session clients will use to initiate a :ref:`file-upload session ` + for each file to be included in this session. ``stage`` The endpoint where this staged release can be :ref:`previewed ` prior to @@ -332,13 +333,9 @@ The ``files`` key contains a mapping from the names of the files uploaded in thi sub-mapping with the following keys: ``status`` - A string with valid values ``partial``, ``pending``, ``complete``, and ``error``. If a file - upload has not seen an ``Upload-Complete: ?1`` header, then ``partial`` will be returned. If - ``Upload-Complete: ?1`` resulted in a ``202 Accepted``, then ``pending`` will be returned until - asynchronous processing of the last chunk and the full file has been completed. If a ``201 - Created`` was returned, or the last chunk processing is finished, ``complete`` will be returned. + A string with valid values ``pending``, ``processing``, ``complete``, ``error``, and ``canceled``. If there was an error during upload, then clients should not assume the file is in any usable - state, ``error`` will be returned and it's best to :ref:`cancel or delete ` + state, ``error`` will be returned and it's best to :ref:`cancel or delete ` the file and start over. This action would remove the file name from the ``files`` key of the :ref:`session status response body `. @@ -379,7 +376,8 @@ request body has the following JSON format: "filename": "foo-1.0.tar.gz", "size": 1000, "hashes": {"sha256": "...", "blake2b": "..."}, - "metadata": "..." 
+   "metadata": "...", +   "mechanism": "http-post-application-octet-stream" } @@ -402,6 +400,13 @@ Besides the standard ``meta`` key, the request JSON has the following additional Multiple hashes may be passed at a time, but all hashes provided **MUST** be valid for the file. +``mechanism`` (**required**) +  The file-upload mechanism the client intends to use for this file. +  This mechanism **SHOULD** be chosen from the list of mechanisms advertised in the `session response body +  `_. +  A client **MAY** send a mechanism that is not advertised in cases where server operators have +  documented a new or upcoming mechanism that is available for use on a "pre-release" basis. + ``metadata`` (**optional**) If given, this is a string value containing the file's `core metadata `_. @@ -415,195 +420,108 @@ the file to be uploaded. These checks may include, but are not limited to: - checking if the contents of the ``metadata``, if provided, are valid. -If the server determines that upload should proceed, it will return a ``201 Created`` response, with -an empty body, and a ``Location`` header pointing to the URL that the file content should be -uploaded to.  The :ref:`status ` of the session will also include the filename in -the ``files`` mapping, with the above ``Location`` URL included in under the ``link`` sub-key. -If the server determines the upload cannot proceed, it **MUST** return a ``409 Conflict``.  The -server **MAY** allow parallel uploads of files, but is not required to. - - .. IMPORTANT:: - The `IETF draft `_ calls this the URL of the `upload resource - `_, and this PEP uses that nomenclature as well. -.. _ietf-upload-resource: https://www.ietf.org/archive/id/draft-ietf-httpbis-resumable-upload-05.html#name-upload-creation-2 - - .. _upload-contents: - -Upload File Contents -++++++++++++++++++++ - -The actual file contents are uploaded by issuing a ``POST`` request to the upload resource URL -[#fn-location]_.
The client may either upload the entire file in a single request, or it may opt -for "chunked" upload where the file contents are split into multiple requests, as described below. - -.. IMPORTANT:: - - The protocol defined in this PEP differs from the `IETF draft `_ in a few ways: - - * For chunked uploads, the `second and subsequent chunks `_ are uploaded - using a ``POST`` request instead of ``PATCH`` requests. Similarly, this PEP uses - ``application/octet-stream`` for the ``Content-Type`` headers for all chunks. - - * No ``Upload-Draft-Interop-Version`` header is required. - - * Some of the server responses are different. - -.. _ietf-upload-append: https://www.ietf.org/archive/id/draft-ietf-httpbis-resumable-upload-05.html#name-upload-append-2 - - -When uploading the entire file in a single request, the request **MUST** include the following -headers (e.g. for a 100,000 byte file): - -.. code-block:: email - - Content-Length: 100000 - Content-Type: application/octet-stream - Upload-Length: 100000 - Upload-Complete: ?1 - -The body of this request contains all 100,000 bytes of the unencoded raw binary data. - -``Content-Length`` - The number of file bytes contained in the body of *this* request. - -``Content-Type`` - **MUST** be ``application/octet-stream``. - -``Upload-Length`` - Indicates the total number of bytes that will be uploaded for this file. For single-request - uploads this will always be equal to ``Content-Length``, but these values will likely differ for - chunked uploads. This value **MUST** equal the number of bytes given in the ``size`` field of - the file upload initiation request. +If the server determines that upload should proceed, it will return a ``202 Accepted`` response, with +the response body below. The :ref:`status ` of the session will also include the filename in the ``files`` mapping. If the server determines the upload cannot proceed, it **MUST** return +a ``409 Conflict``. 
The server **MAY** allow parallel uploads of files, but is not required to. +If the server cannot proceed with an upload because the ``mechanism`` supplied by the client is not supported +it **MUST** return a ``422 Unprocessable Entity``. -``Upload-Complete`` - A flag indicating whether more chunks are coming for this file. For single-request uploads, the - value of this header **MUST** be ``?1``. +.. _file-upload-session-response: -If the upload completes successfully, the server **MUST** respond with a ``201 Created`` status. -The response body has no content. +File Upload Session Response Body ++++++++++++++++++++++++++++++++++ -If this single-request upload fails, the entire file must be resent in another single HTTP request. -This is the recommended, preferred format for file uploads since fewer requests are required. - -As an example, if the client was to upload a 100,000 byte file, the headers would look like: - -.. code-block:: email - - Content-Length: 100000 - Content-Type: application/octet-stream - Upload-Length: 100000 - Upload-Complete: ?1 - -Clients can opt to upload the file in multiple chunks. Because the upload resource URL provided in -the metadata response will be unique per file, clients **MUST** use the given upload resource URL -for all chunks. Clients upload file chunks by sending multiple ``POST`` requests to this URL, with -one request per chunk. - -For chunked uploads, the ``Content-Length`` is equal to the size in bytes of the chunk that is -currently being sent. The client **MUST** include a ``Upload-Offset`` header which indicates the -byte offset that the content included in this chunk's request starts at, and an ``Upload-Complete`` -header with the value ``?0``. For the first chunk, the ``Upload-Offset`` header **MUST** be set to -``0``. As with single-request uploads, the ``Content-Type`` header is ``application/octet-stream`` -and the body is the raw, unencoded bytes of the chunk. 
- -For example, if uploading a 100,000 byte file in 1000 byte chunks, the first chunk's request headers -would be: - -.. code-block:: email - -    Content-Length: 1000 -    Content-Type: application/octet-stream -    Upload-Offset: 0 -    Upload-Length: 100000 -    Upload-Complete: ?0 - -For the second chunk representing bytes 1000 through 1999, include the following headers: +The successful response includes the following JSON content: -.. code-block:: email +.. code-block:: json -    Content-Length: 1000 -    Content-Type: application/octet-stream -    Upload-Offset: 1000 -    Upload-Length: 100000 -    Upload-Complete: ?0 +    { +      "meta": { +        "api-version": "2.0" +      }, +      "links": { +        "session": "...", +        "file-upload-session": "..." +      }, +      "status": "pending", +      "valid-for": 3600, +      "mechanism": { +        "http-post-application-octet-stream": { +          "url": "..." +        } +      } +    } -These requests would continue sequentially until the last chunk is ready to be uploaded. -For each successful chunk, the server **MUST** respond with a ``202 Accepted`` header, except for -the final chunk, which **MUST** be either: +Besides the ``meta`` key, which has the same format as the request JSON, the success response has +the following keys: -* ``201 Created`` if the server accepts and processes the last chunk synchronously, completing the -  file upload. -* ``202 Accepted`` if the server accepts the last chunk, but must process it asynchronously.  In -  this case, the client should query the :ref:`session status ` periodically until -  the uploaded :ref:`file status ` transitions to ``complete``. +``links`` +  A dictionary mapping :ref:`keys to URLs ` related to this session, +  the details of which are provided below. -The final chunk of data **MUST** include the ``Upload-Complete: ?1`` header, since at that point the -entire file has been uploaded. +``mechanism`` +  A mapping from the mechanism identifier negotiated by the client and server +  to a mapping containing the details necessary to execute that mechanism.
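The negotiation flow described here — the client names a ``mechanism`` when initiating a file upload, and the server echoes the negotiated identifier back together with the details needed to execute it — can be sketched from the client side. This is an illustrative sketch only: the JSON key names (``mechanisms``, ``mechanism``, ``url``) come from the response bodies in this PEP, while the helper names and the preference policy are assumptions.

```python
# Illustrative client-side sketch; only the JSON key names come from the
# specification.  Helper names and the preference policy are assumptions.

FALLBACK = "http-post-application-octet-stream"  # mandatory for all servers


def choose_mechanism(advertised, preferred=()):
    """Pick a file-upload mechanism from the server's advertised list."""
    for mech in preferred:
        if mech in advertised:
            return mech
    if FALLBACK in advertised:
        return FALLBACK
    # A conforming server always implements the fallback mechanism, so its
    # absence is a server bug rather than something to work around.
    raise RuntimeError("server does not advertise the mandatory fallback mechanism")


def execution_url(file_upload_session_response, mechanism):
    """Extract the execution URL for the negotiated mechanism from the
    file-upload session response body."""
    return file_upload_session_response["mechanism"][mechanism]["url"]
```

A client that knows about a server-specific mechanism (e.g. a hypothetical ``example-server-multipart``) would pass it via ``preferred`` and still degrade cleanly to the mandatory fallback when talking to other servers.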
-With both chunked and non-chunked uploads, once completed successfully, the file **MUST NOT** be -publicly visible in the repository, but merely staged until the upload session is :ref:`completed -`.  If the server supports :ref:`previews `, the file **MUST** be -visible at the ``stage`` :ref:`URL `.  Partially uploaded chunked files **SHOULD -NOT** be visible at the ``stage`` URL. +.. _file-upload-session-links: -The following constraints are placed on uploads regardless of whether they are single chunk or -multiple chunks: +Session Links ++++++++++++++ -- A client **MUST NOT** perform multiple ``POST`` requests in parallel for the same file to avoid -  race conditions and data loss or corruption. +For the ``links`` key in the success JSON, the following sub-keys are valid: -- If the offset provided in ``Upload-Offset`` is not ``0`` and does not correctly specify the byte -  offset of the next chunk in an incomplete upload, then the server **MUST** respond with a ``409 -  Conflict``.  This means that a client **MUST NOT** upload chunks out of order. +``session`` +  The endpoint where actions for the parent session can be performed. -- Once a file upload has completed successfully, you may initiate another upload for that file, -  which **once completed**, will replace that file.  This is possible until the entire session is -  completed, at which point no further file uploads (either creating or replacing a session file) -  are accepted.  I.e. once a session is published, the files included in that release are immutable -  [#fn-immutable]_. +``file-upload-session`` +  The endpoint where actions for this file-upload-session can be performed, +  including :ref:`canceling and discarding the file upload session `, +  :ref:`querying the current file upload session status `, +  and :ref:`requesting an extension of the file upload session lifetime ` +  (*if* the server supports it). +..
_file-upload-session-completion: -Resume an Upload -++++++++++++++++ +File Upload Session Completion +++++++++++++++++++++++++++++++ -To resume an upload, you first have to know how much of the file's contents the server has already -received.  If this is not already known, a client can make a ``HEAD`` request to the upload resource -URL. +To complete a file upload session, which indicates that the file upload mechanism has been executed +and did not produce an error, a client issues a ``POST`` to the ``file-upload-session`` link in the +file upload session creation response body. -The server **MUST** respond with a ``204 No Content`` response, with an ``Upload-Offset`` header -that indicates what offset the client should continue uploading from.  If the server has not received -any data, then this would be ``0``, if it has received 1007 bytes then it would be ``1007``.  For -this example, the full response headers would look like: +The JSON body of this request looks like: -.. code-block:: email +.. code-block:: json -    Upload-Offset: 1007 -    Upload-Complete: ?0 -    Cache-Control: no-store +    { +      "meta": { +        "api-version": "2.0" +      }, +      "action": "complete" +    } -Once the client has retrieved the offset that they need to start from, they can upload the rest of -the file as described above, either in a single request containing all of the remaining bytes, or in -multiple chunks as per the above protocol. +After receiving this request, the server **MAY** perform additional asynchronous processing on the file, +for instance to verify its hashes or contents. +If the processing is required to complete before an upload session can be published, +the status of the file upload session can be set to ``processing`` until such processing is complete, +reaches an error state, or the file upload session is canceled. -.. _cancel-an-upload: +..
_file-upload-session-cancelation: Canceling and Deleting File Uploads +++++++++++++++++++++++++++++++++++ -A client can cancel an in-progress upload for a file, or delete a file that has been completely -uploaded.  In both cases, the client performs this by issuing a ``DELETE`` request to the upload -resource URL of the file they want to delete. +A client can cancel an in-progress upload session for a file, or delete a file that has been +completely uploaded.  In both cases, the client performs this by issuing a ``DELETE`` request to +the file upload session URL of the file they want to delete. A successful deletion request **MUST** respond with a ``204 No Content``. -Once canceled or deleted, a client **MUST NOT** assume that the previous upload resource URL can be reused. +Once canceled or deleted, a client **MUST NOT** assume that the previous file upload session resource +or associated file upload mechanisms can be reused. Replacing a Partially or Fully Uploaded File @@ -613,7 +531,7 @@ To replace a session file, the file upload **MUST** have been previously complet deleted.  It is not possible to replace a file if the upload for that file is in-progress. To replace a session file, clients should :ref:`cancel and delete the in-progress upload -` by issuing a ``DELETE`` to the upload resource URL for the file they want to +` by issuing a ``DELETE`` to the upload resource URL for the file they want to replace.  After this, the new file upload can be initiated by beginning the entire :ref:`file upload ` sequence over again.  This means providing the metadata request again to retrieve a new upload resource URL.  Clients **MUST NOT** assume that the previous upload resource URL can be reused @@ -625,13 +543,16 @@ reused after deletion. Session Status ~~~~~~~~~~~~~~ -At any time, a client can query the status of the session by issuing a ``GET`` request to the -``session`` :ref:`link ` given in the :ref:`session creation response body -`.
+At any time, a client can query the status of a session by issuing a ``GET`` request to the +``session`` :ref:`link ` or ``file-upload-session`` :ref:`link ` +given in the :ref:`session creation response body ` +or :ref:`file upload session creation response body `, +respectively. -The server will respond to this ``GET`` request with the same :ref:`response ` -that they got when they initially created the upload session, except with any changes to ``status``, -``valid-for``, or ``files`` reflected. +The server will respond to this ``GET`` request with the same :ref:`session response ` +or :ref:`file upload session creation response body ` +that they got when they initially created the upload session or file upload session, +except with any changes to ``status``, ``valid-for``, or ``files`` reflected. .. _session-extension: Session Extension Servers **MAY** allow clients to extend sessions, but the overall lifetime and number of extensions allowed is left to the server.  To extend a session, a client issues a ``POST`` request to the -``session`` :ref:`link ` given in the :ref:`session creation response body -`. +``session`` :ref:`link ` or ``file-upload-session`` :ref:`link ` +given in the :ref:`session creation response body ` +or :ref:`file upload session creation response body `, respectively. The JSON body of this request looks like: @@ -652,7 +574,7 @@ The JSON body of this request looks like: "meta": { "api-version": "2.0" }, -    ":action": "extend", +    "action": "extend", "extend-for": 3600 } @@ -702,22 +624,18 @@ The JSON body of this request looks like: "meta": { "api-version": "2.0" }, -    ":action": "publish", +    "action": "publish" } -If the server is able to immediately complete the session, it may do so and return a ``201 Created`` -response.
If it is unable to immediately complete the session (for instance, if it needs to do -processing that may take longer than reasonable in a single HTTP request), then it may return a -``202 Accepted`` response. - -In either case, the server should include a ``Location`` header pointing back to the session status -URL, and if the server returned a ``202 Accepted``, the client may poll that URL to watch for the -status to change. +If the server is able to immediately complete the session, it may do so and return a +``201 Created`` response.  If it is unable to immediately complete the session +(for instance, if it needs to do validation that may take longer than reasonable in a single HTTP +request), then it may return a ``202 Accepted`` response. -If a session is published that has no staged files, the operation is effectively a no-op, except -where a new project name is being reserved.  In this case, the new project is created, reserved, and -owned by the user that created the session. +In either case, the server should include a ``Location`` header pointing back to the session +status URL, and if the server returned a ``202 Accepted``, the client may poll that URL to +watch for the status to change. If an error occurs, the appropriate ``4xx`` code should be returned, as described in the :ref:`session-errors` section. @@ -781,15 +699,6 @@ URL can be passed to installers such as ``pip`` by setting the `--extra-index-url `__ flag to this value.  Multiple stages can even be previewed by repeating this flag with multiple values. -In the future, it may be valuable to include something like a ``Stage-Token`` header to the `Simple -Repository API `_ -requests or the :pep:`691` JSON-based Simple API, with the value from the ``session-token`` sub-key -of the JSON response to the session creation request.
Multiple ``Stage-Token`` headers could be -allowed, and installers could support enabling stage previews by adding a ``--staged `` or -similarly named option to set the ``Stage-Token`` header at the command line.  This feature is not -currently support, nor proposed by this PEP, though it could be proposed by a separate PEP in the -future. - In either case, the index will return views that expose the staged releases to the installer tool, making them available to download and install into virtual environments built for that last-mile testing.  The former option allows for existing installers to preview staged releases with no @@ -832,6 +741,24 @@ Besides the standard ``meta`` key, this has the following top level keys: The ``message`` and ``source`` strings do not have any specific meaning, and are intended for human interpretation to aid in diagnosing the underlying issue. +File Upload Mechanisms +---------------------- + +File Upload Mechanisms, with the exception of ``http-post-application-octet-stream``, are left as an +implementation detail specific to each server.  Servers **MUST** implement the +``http-post-application-octet-stream`` mechanism as a fallback if no server-specific implementations +exist. + +A given server may implement an arbitrary number of mechanisms and is responsible for documenting +their usage.  Implementation names **SHOULD** be prefixed with a string that clearly identifies the +server and is distinct from other well-known servers or implementations. + +If a server intends to match the behavior of another server's implementation, it **MAY** respond +with that implementation's file upload mechanism name. + +All implementations of this PEP **MUST** implement the ``http-post-application-octet-stream`` file +upload mechanism.
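To make the mandatory fallback concrete, here is a hedged client-side sketch. The initiation body's key names and the mechanism identifier come from this PEP; the helper names, and the exact headers of the fallback ``POST`` (inferred from the mechanism's name, since mechanism details are otherwise left to each server's documentation), are assumptions.

```python
import hashlib

MECHANISM = "http-post-application-octet-stream"


def initiation_body(filename, data, mechanism=MECHANISM):
    """Build the file-upload initiation request body.  Key names follow the
    specification; every hash sent MUST be valid for the file."""
    return {
        "meta": {"api-version": "2.0"},
        "filename": filename,
        "size": len(data),
        "hashes": {
            "sha256": hashlib.sha256(data).hexdigest(),
            "blake2b": hashlib.blake2b(data).hexdigest(),
        },
        "mechanism": mechanism,
    }


def fallback_post_headers(data):
    """Headers for a single-request octet-stream POST of the whole file.
    This header set is an assumption inferred from the mechanism name;
    consult the server's documentation for authoritative details."""
    return {
        "Content-Type": "application/octet-stream",
        "Content-Length": str(len(data)),
    }
```

The body of the fallback ``POST`` would be the raw, unencoded file bytes, sent to the ``url`` returned for the mechanism in the file-upload session response.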
+ Content Types ------------- @@ -840,8 +767,8 @@ Like :pep:`691`, this PEP proposes that all requests and responses from this upl standard content type that describes what the content is, what version of the API it represents, and what serialization format has been used. -This standard request content type applies to all requests *except* for :ref:`file upload requests -` which, since they contain only binary data, is always ``application/octet-stream``. +This standard request content type applies to all requests *except* for requests to execute +a File Upload Mechanism, which will be specified by the documentation for that mechanism. The structure of the ``Content-Type`` header for all other requests is: @@ -888,36 +815,6 @@ the legacy upload API to be (responsibly) deprecated and removed at some point i future deprecation planning is explicitly out of scope for *this* PEP. -Is this Resumable Upload protocol based on anything? ----------------------------------------------------- - -Yes! - -It's actually based on the protocol specified in an `active internet draft `_, where the -authors took what they learned implementing `tus `_ to provide the idea of -resumable uploads in a wholly generic, standards based way. - -.. _ietf-draft: https://www.ietf.org/archive/id/draft-ietf-httpbis-resumable-upload-05.html - -This PEP deviates from that spec in several ways, as described in the body of the proposal. This -decision was made for a few reasons: - -- The ``104 Upload Resumption Supported`` is the only part of that draft which does not rely - entirely on things that are already supported in the existing standards, since it was adding a new - informational status. - -- Many clients and web frameworks don't support ``1xx`` informational responses in a very good way, - if at all, adding it would complicate implementation for very little benefit. 
- -- The purpose of the ``104 Upload Resumption Supported`` support is to allow clients to determine - that an arbitrary endpoint that they're interacting with supports resumable uploads. Since this - PEP is mandating support for that in servers, clients can just assume that the server they are - interacting with supports it, which makes using it unneeded. - -- In theory, if the support for ``1xx`` responses got resolved and the draft gets accepted with it - in, we can add that in at a later date without changing the overall flow of the API. - - Can I use the upload 2.0 API to reserve a project name? ------------------------------------------------------- @@ -945,90 +842,6 @@ However, the ability to preview stages before they're published does complicate this proposal. We could defer this feature for later, although if we do, we should still keep the optional ``nonce`` for token generation, in order to be easily future proof. - -Multipart Uploads vs tus ------------------------- - -This PEP currently bases the actual uploading of files on an `internet draft `_ -(originally designed by `tus.io `__) that supports resumable file uploads. - -That protocol requires a few things: - -- That if clients don't upload the entire file in one shot, that they have to submit the chunks - serially, and in the correct order, with all but the final chunk having a ``Upload-Complete: ?0`` - header. - -- Resumption of an upload is essentially just querying the server to see how much data they've - gotten, then sending the remaining bytes (either as a single request, or in chunks). - -- The upload implicitly is completed when the server successfully gets all of the data from the - client. - -This has the benefit that if a client doesn't care about resuming their download, it can essentially -ignore the protocol. Clients can just ``POST`` the file to the file upload URL, and if it doesn't -succeed, they can just ``POST`` the whole file again. 
- -The other benefit is that even if clients do want to support resumption, unless they *need* to -resume the download, they can still just ``POST`` the file. - -Another, possibly theoretical benefit is that for hashing the uploaded files, the serial chunks -requirement means that the server can maintain hashing state between requests, update it for each -request, then write that file back to storage. Unfortunately this isn't actually possible to do with -Python's `hashlib `__ standard library module. -There are some libraries third party libraries, such as `Rehash -`__ that do implement the necessary APIs, but they don't -support every hash that ``hashlib`` does (e.g. ``blake2`` or ``sha3`` at the time of writing). - -We might also need to reconstitute the download for processing anyways to do things like extract -metadata, etc from it, which would make it a moot point. - -The downside is that there is no ability to parallelize the upload of a single file because each -chunk has to be submitted serially. - -AWS S3 has a similar API, and most blob stores have copied it either wholesale or something like it -which they call multipart uploading. - -The basic flow for a multipart upload is: - -#. Initiate a multipart upload to get an upload ID. -#. Break your file up into chunks, and upload each one of them individually. -#. Once all chunks have been uploaded, finalize the upload. This is the step where any errors would - occur. - -Such multipart uploads do not directly support resuming an upload, but it allows clients to control -the "blast radius" of failure by adjusting the size of each part they upload, and if any of the -parts fail, they only have to resend those specific parts. The trade-off is that it allows for more -parallelism when uploading a single file, allowing clients to maximize their bandwidth using -multiple threads to send the file data. 
- -We wouldn't need an explicit step (1), because our session would implicitly initiate a multipart -upload for each file. - -There are downsides to this though: - -- Clients have to do more work on every request to have something resembling resumable uploads. They - would *have* to break the file up into multiple parts rather than just making a single POST - request, and only needing to deal with the complexity if something fails. - -- Clients that don't care about resumption at all still have to deal with the third explicit step, - though they could just upload the file all as a single part. (S3 works around this by having - another API for one shot uploads, but the PEP authors place a high value on having a single API - for uploading any individual file.) - -- Verifying hashes gets somewhat more complicated. AWS implements hashing multipart uploads by - hashing each part, then the overall hash is just a hash of those hashes, not of the content - itself. Since PyPI needs to know the actual hash of the file itself anyway, we would have to - reconstitute the file, read its content, and hash it once it's been fully uploaded, though it - could still use the hash of hashes trick for checksumming the upload itself. - -The PEP authors lean towards ``tus`` style resumable uploads, due to them being simpler to use, -easier to imp;lement, and more consistent, with the main downside being that multi-threaded -performance is theoretically left on the table. - -One other possible benefit of the S3 style multipart uploads is that you don't have to try and do -any sort of protection against parallel uploads, since they're just supported. That alone might -erase most of the server side implementation simplification. - .. rubric:: Footnotes .. [#fn-action] Obsolete ``:action`` values ``submit``, ``submit_pkg_info``, and ``doc_upload`` are @@ -1046,10 +859,6 @@ erase most of the server side implementation simplification. .. [#fn-immutable] Published files may still be yanked (i.e. 
:pep:`592`) or `deleted `__ as normal. -.. [#fn-location] Or the URL given in the ``Location`` header in the response to the file upload - initiation request, i.e. the metadata upload request; both of these links **MUST** - be the same. - Copyright =========