Skip to content

ref splitting code could perform more complete validation of content type #162

@halfline

Description

@halfline

Recently it was discovered that colons didn't work in the "filename" part of a file ref, because it would be treated as delimiter for the content type. This caused:

╎❯ omlmd push http://quay.io/rhatdan/tinyllama:latest '/home/dwalsh/.local/share/ramalama/repos/ollama/blobs/sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816' --empty-metadata

to fail, since the filename has a colon between the hash type and hash in it.

That issue was addressed by #161 which added some I/O (a file existence test) to decide whether or not a colon in the ref "belongs" to the filename or as the ref delimiter. That's the backstory, but during discussion it also came to light that the media type itself isn't validated. This lack of validation had little bearing on the original problem, and so it was decided it was out of scope for the original issue, but it may still merit further investigation. That's why I'm filing this issue (at the request of @vsoch and @tarilabs)

Specifically there are a few areas that could be looked into:

  1. Right now the media type after the colon is not validated, but the OCI spec requires a strictly defined format here
  2. There is some question of whether or not the media type should be allowed to be fully qualified or if only media types of the form maintype/subtype should be supported, not types with parameters (like say text/plain; charset=iso8859-1)
  3. There's some scratch code to do some initial validation and offer better "colon finding" heuristics here (informed by that validation). It's not ready to go as-is. it's untested, and the answer to 2 decides which direction the code needs to go. If parameters shouldn't be supported, the regex's need to be adapted to drop that. If parameters should be supported, ideally there would be some checks added to error out if there are duplicate parameters. It might make sense to just add a dependency on the rfc3986 module or similar
  4. It might also make sense to move the guts of split_path_and_content to the PathAndOptionalContent constructor. It's really just serving as a static constructor in practice.

Anyway, none of this is critical, and it's not something i'm personally going to spend more time on, so we could just close this ticket out if it's not a priority, but I'm filing the issue anyway because I was asked to.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions