Skip to content

fix(to-json-schema): translate v.file() schema #1119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

lukeed
Copy link

@lukeed lukeed commented Mar 28, 2025

A v.file() can/should be translated to the JSON schema:

type: string
format: binary

See https://swagger.io/docs/specification/v3_0/describing-request-body/file-upload/ for OpenAPI support.

I initally thought this was an issue with hono-openapi's translation, but tracked it to here instead.

lukeed added 2 commits March 28, 2025 09:41
A `v.file()` can/should be translated to the JSON schema:

```yml
type: string
format: binary
```

See https://swagger.io/docs/specification/v3_0/describing-request-body/file-upload/ for OpenAPI support.

I initally [thought this was an issue](rhinobase/hono-openapi#66 (comment)) with `hono-openapi`'s translation, but tracked it to here instead.
@Copilot Copilot AI review requested due to automatic review settings March 28, 2025 16:43
Copy link

vercel bot commented Mar 28, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
valibot ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 7, 2025 4:07pm

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses the issue of translating a v.file() schema to a JSON schema that complies with OpenAPI by mapping it to a string type with binary format. The key changes include:

  • Adding a case in convertSchema for handling v.file() and mapping it to type 'string' with format 'binary'.
  • Updating the unit tests in convertSchema.test.ts to ensure the new file schema conversion works as expected.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
packages/to-json-schema/src/convertSchema.ts Added new case to translate v.file() to a JSON schema with type string and format binary
packages/to-json-schema/src/convertSchema.test.ts Added a test case to validate the new file schema conversion

Tip: Copilot only keeps its highest confidence comments to reduce noise and keep you focused. Learn more

@fabian-hiller
Copy link
Owner

Thanks for creating this PR! I am not sure if we should convert the file schema to a binary string in JSON Schema because the underlying data type is different. Maybe we should provide a binary action for strings to enable this conversion. Looking forward to your feedback!

const Schema = v.pipe(v.string(), v.binary()); 

@fabian-hiller fabian-hiller self-assigned this Mar 29, 2025
@fabian-hiller fabian-hiller added the enhancement New feature or request label Mar 29, 2025
Copy link

pkg-pr-new bot commented Mar 29, 2025

Open in StackBlitz

npm i https://pkg.pr.new/valibot@1119

commit: b3065df

@lukeed
Copy link
Author

lukeed commented Mar 29, 2025

The OpenAPI spec says that files need to be string/binary. Because that’s what they are. https://swagger.io/docs/specification/v3_0/describing-request-body/file-upload/

The translation is going from one representation to another. The rest of the ecosystem on top of valibot is already using v.file() for formdata validation, which works after req.formData() parsing…but that needs to be recorded and described in OpenAPI format too.

@fabian-hiller
Copy link
Owner

fabian-hiller commented Mar 29, 2025

Would you also convert blob to a binary string and date to a date string? In the past, I have decided that people should use v.pipe(v.string(), v.isoTimestamp()) or one of our other ios... date actions. But I am happy to discuss it and improve it in the long run.

@fabian-hiller
Copy link
Owner

Out of curiosity, what are the disadvantages of defining a binary string as v.pipe(v.string(), v.binary())?

@lukeed
Copy link
Author

lukeed commented Mar 29, 2025

When forms are uploaded, they are encoded as binary strings. Thru HTTP headers, those strings are known to be 1) sliced by a boundary 2) some are indicated as being files 3) of a certain mime type

On the JS server, req.formData() will take all this information and parse it accordingly, turning files into File instances. This is where your v.file() will pass/validate the value(s).

But before getting to the server, OpenAPI clients need to know that whether or not file(s) can be sent. In OpenAPI definitions (thru JSON schema) this is represented as { type: "string", format: "binary" } because, again, this is describing the transport. AFAICT, format: "binary" isnt part of the spec itself, but its been the pattern/consensus.

However, there is now a draft-9 spec change that incorporates the (still optional) contentMediaType and contentEncoding properties to denote what kind of content the uploaded string is.

If I were you, I would do this:

v.file();
//-> {
//->   "type": "string",
//->   "format": "binary",
//-> }

// ImageSchema
v.pipe(
  v.file('Please select an image file.'),
  v.mimeType(['image/jpeg', 'image/png'], 'Please select a JPEG or PNG file.'),
  v.maxSize(1024 * 1024 * 10, 'Please select a file smaller than 10 MB.')
);
//-> {
//->   "oneOf": [{
//->     "type": "string",
//->     "format": "binary",
//->     "contentMediaType": "image/jpeg",
//->     "maxLength": 1024 * 1024 * 10
//->    }, {
//->     "type": "string",
//->     "format": "binary",
//->     "contentMediaType": "image/png",
//->     "maxLength": 1024 * 1024 * 10
//->    }
//-> }

You can't force/assume a contentEncoding until the developer has specified one. AFAICT you dont have an API to capture this information yet, so I didnt include an example for it.


what are the disadvantages of defining a binary string as v.pipe(v.string(), v.binary())

In the example above, once user req.formData()s, the valibot schema will always fail since it's no longer checking that the value instanceof File. It's information loss.

Would you also convert blob to a binary string

Blobs can't be uploaded. It's a JS specific API and isn't HTTP transferrable. At best, JSON schema could use a string/binary/content* combination for this, but there's no API that would automatically convert the uploaded bytes into a Blob. The constructor is lost (and meaningless) over the wire, so there's no reason to try to preserve the conversion. This is not true of File and why it should be different/fixed.

Would you also convert ... date to a date string

This could perhaps be an option, but I wouldn't. Like Blob, there's nothing in JS that would auto-convert the date's string/number representation back into a Date instance. The value would just stay as a string/number until the dev (or some abstraction layer) converted it

@fabian-hiller
Copy link
Owner

Thank you for taking the time to explain it so clearly! I really appreciate it! I don't have a lot of time right now, but I will try to review and merge within the next few days and publish a new version.

@fabian-hiller fabian-hiller added the priority This has priority label Mar 31, 2025
@muningis
Copy link
Contributor

muningis commented Apr 7, 2025

JSON Schema does not have binary format, this is unique to OpenAPI. Perhaps there should be some toggle/flag, to indicate that transformation is for OpenAPI declaration :?

@lukeed
Copy link
Author

lukeed commented Apr 7, 2025

Implementations MAY support custom format attributes.
https://json-schema.org/draft/2020-12/draft-bhutton-json-schema-validation-00#rfc.section.7.2.3

You can use anything in a format. There are common formats that have been added thru general agreement, but the contents of format attribute are not validated by the spec itself.

@fabian-hiller
Copy link
Owner

fabian-hiller commented Apr 8, 2025

Does File has a specific encoding when represented as a string that should be specified via .contentEncoding in JSON Schema?

@lukeed
Copy link
Author

lukeed commented Apr 8, 2025

@fabian-hiller sorry I don’t understand the question

@fabian-hiller
Copy link
Owner

Sorry, my bad. I fixed my previous comment.

@lukeed
Copy link
Author

lukeed commented Apr 9, 2025

Ah. No, you can’t add any more than what developer specifies. I’ll point to the above example snippet again. On its own, you can only translate v.file() to string/format=binary

@fabian-hiller
Copy link
Owner

I am still not sure what to do. Both Zod v4 and ArkType also throw an error when converting a File schema to JSON Schema, and it feels wrong to switch the type when converting to JSON Schema just because of OpenAPI and how files are transported over HTTP. But I understand your arguments and I don't want people to be blocked by the current behavior if this is a common case.

@fabian-hiller
Copy link
Owner

I just added 3 new configs with overrideSchema, overrideAction and overrideRef. overrideSchema can be used to change the output for v.file() in the meantime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request priority This has priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants