Skip to content

fix: special case jinja #1612

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Conversation

wolfv
Copy link
Member

@wolfv wolfv commented May 2, 2025

There is a confusing behavior about types with rattler-build and minijinja.

When we have something like:

context:
  variable: ${{ "1234" }}

Then in the currently released version of rattler-build, 1234 will become an integer because the Jinja evaluates to 1234 which will be interpreted by YAML as a integer.

However, we can fix this by special casing Jinja expressions (starting with ${{ and ending with }}). We can fix this by evaluating the Jinja as an expression and look at the Jinja type that comes out.

If it is a string, we can then quote the string and forbid YAML to coerce this type, so that a string stays a string.

This will also fix this situation:

context:
  env_var: ${{ env.get("INTEGER") }}  # <- now this will always be a string, if not explicitly casted

Where we may run into trouble is with something like:

context:
  joined: ${{ "1" }}234

This would be evaluated as integer 1234 with the current code.

However, we could also decide to allow booleans or numbers to only come from simple Jinja expressions and not proper "templates" (ie. always cast anything more complex than a single expression to string).

@wolfv
Copy link
Member Author

wolfv commented May 3, 2025

With this we could also revisit #971 and #1423

@wolfv
Copy link
Member Author

wolfv commented May 3, 2025

We can change this further to allow Jinja expressions to return list and map objects to YAML, and then ingest them as lists or YAMLs from Jinja.

We could then also make the following use case work, where a user expands a list into YAML:

context:
  mylist: [a,b,c]

requirements:
  run: 
    - ${{ mylist }}  # should expand to a SequenceNode and not a ScalarNode

@zelosleone
Copy link
Collaborator

Using this recipe:

context:
  cuda_version: ${{ env.get("RAPIDS_CUDA_VERSION") }}
  cuda_major: ${{ (env.get("RAPIDS_CUDA_VERSION") | split("."))[0] }}

package:
  name: test-package
  version: 1.0

build:
  script:
    - if: cuda_major == "12"
      then:
        - echo "cuda_major is a STRING and is 12"
      else:
        - echo "cuda_major is NOT a STRING or is not 12 (value was ${{ cuda_major }})" 

we still cannot get the 12 as string, i am still working on it to check if we can make a solution

$env:RAPIDS_CUDA_VERSION="12.8.0"; cargo run -- build --recipe recipe.yaml

on powershell

@wolfv
Copy link
Member Author

wolfv commented May 7, 2025

Indeed, it seems to go wrong somewhere in the code.

context:
  cuda_version: ${{ env.get("RAPIDS_CUDA_VERSION") }}
  cuda_major: ${{ (env.get("RAPIDS_CUDA_VERSION") | split("."))[0] }}
  cuda_major_is_string: ${{ (env.get("RAPIDS_CUDA_VERSION") | split("."))[0] is string }}

Running cargo r -- build --recipe ./examples/ajinja --render-only

Gives us:

[
  {
    "recipe": {
      "schema_version": 1,
      "context": {
        "cuda_version": "12.8.0",
        "cuda_major": 12,
        "cuda_major_is_string": true
      },
      "package": {
        "name": "test-package",
        ...

Copy link

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a strong -1 on this. Jinja is fundamentally a string templating engine. It is being used to build yaml by manipulating strings. So if you want the example in this post to come out as a string, you need to include quotes in the yaml itself via

"${{ '1234' }}"

Special casing jinja2 to behave in odd ways that force types to cross into the yaml is going to lead to odd breakages/cases, and breaks the mental model of what jinja2 is doing.

@zelosleone
Copy link
Collaborator

zelosleone commented May 7, 2025

@wolfv There is a huge issue that is not producable in WSL and windows, but it is in all github runners, which is mainly about the unit tests (pkg_hash is the most serious one) I tried to solve it, change the formatters for hash, change the calculation logic and none of them passes. It also doesn't recognize

${{ PYTHON }} --help

now, which is another test fails now. Tried to fix that as well but the only solution i could find for 5 hours :D was making it

- python

@wolfv wolfv force-pushed the special-case-jinja branch from 127b104 to 5a4f518 Compare May 7, 2025 20:30
@wolfv
Copy link
Member Author

wolfv commented May 7, 2025

@beckermr the problem that we are fixing is the following:

foobar: ${{ "1234" }} <- would evaluate to an integer

This is quite surprising I find. Having to wrap Jinja expressions in quotes to force a string is more cumbersome IMO. The code in this PR works quite well and is less surprising when dealing with env vars and similar things.

@beckermr
Copy link

beckermr commented May 7, 2025

Yes. Jinja always does string substitutions. The types of items do not cross the boundary of the ${{ }} expression.

The issue with having the types cross is that for people who use jinja properly as a string templating, it becomes ambiguous and confusing.

This case is now really hard to predict:

var: "${{ "1234" }}"

Does the types cross the boundary of the jinja quote the internal quotes this pr implicitly adds? Does it ignore them? Does it turn them to single quotes? Does it render a nonsense yaml value? Why is one of these obviously the answer when the others are not?

This PR actually makes the interaction of the jinja templating with the yaml LESS clear.

The rule for jinja2 is simple:

"jinja produces strings that are inserted into the yaml."

Anything else is more complicated and surprising as you work through the edge cases.

@wolfv
Copy link
Member Author

wolfv commented May 7, 2025

For this case:

var: "${{ "1234" }}"

The (outer) quotes will always win and force whatever is inside to be a string.

The more ambigous case (but not really, due to outer quotes) would be:

var: "${{ 1234 }}"

where the inner type is an integer but the quotes cast it to a string.

@beckermr
Copy link

beckermr commented May 7, 2025

Yep. Those examples exactly illustrate my point.

I have written out explicitly how the jinja+yaml works for four cases below.

a

var: ${{ 1234 }}

renders using the following steps

  1. jinja parses+evals the text inside ${{ }} (it results in an int 1234)

  2. jinja renders the final value to a string "1234"

  3. jinja inserts the value of the string into the YAML
    This yields

    var: 1234
    
  4. YAML parses var: 1234 to a dict with a key var and a value 1234 as an int

b

var: "${{ "1234" }}"

renders using these steps

  1. jinja parses+evals the text inside ${{ }} (it results in a string "1234")

  2. jinja renders the final value to a string (since step 1 yielded a string already, this step is a no-op)

  3. jinja inserts the value of the string into the YAML
    This yields

    var: "1234"
    
  4. YAML parses var: "1234" to a dict with a key var and a value "1234" as a string

c

var: "${{ 1234 }}"

renders using these steps

  1. jinja parses+evals the text inside ${{ }} (it results in an int 1234)

  2. jinja renders the final value to a string "1234"

  3. jinja inserts the value of the string into the YAML
    This yields

    var: "1234"
    
  4. YAML parses var: "1234" to a dict with a key var and a value "1234" as a string

d

var: ${{ "1234" }}

renders using the following steps

  1. jinja parses+evals the text inside ${{ }} (it results in a string "1234")

  2. jinja renders the final value to a string (since step 1 yielded a string already, this step is a no-op)

  3. jinja inserts the value of the string into the YAML
    This yields

    var: 1234
    
  4. YAML parses var: 1234 to a dict with a key var and a value 1234 as an int

This PR would change the behavior of case d to insert a quoted string instead of the value of the string (so step 3 changes) based on the type of the underlying jinja2 expression (computed in case 1).

However, if that rule is applied consistently, then case b should yeild var: ""1234"" with the extra set of quotes since in case b the type of the jinja2 expression ends up as a string in step 1.

The only way to tell b should be different than d in an automated way is to figure out what type YAML would parse and then account for that. That extra logic where the jinja2 parser has to reach out beyond itself to figure out why type to return breaks the clean separation between the jinja2 parsing and the yaml parsing.

The downstream consequence is that we've deviated from the standard behavior for these tools meaning that

  1. Folks who pick up off-the-shelf jinja2 and yaml parsers will get different results compared to rattler-build.
  2. Existing code bases that parse v1 recipes (like the bot, conda-recipe-manager, conda-smithy etc.) would need complicated special-casing / behavior to implement this rule.
  3. People who understand how templating engines work will be surprised by the results.

This entire thing reminds me of selectors in conda build v0 recipes. Instead of using the tools as intended, extra syntax and meaning was introduced by using python eval in comments in yaml. This choice caused non-trivial issues, work, and pain for the entire ecosystem. Let's please not make this mistake again.

@wolfv
Copy link
Member Author

wolfv commented May 16, 2025

I think I can live with your point @beckermr.

However, one thing that doesn't work right now is:

context:
   specs: [a,b,c]

requirements:
  run:
    - ${{ specs }}

And I think we should make that work (this will be currently interpreted as - "[a, b, c]"

The other thing that will be much harder to make work is:

context:
   toml_contents: ${{ load_from_file("pyproject.toml") }}

We could potentially serialize the file as JSON on a single line and insert it that way into the YAML to get something that we could later index into. But not sure if that is a great idea.

One workaround would be something like:

context:
  _loader: |
    {% set toml = load_from_file("pyproject.toml") %}
  version: ${{ toml.version }}

But we don't advertise this capability widely :)

@beckermr
Copy link

Unfortunately these special cases are also outside standard jinja2 and yaml behavior. Thus I don't think you can change their behavior without the same consequences as above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants