Native string interpolation syntax #570

Open
wants to merge 6 commits into
base: master

brandonchinn178
Contributor

@brandonchinn178 brandonchinn178 commented Jan 11, 2023

Related to #569, but is an orthogonal feature that could be added or not added independently of it. Additionally, I expect this to be more controversial than multiline string support. Regardless, there hasn't been a proposal for this yet, so this will at least get discussion going on an official channel.

Related discussions:

Rendered

Updates

Rough timeline, copied from a comment I made below:

  1. Finish string parsing refactor ✅
  2. Prototype string interpolation in GHC, experiment with the different options live ✅
    • Branch: wip/interpolated-strings
    • 2024-09-20: Started work
    • 2024-12-24: Got single-line interpolated strings working end-to-end
  3. Compile the different options into a doc ✅
    • 2025-01-20: Done!
  4. Do a community poll on the doc ✅
  5. Update proposal with the winner of the poll + document other options under Alternatives ✅
    • 2025-05-01: Updated proposal

@guibou
Contributor

guibou commented Jan 11, 2023

Do you have any idea how your new syntax could be made extensible so that it allows formatting, à la Python's f-strings or https://hackage.haskell.org/package/PyF?

@nomeata
Contributor

nomeata commented Jan 11, 2023

That's a big bike shed to paint, I expect :-)

I'm wondering if, these days, a feature like this could not begin as a library (likely using a GHC plugin). It'd be as opt-in as a language extension, a bit less convenient to obtain (but not too bad in a cabal project), but allow quicker iteration. When it has become reasonably stable and reasonably popular, turning it into a native feature (for even easier access and better error messages, I presume) can then be discussed.

@JakobBruenker's plugin for banged monadic sub expressions is an example for that path.

@JakobBruenker
Contributor

JakobBruenker commented Jan 11, 2023

Worth noting that syntax plugins, with what GHC offers at the moment, are a bit annoying if the GHC parser doesn't already accept what you want your syntax to be. (You'd need a pre-processor with -pgmF, but that doesn't easily handle Haskell sub-expressions...)

The s"..." syntax in particular though I don't think should be affected by that, since it's already valid syntax. (and my plugin wasn't affected by that, either)

@noughtmare
Contributor

noughtmare commented Jan 11, 2023

To me it sounds like the principles that lead people to avoid Template Haskell are fixable. I think the main complaint against Template Haskell is that it can run arbitrary code at compile time. But that can be addressed by adding a language extension like PureQuasiQuotes that only allows quasiquotes that only require the Quote type class and not the full Quasi type class (which brings in MonadIO). If you really want to be sure no malicious code gets executed you can combine it with a sandbox/vm and Safe Haskell (for now).

Or are there more reasons to avoid quasi quotes?
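
To illustrate the Quote/Quasi distinction drawn above, here is a minimal sketch of an expression builder that is polymorphic in `Quote`. It assumes template-haskell 2.17 or later (GHC 9.0+), where `Quote` exists and the builder combinators are `Quote`-polymorphic. The name `pureGreeting` is hypothetical, and `PureQuasiQuotes` itself is only the extension suggested in the comment, not something GHC ships.

```haskell
import Language.Haskell.TH

-- Hypothetical example: builds the expression
--   concat ["Hello, ", <name>]
-- using only the `Quote` interface. Because the type is `Quote m => ...`
-- rather than `Q`, the function cannot call `runIO` or reify anything,
-- so it cannot perform arbitrary IO at compile time.
pureGreeting :: Quote m => String -> m Exp
pureGreeting name =
  appE
    (varE (mkName "concat"))
    (listE [litE (stringL "Hello, "), litE (stringL name)])
```

Since `Q` is an instance of `Quote`, such a builder can still be used in an ordinary splice; the restriction only constrains what the author of the builder can do.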

@brandonchinn178
Contributor Author

More reasons people avoid Template Haskell: slow compilation, bad recompilation, and bad cross-compile support. The first two are potentially fixable, but at least the last one would involve redesigning how TH works.

I think PureQuasiQuotes is a step in the right direction, but idk if it would solve those problems. Plus, as I mention in the proposal, it wouldn't allow reusing features like multiline string support, so you'd have to reimplement the multiline indentation algorithm (not a big deal, but still).

@ChickenProp

My feeling is that

  1. In general I think doing interpolation through template haskell seems fine. It doesn't work for everyone, but are there enough people it doesn't work for, to justify reifying this as a language feature?
  2. This particular approach seems costly. The vast majority of packages are going to "need" to define at least one Interpolate instance, requiring a minor version bump (per PVP) and perhaps some CPP for backwards compatibility. "Need" in scare quotes because sure, they could just not do that... but users are going to want it. I'd be nervous about imposing this on people. (I also don't love that it embeds String still deeper into the ecosystem, but eh, probably that ship has sailed and if we ever manage to move away from it, this won't make things that much harder.)
  3. Relatedly, this approach looks untested? If we're doing this, I'd expect us to take an existing TH-based (or plugin-based) interpolation library and say "we now support this style of interpolation natively, and if you want a different interpolation style there's still TH". Or perhaps to offer some way of making many different interpolation styles available without TH. Why propose a whole new one instead of something that already exists?

@brandonchinn178
Contributor Author

I'm wondering if, these days, a feature like this could not begin as a library

@nomeata yeah, that's fair (which also relates to @ChickenProp's point). I might prototype a GHC plugin for this at some point

@nomeata
Contributor

nomeata commented Jan 12, 2023

I’d also like to advertise the idea of giving quasiquotes better multi-line syntax in #569 (comment)

@re-xyr

re-xyr commented Jan 12, 2023

My concern is that if you interpolate in some Text then they're first converted to Strings and then the whole thing is converted back to Text, which seems pretty inefficient to me.

@brandonchinn178
Contributor Author

@re-xyr yes, you're right. With that performance issue and @endgame's comments about Interpolate allowing string interpolation to get around safe interpolation (e.g. with SqlQuery), I decided to change the design to the Interpolate s/InterpolateValue s a version. It should be safer and scale better, but it's certainly a degree more complex, using more recent features like MultiParamTypeClasses.

I'll be prototyping this approach here — hopefully I'll be able to make progress on this in the near future
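
For readers skimming the thread, the two-parameter design mentioned above can be sketched roughly as follows. This is an illustration only: the class name `InterpolateValue` comes from the comment, but the method name, the instances, the `SqlQuery` newtype, and the hand-written desugarings are assumptions, not the proposal's actual API.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}

-- Hypothetical class: how to render a value of type `a` directly into the
-- target type `s`, without a round trip through String.
class InterpolateValue s a where
  interpolateValue :: a -> s

instance InterpolateValue String String where
  interpolateValue = id

instance InterpolateValue String Int where
  interpolateValue = show

-- A target type that must never receive unescaped user input.
newtype SqlQuery = SqlQuery String
  deriving (Eq, Show)

instance Semigroup SqlQuery where
  SqlQuery a <> SqlQuery b = SqlQuery (a <> b)

instance Monoid SqlQuery where
  mempty = SqlQuery ""

-- Strings interpolated into a query are quoted and escaped (naively, for
-- illustration only; real code should use bound parameters).
instance InterpolateValue SqlQuery String where
  interpolateValue s = SqlQuery ("'" <> concatMap escape s <> "'")
    where
      escape '\'' = "''"
      escape c = [c]

instance InterpolateValue SqlQuery Int where
  interpolateValue = SqlQuery . show

-- Hand-written stand-in for a desugared  s"Hello ${name}, you are ${age}"
greeting :: String -> Int -> String
greeting name age =
  mconcat ["Hello ", interpolateValue name, ", you are ", interpolateValue age]

-- Hand-written stand-in for a desugared SQL interpolation.
userQuery :: String -> SqlQuery
userQuery name =
  mconcat [SqlQuery "SELECT * FROM user WHERE name = ", interpolateValue name]
```

The point of the second type parameter is that each target type decides how each value type is rendered, so `userQuery "O'Brien"` escapes the quote while `greeting` does not, and a type with no `InterpolateValue SqlQuery` instance cannot be interpolated into a query at all.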

@brandonchinn178
Contributor Author

I'm wondering if, these days, a feature like this could not begin as a library (likely using a GHC plugin).

@nomeata in this case, I think I'll need to use a preprocessor, so that I can prototype #569 at the same time, which isn't valid syntax currently.

One thing I just realized: it doesn't seem like GHC supports multiple -pgmF options, so if someone wants to try out both string interpolation and some other proposal, they won't be able to. That puts a damper on recommending this as a standard process.

@JakobBruenker
Contributor

@brandonchinn178 I haven't tried it, but I think you can by writing a script that runs both pre-processors and use that as pre-processor. But admittedly that's annoying (and only works if the pre-processors are compatible with each other.)

@nomeata
Contributor

nomeata commented Jan 13, 2023

Yes, preprocessors are probably just good enough for demo prototypes, but not for production quality (like GHC plugin based approaches might be).

@konsumlamm
Contributor

To me it sounds like the principles that lead people to avoid Template Haskell are fixable. I think the main complaint against Template Haskell is that it can run arbitrary code at compile time. But that can be addressed by adding a language extension like PureQuasiQuotes that only allows quasiquotes that only require the Quote type class and not the full Quasi type class (which brings in MonadIO). If you really want to be sure no malicious code gets executed you can combine it with a sandbox/vm and Safe Haskell (for now).

Or are there more reasons to avoid quasi quotes?

Apart from the already mentioned problems, most string interpolation libraries depend on haskell-src-meta, which is a huge dependency. I think this is a big reason to avoid them, as the costs outweigh the gains (it's just nicer syntax for ++/show after all). If on the other hand, GHC had native string interpolation syntax, there wouldn't be many reasons not to use it. Another solution of course is to improve the ways to build interpolation quasiquoters (see https://gitlab.haskell.org/ghc/ghc/-/issues/20862), but given the comments on that issue, I don't have much hope that something like that will ever happen. haskell-src-meta also has the issue that it's not always up to date and more prone to have errors, e.g. see haskell-party/haskell-src-meta#4.

Another (minor) problem with quasiquoters is that they don't have nice syntax highlighting and I don't see how that could be realistically implemented for 3rd party libraries.

@brandonchinn178
Contributor Author

FYI I have a working prototype at https://github.com/brandonchinn178/string-syntax + I've updated the proposal

@yy0zz

yy0zz commented Jun 3, 2023

"printf is partial and unsafe, which especially safety-conscious people might always stay away from anyway."

You could patch "printf" to use the default formatting when the types don't match instead of crashing. E.g. printf "%d" "a" is "a" or replacement char. (Ugly, I know).

Or #387

@subterfugue

@yy0zz Other than you personally mentioning that it's ugly, I would also like to mention that even with that merged proposal it's still not as convenient as what @brandonchinn178's proposal achieves. Speaking of, how come there's no activity? It would be really nice if we get this into Haskell...

@aryah47

aryah47 commented Jun 2, 2024

far, far too saccharine for me (i.e. too much special syntactic sugar) -- whatever the semantics of this special s""" ... """ syntax should be, a normal variadic printf-like function (some_fn """ ... """) that takes a (multiline) string argument seems able to implement them just as well.

@googleson78
Contributor

printf is not comparable to string interpolation, as it achieves a different thing, imo. For example, with a printf style, you still need to match up the index of a following argument with the corresponding % in the string you're printfing in if you want to know what you'll actually be putting in your final string.

(as an aside, from looking at/using various type safe printf libraries in Haskell, I'm not convinced most of them offer much more of a benefit compared to using Text.concat or a lot of invocations of (<>))
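
A small side-by-side may make the "matching up the index" point concrete. The sketch below uses `printf` from `Text.Printf` in base and plain concatenation; the function names are made up for illustration.

```haskell
import Text.Printf (printf)

-- printf style: the reader has to pair each % specifier with the argument
-- at the same position after the format string.
viaPrintf :: String -> Int -> String
viaPrintf name age = printf "%s is %d years old" name age

-- Concatenation style: each value sits where it appears in the output,
-- which is the property interpolation syntax would keep while dropping
-- the operator noise.
viaConcat :: String -> Int -> String
viaConcat name age = name <> " is " <> show age <> " years old"
```

Both functions produce the same string for the same inputs. The difference shows up when arguments are reordered or mistyped: a mismatch such as `printf "%d" "a" :: String` still type-checks and only fails at run time, which is the partiality quoted earlier in the thread.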

@tomjaguarpaw
Contributor

tomjaguarpaw commented Feb 19, 2025

Thanks to @brandonchinn178 for implementing the prototypes. Based on that, I'd like to make two suggestions to the committee:

  1. If string interpolation is added to GHC, let it be an extensible TH-based one (introduced by @TeofilC at Native string interpolation syntax #570 (comment), and implemented by @brandonchinn178 as extensible-th). An extensible TH-based one will be strictly more general than the others: i.e. the others can be implemented in terms of it, but it can't be implemented in terms of the others. I have sketched how this works on Discourse.

  2. Please require that any new types added to support the interpolation feature be abstract. It is really important from the point of view of stability that we don't bake in concrete types to this feature, otherwise we paint ourselves into a corner and can never extend it. For example, in @brandonchinn178's extensible-th implementation, the interpolator is given a type like

    myUserDefinedInterpolator :: [Either String (Q Exp)] -> Q Exp
    myUserDefinedInterpolator = <body>

    I cannot stress strongly enough that the types in the final design should be abstract, that is, something like this

    newtype Interpolator =
      MkInterpolator_NotExported ([Either String (Q Exp)] -> Q Exp)
    
    makeInterpolator ::
      ([Either String (Q Exp)] -> Q Exp) ->
      Interpolator
    makeInterpolator = MkInterpolator_NotExported

    so that users define their interpolators like

    myUserDefinedInterpolator :: Interpolator
    myUserDefinedInterpolator = makeInterpolator <body>

    This avoids committing to a concrete representation for interpolators, which allows us to find a better one (perhaps better performing) later, or extend interpolators to support more features.

  3. [EDIT] Keeping Interpolator abstract allows the backend to avoid TH when the implementation of a particular interpolator doesn't require it. For example, the backend could use

    data Interpolator =
      MkInterpolator_NotExported ([Either String (Q Exp)] -> Q Exp)
      | forall a c. (Typeable a, Typeable c) => MkHasClassInterpolator_NotExported ([Either String (HasClass c)] -> a)
    
    makeInterpolator ::
      ([Either String (Q Exp)] -> Q Exp) ->
      Interpolator
    makeInterpolator = MkInterpolator_NotExported
    
    makeHasClassInterpolator ::
      (Typeable a, Typeable c) =>
      ([Either String (HasClass c)] -> a) ->
      Interpolator
    makeHasClassInterpolator = MkHasClassInterpolator_NotExported

    (I'm not sure whether the type variables should be Typeable existentials or whether they should be parameters of Interpolator. I think that element of the design needs to be fleshed out for extensible-th to be truly the most general.)

    [EDIT: This doesn't work because it only allows the backend to be chosen at run time, whereas we need to choose at compile time. See Native string interpolation syntax #570 (comment) . But I'm sure something along these lines will work.]
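
As a concrete instance of the `<body>` placeholder in the abstract-type suggestion, here is a minimal sketch of one user-defined interpolator. The `Interpolator` wrapper and `makeInterpolator` follow the comment; `runInterpolator` and `concatInterpolator` are hypothetical names added for illustration, and the whole block assumes the `[Either String (Q Exp)] -> Q Exp` shape described for extensible-th.

```haskell
import Language.Haskell.TH

-- In a real library the constructor would be hidden behind an export list,
-- e.g.  module Interpolator (Interpolator, makeInterpolator) where ...
newtype Interpolator
  = MkInterpolator_NotExported ([Either String (Q Exp)] -> Q Exp)

makeInterpolator :: ([Either String (Q Exp)] -> Q Exp) -> Interpolator
makeInterpolator = MkInterpolator_NotExported

-- Hypothetical eliminator that only the compiler side would need.
runInterpolator :: Interpolator -> [Either String (Q Exp)] -> Q Exp
runInterpolator (MkInterpolator_NotExported f) = f

-- A user-defined interpolator: literal chunks become string literals,
-- interpolated chunks are spliced as-is, and the pieces are joined with
-- `mconcat`, i.e. it generates   mconcat [<chunk>, <chunk>, ...]
concatInterpolator :: Interpolator
concatInterpolator = makeInterpolator $ \parts ->
  appE (varE (mkName "mconcat")) (listE (map toExp parts))
  where
    toExp :: Either String (Q Exp) -> Q Exp
    toExp (Left s) = litE (stringL s)
    toExp (Right e) = e
```

Because users only ever go through `makeInterpolator`, the representation inside the newtype could later change without breaking `concatInterpolator`, provided `makeInterpolator` keeps accepting the old function type.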

@konsumlamm
Contributor

  1. If string interpolation is added to GHC, let it be an extensible TH-based one (introduced by @TeofilC at Native string interpolation syntax #570 (comment), and implemented by @brandonchinn178 as extensible-th).

As mentioned before, there are some problems with Template Haskell: #570 (comment).

@tomjaguarpaw
Contributor

One benefit of keeping Interpolator abstract is that the backend doesn't actually have to use TH for implementations that don't demand it. I'll update my comment to make that clear.

@brandonchinn178
Contributor Author

brandonchinn178 commented Feb 19, 2025

@tomjaguarpaw can you explain (or even better, make a PR into the prototype repo) how swapping the backend would work? Currently, the implementation would be a plain rewrite, but with multiple backends, GHC would have to decide how to rewrite by inspecting the value of the Interpolator?

EDIT: The benefit of an abstract interpolator is also still not clear to me. We're still asking the user to commit to providing us a function of type [Either String ...] -> .... If we find a better representation in the future, presumably the user would still have to rewrite their body with the new type. If this is just for backwards compatibility, how is it any different than just supporting interpolators of either type?

@tomjaguarpaw
Contributor

Currently, the implementation would be a plain rewrite, but with multiple backends, GHC would have to decide how to rewrite by inspecting the value of the Interpolator?

Correct. The simplest thing to do would be to use TH for everything. But that runs afoul of your objections at #570 (comment), so the alternative is to choose between the backends at compile time. I realise that my sketch above only allows the choice to be made at run time, so it's not good enough. I'll have a think about how the choice can be made at compile time. I guess it requires a type level tag, which is a bit unfortunate, but maybe unavoidable.

@TeofilC
Contributor

TeofilC commented Feb 19, 2025

I think a nice way to avoid having to always use TH would be to hard-code some of the translations into GHC. I think we'd at least want s"..." to be handled like this. We could also add some others if we wanted, eg, something that does typeclass metaprogramming like extensible-hasclass but just based on the types involved. And TH would only be invoked if we encounter a name we don't recognise as a builtin.

Note that this is analogous to how [e| ... |] is hard-coded into GHC (and doesn't require running TH) while [foo| ... |] calls a quasiquoter via TH (although this differs because we would struggle to write [e| |] as a quasiquoter).

@tomjaguarpaw
Contributor

I think a nice way to avoid having to always use TH would be to hard-code some of the translations into GHC

Sure, but the question is how do you support user-defined interpolations, some of which want to use TH and some of which don't (because they use HasClass, for example).

@TeofilC
Contributor

TeofilC commented Feb 19, 2025

Sure, but the question is how do you support user-defined interpolations, some of which want to use TH and some of which don't (because they use HasClass, for example).

This is slightly different to how hasclass works in https://github.com/brandonchinn178/string-syntax#extensible-hasclass, but you could have something like (just a sketch):

-- Original
hasclass"SELECT * FROM user WHERE name = ${name}"

-- Desugared
interpolate
  [ Left "SELECT * FROM user WHERE name = "
  , Right (HasClass name)
  ]

where we have:

class InterpolateHasClass c a | a -> c where
  interpolate :: [Either String (HasClass c)] -> a

So you can still have user-defined ones with this more limited pattern without requiring it to take up the interpolator namespace.

This does come with the trade-off that it is less explicit, since we are relying on typeclasses more.
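
Filling in the sketch above so that it type-checks might look like the following. The definition of `HasClass` as an existential wrapper, the `Show`-based instance, and the `example` function are assumptions made for illustration; the class takes a list of chunks, matching the desugared call shown in the comment.

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE KindSignatures #-}

import Data.Kind (Constraint, Type)

-- Assumed definition: a value of some hidden type `a`, packaged together
-- with evidence that `a` satisfies the class `c`.
data HasClass (c :: Type -> Constraint) = forall a. c a => HasClass a

-- The result type `a` determines which class the interpolated values must
-- satisfy, so no interpolator name is needed at the use site.
class InterpolateHasClass (c :: Type -> Constraint) a | a -> c where
  interpolate :: [Either String (HasClass c)] -> a

-- Illustrative instance: producing a String requires Show on every chunk.
instance InterpolateHasClass Show String where
  interpolate = concatMap render
    where
      render :: Either String (HasClass Show) -> String
      render (Left s) = s
      render (Right (HasClass x)) = show x

-- Hand-written stand-in for a desugared  hasclass"n = ${n}!"
example :: Int -> String
example n = interpolate [Left "n = ", Right (HasClass n), Left "!"]
```

In this version `example 42` evaluates to `"n = 42!"`. The trade-off mentioned in the comment is visible here: nothing at the call site names the interpolation strategy, it is selected entirely by the expected result type.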

@michaelpj
Contributor

michaelpj commented Feb 19, 2025

Another data point I remembered: Scala has had customizable string interpolators approximately forever. The implementation makes use of subtyping, but essentially you get a list of the string parts, and then a list of typed expression parts that go in between the string parts. This isn't easy to do in Haskell without committing to the expressions all having the same type. The result type, however, is not changeable: they only have one string type.

Note that this works only at runtime: no compile-time cleverness is possible. So that's some evidence that other language communities have got away without that feature.
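
The "list of string parts plus list of expression parts" shape described above can be sketched in Haskell as below. This is only an illustration of the run-time interleaving idea, not Scala's actual API; the name `interpolateWith` is hypothetical, and the single type variable `a` shows the limitation mentioned in the comment: all interpolated expressions must share one type.

```haskell
-- Interleave literal parts with rendered arguments:
--   parts = [p0, p1, ..., pn], args = [x0, ..., x(n-1)]
--   result = p0 <> render x0 <> p1 <> ... <> pn
-- Surplus arguments (more args than gaps between parts) are dropped.
interpolateWith :: (a -> String) -> [String] -> [a] -> String
interpolateWith render (p : ps) (x : xs) =
  p <> render x <> interpolateWith render ps xs
interpolateWith _ ps _ = concat ps
```

For example, `interpolateWith id ["Hello ", ", you are ", "!"] ["Ada", "36"]` yields `"Hello Ada, you are 36!"`. Mixing an `Int` and a `String` argument in the same list would need either a shared wrapper type or pre-rendering each argument, which is the difficulty the comment points out.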

@TeofilC
Contributor

TeofilC commented Feb 21, 2025

Note that this works only at runtime: no compile-time cleverness is possible [with string interpolation in Scala]. So that's some evidence that other language communities have got away without that feature.

I found a few libraries that seem to support compile time checks on string interpolation in Scala (it sounds like they use macros, but I haven't looked into the details). So it does sound like it's possible to do this in Scala.

@brandonchinn178
Contributor Author

Final survey for gathering feedback on the various designs:
https://discourse.haskell.org/t/ghc-string-interpolation-final-survey/11895

@brandonchinn178 brandonchinn178 force-pushed the string-interpolation branch 4 times, most recently from c00781e to 33ba51b Compare May 2, 2025 07:34
@brandonchinn178
Contributor Author

Proposal is updated with survey results: https://brandonchinn178.github.io/ghc-string-interpolation-prototypes/results/

Please take a look at the updated proposal.

cc @sgraf812 @michaelpj

@brandonchinn178 brandonchinn178 force-pushed the string-interpolation branch 2 times, most recently from 388745a to 68378d4 Compare May 2, 2025 07:44
@TeofilC
Contributor

TeofilC commented May 2, 2025

My feeling is that the QualifiedLiteral stuff would be best served by being split into another proposal. I'm not on the committee so take that with a pinch of salt.

@endgame

endgame commented May 4, 2025

I agree with @TeofilC: I think it's worth thinking about whether -XQualifiedLiterals is worth extracting into another proposal, designing it in a way that it will dovetail nicely with interpolation, getting it merged, and then proposing a clear version of string interpolation that builds upon it.

It seems easy to me to say that overloaded literals might be cool and that we can iron out the details in another proposal, but I think there are some non-obvious interactions lurking there. First off, you need to justify why (e.g., for vectors) Vector.[1, 2, 3] is better than Vector.fromList [1, 2, 3]. The main reason I can think of is if qualified literals select an alternate class for overloaded literal resolution (as -XQualifiedDo sort-of does, by selecting that module's versions of (>>=), etc.)

It is not clear to me why the proposal's current text says Text."foo" desugars to Text.fromString "foo" but the SQL example desugars to an interpolation. It could be that every qualified string literal is an interpolation, but that also seems surprising to me. My initial impression of a qualified string literal class would be that I'd be getting the IsString instance from the qualifying module, but I'm not sure what flexibility that gives us in practice.

It is also not clear to me how qualified numeric literals would behave because the conversion functions are still entangled with the Num hierarchy.

Another interaction point: if we had -XQualifiedLiterals agreed upon, and nailed down exactly what that meant for strings, then -XStringInterpolation could be defined almost entirely in terms of desugaring s"foo" to Some.Default.Interpolating.Module."foo". You could even have the extension flag take an optional argument specifying the module to use.

@brandonchinn178 brandonchinn178 force-pushed the string-interpolation branch from 68378d4 to adcfcd5 Compare May 7, 2025 02:35
@brandonchinn178
Contributor Author

Thanks @TeofilC and @endgame. I realized that qualified literals had some interesting edge cases to hammer out. I broke out a separate proposal here: #698

I would prefer to put this proposal on hold until that proposal concludes; if it gets rejected, I might want to rethink this one. With any luck, it'll be less controversial than string interpolation 🤞

@sgraf812
Contributor

From a cursory read, I like what I'm seeing!

Labels
Needs revision The proposal needs changes in response to shepherd or committee review feedback