Generative AI policy: things that go into git #161
-
Prior art:
I'd like to add and study an example of a project that explicitly accepts such contributions without hesitation.
-
Scope: Let's see how heated this discussion gets, but I hope we can achieve alignment across organization projects, and I'm inclined to let affiliate projects set their own course.
-
Alright, I have a lot of thoughts on this. I will also note that I'm probably not entirely unbiased, since I work for NVIDIA and so I have some vested interests in this. With that said, all my opinions are my own and do not in any way reflect the position of my employer.

I think AI-assisted coding is inevitable. Actually, that's not really strong enough: by most measurements, it's not a future inevitability but a present reality. I've seen different numbers on this, but they're all mostly consistent in showing that well over half of all software engineers are using AI to some extent in the production of code today. So… it's happening.

There's a whole spectrum here, though. On one end of the spectrum is stuff like Copilot or the various JetBrains features, which is basically just spicy autocomplete. It doesn't save you that much time, and it's often wrong, but it's super low effort, so a lot of people use it. On the other end of the spectrum are things like Cursor, which basically turns programming into a giant code review exercise. Cursor tends to work incredibly well for some things and can be a productivity enhancer of multiple orders of magnitude. It can also completely fumble the ball on certain problems, and then you just end up falling back to good old-fashioned hand coding. And then finally, in the middle, you have stuff like typing vague prompts into ChatGPT or Claude and copy/pasting the code it generates (usually with considerable editing). This case is basically just a Stack Overflow replacement.

I have used the last one on Cats Effect, though not super extensively as yet. For the cases where I used it, I was able to do in about 15 minutes what would have taken me several days otherwise (conservatively). It's not applicable everywhere, though, and I couldn't use the outputs verbatim (nor was it necessary to do so).

Speaking of which, this is a good essay and I highly recommend we all read it for context: https://steve-yegge.medium.com/the-death-of-the-stubborn-developer-b5e8f78d326b

You may not agree with Steve on this (or many other things), but he does a decent job of explaining the space into which AI is settling within our industry. I (obviously) work with a lot of people who leverage AI as a core part of their development workflow, and their experiences anecdotally align extremely closely with what Yegge describes.

So with all of that out of the way, I don't think it's productive or even possible to have a blanket "no". In the very near future, and more than likely the very real present, contributors will either walk away to avoid breaking such a rule or will simply lie about whether they leveraged generative AI in the creation of some code. For better or worse, we're facing a fait accompli to some extent.

With that said, I'll take your bullets in turn very quickly and then conclude with some more concrete suggestions:
Yeah, this bothers me somewhat. I don't have a good answer for it. Arguably, open-weights models (like Llama) don't really help this either, since the datasets and training methodologies aren't available. Conversely, though, this isn't that different from someone five years ago using IntelliJ to perform a large refactoring. It's also a proprietary tool generating code on someone's behalf, and we've long since accepted that this sort of thing is both reasonable and welcome in OSS.
One of the things I work on is data center provisioning and utilization for very very large AI model training, including (of course) power management and cooling. I can't say too much but I will say that this problem has been wildly overblown by most media. Power grids are immensely complex and not all megawatts of consumption (or production!) are created equal. So at the very least, if you're losing sleep over this, I would advise relaxing a bit.
In my experience, this is happening only with very poor engineers leveraging AI tooling. Basically, bad engineers can now produce bad code in immense volume, while good engineers can also produce good code in immense volume. The latter continues to learn at essentially the same rate, while the former continues to eschew learning to the same degree. Thus, I'm not convinced there's any real paradigm shift here.
I'm open-minded, but I haven't seen this work particularly well. Documentation, done well, is a human-level communication effort. Even heavily edited and very expertly produced AI-generated documentation is, in my experience, not as good (though it can be good enough to be useful!). Amusingly, some models seem to do way better at this than others (just as some models are way better at coding than others).
If the code quality is good, then this would seem to be a good thing. If the code quality is bad then obviously it's a severe negative, regardless of how the code was created.
I'm interested to see how this unfolds, but particularly with my proposed ban on fully generative engagement (issues/prs/profiles/etc) in the other thread, I don't see how it would have too much of an impact.
This is more nuanced, but IMO the contributor (who must be a human!) is responsible for their own code quality, regardless of how that code came to be. If the quality is poor, then they shouldn't have submitted the PR without more modification and we shouldn't accept it without more modification. This certainly points to one very specific paradigm shift in the age of AI, which is that we all will be spending a lot more time reading and reviewing code and less time actually typing it. That's true for contributors and maintainers alike. IMO maintainers can continue to self-regulate on this one. If they're getting low-quality PRs from someone, then they're going to end up giving them less time and being more likely to close them. What we should explicitly discourage is a shifting Overton window on code quality. AI doesn't change the need to have high-quality, well-reviewed, and maintainable code, despite what a lot of people say.
Anecdata here, but the place where I used generative AI on Cats Effect was specifically in trying to get the kqueue signaling support working for LLVM multithreading. It's pretty easy to show that this particular signaling pattern really only exists in three codebases total: GoLang's runtime, Tokio, and Cats Effect, and the last of those three has a slightly different take on it. I'd say that definitely qualifies as exploring a new idea.
Code quality by another name. Agree with the risk but I think the answer is the same: we should not lower our acceptance standards for the merge button.
I guess this is just one aspect of it, but yes, agreed.
To me this is probably the biggest and most (today) insurmountable question. This whole thing is a legal gray area. OSS is obviously no stranger to legal gray areas in general but this feels like a whole new frontier and it's disconcerting. Fwiw, I've had Cursor spit back out at me code that I myself wrote a long time ago (the most amusing one was when it hallucinated something based on the contents of slides I used at Scala Days two years ago). That's of course a somewhat degenerate case since I own the copyright anyway, but it does speak to the danger. Even when things aren't being repeated line-for-line, it's very clear that, at a minimum, copyright holders have minimal practical control over how their data is being used in training, and participation in the tooling ecosystem which arises from this is sort of a form of implicit endorsement. I'm obviously aware of all the arguments on both sides, and the truth is that I don't have a great set of answers. For Typelevel specifically, it's probably not a particular legal risk since, as I noted, literally everyone is doing this now. So it's not like we're out on a limb by ourselves: the whole industry has to grapple with this together. So I think there's not a lot of actual risk to us, it's more of a moral hazard.
Fwiw, I (unfortunately) haven't seen as much effect on this front as I would have hoped (and as others have prognosticated). The best description of the effects of AI-based developer tooling really does seem to be that you get more of whatever you were already doing.
Onus on the contributor, IMO. Code that is difficult to review is bad code, regardless of where it comes from.

Overall, my position is that we should not attempt to police this (it's probably impossible anyway), but we should probably make it clear that we continue to believe that high-quality, readable, and maintainable code as a textual artifact unto itself remains vital. We are not going to compromise on this front, regardless of what kinds of tooling are used to create the code itself. Maintainers will still hold pull requests to the same standards they have always held, and contributors who repeatedly produce low-quality PRs will likely find themselves deprioritized or, in extreme cases, peremptorily refused. Maintainers should be jealous of their time, as always.
-
Alright. I'll fill out Gemini's survey, too.
Yep.
As the hippiest of the open source hippies here, I'm actually not worried! The organization should avoid imposing proprietary tools on everyone, but you using Cursor doesn't preclude me from using cat.
Concerned, but the data is hard to parse.
I endorse the analogy that AI assistance is renting knowledge instead of buying. Especially because it doesn't imply a binary answer. If we don't merge what we don't understand, it's more of an individual concern.
I'm more optimistic for editing than materializing.
A good thing, if and only if the maintainer-to-contributor ratio is sustainable.
Not worried, if and only if we are vigilantly anti-slop and can agree what's slop.
If it's slop, we'll close it and boop them on the nose with the FAQ. If it's not slop but a little sloppy around the edges, maybe we need to revisit the definition of slop.
An aggressive AI ban doesn't preclude AI-assisted prototyping. This is more about whether we force a clean implementation of a winning prototype.
Hang on to your butts. Look at the intentional seeding of training data with misinformation, and now replace lies about Ukraine with obfuscated malware.
I have been disappointed by this in practice, but, sign me up.
Multiple organizations are deferring the legwork to the contributor: we'll take your AI output, but the contributor is responsible for making sure it's a fine slurry and not a copyrighted chunk. Many AI vendors are so confident in their blenders and their strainers that they offer legal indemnification in case a chunk is found, but only to paying customers. We have relied on handwavy implicit contracts that contributions are licensed. As the minefield gets denser, it may be time to reconsider a DCO (a minimal sketch of what such a check involves is at the end of this comment). It is also wild to me that anyone asks contributors to cite LLMs that won't cite their own sources, but the ASF guideline of a
Nah, but that's not its job.
See decreased quality. Missing from my list: consent. I admire the Software Freedom Conservancy's Aspirational Target, particularly "FOSS in, FOSS out, and FOSS throughout." That's not the current reality. My code is used in ways I wish it was not, but, tough. I licensed it that way. Big AI's courtroom successes may place it in that same bucket. But when Big AI turns around and argues that their distillation of my intellectual property is fair use while distillations of their models are not: nope. I am struggling to reconcile that "nope" with the "inevitable: yep".
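To make the DCO aside above a bit more concrete: the Developer Certificate of Origin is just a per-commit assertion, made by adding a Signed-off-by trailer (what `git commit -s` appends), that the contributor has the right to submit the code under the project's license. Enforcement is usually a small CI check. The following is only a minimal, hypothetical sketch of such a check in Scala; the object name, argument handling, and default commit range are my own invention and not anything Typelevel currently runs.

```scala
// Hedged sketch only: roughly what a hosted DCO check automates.
// Assumes `git` is on the PATH and this is run inside a clone of the repository;
// the range argument (e.g. "origin/main..HEAD") selects the commits in a PR branch.
import scala.sys.process._

object DcoCheck {
  def main(args: Array[String]): Unit = {
    val range = args.headOption.getOrElse("origin/main..HEAD")

    // One "<sha> <subject>" line per commit in the range.
    val commits =
      Seq("git", "log", "--format=%H %s", range).!!.linesIterator.filter(_.nonEmpty).toList

    // A commit passes if its full message (%B) contains a Signed-off-by trailer.
    val missing = commits.filterNot { line =>
      val sha = line.takeWhile(_ != ' ')
      Seq("git", "show", "-s", "--format=%B", sha).!!
        .linesIterator
        .exists(_.trim.startsWith("Signed-off-by:"))
    }

    if (missing.isEmpty) println(s"All ${commits.size} commit(s) are signed off.")
    else {
      missing.foreach(c => println(s"Missing Signed-off-by: $c"))
      sys.exit(1)
    }
  }
}
```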
-
Lots of good points here. I think there's a higher-level point that in practice is central, and probably needs to be stated explicitly, which is reputation.

I mean, Daniel makes a good argument that we shouldn't have any sort of blanket ban on AI assistance. But we clearly want to ban uncritical AI slop. The difference is the critical thinking involved, and that's something that in practice we'll figure out gradually. What we really want to avoid is wasting too much of our maintainers' time reviewing crap PRs, and it seems to me like reputation is the key concept here.

Plain and simple: if someone seems to have a pattern of submitting slop PRs, we want them to go away. This is an area where I don't know how much support GitHub provides, if any. But conceptually, what we want is to keep track of whether someone has been submitting bad PRs; if so, we want to ban them across at least all of the Organization projects. (The Affiliate projects are a weird gray area, but in a perfect world I think we'd maintain a ban list that Affiliates could opt into.)

(If GH doesn't provide machinery for this out of the box, I suspect there's a serious market niche for it: a way to manage and share blocklists seems like a tool whose time has come. Indeed, a web-of-trust model for sharing this sort of reputation data may turn out to be essential for survival in this new world.)

And automation aside, our messaging should be very clear here: your PRs are your reputation in our world. If you are submitting good, useful PRs, we're happy to welcome you as a valuable member of the community, even if you are leveraging AI tools as part of crafting them. But if you are wasting maintainer time, you are a net negative regardless of how you are creating the PRs, and we can and will ban you from further participation if you show a pattern of this.

I suspect that's the best focus for us in practice: less on how the PRs are generated, more on the end result. The copyright issue is the hard one, yes, but it's really the same: what we care about is whether you're violating copyright, regardless of how that comes about.
-
Note: Full disclosure up front: I'm an LLM skeptic.

Opinions on AI in programming

It seems like the survey questions provide a reasonable way to give a baseline opinion, so I'll give my takes on them and a couple of issues I think they miss. Feel free to skip to the bottom, where I'll give my opinion on generative AI policy.

Survey Questions
I'm a bit skeptical about this for market reasons. We're currently in the honeymoon stage of pricing for most AI companies, kind of like how Uber and Lyft were incredibly cheap until they'd managed to capture their markets, at which time the prices rose to being at least as expensive as taking a taxi used to be. Reporting from September of last year indicated that OpenAI was operating at a $5 billion/year loss1. That can only continue for so long, and when it ends, I think we're going to see a steep drop in LLM usage in daily programming tasks as the cost per token rises (especially for OSS work that isn't funded by a company).
I'm a bit nervous about this, having been unable to compile some of Google's libraries back in the day because I didn't have access to one of their internal build libraries. I don't really care if a particular developer uses proprietary tools for their work, my concern is more aligned with LLM tooling becoming necessary for stuff like generating fuzzy input for tests.
I have concerns here, probably because I live in an area that's been in a drought for basically my entire lifetime 😅. I do hope there will be a solution to this, mostly because data centers are used for much more than just running AI models, and partly because anything that helps the power and cooling problems is going to be generally helpful. While I am sympathetic to the idea that the media is sensationalizing the problem, if articles like https://hbr.org/2024/07/the-uneven-distribution-of-ais-environmental-impacts are even within the correct order of magnitude, I'm going to want to see some hard numbers before I stop worrying about it.
The studies on this are not promising. Most of the ones I've seen focus on critical thinking in the context of general learning1, but since that's a prerequisite skill for critical thinking in the context of software development, it worries me.
I haven't seen enough examples to have an opinion on this one.
Having had a couple PRs sit at the "this looks good" stage for months (years, in one notable case), I think we're probably already at the point where the volume of contributions is overwhelming the available reviewers. Sifting through an increased volume of contributions will only be tenable if (1) there's a way to filter out the worst of the slop and (2) the increased volume of contributions results in the development of competent reviewers to grow the pool.
For as much as I keep hearing about the Death Of The Internet, I'm not sure how much of a practical issue it is, as I mostly see people coalescing around communities away from the bots. The biggest risk I see here is that our communities already have an (admittedly at least somewhat deserved) reputation for being insular. Having to sift through a bunch of AI Slop does mean we're going to have some false negatives and reject actual humans. And that kind of sucks.
The most recent study I've seen on this is from GitClear1, which points to a downward pressure on code quality driven primarily by lack of reuse and code churn. I'd be curious to see if the increasing context windows will end up helping with that, but I'm not particularly optimistic, because I expect that the size of context windows in actual use will end up shrinking again as the cost inevitably rises.
I don't have enough exposure to this one to comment on specifics, but I'm skeptical for reasons I'll get into later.
This is going to depend a lot on how well AI companies can secure their training data. So far, I'm not particularly impressed.
I've heard good things here, my main doubt is if it ends up being worth the downsides.
I think this is the biggest danger, and the uncertainty is sufficient that (at least for me) it may be worth rejecting PRs that have used Generative AI in their production at least until some of these cases are resolved: https://www.bakerlaw.com/services/artificial-intelligence-ai/case-tracker-artificial-intelligence-copyrights-and-class-actions/
I'm not sure this is actually something that LLMs can do. They can work around a skill gap, but I haven't really seen much of anything indicating that using an LLM to assist with coding actually helps build the skills needed for the user to be able to later accomplish the task on their own. This is part of the reason that automating away the tasks that are the stepping stones for Junior Developers is such an issue (the other being that we won't have Senior Devs if nobody can make a living as a Junior Dev).
This is kind of the same sort of problem as the potential to overwhelm reviewers with the increased quantity of PRs. If it's bad enough that it can't be reviewed, then it shouldn't be merged; the question is "how much easier will this make it to create bad PRs?"

Non-Survey Issues

Training Set Drift

One of the problems with generative-AI-assisted coding is that it's limited by its training set. Similar to how generative AI images and text tend to have a particular sameness to them, code produced by LLMs is going to be restricted by its training set and will struggle to explore new ideas, leverage new APIs or libraries, and generally keep up with the evolution of programming techniques. This is mostly an issue because, since the AI companies are operating at a loss and training is so expensive, they're going to want to use a particular trained model for as long as they can get away with it. I'm concerned that leaning on LLMs for programming is going to tend to ossify our idioms and techniques. This wouldn't be as much of an issue if we didn't need to review these PRs (or at least filter out stuff like "Restructure fs2 to leverage the Cake pattern").

Other Ethical Considerations

Aside from issues around the ethics of how training data is procured, there was very troubling reporting back in 2023 about the human cost of annotating the training data1, and it's unclear if the situation has improved at all. While it's certainly not the first time a technology has been built on considerable human suffering, I can at least opt out of this one.

Thoughts on Generative AI Policy

Given the current backlog of unresolved court cases around generative AI and copyright, I think that accepting code written using an LLM is irresponsibly risky. Once those cases have been resolved, it may be worth considering. At the current moment, I wouldn't put too much effort into going over the other considerations in detail unless the legal risk is acceptable, as most of the objections will (hopefully) have evolved by then.
-
Typelevel should have guidelines for accepting contributions assisted by AI. Quoting myself on #160:
The scope here is code, docs, and generally anything else that we commit into Git and distribute under license. The low-effort clutter in issues and chat is covered in #160.
For balance1 and performance art2, the following conversation starters were AI assisted.
Do not expand if you are allergic to AI
Footnotes
1. Nah. The model has its own biases, then I curated the list.
2. Some might call it brazen hypocrisy.