Generative AI policy: things that go into git #161
-
Prior art:
I'd like to add and study an example of a project that explicitly accepts such contributions without hesitation.
-
Scope: Let's see how heated this discussion gets, but I hope we can achieve alignment across organization projects, and I'm inclined to let affiliate projects set their own course.
-
Alright, I have a lot of thoughts on this. I will also note that I'm probably not entirely unbiased, since I work for NVIDIA and so I have some vested interests in this. With that said, all my opinions are my own and do not in any way reflect the position of my employer.

I think AI-assisted coding is inevitable. Actually, that's not really strong enough: by most measurements, it's not a future inevitability but a present reality. I've seen different numbers on this, but they're all mostly consistent in showing that well over half of all software engineers are using AI to some extent in the production of code today. So… it's happening.

There's a whole spectrum here, though. On one end of the spectrum is stuff like Copilot or the various JetBrains features, which is basically just spicy autocomplete. It doesn't save you that much time, and it's often wrong, but it's super low effort, so a lot of people use it. On the other end of the spectrum are things like Cursor, which basically turns programming into a giant code review exercise. Cursor tends to work incredibly well for some things and can be a productivity enhancer of multiple orders of magnitude. It can also completely fumble the ball on certain problems, and then you just end up falling back to good old-fashioned hand coding. And then finally, in the middle, you have stuff like typing vague prompts into ChatGPT or Claude and copy/pasting the code it generates (usually with considerable editing). This case is basically just a Stack Overflow replacement.

I have used the last one on Cats Effect, though not super extensively as yet. For the cases where I used it, I was able to do in about 15 minutes what would have taken me several days otherwise (conservatively). It's not applicable everywhere, though, and I couldn't use the outputs verbatim (nor was it necessary to do so).

Speaking of which, this is a good essay and I highly recommend we all read it for context: https://steve-yegge.medium.com/the-death-of-the-stubborn-developer-b5e8f78d326b

You may not agree with Steve on this (or many other things), but he does a decent job of explaining the space into which AI is settling within our industry. I (obviously) work with a lot of people who leverage AI as a core part of their development workflow, and their experiences anecdotally align extremely closely with what Yegge describes.

So with all of that out of the way, I don't think it's productive or even possible to have a blanket "no". In the very near future, and more than likely the very real present, contributors will either walk away to avoid breaking such a rule or will simply lie about whether they leveraged generative AI in the creation of some code. For better or worse, we're facing a fait accompli to some extent.

With that said, I'll take your bullets in turn very quickly and then conclude with some more concrete suggestions:
Yeah, this bothers me somewhat. I don't have a good answer for it. Arguably, open-weights models (like Llama) don't really help this either, since the datasets and training methodologies aren't available. Conversely, though, this isn't that different from someone five years ago using IntelliJ to perform a large refactoring. It's also a proprietary tool generating code on someone's behalf, and we've long since accepted that this sort of thing is both reasonable and welcome in OSS.
One of the things I work on is data center provisioning and utilization for very very large AI model training, including (of course) power management and cooling. I can't say too much but I will say that this problem has been wildly overblown by most media. Power grids are immensely complex and not all megawatts of consumption (or production!) are created equal. So at the very least, if you're losing sleep over this, I would advise relaxing a bit.
In my experience, this is happening only with very poor engineers leveraging AI tooling. Basically, bad engineers can now produce bad code in immense volume, while good engineers can also produce good code in immense volume. The latter continues to learn at essentially the same rate, while the former continues to eschew learning to the same degree. Thus, I'm not convinced there's any real paradigm shift here.
I'm open-minded, but I haven't seen this work particularly well. Documentation, done well, is a human-level communication effort. Even heavily edited and very expertly produced AI-generated documentation is, in my experience, not as good (though it can be good enough to be useful!). Amusingly, some models seem to do way better at this than others (just as some models are way better at coding than others).
If the code quality is good, then this would seem to be a good thing. If the code quality is bad then obviously it's a severe negative, regardless of how the code was created.
I'm interested to see how this unfolds, but particularly with my proposed ban on fully generative engagement (issues/prs/profiles/etc) in the other thread, I don't see how it would have too much of an impact.
This is more nuanced, but IMO the contributor (who must be a human!) is responsible for their own code quality, regardless of how that code came to be. If the quality is poor, then they shouldn't have submitted the PR without more modification and we shouldn't accept it without more modification. This certainly points to one very specific paradigm shift in the age of AI, which is that we all will be spending a lot more time reading and reviewing code and less time actually typing it. That's true for contributors and maintainers alike. IMO maintainers can continue to self-regulate on this one. If they're getting low-quality PRs from someone, then they're going to end up giving them less time and being more likely to close them. What we should explicitly discourage is a shifting Overton window on code quality. AI doesn't change the need to have high-quality, well-reviewed, and maintainable code, despite what a lot of people say.
Anecdata here, but the place where I used generative AI on Cats Effect was specifically in trying to get the kqueue signaling support working for LLVM multithreading. It's pretty easy to show that this particular signaling pattern really only exists in three codebases total: GoLang's runtime, Tokio, and Cats Effect, and the last of those three has a slightly different take on it. I'd say that definitely qualifies as exploring a new idea.
Code quality by another name. Agree with the risk but I think the answer is the same: we should not lower our acceptance standards for the merge button.
I guess this is just one aspect of it, but yes, agreed.
To me this is probably the biggest and most (today) insurmountable question. This whole thing is a legal gray area. OSS is obviously no stranger to legal gray areas in general but this feels like a whole new frontier and it's disconcerting. Fwiw, I've had Cursor spit back out at me code that I myself wrote a long time ago (the most amusing one was when it hallucinated something based on the contents of slides I used at Scala Days two years ago). That's of course a somewhat degenerate case since I own the copyright anyway, but it does speak to the danger. Even when things aren't being repeated line-for-line, it's very clear that, at a minimum, copyright holders have minimal practical control over how their data is being used in training, and participation in the tooling ecosystem which arises from this is sort of a form of implicit endorsement. I'm obviously aware of all the arguments on both sides, and the truth is that I don't have a great set of answers. For Typelevel specifically, it's probably not a particular legal risk since, as I noted, literally everyone is doing this now. So it's not like we're out on a limb by ourselves: the whole industry has to grapple with this together. So I think there's not a lot of actual risk to us, it's more of a moral hazard.
Fwiw, I (unfortunately) haven't seen as much effect on this front as I would have hoped (and as others have prognosticated). The best description of the effects of AI-based developer tooling really does seem to be that you get more of whatever you were already doing.
Onus on the contributor, IMO. Code that is difficult to review is bad code, regardless of where it comes from.

Overall, my position is that we should not attempt to police this (it's probably impossible anyway), but we should probably make it clear that we continue to believe that high-quality, readable, and maintainable code as a textual artifact unto itself remains vital. We are not going to compromise on this front, regardless of what kinds of tooling are used to create the code itself. Maintainers will still hold pull requests to the same standards they have always held, and contributors who repeatedly produce low-quality PRs will likely find themselves deprioritized or, in extreme cases, peremptorily refused. Maintainers should be jealous of their time, as always.
-
Alright. I'll fill out Gemini's survey, too.
Yep.
As the hippiest of the open source hippies here, I'm actually not worried! The organization should avoid imposing proprietary tools on everyone, but you using Cursor doesn't preclude me from using cat.
Concerned, but the data is hard to parse.
I endorse the analogy that AI assistance is renting knowledge instead of buying. Especially because it doesn't imply a binary answer. If we don't merge what we don't understand, it's more of an individual concern.
I'm more optimistic for editing than materializing.
A good thing, if and only if the maintainer-to-contributor ratio is sustainable.
Not worried, if and only if we are vigilantly anti-slop and can agree what's slop.
If it's slop, we'll close it and boop them on the nose with the FAQ. If it's not slop but a little sloppy around the edges, maybe we need to revisit the definition of slop.
An aggressive AI ban doesn't preclude AI-assisted prototyping. This is more about whether we force a clean implementation of a winning prototype.
Hang on to your butts. Look at the intentional seeding of training data with misinformation, and now replace lies about Ukraine with obfuscated malware.
I have been disappointed by this in practice, but, sign me up.
Multiple organizations are deferring the legwork to the contributor: we'll take your AI output, but the contributor is responsible for making sure it's a fine slurry and not a copyrighted chunk. Many AI vendors are so confident in their blenders and their strainers that they offer legal indemnification in case a chunk is found, but only to paying customers. We have relied on handwavy implicit contracts that contributions are licensed. As the minefield gets denser, it may be time to reconsider a DCO (a minimal sketch of what such a check involves is at the end of this comment). It is also wild to me that anyone asks contributors to cite LLMs that won't cite their own sources, but the ASF guideline of a
Nah, but that's not its job.
See decreased quality. Missing from my list: consent. I admire the Software Freedom Conservancy's Aspirational Target, particularly "FOSS in, FOSS out, and FOSS throughout." That's not the current reality. My code is used in ways I wish it was not, but, tough. I licensed it that way. Big AI's courtroom successes may place it in that same bucket. But when Big AI turns around and argues that their distillation of my intellectual property is fair use while distillations of their models are not: nope. I am struggling to reconcile that "nope" with the "inevitable: yep".
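To make the DCO aside above a bit more concrete: the Developer Certificate of Origin is just a per-commit assertion, made by adding a Signed-off-by trailer (what `git commit -s` appends), that the contributor has the right to submit the code under the project's license. Enforcement is usually a small CI check. The following is only a minimal, hypothetical sketch of such a check in Scala; the object name, argument handling, and default commit range are my own invention and not anything Typelevel currently runs.

```scala
// Hedged sketch only: roughly what a hosted DCO check automates.
// Assumes `git` is on the PATH and this is run inside a clone of the repository;
// the range argument (e.g. "origin/main..HEAD") selects the commits in a PR branch.
import scala.sys.process._

object DcoCheck {
  def main(args: Array[String]): Unit = {
    val range = args.headOption.getOrElse("origin/main..HEAD")

    // One "<sha> <subject>" line per commit in the range.
    val commits =
      Seq("git", "log", "--format=%H %s", range).!!.linesIterator.filter(_.nonEmpty).toList

    // A commit passes if its full message (%B) contains a Signed-off-by trailer.
    val missing = commits.filterNot { line =>
      val sha = line.takeWhile(_ != ' ')
      Seq("git", "show", "-s", "--format=%B", sha).!!
        .linesIterator
        .exists(_.trim.startsWith("Signed-off-by:"))
    }

    if (missing.isEmpty) println(s"All ${commits.size} commit(s) are signed off.")
    else {
      missing.foreach(c => println(s"Missing Signed-off-by: $c"))
      sys.exit(1)
    }
  }
}
```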
-
Lots of good points here. I think there's a higher-level point that in practice is central, and probably needs to be stated explicitly, which is reputation.

I mean, Daniel makes a good argument that we shouldn't have any sort of blanket ban on AI assistance. But we clearly want to ban uncritical AI slop. The difference is the critical thinking involved, and that's something that in practice we'll figure out gradually. What we really want to avoid is wasting too much of our maintainers' time reviewing crap PRs, and it seems to me like reputation is the key concept here.

Plain and simple: if someone seems to have a pattern of submitting slop PRs, we want them to go away. This is an area where I don't know how much support GitHub provides, if any. But conceptually, what we want is to keep track of whether someone has been submitting bad PRs; if so, we want to ban them across at least all of the Organization projects. (The Affiliate projects are a weird gray area, but in a perfect world I think we'd maintain a ban list that Affiliates could opt into.)

(If GH doesn't provide machinery for this out of the box, I suspect there's a serious market niche for it: a way to manage and share blocklists seems like a tool whose time has come. Indeed, a web-of-trust model for sharing this sort of reputation data may turn out to be essential for survival in this new world.)

And automation aside, our messaging should be very clear here: your PRs are your reputation in our world. If you are submitting good, useful PRs, we're happy to welcome you as a valuable member of the community, even if you are leveraging AI tools as part of crafting them. But if you are wasting maintainer time, you are a net negative regardless of how you are creating the PRs, and we can and will ban you from further participation if you show a pattern of this.

I suspect that's the best focus for us in practice: less on how the PRs are generated, more on the end result. The copyright issue is the hard one, yes, but it's really the same: what we care about is whether you're violating copyright, regardless of how that comes about.
-
Note: Full disclosure up front: I'm an LLM skeptic.

Opinions on AI in programming

It seems like the survey questions provide a reasonable way to give a baseline opinion, so I'll give my takes on them and a couple of issues I think they miss. Feel free to skip to the bottom, where I'll give my opinion on generative AI policy.

Survey Questions
I'm a bit skeptical about this for market reasons. We're currently in the honeymoon stage of pricing for most AI companies, kind of like how Uber and Lyft were incredibly cheap until they'd managed to capture their markets, at which time the prices rose to being at least as expensive as taking a taxi used to be. Reporting from September of last year indicated that OpenAI was operating at a $5 billion/year loss1. That can only continue for so long, and when it ends, I think we're going to see a steep drop in LLM usage in daily programming tasks as the cost per token rises (especially for OSS work that isn't funded by a company).
I'm a bit nervous about this, having been unable to compile some of Google's libraries back in the day because I didn't have access to one of their internal build libraries. I don't really care if a particular developer uses proprietary tools for their work, my concern is more aligned with LLM tooling becoming necessary for stuff like generating fuzzy input for tests.
I have concerns here, probably because I live in an area that's been in a drought for basically my entire lifetime 😅. I do hope there will be a solution to this, mostly because data centers are used for much more than just running AI models, and partly because anything that helps the power and cooling problems is going to be generally helpful. While I am sympathetic to the idea that the media is sensationalizing the problem, if articles like https://hbr.org/2024/07/the-uneven-distribution-of-ais-environmental-impacts are even within the correct order of magnitude, I'm going to want to see some hard numbers before I stop worrying about it.
The studies on this are not promising. Most of the ones I've seen focus on critical thinking in the context of general learning1, but since that's a prerequisite skill for critical thinking in the context of software development, it worries me.
I haven't seen enough examples to have an opinion on this one.
Having had a couple PRs sit at the "this looks good" stage for months (years, in one notable case), I think we're probably already at the point where the volume of contributions is overwhelming the available reviewers. Sifting through an increased volume of contributions will only be tenable if (1) there's a way to filter out the worst of the slop and (2) the increased volume of contributions results in the development of competent reviewers to grow the pool.
For as much as I keep hearing about the Death Of The Internet, I'm not sure how much of a practical issue it is, as I mostly see people coalescing around communities away from the bots. The biggest risk I see here is that our communities already have an (admittedly at least somewhat deserved) reputation for being insular. Having to sift through a bunch of AI Slop does mean we're going to have some false negatives and reject actual humans. And that kind of sucks.
The most recent study I've seen on this is from GitClear1, which points to a downward pressure on code quality driven primarily by lack of reuse and code churn. I'd be curious to see if the increasing context windows will end up helping with that, but I'm not particularly optimistic, because I expect that the size of context windows in actual use will end up shrinking again as the cost inevitably rises.
I don't have enough exposure to this one to comment on specifics, but I'm skeptical for reasons I'll get into later.
This is going to depend a lot on how well AI companies can secure their training data. So far, I'm not particularly impressed.
I've heard good things here, my main doubt is if it ends up being worth the downsides.
I think this is the biggest danger, and the uncertainty is sufficient that (at least for me) it may be worth rejecting PRs that have used Generative AI in their production at least until some of these cases are resolved: https://www.bakerlaw.com/services/artificial-intelligence-ai/case-tracker-artificial-intelligence-copyrights-and-class-actions/
I'm not sure this is actually something that LLMs can do. They can work around a skill gap, but I haven't really seen much of anything indicating that using an LLM to assist with coding actually helps build the skills needed for the user to be able to later accomplish the task on their own. This is part of the reason that automating away the tasks that are the stepping stones for Junior Developers is such an issue (the other being that we won't have Senior Devs if nobody can make a living as a Junior Dev).
This is kind of the same sort of problem as the potential to overwhelm reviewers with the increased quantity of PRs. If it's bad enough that it can't be reviewed, then it shouldn't be merged; the question is "how much easier will this make it to create bad PRs?"

Non-Survey Issues

Training Set Drift

One of the problems with generative-AI-assisted coding is that it's limited by its training set. Similar to how generative AI images and text tend to have a particular sameness to them, code produced by LLMs is going to be restricted by its training set and will struggle to explore new ideas, leverage new APIs or libraries, and generally keep up with the evolution of programming techniques. This is mostly an issue because, since the AI companies are operating at a loss and training is so expensive, they're going to want to use a particular trained model for as long as they can get away with it. I'm concerned that leaning on LLMs for programming is going to tend to ossify our idioms and techniques. This wouldn't be as much of an issue if we didn't need to review these PRs (or at least filter out stuff like "Restructure fs2 to leverage the Cake pattern").

Other Ethical Considerations

Aside from issues around the ethics of how training data is procured, there was very troubling reporting back in 2023 about the human cost of annotating the training data1, and it's unclear if the situation has improved at all. While it's certainly not the first time a technology has been built on considerable human suffering, I can at least opt out of this one.

Thoughts on Generative AI Policy

Given the current backlog of unresolved court cases around generative AI and copyright, I think that accepting code written using an LLM is irresponsibly risky. Once those cases have been resolved, it may be worth considering. At the current moment, I wouldn't put too much effort into going over the other considerations in detail unless the legal risk is acceptable, as most of the objections will (hopefully) have evolved by then.
-
Typelevel should have guidelines for accepting contributions assisted by AI. Quoting myself on #160:
The scope here is code, docs, and generally anything else that we commit into Git and distribute under license. The low-effort clutter in issues and chat is covered in #160.
For balance1 and performance art2, the following conversation starters were AI assisted.
Do not expand if you are allergic to AI
Footnotes
1. Nah. The model has its own biases, then I curated the list.
2. Some might call it brazen hypocrisy.