Formulate a framework for responsible LLM usage when coding, plus practical recommendations #21
spwoodcock wants to merge 23 commits into main
Conversation
Force-pushed 2a147ea to 6266350, then 6266350 to cf94368.
dakotabenjamin left a comment:
Thorough and well-researched. Many problems are identified, but the mitigations look weak in some places, and may fail to address the problem. I've noted a couple in my in-line comments.
We could also add some processes around the following to support some of the guidelines:
- measuring the impact of AI usage on maintainers (time spent vs. contributions?)
- for maintainers, checklist or clear factors for when to reject contributions
Thanks for taking the time to review @dakotabenjamin! I really appreciate your input as someone who cares about this a lot & you make some very valid points - I'll try to address them all 😄
Co-authored-by: DK Benjamin <dakota.benjamin@hotosm.org>
Page content to come.
We plan to research and recommend the best open models to use, as alternatives to proprietary services.
This could also be stated generically in the document above, e.g. "Try to prefer open-source models over proprietary ones".
It would be nice to provide some guidance though! It's easy to provide guidelines, but then someone gets stuck on what actual tools / models to use. We can probably provide some decent options after a little research
@spwoodcock, my understanding is this would be a recommendation only? I think just as our internal team is experimenting, other contributors might as well.
Yeah for sure, simply a recommendation. Users are free to use whatever models they like.
If one day there is an org / model that we really disagree with their methods, we could possibly make a 'banned models' list
spwoodcock left a comment:
I made a few fixes / updates 👍
@spwoodcock Do you think we should also mention code licensing? If AI is being used to write code from some other repo or elsewhere, we should try to check for a compatible license. I know licensing with AI agents is a kinda shady topic right now and it's very difficult to know, but I feel we should at least make the author aware! And also about code documentation, maybe we can add something like: try to document *why* changes were made rather than what every line does, which is self-explanatory (AI tends to do that).
Thanks for all the comments @kshitijrajsharma! I addressed your points and updated with suggestions 😃 For the point about licensing, there is a comment about this in "The LLVM Project's AI policy states it clearly: using AI tools to regenerate copyrighted material does not remove the copyright, and contributors remain responsible for ensuring nothing infringing enters their work [4]. The risk includes inadvertently incorporating copyrighted code or text into publicly released outputs." Do you think that is enough to cover it, or we should be more explicit somewhere, perhaps in
Thanks @mjvanderveen! (your input really helped to craft this) Things remaining to do:
Now the question of the hour: how do we identify LLM-generated code? I made a start on this here, but would love some input from anyone on the heuristics they have encountered to help do this, and potentially any tools they have tested to automate it 🙏
Force-pushed 77050c6 to 8e23500.
dakotabenjamin left a comment:
Would like to see more input from members of the team before giving explicit approval, but to me this is a great addition to the documentation. Once merged I'll also add it to HOT AI policy as reference.
I think you did great research Sam and provided a clear framework, and also incorporated most feedback, both from internal users and from our discussions with partners. I do suggest we first run it by our full tech team meeting before merging.
Adding an idea I read here that I really like: Good use for AI / LLMs:
Remaining things to update:
Once complete, we will discuss in our next team meeting, then merge once happy with it 👍
As I understand it, there are two axes of copyright risk with AI contributions. One is the regeneration of copyrighted material, and the other is the copyrightability of the AI portions of the contribution. I link to the US legal side of this concern, though as I understand it, the EU side is similar. In short, disclosure is critical for the author asserting copyright on contributions that include AI-written bits.
That's a really valuable document, thanks for sharing @smathermather ❤️

It's also one of the toughest points to assess, where there are no real remediation strategies possible. Either (1) get swept along with the crowd and hide from it, (2) view it as an acceptable risk and an ethical negative, weighed up against the ethical positives of the work we do, or (3) refuse to engage due to the concerns. Tough call that we need to discuss more and work out a way forward for! Open to any input or suggestions from people!

Also, it's been noted that the remediation strategies on the environmental front are a bit weak, which I agree with. Orgs could possibly do some back-of-the-envelope calcs for how much usage there might be from our own team, approximate the kWh usage, then donate this amount to effective charities and orgs in this space. This is obviously not an acceptable strategy for the whole world to engage in, but let's be real: most orgs don't care, so those that do aren't going to saturate the funding and effectiveness of these charities. I would promote https://www.effectiveenvironmentalism.org/climate-charities

Again, far from perfect, but it would go some way to acknowledging the problem and attempting to solve it in a roundabout way. Note I'm commenting entirely on my own behalf, and don't represent the views of HOT. I haven't sought approval to see if remediative donations are an option.
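To make the back-of-the-envelope idea concrete, here is a rough sketch in Python. Every constant (energy per request, request volume, team size, grid carbon intensity, offset price) is an illustrative assumption, not measured data, and is not the calculation referenced elsewhere in this thread:

```python
# Rough emissions-offset estimate for team LLM usage.
# ALL constants below are illustrative assumptions, not measurements.

WH_PER_REQUEST = 3.0             # assumed energy per LLM request (Wh)
REQUESTS_PER_DEV_PER_DAY = 200   # assumed heavy agentic usage
DEVS = 20                        # assumed team size
WORK_DAYS_PER_YEAR = 230
GRID_KG_CO2_PER_KWH = 0.4        # assumed average grid carbon intensity
OFFSET_USD_PER_TONNE = 25.0      # assumed donation rate per tonne CO2e

# Annual energy: Wh per request * requests * devs * days, converted to kWh
kwh_per_year = (WH_PER_REQUEST * REQUESTS_PER_DEV_PER_DAY
                * DEVS * WORK_DAYS_PER_YEAR) / 1000

# Convert kWh to tonnes of CO2e, then price the offset donation
tonnes_co2 = kwh_per_year * GRID_KG_CO2_PER_KWH / 1000
donation_usd = tonnes_co2 * OFFSET_USD_PER_TONNE

print(f"{kwh_per_year:.0f} kWh/year, {tonnes_co2:.2f} t CO2e, "
      f"${donation_usd:.2f} suggested donation")
```

Swapping in an org's own usage estimates and a current offset price is the whole exercise; the structure of the estimate matters more than these specific constants.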
One excellent, well-tested PR is worth more than ten AI-generated patches that each require maintainer effort to evaluate. Quality over quantity. Always.
### Prefer Existing Libraries
This section is mentioned above too. It could possibly be removed, or perhaps it's worth reiterating an important point.
- [ ] Error handling does not leak sensitive information
- [ ] No unnecessary permissions or access scopes
- [ ] SQL queries are parameterised (no string concatenation)
- [ ] File paths are sanitised against traversal attacks
Probably worth clarifying / linking somewhere for how to best do this
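For the two checklist items above, here is a minimal Python sketch of what parameterised queries and traversal-safe path handling look like in practice. The table, column names, and base directory are hypothetical:

```python
import os
import sqlite3

def find_user(conn: sqlite3.Connection, username: str):
    # Parameterised query: the driver escapes the value, so input like
    # "x' OR '1'='1" cannot alter the SQL. Never build SQL from user
    # input with f-strings or string concatenation.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()

def safe_path(base_dir: str, user_supplied: str) -> str:
    # Resolve the requested path, then verify it is still inside base_dir.
    # This rejects traversal attempts such as "../../etc/passwd".
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_supplied))
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"Path escapes base directory: {user_supplied!r}")
    return target

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

print(find_user(conn, "alice"))         # the matching row
print(find_user(conn, "x' OR '1'='1"))  # no rows: the injection is inert
```

A doc linking to OWASP's injection and path traversal guidance alongside a snippet like this would likely cover the "how to best do this" gap.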
This file is read by AI coding agents (Copilot, Claude Code, Cursor, etc.) when they work on your codebase. It tells the AI what your standards are, what's off-limits, and how to behave. Think of it as onboarding instructions - but for machines.
**What to include:**
Mention:
- MADR, link to section below
- Tech decisions or paths already explored and discounted - do not attempt these approaches
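To illustrate the kind of content such a file might hold (including the MADR and discounted-approaches points suggested above), here is a minimal hypothetical sketch. The commands, paths, and rules are invented placeholders, not the actual policy:

```markdown
# AGENTS.md

## Project standards
- Run the test suite before proposing any change; all tests must pass.
- Follow the existing code style; do not reformat unrelated files.

## Off-limits
- Do not modify database migrations or CI workflow files.
- Do not add new dependencies without an accompanying justification.

## Decisions already made (do not revisit)
- See the MADR records under `docs/decisions/` for approaches already
  explored and discounted - do not attempt these approaches again.
```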
AI tools must not be used to fix issues labelled `good first issue`.
These exist for human learning.
For full policy details, see: https://docs.hotosm.org/ai-assisted-coding
Change to a relative link
- **Code quality**: SonarQube Cloud is free for open source projects to use, assisting code quality and security compliance.
- **Dependency checking**: OWASP [DependencyCheck](https://github.com/dependency-check/DependencyCheck) or [OSV Scanner](https://github.com/google/osv-scanner) can be used to ensure dependencies are updated to avoid the latest security vulnerabilities. It's also recommended to use [Renovate bot](https://github.com/renovatebot/renovate) to regularly update dependencies.
- **Secrets scanning**: [GitLeaks](https://github.com/gitleaks/gitleaks) can be integrated as a pre-commit hook or CI action to prevent accidental commits of org secrets.
- **Licensing and copyright**: [ScanCode Toolkit](https://github.com/aboutcode-org/scancode-toolkit) can be used to scan for copyright breaches in your code and non-compliance with license requirements.
Let's test this one out & see how it performs!
**Key points for reviewers:**
- If a PR is marked AI-assisted, ask "why this approach?" - the answer tells you if the contributor understands the code.
- Watch for: verbose AI-style PR descriptions, generic variable names, unnecessary complexity, dependencies that seem unrelated.
Simply link to the section above instead of listing out the same signs of AI contribution
## Introduction
AI coding tools have moved from novelty to daily workflow in under two years. Andrej Karpathy coined the term "vibe coding" in early 2025 - describing developers who prompt AI, accept all suggestions, and barely read the output. By early 2026, he had already moved on, calling the practice outdated and advocating instead for "agentic engineering": careful, supervised AI-assisted development with full human oversight [1]. While early-2025 AI models were shown in some cases to have a net negative impact on developer productivity [21], models have improved significantly by early 2026, alongside growing efforts within open-source communities to establish appropriate governance and usage policies.
It might be too soon to have an authoritative source or paper on this, but once there is, it should be added!
Evidence is primarily anecdotal for now, speaking with devs in different orgs, observing the need for communities to catch up to the pace of model development & implement policies.
Sure, there is some hype as well, but where there is smoke, there is generally fire too (even if it's just smouldering embers for now...).
Watch this space for some actual hard stats
### 1.4 Labour and Exploitation
The refinement of AI models often relies on low-paid human labour for data labelling and content moderation, frequently in low- and middle-income economies. The training data itself was often collected without consent from its creators. Using these tools means participating in a supply chain with unresolved ethical questions about consent, compensation, and intellectual property [20].
It would be appreciated if someone has the time to research this one a little deeper. It's hard not to be implicated here, so we need to ensure the risks aren't too great.
Despite that, it's not the highest concern on the list for me personally. There are so many industries and practices globally that have a terrible human rights record. I would argue that long hours curating training data is low on the list of moral injustices out there (we need to put this in perspective of the potential good derived from the tools that we work on). But again, this hunch needs to be proven by hard data before it can be substantiated fully.
**Mitigation approaches:**
- Produce open-source software that partners can adopt freely.
- Advocate for and invest in open-source models that can run locally.
Related to the open models guidance page to be completed.
But we should also provide guidance on how to set up and use these tools in an easy way.
If there is an obvious usability gap (ideally identified through discussion with less tech literate community members - those dabbling with code solutions, who didn't work as software devs previously), we should definitely try to fill them!
I considered a wrapper of sorts to simply run Ollama. But honestly Ollama is pretty simple as it is, as attested to by @emi420. As mentioned, we should seek to identify pain points, and help in the best way we can
AI tools are demonstrably helpful when assisting someone who already understands the codebase and the broader technical landscape, but they are far less reliable as a substitute for that understanding.
**Guidance on appropriate AI use:**
Remove this and perhaps defer to the section in doc 2 that has more detail.
Some of this starts to get addressed above, but as I sent this via a side channel and Sam suggested an issue is fine: this is an opus. I appreciate both the thoroughness regarding the problem space, specific challenges, and possible remediations, and also the state of the art for responses across projects that have addressed LLM contributions explicitly in their covenants.

Overall, the biggest challenge I see is the remediation question: specifically the challenge of copyright (legal challenge); labour violations that underpin or are related to those copyright challenges (ethical challenge); jurisdictional challenges associated with concepts of fair use (possible legal challenges outside US legal frameworks); the existence of untainted, truly open models with a known corpus (legal, ethical, and digital sovereignty challenge); and the lack of any clear accounting / signal for decision making on the above with regard to environmental impacts.

These documents serve as a great framing and direction for use of transformer models that are built with consent, documentation of corpus, known licensing and labour practices, as well as resource use. IMO, any substantive ethical use of LLMs in dev work requires a list, and possibly the development, of such models, and constraint to allowed models with known provenance.
Getting there bit by bit! As promised, I did some back-of-the-envelope calcs to determine a reasonable emissions-offset donation for our LLM usage. It uses many assumptions and fudge factors, but overall suggests that energy usage at best could be reasonably low, and at worst (as with large models and regular heavy refactoring) is still manageable for now, although definitely not sustainable into the future.

The overall summary is that I think HOT should donate ~$300 to effective climate policy and advocacy charities (again, this is my opinion and I have not run it by anyone else yet...).

The last and most difficult things to address are the legal / ethical concerns raised by @smathermather about copyright infringement and the lack of truly 'open training data' models. We need to decide if:
I'll leave this question in the open for a bit, probably until next week, to allow anyone to comment - I would really love some additional feedback here, as I'm a single fallible human being who has many blind spots and is certainly prone to errors in judgement 😄

Comments from @LeenDhondt below:

@spwoodcock, option 2 is for me not an option. We need to learn how to navigate this new disruption in our space, not avoid it.
Thanks @LeenDhondt - agree with the need to navigate & not avoid 👍 To put the energy issue in context, I added this: the energy usage for our team is a drop in the ocean compared to travel emissions. I still think donations would be great, considering the distributed team and need to meet, but not as initially thought for LLM usage.
Also, to comment on the copyright issue. To summarise, US courts and the Copyright Office take the position that purely AI-generated outputs are not copyrightable, unless they have an element of human authorship (arrangement, substantive edits, creative transformation, etc). For software we care about:

In Cory Doctorow's blog post (from the FSF), he argues that model training is not infringement. Training involves copying for analysis and extracting statistical patterns, and copyright law has historically permitted analysis of copyrighted works. If we expand copyright to prohibit training, it would mostly benefit large rights-holders rather than individual creators. Cory is mostly discussing artistic work here, but for software it still applies when copyrighted code was used for training.

The first question remains legally unsettled, so this framework can't really comment. For the second point, we can meaningfully address it through the suggested mitigation methods: keep a human in the loop, require contributor disclosure, small PRs, and try our best to detect AI-generated content. All AI-generated content should be treated as untrusted and not committed blindly.

I would recommend reading the linked blog - it has some nice points. Also articulated in this video shared by Shoaib (for those that prefer):
Sharing what I'm experimenting on, in case you want to experiment yourself. Taking into account the recent events of mega-giant AI companies discussing whether their products will be used for autonomous killing systems and mass surveillance or not, but also in line with reducing dependency on paid and closed-source software, I've started to do more experiments with open and local models and tools.

Currently I'm testing OpenCode (which is similar to Claude Code) connected to a local Ollama instance serving the new qwen3.5 model. I'm also testing a new strategy, maybe some of you are already doing this. Instead of "chatting" with the model directly, I do the following:
The results look very promising with Qwen3.5. In theory this model offers better performance than Sonnet 4.5 (from Anthropic), but it's open (open weights) and it runs locally. Also, following this methodology, while it takes some time and in some cases it's easier to just write the code, it provides more control and better quality. I'll share about this in a doc when I have the time.

Note: running qwen3.5:27b on my system (chip: Apple M2 Max, memory: 32 GB) feels quite slow but it works; qwen3:8b works really fast. I have to test the new versions of qwen3.5 that are available in Ollama starting today.
I actually know very little about how this analysis applies outside the US, but it is important to note that it applies specifically to copyright in the US. Doctorow is partially right: under US law, training is (likely) not infringement. But fair use doesn't apply outside the US. I would be interested if anyone has an inkling of how e.g. EU infringement questions are likely to play out.
Yes I almost highlighted the reverse centaur in reference to this portion of the docs, though I think this accountability is important in the context of contributors, as with a FOSS project it's the only safe default. But for an org / corporate body, reverse centaur / responsibility laundering of LLMs is a concern, especially if / when LLM use is required or expected.
Very interesting. I'm looking forward to the open models list being populated.
Fixes #18
The problem
This PR
Please review, comment, and contradict whatever is written here as needed.
This is supposed to be a collaborative learning exercise to work on these difficult challenges together.
Disclaimer: Yes, the documents were initially synthesised from the linked references using Claude Opus 4.6 - text summarisation is where LLMs shine after all... The content has then been reviewed and edited by me, to give us a starting point. I added in a few additional perspectives from our ongoing calls with partner organisations too.
Notes
Based on documents:
https://docs.google.com/document/d/1F9C1aaE2CW9JmEmJlOCkuc9Lr_M-YybXXVyHlqDe_GY
https://docs.google.com/document/d/1M85SirgyyQrS33r4l4ta6JDWJg9OgO0BIVKBSoogVZE
https://docs.google.com/document/d/1uMT9EMd50NUCwRj5CTJg2ShQRWbxcI-U3g5oFFS6oMA