The jury is still out on the long-term risks of trusting software development to generative AI LLM bots, argues this expert

Chatbots are taking the tech world and the rest of the world by storm. AI large language model (LLM) tools can write things in seconds that would take humans hours or days: everything from research papers to poems to press releases, and yes, to computer code in multiple programming languages.

But if your organization is thinking of joining that parade, make sure it does not take your organization by the wrong kind of storm…because it could.

Of course, the call for a worldwide pause on the technology was met with another storm, declaring that it is absurd to try to pause any technology that already exists.

An asset to development but…

It turns out that, even in its early iterations, generative AI is quite good at some things. Given the right prompt, or set of prompts, chatbots can respond with amazing substance in seconds.

And as long as the massive amount of data it relies on is accurate, what you get is accurate as well.

But, it turns out, the output is not always perfect. One reviewer concluded that if we ask ChatGPT to deliver a complete application, it will fail. What it will do is help someone who already knows how to code build specific routines and get specific tasks done.

Another commentary noted that the datasets that generative AI chatbots are trained on can be filled with biases, and some of the answers they spit out could be questionable.

Taylor Armerding, Cybersecurity Advocate, Synopsys Software Integrity Group

Generative risks to note

Recently, code written by GitHub’s generative AI development tool Copilot (created in partnership with OpenAI and described as a descendant of GPT-3) did not catch an open source licensing conflict.

Ignoring licensing conflicts can be very costly. One of the most famous examples involves Cisco, which failed to comply with the requirements of the GNU General Public License, under which the Linux and other open source software programs in its routers were distributed.

Other risks of overconfidence in generative AI for software development:

    • As every vendor of AI LLM tools has acknowledged, the chatbots are only as good as the dataset they have been trained on. And as has been shown with ChatGPT, they will declare falsehoods with the same level of confidence that they declare truth.
    • In short, they need adult supervision, as any human developer would. The tools can produce biased results, breach license terms, or recommend code that contains a security vulnerability.
    • An AI tool could recommend a code snippet to implement a certain common function, and that snippet could become commonly used. If a vulnerability is subsequently discovered in that snippet, it will be a systemic risk across many organizations. While vulnerabilities are found in just about every human-written codebase, with AI code that is broadly used, the scale of impact is much, much higher.
    • That means software written by chatbots needs the same level of testing scrutiny that human-written code does: via a full suite of automated testing tools for static and dynamic analysis; software composition analysis to find open source vulnerabilities and licensing conflicts; and pen testing before production.
    • Finally, rigorous testing with AppSec tools would help organizations identify and mitigate compliance, security, and operational risks stemming from adoption of AI-assisted tools.
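
To make the licensing point concrete, here is a minimal sketch of the kind of check that software composition analysis automates: comparing each dependency's declared license against an organization's allowlist. The package names, the allowlist, and the review_licenses helper are all hypothetical, made up for illustration. Real tools go much further, scanning transitive dependencies and copied snippets rather than trusting declared metadata.

```python
# Illustrative sketch only: the policy, package names, and helper are hypothetical.
# License strings use SPDX-style identifiers (e.g. "MIT", "Apache-2.0").

ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}


def review_licenses(dependencies, allowed):
    """Return {package: reason} for every dependency that needs human review.

    `dependencies` maps a package name to its declared license identifier,
    or to None when no license was declared.
    """
    flagged = {}
    for name, license_id in sorted(dependencies.items()):
        if license_id is None:
            flagged[name] = "no declared license"
        elif license_id not in allowed:
            flagged[name] = f"license {license_id} not on allowlist"
    return flagged


if __name__ == "__main__":
    # Hypothetical dependency list, e.g. gathered from a project's manifest.
    deps = {
        "fastparse": "Apache-2.0",
        "netutils": "GPL-3.0-only",
        "tinyhelper": None,
    }
    for package, reason in review_licenses(deps, ALLOWED_LICENSES).items():
        print(f"{package}: {reason}")
```

A flag from a check like this does not settle whether a conflict exists. It only marks where a person needs to look, which is the same supervision the chatbots themselves require.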

Of course, there is general agreement that AI LLMs are still in an embryonic stage. They will only become more capable, likely for both better and worse.

As such tools improve enough to require less supervision over time, how far they can be trusted remains an open question.

In the meantime, do consider using generative AI chatbots only for what they are good at. And then remember that you still need to supervise them.