Microsoft's new AI tool can tell you when your code sucks

Microsoft researchers have developed an artificial intelligence (AI) system that they believe can help programmers debug their applications faster and more accurately.

Called BugLab, the AI is based on a “hide and seek” game and works in a similar way to generative adversarial networks (GANs).

Detailing the research in a blog post, researchers Miltos Allamanis (Principal Researcher) and Marc Brockschmidt (Senior Principal Research Manager) explained how they created two neural networks and pitted them against each other, much like players in a game of hide and seek.

Competition

One network is designed to insert bugs, both big and small, into existing code, while the other is trained to find them. As the game goes on and both “participants” improve, the bug-finding network eventually becomes good enough to identify bugs hidden in real code.
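The competitive dynamic described above can be illustrated with a toy simulation. This is not BugLab's code: the two neural networks are replaced by simple per-bug-kind probabilities, and the function `play`, its learning rates and the update rules are hypothetical choices made only to show the idea. The “hider” keeps a running estimate of how often the “seeker” misses each kind of bug and favours the kinds it expects to slip through, while the seeker gets practice on whatever the hider inserts.

```python
import random

# Bug kinds named in the article; the probabilities below stand in for neural networks.
BUG_KINDS = ["incorrect_comparison", "incorrect_boolean_operator", "variable_misuse"]


def play(rounds=2000, seeker_lr=0.01, hider_lr=0.05, seed=0):
    """Toy hide-and-seek loop (hypothetical); returns the seeker's per-kind detection rates."""
    rng = random.Random(seed)
    seeker_skill = {kind: 0.1 for kind in BUG_KINDS}  # P(seeker finds a bug of this kind)
    hider_belief = {kind: 0.5 for kind in BUG_KINDS}  # hider's running estimate of the miss rate

    for _ in range(rounds):
        # Hider: pick a bug kind, weighted towards the ones the seeker seems to miss.
        # The tiny constant keeps the total weight above zero.
        weights = [hider_belief[kind] + 1e-6 for kind in BUG_KINDS]
        kind = rng.choices(BUG_KINDS, weights=weights)[0]

        # Seeker: try to find the inserted bug.
        found = rng.random() < seeker_skill[kind]

        # Hider update: move its miss-rate estimate towards this round's outcome.
        missed = 0.0 if found else 1.0
        hider_belief[kind] += hider_lr * (missed - hider_belief[kind])

        # Seeker update: the hider knows where it put the bug, so every round is a
        # labelled example and the seeker improves on this kind of bug.
        seeker_skill[kind] += seeker_lr * (1.0 - seeker_skill[kind])

    return seeker_skill


if __name__ == "__main__":
    for kind, skill in play().items():
        print(f"{kind}: {skill:.3f}")
```

In this sketch the adversarial part only decides which kinds of bugs the seeker practises on; in a real system both players would be trained models and the hider could also invent harder bugs.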

The two models were trained jointly, without labeled data, in a self-supervised way, on “millions of code snippets”, the researchers explained.

Although the ultimate goal is a program that can identify arbitrarily complex bugs, such bugs are still “outside the reach of modern AI methods”, the researchers say. Instead, they focused on commonly occurring bugs, such as incorrect comparisons, incorrect Boolean operators, variable misuses and other similar mistakes.
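To make those bug kinds concrete, here is a minimal sketch of how single-edit bugs could be injected into Python code using the standard `ast` module (Python 3.9+ for `ast.unparse`). BugLab's actual rewrite set and code representation are not described here, so `candidate_bugs`, the operator swap tables and the simplified variable-misuse rule are hypothetical. Each candidate records which line was changed, which hints at why no human labeling is needed: whoever inserts the bug knows exactly where it is.

```python
import ast

# Hypothetical rewrite tables: each operator is swapped for a plausible wrong one.
COMPARE_SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE,
                 ast.GtE: ast.Gt, ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}
BOOL_SWAPS = {ast.And: ast.Or, ast.Or: ast.And}


def _mutations(node, names):
    """Return (kind, mutate_fn) pairs that each apply one bug to `node` in place."""
    muts = []
    if isinstance(node, ast.Compare):
        for i, op in enumerate(node.ops):
            if type(op) in COMPARE_SWAPS:
                def swap(n, i=i):
                    n.ops[i] = COMPARE_SWAPS[type(n.ops[i])]()
                muts.append(("incorrect_comparison", swap))
    elif isinstance(node, ast.BoolOp):
        def flip(n):
            n.op = BOOL_SWAPS[type(n.op)]()
        muts.append(("incorrect_boolean_operator", flip))
    elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
        # Simplified variable misuse: read a different variable from the snippet.
        for other in names:
            if other != node.id:
                def misuse(n, other=other):
                    n.id = other
                muts.append(("variable_misuse", misuse))
    return muts


def candidate_bugs(source):
    """Yield (kind, line, buggy_source) for every single-edit bug in `source`."""
    tree = ast.parse(source)
    names = sorted({n.id for n in ast.walk(tree)
                    if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)})
    for index, node in enumerate(ast.walk(tree)):
        for kind, mutate in _mutations(node, names):
            # Re-parse so every candidate starts from the clean snippet;
            # ast.walk visits nodes in the same order for identical source.
            fresh = ast.parse(source)
            mutate(list(ast.walk(fresh))[index])
            yield kind, getattr(node, "lineno", None), ast.unparse(fresh)


snippet = '''
def in_range(x, low, high):
    return x >= low and x <= high
'''

if __name__ == "__main__":
    for kind, line, buggy in candidate_bugs(snippet):
        print(f"{kind} (line {line}):")
        print(buggy)
```

For example, one of the generated candidates turns `x >= low and x <= high` into `x >= low or x <= high`, a classic incorrect Boolean operator bug.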

False alarms standing in the way

The researchers worked with Python code, and once the models were trained, it was time to test them on real-world bugs.

“To measure performance, we manually annotate a small dataset of bugs from packages in the Python Package Index with such bugs and show that models trained with our ‘hide-and-seek’ method are up to 30% better compared to other alternatives, e.g., detectors trained with randomly inserted bugs,” the blog added.

The duo described the results as “promising”, as about a quarter (26%) of the bugs could be found and fixed automatically. Among the bugs detected were 19 that had not been known before.

Still, there were many false positives, leading the researchers to conclude that a lot more training is needed before such a model could be practically deployed.
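Why false positives matter can be seen from two standard measures: recall (the share of real bugs a tool catches, roughly what a “found a quarter of the bugs” figure describes) and precision (the share of the tool's warnings that are real bugs). The sketch below computes both; the function name `detection_metrics` and the counts in the example are hypothetical and are not BugLab's results.

```python
def detection_metrics(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a bug detector's warnings."""
    flagged = true_positives + false_positives  # every warning the tool raised
    actual = true_positives + false_negatives   # every real bug in the dataset
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, for illustration only (not BugLab's results):
precision, recall = detection_metrics(true_positives=30, false_positives=90, false_negatives=70)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.25 recall=0.30
```

With numbers like these, three out of every four warnings would be false alarms, which is the kind of noise that makes a tool hard to use day to day even when it catches a useful share of bugs.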