San Francisco-based startup Cognition AI is trying to overhaul the software engineering landscape with its new AI assistant, Devin.
The AI assistant can plan and execute complex engineering tasks, learning from its experiences and rectifying mistakes along the way. Equipped with essential developer tools like a shell, code editor, and browser, Devin operates within a sandboxed compute environment, mirroring the setup of a human developer.
Devin stands out due to its ability to actively collaborate with users during software development, Cognition AI said in a blog post. This includes providing real-time progress updates, accepting feedback, and working together to make design choices. Overall, Devin acts as a seamless partner in the software development process, the company claimed.
Devin’s capabilities are diverse. It can learn unfamiliar technologies, build and deploy apps end-to-end, autonomously find and fix bugs in codebases, train and fine-tune its own AI models, address bugs and feature requests in open-source repositories, and contribute to mature production repositories. Its ability to browse the internet gives it quick access to educational resources, which helps it work through complex tasks.
Notably, Devin’s capabilities extend to real-world tasks: it successfully completed an assignment on Upwork. The assignment entailed running inferences with a computer vision model to assess a damaged road.
In terms of performance, Devin has been evaluated on SWE-bench, a benchmark that asks AI systems to resolve real-world GitHub issues. With an end-to-end resolution rate of 13.86%, Devin far exceeds the previous best of 1.96%, roughly a sevenfold improvement. Even when given the exact files to edit, previous models could resolve only 4.80% of issues, Cognition AI said in the blog post.
Scott Wu, the founder and CEO of Cognition, spoke to Bloomberg and emphasized the complexity of teaching AI to be a programmer. He highlighted the intricate decision-making and forward-thinking abilities required. Devin’s capability to handle multiple steps of a software engineering project while maintaining focus underscores its advanced reasoning and planning abilities.
Significant consequences for AI providers and users
Despite the excitement surrounding Devin’s capabilities, there are voices of caution within the industry. Yariv Adan, Senior Director at Google, noted on LinkedIn that recent developments in software are “super interesting” and will have significant consequences for both providers and users. This sentiment reflects the broader implications of AI-driven advancements in software development.
On the other hand, Alex Atallah, co-founder and former CTO of OpenSea, expressed his enthusiasm in a post on X, describing Devin as the first AI agent that feels like a real, useful person on the other end. He praised Devin’s ability to provide status updates and offer visibility into its actions, highlighting the unique experience it provides to users.
“Devin is unique and an attempt to structurally solve problems and challenges faced in the software development cycle. It will expedite the time to market and, at the same time, help develop bug-free alpha versions, leading to stable applications in a short period of time,” said Faisal Kawoosa, chief analyst and founder at Techarc.
Evolving role of software engineers
The emergence of Devin signals a shift towards prompt-to-action engineering, which could change the roles of conventional software engineers. While this may lead to the elimination of some lower-level engineering jobs, it also signifies the evolution of the AI industry.
As AI-driven technologies continue to advance, the role of AI workers like Devin will become increasingly prominent in software development.
Devin is not yet publicly available, with access limited to select customers as Cognition AI continues to refine its technology. However, the company plans to broaden access in the future, with a vision extending beyond coding to AI agents for various disciplines, according to a report by VentureBeat.