The world's most powerful AI programmer: backed by GPT-4o, it completes a request in just 84 seconds

Everyone is on the waitlist.

Large models are rapidly advancing on the road of "replacing human programmers".

In March this year, the AI software engineer Devin took the AI community by storm. The product is powered by OpenAI's GPT-4 large language model (LLM) and can independently write and edit code after receiving natural-language instructions.

But in generative AI, rapid development is the norm, and the technology has now iterated again.

This week, a Y Combinator-backed startup called Cosine announced the launch of its own new autonomous AI engineer, Genie. The company said Genie easily outperformed Devin, scoring 30% on the third-party benchmark SWE-Bench, while Devin scored just 13.8%.

The new tool even surpasses the 19% scored by Amazon's Q and Factory's Code Droid, and is now the best-performing AI programmer in the world on this benchmark.

Figure: Genie's performance on the SWE-Bench benchmark, compared with other AI coding models.

"This model is much more than a benchmark: it was trained from the ground up with the goal of thinking and acting like a human SWE (software engineer)," said Alistair Pullen, co-founder and CEO of Cosine.

Genie can fix bugs and write code

As an advanced AI software engineering model, Genie can autonomously handle a range of coding tasks according to the instructions of human engineers, including bug fixing, feature building, code refactoring, and code testing.

Genie can run completely autonomously or collaborate with users to complete tasks.

It supports multiple programming languages, as shown in the technical report, including JavaScript, Python, TypeScript, TSX, Java, C#, C++, C, Rust, Scala, Kotlin, Swift, Golang, PHP, Ruby.

Cosine claims that Genie can simulate the cognitive processes of human engineers. "Let it observe how human engineers work and imitate the process," Alistair Pullen said.

Security is a common concern. The code Genie generates is stored in the user's GitHub repository, and Cosine does not retain a copy, which avoids the security risks that would come with doing so.

In addition, Cosine's software platform integrates with Slack and system notifications, so that Genie can act like an AI colleague, alerting users to status changes or flagging issues.

Alistair Pullen demonstrated how to use Genie to solve a real-world problem. The target was an issue on GitHub: the user simply pastes in the link, and the AI automatically analyzes the problem and works out which files it needs to solve it, continuing until the requirements are met.

Genie then breaks the problem down into a series of solution steps and generates code.

The next step is to run the code. If there is a problem with the generated code, Genie automatically finds the problem, analyzes it, modifies the code, and then tries to run it again.
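
The run, diagnose, revise, and rerun cycle described above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not Cosine's implementation: `run_candidate`, `repair_loop`, and the `revise` callback are hypothetical names, and `revise` stands in for whatever model call would propose a corrected version of the code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def run_candidate(source: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run `source` in a fresh Python process and return (succeeded, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(source, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(path)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, "timed out"
        return result.returncode == 0, result.stderr


def repair_loop(
    source: str,
    revise: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Run the code; on failure, ask `revise` for a new version and retry.

    `revise(source, error_text)` is a hypothetical stand-in for a model call.
    Returns the first version that runs cleanly, or None if every attempt fails.
    """
    for _ in range(max_attempts):
        ok, error = run_candidate(source)
        if ok:
            return source
        source = revise(source, error)
    return None


def toy_revise(source: str, error: str) -> str:
    """Toy reviser for demonstration: fixes one known typo."""
    return source.replace("pritn", "print")
```

Calling `repair_loop("pritn('hi')\n", toy_revise)` fails on the first run, applies the toy fix, and returns the corrected source on the second run. A real system would also cap total runtime and run the candidate in a sandbox.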

The final result: two files, 17 tests, in only 84 seconds.

It is hard to say how many times faster that is than a human programmer.

Long context is powered by OpenAI models

Unlike many AI models that rely on base models supplemented by a handful of tools, Genie is developed through a proprietary process.

As far as models go, Genie is built on a variant of GPT-4o that is (currently) not generally available, which OpenAI allowed Cosine to train as part of its experimental access program.

According to the technical report, when the researchers started building Genie, they could only fine-tune models with relatively short context windows, in the range of 16k to 32k tokens.

To work around this, the team did a lot of early exploration with these models and trained them on a large data set of more than 100 million tokens. Although the approach showed certain advantages, it was still limited by the amount of information the model could process at one time.

After trying various compression/chunking methods, the team decided that the only solution was to use a larger context model, even though there was no model available at the time.
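
To make the chunking idea concrete, here is a minimal sketch of one common approach: splitting a long token sequence into overlapping windows that each fit a fixed context size. The function name and parameters are assumptions for illustration; the article does not say which chunking methods Cosine actually tried.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[int], window: int, overlap: int) -> List[List[int]]:
    """Split `tokens` into windows of at most `window` items.

    Consecutive windows share `overlap` items so that some context carries
    over between neighbouring chunks.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    step = window - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + window]))
        # Stop once this window already reaches the end of the sequence.
        if start + window >= len(tokens):
            break
    return chunks
```

Each window is processed in isolation, so information that depends on two distant parts of a code base can be lost. That limitation is at least consistent with the team's conclusion that a larger context model was needed.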

Fortunately, not long after, OpenAI models that supported long-context training became available.

Cosine said in its blog post that the team spent nearly a year curating the data set. In the most recent training run, Genie was trained on billions of tokens, and the selected data covered the programming languages that users currently care about most.

Figure: Proportion of Genie's training data by programming language.

Figure: Proportion of Genie's training data by task type, such as bug fixing and refactoring.

In terms of price, according to Pullen, Genie's pricing will initially be divided into two tiers:

Entry-level option, priced at around $20. This tier has some feature and usage limits and is suited to individuals and small teams;
Enterprise-level option, with expanded features and nearly unlimited usage, like having an AI colleague who is proficient in coding. Pricing at this tier will be higher.

The launch of Genie has profound implications for software development teams, especially those looking to increase productivity and reduce the time spent on daily tasks. With its ability to autonomously handle complex programming challenges, Genie may change how engineering resources are allocated, allowing teams to focus on more strategic initiatives.

Pullen said that the idea of engineering resources no longer being a constraint has been a huge motivator for him, especially since starting the company. He believes that the value of an AI colleague who can quickly get up to speed on an unfamiliar code base and solve problems it has never seen is obvious and will have a huge impact on the world.

In the future, the company intends to expand its model portfolio to include smaller models for simple tasks and larger models capable of handling more complex challenges. Additionally, Cosine plans to expand its work into the open source community.

Genie is currently available to a limited number of users; broader access has not yet opened.

Sign-up link: https://cosine.sh/register

Founding team: only five people

Cosine, the startup behind Genie, was founded in 2022 by Pullen, Sam Stenner, and Yang Li. Its mission is to push the boundaries of AI by applying human reasoning to solve complex problems. Clearly, their efforts begin with software engineering.

Among them, Yang Li, who is Chinese, holds a master's degree from Oxford University and was named to the Forbes 30 Under 30 Europe list in 2021.

Cosine has raised $2.5 million in seed funding from Uphonest and SOMA Capital, with Lakestar, Focal and others also participating.

The team may be small, but Cosine has already made significant progress in the field of AI, and Genie is just the beginning.

"We firmly believe that we can build human-level reasoning capabilities for any job and industry," Pullen said in the announcement article. "Software engineering is just the most intuitive starting point, and we will soon show everything else we are working on."

References:

https://venturebeat.com/ai/4-considerations-to-help-organizations-implement-an-ai-code-of-conducts/

https://cosine.sh/blog/genie-technical-report

https://cosine.sh/blog/state-of-the-art
