Devin, billed as the world's first AI programmer agent and developed by Cognition AI, a startup team with 10 IOI gold medals, stirred up the tech world as soon as it was released.
In the demonstration, Devin almost independently completes many tasks that would take an ordinary programmer a long time, and its performance is in no way inferior to theirs.
But where are the limits of the product's capabilities? Real-world experience often falls short of a demo, so the answer depends on actual testing.
A tester from Stanford contacted the team as soon as Devin was released and got early access to try it first-hand.
He asked Devin to work on several projects of varying difficulty, recorded a video, and wrote up his impressions on Twitter.
The next task was for Devin to build a website that lets ordinary users play chess directly against a large language model.
Every time the user makes a move, the system converts it into a prompt and passes it to GPT-4. GPT-4 responds, and the response is converted into a concrete chess move and displayed on the board.
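The move-to-prompt-to-move loop described above can be sketched roughly as follows. This is only an illustration, not Devin's actual code, which the article does not show: the function names, the prompt wording, and the choice of UCI notation (for example `e2e4`) are all assumptions, and the model call is passed in as a plain callable so that the sketch runs without any API access.

```python
import re

# Hypothetical helpers for illustration; not taken from Devin's output.
# Matches a UCI-style move such as "e2e4" or a promotion such as "e7e8q".
UCI_MOVE = re.compile(r"\b([a-h][1-8][a-h][1-8][qrbn]?)\b")


def build_prompt(moves):
    """Turn the game so far (a list of UCI moves) into a text prompt."""
    history = " ".join(moves) if moves else "(no moves yet)"
    # White moves when an even number of moves has been played.
    side = "White" if len(moves) % 2 == 0 else "Black"
    return (
        "We are playing chess. Moves so far in UCI notation: "
        f"{history}. You are {side}. Reply with your next move in UCI notation."
    )


def parse_move(reply):
    """Extract the first UCI-style move from the model's reply, or None."""
    match = UCI_MOVE.search(reply.lower())
    return match.group(1) if match else None


def play_turn(moves, user_move, ask_model):
    """Record the user's move, query the model, and record its answer.

    `ask_model` is any callable that takes a prompt string and returns the
    model's text reply; in a real app it would wrap the GPT-4 API call.
    """
    moves = moves + [user_move]
    reply = ask_model(build_prompt(moves))
    model_move = parse_move(reply)
    if model_move is None:
        raise ValueError(f"No move found in model reply: {reply!r}")
    return moves + [model_move]
```

Note that the sketch only checks that the reply looks like a move; it does not check that the move is legal. A real application would need a chess rules library or its own validation before updating the board.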
My request meant that the system had to be built from quite a few components.
He is personally most concerned about whether Devin can do the following during the development process of this system:
What I didn’t expect was that Devin not only asked me to provide the API key, but also properly protected it during the trial process.
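The article does not say how Devin protected the key. One common approach, shown here purely as an assumed sketch, is to read the key from an environment variable rather than hardcoding it, and to mask it whenever it has to appear in logs. The function names and the default variable name `OPENAI_API_KEY` are illustrative choices, not details from the source.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the key from the environment instead of hardcoding it in source."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


def mask_key(key, visible=4):
    """Return a log-safe version that shows only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

With this pattern the key never lands in the repository or the deployed front-end bundle, and any diagnostic output would print something like the masked form instead of the real value.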
However, Devin's feedback is still quite slow at the moment. I suspect this is because far more agent prompts are running in the background than are visible. Much more.
About 19 minutes passed between the tester's initial request and Devin asking for the API key.
My guess is that if the lag is caused by a large number of prompts running in the background, then the latency should improve over time.
That is because they could later get access to dedicated GPUs or work with Anthropic or OpenAI to lower latency (the underlying model is presumably GPT-4 or Claude Opus).
Devin first made a plan.
In the upper right corner, the user can toggle the "Follow" state, which automatically switches the view to whichever tab Devin currently has active.
The tester did not turn on Follow because he wanted to watch for changes in the various panels at any time.
The planner stays up to date with the current task at all times.
The shell looks no different from an ordinary shell, but it's really fun to use!
Devin will open multiple shells during the work process. At the bottom of the shell, the user can drag the blue slider to view the commands written by Devin.
The picture below shows the chessboard's unrendered content while Devin was trying to debug it.
At the same time, the tester asked it to perform another task, this one a data analysis job.
He asked Devin to "create a map of Antarctica's sea water temperatures over the past fifty years."
Regarding this request, I think there are two aspects that may be challenging:
Devin reads README files carefully, as a good programmer would, and also performs some basic exploratory data analysis (EDA) to understand the data structure.
The data is actually an ASCII file, which I find a bit strange.
When I click on one of the steps in the dialog, "Debug Python Script...", it opens the part of the codebase related to that step, so you can track what happened at a specific point in time.
What worries me more is that, unless it needs to ask for the API key, Devin seems unable to stop coding.
So he tried to see whether he could change his earlier request or specify something else, interrupting Devin's coding process.
Most users will change their minds or think of something new to add to the system while coding is under way, so the tool needs to be able to handle this situation.
This is a screenshot taken during the coding process:
The browser interface is presented as follows:
Then the tester added another requirement to the data visualization task, asking the system to show high temperatures in blue and low temperatures in red.
So as not to interrupt the coding process, Devin appears to have started another working thread to record the tester's ad hoc request.
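The inverted color requirement (hot is blue, cold is red) amounts to reversing the usual temperature color scale. A minimal sketch of such a mapping is below; the function name and the simple linear red-to-blue blend are assumptions for illustration, not what Devin actually generated.

```python
def temperature_to_rgb(temp, t_min, t_max):
    """Map a temperature to an (r, g, b) tuple: coldest is red, hottest is blue."""
    if t_max <= t_min:
        raise ValueError("t_max must be greater than t_min")
    # Clamp to the range, then normalise to 0.0 (coldest) .. 1.0 (hottest).
    frac = (min(max(temp, t_min), t_max) - t_min) / (t_max - t_min)
    red = round(255 * (1.0 - frac))   # fades out as the temperature rises
    blue = round(255 * frac)          # fades in as the temperature rises
    return (red, 0, blue)
```

If the plot were drawn with Matplotlib, picking a reversed built-in colormap such as `coolwarm_r` would likely achieve a similar effect without a hand-written mapping.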
Finally, Devin deployed the app to Netlify, and the application went online.
Link to the webpage: https://t.co/wTbtz2waDn
Just as with programs written by humans, the first version is bound to have bugs.
Because what I requested was the temperature record for Antarctica, Devin seemed to have some difficulty understanding it.
So I changed the requested location to North America.
The tester did not share the result of Devin's bug fix. He only gave a preliminary summary of his experience building a first website with Devin.
Let’s talk about the advantages first:
Devin has done a good job on productization: what it gives the user is a complete product experience rather than just a simple dialog box.
AI is the most critical part of the system, but the product structure that supports the AI function is the highlight of Devin.
Devin offers automatic deployment, API key protection, the ability to modify and add requirements at any time, and other very good features.
The degree of completion of the product is very high, far exceeding the average demo.
Let’s talk about the shortcomings:
Devin's responses are still very slow. The tester did note, however, that he was connecting to the Internet over a 1M Starlink link, so the slowness may well be on his end.
The second is that users cannot edit the code directly themselves, so there is no way to work on it collaboratively.
And of course, the initial chess-playing application stumped Devin, and the deployment was not completed in the end. The data visualization task also seems to have some bugs.
Finally, I used Devin to make a Chrome plug-in that helps users convert a GitHub repo into a Claude prompt.
Plug-in download address: https://t.co/k3l8JTWK7Z
Netizen evaluation
Netizens were still a little disappointed after reading this hands-on test. After all, a junior programmer could handle these tasks, yet Devin's visualization project produced only a single web page with bugs.
It seems that Devin is essentially just a large model that can be accessed online. For now, it still has difficulty solving practical problems.