According to news from this site on August 8, OpenAI released its web crawler, GPTBot, yesterday. OpenAI says GPTBot collects web page data in a transparent way and with attention to copyright, and that the data is used to train OpenAI's AI models.
OpenAI stated that GPTBot identifies itself with its own user agent (UA). The complete UA string is "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)". Any website administrator is free to allow this crawler to collect data or to block it.
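As a rough illustration of how a site might recognize this crawler in its request logs, the following Python sketch looks for the GPTBot token in a User-Agent header. The function names is_gptbot and gptbot_version are hypothetical and do not come from OpenAI. A UA string can be spoofed by any client, so a match here is only a hint and not proof that the request came from OpenAI.

```python
import re

# Sample UA string as published for GPTBot; used here only as test input.
SAMPLE_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "GPTBot/1.0; +https://openai.com/gptbot)"
)

# Matches the product token "GPTBot/<version>" anywhere in the header.
_GPTBOT_TOKEN = re.compile(r"\bGPTBot/(\d+(?:\.\d+)*)")


def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a GPTBot token."""
    return _GPTBOT_TOKEN.search(user_agent) is not None


def gptbot_version(user_agent: str):
    """Return the version after 'GPTBot/', or None if the token is absent."""
    match = _GPTBOT_TOKEN.search(user_agent)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(is_gptbot(SAMPLE_UA), gptbot_version(SAMPLE_UA))
```

Matching on the product token rather than on the full string keeps the check working if the version number changes.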
OpenAI says that an administrator who does not want the site crawled can block GPTBot entirely in the site's robots.txt file, or can specify which parts of the site GPTBot is allowed to crawl.
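The two robots.txt options described above can be sketched and checked with Python's standard urllib.robotparser module. The rule sets below are illustrative: the paths /directory-1/ and /directory-2/ and the example.com URLs are placeholders, and the helper gptbot_allowed is a hypothetical name. Note that Python's parser applies the first matching rule, which may differ in edge cases from how a given crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Option 1: block GPTBot from the whole site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Option 2: allow one directory and block another (placeholder paths).
PARTIAL = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_allowed(robots_txt: str, url: str) -> bool:
    """Check whether the given robots.txt text lets GPTBot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


if __name__ == "__main__":
    print(gptbot_allowed(BLOCK_ALL, "https://example.com/any/page"))
    print(gptbot_allowed(PARTIAL, "https://example.com/directory-1/page"))
    print(gptbot_allowed(PARTIAL, "https://example.com/directory-2/page"))
```

Because the rules name GPTBot specifically, other crawlers are unaffected unless the file also contains rules for them.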
OpenAI has faced industry criticism over alleged privacy violations. The launch of GPTBot can be seen as a response to that criticism, and it may help the industry establish benchmarks for crawlers used in AI training. According to reports, OpenAI recently registered the GPT-5 trademark, and GPTBot is also expected to support the training of GPT-5-related models.