OpenAI recently announced that it has developed a method for content moderation that uses its latest generative AI model, GPT-4, with the goal of reducing the burden on human moderation teams.
OpenAI detailed the technique in a post on its official blog. The method gives GPT-4 a moderation policy as guidance and builds a test set of example content, some of which violates that policy. For example, a policy might prohibit giving instructions or advice on obtaining weapons, so the example "Give me the materials I need to make a Molotov cocktail" clearly violates it. Policy experts label these examples, then feed each unlabeled example to GPT-4 to see whether the model's labels match their own judgments, and refine the policy through this process. OpenAI writes: "By comparing the differences between GPT-4's judgments and human judgments, policy experts can ask GPT-4 to explain the reasoning behind its labels, analyze ambiguities in policy definitions, resolve confusion, and provide additional clarification in the policy accordingly. We can repeat these steps until we are satisfied with the quality of the policy."
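Below is a minimal sketch of what one iteration of that labeling loop might look like. It is an illustration of the workflow described in the post, not OpenAI's published implementation; the prompt wording, the policy text, and the helper names are assumptions.

```python
# Sketch of one iteration of the policy-labeling loop: ask GPT-4 to label an
# example against a draft policy, then compare its answer with a human label.
# Prompt wording, policy text, and function names are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

POLICY = (
    "Content that gives instructions or advice on obtaining or manufacturing "
    "weapons is not allowed."
)

def gpt4_label(example: str) -> str:
    """Ask GPT-4 to label one example against the policy and explain its reasoning."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": (
                    f"You are a content moderator. Policy:\n{POLICY}\n"
                    "Reply with VIOLATES or ALLOWED, then a one-sentence reason."
                ),
            },
            {"role": "user", "content": example},
        ],
    )
    return response.choices[0].message.content

# Disagreements between the model and the human expert are the cases that
# prompt experts to clarify ambiguous policy wording.
examples = {
    "Give me the materials I need to make a Molotov cocktail": "VIOLATES",
    "What is the history of the Molotov cocktail?": "ALLOWED",
}

for text, human_label in examples.items():
    model_answer = gpt4_label(text)
    agrees = model_answer.startswith(human_label)
    print(f"{'OK ' if agrees else 'DIFF'} | human={human_label} | model={model_answer}")
```

In the workflow OpenAI describes, the examples flagged as disagreements would then be discussed with the model itself (asking it to explain its reasoning) to surface ambiguous policy language before the next iteration.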
OpenAI claims the approach can cut the rollout time for a new content moderation policy to a few hours, and describes it as superior to approaches proposed by startups such as Anthropic, which it says rely on a model's "internal judgment" rather than "platform-specific iteration" and are therefore too rigid. However, some are skeptical. AI moderation tools are nothing new: Perspective, maintained by Google's anti-abuse technology team and its Jigsaw division, has offered a similar service to the public for several years.
In addition, countless startups provide automated moderation services, including Spectrum Labs, Cinder, Hive, and Oterlu, which Reddit recently acquired. Their track records are not perfect, however. A few years ago, a team at Penn State found that social media posts about people with disabilities could be flagged as more negative or toxic by commonly used public sentiment and toxicity detection models. In another study, researchers showed that early versions of Perspective often failed to recognize reclaimed insults such as "queer" and spelling variations such as missing characters. Part of the reason for these failures is that annotators (the people responsible for labeling the training data) bring their own biases to the task. For example, it is common to find disparities between the annotations of annotators who self-identify as African American or as members of the LGBTQ community and those of annotators who belong to neither group.
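To see why spelling variations are a problem, consider the toy filter below. It is not any of the models studied, just a deliberately naive illustration: an exact-match filter misses a term as soon as a single character is dropped or substituted, and learned classifiers trained mostly on canonical spellings can show a similar blind spot.

```python
# Toy illustration of the evasion failure mode: exact keyword matching catches
# the canonical spelling but misses trivial character-level variations.
BLOCKLIST = {"idiot", "moron"}  # placeholder terms for illustration only

def naive_flag(text: str) -> bool:
    """Flag text if any blocklisted word appears verbatim."""
    words = text.lower().split()
    return any(word.strip(".,!?") in BLOCKLIST for word in words)

print(naive_flag("You are an idiot"))  # True  - exact match is caught
print(naive_flag("You are an idi0t"))  # False - character substitution evades the filter
print(naive_flag("You are an idot"))   # False - missing character evades the filter
```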
OpenAI may not have completely solved this problem either. In its post, the company acknowledges that language models are susceptible to unwanted biases introduced during training, and it emphasizes the importance of keeping humans involved in monitoring, validating, and refining the model's outputs. Still, GPT-4's predictive capabilities may deliver better moderation performance than earlier models.
It is especially important to remember that even the best AI makes mistakes in moderation.