Table of Contents
Privacy
Bias, toxicity, misinformation
Intellectual Property (IP)
Conclusion

'Image generation technology' wandering on the edge of the law: This paper teaches you to avoid becoming a 'defendant'

Apr 11, 2023, 02:55 PM
ai technology

In recent years, AI-generated content (AIGC) has attracted much attention. It spans images, text, audio, video, and more. However, AIGC is a double-edged sword: its irresponsible use has drawn criticism and controversy.

If image generation technology is used improperly, you may end up as a "defendant".

Recently, researchers from Sony AI and BAAI examined the current problems of AIGC from several angles and discussed how to make AI-generated content more responsible.

Paper link: https://arxiv.org/pdf/2303.01325.pdf

This article focuses on three main issues that may hinder the healthy development of AIGC: (1) privacy; (2) bias, toxicity, and misinformation; (3) intellectual property (IP) risks.

By documenting the known and potential risks, as well as possible AIGC abuse scenarios, this article aims to raise awareness of the potential risks and abuse of AIGC and to provide directions for addressing them, so that AIGC develops in a more ethical and safe direction for the benefit of society.

Privacy

Large foundation models are known to suffer from a range of privacy leakage problems.

Previous research has shown that attackers can generate sequences from a trained GPT-2 model and identify which of them were memorized from the training set. [Kandpal et al., 2022] attribute the success of these privacy attacks to duplicate data in the training set: sequences that appear multiple times in the training data are more likely to be generated than sequences that appear only once.
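Because duplication drives memorization, a natural first check on any training corpus is to count how often each sequence occurs. The following is a minimal sketch under stated assumptions, not the method used by Kandpal et al.: the function names, the whitespace and case normalization, and the toy corpus are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def count_duplicate_sequences(corpus: list[str]) -> dict[str, int]:
    """Return every normalized sequence that occurs more than once, with its count."""
    counts = Counter(normalize(doc) for doc in corpus)
    return {seq: n for seq, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical toy corpus; a real web-scraped training set is far larger.
    corpus = [
        "Contact me at 555-0100",
        "contact me  at 555-0100",
        "An unrelated sentence",
        "Contact me at 555-0100",
    ]
    for seq, n in count_duplicate_sequences(corpus).items():
        print(f"{n}x  {seq}")
```

Sequences flagged by such a count are the ones most at risk of being reproduced verbatim, so they are reasonable candidates for removal or closer review before training.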

Because AIGC models are trained on large-scale web-scraped data, overfitting and privacy leakage become particularly important.

For example, the Stable Diffusion model memorizes images that are repeated in its training data [Rombach et al., 2022c]. [Somepalli et al., 2022] demonstrated that a Stable Diffusion model blatantly copies images from its training data and that its outputs can be simple combinations of foreground and background objects from the training set.

Additionally, the model can reconstruct memorized content, producing objects that are semantically identical to the originals but differ from them at the pixel level. The existence of such images raises concerns about data memorization and ownership.

Similarly, recent research shows that Google's Imagen system can also leak photos of real people and copyrighted images. In his recent lawsuit [Butterick, 2023], Matthew Butterick argued that because all visual information in the system comes from copyrighted training images, the generated images, regardless of their appearance, are necessarily works derived from those training images.

DALL·E 2 suffered from a similar problem: it would sometimes copy images from its training data instead of creating new ones.

OpenAI found that this happened because the copied images appeared many times in the dataset. Likewise, ChatGPT itself has acknowledged that it carries a risk of privacy leakage.

To alleviate the privacy leakage problem of large models, many companies and researchers have invested in privacy defenses. At the industry level, Stability AI has acknowledged the limitations of Stable Diffusion.

To this end, they provide a website (https://rom1504.github.io/clip-retrieval/) for identifying images memorized by Stable Diffusion.

In addition, the art company Spawning AI has created a website called "Have I Been Trained" (https://haveibeentrained.com) to help users determine whether their photos or works have been used for AI training.

OpenAI attempts to solve privacy issues by reducing data duplication.

In addition, companies such as Microsoft and Amazon have banned employees from sharing sensitive data with ChatGPT to prevent leaks of confidential information, because such data could be used to train future versions of ChatGPT.

At the academic level, Somepalli et al. studied an image retrieval framework for identifying content duplication, and Dockhorn et al. proposed differentially private diffusion models to protect privacy in generative models.
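The retrieval idea can be illustrated with a small similarity search: represent each image as a feature vector, then flag training images whose vectors are nearly parallel to that of a generated image. This is only a sketch and not the framework of Somepalli et al.; it assumes the feature vectors were already produced by some image encoder, and the function names, the dictionary layout, and the 0.95 threshold are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_likely_copies(
    generated: list[float],
    training: dict[str, list[float]],
    threshold: float = 0.95,  # assumed cutoff; would need tuning per encoder
) -> list[tuple[str, float]]:
    """Return (training_id, similarity) pairs at or above the threshold, best first."""
    scored = [(tid, cosine_similarity(generated, feat)) for tid, feat in training.items()]
    matches = [(tid, score) for tid, score in scored if score >= threshold]
    return sorted(matches, key=lambda m: m[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 2-D features; real encoders produce hundreds of dimensions.
    training_features = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [0.9, 0.1]}
    print(find_likely_copies([1.0, 0.0], training_features, threshold=0.9))
```

A linear scan like this is only practical for small collections; at the scale of web-scraped datasets an approximate nearest-neighbor index would normally replace the loop.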

Bias, toxicity, misinformation

The training data for AIGC models comes from the real world. However, this data may inadvertently reinforce harmful stereotypes, exclude or marginalize certain groups, and contain toxic content that may incite hatred or violence and offend individuals [Weidinger et al., 2021].

Models trained or fine-tuned on these problematic datasets may inherit harmful stereotypes, social biases and toxicity, or even generate misinformation that leads to unfair discrimination and harm to certain social groups.

For example, the Stable Diffusion v1 model is trained primarily on the LAION-2B dataset, which contains only images with English descriptions. As a result, the model is biased toward white people and Western cultures, and prompts in other languages may not be adequately represented.

While subsequent versions of the Stable Diffusion model were fine-tuned on filtered versions of the LAION dataset, issues of bias persisted. Likewise, DALL·E, DALL·E 2 and Imagen also exhibit social bias and negative stereotypes of minority groups.

Additionally, Imagen has been shown to have social and cultural biases even when generating images of non-humans. Due to these issues, Google decided not to make Imagen available to the public.

To illustrate the inherent bias of AIGC models, we tested Stable Diffusion v2.1. The images generated from the prompt "Three engineers running on the grassland" all showed men, and none showed members of a neglected minority group, which illustrates the lack of diversity in the generated images.

In addition, the AIGC model may also produce incorrect information. For example, content generated by GPT and its derivatives may appear to be accurate and authoritative, but may contain completely false information.

Therefore, it may provide misleading information in areas such as education, law, medicine, and weather forecasting. For example, in the medical field, ChatGPT's answers about medication dosages may be inaccurate or incomplete, which could be life-threatening. In transportation, if drivers follow incorrect traffic rules given by ChatGPT, the result could be accidents or even deaths.

Many defensive measures have been taken against problematic data and models.

OpenAI carefully filtered the original training dataset, removing violent or pornographic content from the DALL·E 2 training data. However, filtering may itself introduce bias into the training data, and these biases are then propagated to downstream models.

To address this problem, OpenAI developed pre-training techniques that mitigate the bias caused by filtering. In addition, to ensure that AIGC models reflect the current state of society in a timely manner, researchers must regularly update the datasets the models use, which helps prevent the negative effects of outdated information.
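One simple way to picture how filter-induced bias can be counteracted is importance reweighting: if filtering shrinks one group's share of the data, the surviving samples from that group are given a larger weight so that the original proportions are restored. The sketch below is an illustration of that general idea and should not be read as OpenAI's implementation; it assumes every sample carries a group label, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def group_shares(labels: list[str]) -> dict[str, float]:
    """Fraction of the dataset that falls into each group."""
    if not labels:
        return {}
    total = len(labels)
    return {group: n / total for group, n in Counter(labels).items()}


def reweight_after_filtering(before: list[str], after: list[str]) -> dict[str, float]:
    """Per-group sample weights that restore the pre-filter group proportions.

    Groups removed entirely by the filter cannot be recovered by reweighting
    and are therefore absent from the result.
    """
    p_before = group_shares(before)
    p_after = group_shares(after)
    return {g: p_before[g] / p_after[g] for g in p_after if g in p_before}


if __name__ == "__main__":
    # Hypothetical labels: the filter removed far more "group_b" samples.
    before = ["group_a"] * 5 + ["group_b"] * 5
    after = ["group_a"] * 4 + ["group_b"] * 1
    print(reweight_after_filtering(before, after))
```

Multiplying each surviving sample's training loss by its group weight makes the filtered dataset behave, in expectation, as if the groups had kept their original balance. Reweighting cannot restore a group that the filter removed completely.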

It is worth noting that although biases and stereotypes in source data can be reduced, they may still be spread or even exacerbated during the training and development of the AIGC model. Therefore, it is critical to assess the presence of bias, toxicity, and misinformation throughout the model training and development lifecycle, not just at the data source level.

Intellectual Property (IP)

With the rapid development and widespread application of AIGC, the copyright issue of AIGC has become particularly important.

In November 2022, Matthew Butterick filed a class action lawsuit against Microsoft subsidiary GitHub, accusing its code generation service Copilot of violating copyright law. Text-to-image models have likewise been accused of infringing artists' rights to their original work.

[Somepalli et al., 2022] show that images generated by Stable Diffusion may be copied from the training data. Although Stable Diffusion disclaims any ownership of the generated images and lets users use them freely as long as the image content is legal and harmless, this freedom still triggers fierce disputes over copyright.

Generative models like Stable Diffusion are trained on large numbers of images from the Internet without authorization from the intellectual property holders, and some believe this violates their rights.

To address intellectual property issues, many AIGC companies have taken action.

For example, Midjourney has included a DMCA takedown policy in its terms of service, allowing artists to request that their work be removed from the dataset if they suspect copyright infringement.

Similarly, Stability AI plans to offer artists the option of excluding their work from the training set for future versions of Stable Diffusion. Additionally, text watermarks [He et al., 2022a; He et al., 2022b] can also be used to identify whether these AIGC tools use samples from other sources without permission.

For example, Stable Diffusion produced images with a Getty Images watermark [Vincent, 2023].

OpenAI is developing watermarking technology to identify text generated by GPT models, a tool that educators can use to detect plagiarism in assignments. Google has also applied Parti watermarks to the images it publishes. In addition to watermarks, OpenAI recently released a classifier for distinguishing between AI-generated text and human-written text.

Conclusion

Although AIGC is still in its infancy, it is expanding rapidly and will remain active for the foreseeable future.

In order for users and companies to fully understand these risks and take appropriate measures to mitigate these threats, we summarize the current and potential risks in the AIGC model in this article.

If these potential risks are not fully understood, and appropriate risk prevention measures and safety guarantees are not adopted, the development of AIGC may face significant challenges and regulatory obstacles. Therefore, we need broader community participation to contribute to responsible AIGC.

Finally, thank you, Sony AI and BAAI!
