AI curiosity doesn't just kill the cat! MIT's new reinforcement learning algorithm lets the agent handle both hard and easy tasks

Apr 13, 2023, 04:19 PM

Everyone has encountered an age-old problem.

It’s a Friday night and you’re trying to pick a restaurant to eat at, but you don’t have a reservation. Should you wait in line at your favorite restaurant that’s packed with people, or try a new restaurant in the hope of discovering some tastier surprises?

The latter does have the potential to lead to pleasant surprises, but this kind of curiosity-driven behavior comes with risks: the food at the new restaurant you try might turn out to be worse than your old favorite.

Curiosity is the driving force behind AI's exploration of the world, and examples abound: autonomous navigation, robotic decision-making, optimizing detection results, and so on.

In some cases, machines use "reinforcement learning" to accomplish a goal: the AI agent repeatedly learns from good behaviors that are rewarded and bad behaviors that are punished.
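To make that loop concrete, here is a minimal sketch, not taken from the MIT paper but a textbook-style illustration, of a tabular Q-learning update: actions that lead to reward have their value estimates raised, and actions that lead to punishment have them lowered. The states and actions here are invented for illustration.

```python
from collections import defaultdict

def q_learning_step(q, state, action, reward, next_state, actions,
                    lr=0.1, gamma=0.99):
    """One Q-learning update: nudge Q(state, action) toward
    reward + gamma * max_a Q(next_state, a)."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += lr * (target - q[(state, action)])

# Toy usage (hypothetical states/actions): a rewarded action's value rises and a
# punished action's value falls, so the agent gradually prefers "good" behavior.
actions = ["left", "right"]
q = defaultdict(float)
q_learning_step(q, "s0", "right", +1.0, "s1", actions)   # rewarded
q_learning_step(q, "s0", "left", -1.0, "s1", actions)    # punished
print(dict(q))  # 'right' at s0 is now valued positively, 'left' negatively
```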

Just like the dilemma humans face when choosing a restaurant, these agents must balance the time spent discovering better actions (exploration) against the time spent taking actions that led to high rewards in the past (exploitation).

Curiosity that is too strong distracts the agent from making favorable decisions, while curiosity that is too weak means the agent will never discover them at all.
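To make the dilemma concrete, here is a minimal sketch, not from the paper, of the classic epsilon-greedy strategy on a toy multi-armed bandit: with probability epsilon the agent explores a random option, otherwise it exploits the option that has looked best so far. Too little exploration can lock the agent onto a mediocre "restaurant"; too much wastes steps on random tries. All payoff values here are made up.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Toy bandit with noisy Gaussian payoffs per arm (all values hypothetical)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # how often each arm has been pulled
    estimates = [0.0] * n_arms     # running average reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:                          # explore: try something new
            arm = rng.randrange(n_arms)
        else:                                               # exploit: best arm seen so far
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], 1.0)            # noisy payoff
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return total_reward

# Compare no curiosity (epsilon=0), a little, and a lot.
for eps in (0.0, 0.1, 0.5):
    print(f"epsilon={eps}: total reward = {epsilon_greedy_bandit([1.0, 1.5, 0.5], epsilon=eps):.1f}")
```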

In pursuit of giving AI agents "just the right amount" of curiosity, researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) created an algorithm that overcomes the problem of an AI being so "curious" that it gets distracted from the task at hand.

The algorithm they developed automatically increases curiosity when needed and decreases it if the agent has enough supervision from the environment that it already knows what to do.
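The linked paper calls its algorithm EIPO; the sketch below is not that algorithm, only a rough, hypothetical illustration of the idea of automatically dialing curiosity up or down based on how much extrinsic reward the agent has been receiving. All class and parameter names are invented for illustration.

```python
class AdaptiveCuriosityWeight:
    """Hypothetical sketch (not the paper's EIPO): keep a running average of the
    extrinsic reward the environment hands out, shrink the curiosity weight when
    that average is high (dense supervision), and grow it when it is low (sparse
    supervision)."""

    def __init__(self, beta=1.0, step=0.05, target_extrinsic=0.1):
        self.beta = beta                  # current weight on the curiosity bonus
        self.step = step                  # how quickly the weight adapts
        self.target = target_extrinsic    # extrinsic reward rate treated as "enough supervision"
        self.avg_extrinsic = 0.0          # exponential moving average of extrinsic reward

    def combined_reward(self, extrinsic, intrinsic):
        self.avg_extrinsic = 0.99 * self.avg_extrinsic + 0.01 * extrinsic
        if self.avg_extrinsic > self.target:
            self.beta = max(0.0, self.beta - self.step)   # supervision is dense: be less curious
        else:
            self.beta = min(1.0, self.beta + self.step)   # supervision is sparse: be more curious
        return extrinsic + self.beta * intrinsic
```

In a dense-reward game the weight decays toward zero and the agent simply follows the rewards; in a sparse-reward game it stays high and the curiosity bonus keeps the agent moving.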

Paper link: https://williamd4112.github.io/pubs/neurips22_eipo.pdf

After testing on more than 60 video games, the algorithm succeeded on exploration tasks of varying difficulty, whereas previous algorithms could only handle the easy or the hard ones individually. This means AI agents can use less data to learn decision rules and maximize reward.

"If you have a good grasp of the exploration-exploitation trade-off, you can learn the correct decision rules more quickly, and anything less requires a lot of data, which can mean the consequences of It's suboptimal medical treatment, the site's profits are down, and the robot doesn't learn to do the right thing."

Pulkit Agrawal, one of the study leaders, a professor at MIT and director of the Improbable AI Laboratory, said. ​

Curiosity doesn't just kill the cat!

Curiosity is hard to explain even from a psychological perspective; the underlying neurological principles of this challenge-seeking behavior are still not fully understood.

Reinforcement learning strips the emotion out of this process, reducing the problem to its barest essentials, but the technical implementation remains quite complex.

Essentially, an agent should be curious only when there is not enough supervision to guide it in trying different things; when supervision is available, it should dial its curiosity back down.

In a large portion of the test game tasks, a small agent runs around an environment looking for rewards and must perform a long sequence of actions to reach some goal, which makes them a logical testing platform for the researchers' algorithm.

In experiments with games such as "Mario Kart" and "Montezuma's Revenge", the researchers divided the games into two categories:

One is the sparsely supervised environment, where the agent receives little guidance; these are the "hard" exploration games. The other is the densely supervised environment; these are the "easy" exploration games.

Suppose in "Mario Kart", just remove all rewards, you don't know when an enemy kills you. You don't get any rewards when you collect a coin or jump over a pipe. The agent is only told at the end how it performed. This is a sparsely supervised environment, which is a difficult task. In this kind of task, algorithms that stimulate curiosity perform very well.

If the agent is instead in a densely supervised environment, where there are rewards for jumping over pipes, collecting coins, and killing enemies, then the best performer is the algorithm with no curiosity at all: rewards arrive so often that simply following the rewarded path yields plenty, with no extra exploration needed.

If you use an algorithm that encourages curiosity in such an environment, learning becomes very slow.

That is because a curious agent may try running fast in different ways, wander around, and visit every corner of the game. These things are fun, but they don't help the agent succeed in the game and earn rewards.

As mentioned above, reinforcement learning has generally matched curiosity-boosting algorithms to sparsely supervised (hard) tasks and curiosity-suppressing algorithms to densely supervised (easy) tasks, and the two could not be mixed.
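For reference, the "curiosity" in such algorithms is an intrinsic reward the agent gives itself for encountering novel situations. A minimal, hypothetical sketch is a count-based bonus (not the specific bonus used in the paper): rarely visited states pay out more, so in a sparse-reward game the bonus keeps the agent moving, while in a dense-reward game it mostly pulls the agent off the rewarded path.

```python
from collections import defaultdict
import math

class CountBasedCuriosity:
    """Hypothetical count-based intrinsic reward: bonus = 1 / sqrt(visit count),
    so novel states are 'interesting' and familiar ones are not."""

    def __init__(self):
        self.visit_counts = defaultdict(int)

    def bonus(self, state):
        self.visit_counts[state] += 1
        return 1.0 / math.sqrt(self.visit_counts[state])

# Revisiting the same (hypothetical) state makes it steadily less interesting.
curiosity = CountBasedCuriosity()
print([round(curiosity.bonus("pipe_3"), 3) for _ in range(4)])  # [1.0, 0.707, 0.577, 0.5]
```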

This time, the MIT team’s new algorithm always performed well, no matter what the environment.

Future work may involve returning to a question that has delighted and troubled psychologists for years: what is an appropriate measure of curiosity? No one really knows the right way to define curiosity mathematically.

Zhang-Wei Hong, a doctoral student at MIT CSAIL, said:

"Improving the exploration algorithm lets you tune it to the problem you care about. We need curiosity to solve challenging problems, but on some problems curiosity can degrade performance. Our algorithm removes the burden of balancing exploration and exploitation."

For problems that previously took a week to solve, the new algorithm can obtain satisfactory results within a few hours.

He co-authored the new paper on this work with Eric Chen '22, a master of engineering student at MIT CSAIL.

Deepak Pathak, a professor at Carnegie Mellon University, said:

"Intrinsic reward mechanisms like curiosity are fundamental to guiding agents toward discovering useful, diverse behaviors, but that should not come at the expense of doing well on the given task. This is an important problem in AI, and the paper provides a way to balance that trade-off. It will be very interesting to see how this approach scales from games to real-world robot intelligence."

Alison Gopnik, Distinguished Professor of Psychology and Associate Professor of Philosophy at the University of California, Berkeley, pointed out that one of the biggest challenges in current AI and cognitive science is how to balance exploration and exploitation: the former is the search for information, the latter the search for rewards.

"This paper uses impressive new technology to automate this work, designing an agent that can systematically balance curiosity about the world and desire for rewards, making AI intelligent The body has taken an important step towards becoming as smart as real children," he said.

References:

https://techxplore.com/news/2022-11-bad-ai-curious.html

https://www.csail.mit.edu/news/ensuring-ai-works-right-dose-curiosity
