
NeurIPS23 | 'Brain Reading' decodes brain activity and reconstructs the visual world

Jan 10, 2024, 02:54 PM

In this NeurIPS 2023 paper, researchers from the University of Leuven, the National University of Singapore, and the Institute of Automation of the Chinese Academy of Sciences propose a visual "brain reading" technique that reconstructs, at high resolution, the images seen by the human eye from human brain activity.

In cognitive neuroscience, it is understood that human perception is shaped not only by objective stimuli but also by past experience; together these factors produce complex activity in the brain. Decoding visual information from brain activity is therefore an important task. Functional magnetic resonance imaging (fMRI), an efficient non-invasive technology, plays a key role in recovering and analyzing visual information, especially image categories.

However, because fMRI signals are noisy and the brain's visual representations are complex, this task faces considerable challenges. To address it, this paper proposes a two-stage fMRI representation learning framework that identifies and removes noise in brain activity and focuses on parsing the neural activation patterns crucial for visual reconstruction, successfully reconstructing high-resolution, semantically accurate images from brain activity.


Paper link: https://arxiv.org/abs/2305.17214

Project link: https://github.com/soinx0629/vis_dec_neurips/

The proposed method is based on dual contrastive learning, cross-modal information interaction, and diffusion models. It achieves close to a 40% improvement in evaluation metrics on the relevant fMRI datasets over the previous best models, and the quality, readability, and semantic relevance of the generated images are perceptibly better than existing methods to the naked eye. This work helps to understand the visual perception mechanism of the human brain and promotes research on visual brain-computer interface technology. The code has been open-sourced.

Although fMRI is widely used to analyze neural responses, accurately reconstructing visual images from its data remains challenging, mainly because fMRI data contain noise from multiple sources that can obscure neural activation patterns and increase the difficulty of decoding. In addition, the neural response process triggered by visual stimulation is complex and multi-stage, so the fMRI signal is a nonlinear superposition that is difficult to invert and decode.

Traditional neural decoding methods such as ridge regression associate fMRI signals with the corresponding stimuli but often fail to capture the nonlinear relationship between stimuli and neural responses. Recently, deep learning techniques such as generative adversarial networks (GANs) and latent diffusion models (LDMs) have been adopted to model this relationship more accurately. Even so, isolating vision-related brain activity from noise and decoding it accurately remains one of the main challenges in the field.
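For reference, the ridge-regression baseline mentioned above has a closed form. Here is a minimal numpy sketch on synthetic data; the dimensions and regularization strength are illustrative, not values from any cited study:

```python
import numpy as np

def ridge_decode(X, Y, lam=1.0):
    """Closed-form ridge regression mapping fMRI features X (n, d_v)
    to stimulus features Y (n, d_s): W = (X^T X + lam*I)^(-1) X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Synthetic linear decoding problem with a small amount of noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))            # simulated fMRI responses
W_true = rng.normal(size=(20, 5))         # ground-truth linear map
Y = X @ W_true + 0.01 * rng.normal(size=(100, 5))
W = ridge_decode(X, Y, lam=0.1)           # recovers W_true closely
```

Because the model is linear, it can only capture linear stimulus-response structure, which is exactly the limitation the deep methods above aim to overcome.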

To address these challenges, this work proposes a two-stage fMRI representation learning framework that effectively identifies and removes noise in brain activity and focuses on parsing the neural activation patterns critical for visual reconstruction. The method generates high-resolution, semantically accurate images with a 50-way Top-1 accuracy of 39.34%, exceeding the existing state of the art.

Method overview

fMRI Representation Learning (FRL)

[Figure: overview of the two-stage FRL framework]

First stage: pre-training the dual-contrast masked autoencoder (DC-MAE)

To distinguish brain activity patterns shared across individuals from individual noise, this paper introduces DC-MAE to pre-train fMRI representations on unlabeled data. DC-MAE consists of an encoder E_v and a decoder D_v: E_v takes the masked fMRI signal as input, and D_v is trained to predict the unmasked signal. "Dual contrast" refers to the fact that the model optimizes contrastive losses during fMRI representation learning and participates in two distinct contrastive processes.

In the first contrastive process, each of the n fMRI samples v in a batch is randomly masked twice, generating two different masked versions v¹ and v² that serve as a positive pair. 1D convolutional layers then convert the two versions into embeddings, which are fed separately into the fMRI encoder E_v. The decoder D_v receives the encoded latent representations and produces the predictions v̂¹ and v̂². The model is optimized with the first contrastive loss, the cross-contrastive loss, computed with the InfoNCE objective:

[Equation: cross-contrastive InfoNCE loss L_cc]
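The cross-contrastive term is an InfoNCE objective over the batch. The sketch below is a generic numpy InfoNCE with cosine similarity and a temperature τ; the exact similarity function and temperature used in the paper may differ:

```python
import numpy as np

def info_nce(preds, targets, tau=0.1):
    """InfoNCE loss: preds[i] should match targets[i] (the positive)
    against every other target in the batch (the negatives)."""
    # L2-normalize so dot products become cosine similarities
    p = preds / np.linalg.norm(preds, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = (p @ t.T) / tau                       # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # positives on the diagonal
```

In the cross-contrastive setting, `preds` would be the decoder's predictions from one masked view and `targets` the other masked view of the same batch.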

In the second contrastive process, each unmasked original sample v and its corresponding predicted sample v̂, produced by the decoder D_v, form a natural positive pair. The second contrastive loss, the self-contrastive loss, is computed as follows:

[Equation: self-contrastive InfoNCE loss L_sc]

Optimizing the self-contrastive loss L_sc also achieves masked reconstruction. For both L_cc and L_sc, the negative samples are drawn from instances in the same batch. E_v and D_v are jointly optimized with L = λ₁L_cc + λ₂L_sc, where the hyperparameters λ₁ and λ₂ weight the two loss terms.
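The double-masking step of the first contrastive process can be sketched as follows; the mask ratio and signal length are placeholder values, not the paper's settings:

```python
import numpy as np

def random_mask(v, mask_ratio=0.75, rng=None):
    """Zero out a random fraction of an fMRI vector, returning the
    masked signal and the boolean mask (illustrative sketch only)."""
    rng = rng or np.random.default_rng()
    mask = rng.random(v.shape) < mask_ratio
    return np.where(mask, 0.0, v), mask

rng = np.random.default_rng(0)
v = rng.normal(size=4096)              # one flattened fMRI sample
v1, m1 = random_mask(v, rng=rng)       # first masked view
v2, m2 = random_mask(v, rng=rng)       # second masked view, different mask
# (v1, v2) form the positive pair fed to the encoder-decoder
```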

Second stage: tuning with cross-modal guidance

Given the low signal-to-noise ratio and the highly convolved nature of fMRI recordings, it is crucial for the fMRI feature learner to focus on the brain activation patterns that are most relevant to visual processing and most informative for reconstruction.

After the first-stage pre-training, the second stage tunes the fMRI autoencoder with image assistance to reconstruct fMRI, and the image autoencoder is tuned in the same way. Specifically, a sample u and its corresponding fMRI-recorded neural response v are selected from a batch of n samples. After patchifying and random masking, u and v become u_m and v_m, which are fed into the image encoder E_u and the fMRI encoder E_v respectively, producing the embeddings z_u and z_v. To reconstruct the fMRI signal v, a cross-attention module merges z_u and z_v:

[Equation: CA(z_v, z_u) = softmax((W_Q·z_v + b_Q)(W_K·z_u + b_K)ᵀ / √d_k)(W_V·z_u + b_V)]

Here W and b denote the weights and biases of the corresponding linear layers, 1/√d_k is the scaling factor, d_k is the dimension of the key vectors, and CA abbreviates cross-attention. The output of CA is added to z_v and fed into the fMRI decoder D_v, which reconstructs v and yields the prediction v̂:

[Equation: v̂ = D_v(z_v + CA(z_v, z_u))]
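The cross-attention merge can be illustrated with a single-head numpy sketch; the weight matrices here are random stand-ins for learned parameters, and the dimensions are made up:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, d_k=32, rng=None):
    """Single-head cross-attention: query tokens (e.g. fMRI embeddings)
    attend to context tokens (e.g. image embeddings)."""
    rng = rng or np.random.default_rng(0)
    d_q, d_c = query_tokens.shape[-1], context_tokens.shape[-1]
    W_q = rng.normal(scale=d_q ** -0.5, size=(d_q, d_k))
    W_k = rng.normal(scale=d_c ** -0.5, size=(d_c, d_k))
    W_v = rng.normal(scale=d_c ** -0.5, size=(d_c, d_k))
    Q, K, V = query_tokens @ W_q, context_tokens @ W_k, context_tokens @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # rows sum to 1 over context
    return attn @ V                          # (n_query, d_k)

out = cross_attention(np.ones((4, 16)), np.ones((6, 8)))
```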

The image autoencoder performs a symmetric computation: the output z_u of the image encoder E_u is merged with z_v through a cross-attention module and then used by the image decoder D_u to reconstruct the image u, yielding the prediction û: [Equation: û = D_u(z_u + CA(z_u, z_v))]

fMRI and image autoencoders are trained together by optimizing the following loss function:

[Equation: joint reconstruction loss over the fMRI and image autoencoders]

Generating images with a latent diffusion model (LDM)

[Figure: the forward diffusion process and reverse denoising process of the LDM]

After completing the two FRL training stages, the encoder E_v of the fMRI feature learner is used to drive a latent diffusion model (LDM) to generate images from brain activity. As shown in the figure, the diffusion model comprises a forward diffusion process and a reverse denoising process: the forward process gradually degrades an image into Gaussian noise by injecting noise of increasing variance step by step.
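The closed-form forward diffusion step can be sketched directly from the cumulative noise schedule; the linear schedule below is the common DDPM default, not necessarily the paper's:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng=None):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    rng = rng or np.random.default_rng()
    alphas_bar = np.cumprod(1.0 - betas)   # cumulative signal retention
    a = alphas_bar[t]
    noise = rng.normal(size=x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

betas = np.linspace(1e-4, 0.02, 1000)      # linear noise schedule
x0 = np.ones(64)                           # toy "image" latent
x_late = forward_diffuse(x0, 999, betas, np.random.default_rng(0))
# by the last step almost no signal remains (sqrt(abar_999) is tiny)
```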

This study generates images by extracting visual knowledge from a pre-trained label-to-image latent diffusion model (LDM) and conditioning it on fMRI data. Following stable diffusion practice, a cross-attention mechanism incorporates the fMRI information into the LDM, and time-step conditioning is used alongside cross-attention to strengthen the conditional information. In the training phase, the VQGAN encoder and the fMRI encoder E_v trained in the two FRL stages process the image u and the fMRI v, and the fMRI encoder is fine-tuned while the LDM is kept frozen. The loss function is:

[Equation: conditional LDM denoising objective]

where ε defines the noise schedule of the diffusion model. In the inference phase, the process starts from standard Gaussian noise at time step T; conditioned on the given fMRI information, the LDM follows the reverse process to remove the noise from the latent representation step by step. When time step zero is reached, the latent representation is converted into an image by the VQGAN decoder.
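The reverse process described above can be sketched as a DDPM-style sampling loop; `eps_model` below is a stand-in for the fMRI-conditioned noise predictor, which in the real system is the LDM's denoising network driven by the fMRI encoder:

```python
import numpy as np

def reverse_denoise(shape, eps_model, betas, rng=None):
    """Start from Gaussian noise at step T and iterate the DDPM
    posterior-mean update down to step 0."""
    rng = rng or np.random.default_rng(0)
    alphas = 1.0 - betas
    alphas_bar = np.cumprod(alphas)
    x = rng.normal(size=shape)                   # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, t)                    # predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alphas_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                                # noise is added except at step 0
            x = mean + np.sqrt(betas[t]) * rng.normal(size=shape)
        else:
            x = mean
    return x

betas = np.linspace(1e-4, 0.02, 50)              # short schedule for illustration
latent = reverse_denoise((16,), lambda x, t: np.zeros_like(x), betas)
# a real run would now pass `latent` through the VQGAN decoder
```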

Experiment

Reconstruction results

Comparisons with previous studies such as DC-LDM, IC-GAN, and SS-AE, evaluated on the GOD and BOLD5000 datasets, show that the model proposed in this study significantly exceeds these models in accuracy, with improvements of 39.34% over DC-LDM and 66.7% over IC-GAN.

Evaluation on the other four subjects of the GOD dataset shows that, even when DC-LDM is allowed to be tuned on the test set, the proposed model remains significantly better in 50-way Top-1 classification accuracy, demonstrating its reliability and superiority in reconstructing brain activity across different subjects.

The results show that the proposed fMRI representation learning framework, combined with a pre-trained LDM, reconstructs the brain's visual activity far beyond the current baseline level. This work helps further explore the potential of neural decoding models.

