Deep learning image segmentation: an overview of network structure design

This article summarizes the innovations in network structure when using CNNs for image semantic segmentation. These innovations mainly include the design of new neural architectures (different depths, widths, connections and topologies) and the design of new components or layers. The former uses existing components to assemble complex large-scale networks, while the latter prefers to design underlying components. First, we introduce some classic semantic segmentation networks and their innovations, and then introduce some applications of network structure design in the field of medical image segmentation.

1. Image semantic segmentation network structure innovation

1.1 FCN network

FCN overall architecture

Simplified diagram

FCN is listed separately because it was the first network to approach semantic segmentation from a new perspective. Earlier neural-network-based semantic segmentation methods took an image patch centered on the pixel to be classified and predicted the label of that central pixel, generally with a CNN + FC architecture. This approach clearly cannot exploit the global context of the image, and pixel-by-pixel inference is very slow. FCN instead discards the fully connected (FC) layers and builds the network entirely from convolutional layers. Using transposed convolution and fusion of features from different layers, the network directly outputs a prediction mask for the input image, which greatly improves both efficiency and accuracy.

Schematic diagram of feature fusion of different layers of FCN

Innovation points: fully convolutional network (no FC layers); transposed convolution (deconv, also called deconvolution); skip connections that fuse feature maps from different layers by addition
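
To make the two FCN ingredients above concrete, here is a minimal NumPy sketch of a stride-2 transposed convolution followed by skip fusion by addition. The function name `transposed_conv2d`, the all-ones kernel, and the toy 2x2 and 4x4 maps are assumptions for illustration only; in FCN the upsampling kernel is learned and the maps are multi-channel class score maps.

```python
import numpy as np

def transposed_conv2d(x, kernel, stride):
    """Transposed convolution of a single-channel 2D map with a square 2D kernel."""
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            # Each input value stamps a scaled copy of the kernel onto the output.
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

coarse = np.array([[1.0, 2.0], [3.0, 4.0]])  # deep, low-resolution score map (toy values)
fine = np.ones((4, 4))                       # shallower, higher-resolution score map (toy values)
kernel = np.ones((2, 2))                     # fixed kernel for illustration; learned in practice

up = transposed_conv2d(coarse, kernel, stride=2)  # 2x2 -> 4x4
fused = up + fine                                 # skip fusion by element-wise addition
```

With this particular kernel and stride the transposed convolution reduces to nearest-neighbor upsampling, which makes the result easy to check by hand.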

1.2 Encoder-decoder structure

SegNet follows basically the same idea as FCN. Its encoder uses the first 13 convolutional layers of VGG16; the difference lies in how the decoder upsamples. FCN upsamples by applying a transposed convolution (deconv) to the feature map and adding the result to the encoder feature map of the corresponding size, whereas SegNet upsamples in the decoder using the max-pooling indices recorded in the encoder (original description: the decoder upsamples the lower resolution feature input maps. Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling.).

Innovation point: Encoder-Decoder structure; Pooling indices.
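
The pooling-indices idea can be sketched in a few lines of NumPy. This is an illustrative single-channel version, not SegNet's implementation: the helper names `max_pool_with_indices` and `max_unpool` and the 4x4 input are assumptions. The encoder step remembers where each maximum came from, and the decoder step writes each value back to that position and leaves everything else at zero.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """Non-overlapping k x k max pooling that also returns flat argmax positions."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    indices = np.zeros((h // k, w // k), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            window = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            r, c = divmod(int(np.argmax(window)), k)
            pooled[i, j] = window[r, c]
            # Store the position as a flat index into the original map.
            indices[i, j] = (i * k + r) * w + (j * k + c)
    return pooled, indices

def max_unpool(pooled, indices, shape):
    """Place each pooled value back at its recorded position; other entries stay zero."""
    out = np.zeros(shape[0] * shape[1])
    out[indices.ravel()] = pooled.ravel()
    return out.reshape(shape)

x = np.array([[1.0, 2.0, 0.0, 1.0],
              [3.0, 4.0, 5.0, 0.0],
              [0.0, 1.0, 2.0, 2.0],
              [9.0, 0.0, 1.0, 3.0]])
pooled, idx = max_pool_with_indices(x)      # encoder side
restored = max_unpool(pooled, idx, x.shape) # decoder side
```

The restored map is sparse, which is why SegNet follows each unpooling step with convolutions that densify it.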

SegNet Network

Comparison of the upsampling methods of SegNet and FCN

Innovation points (U-Net): U-shaped structure; skip connections

U-Net network

The V-Net network structure is similar to U-Net, except that the architecture adds residual connections and replaces 2D operations with 3D operations to process 3D (volumetric) images. It is also trained by directly optimizing Dice, a widely used segmentation metric.

V-Net Network

Innovation point: essentially a 3D version of the U-Net network
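
Since V-Net optimizes Dice directly, a small sketch of a soft Dice score and the corresponding loss may help. This is the commonly used variant with plain sums in the denominator; the V-Net paper itself uses squared terms there. The function names and the `eps` smoothing constant are assumptions for illustration.

```python
import numpy as np

def soft_dice(pred, target, eps=1e-7):
    """Soft Dice coefficient between predicted probabilities and a binary mask."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(pred * target)
    # eps avoids division by zero when both masks are empty.
    return (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)

def dice_loss(pred, target):
    """Loss to minimize: 0 for perfect overlap, close to 1 for no overlap."""
    return 1.0 - soft_dice(pred, target)
```

For example, `soft_dice([1, 1, 0, 0], [1, 0, 0, 0])` is 2·1 / (2 + 1), about 0.667. Because the score is computed from overlap rather than per-pixel accuracy, it is less sensitive to the large background regions typical of medical volumes.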

FC-DenseNet (the One Hundred Layers Tiramisu network; paper title: The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation) combines dense blocks with the U-Net architecture. The simplest version of this network consists of a downsampling path with two transitions down and an upsampling path with two transitions up. It also contains two horizontal skip connections that concatenate the feature maps from the downsampling path with the corresponding feature maps in the upsampling path. The connection patterns in the two paths are not exactly the same: in the downsampling path, a skip path around each dense block concatenates the block's input with its output, so the number of feature maps grows linearly, while the upsampling path has no such operation. (One more thing: this network could be abbreviated as Dense UNet, but there is a paper called Fully Dense UNet for 2D Sparse Photoacoustic Tomography Artifact Removal, which is about removing artifacts in photoacoustic imaging. I have seen many blogs that use the illustrations from that paper when discussing semantic segmentation, which is not the same thing at all =_=||. Just make sure you can tell them apart.)

FC-DenseNet (One Hundred Layers Tiramisu network)

Innovation point: integration of the DenseNet and U-Net networks (from the perspective of information exchange, dense connections are indeed more powerful than residual structures)
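
The difference between the two paths is easy to see by counting channels. The helper below is a hypothetical sketch: it assumes every layer in a dense block adds `growth_rate` feature maps, and that the block input is concatenated to the output only when `concat_input` is true (the downsampling path). The numbers 48, 16, and 4 are illustrative values.

```python
def dense_block_out_channels(in_channels, growth_rate, num_layers, concat_input):
    """Number of feature maps leaving a dense block."""
    new_maps = growth_rate * num_layers  # each layer contributes growth_rate maps
    # Downsampling path: block input is concatenated with the new maps.
    # Upsampling path: only the newly produced maps are passed on.
    return in_channels + new_maps if concat_input else new_maps

down = dense_block_out_channels(48, 16, 4, concat_input=True)   # 48 + 4 * 16
up = dense_block_out_channels(48, 16, 4, concat_input=False)    # 4 * 16
```

Dropping the input concatenation in the upsampling path keeps the channel count, and therefore memory use, from growing at the highest resolutions.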

1) DeepLabV1: combines a convolutional neural network with a probabilistic graphical model (CNN + CRF), which improves segmentation localization accuracy;

2) DeepLabV2: ASPP (atrous spatial pyramid pooling); CNN + CRF

3) DeepLabV3: improved ASPP, adding a 1×1 convolution and global average pooling; compares the effects of cascaded and parallel atrous convolutions.

Cascade Atrous Convolution

Parallel Atrous Convolution (ASPP)

4) DeepLabV3+: adopts the encoder-decoder idea by adding a decoder module to extend DeepLabV3; applies depthwise separable convolution to the ASPP and decoder modules; uses an improved Xception as the backbone.

DeepLabV3+

In general, the core contributions of the DeepLab series are: dilated (atrous) convolution; ASPP; CNN + CRF (only V1 and V2 use a CRF; presumably V3 and V3+ resolve blurred segmentation boundaries through deeper networks, with better results than adding a CRF).
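
Dilated (atrous) convolution, the building block behind ASPP, can be illustrated in one dimension. The sketch below is an assumption-laden toy, not DeepLab code: `dilated_conv1d` is a hypothetical helper that computes a "valid" cross-correlation in which the kernel taps are spaced `rate` samples apart, so the same three weights cover a wider span without extra parameters.

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """'Valid' 1D atrous convolution (cross-correlation) with dilation `rate`."""
    k = len(kernel)
    span = (k - 1) * rate + 1      # input samples covered by one kernel placement
    out_len = len(x) - span + 1
    return np.array([
        sum(kernel[t] * x[i + t * rate] for t in range(k))
        for i in range(out_len)
    ])

x = np.arange(10, dtype=float)
kernel = np.array([1.0, 0.0, -1.0])

y1 = dilated_conv1d(x, kernel, rate=1)  # taps at i, i+1, i+2: span of 3 samples
y3 = dilated_conv1d(x, kernel, rate=3)  # taps at i, i+3, i+6: span of 7 samples
```

ASPP runs several such convolutions with different rates in parallel on the same feature map and combines the results, while the cascaded design applies them one after another.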

PSPNet (pyramid scene parsing network) improves the network's ability to use global context by aggregating context information from different regions. In SPPNet, the feature maps of different levels produced by pyramid pooling are flattened and concatenated, then sent to a fully connected layer for classification, which removes the CNN requirement of a fixed input size for image classification. PSPNet instead uses a pooling-conv-upsample strategy: the resulting maps are concatenated into one feature map, on which label prediction is then performed.

PSPNet network

Innovation point: multi-scale pooling, to better leverage global image-level prior knowledge for understanding complex scenes
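
The pooling-conv-upsample-concatenate strategy can be sketched on a single-channel square map. Several simplifications here are assumptions for illustration: the bin sizes (1, 2, 4) are chosen to divide the toy 4x4 input, the per-branch 1×1 convolution is omitted, and nearest-neighbor upsampling stands in for the interpolation a real network would use. All function names are hypothetical.

```python
import numpy as np

def adaptive_avg_pool(x, bins):
    """Average-pool a (H, W) map into a (bins, bins) grid; H and W must be divisible by bins."""
    h, w = x.shape
    return x.reshape(bins, h // bins, bins, w // bins).mean(axis=(1, 3))

def upsample_nearest(x, size):
    """Nearest-neighbor upsampling of a square map to (size, size)."""
    factor = size // x.shape[0]
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def pyramid_pooling(feature, bin_sizes=(1, 2, 4)):
    """Stack the input map with its pooled-and-upsampled versions along a new channel axis."""
    size = feature.shape[0]
    branches = [feature]
    for bins in bin_sizes:
        pooled = adaptive_avg_pool(feature, bins)
        # A 1x1 convolution would be applied here in a real network; omitted in this sketch.
        branches.append(upsample_nearest(pooled, size))
    return np.stack(branches, axis=0)

feature = np.arange(16, dtype=float).reshape(4, 4)
out = pyramid_pooling(feature)  # shape: (1 + number of bins, 4, 4)
```

The coarsest branch (one bin) carries the global average to every pixel, which is how each location gains access to image-level context.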

RefineNet refines intermediate activation maps and connects them hierarchically to combine multi-scale activations while preventing loss of sharpness. The network consists of independent Refine modules, each made up of three main blocks: Residual Convolutional Unit (RCU), Multi-Resolution Fusion (MRF), and Chained Residual Pooling (CRP). The overall structure is somewhat similar to U-Net, but a new way of combining features is designed at the skip connections (not a simple concat). Personally, I think this structure is a very good starting point for designing your own network: you can add many CNN modules used in other CV problems, and with U-Net as the overall framework the results will not be too bad.

RefineNet network

Innovation point: Refine module

1.3 Reduce the computational complexity of the network structure

There is also a lot of work dedicated to reducing the computational complexity of the semantic segmentation network. Some methods to simplify the structure of deep networks: tensor decomposition; channel/network pruning; sparse connections. There are also some that use NAS (Neural Architecture Search) to replace manual design to search the structure of modules or the entire network. Of course, the GPU resources required by AutoDL will dissuade a large number of people. Therefore, some people use random search to search for much smaller ASPP modules, and then build the entire network model based on the small modules.

Lightweight network design is the consensus in the industry. For mobile deployment, it is impossible to equip each machine with a 2080ti. In addition, power consumption, storage and other issues will also limit the promotion and application of the model. However, if 5G becomes popular, all data can be processed in the cloud, which will be very interesting. Of course, in the short term (ten years), we don’t know whether full-scale deployment of 5G is feasible.

1.4 Network structure based on attention mechanism

The attention mechanism can be defined as using information from subsequent layers or feature maps to select and localize the most discriminative (or salient) parts of the input feature map. It can simply be thought of as a way of weighting feature maps, where the weights are computed by the network. Depending on what the weights act on, it can be divided into the channel attention mechanism (CA) and the spatial attention mechanism (PA). The FPA (Feature Pyramid Attention) network is a semantic segmentation network based on the attention mechanism. It combines attention with a spatial pyramid to extract precise features for pixel-level labeling, without using dilated convolutions or hand-designed decoder networks.
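
As a concrete instance of "weights computed by the network", here is a minimal channel-attention sketch in the squeeze-and-excitation style. This is not the FPA module: the function `channel_attention`, the two weight matrices `w1` and `w2`, and the random toy inputs are assumptions for illustration, and the matrices would be learned parameters in a real network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(features, w1, w2):
    """features: (C, H, W). Returns the reweighted features and the per-channel weights."""
    squeezed = features.mean(axis=(1, 2))    # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeezed)  # bottleneck layer with ReLU
    weights = sigmoid(w2 @ hidden)           # one weight in (0, 1) per channel
    return features * weights[:, None, None], weights

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 3, 3))    # toy feature map with 4 channels
w1 = rng.standard_normal((2, 4))             # stand-in for learned weights
w2 = rng.standard_normal((4, 2))             # stand-in for learned weights
out, weights = channel_attention(features, w1, w2)
```

Spatial attention follows the same pattern but produces one weight per pixel location (shape `(1, H, W)`) instead of one per channel.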

1.5 Network structure based on adversarial learning

In 2014, Goodfellow et al. proposed an adversarial method for learning deep generative models. Generative adversarial networks (GANs) train two models at the same time: a generative model G that captures the distribution of the data, and a discriminative model D that estimates the probability that a sample came from the training data.

● G is a generative network. It receives random noise z (random numbers) and generates an image from this noise.

● D is a discriminative network. It determines whether an image is "real". Its input is x (an image), and its output D(x) is the probability that x is a real image. An output of 1 means the image is certainly real, and an output of 0 means it cannot be a real image.

G's training procedure is to maximize the probability that D makes a mistake. It can be proved that, in the space of arbitrary functions G and D, a unique solution exists in which G reproduces the training data distribution and D equals 0.5 everywhere. During training, the goal of the generative network G is to generate images realistic enough to deceive the discriminative network D, while the goal of D is to distinguish the fake images generated by G from real images. G and D thus form a dynamic "game", and its final equilibrium is a Nash equilibrium. When G and D are both defined by neural networks, the entire system can be trained with backpropagation.

GANs network structure diagram

Inspired by GANs, Luc et al. trained a semantic segmentation network (G) together with an adversarial network (D), where the adversarial network distinguishes whether a segmentation map comes from the ground truth or from the segmentation network (G). G and D continually play against each other and learn, and their loss functions are defined as:

GANs loss function

Review the original GAN loss function: The loss function of GANs embodies the idea of a zero-sum game. The loss function of the original GANs is as follows:
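
For reference, the standard minimax objective from Goodfellow et al. (2014) is written out here in LaTeX. It has two expectation terms, which correspond to the "first item E" and "second item E" in the discussion that follows; the symbols p_data (data distribution) and p_z (noise prior) follow the usual notation of that paper.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```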

The calculation position of the loss is at the output of D (discriminator), and the output of D is generally a fake/true judgment, so the overall situation can be considered to be a binary cross-entropy function. It can be seen from the form of the loss function of GANs that training is divided into two parts:

The first is the maxD part, because training generally starts by training D while keeping G (the generator) unchanged. The training goal of D is to correctly distinguish fake from real. If we use 1/0 to represent real/fake, then for the first term E, whose input is sampled from real data, we expect D(x) to approach 1, which makes the first term larger. In the same way, the input of the second term E is sampled from data generated by G, so we expect D(G(z)) to approach 0, which makes the second term larger as well. This part of training therefore aims to make the whole expression larger, which is the meaning of maxD. Only the parameters of D are updated in this part.

The second part keeps D unchanged (no parameter update) and trains G. At this point only the second term E matters. The key is here: because we want to confuse D, the label is now set to 1 (we know the sample is fake, which is why it is called confusion). We want the output D(G(z)) to be close to 1, that is, the smaller this term is, the better. This is minG. Of course, the discriminator is not so easy to fool, so at this point it will produce a relatively large error. That error updates G, and G then becomes better: I didn't fool you this time, so I will have to work harder next time. (Quoted from https://www.cnblogs.com/walter-xh/p/10051634.html). Only the parameters of G are updated in this part.
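
The two training steps can be expressed as two binary cross-entropy losses on the discriminator's output. The sketch below works on single probabilities rather than batches, and the function names are assumptions for illustration. Note that labelling generated samples as 1 gives the generator loss -log D(G(z)), the commonly used non-saturating form; it pushes D(G(z)) toward 1 just as minimizing log(1 - D(G(z))) does, but it is not literally the same expression.

```python
import math

def bce(prob, label):
    """Binary cross-entropy for one predicted probability in (0, 1)."""
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

def discriminator_loss(d_real, d_fake):
    # Step 1 (update D only): real samples are labelled 1, generated samples 0.
    # Minimizing this loss is the same as maximizing the GAN value function over D.
    return bce(d_real, 1) + bce(d_fake, 0)

def generator_loss(d_fake):
    # Step 2 (update G only): generated samples are labelled 1 to "confuse" D.
    return bce(d_fake, 1)
```

For example, `generator_loss(0.1)` is much larger than `generator_loss(0.9)`: when D confidently rejects a generated sample, G receives a large error signal, which matches the description above.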

Looking at GANs from another perspective, the discriminator (D) is equivalent to a special loss function (composed of a neural network, different from traditional L1, L2, cross-entropy and other loss functions).

In addition, GANs require a special training procedure and suffer from problems such as vanishing gradients and mode collapse (there now seem to be ways to address these), but the design idea is indeed a great invention of the deep learning era.

1.6 Summary

Most of the image semantic segmentation models based on deep learning follow the encoder-decoder architecture, such as U-Net. Research results in recent years have shown that dilated convolution and feature pyramid pooling can improve U-Net style network performance. In Section 2, we summarize how these methods and their variants can be applied to medical image segmentation.

2. Application of network structure innovation in medical image segmentation

This section introduces some research results on the application of network structure innovation in 2D/3D medical image segmentation.

2.1 Segmentation method based on model compression

To achieve real-time processing of high-resolution 2D/3D medical images (such as CT, MRI, and histopathology images), researchers have proposed a variety of model compression methods. Weng et al. applied NAS to the U-Net network and obtained a smaller network with better organ/tumor segmentation performance on CT, MRI, and ultrasound images. Brugger redesigned the U-Net architecture using group normalization and Leaky ReLU to make the network more memory-efficient for 3D medical image segmentation. Others have designed dilated convolution modules with fewer parameters. Other model compression methods include weight quantization (16-bit, 8-bit, and binary quantization), distillation, and pruning.
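
Of the compression methods listed, 8-bit weight quantization is the simplest to illustrate. The sketch below shows symmetric per-tensor quantization, which is one common scheme among several and is not taken from any of the cited papers; the function names and the example weights are assumptions.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8."""
    # One scale for the whole tensor; the small floor avoids division by zero.
    scale = max(float(np.max(np.abs(weights))), 1e-12) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return q.astype(np.float32) * scale

w = np.array([-1.0, -0.5, 0.0, 0.25, 1.0])
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_error = float(np.max(np.abs(w - w_hat)))
```

Each weight is stored in one byte instead of four, and the reconstruction error per weight is bounded by about half the scale.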

2.2 Segmentation method of encoding-decoding structure

Drozdzal proposed applying a simple CNN to normalize the original input image before feeding it into the segmentation network, which improved segmentation accuracy on electron microscopy images, liver CT, and prostate MRI. Gu proposed using dilated convolution in the backbone network to retain contextual information. Vorontsov proposed an image-to-image network framework that converts images with an ROI into images without it (for example, images with tumors are converted into healthy images without tumors); the tumors removed by the model are then added to new healthy images to capture the detailed structure of the object. Zhou et al. proposed rewiring the skip connections of the U-Net network and tested its performance on nodule segmentation in low-dose chest CT scans, nuclei segmentation in microscopy images, liver segmentation in abdominal CT scans, and polyp segmentation in colonoscopy videos. Goyal applied DeepLabV3 to dermoscopic color image segmentation to extract skin lesion areas.

2.3 Segmentation method based on attention mechanism

Nie proposed an attention model that segments the prostate more accurately than the baseline models (V-Net and FCN). Sinha proposed a network based on a multi-layer attention mechanism for abdominal organ segmentation in MRI images. Qin et al. proposed a dilated convolution module to preserve more details of 3D medical images. There are many other papers on blood image segmentation based on attention mechanisms.

2.4 Segmentation network based on adversarial learning

Khosravan proposed an adversarial training network for pancreatic segmentation from CT scans. Son uses generative adversarial networks for retinal image segmentation. Xue uses a fully convolutional network as a segmentation network in a generative adversarial framework to segment brain tumors from MRI images. There are other papers that successfully apply GANs to medical image segmentation problems, so I won’t list them one by one.

2.5 RNN-based segmentation model

Recurrent neural networks (RNNs) are mainly used to process sequence data. The long short-term memory network (LSTM) is an improved version of the RNN that introduces self-loops so that gradients can keep flowing over long durations. In medical image analysis, RNNs are used to model temporal dependencies in image sequences. Bin et al. proposed an image sequence segmentation algorithm that integrates a fully convolutional neural network with an RNN, incorporating information along the time dimension into the segmentation task. Gao et al. used a CNN and an LSTM to model temporal relationships in brain MRI slice sequences to improve segmentation performance on 4D images. Li et al. first used U-Net to obtain an initial segmentation probability map and then used an LSTM to segment the pancreas from 3D CT images, which improved segmentation performance. There are many other papers that use RNNs for medical image segmentation, so I will not introduce them one by one.

2.6 Summary

This part mainly covers applications of segmentation algorithms to medical images, so there are few novel contributions. The main work lies in adapting classic networks to data of different formats (CT or RGB, pixel range, image resolution, etc.) and to the characteristics of data from different body parts (noise, object shape, etc.), so that they match the input data and complete the segmentation task better. Although deep learning is a black box, overall model design still follows rules. Knowing which strategies solve which problems, and which problems they introduce, lets you choose according to the specific segmentation problem to achieve the best segmentation performance.

Some references:

1. Deep Semantic Segmentation of Natural and Medical Images: A Review

2. NAS-Unet: Neural architecture search for medical image segmentation. IEEE Access, 7:44247–44257, 2019.

3. Boosting segmentation with weak supervision from image-to-image translation. arXiv preprint arXiv:1904.01636, 2019.

4. Multi-scale guided attention for medical image segmentation. arXiv preprint arXiv:1906.02849, 2019.

5. SegAN: Adversarial network with multi-scale L1 loss for medical image segmentation.

6. Fully convolutional structured LSTM networks for joint 4D medical image segmentation. In 2018 IEEE

7. https://www.cnblogs.com/walter-xh/p/10051634.html
