The linear layer is one of the most commonly used layers in deep learning and plays an important role in neural networks. It appears in tasks such as image classification, object detection, and speech recognition. This article focuses on how linear layers act on multi-dimensional tensors.
First, let’s review the basic principles of linear layers. A linear layer is parameterized by a weight matrix W of shape (n_out, n_in) and a bias vector b of shape (n_out,), where n_in is the size of the input tensor and n_out is the size of the output tensor. Assume the input is a one-dimensional tensor x∈R^{n_in} and the output is a one-dimensional tensor y∈R^{n_out}. In the linear layer, the input tensor is linearly transformed by the weight matrix W and the bias vector b is added, yielding the output tensor y. This linear transformation can be written as y = Wx + b. Each row of W is the weight vector of one output neuron of the linear layer, and each element of b is the bias of the corresponding output neuron. Each element of the output tensor y is therefore the dot product of the weight vector of the corresponding output neuron with the input tensor, plus the corresponding bias.
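As a minimal sketch of this formula (the sizes here are arbitrary, chosen only for illustration), the transformation can be written directly in PyTorch:

```python
import torch

# hypothetical sizes, chosen only for illustration
n_in, n_out = 4, 3

x = torch.randn(n_in)           # input tensor x in R^{n_in}
W = torch.randn(n_out, n_in)    # weight matrix: one row per output neuron
b = torch.randn(n_out)          # bias vector: one entry per output neuron

y = W @ x + b                   # y = Wx + b
print(y.shape)                  # torch.Size([3])
```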
Now, suppose we have a multi-dimensional tensor X of shape (n_1,n_2,…,n_k) and want to pass it through a linear layer to produce an output tensor Y of shape (m_1,m_2,…,m_l). How should we proceed?
First, we need to flatten X into a one-dimensional tensor. This process is usually called a "flattening" operation and can be implemented with the view function in PyTorch. Specifically, we change the shape of X to (n_1\times n_2\times...\times n_k,), that is, we line up the elements of all dimensions one after another. This gives a one-dimensional tensor x of size n_{in}=n_1\times n_2\times…\times n_k.
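For example, a sketch of this flattening step with an illustrative shape:

```python
import torch

# hypothetical shape (n_1, n_2, n_3) = (2, 3, 4)
X = torch.randn(2, 3, 4)
x = X.view(-1)                  # flatten: shape (2*3*4,) = (24,)
print(x.shape)                  # torch.Size([24])
```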
Next, we pass x to the linear layer to obtain the output tensor y, using the linear layer's formula:
y = Wx + b
Here, the shape of W is (m_{out}, n_{in}) and the shape of b is (m_{out},), where m_{out}=m_1\times m_2\times…\times m_l is the size of the output tensor. The product Wx is a one-dimensional tensor of shape (m_{out},); adding the bias b gives the output tensor y of shape (m_{out},).
Finally, we need to convert y back into a multi-dimensional tensor. Specifically, we can use the view function in PyTorch to change the shape of y to (m_1, m_2,…,m_l). This gives the final output tensor Y.
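Putting the three steps together, a minimal sketch with hypothetical shapes might look like this:

```python
import torch
import torch.nn as nn

# hypothetical shapes: input (2, 3, 4), target output (3, 2)
X = torch.randn(2, 3, 4)
n_in = 2 * 3 * 4                 # n_1 * n_2 * ... * n_k = 24
m_out = 3 * 2                    # m_1 * m_2 * ... * m_l = 6

linear = nn.Linear(n_in, m_out)  # stores W of shape (m_out, n_in) and b of shape (m_out,)

x = X.view(-1)                   # step 1: flatten to (n_in,)
y = linear(x)                    # step 2: y = Wx + b, shape (m_out,)
Y = y.view(3, 2)                 # step 3: restore the target shape (m_1, ..., m_l)
print(Y.shape)                   # torch.Size([3, 2])
```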
It should be noted that when flattening a multi-dimensional tensor into a one-dimensional tensor, we need to ensure that the order of the elements in the tensor remains unchanged. For example, suppose we have a two-dimensional tensor X of shape (2,3):
X=\begin{bmatrix}1&2&3\\4&5&6\end{bmatrix}
We need to flatten it into a one-dimensional tensor. If we use view(-1) to implement it, the result will be:
x=[1,2,3,4,5,6]
Here, the two rows are laid out one after the other, so view(-1) preserves the row-major order of the elements. Written as a row vector:

x=\begin{bmatrix}1&2&3&4&5&6\end{bmatrix}

To recover the original (2,3) shape we would call view(2,3); note that view(1,-1) instead yields a two-dimensional tensor of shape (1,6), not the original shape.
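A quick check of this behavior (a minimal sketch; the transpose caveat at the end is a general PyTorch property, added here for illustration):

```python
import torch

X = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

x = X.view(-1)
print(x)                        # tensor([1, 2, 3, 4, 5, 6]) -- row-major order preserved
print(x.view(2, 3))             # restores the original 2x3 matrix

# caveat: view requires a contiguous tensor; a transposed tensor is not
Xt = X.t()                      # shape (3, 2), non-contiguous
print(Xt.reshape(-1))           # tensor([1, 4, 2, 5, 3, 6]) -- follows the transposed layout
```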
This way, we can correctly pass the multi-dimensional tensor to the linear layer and obtain the correct output tensor.

It should be noted that the action of a linear layer on a multi-dimensional tensor can be seen as an independent linear transformation applied to each sample. For example, suppose we have a four-dimensional tensor X of shape (N, C, H, W), where N is the number of samples, C is the number of channels, and H and W are the height and width. We can flatten X into shape (N, C\times H\times W); the linear layer then performs an independent linear transformation on each sample, producing an output tensor Y of shape (N, m_{out}). Finally, we can restore Y to the shape (N, m_1, m_2,…,m_l), keeping the first (sample) dimension intact.

In short, a linear layer acts on a multi-dimensional tensor as an independent linear transformation per sample. In practice, we usually flatten the multi-dimensional tensor before passing it to the linear layer, making sure the flattening preserves the order of the elements, since otherwise the computation will be incorrect. Finally, we restore the output tensor to its intended shape for the next step of the computation.
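To make the batch case concrete, here is a minimal sketch with hypothetical sizes:

```python
import torch
import torch.nn as nn

# hypothetical batch of images: N samples, C channels, H x W pixels
N, C, H, W = 8, 3, 32, 32
m_out = 10

X = torch.randn(N, C, H, W)
linear = nn.Linear(C * H * W, m_out)

x = X.view(N, -1)               # flatten each sample independently: (N, C*H*W)
Y = linear(x)                   # one linear transformation per sample: (N, m_out)
print(Y.shape)                  # torch.Size([8, 10])
```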