Keras Dense Layer Output Shape Conundrum
In Keras, the Dense layer's documentation has long stated that inputs with rank greater than 2 are flattened before the dot product with the kernel. However, the layer's actual behavior suggests otherwise.
Problem:
As the test code below illustrates, the Dense layer preserves every axis of the input tensor except the last, which is replaced by the number of units:
from keras import layers

input1 = layers.Input((2, 3))
output = layers.Dense(4)(input1)
print(output)
Output:
<tf.Tensor 'dense_2/add:0' shape=(?, 2, 4) dtype=float32>
Answer:
Contrary to the documentation, the Dense layer doesn't flatten the input. Instead, it applies its operation independently along the last axis. Thus, given an input of shape (n_dim1, n_dim2, ..., n_dimk), the output shape becomes (n_dim1, n_dim2, ..., n_dim(k-1), m), where m is the number of units in the Dense layer.
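A minimal sketch of this behavior (the model and variable names below are illustrative): the Dense output on a rank-3 input matches a plain matrix multiplication applied along the last axis.

import numpy as np
from keras import layers, models

# Dense(4) applied to a (2, 3)-shaped input (batch axis implicit).
inp = layers.Input((2, 3))
dense = layers.Dense(4)
model = models.Model(inp, dense(inp))

x = np.random.rand(1, 2, 3).astype("float32")
y = model.predict(x)

# Apply the same kernel and bias manually along the last axis.
kernel, bias = dense.get_weights()          # kernel: (3, 4), bias: (4,)
y_manual = x @ kernel + bias                # broadcasts over the leading axes

print(np.allclose(y, y_manual, atol=1e-5))  # True: Dense acts on the last axis only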
Implications:
This behavior makes TimeDistributed(Dense(...)) and Dense(...) functionally equivalent. Additionally, since the same kernel is reused at every position along the intermediate axes, a Dense layer with input shape (n_dim1, n_dim2, ..., n_dimk) has only m * n_dimk + m trainable parameters: n_dimk weights per unit plus one bias per unit, regardless of the other dimensions.
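As a quick check of both claims (model names here are illustrative), the two models below report the same output shape, (None, 2, 4), and the same parameter count, 3 * 4 + 4 = 16:

from keras import layers, models

inp = layers.Input((2, 3))
dense_model = models.Model(inp, layers.Dense(4)(inp))
td_model = models.Model(inp, layers.TimeDistributed(layers.Dense(4))(inp))

# Both summaries show output shape (None, 2, 4) and 16 trainable parameters.
dense_model.summary()
td_model.summary()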
Visual Illustration:
[Image of a neural network with a Dense layer applied to an input with multiple dimensions]
This illustration depicts how the Dense layer's operation is applied independently along the last axis of the input tensor.