- The activation function is a key part of neural network design.
- The modern default activation function for hidden layers is the ReLU function.
- The activation function for output layers depends on the type of prediction problem.
Activation Functions
An activation function in a neural network defines how the
weighted sum of the input is transformed into an output from a node or nodes in
a layer of the network.
Sometimes the activation function is called a “transfer
function”. Many activation functions are nonlinear.
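As a tiny illustration of that definition, the sketch below (plain Python, with made-up inputs, weights, and bias) shows a single node forming a weighted sum and then passing it through an activation function, here the sigmoid:

```python
import math

def sigmoid(z):
    # Squash the weighted sum into the range 0 to 1
    return 1.0 / (1.0 + math.exp(-z))

# Made-up inputs, weights, and bias for a single node
x = [0.5, -1.2, 3.0]
w = [0.4, 0.1, -0.2]
b = 0.05

# Weighted sum of the inputs, then the activation (transfer) function
z = sum(wi * xi for wi, xi in zip(w, x)) + b
print(z, sigmoid(z))
```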
The choice of activation function has a large impact on the capability and performance of the neural network, and different activation functions may be used in different parts of the model.
Technically, the activation function is used within or after
the internal processing of each node in the network, although networks are
designed to use the same activation function for all nodes in a layer.
A network may have three types of layers: input layers
that take raw input from the domain, hidden layers that take input from another
layer and pass output to another layer, and output layers that make a
prediction.
Typically, all hidden layers use the same activation function. The output layer will typically use a different activation function from the hidden layers.
There are many types of activation functions used in neural networks, although perhaps only a small number of functions are used in practice for hidden layers and output layers.
Activation for Hidden Layers
A hidden layer receives input from another layer (such as
another hidden layer or an input layer) and provides output to another layer
(such as another hidden layer or output layer).
A neural network may have zero or more hidden layers.
There are perhaps three activation functions you may want to consider for use in hidden layers; they are:
- Rectified Linear Activation (ReLU)
- Logistic (Sigmoid)
- Hyperbolic Tangent (Tanh)
This is not an exhaustive list of activation functions used
for hidden layers, but they are the most commonly used.
ReLU Hidden Layer Activation Function
The rectified linear activation function, or ReLU, returns the input directly if it is positive; otherwise, it returns zero. It is common because it is both simple to implement and effective at overcoming the limitations of previously popular activation functions, such as Sigmoid and Tanh.
The ReLU function is calculated as follows:
- max(0.0, x)
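As a minimal sketch (plain Python, arbitrary sample inputs), the ReLU calculation can be written as:

```python
def relu(x):
    # Return the input directly if it is positive, otherwise return 0.0
    return max(0.0, x)

# Arbitrary sample inputs spanning negative and positive values
for value in [-10.0, -0.5, 0.0, 0.5, 10.0]:
    print(value, relu(value))
```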
Sigmoid Hidden Layer Activation Function
The sigmoid activation function is also called the logistic function; it is the same function used in the logistic regression classification algorithm. It takes any real value as input and outputs values in the range 0 to 1.
The sigmoid activation function is calculated as follows:
- 1.0 / (1.0 + e^-x)
Plot of Inputs vs. Outputs for the Sigmoid Activation Function
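A minimal sketch of the sigmoid calculation in plain Python (the sample inputs are arbitrary):

```python
from math import exp

def sigmoid(x):
    # 1 / (1 + e^-x): maps any real value into the range 0 to 1
    return 1.0 / (1.0 + exp(-x))

for value in [-10.0, -0.5, 0.0, 0.5, 10.0]:
    print(value, sigmoid(value))
```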
Tanh Hidden Layer Activation Function
The hyperbolic tangent activation function is also referred to simply as Tanh. It is very similar to the sigmoid function and takes any real value as input, but outputs values in the range -1 to 1.
The Tanh function is calculated as follows:
- (e^x - e^-x) / (e^x + e^-x)
Plot of Inputs vs. Outputs for the Tanh Activation Function
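And a minimal sketch of the Tanh calculation in plain Python (again with arbitrary sample inputs):

```python
from math import exp

def tanh(x):
    # (e^x - e^-x) / (e^x + e^-x): maps any real value into the range -1 to 1
    return (exp(x) - exp(-x)) / (exp(x) + exp(-x))

for value in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    print(value, tanh(value))
```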
How to Choose a Hidden Layer Activation Function
From the mid-to-late 1990s through the 2010s, Tanh was the default activation function for hidden layers.
The activation function is chosen based on the type of network architecture.
Modern neural network models with common architectures, such as the MLP and CNN, use the ReLU activation function.
Recurrent networks still commonly use the Tanh or Sigmoid activation functions. For example, the LSTM commonly uses the Sigmoid activation for recurrent connections and the Tanh activation for output.
- Multilayer Perceptron (MLP): ReLU activation function.
- Convolutional Neural Network (CNN): ReLU activation function.
- Recurrent Neural Network (RNN): Tanh and/or Sigmoid activation function.
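As a rough sketch of the first rule in the list above, a small MLP with ReLU hidden layers might be defined as follows in Keras (TensorFlow/Keras is assumed to be installed; the layer sizes, input shape, and binary-classification output are made up for illustration):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical MLP for a binary classification problem:
# ReLU in the hidden layers, sigmoid in the output layer.
model = keras.Sequential([
    keras.Input(shape=(10,)),               # 10 input features (arbitrary)
    layers.Dense(32, activation="relu"),    # hidden layer 1: ReLU
    layers.Dense(16, activation="relu"),    # hidden layer 2: ReLU
    layers.Dense(1, activation="sigmoid"),  # output layer: sigmoid
])
model.summary()
```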
Activation for Output Layers
There are perhaps three activation functions you may want to consider for use in the output layer; they are:
- Linear
- Logistic (Sigmoid)
- Softmax
Linear Output Activation Function
The linear activation function does not change the weighted sum of the input in any way; it returns the value directly, which is why it is sometimes called the "identity" or "no activation" function.
Plot of Inputs vs. Outputs for the Linear Activation Function
Sigmoid Output Activation Function
The sigmoid (logistic) function described above for hidden layers can also be used in the output layer, where its output in the range 0 to 1 can be interpreted as a probability.
Softmax Output Activation Function
The softmax function takes a vector of real values as input and outputs a vector of the same length whose values sum to 1.0 and can be interpreted as probabilities of class membership.
The softmax function is calculated as follows:
- e^x / sum(e^x)
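A minimal sketch of the softmax calculation in plain Python (the raw scores for three classes are made up):

```python
from math import exp

def softmax(values):
    # Exponentiate each value, then normalize so the outputs sum to 1.0
    exps = [exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.0, 3.0, 2.0]   # made-up raw scores for three classes
probs = softmax(scores)
print(probs, sum(probs))   # probabilities that sum to 1.0
```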
How to Choose an Output Activation Function
- Regression: One node, linear activation.
- Binary Classification: One node, sigmoid activation.
- Multiclass Classification: One node per class, softmax activation.
- Multilabel Classification: One node per class, sigmoid activation.
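As a rough sketch of how these rules map onto an output layer in Keras (Keras and the class/label counts are assumptions for illustration, not something the list above requires):

```python
from tensorflow.keras import layers

# Regression: one node, linear activation
regression_output = layers.Dense(1, activation="linear")

# Binary classification: one node, sigmoid activation
binary_output = layers.Dense(1, activation="sigmoid")

# Multiclass classification (e.g. 5 classes): one node per class, softmax
multiclass_output = layers.Dense(5, activation="softmax")

# Multilabel classification (e.g. 5 labels): one node per label, sigmoid
multilabel_output = layers.Dense(5, activation="sigmoid")
```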
Reference
How to Choose an Activation Function for Deep Learning (machinelearningmastery.com)