Sigmoid Function
The Sigmoid function σ(x) = 1/(1+e^(-x)) maps any real value to the range (0, 1) – historically important as an activation function, today used primarily for binary classification.
Sigmoid maps values to the range (0, 1) – the classic activation function for binary classification, now replaced by ReLU in hidden layers due to vanishing gradients.
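A minimal sketch of the formula in Python (illustrative only; NumPy is assumed):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: sigma(x) = 1 / (1 + exp(-x)), output always in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1,
# and sigmoid(0) is exactly 0.5.
print(sigmoid(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
# -> approx. [0.000045, 0.269, 0.5, 0.731, 0.99995]
```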
Explanation
Sigmoid was the first widely used activation function in neural networks. Today it is mainly used as the output activation for binary classification, converting a raw score into a probability between 0 and 1. In hidden layers it has been replaced by ReLU.
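As a sketch of this typical usage, assuming hypothetical raw model scores (logits), sigmoid turns them into probabilities for a yes/no decision:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw model outputs (logits) for three examples,
# e.g. "will this user click the ad?"
logits = np.array([-2.0, 0.3, 4.1])

probabilities = sigmoid(logits)        # values in (0, 1)
predictions = (probabilities >= 0.5)   # threshold at 0.5 for a yes/no decision

print(probabilities)  # approx. [0.12, 0.57, 0.98]
print(predictions)    # [False  True  True]
```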
Marketing Relevance
Fundamental for understanding neural networks and logistic regression.
Origin & History
The logistic function was described by Pierre François Verhulst in 1838. In neural networks, sigmoid dominated as the standard activation function until around 2010. With the rise of ReLU (from roughly 2010 onward), it became clear that sigmoid causes vanishing gradients in deep networks. Today it is used only as an output layer for binary decisions.
Comparisons & Differences
Sigmoid Function vs. ReLU
Sigmoid saturates for large positive or negative inputs, so its gradient approaches zero (vanishing gradient); ReLU has no upper saturation and typically trains faster. A small sketch of this difference follows below.
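This sketch uses the known derivative σ'(x) = σ(x)·(1 − σ(x)); the inputs are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25, vanishes for large |x|

def relu_grad(x):
    return (x > 0).astype(float)  # exactly 1 for positive inputs, 0 otherwise

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid_grad(x))  # approx. [0.000045, 0.105, 0.25, 0.105, 0.000045]
print(relu_grad(x))     # [0. 0. 0. 1. 1.]
```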
Sigmoid Function vs. Tanh
Sigmoid: output in (0, 1); Tanh: output in (-1, +1), zero-centered and often better in hidden layers. Both suffer from vanishing gradients.
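For illustration, the two are closely related via the identity tanh(x) = 2·σ(2x) − 1, i.e. tanh is a rescaled, zero-centered sigmoid. A small sketch (NumPy assumed):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-3.0, 0.0, 3.0])
print(sigmoid(x))              # in (0, 1):  approx. [0.047, 0.5, 0.953]
print(np.tanh(x))              # in (-1, 1): approx. [-0.995, 0.0, 0.995]
print(2 * sigmoid(2 * x) - 1)  # matches tanh(x)
```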