Step Decay (Learning Rate)
The simplest learning rate schedule strategy: the LR is cut by a fixed factor at fixed intervals (every N epochs or steps).
Its abrupt, staircase-style drops made it the default schedule for years, though it has now largely been replaced by smoother schedules such as cosine annealing.
Explanation
A typical configuration multiplies the LR by 0.1 every 30 epochs, so lr_t = lr_0 * 0.1^floor(t / 30). It is simple to implement and easy to reason about, but the decay is less smooth than cosine annealing.
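A minimal sketch of that typical recipe, using PyTorch's StepLR as one common implementation (the tiny linear model and SGD settings are placeholders, not part of any published recipe):

```python
import torch

# Placeholder model and optimizer; any model/optimizer pair works the same way.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Multiply the LR by gamma=0.1 every 30 epochs: 0.1 -> 0.01 -> 0.001.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... one epoch of training would run here ...
    optimizer.step()   # step the optimizer before the scheduler (PyTorch >= 1.1)
    scheduler.step()   # LR drops after epochs 30 and 60
```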
Practical Relevance
Step decay was the standard schedule in computer vision for years (the original ResNet recipe uses it). It has largely been superseded by cosine annealing and one-cycle, but it remains a simple, robust baseline.
Common Pitfalls
Abrupt LR drops can cause sudden shifts in training dynamics, sometimes visible as loss spikes. Both the drop timing and the decay factor must be tuned by hand (see the sketch below), and a poorly placed drop wastes training budget compared with smooth schedules.
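One way to keep that hand-tuning explicit is to list the drop epochs directly; a sketch using PyTorch's MultiStepLR (the milestone epochs are illustrative, not recommended values):

```python
import torch

# Single placeholder parameter standing in for a real model.
optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)

# Hand-picked drop points instead of a fixed interval; these milestones
# would need tuning per task.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60, 80], gamma=0.1
)
```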
Origin & History
Step decay was the standard schedule in ImageNet training recipes (AlexNet 2012, VGG 2014, ResNet 2015). Cosine annealing (SGDR, 2017) and the one-cycle policy (2018) showed consistently better results and replaced it as the default choice.
Comparisons & Differences
Step Decay (Learning Rate) vs. Cosine Annealing
Step decay follows a staircase with abrupt jumps; cosine annealing decays smoothly and continuously. The gentler transition usually yields better final accuracy.
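For contrast, a sketch of the smooth counterpart using PyTorch's CosineAnnealingLR (the T_max and eta_min values are illustrative):

```python
import torch

# Single placeholder parameter standing in for a real model.
optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)

# Smooth decay over 90 epochs following
# lr_t = eta_min + 0.5 * (lr_0 - eta_min) * (1 + cos(pi * t / T_max)).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=90, eta_min=1e-5
)
```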
Step Decay (Learning Rate) vs. Exponential Decay
Step decay lowers the LR discretely at fixed points; exponential decay multiplies it by a constant factor every step or epoch, giving a continuous curve. Exponential decay is smoother, but picking the right decay constant is harder.
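The same contrast in code, using PyTorch's ExponentialLR (gamma=0.9 is an illustrative value):

```python
import torch

# Single placeholder parameter standing in for a real model.
optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)

# Continuous decay: lr_t = lr_0 * gamma**epoch. With gamma=0.9 the LR
# roughly halves every 7 epochs; this single constant is what makes
# exponential decay harder to tune than explicit step milestones.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
```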