Loss Landscape
The high-dimensional surface representing loss as a function of model parameters – the "mountain" that gradient descent descends.
The loss landscape shows loss as a function of all parameters – flat minima tend to generalize better, while sharp minima are more sensitive to small parameter perturbations. SGD tends to find flatter minima than Adam.
Explanation
Loss landscapes of modern networks contain many local minima, saddle points, and flat regions; in high dimensions, saddle points are far more common than poor local minima. Flatter minima often generalize better.
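A minimal sketch, in plain NumPy, of how such a landscape is probed in practice: pick a center point in parameter space and two random directions, then evaluate the loss on the grid they span. The toy regression data and all helper names here are illustrative assumptions, not a standard API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: y = X @ w_true + noise (illustrative data).
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=200)

def loss(w):
    """Mean squared error: the loss function evaluated at parameters w."""
    return np.mean((X @ w - y) ** 2)

# Center of the slice (here the exact least-squares solution) and two
# random directions; for deep networks, Li et al. additionally normalize
# such directions filter-wise.
w_center = np.linalg.lstsq(X, y, rcond=None)[0]
d1, d2 = rng.normal(size=10), rng.normal(size=10)

# The loss landscape restricted to this 2D slice: L(alpha, beta).
alphas = np.linspace(-1.0, 1.0, 51)
betas = np.linspace(-1.0, 1.0, 51)
surface = np.array([[loss(w_center + a * d1 + b * d2) for b in betas]
                    for a in alphas])

print(surface.shape)                   # (51, 51) grid of loss values
print(surface.min(), loss(w_center))   # slice minimum vs. loss at the center
```

For this convex toy problem the slice is a clean bowl; applying the same grid-evaluation procedure to a deep network reveals the ruggedness described above.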
Marketing Relevance
Understanding the loss landscape explains why certain optimizers, learning rates, and batch sizes work better than others – for example, very large batches often converge to sharper minima that generalize worse.
Common Pitfalls
Visualizations are 2D projections of spaces with millions of dimensions and can be misleading. Flatness does not always imply better generalization. Local minima are less problematic than often assumed; in high dimensions, saddle points are the more common obstacle.
Origin & History
Li et al. (2018) developed visualization methods for loss landscapes of deep networks ("Visualizing the Loss Landscape of Neural Nets"). The paper showed that skip connections smooth the landscape and facilitate training.
Comparisons & Differences
Loss Landscape vs. Loss Function
The loss function defines what is measured (e.g., cross-entropy); the loss landscape shows how that value varies across all possible parameter configurations.
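A small sketch of the distinction, assuming a one-parameter logistic model on hand-made data (all values here are illustrative): calling the loss function once yields a single number, while sweeping it over parameter settings traces out the landscape:

```python
import numpy as np

x = np.array([-2.0, -1.0, 1.0, 2.0])   # illustrative inputs
t = np.array([0.0, 0.0, 1.0, 1.0])     # binary targets

def cross_entropy(w):
    """Binary cross-entropy of a one-parameter logistic model."""
    p = 1.0 / (1.0 + np.exp(-w * x))   # sigmoid predictions
    return -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))

print(cross_entropy(1.0))              # loss function: one value at one point
# Landscape: the same function traced over many parameter settings.
landscape = [cross_entropy(w) for w in np.linspace(-3.0, 3.0, 61)]
print(min(landscape))                  # lowest point along this 1D slice
```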
Loss Landscape vs. Gradient Descent
The loss landscape is the map; gradient descent is the hiker searching for the path downhill.
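To make the map/hiker split concrete, here is a minimal gradient-descent loop on the same kind of quadratic landscape as above (learning rate, step count, and data are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10)            # noiseless targets for simplicity

w = np.zeros(10)                       # the hiker's starting point on the map
lr = 0.05                              # step size
for _ in range(200):
    grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
    w -= lr * grad                     # one step downhill
print(np.mean((X @ w - y) ** 2))       # loss after descending (near zero)
```

The landscape itself never changes during training; only the hiker's position w does.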