usually prone to overfitting given they are often over-parameterized
- We can usually add regularization terms to the objective functions
- Early stopping
- Adding noise
- structural regularization, via adding dropout
dropout
a case of structural regularization
a technique of randomly drop each node with probability