LeCun, Yann, et al. "Gradient-based learning applied to document
recognition." Proceedings of the IEEE, 1998. LeNet
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional
networks for large-scale image recognition." (2014). VGG-16
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton.
"ImageNet classification with deep convolutional neural networks."
NIPS 2012. AlexNet (simplified version shown)
He, Kaiming, et al. "Deep residual learning for image
recognition." CVPR 2016. ResNet
Szegedy, Christian, et al. "Inception-v4, Inception-ResNet and the
impact of residual connections on learning." (2016)
Canziani, Paszke, and Culurciello. "An Analysis of Deep Neural
Network Models for Practical Applications." (May 2016).
Classification and localization
Redmon, Joseph, et al. "You only look once: Unified, real-time
object detection." CVPR 2016
Liu, Wei, et al. "SSD: Single shot multibox detector." ECCV 2016
Girshick, Ross. "Fast R-CNN." ICCV 2015
Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object
detection with region proposal networks." NIPS 2015
Redmon, Joseph, and Ali Farhadi. "YOLO9000: Better, Faster,
Stronger." CVPR 2017
Long, Jonathan, et al. "Fully convolutional networks for
semantic segmentation." CVPR 2015
Noh, Hyeonwoo, et al. "Learning deconvolution network for
semantic segmentation." ICCV 2015
Pinheiro, Pedro O., et al. "Learning to segment object
candidates." NIPS 2015 / "Learning to refine object segments."
ECCV 2016
Li, Yi, et al. "Fully Convolutional Instance-aware Semantic
Segmentation." Winner of the COCO 2016 challenge.
Weak supervision
Joulin, Armand, et al. "Learning visual features from large
weakly supervised data." ECCV, 2016
Oquab, Maxime, et al. "Is object localization for free?
Weakly-supervised learning with convolutional neural networks."
CVPR 2015
Doersch, Carl, Abhinav Gupta, and Alexei A. Efros.
"Unsupervised visual representation learning by context
prediction." ICCV 2015.
Ren, Mengye, et al. "Normalizing the Normalizers: Comparing
and Extending Network Normalization Schemes." 2017
Salimans, Tim, and Diederik P. Kingma. "Weight normalization:
A simple reparameterization to accelerate training of deep neural
networks." NIPS 2016.
Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton.
"Layer normalization." 2016.
Ioffe, Sergey, and Christian Szegedy. "Batch normalization:
Accelerating deep network training by reducing internal covariate
shift." ICML 2015
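The schemes cited above normalize the same activations along different axes. A minimal NumPy sketch of that difference (function names are my own, and the learned scale/shift parameters are omitted):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch dimension (axis 0),
    # as in Ioffe & Szegedy (training-time statistics only).
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # Normalize each example over its feature dimension (axis 1),
    # as in Ba, Kiros & Hinton; independent of batch size.
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.randn(32, 8) * 3.0 + 2.0   # batch of 32 examples, 8 features
print(np.abs(batch_norm(x).mean(axis=0)).max())  # per-feature mean ~ 0
print(np.abs(layer_norm(x).mean(axis=1)).max())  # per-example mean ~ 0
```

Note that layer normalization computes no statistics across the batch, which is why it behaves identically at any batch size, including 1.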
Zhang, Chiyuan, et al. "Understanding deep learning requires
rethinking generalization." 2016
Keskar, Nitish Shirish, et al. "On Large-Batch Training for Deep
Learning: Generalization Gap and Sharp Minima." 2016
1. A strong optimizer is not necessarily a strong learner.
2. DL optimization is non-convex but bad local minima and
saddle structures are rarely a problem (on common DL tasks).
3. Neural networks are over-parametrized but can still generalize well.
4. Stochastic Gradient is a strong implicit regularizer.
5. Variance in the gradient can help with generalization but can
hurt final convergence.
6. We need more theory to guide the design of architectures
and optimizers that make learning faster with fewer labels.
7. Overparametrize deep architectures
8. Design architectures to limit conditioning issues:
(1) Use skip / residual connections
(2) Use internal normalization layers
(3) Use stochastic optimizers that are robust to bad conditioning
9. Use small minibatches (at least at the beginning of training).
10. Use a validation set to anneal the learning rate and do early stopping.
11. It is very often possible to trade more compute for less
overfitting with data augmentation and stochastic regularizers.
12. Collecting more labelled data is the best way to avoid overfitting.
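Points 9 and 10 above can be sketched as a toy training loop (an entirely hypothetical setup: a noisy quadratic objective stands in for minibatch gradients, and all names and thresholds are my own):

```python
import random

def train(steps=500, lr=0.5, patience=20, anneal=0.5, min_lr=1e-4, seed=0):
    # Toy objective: L(w) = (w - 3)^2, minimized at w = 3.
    rng = random.Random(seed)
    w = 0.0
    best_val, since_best = float("inf"), 0
    for _ in range(steps):
        # Gaussian noise stands in for the variance of a small minibatch.
        grad = 2 * (w - 3) + rng.gauss(0, 0.5)
        w -= lr * grad
        val = (w - 3) ** 2                 # stand-in for validation loss
        if val < best_val - 1e-6:
            best_val, since_best = val, 0
        else:
            since_best += 1
        if since_best >= patience:         # plateau: anneal the learning rate
            lr *= anneal
            since_best = 0
            if lr < min_lr:                # annealing exhausted: stop early
                break
    return w, lr

w, final_lr = train()
print(abs(w - 3))  # small: annealing shrinks the gradient-noise floor
```

The design choice mirrors the takeaways: early on, a large learning rate plus gradient noise explores broadly (points 4-5 and 9); once validation loss plateaus, annealing cuts the noise floor so the iterate settles, and training stops when further annealing can no longer help (point 10).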