Basic Element in Neural Network Reading List

Network Structure

  • NIN: Network in network. (2014)
  • DSN: Deeply supervised nets. (2015)
  • Semi-supervised learning with ladder networks. (2015)
  • Deconstructing the ladder network architecture. (2015)
  • DFNs: Deeply-fused nets. (2016)
  • Highway Networks: Training very deep networks. (2015)
  • ResNet: Deep residual learning for image recognition. (2016)
  • Deep networks with stochastic depth. (2016)
    This paper shows that many ResNet layers contribute very little and can in fact be dropped at random during training. It therefore shortens ResNets by randomly dropping layers during training, which allows better information and gradient flow.
  • Bridging the gaps between residual learning, recurrent neural networks and visual cortex
  • Fractalnet: Ultra-deep neural networks without residuals. (2016)
    This paper repeatedly combines several parallel layer sequences with different numbers of convolutional blocks to obtain a large nominal depth while maintaining many short paths through the network.
  • Resnet in resnet: Generalizing residual architectures.
  • Wide residual networks.
  • Residual Networks Behave Like Ensembles of Relatively Shallow Networks. (2016)
  • Wider or Deeper: Revisiting the ResNet Model for Visual Recognition. (2016)
  • DenseNet: Densely Connected Convolutional Networks. (2017)
    This paper proposes a new network structure: each layer takes the feature maps of all preceding layers as input, and its own feature maps are passed as input to all subsequent layers.
  • DRNs: Dilated Residual Networks. (2017)
  • STN: Spatial Transformer Networks. (2015)
    This paper proposes a new learnable module, the Spatial Transformer, which gives neural networks the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. The spatial transformer mechanism is split into three parts. In order of computation: first, a localisation network takes the input feature map and, through a number of hidden layers, outputs the parameters of the spatial transformation that should be applied to the feature map; this gives a transformation conditional on the input. Then the grid generator uses the predicted transformation parameters to create a sampling grid, a set of points where the input map should be sampled to produce the transformed output. Finally, the sampler takes the feature map and the sampling grid as inputs and produces the output map, sampled from the input at the grid points.
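
The grid generator and sampler from the STN entry can be sketched in plain Python. This is only an illustration under stated assumptions: the localisation network is left out and the 2x3 affine matrix `theta` is supplied by hand, coordinates are normalised to [-1, 1] with the corner pixels at the extremes, the sampler uses bilinear interpolation with zero padding, and all function names are hypothetical rather than taken from the paper.

```python
import math


def affine_grid(theta, out_h, out_w):
    """Grid generator: map every output pixel to a source coordinate.

    theta is a 2x3 affine matrix acting on normalised coordinates in [-1, 1].
    Returns a list of rows, each a list of (x_src, y_src) pairs.
    """
    grid = []
    for i in range(out_h):
        # Normalised y of output row i (a single row maps to 0).
        y_t = 2.0 * i / (out_h - 1) - 1.0 if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_t = 2.0 * j / (out_w - 1) - 1.0 if out_w > 1 else 0.0
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


def bilinear_sample(feature_map, grid):
    """Sampler: read the input at each grid point with bilinear interpolation."""
    in_h, in_w = len(feature_map), len(feature_map[0])

    def pixel(r, c):
        # Zero padding outside the input (an assumption of this sketch).
        if 0 <= r < in_h and 0 <= c < in_w:
            return feature_map[r][c]
        return 0.0

    out = []
    for grid_row in grid:
        out_row = []
        for x_s, y_s in grid_row:
            # Convert normalised coordinates back to pixel indices.
            col = (x_s + 1.0) * (in_w - 1) / 2.0
            row = (y_s + 1.0) * (in_h - 1) / 2.0
            c0, r0 = math.floor(col), math.floor(row)
            dc, dr = col - c0, row - r0
            out_row.append(
                pixel(r0, c0) * (1 - dr) * (1 - dc)
                + pixel(r0, c0 + 1) * (1 - dr) * dc
                + pixel(r0 + 1, c0) * dr * (1 - dc)
                + pixel(r0 + 1, c0 + 1) * dr * dc
            )
        out.append(out_row)
    return out


def spatial_transform(feature_map, theta, out_h, out_w):
    """Grid generator followed by sampler; theta would come from the localisation net."""
    return bilinear_sample(feature_map, affine_grid(theta, out_h, out_w))
```

With the identity matrix `[[1, 0, 0], [0, 1, 0]]` the input is reproduced unchanged; negating the first entry mirrors it horizontally. Because the interpolation is piecewise linear in the grid coordinates, gradients can flow back to `theta`, which is what lets the module train without extra supervision.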

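The layer dropping described in the stochastic depth entry can be sketched as follows. This is a toy illustration, not the authors' code: activations are plain floats, each residual branch is a hypothetical Python callable, the ReLU after the addition is omitted, and the linearly decaying survival probabilities with the test-time rescaling of each branch follow the scheme as I understand it from the paper.

```python
import random


def survival_probabilities(num_blocks, p_last=0.5):
    """Linear decay: early blocks almost always survive, the last one with p_last."""
    return [1.0 - (l / num_blocks) * (1.0 - p_last)
            for l in range(1, num_blocks + 1)]


def stochastic_depth_forward(x, blocks, probs, training, rng=random):
    """Run a stack of residual blocks with stochastic depth.

    blocks: list of callables f, each computing a residual branch f(x).
    probs:  survival probability of each block.
    """
    for f, p in zip(blocks, probs):
        if training:
            if rng.random() < p:
                x = x + f(x)      # block survives: ordinary residual update
            # otherwise the block is dropped and only the identity path remains
        else:
            x = x + p * f(x)      # test time: keep every block, scaled by p
    return x
```

During training each mini-batch therefore sees a shorter network on average, while the full depth is used at test time.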

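The dense connectivity pattern from the DenseNet entry, reduced to a minimal sketch. Feature maps are stood in for by single numbers in a list, and `toy_layer` is a hypothetical layer with growth rate 1; a real dense block would use convolutions and concatenate along the channel axis.

```python
def dense_block(inputs, layers):
    """Apply layers so that each one sees all preceding feature maps.

    inputs: list of input "channels".
    layers: callables mapping the list of all channels so far to new channels.
    Returns the concatenation of the inputs and every layer's output.
    """
    features = list(inputs)
    for layer in layers:
        new_channels = layer(features)        # input: all preceding feature maps
        features = features + new_channels    # concatenate, do not add
    return features


def toy_layer(channels):
    """Hypothetical layer producing one new channel from everything before it."""
    return [sum(channels)]
```

For example, `dense_block([1.0, 2.0], [toy_layer] * 3)` grows from 2 channels to 5, one new channel per layer, and each new channel depends on every earlier one.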
  • Normalized convolution: Normalized and differential convolution. (1993)
    The idea of normalized convolution is to “focus” the convolution operator on the part of the input that truly describes the input signal, avoiding the interpolation of noise or missing information.
  • Dilated Conv: Multi-scale context aggregation by dilated convolutions.
  • Dilated Conv: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs.
  • Deconvolution: Adaptive deconvolutional networks for mid and high level feature learning. (2011)
  • Deconvolution: Visualizing and understanding convolutional networks. (2014)
  • Flattened convolutional: Flattened Convolutional Neural Networks for Feedforward Acceleration. (2014)
  • PixelDCL: Pixel Deconvolutional Networks. (2017)
    One of the key limitations of deconvolutional operations is that they result in the so-called checkerboard problem. This is caused by the fact that no direct relationship exists among adjacent pixels on the output feature map. To address this problem, the paper proposes the pixel deconvolutional layer (PixelDCL), which establishes direct relationships among adjacent pixels on the up-sampled feature map.
    PixelDCL is very similar to DenseNet.
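
The idea behind the normalized convolution entry can be illustrated in one dimension. This is a minimal sketch of the simplest (zeroth-order, "normalized averaging") form, assuming each sample comes with a certainty in [0, 1]; the helper names and the zero-padding choice are mine, not the paper's.

```python
def convolve_same(values, kernel):
    """1-D sliding-window sum with zero padding; output has the input's length.

    For the symmetric kernels used here, correlation and convolution coincide.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(values):
                acc += w * values[j]
        out.append(acc)
    return out


def normalized_convolution(signal, certainty, kernel, eps=1e-12):
    """Filter only the trusted samples, then renormalise by the trusted weight."""
    weighted = [s * c for s, c in zip(signal, certainty)]
    numerator = convolve_same(weighted, kernel)
    denominator = convolve_same(certainty, kernel)
    # Where no trusted sample falls under the kernel, return 0 (a sketch choice).
    return [n / d if d > eps else 0.0 for n, d in zip(numerator, denominator)]
```

With `signal = [2, 2, 99, 2, 2]`, `certainty = [1, 1, 0, 1, 1]` and a box kernel `[1, 1, 1]`, the corrupted middle sample is ignored and every output equals 2, whereas an ordinary moving average would smear the 99 into its neighbours.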

Loss Function

  • Contrastive Loss: Gated Siamese Convolutional Neural Network Architecture for Human Re-Identification. (2016)
  • Triplet Loss: FaceNet. (2015)
  • Quadruplet Loss: Beyond triplet loss: a deep quadruplet network for person re-identification. (2017)
  • Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification. (2017)
    This paper gives a hard-sample-mining version of Triplet Loss and Quadruplet Loss.
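
To make the relation between these losses concrete, here is a small sketch of a triplet loss and of a batch-level hard-mining variant in the spirit of the margin sample mining loss. The assumptions are mine: plain (not squared) Euclidean distance, embeddings as tuples of floats, a hinge with an additive margin, and mining that picks the single hardest positive pair and hardest negative pair in the batch, as I understand the paper.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Push the anchor-negative distance above the anchor-positive one by a margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def margin_sample_mining_loss(embeddings, labels, margin=0.2):
    """Hinge on the hardest positive pair versus the hardest negative pair in a batch."""
    hardest_pos = None   # largest distance between two samples of the same identity
    hardest_neg = None   # smallest distance between two samples of different identities
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d = euclidean(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                hardest_pos = d if hardest_pos is None else max(hardest_pos, d)
            else:
                hardest_neg = d if hardest_neg is None else min(hardest_neg, d)
    if hardest_pos is None or hardest_neg is None:
        return 0.0       # no valid pair of one kind in this batch
    return max(0.0, hardest_pos - hardest_neg + margin)
```

Unlike the triplet loss, the mined pairs need not share an anchor, which is also the extra freedom the quadruplet loss introduces.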


  • DropOut: Improving neural networks by preventing co-adaptation of feature detectors. (2012)
  • Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. (2015)