Knowledge Transfer

- Model compression. (2006)
- Distilling the Knowledge in a Neural Network. (2014)
- FitNets: Hints for Thin Deep Nets. (2014)

This paper aims to address the network compression problem by taking advantage of depth. It proposes a novel approach, called FitNets, that trains thin and deep student networks to compress wide and shallower (but still deep) teacher networks. The method is rooted in the recently proposed Knowledge Distillation (KD) and extends the idea to allow for thinner and deeper student models. It introduces intermediate-level hints from the teacher's hidden layers to guide the training of the student, i.e., the student network (FitNet) is trained to learn an intermediate representation that is predictive of the intermediate representations of the teacher network. Hints thus enable the training of thinner and deeper networks.

- Accelerating convolutional neural networks with dominant convolutional kernel and knowledge pre-regression. (2016 ECCV)
- Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. (2017)
- A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. (2017)
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. (2017)

This paper proposes a novel knowledge transfer method by treating it as a distribution matching problem. In particular, it matches the distributions of neuron selectivity patterns between the teacher and student networks. To achieve this, it devises a new KT loss that minimizes the Maximum Mean Discrepancy (MMD) between these distributions, combined with the original task loss.

In other words, this paper adds an MMD loss term to the FitNets loss function.
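A minimal NumPy sketch of how these two transfer losses could be combined. This is an illustration, not the papers' implementations: the hint term is the FitNets-style L2 regression between student and teacher intermediate features, and the MMD term matches neuron activation patterns with a linear kernel for simplicity (the paper also uses polynomial and Gaussian kernels); `lam_hint` and `lam_mmd` are assumed weighting hyperparameters.

```python
import numpy as np

def l2_hint_loss(student_hint, teacher_hint):
    """FitNets hint loss: L2 distance between the (regressed) student
    feature map and the teacher's hidden representation."""
    return 0.5 * np.sum((student_hint - teacher_hint) ** 2)

def mmd2_loss(f_s, f_t):
    """Squared Maximum Mean Discrepancy between two sets of neuron
    activation patterns (rows = neurons, columns = flattened spatial
    positions), using a linear kernel for simplicity."""
    # Normalize each neuron's activation pattern to unit length,
    # as in neuron-selectivity matching.
    f_s = f_s / (np.linalg.norm(f_s, axis=1, keepdims=True) + 1e-8)
    f_t = f_t / (np.linalg.norm(f_t, axis=1, keepdims=True) + 1e-8)
    k_ss = f_s @ f_s.T   # student-student kernel values
    k_tt = f_t @ f_t.T   # teacher-teacher kernel values
    k_st = f_s @ f_t.T   # cross kernel values
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()

def transfer_loss(task_loss, student_hint, teacher_hint, f_s, f_t,
                  lam_hint=1.0, lam_mmd=1.0):
    """Total loss: original task loss + FitNets hint term + MMD term."""
    return (task_loss
            + lam_hint * l2_hint_loss(student_hint, teacher_hint)
            + lam_mmd * mmd2_loss(f_s, f_t))
```

With a linear kernel the squared MMD reduces to the distance between the mean activation patterns of the two networks, so it is always non-negative and is exactly zero when the student matches the teacher's distribution.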

Network Pruning

- Learning both Weights and Connections for Efficient Neural Networks. (2015)

This paper reduces the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections. It prunes redundant connections with a three-step method: first, train the network to learn which connections are important; next, prune the unimportant (low-magnitude) connections; finally, retrain the network to fine-tune the weights of the remaining connections.

- Channel pruning for accelerating very deep neural networks. (2017)
- Pruning filters for efficient ConvNets. (2017)
- ThiNet: A filter level pruning method for deep neural network compression. (2017)
- Pruning convolutional neural networks for resource efficient inference. (2017)
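The prune step of the three-step method above can be sketched as magnitude-based masking. This is a simplified illustration, not the papers' code: `sparsity` is an assumed hyperparameter for the fraction of connections to remove, and the returned mask is what the retraining step would use to keep pruned weights at zero while fine-tuning the survivors.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Remove the smallest-magnitude connections of one layer.

    weights  : trained weight matrix of the layer
    sparsity : fraction of connections to prune away
    Returns the pruned weights and the binary mask; during retraining,
    gradients are multiplied by the mask so pruned weights stay zero.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    # Threshold = the k-th smallest absolute value (ties may make the
    # achieved sparsity slightly approximate).
    threshold = np.partition(flat, k)[k] if k > 0 else 0.0
    mask = (np.abs(weights) >= threshold).astype(weights.dtype)
    return weights * mask, mask
```

Channel- and filter-level pruning (the 2017 papers above) follow the same train/prune/retrain loop but remove whole channels or filters instead of individual connections, which yields structured sparsity that standard dense kernels can exploit directly.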

Network Quantization

- Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. (2016)
- XNOR-Net: ImageNet classification using binary convolutional neural networks. (2016)
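A minimal sketch of XNOR-Net-style weight binarization: a real-valued filter W is approximated by alpha * B with B = sign(W), where the scaling factor alpha = mean(|W|) is the closed-form least-squares solution. This only illustrates the weight-quantization step; the actual networks also binarize activations and replace convolutions with XNOR + popcount operations.

```python
import numpy as np

def binarize_weights(w):
    """Approximate a real-valued filter W by alpha * B, where
    B = sign(W) in {-1, +1} and alpha = mean(|W|) minimizes
    ||W - alpha * B||^2."""
    alpha = np.mean(np.abs(w))
    b = np.where(w >= 0, 1.0, -1.0)  # sign, mapping 0 to +1
    return alpha, b
```

At inference time, a convolution with W is then replaced by a convolution with the binary B (computable with bitwise operations) followed by a multiplication with the scalar alpha.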

Meta Learning

- Siamese Neural Networks for One-shot Image Recognition. (2015)
- Matching Networks for One Shot Learning. (2016)
- Meta-Learning with Memory-Augmented Neural Networks. (2016)
- Meta-Learning with Temporal Convolutions. (2017)
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. (2017)
- Meta Networks. (2017)