Awesome Knowledge-Distillation
- Different forms of knowledge
  - Knowledge from logits
  - Knowledge from intermediate layers
  - Graph-based
  - Mutual Information
  - Self-KD
  - Structured Knowledge
  - Privileged Information
- KD + GAN
- KD + Meta-learning
- Data-free KD
- KD + AutoML
- KD + RL
- Multi-teacher KD
- Knowledge Amalgamation (KA) - zju-VIPA
- Cross-modal KD & DA
- Application of KD
  - for NLP
- Model Pruning or Quantization
- Beyond
Different forms of knowledge
Knowledge from logits
- Distilling the knowledge in a neural network. Hinton et al. arXiv:1503.02531
- Learning from Noisy Labels with Distillation. Li, Yuncheng et al. ICCV 2017
- Training Deep Neural Networks in Generations: A More Tolerant Teacher Educates Better Students. arXiv:1805.05551
- Knowledge distillation by on-the-fly native ensemble. Lan, Xu et al. NIPS 2018
- Learning Metrics from Teachers: Compact Networks for Image Embedding. Yu, Lu et al. CVPR 2019
- Relational Knowledge Distillation. Park, Wonpyo et al. CVPR 2019
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. Huang, Zehao and Wang, Naiyan. 2017
- On Knowledge Distillation from Complex Networks for Response Prediction. Arora, Siddhartha et al. NAACL 2019
- On the Efficacy of Knowledge Distillation. Cho, Jang Hyun and Hariharan, Bharath. ICCV 2019 (arXiv:1910.01348)
- [novel] Revisit Knowledge Distillation: a Teacher-free Framework. Yuan, Li et al. arXiv:1909.11723
- Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher. Mirzadeh et al. arXiv:1902.03393
- Ensemble Distribution Distillation. ICLR 2020
- Noisy Collaboration in Knowledge Distillation. ICLR 2020
- On Compressing U-net Using Knowledge Distillation. arXiv:1812.00249
- Distillation-Based Training for Multi-Exit Architectures. Phuong, Mary and Lampert, Christoph H. ICCV 2019
- Self-training with Noisy Student improves ImageNet classification. Xie, Qizhe et al. (Google) CVPR 2020
- Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework. arXiv:1910.12061
- Preparing Lessons: Improve Knowledge Distillation with Better Supervision. arXiv:1911.07471
- Adaptive Regularization of Labels. arXiv:1908.05474
- Positive-Unlabeled Compression on the Cloud. Xu, Yixing (HUAWEI) et al. NIPS 2019
- Snapshot Distillation: Teacher-Student Optimization in One Generation. Yang, Chenglin et al. CVPR 2019
- QUEST: Quantized embedding space for transferring knowledge. Jain, Himalaya et al. CVPR 2020 (pre)
- Conditional teacher-student learning. Meng, Z. et al. ICASSP 2019
- Subclass Distillation. Müller, Rafael et al. arXiv:2002.03936
- MarginDistillation: distillation for margin-based softmax. Svitov, David and Alyamkin, Sergey. arXiv:2003.02586
- An Embarrassingly Simple Approach for Knowledge Distillation. Gao, Mengya et al. MLR 2018
- Sequence-Level Knowledge Distillation. Kim, Yoon and Rush, Alexander M. arXiv:1606.07947
- Boosting Self-Supervised Learning via Knowledge Transfer. Noroozi, Mehdi et al. CVPR 2018
- Meta Pseudo Labels. Pham, Hieu et al. ICML 2020
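For readers new to the area, a minimal sketch of the classic logit-matching objective (Hinton et al., arXiv:1503.02531) is given below. PyTorch is assumed, and the hyperparameter names `temperature` and `alpha` are illustrative choices, not taken from any specific paper above.

```python
# Hedged sketch of the soft-label distillation loss (Hinton et al., 2015).
# Assumes PyTorch; `temperature` and `alpha` are illustrative hyperparameters.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.9):
    """Combine a soft-target KL term with the usual hard-label cross-entropy."""
    # Soften both distributions with the same temperature, then match them.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)  # T^2 keeps gradient magnitudes comparable
    # Standard cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

Many of the logit-based papers above can be read as variations on this objective, differing in the targets, temperature schedules, or teacher ensembles used.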
Knowledge from intermediate layers
- Fitnets: Hints for thin deep nets. Romero, Adriana et al. arXiv:1412.6550
- Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. Zagoruyko et al. ICLR 2017
- Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks. Zhang, Zhi et al. arXiv:1710.09505
- A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. Yim, Junho et al. CVPR 2017
- Paraphrasing complex network: Network compression via factor transfer. Kim, Jangho et al. NIPS 2018
- Knowledge transfer with jacobian matching. ICML 2018
- Self-supervised knowledge distillation using singular value decomposition. Lee, Seung Hyun et al. ECCV 2018
- Variational Information Distillation for Knowledge Transfer. Ahn, Sungsoo et al. CVPR 2019
- Knowledge Distillation via Instance Relationship Graph. Liu, Yufan et al. CVPR 2019
- Knowledge Distillation via Route Constrained Optimization. Jin, Xiao et al. ICCV 2019
- Similarity-Preserving Knowledge Distillation. Tung, Frederick and Mori, Greg. ICCV 2019
- MEAL: Multi-Model Ensemble via Adversarial Learning. Shen, Zhiqiang, He, Zhankui, and Xue, Xiangyang. AAAI 2019
- A Comprehensive Overhaul of Feature Distillation. Heo, Byeongho et al. ICCV 2019
- Feature-map-level Online Adversarial Knowledge Distillation. ICLR 2020
- Distilling Object Detectors with Fine-grained Feature Imitation. ICLR 2020
- Knowledge Squeezed Adversarial Network Compression. Changyong, Shu et al. AAAI 2020
- Stagewise Knowledge Distillation. Kulkarni, Akshay et al. arXiv:1911.06786
- Knowledge Distillation from Internal Representations. AAAI 2020
- Knowledge Flow: Improve Upon Your Teachers. ICLR 2019
- LIT: Learned Intermediate Representation Training for Model Compression. ICML 2019
- Learning Deep Representations with Probabilistic Knowledge Transfer. Passalis et al. ECCV 2018
- Improving the Adversarial Robustness of Transfer Learning via Noisy Feature Distillation. Chin, Ting-wu et al. arXiv:2002.02998
- Knapsack Pruning with Inner Distillation. Aflalo, Yonathan et al. arXiv:2002.08258
- Residual Knowledge Distillation. Gao, Mengya et al. arXiv:2002.09168
- Knowledge distillation via adaptive instance normalization. Yang, Jing et al. arXiv:2003.04289
- BERT-of-Theseus: Compressing BERT by progressive module replacing. Xu, Canwen et al. arXiv:2002.02925 [code]
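As a rough illustration of the feature-level "hint" losses that many of the papers above build on (e.g., Fitnets), the sketch below matches a student feature map to a teacher feature map through a learned 1x1-convolution regressor. The channel widths are assumptions for the example, not values from any listed paper.

```python
# Hedged sketch of a FitNets-style hint loss (Romero et al., arXiv:1412.6550).
# Channel widths are illustrative; real values depend on the networks used.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    def __init__(self, student_channels=64, teacher_channels=256):
        super().__init__()
        # 1x1 conv bridges the dimensionality gap between student and teacher features.
        self.regressor = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # student_feat: (B, Cs, H, W); teacher_feat: (B, Ct, H, W), same H and W assumed.
        return F.mse_loss(self.regressor(student_feat), teacher_feat)
```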
Graph-based
- Graph-based Knowledge Distillation by Multi-head Attention Network. Lee, Seunghyun and Song, Byung Cheol. arXiv:1907.02226
- Graph Representation Learning via Multi-task Knowledge Distillation. arXiv:1911.05700
- Deep geometric knowledge distillation with graphs. arXiv:1911.03080
- Better and faster: Knowledge transfer from multiple self-supervised learning tasks via graph distillation for video classification. IJCAI 2018
- Distilling Knowledge from Graph Convolutional Networks. Yang, Yiding et al. arXiv:2003.10477
Mutual Information
- Correlation Congruence for Knowledge Distillation. Peng, Baoyun et al. ICCV 2019
- Similarity-Preserving Knowledge Distillation. Tung, Frederick and Mori, Greg. ICCV 2019
- Variational Information Distillation for Knowledge Transfer. Ahn, Sungsoo et al. CVPR 2019
- Contrastive Representation Distillation. Tian, Yonglong et al. ICLR 2020
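Several of the entries above transfer relational statistics of a mini-batch rather than raw activations. A rough sketch under that reading, in the style of Similarity-Preserving Knowledge Distillation (Tung and Mori, ICCV 2019), with PyTorch assumed:

```python
# Hedged sketch of a similarity-preserving objective: match row-normalized
# batch similarity matrices computed from student and teacher features.
import torch
import torch.nn.functional as F

def similarity_preserving_loss(student_feat, teacher_feat):
    b = student_feat.size(0)
    fs = student_feat.reshape(b, -1)   # flatten all non-batch dimensions
    ft = teacher_feat.reshape(b, -1)
    gs = F.normalize(fs @ fs.t(), p=2, dim=1)  # B x B student similarities
    gt = F.normalize(ft @ ft.t(), p=2, dim=1)  # B x B teacher similarities
    # Mean squared difference, i.e. squared Frobenius norm scaled by 1/B^2.
    return F.mse_loss(gs, gt)
```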
Self-KD
- Moonshine: Distilling with Cheap Convolutions. Crowley, Elliot J. et al. NIPS 2018
- Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation. Zhang, Linfeng et al. ICCV 2019
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation. Hou, Yuenan et al. ICCV 2019
- BAM! Born-Again Multi-Task Networks for Natural Language Understanding. Clark, Kevin et al. ACL 2019 (short)
- Self-Knowledge Distillation in Natural Language Processing. Hahn, Sangchul and Choi, Heeyoul. arXiv:1908.01851
- Rethinking Data Augmentation: Self-Supervision and Self-Distillation. Lee, Hankook et al. ICLR 2020
- Regularizing Predictions via Class-wise Self-knowledge Distillation. ICLR 2020
- MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks. arXiv:1911.09418
- Self-Distillation Amplifies Regularization in Hilbert Space. Mobahi, Hossein et al. arXiv:2002.05715
- MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. Wang, Wenhui et al. arXiv:2002.10957
Structured Knowledge
- Paraphrasing Complex Network: Network Compression via Factor Transfer. Kim, Jangho et al. NIPS 2018
- Relational Knowledge Distillation. Park, Wonpyo et al. CVPR 2019
- Knowledge Distillation via Instance Relationship Graph. Liu, Yufan et al. CVPR 2019
- Contrastive Representation Distillation. Tian, Yonglong et al. arXiv:1910.10699
- Teaching To Teach By Structured Dark Knowledge. ICLR 2020
Privileged Information
- Learning using privileged information: similarity control and knowledge transfer. Vapnik, Vladimir and Izmailov, Rauf. JMLR 2015
- Unifying distillation and privileged information. Lopez-Paz, David et al. ICLR 2016
- Model compression via distillation and quantization. Polino, Antonio et al. ICLR 2018
- KDGAN: Knowledge Distillation with Generative Adversarial Networks. Wang, Xiaojie. NIPS 2018
- [novel] Efficient Video Classification Using Fewer Frames. Bhardwaj, Shweta et al. CVPR 2019
- Retaining privileged information for multi-task learning. Tang, Fengyi et al. KDD 2019
- A Generalized Meta-loss function for regression and classification using privileged information. Asif, Amina et al. arXiv:1811.06885
KD + GAN
- Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks. Xu, Zheng et al. arXiv:1709.00513
- KTAN: Knowledge Transfer Adversarial Network. Liu, Peiye et al. arXiv:1810.08126
- KDGAN: Knowledge Distillation with Generative Adversarial Networks. Wang, Xiaojie. NIPS 2018
- Adversarial Learning of Portable Student Networks. Wang, Yunhe et al. AAAI 2018
- Adversarial Network Compression. Belagiannis, Vasileios et al. ECCV 2018
- Cross-Modality Distillation: A case for Conditional Generative Adversarial Networks. ICASSP 2018
- Adversarial Distillation for Efficient Recommendation with External Knowledge. TOIS 2018
- Training student networks for acceleration with conditional adversarial networks. Xu, Zheng et al. BMVC 2018
- [novel] DAFL: Data-Free Learning of Student Networks. Chen, Hanting et al. ICCV 2019
- MEAL: Multi-Model Ensemble via Adversarial Learning. Shen, Zhiqiang, He, Zhankui, and Xue, Xiangyang. AAAI 2019
- Knowledge Distillation with Adversarial Samples Supporting Decision Boundary. Heo, Byeongho et al. AAAI 2019
- Exploiting the Ground-Truth: An Adversarial Imitation Based Knowledge Distillation Approach for Event Detection. Liu, Jian et al. AAAI 2019
- Adversarially Robust Distillation. Goldblum, Micah et al. AAAI 2020
- GAN-Knowledge Distillation for one-stage Object Detection. Hong, Wei et al. arXiv:1906.08467
- Lifelong GAN: Continual Learning for Conditional Image Generation. Kundu et al. arXiv:1908.03884
- Compressing GANs using Knowledge Distillation. Aguinaldo, Angeline et al. arXiv:1902.00159
- Feature-map-level Online Adversarial Knowledge Distillation. ICLR 2020
- MineGAN: effective knowledge transfer from GANs to target domains with few images. Wang, Yaxing et al. arXiv:1912.05270
- Distilling portable Generative Adversarial Networks for Image Translation. Chen, Hanting et al. AAAI 2020
- GAN Compression: Efficient Architectures for Interactive Conditional GANs. Zhu, Junyan et al. CVPR 2020 [code]
KD + Meta-learning
- Few Sample Knowledge Distillation for Efficient Network Compression. Li, Tianhong et al. ICLR 2020
- Learning What and Where to Transfer. Jang, Yunhun et al. ICML 2019
- Transferring Knowledge across Learning Processes. Moreno, Pablo G. et al. ICLR 2019
- Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval. Liu, Qing et al. ICCV 2019
- Diversity with Cooperation: Ensemble Methods for Few-Shot Classification. Dvornik, Nikita et al. ICCV 2019
- Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation. arXiv:1911.05329v1
- Progressive Knowledge Distillation For Generative Modeling. ICLR 2020
- Few Shot Network Compression via Cross Distillation. AAAI 2020
Data-free KD
- Data-Free Knowledge Distillation for Deep Neural Networks. NIPS 2017
- Zero-Shot Knowledge Distillation in Deep Networks. ICML 2019
- DAFL: Data-Free Learning of Student Networks. ICCV 2019
- Zero-shot Knowledge Transfer via Adversarial Belief Matching. Micaelli, Paul and Storkey, Amos. NIPS 2019
- Dream Distillation: A Data-Independent Model Compression Framework. Kartikeya et al. ICML 2019
- Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion. Yin, Hongxu et al. CVPR 2020
- Data-Free Adversarial Distillation. Fang, Gongfan et al. CVPR 2020
- The Knowledge Within: Methods for Data-Free Model Compression. Haroush, Matan et al. arXiv:1912.01274
- Knowledge Extraction with No Observable Data. Yoo, Jaemin et al. NIPS 2019 [code]
- Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN. CVPR 2020
other data-free model compression:
- Data-free Parameter Pruning for Deep Neural Networks. Srinivas, Suraj et al. arXiv:1507.06149
- Data-Free Quantization Through Weight Equalization and Bias Correction. Nagel, Markus et al. ICCV 2019
- ZeroQ: A Novel Zero Shot Quantization Framework. Cai, Yaohui et al. arXiv:2001.00281
KD + AutoML
- Improving Neural Architecture Search Image Classifiers via Ensemble Learning. Macko, Vladimir et al. 2019
- Blockwisely Supervised Neural Architecture Search with Knowledge Distillation. Li, Changlin et al. arXiv:1911.13053v1
- Towards Oracle Knowledge Distillation with Neural Architecture Search. Kang, Minsoo et al. AAAI 2020
- Search for Better Students to Learn Distilled Knowledge. Gu, Jindong and Tresp, Volker. arXiv:2001.11612
- Circumventing Outliers of AutoAugment with Knowledge Distillation. Wei, Longhui et al. arXiv:2003.11342
KD + RL
- N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning. Ashok, Anubhav et al. ICLR 2018
- Knowledge Flow: Improve Upon Your Teachers. Liu, Iou-jen et al. ICLR 2019
- Transferring Knowledge across Learning Processes. Moreno, Pablo G. et al. ICLR 2019
- Exploration by random network distillation. Burda, Yuri et al. ICLR 2019
- Periodic Intra-Ensemble Knowledge Distillation for Reinforcement Learning. Hong, Zhang-Wei et al. arXiv:2002.00149
- Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach. Xue, Zeyue et al. arXiv:2002.02202
Multi-teacher KD
- Learning from Multiple Teacher Networks. You, Shan et al. KDD 2017
- Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data. ICLR 2017
- Knowledge Adaptation: Teaching to Adapt. arXiv:1702.02052
- Deep Model Compression: Distilling Knowledge from Noisy Teachers. Sau, Bharat Bhusan et al. arXiv:1610.09650v2
- Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Tarvainen, Antti and Valpola, Harri. NIPS 2017
- Born-Again Neural Networks. Furlanello, Tommaso et al. ICML 2018
- Deep Mutual Learning. Zhang, Ying et al. CVPR 2018
- Knowledge distillation by on-the-fly native ensemble. Lan, Xu et al. NIPS 2018
- Collaborative learning for deep neural networks. Song, Guocong and Chai, Wei. NIPS 2018
- Data Distillation: Towards Omni-Supervised Learning. Radosavovic, Ilija et al. CVPR 2018
- Multilingual Neural Machine Translation with Knowledge Distillation. ICLR 2019
- Unifying Heterogeneous Classifiers with Distillation. Vongkulbhisal et al. CVPR 2019
- Distilled Person Re-Identification: Towards a More Scalable System. Wu, Ancong et al. CVPR 2019
- Diversity with Cooperation: Ensemble Methods for Few-Shot Classification. Dvornik, Nikita et al. ICCV 2019
- Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System. Yang, Ze et al. WSDM 2020
- FEED: Feature-level Ensemble for Knowledge Distillation. Park, SeongUk and Kwak, Nojun. arXiv:1909.10754 (AAAI 2020 pre)
- Stochasticity and Skip Connection Improve Knowledge Transfer. Lee, Kwangjin et al. ICLR 2020
- Online Knowledge Distillation with Diverse Peers. Chen, Defang et al. AAAI 2020
- Hydra: Preserving Ensemble Diversity for Model Distillation. Tran, Linh et al. arXiv:2001.04694
- Distilled Hierarchical Neural Ensembles with Adaptive Inference Cost. Ruiz, Adria et al. arXiv:2003.01474
Knowledge Amalgamation (KA) - zju-VIPA
VIPA - KA
- Amalgamating Knowledge towards Comprehensive Classification. Shen, Chengchao et al. AAAI 2019
- Amalgamating Filtered Knowledge: Learning Task-customized Student from Multi-task Teachers. Ye, Jingwen et al. IJCAI 2019
- Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning. Luo, Sihui et al. IJCAI 2019
- Student Becoming the Master: Knowledge Amalgamation for Joint Scene Parsing, Depth Estimation, and More. Ye, Jingwen et al. CVPR 2019
- Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation. ICCV 2019
- Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN. CVPR 2020
Cross-modal KD & DA
- SoundNet: Learning Sound Representations from Unlabeled Video. Aytar, Yusuf et al. ECCV 2016
- Cross Modal Distillation for Supervision Transfer. Gupta, Saurabh et al. CVPR 2016
- Emotion recognition in speech using cross-modal transfer in the wild. Albanie, Samuel et al. ACM MM 2018
- Through-Wall Human Pose Estimation Using Radio Signals. Zhao, Mingmin et al. CVPR 2018
- Compact Trilinear Interaction for Visual Question Answering. Do, Tuong et al. ICCV 2019
- Cross-Modal Knowledge Distillation for Action Recognition. Thoker, Fida Mohammad and Gall, Juergen. ICIP 2019
- Learning to Map Nearly Anything. Salem, Tawfiq et al. arXiv:1909.06928
- Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval. Liu, Qing et al. ICCV 2019
- UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation. Kundu et al. ICCV 2019
- CrDoCo: Pixel-level Domain Transfer with Cross-Domain Consistency. Chen, Yun-Chun et al. CVPR 2019
- XD: Cross-lingual Knowledge Distillation for Polyglot Sentence Embeddings. ICLR 2020
- Effective Domain Knowledge Transfer with Soft Fine-tuning. Zhao, Zhichen et al. arXiv:1909.02236
- ASR is all you need: cross-modal distillation for lip reading. Afouras et al. arXiv:1911.12747v1
- Knowledge distillation for semi-supervised domain adaptation. arXiv:1908.07355
- Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition. Meng, Zhong et al. arXiv:2001.01798
- Cluster Alignment with a Teacher for Unsupervised Domain Adaptation. ICCV 2019
- Attention Bridging Network for Knowledge Transfer. Li, Kunpeng et al. ICCV 2019
- Unpaired Multi-modal Segmentation via Knowledge Distillation. Dou, Qi et al. arXiv:2001.03111
- Multi-source Distilling Domain Adaptation. Zhao, Sicheng et al. arXiv:1911.11554
Application of KD
- Face model compression by distilling knowledge from neurons. Luo, Ping et al. AAAI 2016
- Learning efficient object detection models with knowledge distillation. Chen, Guobin et al. NIPS 2017
- Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy. Mishra, Asit et al. NIPS 2018
- Distilled Person Re-identification: Towards a More Scalable System. Wu, Ancong et al. CVPR 2019
- [novel] Efficient Video Classification Using Fewer Frames. Bhardwaj, Shweta et al. CVPR 2019
- Fast Human Pose Estimation. Zhang, Feng et al. CVPR 2019
- Distilling knowledge from a deep pose regressor network. Saputra et al. arXiv:1908.00858 (2019)
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation. Hou, Yuenan et al. ICCV 2019
- Structured Knowledge Distillation for Semantic Segmentation. Liu, Yifan et al. CVPR 2019
- Relation Distillation Networks for Video Object Detection. Deng, Jiajun et al. ICCV 2019
- Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection. Dong, Xuanyi and Yang, Yi. ICCV 2019
- Progressive Teacher-student Learning for Early Action Prediction. Wang, Xionghui et al. CVPR 2019
- Lightweight Image Super-Resolution with Information Multi-distillation Network. Hui, Zheng et al. ICCVW 2019
- AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation. Tavakolian, Mohammad et al. ICCV 2019
- Dynamic Kernel Distillation for Efficient Pose Estimation in Videos. Nie, Xuecheng et al. ICCV 2019
- Teacher Guided Architecture Search. Bashivan, Pouya and Tensen, Mark. ICCV 2019
- Online Model Distillation for Efficient Video Inference. Mullapudi et al. ICCV 2019
- Distilling Object Detectors with Fine-grained Feature Imitation. Wang, Tao et al. CVPR 2019
- Knowledge Distillation for Incremental Learning in Semantic Segmentation. arXiv:1911.03462
- MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization. arXiv:1910.12295
- Teacher-Students Knowledge Distillation for Siamese Trackers. arXiv:1907.10586
- LaTeS: Latent Space Distillation for Teacher-Student Driving Policy Learning. Zhao, Albert et al. CVPR 2020 (pre)
- Knowledge Distillation for Brain Tumor Segmentation. arXiv:2002.03688
- ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes. Chen, Yuhua et al. CVPR 2018
- Next Point-of-Interest Recommendation on Resource-Constrained Mobile Devices. WWW 2020
- Multi-Representation Knowledge Distillation For Audio Classification. Gao, Liang et al. arXiv:2002.09607
- Collaborative Distillation for Ultra-Resolution Universal Style Transfer. Wang, Huan et al. CVPR 2020
- ShadowTutor: Distributed Partial Distillation for Mobile Video DNN Inference. Chung, Jae-Won et al. arXiv:2003.10735
- Object Relational Graph with Teacher-Recommended Learning for Video Captioning. Zhang, Ziqi et al. CVPR 2020
for NLP
- Patient Knowledge Distillation for BERT Model Compression. Sun, Siqi et al. arXiv:1908.09355
- TinyBERT: Distilling BERT for Natural Language Understanding. Jiao, Xiaoqi et al. arXiv:1909.10351
- Learning to Specialize with Knowledge Distillation for Visual Question Answering. NIPS 2018
- Knowledge Distillation for Bilingual Dictionary Induction. EMNLP 2017
- A Teacher-Student Framework for Maintainable Dialog Manager. EMNLP 2018
- Understanding Knowledge Distillation in Non-Autoregressive Machine Translation. arXiv:1911.02727
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Sanh, Victor et al. arXiv:1910.01108
- Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. Turc, Iulia et al. arXiv:1908.08962
- On Knowledge distillation from complex networks for response prediction. Arora, Siddhartha et al. NAACL 2019
- Distilling the Knowledge of BERT for Text Generation. arXiv:1911.03829v1
- MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer. ICLR 2020
- Acquiring Knowledge from Pre-trained Model to Neural Machine Translation. Weng, Rongxiang et al. AAAI 2020
- TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval. Lu, Wenhao et al. KDD 2020
- Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation. Xu, Yige et al. arXiv:2002.10345
Model Pruning or Quantization
- Accelerating Convolutional Neural Networks with Dominant Convolutional Kernel and Knowledge Pre-regression. ECCV 2016
- N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning. Ashok, Anubhav et al. ICLR 2018
- Slimmable Neural Networks. Yu, Jiahui et al. ICLR 2018
- Co-Evolutionary Compression for Unpaired Image Translation. Shu, Han et al. ICCV 2019
- MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning. Liu, Zechun et al. ICCV 2019
- LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning. ICLR 2020
- Pruning with hints: an efficient framework for model acceleration. ICLR 2020
- Training convolutional neural networks with cheap convolutions and online distillation. arXiv:1909.13063
- Cooperative Pruning in Cross-Domain Deep Neural Network Compression. Chen, Shangyu et al. IJCAI 2019
- QKD: Quantization-aware Knowledge Distillation. Kim, Jangho et al. arXiv:1911.12491v1
Beyond
- Do deep nets really need to be deep? Ba, Jimmy and Caruana, Rich. NIPS 2014
- When Does Label Smoothing Help? Müller, Rafael, Kornblith and Hinton. NIPS 2019
- Towards Understanding Knowledge Distillation. Phuong, Mary and Lampert, Christoph. AAAI 2019
- Harnessing deep neural networks with logic rules. ACL 2016
- Adaptive Regularization of Labels. Ding, Qianggang et al. arXiv:1908.05474
- Knowledge Isomorphism between Neural Networks. Liang, Ruofan et al. arXiv:1908.01581
- Role-Wise Data Augmentation for Knowledge Distillation. ICLR 2020
- Neural Network Distiller: A Python Package For DNN Compression Research. arXiv:1910.12232
- (survey) Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation. arXiv:1912.13179
- Understanding and Improving Knowledge Distillation. Tang, Jiaxi et al. arXiv:2002.03532
- The State of Knowledge Distillation for Classification. Ruffy, Fabian and Chahal, Karanbir. arXiv:1912.10850 [code]
- TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing. HIT and iFLYTEK. arXiv:2002.12620
- Explaining Knowledge Distillation by Quantifying the Knowledge. Zhang, Quanshi et al. arXiv:2003.03622
- DeepVID: deep visual interpretation and diagnosis for image classifiers via knowledge distillation. IEEE Trans. 2019
Note: PDFs of all the papers listed above can be found via Bing or Google.
Source: https://github.com/FLHonker/Awesome-Knowledge-Distillation
Contact: Yuang Liu (frankliu624@outlook.com), AIDA, ECNU.