Enriching word vectors with subword information P Bojanowski, E Grave, A Joulin, T Mikolov Transactions of the Association for Computational Linguistics 5, 135-146, 2017 | 5476 | 2017 |
Bag of tricks for efficient text classification A Joulin, E Grave, P Bojanowski, T Mikolov arXiv preprint arXiv:1607.01759, 2016 | 2744 | 2016 |
Learning word vectors for 157 languages E Grave, P Bojanowski, P Gupta, A Joulin, T Mikolov arXiv preprint arXiv:1802.06893, 2018 | 687 | 2018 |
Advances in pre-training distributed word representations T Mikolov, E Grave, P Bojanowski, C Puhrsch, A Joulin arXiv preprint arXiv:1712.09405, 2017 | 654 | 2017 |
FastText.zip: Compressing text classification models A Joulin, E Grave, P Bojanowski, M Douze, H Jégou, T Mikolov arXiv preprint arXiv:1612.03651, 2016 | 519 | 2016 |
Parseval networks: Improving robustness to adversarial examples M Cisse, P Bojanowski, E Grave, Y Dauphin, N Usunier International Conference on Machine Learning, 854-863, 2017 | 437 | 2017 |
Unsupervised cross-lingual representation learning at scale A Conneau, K Khandelwal, N Goyal, V Chaudhary, G Wenzek, F Guzmán, ... arXiv preprint arXiv:1911.02116, 2019 | 398 | 2019 |
Colorless green recurrent networks dream hierarchically K Gulordava, P Bojanowski, E Grave, T Linzen, M Baroni arXiv preprint arXiv:1803.11138, 2018 | 245 | 2018 |
Improving neural language models with a continuous cache E Grave, A Joulin, N Usunier arXiv preprint arXiv:1612.04426, 2016 | 201 | 2016 |
Trace lasso: a trace norm regularization for correlated designs E Grave, G Obozinski, F Bach arXiv preprint arXiv:1109.1990, 2011 | 194 | 2011 |
Efficient softmax approximation for GPUs E Grave, A Joulin, M Cissé, D Grangier, H Jégou International Conference on Machine Learning, 1302-1310, 2017 | 181* | 2017 |
Loss in translation: Learning bilingual word mapping with a retrieval criterion A Joulin, P Bojanowski, T Mikolov, H Jégou, E Grave arXiv preprint arXiv:1804.07745, 2018 | 137 | 2018 |
Weakly-supervised alignment of video with text P Bojanowski, R Lajugie, E Grave, F Bach, I Laptev, J Ponce, C Schmid Proceedings of the IEEE international conference on computer vision, 4462-4470, 2015 | 111 | 2015 |
Learning probabilistic phenotypes from heterogeneous EHR data R Pivovarov, AJ Perotte, E Grave, J Angiolillo, CH Wiggins, N Elhadad Journal of biomedical informatics 58, 156-165, 2015 | 100 | 2015 |
Reducing transformer depth on demand with structured dropout A Fan, E Grave, A Joulin arXiv preprint arXiv:1909.11556, 2019 | 92 | 2019 |
Unsupervised alignment of embeddings with Wasserstein Procrustes E Grave, A Joulin, Q Berthet The 22nd International Conference on Artificial Intelligence and Statistics …, 2019 | 80 | 2019 |
Adaptive attention span in transformers S Sukhbaatar, E Grave, P Bojanowski, A Joulin arXiv preprint arXiv:1905.07799, 2019 | 77 | 2019 |
End-to-end ASR: from supervised to semi-supervised learning with modern architectures G Synnaeve, Q Xu, J Kahn, T Likhomanenko, E Grave, V Pratap, A Sriram, ... arXiv preprint arXiv:1911.08460, 2019 | 55 | 2019 |
Can you tell me how to get past Sesame Street? Sentence-level pretraining beyond language modeling A Wang, J Hula, P Xia, R Pappagari, RT McCoy, R Patel, N Kim, I Tenney, ... arXiv preprint arXiv:1812.10860, 2018 | 47* | 2018 |
Variable computation in recurrent neural networks Y Jernite, E Grave, A Joulin, T Mikolov arXiv preprint arXiv:1611.06188, 2016 | 46 | 2016 |