Nitish Shirish Keskar

Cited by

	All	Since 2019
Citations	14608	13911
h-index	30	29
i10-index	43	43

Citations per year

Year	Citations
2017	139
2018	503
2019	1006
2020	1545
2021	1707
2022	2167
2023	2961
2024	4492

Public access

5 articles available
0 articles not available

Based on funding mandates

Co-authors

Richard Socher - you.com - Verified email at stanford.edu
Caiming Xiong - Salesforce Research - Verified email at salesforce.com
Bryan McCann - You.com - Verified email at you.com
Jorge Nocedal - Professor, Industrial Engineering, Northwestern University - Verified email at northwestern.edu
Dheevatsa Mudigere - Distinguished Engineer, NVIDIA - Verified email at nvidia.com
Mikhail Smelyanskiy - Facebook - Verified email at intel.com
Lav R. Varshney - University of Illinois Urbana-Champaign - Verified email at illinois.edu
Stephen Merity - Verified email at smerity.com
Nikhil Naik - MIT - Verified email at mit.edu
Akhilesh Deepak Gotmare - Salesforce Research - Verified email at salesforce.com
Ali Madani - Profluent Bio - Verified email at berkeley.edu
Nazneen Rajani - Hugging Face - Verified email at huggingface.co
Huan Wang - Salesforce Research - Verified email at yale.edu
Semih Yavuz - Salesforce Research - Verified email at salesforce.com
Albert S. Berahas - Assistant Professor, University of Michigan - Verified email at umich.edu
Raphael R Eguchi - Stanford University - Verified email at alumni.stanford.edu
Tong Niu - Salesforce Research - Verified email at salesforce.com
Karim Ahmed - Dartmouth College, Samsung Research America - Verified email at dartmouth.edu
Jasdeep Singh - Stanford University - Verified email at stanford.edu
Yingbo Zhou - Senior Research Director, Salesforce Research - Verified email at salesforce.com

Nitish Shirish Keskar

OpenAI

Verified email at openai.com

Deep Learning, Mathematical Optimization, Natural Language Processing


Title	Cited by	Year
On large-batch training for deep learning: Generalization gap and sharp minima - NS Keskar, D Mudigere, J Nocedal, M Smelyanskiy, PTP Tang - arXiv preprint arXiv:1609.04836, 2016	3544	2016
GPT-4 technical report - J Achiam, S Adler, S Agarwal, L Ahmad, I Akkaya, FL Aleman, D Almeida, ... - arXiv preprint arXiv:2303.08774, 2023	3378*	2023
Regularizing and optimizing LSTM language models - S Merity, NS Keskar, R Socher - arXiv preprint arXiv:1708.02182, 2017	1297	2017
CTRL: A conditional transformer language model for controllable generation - NS Keskar, B McCann, LR Varshney, C Xiong, R Socher - arXiv preprint arXiv:1909.05858, 2019	1167	2019
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models - A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... - arXiv preprint arXiv:2206.04615, 2022	879	2022
The natural language decathlon: Multitask learning as question answering - B McCann, NS Keskar, C Xiong, R Socher - arXiv preprint arXiv:1806.08730, 2018	671	2018
Improving generalization performance by switching from Adam to SGD - NS Keskar, R Socher - arXiv preprint arXiv:1712.07628, 2017	649	2017
Neural text summarization: A critical evaluation - W Kryściński, NS Keskar, B McCann, C Xiong, R Socher - arXiv preprint arXiv:1908.08960, 2019	395	2019
GeDi: Generative discriminator guided sequence generation - B Krause, AD Gotmare, B McCann, NS Keskar, S Joty, R Socher, ... - arXiv preprint arXiv:2009.06367, 2020	335	2020
A closer look at deep learning heuristics: Learning rate restarts, warmup and distillation - A Gotmare, NS Keskar, C Xiong, R Socher - arXiv preprint arXiv:1810.13243, 2018	302	2018
ProGen: Language modeling for protein generation - A Madani, B McCann, N Naik, NS Keskar, N Anand, RR Eguchi, ... - arXiv preprint arXiv:2004.03497, 2020	252	2020
An analysis of neural language modeling at multiple scales - S Merity, NS Keskar, R Socher - arXiv preprint arXiv:1803.08240, 2018	190	2018
Deep learning-enabled breast cancer hormonal receptor status determination from base-level H&E stains - N Naik, A Madani, A Esteva, NS Keskar, MF Press, D Ruderman, DB Agus, ... - Nature Communications 11 (1), 5727, 2020	189	2020
Weighted transformer network for machine translation - K Ahmed, NS Keskar, R Socher - arXiv preprint arXiv:1711.02132, 2017	161	2017
Balancing communication and computation in distributed optimization - AS Berahas, R Bollapragada, NS Keskar, E Wei - IEEE Transactions on Automatic Control 64 (8), 3141-3155, 2018	120	2018
Sequence-to-sequence prediction using a neural network model - NS Keskar, K Ahmed, R Socher - US Patent 11,928,600, 2024	112	2024
Multitask learning as question answering - NS Keskar, B McCann, C Xiong, R Socher - US Patent 11,501,076, 2022	90	2022
Multitask learning as question answering - B McCann, NS Keskar, C Xiong, R Socher - US Patent 10,776,581, 2020	84	2020
XLDA: Cross-lingual data augmentation for natural language inference and question answering - J Singh, B McCann, NS Keskar, C Xiong, R Socher - arXiv preprint arXiv:1905.11471, 2019	80	2019
Coarse-grain fine-grain coattention network for multi-evidence question answering - V Zhong, C Xiong, NS Keskar, R Socher - arXiv preprint arXiv:1901.00603, 2019	75	2019
