Catherine Olsson
Anthropic
Verified email at mit.edu
Title
Cited by
Year
Estimating the reproducibility of psychological science
Open Science Collaboration
Science 349 (6251), aac4716, 2015
Cited by 9210 · 2015
Dota 2 with large scale deep reinforcement learning
C Berner, G Brockman, B Chan, V Cheung, P Dębiak, C Dennison, ...
arXiv preprint arXiv:1912.06680, 2019
Cited by 1684 · 2019
An open, large-scale, collaborative effort to estimate the reproducibility of psychological science
Open Science Collaboration
Perspectives on Psychological Science 7, 657-660, 2012
Cited by 727 · 2012
Training a helpful and harmless assistant with reinforcement learning from human feedback
Y Bai, A Jones, K Ndousse, A Askell, A Chen, N DasSarma, D Drain, ...
arXiv preprint arXiv:2204.05862, 2022
Cited by 676 · 2022
Constitutional AI: Harmlessness from AI feedback
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
arXiv preprint arXiv:2212.08073, 2022
Cited by 583 · 2022
TensorFuzz: Debugging neural networks with coverage-guided fuzzing
A Odena, C Olsson, D Andersen, I Goodfellow
International Conference on Machine Learning, 4901-4911, 2019
Cited by 341 · 2019
Language models (mostly) know what they know
S Kadavath, T Conerly, A Askell, T Henighan, D Drain, E Perez, ...
arXiv preprint arXiv:2207.05221, 2022
Cited by 221 · 2022
A general language assistant as a laboratory for alignment
A Askell, Y Bai, A Chen, D Drain, D Ganguli, T Henighan, A Jones, ...
arXiv preprint arXiv:2112.00861, 2021
Cited by 214 · 2021
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned
D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, ...
arXiv preprint arXiv:2209.07858, 2022
Cited by 212 · 2022
In-context learning and induction heads
C Olsson, N Elhage, N Nanda, N Joseph, N DasSarma, T Henighan, ...
arXiv preprint arXiv:2209.11895, 2022
Cited by 188 · 2022
Predictability and surprise in large generative models
D Ganguli, D Hernandez, L Lovitt, A Askell, Y Bai, A Chen, T Conerly, ...
Proceedings of the 2022 ACM Conference on Fairness, Accountability, and …, 2022
Cited by 171 · 2022
A mathematical framework for transformer circuits
N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, ...
Transformer Circuits Thread 1, 1, 2021
Cited by 148 · 2021
Discriminator rejection sampling
S Azadi, C Olsson, T Darrell, I Goodfellow, A Odena
arXiv preprint arXiv:1810.06758, 2018
Cited by 148 · 2018
Toy models of superposition
N Elhage, T Hume, C Olsson, N Schiefer, T Henighan, S Kravec, ...
arXiv preprint arXiv:2209.10652, 2022
Cited by 139 · 2022
Is generator conditioning causally related to GAN performance?
A Odena, J Buckman, C Olsson, T Brown, C Olah, C Raffel, I Goodfellow
International conference on machine learning, 3849-3858, 2018
Cited by 136 · 2018
Discovering language model behaviors with model-written evaluations
E Perez, S Ringer, K Lukošiūtė, K Nguyen, E Chen, S Heiner, C Pettit, ...
arXiv preprint arXiv:2212.09251, 2022
Cited by 124 · 2022
Dawn Drain
N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, ...
Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson …, 2021
Cited by 117 · 2021
Dawn Drain
C Olsson, N Elhage, NJ Neel Nanda, N DasSarma, T Henighan, B Mann, ...
Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy …, 2022
Cited by 107 · 2022
Dota 2 with large scale deep reinforcement learning
CB OpenAI, G Brockman, B Chan, V Cheung, P Debiak, C Dennison, ...
arXiv preprint arXiv:1912.06680 2, 2019
Cited by 104 · 2019
Unrestricted adversarial examples
TB Brown, N Carlini, C Zhang, C Olsson, P Christiano, I Goodfellow
arXiv preprint arXiv:1809.08352, 2018
Cited by 95 · 2018