LUT-GEMM: Quantized matrix multiplication based on LUTs for efficient inference in large-scale generative language models. G Park, B Park, M Kim, S Lee, J Kim, B Kwon, SJ Kwon, B Kim, Y Lee, et al. arXiv preprint arXiv:2206.09557, 2022. Cited by 116.
Structured compression by weight encryption for unstructured pruning and quantization. SJ Kwon, D Lee, B Kim, P Kapoor, B Park, GY Wei. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020. Cited by 52.
AlphaTuning: Quantization-aware parameter-efficient adaptation of large-scale pre-trained language models. SJ Kwon, J Kim, J Bae, KM Yoo, JH Kim, B Park, B Kim, JW Ha, N Sung, et al. arXiv preprint arXiv:2210.03858, 2022. Cited by 35.
BiQGEMM: Matrix multiplication with lookup table for binary-coding-based quantized DNNs. Y Jeon, B Park, SJ Kwon, B Kim, J Yun, D Lee. SC20: International Conference for High Performance Computing, Networking …, 2020. Cited by 35.
Extremely low bit transformer quantization for on-device neural machine translation. I Chung, B Kim, Y Choi, SJ Kwon, Y Jeon, B Park, S Kim, D Lee. arXiv preprint arXiv:2009.07453, 2020. Cited by 30.
Decompression apparatus and control method thereof. D Lee, K Sejung, B Kim, P Kapoor, P Baeseong. US Patent 10,917,121, 2021. Cited by 24.
FleXOR: Trainable fractional quantization. D Lee, SJ Kwon, B Kim, Y Jeon, B Park, J Yun. Advances in Neural Information Processing Systems 33, 1311-1321, 2020. Cited by 15.
TF-MVP: Novel sparsity-aware transformer accelerator with mixed-length vector pruning. E Yoo, G Park, JG Min, SJ Kwon, B Park, D Lee, Y Lee. 2023 60th ACM/IEEE Design Automation Conference (DAC), 1-6, 2023. Cited by 5.
Electronic apparatus and controlling method thereof. K Sejung, P Baeseong, D Lee. US Patent 12,147,892, 2024. Cited by 4.
HyperCLOVA X technical report. KM Yoo, J Han, S In, H Jeon, J Jeong, J Kang, H Kim, KM Kim, M Kim, et al. arXiv preprint arXiv:2404.01954, 2024. Cited by 4.
Encoding weights of irregular sparsity for fixed-to-fixed model compression. B Park, SJ Kwon, D Oh, B Kim, D Lee. arXiv preprint arXiv:2105.01869, 2021. Cited by 4.
Sparsity-aware memory interface architecture using stacked XORNet compression for accelerating pruned-DNN models. Y Byun, S Moon, B Park, SJ Kwon, D Lee, G Park, E Yoo, JG Min, Y Lee. Proceedings of Machine Learning and Systems 5, 2023. Cited by 3.
Post-training weighted quantization of neural networks for language models. SJ Kwon, D Lee, Y Jeon, B Kim, BS Park, Y Ro. 2021. Cited by 3.
Electronic apparatus for decompressing a compressed artificial intelligence model and control method therefor. P Baeseong, K Sejung. US Patent App. 17/519,285, 2022. Cited by 2.
Q-Rater: Non-convex optimization for post-training uniform quantization. B Kim, D Lee, Y Ro, Y Jeon, SJ Kwon, B Park, D Oh. arXiv preprint arXiv:2105.01868, 2021. Cited by 2.
DropBP: Accelerating fine-tuning of large language models by dropping backward propagation. S Woo, B Park, B Kim, M Jo, S Kwon, D Jeon, D Lee. arXiv preprint arXiv:2402.17812, 2024. Cited by 1.
Electronic apparatus and method for controlling thereof. D Lee, P Baeseong, B Kim, K Sejung, J Yongkweon. US Patent App. 17/171,582, 2021. Cited by 1.
Electronic device and control method therefor. B Kim, D Lee, K Sejung, RO Yeonju, P Baeseong, J Yongkweon. US Patent App. 18/131,164, 2023.
Decompression apparatus for decompressing a compressed artificial intelligence model and control method thereof. D Lee, K Sejung, B Kim, P Kapoor, P Baeseong. US Patent 11,595,062, 2023.
Modulating regularization frequency for efficient compression-aware model training. D Lee, SJ Kwon, B Kim, J Yun, B Park, Y Jeon. arXiv preprint arXiv:2105.01875, 2021.