LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Publication
The 62nd Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Iman Mirzadeh
Machine Learning Research Engineer