allows complex models to run in production by reducing their size and latency while retaining most of the performance of larger, more computationally expensive models. It has been used to improve Google Search and Smart Summary for Gmail, Chat, Docs, and other products.