Prakash Prabhu
I am a Software Engineer at Google working in the broad area of machine learning (ML) compilers for TPUs, with a focus on ML inference. I have worked on various pipeline parallelism methods to improve the performance of large transformer encoder models such as BERT-Large and multi-query attention autoregressive decoders across multiple TPUs, and more recently on sub-byte quantized models such as Gemini Nano on Pixel Edge TPUs.
My interests include ML inference optimizations, distributed systems, parallel computing, program analysis & compiler optimizations. I received a PhD in Computer Science from Princeton University.