vLLM TPU Inference
Core Contributor | Google
Enabling high-performance, unified JAX and PyTorch LLM inference on TPUs to serve massive models with optimized memory and throughput.
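Serving a model through vLLM's offline Python API looks roughly like the sketch below. The model name is a placeholder, and the TPU backend is assumed to be selected by the installed vLLM build rather than by anything in the script itself.

```python
from vllm import LLM, SamplingParams

# Placeholder model; on a TPU host, the installed vLLM TPU build
# supplies the backend -- nothing TPU-specific is needed here.
llm = LLM(model="facebook/opt-125m")

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(["TPUs are well suited to LLM inference because"], params)

for out in outputs:
    print(out.outputs[0].text)
```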
PhD | AI Performance Engineer @ Google
Specializing in high-performance computing and scaling LLM inference on TPUs.
My name is Pate Motter. I hold a PhD in Computer Science with a core focus on High-Performance Computing (HPC).
Currently, I work as an AI Performance Engineer at Google, where my primary focus is optimizing LLM inference performance at scale on Tensor Processing Units (TPUs). My passion lies at the intersection of systems engineering and machine learning, pushing hardware to its limits for massive AI models.
An open-source LLM written in pure JAX, explicitly tailored and optimized to run at scale on Google Cloud TPUs.
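Not the project's actual code, but a minimal sketch of what "pure JAX" means in practice: a single causal self-attention sublayer written directly against jax.numpy, with the mask and a crude normalization spelled out. Shapes and the normalization choice here are illustrative.

```python
import jax
import jax.numpy as jnp

def attention(q, k, v):
    # Scaled dot-product attention for a single head.
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    # Causal mask: each position attends only to itself and earlier ones.
    mask = jnp.tril(jnp.ones(scores.shape, dtype=bool))
    scores = jnp.where(mask, scores, -jnp.inf)
    return jax.nn.softmax(scores, axis=-1) @ v

@jax.jit
def decoder_block(x, wq, wk, wv, wo):
    # Pre-norm self-attention sublayer with a residual connection.
    h = x / jnp.linalg.norm(x, axis=-1, keepdims=True)  # crude RMS-style norm
    out = attention(h @ wq, h @ wk, h @ wv) @ wo
    return x + out

key = jax.random.PRNGKey(0)
seq, d = 16, 64
x = jax.random.normal(key, (seq, d))
ws = [jax.random.normal(jax.random.fold_in(key, i), (d, d)) for i in range(4)]
print(decoder_block(x, *ws).shape)  # (16, 64)
```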
A technical series explaining complex Artificial Intelligence concepts in an approachable way, bridging the gap between theoretical machine learning and engineering practice.
A comprehensive guide outlining core principles for scaling machine learning models effectively using JAX.
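One of those core principles, expressing parallelism through sharding annotations rather than explicit communication, can be sketched with JAX's sharding API. The mesh layout, axis names, and array shapes below are illustrative, not taken from the guide.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# A 2D device mesh: one axis for data parallelism, one for model
# parallelism. reshape(1, -1) works for any device count, including CPU.
devices = np.array(jax.devices()).reshape(1, -1)
mesh = Mesh(devices, axis_names=("data", "model"))

# Shard activations along the data axis, weights along the model axis.
x = jax.device_put(jnp.ones((8, 512)), NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((512, 2048)), NamedSharding(mesh, P(None, "model")))

# jit propagates shardings; the matmul output ends up ("data", "model").
y = jax.jit(lambda a, b: a @ b)(x, w)
print(y.sharding)
```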
Documentation for Pallas, a kernel language embedded in JAX for writing explicitly-scheduled, high-performance custom operations on TPUs and GPUs.
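A minimal Pallas kernel, assuming a TPU runtime (pass interpret=True to pl.pallas_call to debug elsewhere): the kernel body reads and writes memory references explicitly, which is where the fine-grained control comes from.

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Kernel bodies read and write Refs explicitly; there is no implicit
    # scheduling -- you decide what each invocation touches.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
        # interpret=True would run the kernel in pure JAX for debugging.
    )(x, y)

# (8, 128) matches the native TPU tile shape for float32.
x = jnp.ones((8, 128), jnp.float32)
print(add(x, x)[0, 0])  # 2.0
```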