vLLM
High-performance production inference engine; PagedAttention delivers 2-4× the throughput of FasterTransformer and Orca
LLM serving engine built for production (73K+ ⭐). PagedAttention manages the KV cache in paged, non-contiguous GPU memory blocks, cutting memory waste and achieving 2-4× the throughput of FasterTransformer and Orca. Supports continuous batching and tensor parallelism.
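A toy sketch of the continuous batching idea (this is an illustration of the scheduling concept, not vLLM's actual scheduler): at each decode iteration, finished sequences free their batch slot immediately and waiting requests join mid-flight, instead of the whole batch draining before new work starts.

```python
from collections import deque

def continuous_batching(requests, max_batch=2):
    """Simulate iteration-level scheduling.
    requests: list of (request_id, tokens_to_generate).
    Returns {request_id: decode step at which it finished}."""
    waiting = deque(requests)
    running = {}          # request_id -> tokens still to generate
    completion_step = {}  # request_id -> step at which it finished
    step = 0
    while waiting or running:
        # Admit waiting requests into any free batch slots.
        while waiting and len(running) < max_batch:
            rid, n = waiting.popleft()
            running[rid] = n
        step += 1
        # One decode iteration: every running request emits one token.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                completion_step[rid] = step
                del running[rid]  # slot freed for the next request
    return completion_step

# Short request "b" finishes at step 2, so "c" starts at step 3
# without waiting for the long request "a" to drain.
print(continuous_batching([("a", 5), ("b", 2), ("c", 3)]))
```

With static batching, "c" could not start until both "a" and "b" completed; here it reuses "b"'s slot as soon as it frees up, which is the source of the throughput gains the entry describes.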