A collection of popular papers on LLM inference and serving
Edge Inference, Edge Intelligence, Collaborative DNN Inference, Distributed Computing
Date | Title | Paper | Code | Recom |
---|---|---|---|---|
2024.01 | 🔥[DistServe] DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving | [docs] | [DistServe] | ⭐️⭐️ |
2023.11 | 🔥[Splitwise] Splitwise: Efficient Generative LLM Inference Using Phase Splitting | [docs] | | ⭐️⭐️ |
2024.01 | 🔥[TetriInfer] Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads | [docs] | | ⭐️⭐️ |
2024.06 | 🔥[MemServe] MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool | [docs] | | ⭐️⭐️ |
2024.06 | 🔥[Mooncake] Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving (@Moonshot AI) | [docs] | [Mooncake] | ⭐️⭐️ |
2024.02 | 🔥[ChunkAttention] ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition | [docs] | [ChunkAttention] | ⭐️⭐️ |
2023.08 | 🔥[SARATHI] SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills | [docs] | | ⭐️ |
2024.05 | 🔥[Galaxy] Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference | [docs] | | ⭐️ |
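Several entries above (SARATHI, ChunkAttention, TetriInfer) build on chunked prefill: a long prompt's prefill is split into fixed-size chunks, and the leftover batch budget in each chunk is filled with decode-phase tokens from other in-flight requests. The toy sketch below illustrates only that scheduling idea; it is not taken from any of the papers' code, and the function name, token budget, and numbers are hypothetical.

```python
# Toy chunked-prefill scheduler (hypothetical sketch, not from SARATHI's code).
# Splits one request's prefill into fixed-size chunks and piggybacks decode
# tokens from other requests into the slack of each chunk.

def schedule_batches(prefill_tokens, decode_requests, budget=8):
    """Return a list of (prefill_chunk_size, piggybacked_decode_tokens)."""
    batches = []
    remaining = prefill_tokens
    while remaining > 0:
        chunk = min(remaining, budget)   # prefill tokens in this batch
        remaining -= chunk
        slack = budget - chunk           # leftover capacity in the batch
        decodes = min(slack, decode_requests)  # one token per decode request
        batches.append((chunk, decodes))
    return batches

# A 20-token prefill with a per-batch budget of 8 and 5 waiting decode
# requests: the two full chunks carry no decodes; the final 4-token chunk
# leaves room to piggyback 4 decode tokens.
print(schedule_batches(20, 5))  # [(8, 0), (8, 0), (4, 4)]
```

In practice the budget is chosen so each batch saturates the GPU, which is why piggybacking decodes into partially filled prefill chunks improves throughput without stalling decode latency.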