A collection of popular papers on LLM inference and serving
Edge Inference, Edge Intelligence, Collaborative DNN Inference, Distributed Computing
Date | Title | Paper | Code | Recom |
---|---|---|---|---|
2024.01 | 🔥[DistServe] DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving | [docs] | [DistServe] | ⭐️⭐️ |
2023.11 | 🔥[Splitwise] Splitwise: Efficient Generative LLM Inference Using Phase Splitting | [docs] | | ⭐️⭐️ |
2024.01 | 🔥[TetriInfer] Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads | [docs] | | ⭐️⭐️ |
2024.06 | 🔥[MemServe] MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool | [docs] | | ⭐️⭐️ |
2024.06 | 🔥[Mooncake] Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving (@Moonshot AI) | [docs] | [Mooncake] | ⭐️⭐️ |
2024.02 | 🔥[ChunkAttention] ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition | [docs] | [ChunkAttention] | ⭐️⭐️ |
2023.08 | 🔥[SARATHI] SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills | [docs] | | ⭐️ |
2024.05 | 🔥[Galaxy] Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference | [docs] | | ⭐️ |
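Several entries above (SARATHI, ChunkAttention, TetriInfer) build on chunked prefill: a long prompt's prefill is split into fixed-size chunks, and the leftover batch budget in each chunk is filled with decode-phase tokens from other in-flight requests. The toy sketch below illustrates only that scheduling idea; it is not taken from any of the papers' code, and the function name, token budget, and numbers are hypothetical.

```python
# Toy chunked-prefill scheduler (hypothetical sketch, not from SARATHI's code).
# Splits one request's prefill into fixed-size chunks and piggybacks decode
# tokens from other requests into the slack of each chunk.

def schedule_batches(prefill_tokens, decode_requests, budget=8):
    """Return a list of (prefill_chunk_size, piggybacked_decode_tokens)."""
    batches = []
    remaining = prefill_tokens
    while remaining > 0:
        chunk = min(remaining, budget)   # prefill tokens in this batch
        remaining -= chunk
        slack = budget - chunk           # leftover capacity in the batch
        decodes = min(slack, decode_requests)  # one token per decode request
        batches.append((chunk, decodes))
    return batches

# A 20-token prefill with a per-batch budget of 8 and 5 waiting decode
# requests: the two full chunks carry no decodes; the final 4-token chunk
# leaves room to piggyback 4 decode tokens.
print(schedule_batches(20, 5))  # [(8, 0), (8, 0), (4, 4)]
```

In practice the budget is chosen so each batch saturates the GPU, which is why piggybacking decodes into partially filled prefill chunks improves throughput without stalling decode latency.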