# LLM_Edge_Inference_Separation_Architecture

A collection of popular literature on LLM edge inference and prefill/decode separation architectures.


## 📒Introduction

Keywords: Edge Inference, Edge Intelligence, Collaborative DNN Inference, Distributed Computing

## 📖Collection of popular literature

| Date | Title | Paper | Code | Recom |
|:---:|:---|:---:|:---:|:---:|
| 2024.01 | 🔥[DistServe] DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving | [docs] | [DistServe] | ⭐️⭐️ |
| 2023.11 | 🔥[Splitwise] Splitwise: Efficient Generative LLM Inference Using Phase Splitting | [docs] | | ⭐️⭐️ |
| 2024.01 | 🔥[TetriInfer] Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads | [docs] | | ⭐️⭐️ |
| 2024.06 | 🔥[MemServe] MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool | [docs] | | ⭐️⭐️ |
| 2024.06 | 🔥[Mooncake] Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving (@Moonshot AI) | [docs] | [Mooncake] | ⭐️⭐️ |
| 2024.02 | 🔥[ChunkAttention] ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition | [docs] | [ChunkAttention] | ⭐️⭐️ |
| 2023.08 | 🔥[SARATHI] SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills | [docs] | | ⭐️ |
| 2024.05 | 🔥[Galaxy] Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference | [docs] | | ⭐️ |
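Most entries above share one core idea: split LLM serving into a compute-bound prefill phase (process the whole prompt, build the KV cache) and a memory-bound decode phase (generate tokens one at a time), and run the two phases on separate workers. The toy sketch below illustrates only this handoff pattern with dummy arithmetic standing in for model forward passes; it is not the implementation of any paper listed here, and `prefill_worker`/`decode_worker` are hypothetical names.

```python
# Toy illustration of prefill/decode disaggregation (not any paper's code):
# a prefill worker processes the full prompt once and emits a KV cache;
# a decode worker then generates tokens autoregressively, reusing and
# extending that cache instead of recomputing the prompt.

def prefill_worker(prompt_tokens):
    """Process the whole prompt in one pass; return first token + KV cache."""
    # Stand-in for a real forward pass: the "KV cache" here is just the
    # list of tokens seen so far, and the "prediction" is dummy arithmetic.
    kv_cache = list(prompt_tokens)
    first_token = sum(prompt_tokens) % 100
    return first_token, kv_cache

def decode_worker(first_token, kv_cache, steps):
    """Autoregressive decoding: each step appends one token to the cache."""
    output = [first_token]
    kv_cache.append(first_token)
    for _ in range(steps - 1):
        nxt = sum(kv_cache) % 100   # dummy prediction from cached state
        output.append(nxt)
        kv_cache.append(nxt)        # cache grows by one entry per token
    return output

# Disaggregated serving: the two workers could sit on different machines;
# only the KV cache (not a prompt recomputation) crosses the boundary.
prompt = [3, 1, 4, 1, 5]
tok, cache = prefill_worker(prompt)
generated = decode_worker(tok, cache, steps=4)
print(generated)  # → [14, 28, 56, 12]
```

The interesting systems questions in the papers above are exactly about this boundary: how to transfer or pool the KV cache (MemServe, Mooncake), how to schedule the two phases (DistServe, Splitwise, TetriInfer), and how to mix them in one batch (SARATHI).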

About

Collection of popular literature

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published