Large Language Models (LLMs) demand vast computing resources and incur significant costs, even just for experimental runs. To optimize both performance and total cost of ownership (TCO), architecture and engineering teams must:
- Gain comprehensive visibility across every stage of the inference and training pipeline
- Identify computational and memory hot-spots
- Rapidly assess trade-offs between throughput, cost, accuracy, and more
This tool will enable interactive “what-if” exploration of deployment strategies—such as parallelism and concurrency—revealing where and how optimizations can deliver the greatest ROI.
- Evolving Model Architectures
Design a schema that can express transformer-style blocks, multi-head attention modules, MLP layers, embeddings, adapters, etc., and easily extend to accommodate future innovations.
- Parameter Metadata
Capture comprehensive model metadata, including tensor shapes, data types, density and sparsity profiles, and quantization scales to enable precise cost and complexity analysis.
- Abstract Hardware Profiles
Define a plug-and-play descriptor for compute devices that captures peak FLOPS, memory capacity, and interconnect topology (a minimal descriptor is sketched after this list).
- Hierarchical Resource Model
Represent multi-level memory and compute resources as a graph.
- Component-Level Data Flows
Auto-generate diagrams showing how inputs, activations, weights, and outputs move through the transformer stack and pipeline.
- Dependency Graphs
Highlight control-flow and data-dependency chains to analyze serialization points and parallelization barriers.
- Computational Complexity
Report FLOPs per layer and aggregate FLOPs for end-to-end execution (see the estimation sketch after this list).
- Memory Footprint
Measure peak and working-set usage across model state, activations, optimizer state (for training), and intermediate buffers.
- Latency Breakdown
Provide per-layer and per-stage latencies, including data transfers, compute, and I/O.
- Parameter Sweeps
Adjust batch size, sequence length, precision, and degrees of parallelism on the fly.
- Impact Visualization
Deliver real-time feedback on throughput, latency, resource utilization, and cost estimates.
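
To make the metadata and hardware-profile ideas above concrete, here is a minimal sketch of what such descriptors might look like. Every field name and value below is an illustrative assumption, not MESA's actual schema.

```js
// Illustrative only: a hypothetical metadata entry for one transformer block
// and a hypothetical hardware profile. Field names and values are assumptions.
const layerMetadata = {
  name: "decoder_layer_0",
  type: "transformer_block",
  modules: {
    self_attn: { heads: 32, headDim: 128, dtype: "bf16", quantization: null },
    mlp: { hiddenSize: 4096, intermediateSize: 11008, dtype: "bf16" },
  },
  tensors: [
    { name: "self_attn.q_proj.weight", shape: [4096, 4096], dtype: "bf16", sparsity: 0.0 },
    { name: "mlp.gate_proj.weight", shape: [11008, 4096], dtype: "bf16", sparsity: 0.0 },
  ],
};

const hardwareProfile = {
  device: "example-accelerator",            // placeholder, not a real product entry
  peakTflops: { bf16: 312, int8: 624 },     // peak throughput per device
  memoryGB: 80,                             // on-device memory capacity
  memoryBandwidthGBs: 2039,                 // high-bandwidth memory
  interconnect: { type: "nvlink", bandwidthGBs: 600, topology: "fully-connected" },
};
```

A schema in this style keeps per-tensor shape, dtype, sparsity, and quantization information alongside the module structure, which is what cost and memory analyses need.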
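As a rough illustration of the FLOPs and memory accounting involved, the sketch below uses common dense-transformer approximations: about 24·h² forward FLOPs per token per layer for the attention projections and a 4x MLP, roughly 4·s·h more for the attention scores, and 2·L·s·h·bytes of KV cache per sequence. The helper names and simplifications are assumptions for illustration, not the tool's exact formulas.

```js
// Rough, illustrative estimates for a dense decoder-only transformer.
// The approximations (and names) here are assumptions, not MESA's formulas.

// Forward FLOPs per token for one layer: ~24*h^2 for the QKV/output
// projections and a 4x MLP, plus ~4*s*h for the attention scores.
function layerForwardFlopsPerToken(hiddenSize, seqLen) {
  return 24 * hiddenSize ** 2 + 4 * seqLen * hiddenSize;
}

// KV-cache bytes for one sequence: 2 tensors (K and V) per layer.
function kvCacheBytes(numLayers, seqLen, hiddenSize, bytesPerElement = 2) {
  return 2 * numLayers * seqLen * hiddenSize * bytesPerElement;
}

// Example: a LLaMA-7B-like configuration (32 layers, hidden size 4096).
const h = 4096, L = 32, s = 2048;
const totalFlops = L * s * layerForwardFlopsPerToken(h, s);
console.log(`~${(totalFlops / 1e12).toFixed(1)} TFLOPs per forward pass`);
console.log(`~${(kvCacheBytes(L, s, h) / 1e9).toFixed(2)} GB of KV cache per sequence`);
```

Estimates of this kind are the sort of feedback the parameter sweeps and impact visualization are meant to surface as batch size, sequence length, and precision change.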
To get the tool up and running, follow these steps in order. We assume you have Git and Node.js (14+) installed; if not, install them first.
Clone the repository:

```bash
git clone https://github.com/chamidou2k/mesa.git
cd mesa
```
Navigate to the backend directory and install required packages via npm:

```bash
cd backend
npm install
```
The back-end API listens on port 9090 by default. Start the development server:

```bash
npm run dev
```

You can also specify the port explicitly using the PORT environment variable:

```bash
PORT=9090 npm run dev
```
Successful initialization will display:
```
> js_backend@1.0.0 dev
> node index.js
MESA Tool Backend running at localhost:9090
```
Navigate to the frontend directory and install required packages via npm:

```bash
cd ../frontend
npm install
```
Rename `.env.example` to `.env`, then update the backend API address in the `.env` file:

```
VITE_BACKEND_API_BASE=http://localhost:9090
```
Modify `vite.config.js` to customize the development server settings:

```js
// development server options (port and host)
server: {
  port: 9000,
  host: false,
},
```
Start the front-end development server:

```bash
npm run dev
```
Successful initialization will display:
```
VITE v5.4.10 ready in 270 ms

➜  Local:   http://localhost:9000/
➜  press h + enter to show help
```
- Defined a unified metadata schema for describing model architectures.
- Implemented an automated parameter-binding system to synchronize metadata descriptions with Hugging Face model parameter files (a rough sketch of the idea appears after this list).
- Implemented general hardware specification support.
- Completed operator testing for the LlamaForCausalLM architecture.
- Developed core UI components:
- Configuration panel
- Interactive graph canvas
- Runtime metrics display panel
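
As a rough illustration of the parameter-binding idea above, the sketch below matches schema tensor entries against the weight_map of a Hugging Face sharded-checkpoint index (model.safetensors.index.json). The function, its handling of the index file, and the field choices are assumptions for illustration, not MESA's actual implementation.

```js
// Illustrative sketch: bind schema tensor entries to a Hugging Face checkpoint
// by name. The weight_map field comes from the standard sharded-checkpoint
// index (model.safetensors.index.json); everything else here is an assumption.
import { readFileSync } from "node:fs";

function bindParameters(schemaTensors, indexPath) {
  const index = JSON.parse(readFileSync(indexPath, "utf8"));
  const shardByName = index.weight_map ?? {};       // tensor name -> shard file

  return schemaTensors.map((tensor) => ({
    ...tensor,
    shard: shardByName[tensor.name] ?? null,        // null if absent from checkpoint
    bound: tensor.name in shardByName,
  }));
}

// Hypothetical usage with the example metadata sketched earlier:
// const bound = bindParameters(layerMetadata.tensors, "model.safetensors.index.json");
// console.log(bound.filter((t) => !t.bound));      // tensors the checkpoint lacks
```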
- Add general Mixture-of-Experts (MoE) model compatibility.
- Integrate Deepseek v3 and LLaMA v4 model architectures.
- Refine interface workflows and interactions.
- Improve visual consistency and responsiveness.
- Expand coverage to the training pipeline.
The primary goal of this tool is to explore a general framework for evaluating the LLM pipeline, focusing on latency, memory footprint, and FLOPs analysis, while remaining extensible as model architectures continue to evolve. The current evaluation and calculation methods have known limitations, owing both to the limited availability of test data and to a still-maturing understanding of the models themselves. As the tool continues to develop, the accuracy and reliability of its results are expected to improve. For now, please use this tool for experimental purposes only and interpret its evaluation results with caution.
"The more I learn, the more I realize how much I don’t know." — This quote perfectly captures the sentiment around the rapidly evolving landscape of AI.