Commit 48c8210

Merge pull request #907 from EaminC/main
Yiming authored
2 parents 17cc83d + 361f0c0 commit 48c8210

File tree

6 files changed

+163
-0
lines changed

content/authors/YimingCheng/_index.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
---
2+
# Display name
3+
title: Yiming Cheng
4+
5+
# Username (this should match the folder name)
6+
authors:
7+
- YimingCheng
8+
9+
# Is this the primary user of the site?
10+
superuser: false
11+
12+
# Role/position
13+
role: "Predoc Researcher, Department of Computer Science, University of Chicago"
14+
15+
# Organizations/Affiliations
16+
organizations:
17+
- name: University of Chicago
18+
  url: "https://computerscience.uchicago.edu/"
19+
20+
# Short bio (displayed in user profile at end of posts)
21+
bio: Yiming Cheng is a Pre-doc researcher in the Department of Computer Science at the University of Chicago, pursuing an M.S. in Computer Science with a focus on Machine Learning Systems (MLSys). He graduated from Tsinghua University with a B.E. in Electronic Engineering. His research focuses on distributed LLM deployment, distributed KV cache, and efficient machine learning systems. He is currently working on the LMCache project and contributing to open-source initiatives in the machine learning systems space.
22+
23+
# Social/Academic Networking
24+
# For available icons, see: https://sourcethemes.com/academic/docs/widgets/#icons
25+
# For an email link, use "fas" icon pack, "envelope" icon, and a link in the
26+
# form "mailto:your-email@example.com" or "#contact" for contact widget.
27+
social:
28+
- icon: envelope
29+
  icon_pack: fas
30+
  link: mailto:eaminc0328@gmail.com
31+
- icon: globe
32+
  icon_pack: fas
33+
  link: https://eaminc.github.io/
34+
35+
# Enter email to display Gravatar (if Gravatar enabled in Config)
36+
email: ""
37+
38+
# Organizational groups that you belong to (for People widget)
39+
# Set this to `[]` or comment out if you are not using People widget.
40+
user_groups:
41+
- Summer of Reproducibility Mentors
42+
- 2025 Contributors
43+
---
44+
45+
Yiming Cheng is a Pre-doc researcher in the Department of Computer Science at the University of Chicago, where he is pursuing an M.S. in Computer Science with a specialization in Machine Learning Systems (MLSys track). He graduated from Tsinghua University in 2024 with a Bachelor of Engineering in Electronic Engineering, along with minors in Statistics and Law.
46+
47+
Currently, Yiming is working as an Open Source Contributor and Research Assistant with the LMCache team under the supervision of Prof. Junchen Jiang. His work focuses on LMCache, the first open-source Knowledge Delivery Network (KDN) that accelerates LLM applications up to 8x faster at 8x lower cost. He also contributes to vLLM/production-stack, helping scale from single vLLM instances to distributed vLLM deployments. He has contributed over 1,262 lines of code to these open-source projects.
48+
49+
His research interests span both systems for machine learning (distributed LLM deployment, distributed KV cache, efficient ML) and machine learning for systems (ML for code generation and Operating Systems). During his undergraduate studies, he worked extensively on data mining projects including recommendation systems, emotion awareness, and embodied city simulations.
50+
51+
Yiming has been recognized with several prestigious awards, including the Merit-based Predoc Scholarship of $40,000 from the University of Chicago and funding from the United States National Science Foundation for his Summer of Reproducibility (SoR) project. He has authored multiple publications in venues such as MDPI Sensors and has patents in semantic encoding and decoding frameworks.
52+
53+
Through his research at institutions including Argonne National Laboratory, Tsinghua University's Future Intelligent Lab, and the University of Houston, Yiming contributes to advancing distributed computing technologies and machine learning systems. He works in Python (PyTorch, CuPy) and Go (Docker, Kubernetes), among other technologies, and brings that experience to the open-source machine learning community.
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
1+
---
2+
title: "Smart Environments – An AI System for Reproducible Custom Computing Environments"
3+
authors: [marshalp, YimingCheng]
4+
author_notes: ["University of Chicago", "University of Chicago"]
5+
tags: ["osre25", "reproducibility", "machine learning", "OS"]
6+
date: 2025-02-18
7+
lastmod: 2025-02-18
8+
---
9+
10+
## Overview
11+
12+
Complex environment setup, and the expertise needed to configure specialized software stacks, often hinders efforts to reproduce important scientific results in HPC and systems research. Artifact descriptions are frequently incomplete or ambiguous, treating specialized domain expertise as if it were "common knowledge." As a result, reviewers trying to reproduce experiments may spend more time debugging environment inconsistencies than evaluating the research itself. These challenges are compounded when experiments must run on different hardware configurations.
13+
14+
This project seeks to address these fundamental reproducibility barriers by using AI to translate the natural language environment requirements often found in papers or artifact descriptions into actionable, reproducible configurations. Doing so bridges the knowledge gap between experiment authors and reviewers while standardizing environment creation across different hardware platforms. We will develop an AI-driven system that automatically generates and configures reproducible computing environments based on artifact descriptions from conferences, Trovi artifacts on the [Chameleon](https://chameleoncloud.org) testbed, and other reliable sources of scientific experiment code and associated documentation. Leveraging Natural Language Processing (NLP), the system will allow researchers to describe desired environments in plain English, then map those descriptions onto predefined configuration templates. By simplifying environment creation and ensuring reproducibility, the system aims to eliminate duplicate setup efforts, accelerate research workflows, and promote consistent experimentation practices across diverse hardware.
15+
16+
## Key Outcomes
17+
18+
- Working Prototype: A system that automatically generates machine images deployable on bare metal and VM instances, based on user-provided requirements.
19+
- Comprehensive Documentation: Detailed user manuals, guides, and best practices tailored to researchers, ensuring a smooth adoption process.
20+
- Live Demo: A demonstration environment (e.g., a web app or Jupyter notebook) that shows how to request, configure, and launch reproducible cloud environments on both bare metal and VM instances.
21+
- Long-Term Impact: Building blocks for future AI-driven automation of cloud infrastructure, reducing human error and enabling fast, repeatable research pipelines.
22+
23+
**Topics**: Reproducibility, AI & NLP, Cloud Computing, DevOps and Automation
24+
25+
**Skills**:
26+
27+
- Machine Learning / AI: Familiarity with NLP methods to interpret user requirements.
28+
- Python: Primary language for backend services and cloud interactions.
29+
- Cloud API Integration: Experience with OpenStack or similar APIs to provision and configure images on both bare metal and virtual machines.
30+
- DevOps: Automated environment configuration, CI/CD workflows, and containerization.
31+
32+
**Difficulty**: Hard
33+
34+
**Size**: Large
35+
36+
**Mentors**: {{% mention marshalp %}}
37+
38+
**Tasks**:
39+
40+
- Requirement Gathering & NLP Design
41+
  - Research the specific needs of researchers building experimental setups.
42+
  - Design an NLP pipeline to parse plain-English descriptions (e.g., “I need Python 3.9, CUDA 11, and scikit-learn”) into environment “recipes.”
43+
- Backend Environment Builder
44+
  - Implement logic that converts parsed user requirements into machine-image definitions for bare metal and VM instances.
45+
  - Integrate with Chameleon’s APIs to provision servers, install software, and run configuration validation automatically.
46+
- Front-End & User Experience
47+
  - Develop an intuitive web or CLI interface that researchers can use to capture experiment environment requirements.
48+
  - Provide real-time status updates during environment setup, along with meaningful error messages and quick-start templates.
49+
- Testing & Validation
50+
  - Conduct end-to-end tests using diverse software stacks (e.g., HPC libraries, machine learning frameworks) on bare metal and VM instances.
51+
  - Ensure reproducibility by re-creating the same environment multiple times and comparing configurations.
52+
- Documentation & Demonstration
53+
  - Produce user-facing documentation, including tutorials and best practices for researchers who frequently run experiments on Chameleon Cloud.
54+
  - Create a short live demo or screencast showcasing how to configure an environment for a specific research workflow.
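
The Testing & Validation task above mentions re-creating an environment several times and comparing configurations. One way this comparison might look is sketched below. It assumes a recipe is a flat mapping from package name to version string, which is an assumption made for illustration and not a format defined by the project. The helper names `canonical_recipe`, `fingerprint`, and `diff_recipes` are hypothetical.

```python
import hashlib
import json

# Sentinel used when a package is absent from one side (hypothetical convention).
MISSING = "<absent>"


def canonical_recipe(recipe):
    """Serialize a recipe with sorted keys so key order does not affect the result."""
    return json.dumps(recipe, sort_keys=True, separators=(",", ":"))


def fingerprint(recipe):
    """Return a SHA-256 hex digest of the canonical form of a recipe."""
    return hashlib.sha256(canonical_recipe(recipe).encode("utf-8")).hexdigest()


def diff_recipes(expected, actual):
    """Return {package: (expected_version, actual_version)} for every mismatch."""
    mismatches = {}
    for name in sorted(set(expected) | set(actual)):
        want = expected.get(name, MISSING)
        got = actual.get(name, MISSING)
        if want != got:
            mismatches[name] = (want, got)
    return mismatches
```

Two builds whose recipes share a fingerprint were requested with identical package versions. When the fingerprints differ, `diff_recipes` reports which packages drifted. A real check would also need to inspect what was actually installed on the image, which this sketch does not attempt.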
Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
---
2+
title: "EnvGym – An AI System for Reproducible Custom Computing Environments"
3+
subtitle: ""
4+
summary:
5+
authors:
6+
- YimingCheng
7+
tags: ["osre25", "reproducibility", "machine learning", "OS"]
8+
categories: ["osre25", "reproducibility", "EnvGym"]
9+
date: 2025-06-16
10+
lastmod: 2025-06-16
11+
featured: false
12+
draft: false
13+
14+
# Featured image
15+
# To use, add an image named `featured.jpg/png` to your page's folder.
16+
# Focal points: Smart, Center, TopLeft, Top, TopRight, Left, Right, BottomLeft, Bottom, BottomRight.
17+
image:
18+
  caption: "EnvGym Project"
19+
  focal_point: Top
20+
  preview_only: false
21+
---
22+
23+
Hello, my name is Yiming Cheng. I am a Pre-doc researcher in Computer Science at the University of Chicago. I'm excited to be working with the Summer of Reproducibility and the Chameleon Cloud community as a project leader. My project, [EnvGym](https://github.com/eaminc/envgym), focuses on developing an AI-driven system that automatically generates and configures reproducible computing environments from natural language descriptions found in artifact descriptions, Trovi artifacts, and research papers.
24+
25+
The complexity of environment setup often hinders reproducibility in scientific computing. My project aims to bridge the knowledge gap between experiment authors and reviewers by translating natural language requirements into actionable, reproducible configurations using AI and NLP techniques.
26+
27+
### Project Overview
28+
29+
EnvGym addresses fundamental reproducibility barriers by:
30+
31+
- Using AI to translate natural language environment requirements into actionable configurations
32+
- Automatically generating machine images deployable on bare metal and VM instances
33+
- Bridging the knowledge gap between experiment authors and reviewers
34+
- Standardizing environment creation across different hardware platforms
35+
36+
### June 10 – June 16, 2025
37+
38+
Getting started with the project setup and initial development:
39+
40+
- I began designing the NLP pipeline architecture to parse plain-English descriptions (e.g., "I need Python 3.9, CUDA 11, and scikit-learn") into structured environment "recipes"
41+
- I set up the initial project repository and development environment
42+
- I met with my mentor Prof. Kexin Pei to discuss the project roadmap and technical approach
43+
- I started researching existing artifact descriptions from conferences and Trovi to understand common patterns in environment requirements
44+
- I began prototyping the backend environment builder logic that will convert parsed requirements into machine-image definitions
45+
- I explored Chameleon's APIs for provisioning servers and automated configuration
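
To make the "recipe" idea from the first bullet concrete, here is a minimal sketch of turning a plain-English request into a structured mapping. It uses a regular expression and a small alias table as a simple stand-in for the NLP pipeline, so it is not EnvGym's actual implementation. The `KNOWN_PACKAGES` table, the `parse_requirements` function, and the recipe shape (package name mapped to a version string or `None`) are all assumptions made for illustration.

```python
import re

# Hypothetical alias table: names as written in prose -> canonical package names.
KNOWN_PACKAGES = {
    "python": "python",
    "cuda": "cuda",
    "scikit-learn": "scikit-learn",
    "sklearn": "scikit-learn",
    "pytorch": "pytorch",
    "numpy": "numpy",
}

# A word, optionally followed by a dotted version such as "3.9" or "v11".
TOKEN = re.compile(r"([A-Za-z][A-Za-z0-9_\-]*)(?:\s+v?(\d+(?:\.\d+)*))?")


def parse_requirements(text):
    """Extract known packages and optional versions from a plain-English request."""
    recipe = {}
    for name, version in TOKEN.findall(text):
        canonical = KNOWN_PACKAGES.get(name.lower())
        if canonical is not None:
            # An empty version string means the request did not pin a version.
            recipe[canonical] = version or None
    return recipe


if __name__ == "__main__":
    print(parse_requirements("I need Python 3.9, CUDA 11, and scikit-learn"))
```

Words that are not in the alias table are ignored, so filler such as "I need" drops out without special handling. A pattern-based parser like this cannot handle implicit requirements or unusual phrasing, which is the gap an NLP or LLM-based component would be expected to cover.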
46+
47+
### Next Steps
48+
49+
- Continue developing the NLP component for requirement parsing
50+
- Implement the core backend logic for environment generation
51+
- Begin integration with Chameleon Cloud APIs
52+
- Start building the user interface for environment specification
53+
54+
This is an exciting and challenging project that combines my interests in AI systems and reproducible research. I'm looking forward to building a system that will help researchers focus on their science rather than struggling with environment setup issues.
55+
56+
Thanks for reading! I will keep you updated as I make progress on EnvGym.
