I am a Senior Software Engineer & Architect with over 8 years of experience in designing and building large-scale, high-concurrency distributed systems using Java and microservices.
Currently, I am expanding my expertise into AI/ML Infrastructure and Large Language Model (LLM) Applications, applying my deep engineering background to build robust and efficient systems that power intelligent solutions. I am passionate about creating value at the intersection of solid software architecture and cutting-edge AI.
Based in Singapore, I am seeking senior roles in either Java System Architecture or AI/ML Infrastructure.
My skills cover the full spectrum from foundational backend architecture to modern AI/ML infrastructure.
Core Java & Distributed Systems | AI/ML & Big Data |
---|---|
Java, Spring Boot, Spring Cloud | Python, PyTorch, Transformers, vLLM |
Microservices, SaaS, Domain-Driven Design (DDD) | LLM Fine-tuning (LoRA), RAG |
Docker, Kubernetes, DevOps, gRPC | Vector DB (Milvus, FAISS, Elasticsearch) |
Kafka, Zookeeper, WebSocket | Apache Flink, Flink-CDC, Prometheus |
MySQL, PostgreSQL, ES, Redis, MinIO | ELK Stack, Grafana, Flume |
System Design & Scalable Architecture | MLOps & Inference Optimization |
Here are some projects that highlight my capabilities across both domains.
- RAG System for Domain-Specific Q&A
  - Engineered a Retrieval-Augmented Generation (RAG) pipeline using `Mistral-7B`, `chatGPT-o4-mini`, and `Elasticsearch` for a specialized knowledge domain.
  - Optimized the system for real-time interaction through efficient data processing and a high-throughput inference server deployment.
- LLM Inference Acceleration & Fine-tuning
  - Customized open-source LLMs (`Mistral`, `Qwen`, `Llama`) using LoRA fine-tuning techniques on specific datasets.
  - Accelerated model inference significantly using vLLM and FlashAttention, deploying the models as scalable API endpoints on cloud platforms like GCP and Azure.
- Group-Level Multi-functional Payment Platform
  - Architected and developed a highly available, enterprise-grade payment center using Domain-Driven Design (DDD) and a robust microservices architecture.
  - The system reliably handles millions of transactions over a given period and smoothly processes tens of thousands of transactions or more every day, ensuring data consistency and security across various payment channels.
- High-Concurrency Instant Messaging (IM) System
  - Built a distributed IM system from the ground up to support millions of concurrent users.
  - Leveraged a powerful tech stack including `Spring Boot`, `WebSocket`, `Kafka` for message queuing, and `Zookeeper` for service coordination, achieving high throughput and low latency.
- Enterprise Search & Real-Time Data Platform
  - Designed a high-performance search engine using `Elasticsearch` and `Flink-CDC`, capable of indexing and searching billions of records with sub-second latency.
  - Built the underlying real-time data synchronization pipeline, providing a unified data backbone for multiple business units.
- Enterprise Flink Computing Platform
  - Enterprise-level Flink cluster: built a unified Flink-based computing center for the real-time data lake, statistics center, and ETL pipelines.
I believe in continuous learning and sharing knowledge. I write about my journey in software architecture, distributed systems, and AI on my Medium blog.