svah‑x

Kelvin Peng

Building ML systems with a taste for polish

Math (Combinatorics & Optimization + Statistics) @ University of Waterloo. Currently focused on reinforcement learning agents, efficient fine-tuning, and on-device ML.

About

I like problems that sit between theory and engineering: training stability, evaluation discipline, and “product-level” UX for technical tools. Right now I’m building a Geometry Dash agent (vision-based) and learning Isaac Lab for robotics simulation.

Projects

Reinforcement Learning · Game AI
In production
Largest ongoing project

Geometry Dash Agent (vision-based)

A gameplay agent trained from pixels with a reproducible evaluation harness. The goal is strong generalization across levels through curriculum design and robust training pipelines.

PyTorch · RL (PPO-style) · CV pipeline · Evaluation
Training · Curriculum + reward shaping and stable rollouts.
Data · Support for learning from real levels (dataset ingestion).
Repro · Metrics, seeds, and sanity checks baked in.

LLM Systems

SFT on Dream‑7B (LoRA/QLoRA + 4‑bit)

Memory-efficient fine-tuning on consumer GPUs using 4-bit quantization, with careful eval splits to measure reasoning improvements.

PyTorch · BitsAndBytes · QLoRA

Model Training

Instruction FT on GPT‑OSS‑20B

Large-scale instruction fine-tuning with distributed training and checkpointing; emphasis on stable training + ablations.

DeepSpeed · Gradient Ckpt · Eval

iOS / On‑device ML

Real‑time camera translation

FastVLM + OCR pipeline; SwiftUI front‑end with CoreML for low-latency, private on-device translation.

SwiftUI · CoreML · Vision

Contact

Email · [email protected]
Phone · +1 (236) 990-3288
Location · Waterloo, Ontario