LLM Systems
SFT on Dream‑7B (LoRA/QLoRA + 4‑bit)
Fine‑tuned Dream‑7B on S1k with memory‑efficient 4‑bit quantization; ~20% lift on reasoning benchmarks and a 60%+ training‑cost reduction on a single RTX 5070 Ti.
PyTorch
BitsAndBytes
QLoRA
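A minimal sketch of the LoRA idea behind this setup, in plain Python with toy shapes (the matrices, values, and helper names are illustrative, not the actual training code): instead of updating the full frozen weight matrix W, LoRA trains a low‑rank update B·A, so the effective weight is W + (alpha / r)·B·A. QLoRA applies the same trick on top of a 4‑bit‑quantized frozen W.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """Frozen base output W @ x plus the scaled low-rank update B @ (A @ x)."""
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # trainable low-rank path (rank r)
    scale = alpha / r                 # standard LoRA scaling
    return [b + scale * u for b, u in zip(base, update)]

# Toy example: d_out = 2, d_in = 3, rank r = 1 (all values hypothetical).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]   # frozen weights (in QLoRA these would be 4-bit)
A = [[1.0, 1.0, 1.0]]   # r x d_in, trainable
B = [[0.5], [0.5]]      # d_out x r, trainable
y = lora_forward(W, A, B, [1.0, 2.0, 3.0], alpha=2.0, r=1)  # → [7.0, 8.0]
```

Because only A and B (a few million parameters at typical ranks) receive gradients, optimizer state stays small enough for a single consumer GPU.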
Kelvin Peng
Math (Combinatorics & Optimization + Statistics) @ University of Waterloo.
I'm interested in math, optimization, and applied ML. Recently I've been exploring instruction fine‑tuning on consumer GPUs and building small iOS tools with on‑device models. Currently learning Isaac Lab. Outside of class, I like hiking and cycling.
OpenWebMath + The Stack v2; DeepSpeed + gradient checkpointing on 96 GB of VRAM. +15% on math reasoning, +12% on code tasks.
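The DeepSpeed + gradient‑checkpointing combination can be sketched as a config fragment (all names and values here are hypothetical placeholders for the general shape, not the exact settings used):

```python
# Hypothetical DeepSpeed config: ZeRO partitions optimizer state and
# gradients across devices, trading communication for memory.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                              # shard optimizer state + grads
        "offload_optimizer": {"device": "cpu"},  # push optimizer state to RAM
    },
}

# Activation (gradient) checkpointing is enabled on the model itself; it
# drops intermediate activations in the forward pass and recomputes them
# during backward, cutting activation memory at ~30% extra compute.
# With Hugging Face Transformers this is roughly:
#   model.gradient_checkpointing_enable()
#   args = TrainingArguments(..., deepspeed=ds_config)
```

Together these let optimizer state, gradients, and activations fit where full‑precision, non‑checkpointed training would not.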
Apple FastVLM + OCR pipeline; SwiftUI front end with Core ML, sub‑1 s end‑to‑end latency.