AI development, full-stack engineering, and infrastructure consulting.
From prototype to production — no hype, just results.
I'm Charles Chen — a software engineer and AI developer who runs his own multi-node GPU cluster and builds production AI systems from scratch. I operate 550 GB of VRAM across NVIDIA RTX PRO 6000 Blackwell and DGX Spark hardware, interconnected via a 200 Gbps QSFP mesh with verified RoCE v2 RDMA. I built my own fleet control plane in Rust, indexed 1,014 codebases into a 37 GB knowledge system, and run everything local-first with zero cloud dependency.
I don't just talk about AI — I build the infrastructure, write the control planes, design the knowledge pipelines, and deploy the models. My fleet runs 24/7 serving inference at 110+ tok/s per GPU. Whether you need a private LLM deployment, a knowledge extraction system, or someone to architect your GPU infrastructure — I've already built it at scale for myself.
End-to-end engineering across the AI and software stack.
Custom model training, fine-tuning, RAG pipelines, and LLM integration. I run my own GPU fleet and deploy models locally — no cloud bills, no data leaving your network, full control.
End-to-end application development from database design to polished UIs. I build fast, reliable software with clean architecture that your team can maintain.
On-prem GPU clusters, networking, and deployment automation. I build the same infrastructure I run daily — multi-node GPU fleets with automated health monitoring, auto-restart, and zero-downtime serving.
Architecture reviews, technology strategy, and team mentoring. I help engineering teams make better technical decisions and ship faster.
Extract structured data from PDFs, images, and scanned documents using local OCR and LLM pipelines. No cloud APIs, no data leaving your network — everything runs on-premise with GPU acceleration.
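To give a flavor of that pipeline shape, here is a minimal Rust sketch: a serde schema for the target fields, with stub functions standing in for the OCR and LLM stages. The `Invoice` schema and the `ocr_text`/`llm_extract_json` helpers are hypothetical illustrations, not the production code.

```rust
use serde::Deserialize;

// Illustrative target schema; the real fields depend on the document type.
#[derive(Debug, Deserialize)]
struct Invoice {
    vendor: String,
    invoice_number: String,
    total_cents: i64,
    line_items: Vec<LineItem>,
}

#[derive(Debug, Deserialize)]
struct LineItem {
    description: String,
    amount_cents: i64,
}

// Stand-in for the local GPU OCR stage.
fn ocr_text(_scan: &[u8]) -> String {
    "ACME Corp Invoice #1042 Total $150.00 ...".to_string()
}

// Stand-in for the local LLM stage: prompt the model to emit JSON
// matching the schema, then parse it strictly with serde.
fn llm_extract_json(_ocr: &str) -> String {
    r#"{"vendor":"ACME Corp","invoice_number":"1042","total_cents":15000,
        "line_items":[{"description":"Widget","amount_cents":15000}]}"#
        .to_string()
}

fn main() -> Result<(), serde_json::Error> {
    let text = ocr_text(b"raw scan bytes");
    let invoice: Invoice = serde_json::from_str(&llm_extract_json(&text))?;
    println!("{invoice:#?}");
    Ok(())
}
```

Deserializing against a strict schema is what keeps the extraction structured: anything the model emits that doesn't parse can be rejected and retried.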
Build local semantic search systems with vector embeddings, hybrid retrieval (BM25 + cosine similarity), and intelligent ranking. Power your apps with meaning-aware search that runs entirely on your hardware.
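As a minimal sketch of the fusion step, assuming min-max normalization of each signal and a tunable lexical weight (the `hybrid_rank` function and `alpha` parameter are illustrative names, not the production API):

```rust
use std::collections::{HashMap, HashSet};

/// Fuse lexical (BM25) and semantic (cosine) scores with a weighted sum.
/// `alpha` is the lexical weight; 1 - alpha goes to the semantic score.
fn hybrid_rank(
    bm25: &HashMap<u32, f32>,
    cosine: &HashMap<u32, f32>,
    alpha: f32,
) -> Vec<(u32, f32)> {
    // Min-max normalize one score map so neither signal dominates.
    let norm = |m: &HashMap<u32, f32>| -> HashMap<u32, f32> {
        let (mut lo, mut hi) = (f32::INFINITY, f32::NEG_INFINITY);
        for &v in m.values() {
            lo = lo.min(v);
            hi = hi.max(v);
        }
        let span = (hi - lo).max(f32::EPSILON);
        m.iter().map(|(&id, &v)| (id, (v - lo) / span)).collect()
    };
    let (b, c) = (norm(bm25), norm(cosine));
    // Union of doc ids seen by either retriever.
    let ids: HashSet<u32> = b.keys().chain(c.keys()).copied().collect();
    let mut fused: Vec<(u32, f32)> = ids
        .into_iter()
        .map(|id| {
            let lex = b.get(&id).copied().unwrap_or(0.0);
            let sem = c.get(&id).copied().unwrap_or(0.0);
            (id, alpha * lex + (1.0 - alpha) * sem)
        })
        .collect();
    // Highest fused score first.
    fused.sort_by(|x, y| y.1.partial_cmp(&x.1).unwrap());
    fused
}
```

In practice the weight is tuned per corpus: BM25 carries exact identifiers and rare tokens, while the embedding side catches paraphrases and synonyms.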
Real projects, real infrastructure, all running in production.
Built a universal knowledge system in Rust that indexes 1,014 open-source codebases (37 GB, 1.8M+ pages, 4.4M+ chunks, 11M+ structural entities, 40K+ semantic entities). Three-tier retrieval: hybrid search (BM25 + vector), agentic self-correcting loops, and graph-walk verification. The semantic extraction pipeline runs across 5 GPUs concurrently using Qwen3.5 with structured grammar output. Features community detection, topic modeling, concept graphs, and 32 MCP tools for AI-assisted development. To my knowledge, the largest personal code knowledge base ever built.
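The agentic tier can be pictured as a bounded retrieve-grade-rewrite loop. This is shorthand for the idea only, with hypothetical `Retriever` and `Grader` traits rather than the system's actual interfaces:

```rust
/// Hypothetical stub: any search backend (e.g. the hybrid ranker above).
trait Retriever {
    fn search(&self, query: &str) -> Vec<String>;
}

/// Hypothetical stub: an LLM judge. Returns a rewritten query if the
/// retrieved chunks don't answer it, or None if the evidence suffices.
trait Grader {
    fn critique(&self, query: &str, chunks: &[String]) -> Option<String>;
}

/// Retrieve, let the model grade the evidence, rewrite the query if it's
/// weak, and retry within a fixed round budget.
fn agentic_retrieve<R: Retriever, G: Grader>(
    retriever: &R,
    grader: &G,
    query: &str,
    max_rounds: usize,
) -> Vec<String> {
    let mut q = query.to_string();
    for _ in 0..max_rounds {
        let chunks = retriever.search(&q);
        match grader.critique(&q, &chunks) {
            None => return chunks,        // evidence judged sufficient
            Some(rewrite) => q = rewrite, // self-correct and retry
        }
    }
    retriever.search(&q) // best effort once the budget is spent
}
```

The third tier, graph-walk verification, sits on top of this loop and checks retrieved answers against the structural entity graph.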
Built a Rust fleet control plane managing 3 machines with 550 GB VRAM across NVIDIA RTX PRO 6000 and DGX Spark hardware. Cache-aware routing via consistent hash ring for KV cache reuse, lock-free circuit breakers (atomic CAS state machine), batch dispatch across all 5 GPUs weighted by throughput, Spark-to-Spark tensor parallelism orchestration, and RDMA/RoCE network awareness (112 Gb/s verified). Dynamic worker registration for cloud GPUs, historical metrics with 7-day retention, and a web dashboard. 8,000 LOC, 13 unit tests, 12 MCP tools.
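For a flavor of the lock-free circuit breaker idea, here is a minimal CAS-driven state machine in safe Rust. The state encoding, thresholds, and method names are assumptions for illustration, not Enterprise's internals:

```rust
use std::sync::atomic::{AtomicU32, AtomicU8, Ordering};

// State lives in one atomic; transitions are compare-and-swap,
// so no mutex sits on the request hot path.
const CLOSED: u8 = 0; // requests flow normally
const OPEN: u8 = 1; // backend failing, requests short-circuit
const HALF_OPEN: u8 = 2; // probing whether the backend recovered

struct Breaker {
    state: AtomicU8,
    failures: AtomicU32,
    threshold: u32,
}

impl Breaker {
    fn new(threshold: u32) -> Self {
        Self {
            state: AtomicU8::new(CLOSED),
            failures: AtomicU32::new(0),
            threshold,
        }
    }

    /// Hot path: a single atomic load decides whether to dispatch.
    fn allow(&self) -> bool {
        self.state.load(Ordering::Acquire) != OPEN
    }

    fn record_failure(&self) {
        if self.failures.fetch_add(1, Ordering::AcqRel) + 1 >= self.threshold {
            // CAS so exactly one thread performs CLOSED -> OPEN.
            let _ = self.state.compare_exchange(
                CLOSED, OPEN, Ordering::AcqRel, Ordering::Acquire);
        }
    }

    fn record_success(&self) {
        self.failures.store(0, Ordering::Release);
        // A successful probe closes the breaker: HALF_OPEN -> CLOSED.
        let _ = self.state.compare_exchange(
            HALF_OPEN, CLOSED, Ordering::AcqRel, Ordering::Acquire);
    }

    /// Called after a cool-down to let one probe request through.
    fn try_half_open(&self) -> bool {
        self.state
            .compare_exchange(OPEN, HALF_OPEN, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
}
```

Because checking a tripped breaker costs one atomic load, the pattern stays cheap even when every request in a burst hits it.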
Designed and operate a private AI compute cluster: 3x RTX PRO 6000 Blackwell (294 GB) + 2x DGX Spark GB10 (256 GB), interconnected via a 200 Gbps QSFP mesh with verified RoCE v2 at 112 Gb/s and 1.5 µs latency. Runs vLLM with prefix caching and NVFP4 quantization, serving ~110 tok/s per GPU. Full RDMA stack (ConnectX-6/7), automated fleet management via Enterprise, and zero cloud dependency.
Full-featured AI CLI in Rust with streaming REPL, autonomous tool calling, plan mode, and multi-agent orchestration. Plugin system with marketplace, MCP client integration, and fleet-aware endpoint discovery via Enterprise. Workspace architecture with separate core library and CLI binary. v2.1.0.
Klaus fork specialized for the NVIDIA DGX ecosystem. Jensen personality, Volt theme, fleet management commands, MCP client, ACP server. Manages GPU allocation, model deployment, and inference across DGX Spark nodes. v0.9.0.
Agentic tax optimization system with three layers: L0 (categorize), L1 (personal → business reclassification), L2 (cross-category optimization). Four A2A specialist agents handle retirement, deductions, credits, and categorization. LLM-powered reasoning suggests savings that static rules miss. 278 tests, 15 profiles.
Open-source Claude Code plugin that assembles 5 dynamic expert reviewers to critique AI-generated plans. Reviewers are domain-matched with constructive and adversarial perspectives. Two-pass review with severity tracking and delta reporting. v1.3.0, published on GitHub.
Flutter mobile app for voice-first learning with STT/TTS, voice commands, and AI-powered tutoring. Students interact by speaking — the app transcribes, processes via LLM, and speaks back. v1.1.0.
Full-stack AI platform for a law firm — document analysis, case research automation, and client management. Custom LLM pipelines with domain-specific fine-tuning for legal document understanding.
End-to-end Amazon product management with SP-API and Ads API integration. Merch design pipeline with AI image generation, inventory tracking, and automated listing optimization.
Professional website design and development — from marketing landing pages to full web applications. Responsive design, SEO, Cloudflare deployment, custom domains, and ongoing maintenance.
Have a project in mind? I'm always interested in hearing about new challenges — whether it's a greenfield AI project, a complex infrastructure problem, or scaling an existing system.