I architect LLM systems that cut inference costs 12x
— and scale to 800K+ users.
David Dominguez
CTO · LLM Engineer · Ships from Zero
Experience
- CTO · Sept 2025 – Present
Most AI projects fail on cost before they scale. Architected and shipped an AI sales agent from zero, reducing per-conversation cost from $0.25 to $0.02 (roughly 12x) while sustaining sub-second response times. Built a routing layer across 20+ LLM providers that dynamically balances cost and quality per conversation turn. Own hiring, technical budget, and product roadmap, and report directly to investors.
- Technical Lead · May 2024 – Sept 2025
Led a 10-person engineering team that built and scaled the Affiliates program within MercadoLibre’s ecosystem from first user to 800K+, with zero SLA breaches across 16 months. The program grew to contribute 1% of MercadoLibre’s global revenue. Responsible for core architecture decisions, sprint delivery, and team growth.
- CTO · Aug 2022 – May 2024
Designed a fully client-side architecture for a low-code platform, keeping infrastructure costs near zero at scale. Built two custom engines: a 2D rendering layer for the visual editor and a transpiler that converted user designs into production code. Owned hiring, technical budget, and product roadmap, and reported directly to investors.
- Lead Developer · Dec 2021 – Aug 2022
Served as the primary technical lead for the agency’s largest enterprise account, building direct trust with their CTO. Led a team of 6 engineers delivering their core platform rebuild.
- Senior Developer · Aug 2021 – Dec 2021
Diagnosed a delivery bottleneck across client projects and built an automated scaffolding platform that cut project setup time in half, compressing timelines across the full portfolio.
- Senior Web Developer · Dec 2020 – Aug 2021
Credibanco’s proprietary POS hardware required C, which limited terminal development to a small specialist team. Built a Python-to-C transpiler that opened POS development to the company’s broader Python developer base, then led the rollout across thousands of payment terminals nationwide.
- Developer · Oct 2018 – Dec 2020
Built production applications in TypeScript, React, and Node.js across 10+ client projects, from MVPs to full-stack platforms, building the end-to-end depth that informed later architecture work at scale.
Work
Closer AI
AI sales agent that closes pipeline while you sleep
Built the core intelligence layer of an AI sales agent that handles full conversations autonomously: reasoning through objections, maintaining context across sessions, and routing to the optimal model per turn. Reduced per-conversation cost from $0.25 to $0.02 (roughly 12x) while sustaining sub-second response times.
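To make "routing to the optimal model per turn" concrete, here is a minimal sketch of one possible policy: pick the cheapest model that clears a quality floor for the turn. It is not Closer AI's actual router. The model names, prices, quality scores, and the `routeTurn` function are all assumptions made up for illustration.

```typescript
// Hypothetical catalog entry: names, prices, and scores are illustrative only.
interface ModelOption {
  name: string;
  costPer1kTokensUsd: number;
  quality: number; // 0..1, assumed to come from an offline evaluation
}

// Hypothetical description of one conversation turn.
interface Turn {
  estimatedTokens: number;
  minQuality: number; // quality floor this turn needs
}

// Pick the cheapest model that meets the turn's quality floor.
// If no model qualifies, fall back to the highest-quality model available.
function routeTurn(models: ModelOption[], turn: Turn): ModelOption {
  if (models.length === 0) throw new Error("no models configured");
  const eligible = models.filter((m) => m.quality >= turn.minQuality);
  if (eligible.length === 0) {
    return models.reduce((best, m) => (m.quality > best.quality ? m : best));
  }
  return eligible.reduce((cheapest, m) =>
    m.costPer1kTokensUsd < cheapest.costPer1kTokensUsd ? m : cheapest
  );
}

const catalog: ModelOption[] = [
  { name: "small-fast", costPer1kTokensUsd: 0.0005, quality: 0.6 },
  { name: "mid-tier", costPer1kTokensUsd: 0.003, quality: 0.8 },
  { name: "frontier", costPer1kTokensUsd: 0.015, quality: 0.95 },
];

// A simple greeting can tolerate a low floor; an objection needs a stronger model.
const greeting = routeTurn(catalog, { estimatedTokens: 200, minQuality: 0.5 });
const objection = routeTurn(catalog, { estimatedTokens: 800, minQuality: 0.9 });
console.log(greeting.name, objection.name);
```

The saving in a scheme like this comes from most turns clearing a low floor, so the expensive model is only paid for on the turns that need it.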
LLM Rate Limiter
Hard spending limits across every AI provider — zero infrastructure
Built a TypeScript library that enforces hard spending and usage caps across multiple AI providers, preventing budget overruns before they happen. Supports automatic failover to backup models when limits are hit and recycles unused capacity in real time. Drop-in installation, no servers required. In production at Closer AI and available as open source.
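The three behaviours described here (a hard cap, failover, and recycling unused capacity) can be sketched in memory as follows. This is an illustration under assumptions, not the library's real API: `SpendGuard`, `reserve`, `settle`, and `remaining` are hypothetical names, and amounts are integer cents to avoid floating-point drift.

```typescript
// Illustrative only: class and method names are hypothetical, not the library's API.
interface Budget {
  name: string;
  limitCents: number;
  spentCents: number;
}

class SpendGuard {
  private budgets: Budget[];

  // Providers are listed in priority order: primary first, backups after.
  constructor(limits: Array<{ name: string; limitCents: number }>) {
    this.budgets = limits.map((l) => ({
      name: l.name,
      limitCents: l.limitCents,
      spentCents: 0,
    }));
  }

  // Reserve the estimated cost on the first provider with enough headroom.
  // Returns the provider name, or null when every cap would be exceeded.
  reserve(estimateCents: number): string | null {
    for (const b of this.budgets) {
      if (b.spentCents + estimateCents <= b.limitCents) {
        b.spentCents += estimateCents;
        return b.name;
      }
    }
    return null;
  }

  // Replace the reservation with the actual cost once it is known.
  // An over-estimate frees capacity; this sketch does not guard against
  // an actual cost above the estimate pushing spend past the cap.
  settle(name: string, estimateCents: number, actualCents: number): void {
    const b = this.find(name);
    b.spentCents += actualCents - estimateCents;
  }

  remaining(name: string): number {
    const b = this.find(name);
    return b.limitCents - b.spentCents;
  }

  private find(name: string): Budget {
    const b = this.budgets.find((x) => x.name === name);
    if (!b) throw new Error("unknown provider: " + name);
    return b;
  }
}
```

Reserving before the call and settling after it is one way to keep the cap hard under concurrency: a request that cannot reserve never reaches the provider.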
LLM Markdown WhatsApp
Ship AI on WhatsApp in one function call
Built a zero-configuration TypeScript library that converts raw AI output into properly formatted WhatsApp messages, handling lists, product cards, links, and Spanish punctuation automatically. One function call replaces days of custom formatting work per integration.
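For a sense of what this conversion involves, here is a minimal sketch of Markdown-to-WhatsApp rewriting. It is not the library's implementation, and `toWhatsApp` is a hypothetical name. It relies on WhatsApp's documented inline markers (`*bold*`, `~strikethrough~`) and leaves out italics, product cards, and Spanish punctuation.

```typescript
// Illustrative sketch: `toWhatsApp` is a hypothetical function name.
// Converts a small subset of Markdown into WhatsApp-style plain text.
function toWhatsApp(markdown: string): string {
  return markdown
    // "# Heading" becomes a bold line, since WhatsApp has no heading syntax.
    .replace(/^#{1,6}[ \t]+(.+)$/gm, "*$1*")
    // "**bold**" becomes "*bold*".
    .replace(/\*\*(.+?)\*\*/g, "*$1*")
    // "~~strike~~" becomes "~strike~".
    .replace(/~~(.+?)~~/g, "~$1~")
    // "[label](url)" becomes "label: url", because link labels are not rendered.
    .replace(/\[([^\]]+)\]\((https?:\/\/[^\s)]+)\)/g, "$1: $2")
    // "- item" or "+ item" becomes "• item".
    .replace(/^[ \t]*[-+][ \t]+/gm, "• ");
}

const reply = [
  "# Offer",
  "**Price:** $10",
  "- [Buy](https://example.com/buy)",
  "- ~~old~~ new",
].join("\n");

console.log(toWhatsApp(reply));
```

The order of the replacements matters: headings and bold are rewritten before bullets, so a leading `*` produced by an earlier step is never mistaken for a list marker.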
LLM Graph Builder
Persistent, relational memory for AI agents that make real decisions
Built a TypeScript library that gives AI agents a structured knowledge graph memory — capturing how entities relate to each other across sessions, not just isolated facts. Addresses the memory problem that limits most agents to single-session tasks. Powers Closer AI’s conversation memory in production.
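The difference between relational memory and isolated facts can be shown with a small triple store. This is a sketch under assumptions, not the library's API: `GraphMemory`, `addFact`, `related`, `serialize`, and `restore` are hypothetical names, and JSON serialization stands in for whatever persistence layer carries the graph across sessions.

```typescript
// Illustrative sketch: all names here are hypothetical, not the library's API.
interface Edge {
  from: string;
  relation: string;
  to: string;
}

class GraphMemory {
  private edges: Edge[] = [];

  // Store a (from, relation, to) fact, ignoring exact duplicates.
  addFact(from: string, relation: string, to: string): void {
    const exists = this.edges.some(
      (e) => e.from === from && e.relation === relation && e.to === to
    );
    if (!exists) this.edges.push({ from, relation, to });
  }

  // Breadth-first walk: every entity reachable within `maxHops` edges,
  // following edges in either direction. Returned sorted, without `start`.
  related(start: string, maxHops: number): string[] {
    const seen = new Set<string>([start]);
    let frontier: string[] = [start];
    for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
      const next: string[] = [];
      for (const node of frontier) {
        for (const e of this.edges) {
          const other = e.from === node ? e.to : e.to === node ? e.from : null;
          if (other !== null && !seen.has(other)) {
            seen.add(other);
            next.push(other);
          }
        }
      }
      frontier = next;
    }
    seen.delete(start);
    return Array.from(seen).sort();
  }

  // Persist between sessions as plain JSON.
  serialize(): string {
    return JSON.stringify(this.edges);
  }

  static restore(json: string): GraphMemory {
    const memory = new GraphMemory();
    for (const e of JSON.parse(json) as Edge[]) {
      memory.addFact(e.from, e.relation, e.to);
    }
    return memory;
  }
}

const memory = new GraphMemory();
memory.addFact("ana", "works_at", "acme");
memory.addFact("acme", "uses", "crm-x");
memory.addFact("ana", "asked_about", "pricing");
console.log(memory.related("ana", 2));
```

A flat fact list would know that Ana works at Acme and that Acme uses crm-x, but only the traversal connects Ana to crm-x. That two-hop link is what "how entities relate to each other" adds.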
Agua
A low-code IDE built on near-zero infrastructure spend
Architected the client-side IDE for a low-code platform from the ground up, including the 2D rendering engine and the transpiler that converted user designs into production code. The fully client-side architecture removed the need for application servers, keeping infrastructure spend near zero at scale.
MercadoLibre
MercadoLibre’s Affiliates program: from first user to 800K+, 1% of global revenue
Led a 10-person engineering team building MercadoLibre’s Affiliates program through its full growth phase, from first user to 800K+ users. The program grew to contribute 1% of MercadoLibre’s global revenue. Responsible for core architecture decisions under real growth pressure: observability, scaling strategy, reliability thresholds. Zero SLA breaches across 16 months of growth.
Credibanco POS
Turned a 10-person bottleneck into a platform hundreds could build on
Built a Python-to-C transpiler that opened POS terminal development from a small C-specialist team to the company’s broader developer base. Deployed across thousands of payment terminals nationwide. The transpiler removed a structural hiring and knowledge bottleneck that had been limiting the company’s hardware roadmap for years.
K8s Scaffolding Platform
Eliminated the hidden tax every engineer paid on every new project
Identified that engineers were rebuilding identical scaffolding from scratch on every client engagement. Designed and built an automated Kubernetes-based platform that cut setup time in half and standardized delivery quality across the full engineering team.
Education
My Philosophy
Adoption is an engineering problem.
The hardest tools to adopt are the ones that ignore how people already work. I build to fit existing workflows, existing habits, existing mental models — because the hidden cost of software is never the license. It’s the change management, the resistance, the workarounds your team builds around tools they were told to use.
Speed is a strategy. Waiting is a decision.
Every week a tool isn’t in production is a week your team is working around the problem instead of past it. I optimize for sustainable velocity — scoped releases, real users, real feedback — without sacrificing reliability. The compounding cost of delay is invisible until it isn’t. But so is the cost of shipping something that breaks trust.
Privacy is a design constraint, not an afterthought.
I treat data privacy as an architectural decision, not a compliance checkbox. At Agua, the fully client-side architecture meant zero data egress — both a product requirement and a privacy guarantee. At Closer AI, I face the inverse problem: third-party LLM calls are inherently data-leaving events. Privacy there means governing what leaves, where it goes, and how long it’s retained. The approach changes with the problem. The discipline doesn’t.
Stack

Contact
David Dominguez
I take one engagement at a time and work with teams where engineering decisions directly impact business outcomes. If your next 90 days depend on serious engineering judgment, this is a conversation worth having.
l.david.dominguez.12@gmail.com