I've spent the last two years working on ML systems that solve real problems at scale. At Foundly, I developed TensorFlow-powered item-matching models that cut manual review time in half for a platform serving 15,000+ users. The challenge wasn't just accuracy—it was making sure the system could handle noisy, incomplete data from users who were stressed about losing their belongings.
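One way to picture that kind of matching is as embedding similarity over item descriptions, with a confidence cutoff deciding what still needs a human look. The sketch below shows that shape with a simple TF-IDF encoder; Foundly's actual model, features, and threshold aren't public, so the toy data and the 0.5 cutoff are purely illustrative.

```python
import tensorflow as tf

# Toy lost and found descriptions; real user input is noisier and often
# incomplete, which is why a confidence threshold gates manual review.
lost = tf.constant(["black leather wallet", "blue backpack with laptop"])
found = tf.constant(["wallet, black, leather", "red umbrella"])

# TF-IDF text encoder; TextVectorization lowercases and strips punctuation
# by default, which already absorbs some input noise.
vectorizer = tf.keras.layers.TextVectorization(output_mode="tf_idf")
vectorizer.adapt(tf.concat([lost, found], axis=0))

# Embed both sides with the same encoder and L2-normalize so a plain
# dot product becomes cosine similarity.
lost_vec = tf.math.l2_normalize(vectorizer(lost), axis=1)
found_vec = tf.math.l2_normalize(vectorizer(found), axis=1)
scores = tf.matmul(lost_vec, found_vec, transpose_b=True)

# Pairs scoring below the cutoff are routed to manual review;
# 0.5 is an illustrative value, not a tuned one.
match_threshold = 0.5
print(scores.numpy())
print((scores < match_threshold).numpy())
```

The design point is the threshold itself: raising it trades reviewer time against the risk of a bad automatic match.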
At MemMachine, I focused on long-context retrieval. I benchmarked token recall efficiency and improved sequence reconstruction by 30% through targeted optimizations in the cache-synchronization and persistence layers. Working in the project's open-source GPU contributor program taught me how to balance performance constraints against model reliability when documents span thousands of tokens.
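As a rough illustration, token recall here means the fraction of a reference sequence's tokens that retrieval actually recovers. Below is a minimal sketch of that metric in plain Python; the function and variable names are my own for the example, not MemMachine's API.

```python
from collections import Counter

def token_recall(reference_tokens: list[str], retrieved_tokens: list[str]) -> float:
    """Fraction of reference tokens recovered by retrieval, counting
    duplicates, which matters when reconstructing long sequences."""
    ref = Counter(reference_tokens)
    got = Counter(retrieved_tokens)
    overlap = sum(min(count, got[tok]) for tok, count in ref.items())
    return overlap / max(sum(ref.values()), 1)

# Toy example: a 6-token reference where retrieval recovers 4 tokens.
reference = "the cache must persist across sessions".split()
retrieved = "cache persist across sessions and more".split()
print(f"token recall: {token_recall(reference, retrieved):.2f}")  # 0.67
```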
For Elevare, I've been exploring retrieval systems and embeddings across text and images, building recommendation models that surface job and event suggestions based on users' activity patterns. Each project has reinforced the same lesson: production ML is about trade-offs among latency, cost, accuracy, and user experience.
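To make that concrete, here is a minimal sketch of embedding-based candidate scoring, the general pattern behind recommenders like the one described above; the vectors, dimensions, and top-k cutoff are illustrative assumptions, not details of Elevare's actual system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed inputs: a user profile embedding aggregated from past
# interactions, and candidate job/event embeddings from text or
# image encoders (dimensions are arbitrary here).
user = rng.normal(size=64)
items = rng.normal(size=(5, 64))  # 5 candidate jobs/events

# Cosine similarity ranks candidates; the top-k become suggestions.
user_n = user / np.linalg.norm(user)
items_n = items / np.linalg.norm(items, axis=1, keepdims=True)
scores = items_n @ user_n
top_k = np.argsort(scores)[::-1][:3]
print("suggested item indices:", top_k, "scores:", scores[top_k])
```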