AI · MSc Deep Learning coursework
SmartKitchen — Vision-Language Cooking Assistant
A multimodal AI assistant for everyday food decisions.
Problem
Deciding what to cook or eat means juggling disconnected steps: identifying a dish, knowing its ingredients, finding a recipe, handling substitutions, then figuring out where to buy ingredients or eat out. No single tool ties vision, language, and location together for that workflow.
Approach
I built a full-stack system that combines computer vision and language models behind one interface. A CLIP/ResNet50 pipeline recognizes dishes and runs multi-label ingredient detection from a photo; a retriever searches a database of 2,000+ recipes; and a FLAN-T5/Qwen model powers a RAG-based assistant for cooking questions and ingredient substitutions. Location-aware recommendations surface nearby restaurants and grocery stores via OpenStreetMap.
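To make the flow from photo to assistant prompt concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the project's code: the per-ingredient scores stand in for the vision model's output, the 0.5 threshold is an assumed value, Jaccard overlap is a simple stand-in for whatever retriever the system actually uses, and every name (Recipe, detect_ingredients, retrieve, build_prompt) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    title: str
    ingredients: frozenset


def detect_ingredients(scores, threshold=0.5):
    """Multi-label decision: keep every label whose score clears the threshold.

    `scores` stands in for per-ingredient probabilities from a vision model;
    the threshold value is an assumption for this sketch.
    """
    return {label for label, p in scores.items() if p >= threshold}


def jaccard(a, b):
    """Set overlap in [0, 1]; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(detected, recipes, k=2):
    """Rank recipes by ingredient overlap with the detected set, keep the top k."""
    ranked = sorted(recipes, key=lambda r: jaccard(detected, r.ingredients), reverse=True)
    return ranked[:k]


def build_prompt(question, context):
    """Put the retrieved recipes in front of the question (the 'RAG' step)."""
    lines = [f"- {r.title}: {', '.join(sorted(r.ingredients))}" for r in context]
    return "Recipes:\n" + "\n".join(lines) + f"\n\nQuestion: {question}\nAnswer:"


# Toy data, invented for illustration only.
SCORES = {"tomato": 0.91, "basil": 0.78, "mozzarella": 0.66, "egg": 0.12}
RECIPES = [
    Recipe("Caprese salad", frozenset({"tomato", "basil", "mozzarella"})),
    Recipe("Omelette", frozenset({"egg", "butter"})),
    Recipe("Tomato soup", frozenset({"tomato", "onion", "basil"})),
]

detected = detect_ingredients(SCORES)
top = retrieve(detected, RECIPES)
prompt = build_prompt("What can I use instead of mozzarella?", top)
```

In a real deployment the prompt string would be passed to the language model; the point here is only that retrieval narrows the recipe database to a few relevant entries before generation.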
Outcome
The result is a responsive web app that turns a single food image into dish recognition, detected ingredients, matching recipes, conversational cooking help, and nearby places. It is an end-to-end multimodal product rather than an isolated model.
Tech stack
PyTorch models are served through a FastAPI backend, with a Next.js + Tailwind frontend. The CV models (CLIP/ResNet50) handle perception, FLAN-T5/Qwen handles generation and RAG, and OpenStreetMap handles geosearch. The frontend is deployed on Vercel and the API on DigitalOcean.
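The geosearch step might look roughly like the sketch below. It assumes OpenStreetMap data is queried through the Overpass API, which the write-up does not confirm; the function names and the sample coordinates are hypothetical. The code only builds a query string and sorts results by great-circle distance, so it makes no network calls.

```python
import math


def overpass_query(lat, lon, radius_m, key, value):
    """Build an Overpass QL query for nodes tagged key=value near a point.

    Assumption: the backend talks to an Overpass endpoint. For example,
    amenity=restaurant for restaurants or shop=supermarket for grocery stores.
    """
    return (
        "[out:json];"
        f'node["{key}"="{value}"](around:{radius_m},{lat},{lon});'
        "out;"
    )


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest(user, places):
    """Sort places (dicts with 'lat' and 'lon') by distance from user (lat, lon)."""
    return sorted(places, key=lambda p: haversine_km(user[0], user[1], p["lat"], p["lon"]))


# Invented sample points for illustration.
PLACES = [
    {"name": "Far market", "lat": 52.60, "lon": 13.40},
    {"name": "Near bistro", "lat": 52.51, "lon": 13.40},
]
query = overpass_query(52.5, 13.4, 1000, "amenity", "restaurant")
ordered = nearest((52.5, 13.4), PLACES)
```

Sorting on the server keeps the frontend simple: it receives an already ordered list of nearby places.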