Machine Learning
July 2025

Keyo - Bringing Eyes to Voices

🏆 Track Winner in Data Science/ML, DeerHack 2025

A computer vision assistant that provides real-time directional feedback to visually impaired users. Making the world easier to navigate, one voice at a time.

About the Project

Keyo is a voice-activated indoor navigation system designed to empower visually impaired individuals with real-time object detection and spatial guidance. Built for DeerHack 2025, this assistive technology combines YOLOv11 object detection, MediaPipe pose estimation, and Google Text-to-Speech to enable hands-free navigation through voice commands. The system processes webcam feeds to detect 80+ object types, calculates 3D spatial positioning, and delivers precise turn-by-turn voice instructions every 5 seconds. With support for both English and Nepali languages, Keyo transforms indoor environments into navigable spaces through natural language interaction and continuous audio feedback.
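
At its core this is a detect-locate-speak loop. The sketch below is a minimal illustration of how such a loop might be wired up with the Ultralytics YOLO and gTTS packages; the describe_position helper, the hard-coded "chair" target, and the file-based audio handoff are simplifying assumptions for illustration, not Keyo's actual code.

```python
# Minimal detect -> locate -> speak loop (simplified sketch, assumptions noted above).
import time
import cv2
from ultralytics import YOLO
from gtts import gTTS

model = YOLO("yolo11n.pt")           # YOLOv11 nano weights; larger variants also work
cap = cv2.VideoCapture(0)            # webcam feed
INSTRUCTION_INTERVAL = 5.0           # seconds between spoken instructions
last_spoken = 0.0
target = "chair"                     # object requested by a voice command

def describe_position(box, frame_shape):
    """Hypothetical helper: turn a bounding box into a rough direction phrase."""
    h, w = frame_shape[:2]
    cx = (box[0] + box[2]) / 2
    if cx < w / 3:
        return "slightly to your left"
    if cx > 2 * w / 3:
        return "slightly to your right"
    return "straight ahead"

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]
    for box, cls in zip(result.boxes.xyxy.tolist(), result.boxes.cls.tolist()):
        if result.names[int(cls)] == target and time.monotonic() - last_spoken > INSTRUCTION_INTERVAL:
            phrase = f"The {target} is {describe_position(box, frame.shape)}."
            gTTS(phrase, lang="en").save("instruction.mp3")   # audio is played back by the app
            last_spoken = time.monotonic()
```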

Key Features

  • Voice-activated navigation with hands-free interaction for 80+ object types
  • Real-time object detection using YOLOv11 with CUDA acceleration support
  • 3D spatial positioning with accurate distance and direction calculations using MediaPipe pose estimation
  • Turn-by-turn voice guidance with automatic TTS instructions every 5 seconds
  • Multi-method depth estimation combining pose geometry and object class knowledge
  • Precise navigation instructions (forward, backward, left, right, up, down) with distance feedback (see the distance sketch after this list)
  • Last known location memory for object tracking and position prediction
  • Bilingual support (English and Nepali) for broader accessibility
  • Responsive web interface with live video feed and object detection overlays
  • Real-time WebSocket updates for status and instruction synchronization
  • Mobile-optimized interface with touch controls and auto-rotate support
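
A common way to recover rough distance from a single 2D frame is the pinhole relation between an object's known real-world height and its height in pixels, combined with the bounding-box centre for direction. The snippet below sketches that idea; the focal length, per-class heights, and thirds-based direction buckets are illustrative assumptions rather than the project's calibrated values.

```python
# Rough distance and direction from one bounding box (illustrative values only).
KNOWN_HEIGHTS_M = {"person": 1.7, "chair": 0.9, "door": 2.0}   # assumed class heights
FOCAL_LENGTH_PX = 600.0                                        # assumed webcam focal length

def estimate_distance(label: str, box_height_px: float) -> float | None:
    """Pinhole approximation: distance = real_height * focal_length / pixel_height."""
    real_h = KNOWN_HEIGHTS_M.get(label)
    if real_h is None or box_height_px <= 0:
        return None
    return real_h * FOCAL_LENGTH_PX / box_height_px

def direction_bucket(box, frame_w: int, frame_h: int) -> str:
    """Map the box centre to a coarse direction relative to the camera axis."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    horizontal = "left" if cx < frame_w / 3 else "right" if cx > 2 * frame_w / 3 else "forward"
    vertical = "up" if cy < frame_h / 3 else "down" if cy > 2 * frame_h / 3 else ""
    return f"{horizontal} and {vertical}" if vertical else horizontal
```

Under these assumed values, a chair whose bounding box is 300 px tall would be estimated at roughly 0.9 × 600 / 300 = 1.8 m away.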

Challenges & Solutions

  • Implementing accurate 3D spatial reasoning from 2D webcam feeds with depth estimation
  • Designing robust person and object tracking systems with temporal consistency
  • Calculating precise turn angles and navigation directions relative to user orientation
  • Balancing instruction frequency to be helpful without overwhelming visually impaired users
  • Optimizing YOLOv11 performance for real-time processing at ~25 FPS on standard hardware
  • Converting technical navigation data into natural, conversational language instructions (see the phrasing sketch after this list)
  • Handling varying camera angles and environmental conditions for consistent detection
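
On the phrasing point, one straightforward approach is a small templating function that turns a detected label, direction, and distance into a spoken sentence; pacing can then reuse the same interval check shown in the first sketch. The step-length conversion below is an invented convention for illustration, not Keyo's.

```python
def to_sentence(label: str, direction: str, distance_m: float | None) -> str:
    """Turn raw navigation data into a short conversational instruction."""
    if distance_m is None:
        return f"The {label} is {direction} of you."
    steps = max(1, round(distance_m / 0.7))      # assume ~0.7 m per step
    if steps == 1:
        return f"The {label} is one step {direction}."
    return f"The {label} is about {steps} steps {direction}."

# Example: to_sentence("chair", "forward", 1.8) -> "The chair is about 3 steps forward."
```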

Outcomes & Impact

  • Successfully implemented voice-activated indoor navigation for visually impaired users
  • Achieved real-time object detection and tracking across 80+ different object types
  • Built comprehensive 3D spatial navigation system with MediaPipe pose estimation
  • Delivered automatic voice guidance every 5 seconds with Google TTS integration
  • Created production-ready FastAPI backend with WebSocket support for real-time updates (a minimal sketch follows this list)
  • Developed responsive web interface optimized for desktop, mobile, and tablet devices
  • Won the Data Science/ML track at DeerHack 2025
  • Implemented bilingual support, making the system accessible to a broader user base
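
A FastAPI backend along these lines might push status to the web interface over a WebSocket roughly as follows; the /ws/status route and the get_latest_state helper are hypothetical placeholders used only to show the shape of such an endpoint, not the project's real routes.

```python
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def get_latest_state() -> dict:
    """Hypothetical placeholder for the detector's most recent output."""
    return {"target": "chair", "instruction": "The chair is about 3 steps forward."}

@app.websocket("/ws/status")
async def status_feed(websocket: WebSocket):
    """Push the current instruction to the browser UI on a fixed cadence."""
    await websocket.accept()
    try:
        while True:
            await websocket.send_json(await get_latest_state())
            await asyncio.sleep(1.0)   # UI refresh rate, independent of the 5-second voice cadence
    except WebSocketDisconnect:
        pass                           # client closed the page
```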

Technologies

Python, FastAPI, YOLOv11 (Ultralytics), MediaPipe, OpenCV, Google TTS (gTTS), React, TypeScript, WebSocket, PyTorch

Tags

Accessibility, Computer Vision, Voice Navigation, Assistive Technology