Machine Learning
July 2025

Keyo - Bringing Eyes to Voices

🏆 Track Winner in Data Science/ML, DeerHack 2025

A computer vision assistant that provides real-time directional feedback to visually impaired users. Making the world easier to navigate, one voice at a time.

About the Project

Keyo is a voice-activated indoor navigation system designed to empower visually impaired individuals with real-time object detection and spatial guidance. Built for DeerHack 2025, this assistive technology combines YOLOv11 object detection, MediaPipe pose estimation, and Google Text-to-Speech to enable hands-free navigation through voice commands. The system processes webcam feeds to detect 80+ object types, calculates 3D spatial positioning, and delivers precise turn-by-turn voice instructions every 5 seconds. With support for both English and Nepali languages, Keyo transforms indoor environments into navigable spaces through natural language interaction and continuous audio feedback.
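
At its core this is a detect-locate-speak loop. The sketch below is a minimal illustration of how such a loop might be wired up with the Ultralytics YOLO and gTTS packages; the describe_position helper, the hard-coded "chair" target, and the file-based audio handoff are simplifying assumptions for illustration, not Keyo's actual code.

```python
# Minimal detect -> locate -> speak loop (simplified sketch, assumptions noted above).
import time
import cv2
from ultralytics import YOLO
from gtts import gTTS

model = YOLO("yolo11n.pt")           # YOLOv11 nano weights; larger variants also work
cap = cv2.VideoCapture(0)            # webcam feed
INSTRUCTION_INTERVAL = 5.0           # seconds between spoken instructions
last_spoken = 0.0
target = "chair"                     # object requested by a voice command

def describe_position(box, frame_shape):
    """Hypothetical helper: turn a bounding box into a rough direction phrase."""
    h, w = frame_shape[:2]
    cx = (box[0] + box[2]) / 2
    if cx < w / 3:
        return "slightly to your left"
    if cx > 2 * w / 3:
        return "slightly to your right"
    return "straight ahead"

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]
    for box, cls in zip(result.boxes.xyxy.tolist(), result.boxes.cls.tolist()):
        if result.names[int(cls)] == target and time.monotonic() - last_spoken > INSTRUCTION_INTERVAL:
            phrase = f"The {target} is {describe_position(box, frame.shape)}."
            gTTS(phrase, lang="en").save("instruction.mp3")   # audio is played back by the app
            last_spoken = time.monotonic()
```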

Key Features

  • Voice-activated navigation with hands-free interaction for 80+ object types
  • Real-time object detection using YOLOv11 with CUDA acceleration support
  • 3D spatial positioning with accurate distance and direction calculations using MediaPipe pose estimation
  • Turn-by-turn voice guidance with automatic TTS instructions every 5 seconds
  • Multi-method depth estimation combining pose geometry and object class knowledge
  • Precise navigation instructions (forward, backward, left, right, up, down) with distance feedback (see the distance sketch after this list)
  • Last known location memory for object tracking and position prediction
  • Bilingual support (English and Nepali) for broader accessibility
  • Responsive web interface with live video feed and object detection overlays
  • Real-time WebSocket updates for status and instruction synchronization
  • Mobile-optimized interface with touch controls and auto-rotate support
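
A common way to recover rough distance from a single 2D frame is the pinhole relation between an object's known real-world height and its height in pixels, combined with the bounding-box centre for direction. The snippet below sketches that idea; the focal length, per-class heights, and thirds-based direction buckets are illustrative assumptions rather than the project's calibrated values.

```python
# Rough distance and direction from one bounding box (illustrative values only).
KNOWN_HEIGHTS_M = {"person": 1.7, "chair": 0.9, "door": 2.0}   # assumed class heights
FOCAL_LENGTH_PX = 600.0                                        # assumed webcam focal length

def estimate_distance(label: str, box_height_px: float) -> float | None:
    """Pinhole approximation: distance = real_height * focal_length / pixel_height."""
    real_h = KNOWN_HEIGHTS_M.get(label)
    if real_h is None or box_height_px <= 0:
        return None
    return real_h * FOCAL_LENGTH_PX / box_height_px

def direction_bucket(box, frame_w: int, frame_h: int) -> str:
    """Map the box centre to a coarse direction relative to the camera axis."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    horizontal = "left" if cx < frame_w / 3 else "right" if cx > 2 * frame_w / 3 else "forward"
    vertical = "up" if cy < frame_h / 3 else "down" if cy > 2 * frame_h / 3 else ""
    return f"{horizontal} and {vertical}" if vertical else horizontal
```

Under these assumed values, a chair whose bounding box is 300 px tall would be estimated at roughly 0.9 × 600 / 300 = 1.8 m away.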

Challenges & Solutions

  • Implementing accurate 3D spatial reasoning from 2D webcam feeds with depth estimation
  • Designing robust person and object tracking systems with temporal consistency
  • Calculating precise turn angles and navigation directions relative to user orientation
  • Balancing instruction frequency to be helpful without overwhelming visually impaired users
  • Optimizing YOLOv11 performance for real-time processing at ~25 FPS on standard hardware
  • Converting technical navigation data into natural, conversational language instructions (see the phrasing sketch after this list)
  • Handling varying camera angles and environmental conditions for consistent detection
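
On the phrasing point, one straightforward approach is a small templating function that turns a detected label, direction, and distance into a spoken sentence; pacing can then reuse the same interval check shown in the first sketch. The step-length conversion below is an invented convention for illustration, not Keyo's.

```python
def to_sentence(label: str, direction: str, distance_m: float | None) -> str:
    """Turn raw navigation data into a short conversational instruction."""
    if distance_m is None:
        return f"The {label} is {direction} of you."
    steps = max(1, round(distance_m / 0.7))      # assume ~0.7 m per step
    if steps == 1:
        return f"The {label} is one step {direction}."
    return f"The {label} is about {steps} steps {direction}."

# Example: to_sentence("chair", "forward", 1.8) -> "The chair is about 3 steps forward."
```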

Outcomes & Impact

  • Successfully implemented voice-activated indoor navigation for visually impaired users
  • Achieved real-time object detection and tracking across 80+ different object types
  • Built comprehensive 3D spatial navigation system with MediaPipe pose estimation
  • Delivered automatic voice guidance every 5 seconds with Google TTS integration
  • Created production-ready FastAPI backend with WebSocket support for real-time updates (a minimal sketch follows this list)
  • Developed responsive web interface optimized for desktop, mobile, and tablet devices
  • Won the Data Science/ML track at DeerHack 2025
  • Implemented bilingual support, making the system accessible to a broader user base
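
A FastAPI backend along these lines might push status to the web interface over a WebSocket roughly as follows; the /ws/status route and the get_latest_state helper are hypothetical placeholders used only to show the shape of such an endpoint, not the project's real routes.

```python
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def get_latest_state() -> dict:
    """Hypothetical placeholder for the detector's most recent output."""
    return {"target": "chair", "instruction": "The chair is about 3 steps forward."}

@app.websocket("/ws/status")
async def status_feed(websocket: WebSocket):
    """Push the current instruction to the browser UI on a fixed cadence."""
    await websocket.accept()
    try:
        while True:
            await websocket.send_json(await get_latest_state())
            await asyncio.sleep(1.0)   # UI refresh rate, independent of the 5-second voice cadence
    except WebSocketDisconnect:
        pass                           # client closed the page
```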

Technologies

Python, FastAPI, YOLOv11 (Ultralytics), MediaPipe, OpenCV, Google TTS (gTTS), React, TypeScript, WebSocket, PyTorch

Tags

Accessibility, Computer Vision, Voice Navigation, Assistive Technology