GestureOS.

Engineering a real-time gesture recognition pipeline for autonomous vehicles — then proving it outperforms physical controls.

Role: Design Technologist
Industry: Automotive HCI
Timeline: 4 Weeks
Platform: Python + Figma
Tools: MediaPipe, PyAutoGUI
Proof of Concept

It Works. Here’s the Evidence.

Collaborative Design: While I engineered the interaction logic, the visual interface was the result of a collaborative design phase. My partner and I produced two distinct UI concepts; for the final usability test, we selected my partner's design. This decision allowed me to focus entirely on the computer vision pipeline and ensured the testing measured the gesture mechanics rather than my own visual preferences.

2.9 vs 0.5
Cognitive Load (Mean Score)
Gestures demanded significantly less mental effort
2.8 / 4.0
Error Tolerance
Users found gesture correction more natural than button hunting
90%
Static Gesture Reliability
Victory sign detection in controlled conditions
Context

The Lounge Problem

As autonomous vehicles evolve into “moving lounges,” passengers will sit in reclined positions, far from the dashboard. Traditional touchscreens require users to lean forward, breaking their relaxed posture and compromising safety. Physical remotes in public fleets are often lost or unhygienic.

The question wasn't whether gesture control could work — it was whether it could outperform the familiar comfort of physical buttons. To answer this honestly, I needed to build a real system, not a faked one.

Touchless Control
Eliminate physical contact for shared fleet hygiene and universal accessibility.
Posture Preservation
Allow reclined interaction without forcing passengers to lean forward.
Real System, Real Data
Build a working CV pipeline (not Wizard of Oz) for honest validation.
Implementation

How I Built It

Instead of building from scratch or faking the interaction with a “Wizard of Oz” approach, I forked and reconstructed an open-source computer vision model to fit the specific ergonomic constraints of a car cabin.

A. Vision Pipeline

I adapted the kinivi/hand-gesture-recognition-mediapipe repository to track 21 skeletal hand landmarks in real time. I refactored the code to filter out the “noisy” data common in moving vehicles and rebuilt the gesture vocabulary by retraining the system on a custom dataset tailored to cabin interactions.
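Below is a minimal sketch of that tracking loop, assuming a single RGB webcam feed; the smoothing factor and confidence thresholds are illustrative placeholders rather than the values tuned for the cabin.

```python
# Illustrative tracking loop: MediaPipe Hands plus an exponential moving average
# to damp landmark jitter. SMOOTHING and the confidence thresholds are
# placeholder values, not the tuned cabin settings.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(
    max_num_hands=1,
    min_detection_confidence=0.7,  # stricter detection rejects jittery frames
    min_tracking_confidence=0.6,
)

SMOOTHING = 0.6   # weight given to the previous frame's landmarks
smoothed = None   # last smoothed list of 21 (x, y) points

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        raw = [(lm.x, lm.y) for lm in results.multi_hand_landmarks[0].landmark]
        if smoothed is None:
            smoothed = raw
        else:
            # blend each new landmark with its previous position to filter
            # the vibration noise expected in a moving vehicle
            smoothed = [
                (SMOOTHING * px + (1 - SMOOTHING) * x,
                 SMOOTHING * py + (1 - SMOOTHING) * y)
                for (px, py), (x, y) in zip(smoothed, raw)
            ]
        # `smoothed` (21 normalized points) is what the gesture classifier consumes
cap.release()
```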

B. Figma Integration

To enable real-time control, I customized the Figma prototype to listen for specific single-key inputs (e.g., pressing “V” triggers a search overlay). I then programmed the Python script to simulate these exact keystrokes whenever a valid gesture was detected, effectively “driving” the Figma interface remotely.
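A sketch of that keystroke bridge is below, assuming the classifier emits a string label per recognized gesture; only the “V” binding is documented above, so the other key mappings are placeholders.

```python
# Gesture label -> single-key input the Figma prototype listens for.
# Only "v" (search overlay) is documented; the other bindings are placeholders.
import pyautogui

GESTURE_HOTKEYS = {
    "victory": "v",        # opens the global search overlay
    "index_up": "up",      # scroll up
    "index_down": "down",  # scroll down
}

def dispatch(gesture_label: str) -> None:
    """Simulate the keystroke bound to a recognized gesture."""
    key = GESTURE_HOTKEYS.get(gesture_label)
    if key is not None:
        pyautogui.press(key)  # the focused Figma prototype receives the hotkey
```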

Figma prototype screens: Map View-Ride, Media-Home, and Media-Search Engine, with their interaction hotkeys

C. Input Bridge (PyAutoGUI)

I integrated the PyAutoGUI library to translate visual landmarks into system-level inputs. This allowed the Python script to take control of the mouse pointer and trigger hotkeys directly, creating a seamless bridge between the CV pipeline and the Figma prototype.
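As a sketch of the pointer side of this bridge, the snippet below maps a normalized fingertip coordinate from MediaPipe onto screen pixels; using the index fingertip for cursor control is an assumption about how the bridge was wired.

```python
# Normalized landmark (0.0-1.0) -> absolute cursor position on the prototype display.
import pyautogui

SCREEN_W, SCREEN_H = pyautogui.size()  # resolution of the screen running Figma

def move_pointer(tip_x: float, tip_y: float) -> None:
    """Move the OS cursor to the fingertip's on-screen position."""
    x = min(max(tip_x, 0.0), 1.0) * SCREEN_W   # clamp before scaling
    y = min(max(tip_y, 0.0), 1.0) * SCREEN_H
    pyautogui.moveTo(int(x), int(y), duration=0)  # instant move keeps latency low
```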

D. Physics & Sensitivity Tuning

I wrote custom logic to calculate the velocity and intensity of each gesture: a slow rotation adjusts volume incrementally, while a fast, high-intensity drag drops the volume instantly. Mimicking physical inertia in this way made the touchless interaction feel tactile and responsive.
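A minimal sketch of that velocity logic, assuming per-frame fingertip positions with monotonic timestamps; the threshold and step sizes are illustrative, not the values I settled on.

```python
# Vertical fingertip velocity -> volume step: slow motion nudges the level,
# a fast flick jumps it. Threshold and step constants are placeholders.
import time

FAST_THRESHOLD = 1.5   # normalized screen-heights per second treated as a flick
FINE_STEP = 2          # volume change for slow, deliberate motion
COARSE_STEP = 20       # volume change for fast, high-intensity motion

_prev = None           # (y, timestamp) from the previous frame

def volume_delta(tip_y: float) -> int:
    """Translate vertical fingertip velocity into a signed volume step."""
    global _prev
    now = time.monotonic()
    if _prev is None:
        _prev = (tip_y, now)
        return 0
    dy = _prev[0] - tip_y              # upward motion gives a positive delta
    dt = max(now - _prev[1], 1e-3)     # guard against a zero time step
    _prev = (tip_y, now)
    velocity = dy / dt
    step = COARSE_STEP if abs(velocity) > FAST_THRESHOLD else FINE_STEP
    return step if velocity > 0 else -step
```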

Live testing session: screen displaying the prototype with the gesture-detection webcam feed and a usability metrics sidebar

Interaction Design

Defining the Vocabulary

I developed a gesture vocabulary designed to balance intuitive control with system reliability. Three categories emerged: Static gestures (hold a pose), Dynamic gestures (directional movement), and Spatial gestures (3D manipulation).

Gesture Vocabulary

✌️ Global Search: victory sign activates the search overlay (Static)
☝️ Scroll Up: index finger swipe upward (Dynamic)
👇 Scroll Down: index finger swipe downward (Dynamic)
🖐️ Switch Tab Up: open palm raise gesture (Spatial)
🤏 Switch Tab Down: open palm lower gesture (Spatial)
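One way to encode this vocabulary, keeping the category explicit so critical actions can be pinned to the more reliable static gestures; the label strings are illustrative identifiers, not the classifier's actual class names.

```python
# Illustrative vocabulary table: gesture label -> prototype action and category.
GESTURE_VOCAB = {
    "victory":     {"action": "global_search",   "category": "static"},
    "index_up":    {"action": "scroll_up",       "category": "dynamic"},
    "index_down":  {"action": "scroll_down",     "category": "dynamic"},
    "palm_raise":  {"action": "switch_tab_up",   "category": "spatial"},
    "palm_lower":  {"action": "switch_tab_down", "category": "spatial"},
}
```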
Research

Comparative Usability Study

I conducted a comparative usability study with 10 participants to benchmark the Air Gesture system against traditional Tactile (Button) controls. To establish a realistic baseline, I simulated a physical armrest control panel using a graphic tablet equipped with mapped hotkeys and a trackpad.

Participants performed identical tasks (navigating a playlist, adjusting volume) using both methods while I measured performance and satisfaction across five usability metrics.

Test Protocol

Test protocol flowchart: structured flow from introduction through consent, warm-up, experiment tasks, error handling, and post-test questionnaires

Results

Usability Metrics — Gesture vs. Tactile Navigation
Mean scores per metric; higher ratings are more favorable.

Metric            Gesture-Based   Tactile-Based
Cognitive Load        2.9             0.5
Error Tolerance       2.8             0.8
Accessibility         2.7             2.3
Efficiency            1.9             2.0
Effectiveness         1.6             1.4

Key Findings

2.9 vs 0.5
Lower Cognitive Load
Participants preferred staying reclined rather than visually scanning for buttons.
2.8 vs 0.8
Higher Error Tolerance
Users found it easier to re-wave a hand than to locate a specific button.
90%
Static Gesture Reliability
Dynamic gestures failed in low light while static ones stayed reliable. A Wizard of Oz prototype would have missed this constraint entirely.
Reflection

Key Learnings

Build Before You Test
Using a real CV pipeline (not Wizard of Oz) revealed constraints like low-light failure that would have been invisible in a faked prototype. The code became a research instrument.
Static Over Dynamic
Static gestures (Victory sign) achieved 90% reliability while dynamic ones (swiping) degraded in motion. Gesture vocabularies should prioritize static anchors for critical actions.
Ergonomics Drive Adoption
Cognitive load data showed passengers preferred gestures not because they were faster, but because they could stay reclined. Comfort outweighed speed as the adoption driver.

Future Scope

The current prototype addresses the core gesture pipeline. Here's what's next:

01
IR Sensor Migration
Transition from RGB webcams to infrared sensors to resolve low-light tracking issues in nighttime cab scenarios.
02
Multimodal Confirmation
Integrate voice confirmation alongside gestures to reduce false positives for critical actions like emergency stop or door unlock.
03
Expanded Gesture Set
Develop two-hand gesture combinations for complex operations like split-screen map + media control simultaneously.
04
Fleet-Scale Calibration
Adapt the pipeline for varied cabin geometries across different vehicle manufacturers, ensuring gesture recognition works in any seat position.