GestureOS.

Engineering a real-time gesture recognition pipeline for autonomous vehicles — then proving it outperforms physical controls.

Role: Design Technologist
Industry: Automotive HCI
Timeline: 4 Weeks
Platform: Python + Figma
Tools: MediaPipe, PyAutoGUI
Proof of Concept

It Works. Here’s the Evidence.

Collaborative Design: While I engineered the interaction logic, the visual interface was the result of a collaborative design phase. My partner and I produced two distinct UI concepts; for the final usability test, we selected my partner's design. This decision allowed me to focus entirely on the computer vision pipeline and ensured the testing measured the gesture mechanics rather than my own visual preferences.

2.9 vs 0.5
Cognitive Load (Mean Score)
Gestures demanded significantly less mental effort
2.8 / 4.0
Error Tolerance
Users found gesture correction more natural than button hunting
90%
Static Gesture Reliability
Victory sign detection in controlled conditions
Context

The Lounge Problem

As autonomous vehicles evolve into “moving lounges,” passengers will sit in reclined positions, far from the dashboard. Traditional touchscreens require users to lean forward, breaking their relaxed posture and compromising safety. Physical remotes in public fleets are often lost or unhygienic.

The question wasn't whether gesture control could work — it was whether it could outperform the familiar comfort of physical buttons. To answer this honestly, I needed to build a real system, not a faked one.

Touchless Control
Eliminate physical contact for shared fleet hygiene and universal accessibility.
Posture Preservation
Allow reclined interaction without forcing passengers to lean forward.
Real System, Real Data
Build a working CV pipeline (not Wizard of Oz) for honest validation.
Implementation

How I Built It

Instead of building from scratch or faking the interaction with a “Wizard of Oz” approach, I forked and reconstructed an open-source computer vision model to fit the specific ergonomic constraints of a car cabin.

A. Vision Pipeline

I adapted the kinivi/hand-gesture-recognition-mediapipe repository to track 21 skeletal hand landmarks in real time. I refactored the code to filter out the “noisy” data common in moving vehicles and rebuilt the gesture vocabulary by retraining the system on a custom dataset tailored to cabin interactions.
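Below is a minimal sketch of that tracking loop, assuming a single RGB webcam feed; the smoothing factor and confidence thresholds are illustrative placeholders rather than the values tuned for the cabin.

```python
# Illustrative tracking loop: MediaPipe Hands plus an exponential moving average
# to damp landmark jitter. SMOOTHING and the confidence thresholds are
# placeholder values, not the tuned cabin settings.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(
    max_num_hands=1,
    min_detection_confidence=0.7,  # stricter detection rejects jittery frames
    min_tracking_confidence=0.6,
)

SMOOTHING = 0.6   # weight given to the previous frame's landmarks
smoothed = None   # last smoothed list of 21 (x, y) points

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        raw = [(lm.x, lm.y) for lm in results.multi_hand_landmarks[0].landmark]
        if smoothed is None:
            smoothed = raw
        else:
            # blend each new landmark with its previous position to filter
            # the vibration noise expected in a moving vehicle
            smoothed = [
                (SMOOTHING * px + (1 - SMOOTHING) * x,
                 SMOOTHING * py + (1 - SMOOTHING) * y)
                for (px, py), (x, y) in zip(smoothed, raw)
            ]
        # `smoothed` (21 normalized points) is what the gesture classifier consumes
cap.release()
```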

B. Figma Integration

To enable real-time control, I customized the Figma prototype to listen for specific single-key inputs (e.g., pressing “V” triggers a search overlay). I then programmed the Python script to simulate these exact keystrokes whenever a valid gesture was detected, effectively “driving” the Figma interface remotely.
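A sketch of that keystroke bridge is below, assuming the classifier emits a string label per recognized gesture; only the “V” binding is documented above, so the other key mappings are placeholders.

```python
# Gesture label -> single-key input the Figma prototype listens for.
# Only "v" (search overlay) is documented; the other bindings are placeholders.
import pyautogui

GESTURE_HOTKEYS = {
    "victory": "v",        # opens the global search overlay
    "index_up": "up",      # scroll up
    "index_down": "down",  # scroll down
}

def dispatch(gesture_label: str) -> None:
    """Simulate the keystroke bound to a recognized gesture."""
    key = GESTURE_HOTKEYS.get(gesture_label)
    if key is not None:
        pyautogui.press(key)  # the focused Figma prototype receives the hotkey
```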

Figma prototype screens: Map View-Ride, Media-Home, and Media-Search Engine, with their interaction hotkeys

C. Input Bridge (PyAutoGUI)

I integrated the PyAutoGUI library to translate visual landmarks into system-level inputs. This allowed the Python script to take control of the mouse pointer and trigger hotkeys directly, creating a seamless bridge between the CV pipeline and the Figma prototype.
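As a sketch of the pointer side of this bridge, the snippet below maps a normalized fingertip coordinate from MediaPipe onto screen pixels; using the index fingertip for cursor control is an assumption about how the bridge was wired.

```python
# Normalized landmark (0.0-1.0) -> absolute cursor position on the prototype display.
import pyautogui

SCREEN_W, SCREEN_H = pyautogui.size()  # resolution of the screen running Figma

def move_pointer(tip_x: float, tip_y: float) -> None:
    """Move the OS cursor to the fingertip's on-screen position."""
    x = min(max(tip_x, 0.0), 1.0) * SCREEN_W   # clamp before scaling
    y = min(max(tip_y, 0.0), 1.0) * SCREEN_H
    pyautogui.moveTo(int(x), int(y), duration=0)  # instant move keeps latency low
```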

D. Physics & Sensitivity Tuning

I wrote custom logic to calculate the velocity and intensity of each gesture: a slow rotation adjusts volume incrementally, while a fast, high-intensity drag drops the volume instantly. Mimicking physical inertia in this way made the touchless interaction feel tactile and responsive.
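A minimal sketch of that velocity logic, assuming per-frame fingertip positions with monotonic timestamps; the threshold and step sizes are illustrative, not the values I settled on.

```python
# Vertical fingertip velocity -> volume step: slow motion nudges the level,
# a fast flick jumps it. Threshold and step constants are placeholders.
import time

FAST_THRESHOLD = 1.5   # normalized screen-heights per second treated as a flick
FINE_STEP = 2          # volume change for slow, deliberate motion
COARSE_STEP = 20       # volume change for fast, high-intensity motion

_prev = None           # (y, timestamp) from the previous frame

def volume_delta(tip_y: float) -> int:
    """Translate vertical fingertip velocity into a signed volume step."""
    global _prev
    now = time.monotonic()
    if _prev is None:
        _prev = (tip_y, now)
        return 0
    dy = _prev[0] - tip_y              # upward motion gives a positive delta
    dt = max(now - _prev[1], 1e-3)     # guard against a zero time step
    _prev = (tip_y, now)
    velocity = dy / dt
    step = COARSE_STEP if abs(velocity) > FAST_THRESHOLD else FINE_STEP
    return step if velocity > 0 else -step
```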

Live testing session: screen displaying the prototype with the gesture-detection webcam feed and a usability metrics sidebar

Interaction Design

Defining the Vocabulary

I developed a gesture vocabulary designed to balance intuitive control with system reliability. Three categories emerged: Static gestures (hold a pose), Dynamic gestures (directional movement), and Spatial gestures (3D manipulation).

Gesture Vocabulary

✌️ Global Search: victory sign activates the search overlay (Static)
☝️ Scroll Up: index finger swipe upward (Dynamic)
👇 Scroll Down: index finger swipe downward (Dynamic)
🖐️ Switch Tab Up: open palm raise gesture (Spatial)
🤏 Switch Tab Down: open palm lower gesture (Spatial)
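One way to encode this vocabulary, keeping the category explicit so critical actions can be pinned to the more reliable static gestures; the label strings are illustrative identifiers, not the classifier's actual class names.

```python
# Illustrative vocabulary table: gesture label -> prototype action and category.
GESTURE_VOCAB = {
    "victory":     {"action": "global_search",   "category": "static"},
    "index_up":    {"action": "scroll_up",       "category": "dynamic"},
    "index_down":  {"action": "scroll_down",     "category": "dynamic"},
    "palm_raise":  {"action": "switch_tab_up",   "category": "spatial"},
    "palm_lower":  {"action": "switch_tab_down", "category": "spatial"},
}
```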
Research

Comparative Usability Study

I conducted a comparative usability study with 10 participants to benchmark the Air Gesture system against traditional Tactile (Button) controls. To establish a realistic baseline, I simulated a physical armrest control panel using a graphic tablet equipped with mapped hotkeys and a trackpad.

Participants performed identical tasks (navigating a playlist, adjusting volume) using both methods while I measured performance and satisfaction across five usability metrics.

Test Protocol

Test protocol flowchart: structured flow from introduction through consent, warm-up, experiment tasks, error handling, and post-test questionnaires

Results

Usability Metrics — Gesture vs. Tactile Navigation
Mean scores per metric; higher ratings are more favorable.

Metric            Gesture-Based   Tactile-Based
Cognitive Load        2.9             0.5
Error Tolerance       2.8             0.8
Accessibility         2.7             2.3
Efficiency            1.9             2.0
Effectiveness         1.6             1.4

Key Findings

2.9 vs 0.5
Lower Cognitive Load
Participants preferred staying reclined rather than visually scanning for buttons.
2.8 vs 0.8
Higher Error Tolerance
Users found it easier to re-wave a hand than to locate a specific button.
90%
Static Gesture Reliability
Dynamic gestures failed in low light while static ones stayed reliable. A Wizard of Oz prototype would have missed this constraint entirely.
Reflection

Key Learnings

Build Before You Test
Using a real CV pipeline (not Wizard of Oz) revealed constraints like low-light failure that would have been invisible in a faked prototype. The code became a research instrument.
Static Over Dynamic
Static gestures (Victory sign) achieved 90% reliability while dynamic ones (swiping) degraded in motion. Gesture vocabularies should prioritize static anchors for critical actions.
Ergonomics Drive Adoption
Cognitive load data showed passengers preferred gestures not because they were faster, but because they could stay reclined. Comfort outweighed speed as the adoption driver.

Future Scope

The current prototype addresses the core gesture pipeline. Here's what's next:

01
IR Sensor Migration
Transition from RGB webcams to infrared sensors to resolve low-light tracking issues in nighttime cab scenarios.
02
Multimodal Confirmation
Integrate voice confirmation alongside gestures to reduce false positives for critical actions like emergency stop or door unlock.
03
Expanded Gesture Set
Develop two-hand gesture combinations for complex operations like split-screen map + media control simultaneously.
04
Fleet-Scale Calibration
Adapt the pipeline for varied cabin geometries across different vehicle manufacturers, ensuring gesture recognition works in any seat position.