
Custom Voice Assistant Device

As Amazon's Echo device began growing in popularity, I worked on several projects exploring the potential of voice interaction across a variety of contexts. However, the consumer device was not suited to our needs. To enable broader experimentation, I built a custom voice assistant device designed for more industrial and operational contexts.

Images: the voice assistant hardware and the device in operation
Scenario

At Amazon, there was growing interest in evaluating audio-based conversational interfaces for fulfillment center managers and associates. Initially, there was an assumption that an off-the-shelf Echo device could quickly demonstrate how voice could streamline tasks typically performed via touchscreen interfaces.

However, to conduct meaningful, ongoing voice research, we needed a more flexible solution: one better suited to the noise and complexity of a warehouse environment, and one capable of delivering controlled, scriptable experiences closer to bespoke chatbots than to the Alexa ecosystem.

Approach

Early in planning, I identified two primary requirements: the device had to function reliably in the noisy fulfillment center environment, and it needed to be mountable to the 8020 aluminum extrusion hardware used in workstations.

With no aesthetic constraints, I focused purely on functionality and flexibility. I selected a Raspberry Pi 3B+ as the core, paired with the MATRIX Creator development board for I/O, which provided a radial array of eight MEMS microphones and an addressable LED ring for "voice chrome" visual feedback mimicking the Echo.
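
For a sense of how the LED ring was driven, here is a minimal sketch of the listening animation. It assumes the matrix-lite-js Node wrapper and its led.set call; the project's actual LED code may have used a different library or lower-level calls.

// Sketch: sweep a single blue LED around the MATRIX Creator ring while "listening".
// Assumes the matrix-lite-js wrapper (@matrix-io/matrix-lite).
const matrix = require("@matrix-io/matrix-lite");

let step = 0;
const listening = setInterval(() => {
  // One lit LED orbits the ring, Echo-style; all others stay dark.
  const colors = Array.from({ length: matrix.led.length }, (_, i) =>
    i === step % matrix.led.length ? "blue" : "black"
  );
  matrix.led.set(colors);
  step++;
}, 50);

// Clear the ring when the interaction ends.
function stopListening() {
  clearInterval(listening);
  matrix.led.set("black");
}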

For audio output, I connected a small USB speaker housed in a custom 3D-printed parabolic enclosure. A standard Raspberry Pi 7" touchscreen display served as the administration interface, and the entire assembly was mounted to a pivoting third-party case compatible with extrusion frameworks.

The software stack consisted of a local Node.js server handling audio input and output, paired with a custom Bootstrap web interface for the touchscreen. The server used Amazon Polly for speech synthesis and integrated Amazon Lex with custom Lambda functions to support flexible chatbot interactions.
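
As a rough sketch of how the server tied those services together, the snippet below forwards a transcribed utterance to a Lex bot and synthesizes the reply with Polly. It assumes the AWS SDK for JavaScript v2 and a Lex V1 bot; the bot name, alias, region, voice, and file path are placeholders rather than the project's actual configuration.

// Sketch: send a transcribed utterance to a Lex bot, then speak the reply with Polly.
// Assumes aws-sdk v2; names and paths below are placeholders.
const AWS = require("aws-sdk");
const fs = require("fs");

const lex = new AWS.LexRuntime({ region: "us-east-1" });
const polly = new AWS.Polly({ region: "us-east-1" });

async function respond(utterance, sessionId) {
  // Forward the user's text to the chatbot and capture the bot's reply.
  const lexResponse = await lex.postText({
    botName: "FCAssistant",   // placeholder bot name
    botAlias: "prod",         // placeholder alias
    userId: sessionId,
    inputText: utterance,
  }).promise();

  // Turn the reply text into speech.
  const speech = await polly.synthesizeSpeech({
    OutputFormat: "mp3",
    VoiceId: "Joanna",        // placeholder voice
    Text: lexResponse.message,
  }).promise();

  // Write the audio to disk for playback through the USB speaker.
  fs.writeFileSync("/tmp/reply.mp3", speech.AudioStream);
  return lexResponse;
}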

Result

The modular hardware and software architecture enabled rapid iteration on both physical and conversational design. Over a two-year period, the device was used in multiple experiments, supporting research ranging from basic tests of environmental noise effects to sophisticated new-hire training chatbots. It also became a regular fixture in internal presentations, demos, and media, helping to evangelize my team's capabilities and the broader potential of voice interaction across Amazon fulfillment operations.