Exploring the Basics: How Our Screen-Free Voice-Activated AI Device Works

Exploring the Basics: How Our Screen-Free Voice-Activated AI Device Works

Hello everyone! Today, we're exploring an innovative device that allows us to interact directly with advanced artificial intelligence models using voice, without the need for computers, smartphones, or other screen-equipped devices.

How does it work?

This device is built on a very simple and mature solution, utilizing the esp32 chip. In addition, we have incorporated components like a microphone, speaker, operational buttons, and a battery, forming an independent hardware unit.

As for the server-side software, it can run on a local computer, NAS, or cloud services similar to AWS. This software primarily handles three tasks: Speech-to-Text (STT), invoking Large Language Models (LLM), and Text-to-Speech (TTS).

When you speak to the device, the microphone captures your voice and sends the audio file to the server. The server first uses the STT model to convert the audio into text, which is then fed into a configured large language model. Once the model processes this input, it outputs text that the server converts into speech using the TTS model and sends back to the hardware device. Finally, the device plays the audio through its speaker.

This completes a full cycle of interaction!

Imagine what else we could do with this setup?

  • We can integrate a variety of large language models, set different prompts to simulate various characters, and create diverse dialogue content.
  • By utilizing tools like Dify or FastGPT, we can access custom knowledge bases or combine APIs to access more information.
  • We can also configure different TTS models to reproduce various languages and voices, or even use tools like ElevenLabs to clone your own voice, allowing the device to sound just like you.

This is the smart voice interactive device we're introducing today. With it, our ways of communicating with large language models will become more varied and interesting!

Back to blog