This £15 Device Lets Kids Safely Ask AI Anything (Home Assistant + ESP32)
Note: This article was generated from the transcript of one of my YouTube videos
Introduction
Imagine a small, affordable device that lets your children safely explore the world of artificial intelligence. I've put together a system that allows kids to ask AI virtually anything, with built-in safeguards that keep responses age-appropriate and give you peace of mind. The project combines Home Assistant, a versatile ESP32 microcontroller, and a clever AI proxy to create a truly unique and educational tool.
The Core Components: Building Blocks of Safe AI Interaction
At its heart, this project is a symphony of five key components working in harmony. First, we have the M5Stack Atom Echo, a compact ESP32 development board that serves as our physical interface, complete with a microphone and a small speaker. This little device is the gateway for your child's voice. Next, we introduce LiteLLM, an AI switchboard that acts as a proxy, routing every query to the AI model of your choice.
The true orchestrator of this system is Home Assistant, the central hub that connects all the moving parts. For the "brain" of our operation, I've opted for Google's Gemini 2.5 Flash, a powerful yet efficient large language model. Finally, we need a way to hear the AI's responses, so we'll route the audio output to a preferred speaker, such as a Google Home device in your living room, or any other speaker accessible through Home Assistant.
graph TB
    subgraph "Hardware Layer"
        ATOM[M5Stack Atom Echo<br/>192.168.0.84:6053]
        SPEAKER[Living Room Speaker]
    end
    subgraph "ESPHome Layer"
        ESP[ESPHome Server<br/>192.168.0.104:6052]
        WAKE[Micro Wake Word<br/>okay_nabu]
        MIC[Microphone I2S]
        LED[Status LED]
    end
    subgraph "Home Assistant Layer"
        HA[Home Assistant]
        HACS[Extended OpenAI<br/>Conversation]
        TTS_HA[Home Assistant<br/>Cloud TTS]
        AUTO1[Play ESPHome TTS<br/>Automation]
        AUTO2[Play ESPHome TTS Error<br/>Automation]
    end
    subgraph "LiteLLM Stack"
        LITE[LiteLLM Proxy<br/>192.168.0.104:4000]
        PG[(PostgreSQL<br/>Database)]
        LANGFUSE[Langfuse Cloud<br/>Monitoring]
    end
    subgraph "AI Backend"
        GEMINI[Gemini 2.5 Flash]
    end
    ATOM <--> ESP
    ESP --> WAKE
    ESP --> MIC
    ESP --> LED
    ESP <--> HA
    HA <--> HACS
    HACS <--> LITE
    LITE <--> PG
    LITE --> LANGFUSE
    LITE <--> GEMINI
    AUTO1 --> SPEAKER
    AUTO2 --> TTS_HA
    TTS_HA --> SPEAKER
    style ATOM fill:#593559,stroke:#a3a3a3,stroke-width:2px
    style LITE fill:#414159,stroke:#a3a3a3,stroke-width:2px
    style HA fill:#415941,stroke:#a3a3a3,stroke-width:2px
    style GEMINI fill:#595941,stroke:#a3a3a3,stroke-width:2px
LiteLLM Docker Compose file
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: litellm
    ports:
      - "4000:4000"
    volumes:
      - /volume1/docker/litellm/data:/app/data
    environment:
      - LITELLM_DROP_PARAMS=True
      - DATABASE_URL=postgresql://litellm:xxx@postgres:5432/litellm
      - STORE_MODEL_IN_DB=True
      - LITELLM_MASTER_KEY=sk-xxx
      - UI_USERNAME=admin
      - UI_PASSWORD=xxx
      - LANGFUSE_PUBLIC_KEY=pk-xxx
      - LANGFUSE_SECRET_KEY=sk-xxx
      - LANGFUSE_HOST=https://cloud.langfuse.com
    command: --port 4000 --detailed_debug
    restart: unless-stopped
    depends_on:
      - postgres
    networks:
      - litellm-network

  postgres:
    image: postgres:15
    container_name: litellm-postgres
    environment:
      - POSTGRES_DB=litellm
      - POSTGRES_USER=litellm
      - POSTGRES_PASSWORD=xxx
    volumes:
      - /volume1/docker/litellm/postgres-data:/var/lib/postgresql/data
    restart: unless-stopped
    networks:
      - litellm-network

networks:
  litellm-network:
    driver: bridge
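Because STORE_MODEL_IN_DB is enabled, you can add models through the LiteLLM admin UI once the container is up. Alternatively, models can be declared in a config.yaml mounted into the container; here's a minimal sketch, assuming you expose your Gemini key as a GEMINI_API_KEY environment variable (the model alias is your choice):

model_list:
  - model_name: gemini-2.5-flash
    litellm_params:
      # the "gemini/" prefix routes the request to the Google AI Studio API
      model: gemini/gemini-2.5-flash
      api_key: os.environ/GEMINI_API_KEY

Whichever alias you pick under model_name is the chat model name you'll enter later in the Extended OpenAI Conversation settings.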
Ensuring Safety: The Pre-Prompt is Key
The most critical aspect of this setup is ensuring that the AI's interactions with your children are always safe and appropriate. This is achieved by implementing a carefully crafted pre-prompt within the Extended OpenAI Conversation integration in Home Assistant. This pre-prompt acts as a set of strict instructions for the AI, guiding its behaviour and content.
This pre-prompt explicitly defines the AI’s persona as a helpful, kind, and encouraging assistant for children, emphasising simple language and age-appropriateness. Crucially, it includes a directive to never discuss inappropriate topics. If a question veers into sensitive territory, the AI is instructed to politely decline, stating, “You asked [question], but I can’t answer that. Ask Mum or Dad.” This ensures that any potentially problematic queries are flagged and brought to your attention, rather than being answered directly by the AI.
Furthermore, the pre-prompt includes a directive to always repeat the question back before providing an answer. This feature is incredibly useful, especially for younger children. If the AI misinterprets a question, the child can hear the repeated question and realise the misunderstanding, giving them an opportunity to rephrase and try again.
This is my prompt
You are a helpful AI assistant for children.
Keep all responses age-appropriate, educational, kind, and encouraging.
Use simple language that a child can understand.
Never discuss inappropriate topics; if a topic is considered inappropriate, just simply say
"You asked {question}, but I can't answer that, ask Mum or Dad".
When answering, always repeat the question back before giving the answer.
Say "You asked {question}, here's my answer: {answer}"
Integrating with Home Assistant: The Extended OpenAI Conversation
To bring the AI's capabilities into Home Assistant, we'll utilise the Extended OpenAI Conversation custom integration. If you don't already have the Home Assistant Community Store (HACS) set up, it's a straightforward process that I won't detail here, but it's highly recommended for easily installing custom integrations. Once HACS is ready, install "Extended OpenAI Conversation" through it.
After installation, navigate to Settings > Devices & Services and add the integration. You'll need to configure it to point to your LiteLLM instance, providing the LiteLLM API key. This is where you'll also input the crucial pre-prompt we discussed earlier. You can also adjust the chat model to match your LiteLLM configuration and fine-tune the "temperature" setting. A lower temperature results in more factual and predictable responses, while a higher temperature allows for more creativity. For a child-friendly experience, a moderate temperature like 0.5 often strikes a good balance.
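For reference, my configuration would look roughly like this (the address and key come from the LiteLLM Docker Compose file above; treat every value as a placeholder for your own):

Name: LiteLLM
Base URL: http://192.168.0.104:4000/v1
API Key: sk-xxx (your LITELLM_MASTER_KEY)
Chat Model: gemini-2.5-flash
Temperature: 0.5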
Home Assistant Voice
Once the Extended OpenAI Conversation integration is set up, you need to pull it into a voice assistant. You can do this by navigating to Settings > Voice Assistants and adding a new one. This is the assistant you'll later select for your ESP32 device in Home Assistant, so give it a sensible, identifiable name.
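As a rough outline, the finished pipeline looks something like this (the names are whatever you chose earlier; the speech services are covered further down):

Name: Kids AI Gateway
Conversation agent: Extended OpenAI Conversation (the LiteLLM-backed one)
Speech-to-text: Home Assistant Cloud
Text-to-speech: Home Assistant Cloud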
Setting Up the ESP32: Your Voice-Activated Assistant
The physical device your children will interact with is an ESP32, specifically the M5Stack Atom Echo in this case. We'll be using ESPHome to program it, handling wake word detection and capturing voice input. To get started, you'll typically spin up an ESPHome Docker container, similar to how we manage the other services.
services:
  esphome:
    container_name: esphome
    image: ghcr.io/esphome/esphome:latest
    volumes:
      - /volume1/docker/esphome/config:/config
    restart: unless-stopped
    # Host networking is needed for mDNS device discovery, and it already
    # exposes the dashboard on port 6052, so a separate ports mapping is
    # unnecessary (and conflicts with host mode).
    network_mode: host
Once ESPHome is running, you'll likely need to access its dashboard directly via its IP address and port, as reverse proxies can sometimes cause discovery issues. ESPHome might auto-discover your M5Stack; if so, you can take control of it. The core of the ESPHome configuration lies in its YAML file, which defines the Wi-Fi credentials, API encryption key, and various settings for the device. I've provided a comprehensive example below, which you can copy and adapt. Remember to update your secrets for the Wi-Fi credentials and generate your own API encryption key from the ESPHome website.
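The Wi-Fi secrets referenced below live in a secrets.yaml file in the ESPHome config directory (the one mounted in the Docker Compose file above); the values here are placeholders:

# /volume1/docker/esphome/config/secrets.yaml
wifi_ssid: "YourNetworkName"
wifi_password: "YourNetworkPassword"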
My ESPHome YAML
esphome:
  name: m5stack-atom-echo-01cf94
  friendly_name: hallway-atom-echo
  platformio_options:
    board_build.flash_mode: dio

esp32:
  board: m5stack-atom
  framework:
    type: esp-idf
    version: recommended

logger:

api:
  encryption:
    # Replace with your own key (see the note above)
    key: bLXT20skmjfgWk1MwC+YI7salc55h2JvdSCH3WWuXCw=

ota:
  - platform: esphome

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

# Shared I2S bus for the Atom Echo's microphone
i2s_audio:
  - id: i2s_shared
    i2s_lrclk_pin: GPIO33
    i2s_bclk_pin: GPIO19

microphone:
  - platform: i2s_audio
    i2s_audio_id: i2s_shared
    id: echo_microphone
    adc_type: external
    i2s_din_pin: GPIO23
    channel: left
    pdm: true

# On-device wake word detection: flash the LED yellow, then start the assistant
micro_wake_word:
  models:
    - model: okay_nabu
  on_wake_word_detected:
    - logger.log: "Wake word detected!"
    - light.turn_on:
        id: led
        brightness: 100%
        red: 100%
        green: 100%
        blue: 0%
    - delay: 500ms
    - voice_assistant.start:

# The front button toggles listening, as an alternative to the wake word
binary_sensor:
  - platform: gpio
    pin:
      number: GPIO39
      inverted: true
    name: Button
    disabled_by_default: true
    entity_category: diagnostic
    id: echo_button
    on_multi_click:
      - timing:
          - ON for at most 350ms
          - OFF for at least 10ms
        then:
          - if:
              condition: voice_assistant.is_running
              then:
                - voice_assistant.stop:
              else:
                - voice_assistant.start:

voice_assistant:
  microphone: echo_microphone
  # Wake word handling is done by micro_wake_word above, not the pipeline
  use_wake_word: false
  auto_gain: 31dBFS
  volume_multiplier: 4.0
  on_client_connected:
    - logger.log: "Client connected, starting wake word detection"
    - light.turn_on:
        id: led
        brightness: 100%
        red: 100%
        green: 0%
        blue: 0%
    - delay: 2s
    - light.turn_off: led
    - delay: 1s
    - micro_wake_word.start:
  on_client_disconnected:
    - logger.log: "Client disconnected, stopping wake word"
    - micro_wake_word.stop:
  on_listening:
    - logger.log: "Voice assistant listening"
    - light.turn_on:
        id: led
        brightness: 100%
        red: 0%
        green: 0%
        blue: 100%
  on_stt_end:
    - logger.log: "STT ended"
    - light.turn_on:
        id: led
        brightness: 100%
        red: 0%
        green: 100%
        blue: 0%
  # Inspect the TTS text for error strings and raise a Home Assistant
  # event instead of letting the raw error be spoken
  on_tts_start:
    - if:
        condition:
          lambda: |-
            std::string response = x;
            return response.find("Error code:") != std::string::npos ||
                   response.find("error") != std::string::npos;
        then:
          - logger.log: "Error detected in response"
          - if:
              condition:
                lambda: |-
                  std::string response = x;
                  return response.find("429") != std::string::npos;
              then:
                - homeassistant.event:
                    event: esphome.voice_error
                    data:
                      error_type: rate_limit
              else:
                - homeassistant.event:
                    event: esphome.voice_error
                    data:
                      error_type: general
        else:
          - logger.log: "Normal response, will play TTS"
  # Hand the TTS audio URL to Home Assistant so an automation can
  # play it on another speaker
  on_tts_end:
    - logger.log: "TTS ended, raising event with output URL"
    - light.turn_on:
        id: led
        brightness: 100%
        red: 100%
        green: 100%
        blue: 0%
    - homeassistant.event:
        event: esphome.play.tts
        data:
          url: !lambda 'return x;'
  on_end:
    - logger.log: "Voice assistant ended, restarting wake word"
    - delay: 100ms
    - light.turn_off: led
    - micro_wake_word.start:
  on_error:
    - logger.log:
        format: "Voice assistant error: %s"
        args: ['message.c_str()']
    - light.turn_on:
        id: led
        brightness: 100%
        red: 100%
        green: 0%
        blue: 0%
    - delay: 1s
    - light.turn_off: led
    - homeassistant.event:
        event: esphome.voice_error
    - delay: 500ms
    - micro_wake_word.start:

# The Atom Echo's single SK6812 RGB LED, used for status feedback
light:
  - platform: esp32_rmt_led_strip
    id: led
    name: "hallway-atom-echo LED"
    pin: GPIO27
    default_transition_length: 0s
    chipset: SK6812
    num_leds: 1
    rgb_order: GRB
After flashing the ESPHome firmware to your M5Stack, you’ll add it as an integration within Home Assistant. This connection allows Home Assistant to control the ESP32, including setting the voice assistant and configuring speech detection. For children, it’s beneficial to set the “finished speaking detection” to a more relaxed setting, allowing for natural pauses without prematurely ending the voice input.
Orchestrating the Conversation: Voice Assistant and Automations
With the ESP32 integrated into Home Assistant, we can now configure the voice assistant. You can name it something intuitive like "Kids AI Gateway." The key is to select the Extended OpenAI Conversation agent you set up earlier, which routes everything through LiteLLM. For speech-to-text and text-to-speech, you can leverage Home Assistant Cloud's Nabu Casa services, which work reliably, or explore other options if you prefer.
The magic truly happens with Home Assistant automations. We'll set up two primary automations: one for the "happy path" (successful responses) and another for handling errors. The happy path automation is triggered by the esphome.play.tts event raised by the device and plays the AI's audio response on your chosen speaker; a small template condition skips playback if the error automation has fired within the last couple of seconds, so the raw error audio isn't played on top of the child-friendly message.
Happy path automation
alias: Play ESPHome TTS
description: ""
triggers:
  - event_type: esphome.play.tts
    trigger: event
conditions:
  # Skip playback if the error automation fired within the last two
  # seconds (use the entity_id of your own error automation here)
  - condition: template
    value_template: >
      {% set last_error =
      state_attr('automation.play_esphome_tts_error', 'last_triggered') %}
      {{ last_error is none or (as_timestamp(now()) -
      as_timestamp(last_error)) > 2 }}
actions:
  - target:
      entity_id: media_player.living_room_speaker
    data:
      media:
        media_content_id: "{{ trigger.event.data.url }}"
        media_content_type: audio/mpeg
        metadata: {}
    action: media_player.play_media
The error handling automation is a bit more nuanced. It's designed to catch specific error codes, like a "429 Too Many Requests" error, and translate them into child-friendly messages. Instead of relaying a technical error, it might say, "That's all the questions I can answer for today. Try again tomorrow!" For any other unexpected errors, it defaults to "There was a problem generating content. Ask Dad instead!", which effectively alerts you to an issue.
Error handling automation
alias: Play ESPHome TTS Error
description: ""
triggers:
  - event_type: esphome.voice_error
    trigger: event
actions:
  - choose:
      - conditions:
          - condition: template
            value_template: "{{ trigger.event.data.error_type == 'rate_limit' }}"
        sequence:
          - target:
              entity_id: tts.home_assistant_cloud
            data:
              media_player_entity_id: media_player.living_room_speaker
              message: >-
                That's all the questions I can answer for today. Try again
                tomorrow!
              cache: false
            action: tts.speak
    default:
      - target:
          entity_id: tts.home_assistant_cloud
        data:
          media_player_entity_id: media_player.living_room_speaker
          message: There was a problem generating content. Ask Dad instead!
          cache: false
        action: tts.speak
flowchart TD
    Start[TTS Start Event] --> Check{Response contains<br/>'Error code:' or 'error'?}
    Check -->|No| Normal[Normal TTS Playback]
    Check -->|Yes| Error429{Response contains '429'?}
    Error429 -->|Yes| RateLimit[Fire Event:<br/>esphome.voice_error<br/>error_type: rate_limit]
    Error429 -->|No| GenError[Fire Event:<br/>esphome.voice_error<br/>error_type: general]
    RateLimit --> Auto1[Automation Triggered]
    GenError --> Auto1
    Auto1 --> CheckType{Check error_type}
    CheckType -->|rate_limit| RLMsg["Generate TTS:<br/>'That's all the questions...'"]
    CheckType -->|general| GenMsg["Generate TTS:<br/>'Ask Dad instead!'"]
    RLMsg --> PlayTTS[Play on Living Room Speaker]
    GenMsg --> PlayTTS
    Normal --> PlayURL[Play TTS URL on Speaker]
    PlayTTS --> End[Return to Idle]
    PlayURL --> End
    style Check fill:#595050
    style Error429 fill:#595050
    style RateLimit fill:#594747
    style GenError fill:#594747
    style Normal fill:#505950
Visual Feedback and Logging: Enhancing the Experience
To make the interaction even more intuitive for children, the ESP32's LED provides visual cues: a brief yellow flash indicates wake word detection, a blue light means the device is listening, and a green light confirms the question has been recognised and processing has begun. If an error occurs, the LED turns red, signalling that the child should try again.
Beyond visual feedback, keeping a log of all interactions is invaluable. This is where Langfuse comes into play. By integrating LiteLLM with Langfuse, you gain a powerful tool for debugging and monitoring AI conversations. Langfuse provides detailed traces of each interaction, showing the system prompt, the user's question, and the AI's response. This allows you to review what your children are asking and how the AI is responding, offering insights and ensuring the safety measures are working as intended. Setting up Langfuse involves creating a project, generating API keys, and configuring LiteLLM to send its logs to the Langfuse cloud service.
If you set this up, you will be able to see everything that your kids ask, and the response that was given.
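If you manage LiteLLM through a config.yaml rather than the UI, turning on the Langfuse callback looks roughly like this (a sketch; the keys themselves are picked up from the LANGFUSE_* environment variables in the Docker Compose file above):

litellm_settings:
  # send every successful and failed LLM call to Langfuse
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]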
stateDiagram-v2
[*] --> Idle: System ready
Idle --> WakeDetected: Wake word "okay nabu"
WakeDetected --> Listening: voice_assistant.start()
Listening --> Processing: STT completed
Processing --> TTSReady: AI response received
TTSReady --> Playing: Media playing on speaker
Playing --> Idle: Playback complete
Listening --> Error: Voice assistant error
Processing --> Error: AI/Network error
Error --> Idle: After 1s delay
note right of Idle
LED: OFF
Wake word detection active
end note
note right of WakeDetected
LED: YELLOW (100%)
Duration: 500ms
end note
note right of Listening
LED: BLUE (100%)
Microphone active
end note
note right of Processing
LED: GREEN (100%)
Waiting for AI response
end note
note right of TTSReady
LED: YELLOW (100%)
TTS audio ready
end note
note right of Playing
LED: OFF
Audio playing on speaker
end note
note right of Error
LED: RED (100%)
Duration: 1s
end note
Conclusion
Building this safe AI interaction device for children is a rewarding project that combines cutting-edge technology with practical home automation. By carefully configuring Home Assistant, LiteLLM, and an ESP32 with ESPHome, you can create a system that fosters curiosity while prioritising safety. The pre-prompt is your most powerful tool for ensuring age-appropriate content, and the logging capabilities through Langfuse provide valuable oversight. This £15 device, when combined with these smart home components, offers a unique and educational experience for young minds to safely explore the exciting world of artificial intelligence.
Links:
- M5Stack Atom Echo: https://s.click.aliexpress.com/e/_c3v0irFl
Video
You can watch the full video on YouTube here:
Support me to keep making videos
If you like the work I’m doing, please drop a like on the video, or consider subscribing to the channel.
In case you're in a particularly generous mood, you can fund my next cup of coffee over on Ko-Fi.
The links from some of my videos are affiliate links, which means I get a small kickback at no extra cost to you. It just means that the affiliate knows the traffic came from me.