Note: This is generated from a transcript of one of my YouTube videos


Introduction

Imagine a small, affordable device that empowers your children to explore the world of artificial intelligence safely and curiously. I've put together a system that allows kids to ask AI virtually anything, with built-in safeguards that ensure age-appropriate responses and provide you with peace of mind. This project leverages the power of Home Assistant, a versatile ESP32 microcontroller, and a clever AI proxy to create a truly unique and educational tool.

The Core Components: Building Blocks of Safe AI Interaction

At its heart, this project is five key components working in harmony. First, we have the M5Stack Atom Echo, a compact ESP32 development board that serves as our physical interface, complete with a microphone and a small speaker. This little device is the gateway for your child’s voice. Next, we introduce LiteLLM, an essential AI switchboard that acts as a proxy, intelligently routing all queries to the chosen AI model.

The true orchestrator of this system is Home Assistant, the central hub that seamlessly connects all the moving parts. For the “brain” of our operation, I’ve opted for Google’s Gemini 2.5 Flash, a powerful yet efficient large language model. Finally, we need a way to hear the AI’s responses, so we’ll route the audio output to a preferred speaker, such as a Google Home device in your living room, or any other speaker accessible through Home Assistant.

graph TB
    subgraph "Hardware Layer"
        ATOM["M5Stack Atom Echo<br/>192.168.0.84:6053"]
        SPEAKER["Living Room Speaker"]
    end
    subgraph "ESPHome Layer"
        ESP["ESPHome Server<br/>192.168.0.104:6052"]
        WAKE["Micro Wake Word<br/>okay_nabu"]
        MIC["Microphone I2S"]
        LED["Status LED"]
    end
    subgraph "Home Assistant Layer"
        HA["Home Assistant"]
        HACS["Extended OpenAI<br/>Conversation"]
        TTS_HA["Home Assistant<br/>Cloud TTS"]
        AUTO1["Play ESPHome TTS<br/>Automation"]
        AUTO2["Play ESPHome TTS Error<br/>Automation"]
    end
    subgraph "LiteLLM Stack"
        LITE["LiteLLM Proxy<br/>192.168.0.104:4000"]
        PG[("PostgreSQL<br/>Database")]
        LANGFUSE["LangFuse Cloud<br/>Monitoring"]
    end
    subgraph "AI Backend"
        GEMINI["Gemini 2.5 Flash"]
    end
    ATOM <--> ESP
    ESP --> WAKE
    ESP --> MIC
    ESP --> LED
    ESP <--> HA
    HA <--> HACS
    HACS <--> LITE
    LITE <--> PG
    LITE --> LANGFUSE
    LITE <--> GEMINI
    AUTO1 --> SPEAKER
    AUTO2 --> TTS_HA
    TTS_HA --> SPEAKER
    style ATOM fill:#593559,stroke:#a3a3a3,stroke-width:2px
    style LITE fill:#414159,stroke:#a3a3a3,stroke-width:2px
    style HA fill:#415941,stroke:#a3a3a3,stroke-width:2px
    style GEMINI fill:#595941,stroke:#a3a3a3,stroke-width:2px

LiteLLM Docker Compose file

services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: litellm
    ports:
      - "4000:4000"
    volumes:
      - /volume1/docker/litellm/data:/app/data
    environment:
      - LITELLM_DROP_PARAMS=True
      - DATABASE_URL=postgresql://litellm:xxx@postgres:5432/litellm
      - STORE_MODEL_IN_DB=True
      - LITELLM_MASTER_KEY=sk-xxx
      - UI_USERNAME=admin
      - UI_PASSWORD=xxx
      - LANGFUSE_PUBLIC_KEY=pk-xxx
      - LANGFUSE_SECRET_KEY=sk-xxx
      - LANGFUSE_HOST=https://cloud.langfuse.com
    command: --port 4000 --detailed_debug
    restart: unless-stopped
    depends_on:
      - postgres
    networks:
      - litellm-network

  postgres:
    image: postgres:15
    container_name: litellm-postgres
    environment:
      - POSTGRES_DB=litellm
      - POSTGRES_USER=litellm
      - POSTGRES_PASSWORD=xxx
    volumes:
      - /volume1/docker/litellm/postgres-data:/var/lib/postgresql/data
    restart: unless-stopped
    networks:
      - litellm-network

networks:
  litellm-network:
    driver: bridge

Ensuring Safety: The Pre-Prompt is Key

The most critical aspect of this setup is ensuring that the AI’s interactions with your children are always safe and appropriate. This is achieved by implementing a carefully crafted pre-prompt within the Extended OpenAI Conversation integration in Home Assistant. This pre-prompt acts as a set of strict instructions for the AI, guiding its behaviour and content.

This pre-prompt explicitly defines the AI’s persona as a helpful, kind, and encouraging assistant for children, emphasising simple language and age-appropriateness. Crucially, it includes a directive to never discuss inappropriate topics. If a question veers into sensitive territory, the AI is instructed to politely decline, stating, “You asked [question], but I can’t answer that. Ask Mum or Dad.” This ensures that any potentially problematic queries are flagged and brought to your attention, rather than being answered directly by the AI.

Furthermore, the pre-prompt includes a directive to always repeat the question back before providing an answer. This feature is incredibly useful, especially for younger children. If the AI misinterprets a question, the child can hear the repeated question and realise the misunderstanding, giving them an opportunity to rephrase and try again.

This is my prompt:

You are a helpful AI assistant for children. 
Keep all responses age-appropriate, educational, kind, and encouraging. 
Use simple language that a child can understand. 
Never discuss inappropriate topics; if a topic is considered inappropriate, just simply say 
"You asked {question}, but I can't answer that, ask Mum or Dad".

When answering, always repeat the question back before giving the answer. 
Say "You asked {question}, here's my answer: {answer}"

Integrating with Home Assistant: The Extended OpenAI Conversation

To bring the AI's capabilities into Home Assistant, we'll utilise the Extended OpenAI Conversation custom integration. If you don't already have the Home Assistant Community Store (HACS) set up, it's a straightforward process that I won't detail here, but it's highly recommended for easily installing custom integrations. Once HACS is ready, you'll install the "Extended OpenAI Conversation" integration.

After installation, navigate to Settings > Devices & Services and add the integration. You’ll need to configure it to point to your LiteLLM instance, providing the LiteLLM API key. This is where you’ll also input the crucial pre-prompt we discussed earlier. You can also adjust the chat model to match your LiteLLM configuration and fine-tune the “temperature” setting. A lower temperature results in more factual and predictable responses, while a higher temperature allows for more creativity. For a child-friendly experience, a moderate temperature like 0.5 often strikes a good balance.
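
My models live in LiteLLM's database (STORE_MODEL_IN_DB=True) and were added through its UI, so there's no model config file to copy here. As a rough file-based equivalent, a model definition would look something like the sketch below; the model_name and the environment-variable reference are placeholders to adapt. The integration's base URL should point at the LiteLLM proxy, typically the address from the diagram plus /v1 (so http://192.168.0.104:4000/v1).

model_list:
  - model_name: gemini-2.5-flash           # the chat model name you give Extended OpenAI Conversation
    litellm_params:
      model: gemini/gemini-2.5-flash       # LiteLLM's provider/model route to Google
      api_key: os.environ/GEMINI_API_KEY   # pull the Google API key from the environment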

Home Assistant Voice

Once the Extended OpenAI Conversation integration is set up, you need to pull it into a voice assistant. You can do this by navigating to Settings > Voice Assistants and adding a new one, selecting the conversation agent you just configured.

This is the voice assistant you’ll choose for your ESP32 device in Home Assistant once that’s set up, so give it a sensible, identifiable name.

Setting Up the ESP32: Your Voice-Activated Assistant

The physical device that your children will interact with is an ESP32, specifically the M5Stack Atom Echo in this case. We’ll be using ESPHome to program this device, which handles wake word detection and captures the voice input. To get started, you’ll typically spin up an ESPHome Docker container, similar to how we manage the other services.

services:
  esphome:
    container_name: esphome
    image: ghcr.io/esphome/esphome:latest
    volumes:
      - /volume1/docker/esphome/config:/config
    restart: unless-stopped
    # Host networking is needed for mDNS device discovery, and it exposes
    # the dashboard directly on port 6052 (no port mapping required).
    network_mode: host

Once ESPHome is running, you'll likely need to access its dashboard directly via its IP address and port, as reverse proxies can sometimes cause discovery issues. ESPHome might auto-discover your M5Stack; if so, you can take control of it. The core of the ESPHome configuration lies in its YAML file, which defines the Wi-Fi credentials, API encryption key, and various settings for the device. I've provided a comprehensive example below, which you can copy and adapt. Remember to update your secrets for the Wi-Fi credentials and generate your own API encryption key from the ESPHome website.
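
The !secret references in the YAML resolve against a secrets.yaml file sitting alongside your device configs in the ESPHome config directory. A minimal sketch, with placeholder values you'd swap for your own:

# /volume1/docker/esphome/config/secrets.yaml
wifi_ssid: "YourNetworkName"
wifi_password: "your-wifi-password"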

My ESPHome YAML

esphome:
  name: m5stack-atom-echo-01cf94
  friendly_name: hallway-atom-echo
  platformio_options:
    board_build.flash_mode: dio

esp32:
  board: m5stack-atom
  framework:
    type: esp-idf
    version: recommended

logger:

api:
  encryption:
    key: bLXT20skmjfgWk1MwC+YI7salc55h2JvdSCH3WWuXCw=

ota:
  - platform: esphome

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

i2s_audio:
  - id: i2s_shared
    i2s_lrclk_pin: GPIO33
    i2s_bclk_pin: GPIO19

microphone:
  - platform: i2s_audio
    i2s_audio_id: i2s_shared
    id: echo_microphone
    adc_type: external
    i2s_din_pin: GPIO23
    channel: left
    pdm: true

micro_wake_word:
  models:
    - model: okay_nabu
  on_wake_word_detected:
    - logger.log: "Wake word detected!"
    - light.turn_on:
        id: led
        brightness: 100%
        red: 100%
        green: 100%
        blue: 0%
    - delay: 500ms
    - voice_assistant.start:

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO39
      inverted: true
    name: Button
    disabled_by_default: true
    entity_category: diagnostic
    id: echo_button
    on_multi_click:
      - timing:
          - ON for at most 350ms
          - OFF for at least 10ms
        then:
          - if:
              condition: voice_assistant.is_running
              then:
                - voice_assistant.stop:
              else:
                - voice_assistant.start:

voice_assistant:
  microphone: echo_microphone
  use_wake_word: false
  auto_gain: 31dBFS
  volume_multiplier: 4.0
  
  on_client_connected:
    - logger.log: "Client connected, starting wake word detection"
    - light.turn_on:
        id: led
        brightness: 100%
        red: 100%
        green: 0%
        blue: 0%
    - delay: 2s
    - light.turn_off: led
    - delay: 1s
    - micro_wake_word.start:
  
  on_client_disconnected:
    - logger.log: "Client disconnected, stopping wake word"
    - micro_wake_word.stop:
  
  on_listening:
    - logger.log: "Voice assistant listening"
    - light.turn_on:
        id: led
        brightness: 100%
        red: 0%
        green: 0%
        blue: 100%
  
  on_stt_end:
    - logger.log: "STT ended"
    - light.turn_on:
        id: led
        brightness: 100%
        red: 0%
        green: 100%
        blue: 0%
  
  on_tts_start:
    - if:
        condition:
          lambda: |-
            std::string response = x;
            return response.find("Error code:") != std::string::npos || 
                   response.find("error") != std::string::npos;
        then:
          - logger.log: "Error detected in response"
          - if:
              condition:
                lambda: |-
                  std::string response = x;
                  return response.find("429") != std::string::npos;
              then:
                - homeassistant.event:
                    event: esphome.voice_error
                    data:
                      error_type: rate_limit
              else:
                - homeassistant.event:
                    event: esphome.voice_error
                    data:
                      error_type: general
        else:
          - logger.log: "Normal response, will play TTS"

  on_tts_end:
    - logger.log: "TTS ended, raising event with output URL"
    - light.turn_on:
        id: led
        brightness: 100%
        red: 100%
        green: 100%
        blue: 0%
    - homeassistant.event:
        event: esphome.play.tts
        data:
          url: !lambda 'return x;'
          
  on_end:
    - logger.log: "Voice assistant ended, restarting wake word"
    - delay: 100ms
    - light.turn_off: led
    - micro_wake_word.start:

  on_error:
    - logger.log: 
        format: "Voice assistant error: %s"
        args: ['message.c_str()']
    - light.turn_on:
        id: led
        brightness: 100%
        red: 100%
        green: 0%
        blue: 0%
    - delay: 1s
    - light.turn_off: led
    - homeassistant.event:
        event: esphome.voice_error
    - delay: 500ms
    - micro_wake_word.start:

light:
  - platform: esp32_rmt_led_strip
    id: led
    name: "hallway-atom-echo LED"
    pin: GPIO27
    default_transition_length: 0s
    chipset: SK6812
    num_leds: 1
    rgb_order: GRB

After flashing the ESPHome firmware to your M5Stack, you’ll add it as an integration within Home Assistant. This connection allows Home Assistant to control the ESP32, including setting the voice assistant and configuring speech detection. For children, it’s beneficial to set the “finished speaking detection” to a more relaxed setting, allowing for natural pauses without prematurely ending the voice input.

Orchestrating the Conversation: Voice Assistant and Automations

With the ESP32 integrated into Home Assistant, we can now configure the voice assistant. You can name it something intuitive like “Kids AI Gateway.” The key is to select the Extended OpenAI Conversation agent, backed by LiteLLM, that you previously set up. For speech-to-text and text-to-speech, you can leverage Home Assistant Cloud’s Nabu Casa services, which work reliably, or explore other options if you prefer.

The magic truly happens with Home Assistant automations. We’ll set up two primary automations: one for the “happy path” (successful responses) and another for handling errors. The happy path automation is triggered by an ESPHome event and plays the audio response from the AI, routing it to your chosen speaker.

Happy path automation

alias: Play ESPHome TTS
description: ""
triggers:
  - event_type: esphome.play.tts
    trigger: event
conditions:
  - condition: template
    value_template: >
      {% set last_error =
      state_attr('automation.play_tts_handle_voice_assistant_errors',
      'last_triggered') %} {{ last_error is none or (as_timestamp(now()) -
      as_timestamp(last_error)) > 2 }}
actions:
  - target:
      entity_id: media_player.living_room_speaker
    data:
      media:
        media_content_id: "{{ trigger.event.data.url }}"
        media_content_type: audio/mpeg
        metadata: {}
    action: media_player.play_media

The error handling automation is a bit more nuanced. It’s designed to catch specific error codes, like a “429 Too Many Requests” error, and translate them into child-friendly messages. Instead of relaying a technical error, it might say, “That’s all the questions I can answer for today. Try again tomorrow.” For any other unexpected errors, it will default to a message like, “There is a problem, go ask Dad,” which effectively alerts you to an issue.

Error handling automation

alias: Play ESPHome TTS Error
description: ""
triggers:
  - event_type: esphome.voice_error
    trigger: event
actions:
  - choose:
      - conditions:
          - condition: template
            value_template: "{{ trigger.event.data.error_type == 'rate_limit' }}"
        sequence:
          - target:
              entity_id: tts.home_assistant_cloud
            data:
              media_player_entity_id: media_player.living_room_speaker
              message: >-
                That's all the questions I can answer for today. Try again
                tomorrow!
              cache: false
            action: tts.speak
    default:
      - target:
          entity_id: tts.home_assistant_cloud
        data:
          media_player_entity_id: media_player.living_room_speaker
          message: There was a problem generating content. Ask Dad instead!
          cache: false
        action: tts.speak

This flowchart summarises how a response is routed between the happy path and the two error paths:

flowchart TD
    Start[TTS Start Event] --> Check{"Response contains<br/>'Error code:' or 'error'?"}
    Check -->|No| Normal[Normal TTS Playback]
    Check -->|Yes| Error429{"Response contains '429'?"}
    Error429 -->|Yes| RateLimit["Fire Event:<br/>esphome.voice_error<br/>error_type: rate_limit"]
    Error429 -->|No| GenError["Fire Event:<br/>esphome.voice_error<br/>error_type: general"]
    RateLimit --> Auto1[Automation Triggered]
    GenError --> Auto1
    Auto1 --> CheckType{Check error_type}
    CheckType -->|rate_limit| RLMsg["Generate TTS:<br/>'That's all the questions...'"]
    CheckType -->|general| GenMsg["Generate TTS:<br/>'Ask Dad instead!'"]
    RLMsg --> PlayTTS[Play on Living Room Speaker]
    GenMsg --> PlayTTS
    Normal --> PlayURL[Play TTS URL on Speaker]
    PlayTTS --> End[Return to Idle]
    PlayURL --> End
    style Check fill:#595050
    style Error429 fill:#595050
    style RateLimit fill:#594747
    style GenError fill:#594747
    style Normal fill:#505950

Visual Feedback and Logging: Enhancing the Experience

To make the interaction even more intuitive for children, the ESP32's LED can provide visual cues. A brief yellow flash can indicate wake word detection, a blue light signifies that the device is listening, and a green light confirms that the question has been recognised and processing has begun. If an error occurs, the LED can turn red, signalling that the child should try again.

Beyond visual feedback, keeping a log of all interactions is invaluable. This is where Langfuse comes into play. By integrating LiteLLM with Langfuse, you gain a powerful tool for debugging and monitoring AI conversations. Langfuse provides detailed traces of each interaction, showing the system prompt, the user’s question, and the AI’s response. This allows you to review what your children are asking and how the AI is responding, offering insights and ensuring the safety measures are working as intended. Setting up Langfuse involves creating a project, generating API keys, and configuring LiteLLM to send its logs to the Langfuse cloud service.

If you set this up, you will be able to see everything that your kids ask, and the response that was given.
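
Note that the compose file earlier only supplies the Langfuse credentials as environment variables; LiteLLM also needs its logging callbacks switched on. A minimal sketch, assuming you drive LiteLLM from a config file rather than the database-backed UI:

litellm_settings:
  success_callback: ["langfuse"]   # trace successful LLM calls in Langfuse
  failure_callback: ["langfuse"]   # trace failures too, so errored queries show up

The state diagram below shows the full interaction lifecycle, including the LED colour at each stage: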

stateDiagram-v2
    [*] --> Idle: System ready
    Idle --> WakeDetected: Wake word "okay nabu"
    WakeDetected --> Listening: voice_assistant.start()
    Listening --> Processing: STT completed
    Processing --> TTSReady: AI response received
    TTSReady --> Playing: Media playing on speaker
    Playing --> Idle: Playback complete
    
    Listening --> Error: Voice assistant error
    Processing --> Error: AI/Network error
    Error --> Idle: After 1s delay
    
    note right of Idle
        LED: OFF
        Wake word detection active
    end note
    
    note right of WakeDetected
        LED: YELLOW (100%)
        Duration: 500ms
    end note
    
    note right of Listening
        LED: BLUE (100%)
        Microphone active
    end note
    
    note right of Processing
        LED: GREEN (100%)
        Waiting for AI response
    end note
    
    note right of TTSReady
        LED: YELLOW (100%)
        TTS audio ready
    end note
    
    note right of Playing
        LED: OFF
        Audio playing on speaker
    end note
    
    note right of Error
        LED: RED (100%)
        Duration: 1s
    end note

Conclusion

Building this safe AI interaction device for children is a rewarding project that combines cutting-edge technology with practical home automation. By carefully configuring Home Assistant, LiteLLM, and an ESP32 with ESPHome, you can create a system that fosters curiosity while prioritising safety. The pre-prompt is your most powerful tool for ensuring age-appropriate content, and the logging capabilities through Langfuse provide valuable oversight. This £15 device, when combined with these smart home components, offers a unique and educational experience for young minds to safely explore the exciting world of artificial intelligence.

Links:

Video

You can watch the full video on YouTube here:

Support me to keep making videos

Ko-Fi

If you like the work I’m doing, please drop a like on the video, or consider subscribing to the channel.

In case you’re in a particularly generous mood, you can fund my next cup of coffee over on Ko-Fi

The links from some of my videos are affiliate links, which means I get a small kickback at no extra cost to you. It just means that the affiliate knows the traffic came from me.
