Smart doorbell AI announcements
Note: This is generated from a transcript from one of my YouTube videos
Elevate Your Smart Doorbell: AI Vision Announcements with Home Assistant




Getting Started: Google Cloud and Gemini API Setup



Next, you’ll need to obtain an API key from AI Studio. Simply head over to aistudio.google.com/app/apikey (the link is also in the Links section further down). Click “Create API Key” and make sure you select the Google Cloud project you just configured. Keep this API key handy, as we’ll need it very soon.
Installing the LLM Vision Integration




First, install LLM Vision from HACS just like any other custom integration. After installation, give Home Assistant a quick restart. Then add it as a standard integration through the Home Assistant interface (Settings > Devices & services > Add integration). When prompted, select “Google” as your provider and paste the API key we obtained earlier. You’ve got this!
Addressing Privacy Concerns

Here’s how I’ve approached it in my setup: my kids won’t be ringing the doorbell, so it should never capture images of them. Crucially, my automation is configured to trigger only when someone actually rings the doorbell, not merely when there’s motion in front of the camera. This significantly limits what gets sent to the cloud.
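To make the "ring only, not motion" idea concrete, here is a minimal sketch of a trigger that fires on the button press rather than on movement. The entity ID binary_sensor.doorbell_visitor is a hypothetical placeholder, not the one from my setup; use whichever "visitor" or button-press sensor your doorbell exposes.

```yaml
# Sketch only: binary_sensor.doorbell_visitor is a hypothetical entity ID.
# Use the sensor that turns on when the button is pressed, not the motion
# or person-detection sensor, so snapshots are only taken on a real ring.
triggers:
  - trigger: state
    entity_id: binary_sensor.doorbell_visitor
    from: "off"
    to: "on"
```

Because the snapshot and the cloud call both sit behind this trigger, nothing is captured or uploaded when someone simply walks past the camera.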
Building the Automation







The magic happens when we introduce a new action called llmvision.image_analyzer, which is provided by the integration we just added. Let me walk you through the configuration that makes this work. We’re using the llmvision.image_analyzer action with some specific parameters that I’ve found to be very effective.
I’ve set max_tokens to 50 to keep the responses concise. A token is roughly a word or a piece of a word, so this caps the description at around 50 words at most, and usually a little fewer. I’ve also set the temperature to 0.2 for more consistent results. The temperature parameter ranges from 0 to 1; a value closer to 1 encourages more creative and varied responses, while a value closer to 0 yields more precise and consistent output. I’ve settled on 0.2, though I did experiment with 1 to try to get more interesting results, so your mileage may legitimately vary! For the model, I’m using Gemini 2.0 Flash, primarily because it’s quite capable and, importantly, it’s free.
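As a rough sketch, those parameters sit together in the action like this. The provider ID, image path and message here are placeholders I have made up for illustration; the provider value is the ID Home Assistant assigns when you set up the integration, and the real prompt is covered in the next section.

```yaml
# Sketch only: provider, image_file and message are illustrative placeholders.
- action: llmvision.image_analyzer
  data:
    provider: YOUR_PROVIDER_ID                      # placeholder
    model: gemini-2.0-flash
    image_file: /config/www/doorbell_snapshot.jpg   # hypothetical path
    message: Describe who is at the door in one short sentence.
    max_tokens: 50       # keeps the answer short enough to read aloud
    temperature: 0.2     # lower values give more consistent wording
  response_variable: vision_result
```

The response_variable line is what lets later steps in the automation read the generated description.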
Crafting the AI Prompt: Prompt Engineering

This is a picture from a video doorbell.
If there is nobody there, say “they have already gone” and nothing else.
Otherwise, if they are walking away, say “they have already gone” and nothing else.
Otherwise, if their face is hidden, say “Check the camera, their face is hidden” and nothing else.
Otherwise, if none of the above is true, say “A {adjective} {man/woman} who looks like {famous actor} holding {what they are holding} is at the door.”
The “looks like a famous actor” part is something I thought was quite fun.
The reason for being so explicit with the prompt, including phrases like “and nothing else,” is to prevent the AI from generating multiple, lengthy responses. I particularly love this approach because it effectively covers the most common situations you’ll encounter. Sometimes people ring the bell and immediately walk away, perhaps dropping off a package or simply in a hurry. Other times, you might have someone whose face is obscured by a hood or mask, which is definitely valuable information to have before you open your front door.
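One side benefit of forcing fixed phrases is that the automation can react to them. The following is only a sketch of that idea, not something from my own automation: it assumes the analysis result was stored in a variable named vision_result, and it skips the spoken announcement when the model reports that the visitor has already gone. The speaker and TTS entities are simply the ones from my setup.

```yaml
# Sketch only: assumes an earlier step stored its result in vision_result.
# A false condition in an action sequence stops the remaining steps,
# so the announcement below is skipped when nobody is left at the door.
- condition: template
  value_template: >-
    {{ 'already gone' not in (vision_result.response_text | lower) }}
- action: tts.speak
  target:
    entity_id: tts.home_assistant_cloud
  data:
    media_player_entity_id: media_player.kitchen
    message: Ding dong. {{ vision_result.response_text }}
```

This only works reliably because the prompt insists on an exact phrase "and nothing else"; a free-form answer would be much harder to match.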
The Results and Customization

If you’re considering implementing this yourself, I highly recommend starting with a basic setup and then fine-tuning the prompts to align with your specific needs. Perhaps you desire more detailed descriptions, or maybe you prefer to focus on particular elements like packages or uniforms. The possibilities for customization are vast!
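As one possible starting point, a prompt that focuses on packages and uniforms might look something like this. The wording is purely illustrative and untested, so treat it as a template to adapt rather than a known-good prompt.

```yaml
# Hypothetical prompt variant; the wording is illustrative and untested.
message: >-
  This is a picture from a video doorbell.
  If there is nobody there say "they have already gone" and nothing else.
  Otherwise, if the person is wearing a delivery or company uniform, say
  "A delivery driver from {company} is at the door" and nothing else.
  Otherwise, if a package has been left on the ground, say
  "A package has been left at the door" and nothing else.
  Otherwise, say "Someone is at the door" and nothing else.
```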
What do you think? Are you going to give this a try in your own smart home setup? Let me know in the comments below what creative prompts you come up with! If you found this helpful, as always, don’t forget to give it a thumbs up and subscribe for more smart home content. Thanks for watching, and I’ll see you in the next one!
Links:
I’m using a Reolink PoE video doorbell, which I highly recommend: https://amzn.to/4knsj8D
Here’s a link to Google AI Studio: https://aistudio.google.com/app/apikey
This is the LLM Vision integration documentation: https://llmvision.gitbook.io/getting-started
This is the prompt that I used to get good results:
This is a picture from a video doorbell.
If there is nobody there say "they have already gone" and nothing else.
Otherwise, if they are walking away say "they have already gone" and nothing else.
Otherwise, if their face is hidden say "Check the camera, their face is hidden." and nothing else.
Otherwise, if none of the previous are true, say "A {adjective} {man/woman} who looks like {famous actor} holding {what they are holding} is at the door"
And here’s the complete YAML of the automation:
alias: Notifications - Doorbell
description: ""
triggers:
  # Fires when the doorbell button sensor turns on (a ring, not motion)
  - type: turned_on
    device_id: 60f0f7dba756a82ed054ec8200829078
    entity_id: bd8e8bfd2b39fb7832bbc6cdd5df0fb3
    domain: binary_sensor
    trigger: device
conditions: []
actions:
  # Short pause before taking the snapshot
  - delay:
      hours: 0
      minutes: 0
      seconds: 0
      milliseconds: 500
  # Save a still image from the doorbell camera
  - action: camera.snapshot
    metadata: {}
    data:
      filename: /config/www/reolink_snapshot/last_snapshot_doorbell.jpg
    target:
      entity_id: camera.reolink_video_doorbell_poe_fluent
    enabled: true
  # Send the snapshot and prompt to Gemini via LLM Vision
  - action: llmvision.image_analyzer
    data:
      include_filename: false
      max_tokens: 50
      provider: 01JW3EN2J5JB2HVCN7DH6H8269
      image_file: /config/www/reolink_snapshot/last_snapshot_doorbell.jpg
      model: gemini-2.0-flash
      message: >-
        This is a picture from a video doorbell.
        If there is nobody there say "they have already gone" and nothing else.
        Otherwise, if they are walking away say "they have already gone" and
        nothing else.
        Otherwise, if their face is hidden say "Check the camera, their face is
        hidden." and nothing else.
        Otherwise, if none of the previous are true, say "A {adjective}
        {man/woman} who looks like {famous actor} holding {what they are
        holding} is at the door"
      temperature: 0.2
    response_variable: vision_result
  # Push notification with the description and the snapshot
  - action: notify.mobile_app_pixel_9_pro
    metadata: {}
    data:
      title: Doorbell
      message: "{{ vision_result.response_text }}"
      data:
        image: /local/reolink_snapshot/last_snapshot_doorbell.jpg
        clickAction: app://com.mcu.reolink
    enabled: true
    alias: Notify Ben's phone
  # Spoken announcement on the kitchen speaker
  - action: tts.speak
    target:
      entity_id: tts.home_assistant_cloud
    data:
      cache: true
      media_player_entity_id: media_player.kitchen
      message: Ding dong. {{ vision_result.response_text }}
    enabled: true
    alias: Notify kitchen speaker
mode: single
Video
You can watch the full video on YouTube here:
Support me to keep making videos

If you like the work I’m doing, please drop a like on the video, or consider subscribing to the channel.
If you’re in a particularly generous mood, you can fund my next cup of coffee over on Ko-fi.
The links from some of my videos are affiliate links, which means I get a small kickback at no extra cost to you. It just means that the affiliate knows the traffic came from me.