
Creating a Smart Home AI Assistant

Author(s): Michael K

Originally published on Towards AI.

Source: Image generated by the author (using Adobe Generative AI)

The hardware AI assistants released recently have been making a splash in the news, which gave me a lot of inspiration around the concept of an "action model" and how powerful one could be. It also made me curious about how hard it would be to give a large language model access to my smart home API — because coding an entire assistant is totally easier than just opening a tab to my dashboard. In this article, using Python and a few open-source tools, we'll create an assistant that can perform almost any action we desire. We'll also explore how this works under the hood, and how a few extra tools can make debugging these agents a cakewalk.

Wrestling LLM Responses

I've previously written an article about prompt engineering, which is still our most powerful technique as end users of these models. Tool use is a supercharged version of prompt engineering: it gives the model a way to do more than just generate text. For example, we could give the model the ability to search Wikipedia, look up customer information for a support request, or send an email — the sky is truly the limit, other than your programming ability, of course.

Combined with tool use, we can also have the LLM generate structured output, which gives us a reliably formatted response. Without these techniques, the model's response can vary wildly or be heavily influenced by the context provided, which often distracts the model from the requested format or, depending on the context, produces erroneous results. The random seed the model uses, as well as its temperature (its willingness to generate more varied responses), can be controlled; however, this is far from perfect.

Creating the Solution

To manage the dependencies for the project, I'll be using Poetry, which we can initialize with a couple of commands (sketched below). Poetry creates all of the boilerplate we need to get started, so the next step is to add the additional dependencies the project needs.

Ollama

I'll be using Ollama to handle communicating with the model; however, Phidata supports numerous LLM integrations, so you could swap out Ollama for whichever works best for you. Getting Ollama set up only takes a few steps (also sketched below). Other than Meta's Llama 3, I've had great success with Mistral's 7B model and Microsoft's WizardLM 2 when using tools. As more modern models are released, tool use will likely become better supported.

Creating the Assistant

Phidata lets us structure and format the LLM's response using Pydantic objects, giving us a reliable way to extract information from the response programmatically. For example, we could create an assistant that only answers math questions (see the sketch below). This is incredibly useful when you have complex responses from the model. If you take a look at the prompt Phidata generates, you can see how it gets the model to play nice: through prompt engineering, it massages the model's response into exactly the shape we need, including the fields we require and omitting the ones we don't. Based on my previous experience with Phidata on a few projects, it's vital to give the model every possible option in the output schema; asking a question without an apparent answer, for instance, can otherwise trigger an error.
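Here's a rough sketch of the Poetry setup from the Creating the Solution step; the project name and exact package list are my assumptions:

    # Create the project skeleton (project name is illustrative)
    poetry new smart-home-assistant
    cd smart-home-assistant

    # Add the libraries used in this article
    poetry add phidata ollama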
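And the Ollama setup steps, assuming a Linux machine (macOS and Windows have installers on ollama.com); the model tag is simply the one I use:

    # Install Ollama
    curl -fsSL https://ollama.com/install.sh | sh

    # Pull the model we'll point Phidata at
    ollama pull llama3

    # Start the local server if it isn't already running (listens on port 11434 by default)
    ollama serve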
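Finally, a minimal sketch of the math-only assistant, assuming Phidata's Assistant with its output_model option and its Ollama LLM class; the MathAnswer field names are my own choice:

    from typing import Optional

    from pydantic import BaseModel, Field
    from phi.assistant import Assistant
    from phi.llm.ollama import Ollama

    class MathAnswer(BaseModel):
        # Marking the answer Optional matters: it lets the model return None for
        # questions with no numeric answer instead of padding the response.
        answer: Optional[float] = Field(None, description="The numeric answer, or null if there is none")
        explanation: Optional[str] = Field(None, description="A short explanation of the result")

    math_assistant = Assistant(
        llm=Ollama(model="llama3"),
        description="You are a calculator that only answers math questions.",
        output_model=MathAnswer,
    )

    # A normal math question comes back as a parsed MathAnswer object...
    print(math_assistant.run("What is 15% of 80?"))

    # ...while a question without an apparent answer should simply leave the answer field as None.
    print(math_assistant.run("What is the meaning of life?"))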
In the math example above, if you do not tell Pydantic that the answer field can also be None, the model will pad the response with a verbose answer and extra context instead of simply returning None.

Assistant Tool Use

Much like with people, giving the LLM tools to perform actions makes it more efficient, accurate, and useful in the long run. Phidata comes with a bunch of awesome tools built in, but we can also create our own, giving the assistant access to databases, APIs, or even local binaries if we desire. Let's give the assistant access to the internal API for my house so it can tell us the temperature in a few locations around the house (see the sketch below). Phidata does all of the heavy lifting for us by parsing the response from the model, calling the correct function, and finally returning the result. I've included a mock mode so you can test it out without an API of your own.

API Creation

To interact with our assistant, we'll use FastAPI to create a light REST API that handles incoming requests and runs the assistant code for us. Another option would be a queue system; however, since traffic is low, a simple REST API works fine for our use case. First, we install the dependencies we'll need for the API, then define our base application (both sketched below). I'm setting up Logfire here, which is optional, but it greatly increases our visibility and saves us from spelunking through a mountain of logs. Most of the libraries used in this project already have Logfire integrations, allowing us to extract as much information as possible in the fewest lines of code.

Testing

To run the server, we can use the fastapi utility that gets installed along with the library. By default, FastAPI listens on port 8000, so we'll use that to send a test prompt (both commands are sketched below).

Logfire

If you enabled Logfire, you can follow the chain of actions and see the arguments and values for each step.

Source: Image by the author

The timing chart on the right is also great for spotting where a request is getting stuck so it can be investigated further. And since I plan to eventually try this with a physical device, being able to go back and investigate a weird response is a lifesaver.

Next Steps

The only part missing now is the actual hardware — so my next project is to take an extra ESP32 I have lying around and see how much work it'll be to do […]
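A sketch of the tool-enabled assistant from the Assistant Tool Use section above, assuming Phidata accepts plain Python functions as tools; the endpoint, environment variables, and mock readings are placeholders for my own setup:

    import os

    import requests
    from phi.assistant import Assistant
    from phi.llm.ollama import Ollama

    # Placeholder values for my internal home API; swap in your own.
    HOME_API_URL = os.getenv("HOME_API_URL", "http://homeassistant.local:8123/api")
    USE_MOCK = os.getenv("HOME_API_MOCK", "1") == "1"

    def get_temperature(location: str) -> str:
        """Return the current temperature for a room in the house, e.g. 'living room' or 'garage'."""
        if USE_MOCK:
            # Mock mode: canned readings so the assistant can be tested without a real API.
            readings = {"living room": 72, "bedroom": 68, "garage": 55}
            return f"{readings.get(location.lower(), 70)} degrees Fahrenheit"
        response = requests.get(f"{HOME_API_URL}/temperature", params={"location": location}, timeout=10)
        response.raise_for_status()
        return f"{response.json()['temperature']} degrees Fahrenheit"

    home_assistant = Assistant(
        llm=Ollama(model="llama3"),
        description="You are a smart home assistant. Use your tools to answer questions about the house.",
        tools=[get_temperature],
        show_tool_calls=True,
    )

    if __name__ == "__main__":
        home_assistant.print_response("How warm is it in the garage right now?")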
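The extra dependencies from the API Creation section; installing fastapi with the standard extra (my assumption) also pulls in the fastapi CLI used further down:

    poetry add "fastapi[standard]" logfire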
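A sketch of the base application; the module layout, route name, and request model are my assumptions, and logfire.configure() expects a Logfire project or write token to already be set up:

    import logfire
    from fastapi import FastAPI
    from pydantic import BaseModel

    # The tool-enabled assistant from the earlier sketch (module name is hypothetical).
    from assistant import home_assistant

    app = FastAPI(title="Smart Home Assistant API")

    # Optional Logfire setup: instruments FastAPI so each request shows up as a trace.
    logfire.configure()
    logfire.instrument_fastapi(app)

    class PromptRequest(BaseModel):
        prompt: str

    @app.post("/prompt")
    def handle_prompt(request: PromptRequest) -> dict:
        # Run the assistant to completion and return its final text response.
        response = home_assistant.run(request.prompt, stream=False)
        return {"response": response}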
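And the commands from the Testing section; the module name and prompt are illustrative:

    # Run the development server (the fastapi CLI ships with the standard extra)
    fastapi dev main.py

    # In another terminal, send a test prompt to the default port 8000
    curl -X POST http://localhost:8000/prompt \
      -H "Content-Type: application/json" \
      -d '{"prompt": "What is the temperature in the garage?"}'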
