Author(s): Anders Ohrn. Originally published on Towards AI.

Image by Author

A hope for Large Language Models (LLMs) is that they will close the gap between what is meant (the semantics) and its formulation (the syntax). That includes calling software functions and querying databases, which employ stringent syntax: a single misplaced character can cause hours of head-scratching and troubleshooting. Tool-using LLMs are a promising new approach. Any business-relevant task that deals with stated intentions or requests, and that is solved in part by function calls, can be augmented with a semantic layer to close the gap between meaning and formulation.

I will make it concrete: the task is to plan a journey or commute in London, UK. Functions and tools already exist that deal with parts of that task, such as the Transport for London (TfL) web Application Programming Interfaces (APIs). However, these APIs “speak machine”. For a person with travel intent and needs, the APIs are not enough. We can fill that gap with tool-using LLMs.

I describe the problem and solution generally in the next sections. I then show how to implement it with Anthropic’s LLMs. All the code will be accounted for and is available in a GitHub repo. By the end of this blog post, you will understand the utility of tool-using LLMs and have the means to build useful prototypes.

Principal-Assistant-Machine and the Semantic Layer

Imagine planning a commute between places in London by tube, bus, bike, taxi, walking, etc. Imagine furthermore that you have preferences on the number of interchanges or how much to use a bike along the way. The process to solve this task is visualized below, beginning with the principal (e.g. the human commuter) making a request, and concluding with the principal receiving from the assistant a journey plan in the form of a step-by-step navigation plan, a map, or some other object.

The workflow of principal, assistant and machine, serving the principal’s request in a clockwise order.

Route optimization is a well-established problem with many algorithms to draw on. Computers execute this step given a precisely stated objective. This step is contained within the amber-coloured machine box in the illustration above.

The principal should not have to know the machine-layer APIs or the algorithms. The principal should simply be served with plans that address the requests and fit their style and mode of information consumption. The principal swims in the domain of meaning and semantics, culture and language, while the machine is anchored in formula and syntax, algorithm and code.

The assistant mediates between the principal and the machine. The assistant possesses knowledge of the tools and their proper use, knows how to interpret the principal, and, like a project manager, how to define units of work that add up and make sense. It is the assistant layer that tool-using LLMs can automate, scale and accelerate. Tasks that are rate-limited or cost-constrained by what goes on in the assistant layer are ripe for disruption.

In Simple Terms, What Does the Tool-Using LLM Do?

Most LLM providers enable tool use, especially for their advanced models. In practice, these LLMs have been tuned to generate strings of text that conform to a very particular structure and syntax. Why does that enable tool use? Because in the land of machines and software engineering, structured data and strict types and syntax rule supreme.
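To make that structure concrete before moving on: with Anthropic’s Messages API, the syntax specification takes the form of a tool definition with a name, a natural-language description, and a JSON Schema for the tool’s input. The sketch below is a minimal, hypothetical example; the tool name `plan_journey` and its parameters are illustrative assumptions, not the exact schema used in the repo.

```python
# A tool specification in the format Anthropic's Messages API expects:
# a name, a description, and a JSON Schema for the input. The tool
# ("plan_journey") and its parameters are hypothetical illustrations.
journey_tool = {
    "name": "plan_journey",
    "description": (
        "Plan a journey between two places in London using the "
        "Transport for London (TfL) journey planner."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "from_location": {
                "type": "string",
                "description": "Starting point, e.g. 'Baker Street'.",
            },
            "to_location": {
                "type": "string",
                "description": "Destination, e.g. 'Tate Modern'.",
            },
            "depart_at": {
                "type": "string",
                "description": "Departure time, e.g. '2024-05-01T06:00'.",
            },
        },
        "required": ["from_location", "to_location"],
    },
}
```

Given this specification alongside a prompt, the model can choose to emit a structured call to `plan_journey` instead of, or in addition to, free text.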
To create a tool-using LLM application, the input to the LLM has to include a specification of that very particular syntax, such as the tool definition sketched above. The user’s input prompt to the LLM is not all there is. The LLM that receives the full package of input prompt and syntax specification can then evaluate whether to generate a free-text response, a response in the particular syntax, or a combination of the two. It is up to the application developer to convert the LLM-generated output into a function call, data query, or similar.

Compared to smooth back-and-forth chatting with an LLM (e.g. ChatGPT, Claude, Le Chat), an application with tool-using LLMs requires more work by the application developer. The syntax specifications, their relation to the functions to be invoked, and what to do with the functions’ output require additional code and logic; a sketch of this round trip appears at the end of this section. The tutorial below deals with this complete effort.

What London Journey Planning Looks Like with an AI Assistant Layer

I will briefly illustrate the principal’s view of an implemented solution. The animated image below shows a test conversation. The request by the principal (the green text box) is phrased in a way a human reader can understand. The request blends types of syntax and implicit references for time (“6 am”, “five”, “early evening”) and place (“home”, “Tate Modern museum”, “from there”).

Exchange between principal and Journey Planner Vitaloid

The journey planner (I named it Journey Planner Vitaloid) correctly extracts the content of the request and maps it to three calls to the TfL API, where the route optimization algorithms do their magic. The journey planner also uses the tools for map-making. The map it outputs is shown below:

From home (Baker Street) to work (Cheapside) with public transit (tube and bus), then a walk across the Millennium Bridge to Tate Modern, followed by a commute back home via Waterloo; Image by Author using the folium Python library.

The journey plan details are not described in the exchange above. However, detailed data is available, stored in the application’s data structures. So when I, the principal, reply that I desire a summary of the steps in one or more of the different journeys, the tool-using LLM interprets the request, generates the relevant function call, and then compactly and correctly summarizes the steps in the details returned from the function call.

Follow-up request from the principal, with a detailed step-by-step description generated from the structured syntax of the TfL API.

In appearance, this is much like the well-known LLM-powered chatbots. The distinction is that tool-using LLMs equip Journey Planner Vitaloid with the capacities of algorithms that go beyond semantic processing. Also, any real-time updates […]
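As promised, here is a minimal sketch of the developer-side round trip, reusing the hypothetical `journey_tool` specification from earlier: send the prompt together with the tool specification, detect the model’s tool call, execute it against TfL’s public Journey Planner endpoint, and hand the structured result back for the model to summarize. The helper function and the conversation plumbing are illustrative assumptions, not the code from the repo.

```python
import json

import anthropic
import requests


def plan_journey(from_location: str, to_location: str, depart_at: str = "") -> dict:
    # Hypothetical tool implementation: a single call to TfL's public
    # Journey Planner API; TfL resolves free-text locations itself.
    url = (
        "https://api.tfl.gov.uk/Journey/JourneyResults/"
        f"{from_location}/to/{to_location}"
    )
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
messages = [{
    "role": "user",
    "content": "Plan my commute from Baker Street to Cheapside, leaving at 6 am.",
}]

# First pass: the model sees the user prompt plus the tool specification
# (journey_tool as defined in the earlier sketch).
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    tools=[journey_tool],
    messages=messages,
)

if response.stop_reason == "tool_use":
    # The model chose to call the tool; run it and return the result.
    tool_use = next(b for b in response.content if b.type == "tool_use")
    tool_output = plan_journey(**tool_use.input)
    messages.append({"role": "assistant", "content": response.content})
    messages.append({
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use.id,
            "content": json.dumps(tool_output),
        }],
    })
    # Second pass: the model turns the structured TfL data into prose.
    final = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        tools=[journey_tool],
        messages=messages,
    )
    print(final.content[0].text)
```

The two-pass shape is the essential pattern: the first model call decides whether a tool is needed, the application executes it, and the second model call converts the machine-layer output into something the principal can read.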