Sheet 5.1 LLM agents#
Author: Polina Tsvilodub
This sheet takes a closer look at more complex LLM-based systems and LLM agents. Specifically, we will use the package langchain
and its extensions to build our own LLM systems and explore their functionality. The learning goals for this sheet are:
understanding basics of langchain
trying out langchain agents and tools
understanding the basics of output processing
familiarization with basic handling of agent memory
Langchain is under heavy development. Sometimes examples provided in the docs break with version updates, so one needs to be somewhat patient.
NOTE: At this point it provides quite vast functionality (and correspondingly extensive docs) – of course, we do NOT expect you to study or understand all of that. The examples below will provide links to some relevant parts of the documentation, and the examples serve as a little demo / inspiration of what is out there, as a starting point for you to learn more, if you are interested.
LangChain#
The lecture discussed that modern LLMs can be viewed as building blocks of larger systems, be it for engineering or research purposes. In particular, one might want to use an LLM and make several calls (i.e., several inference passes) to it, and somehow use the predicted results together to complete one’s task. Note that when we talk about such systems, we (almost always) use the LLM for inference, i.e., the LLM is already pretrained / fine-tuned.
Using the terminology of langchain, a sequence of such LLM calls is called a chain. For each call, one minimally needs to specify a (pretrained / fine-tuned) LLM and a prompt that specifies what exactly the call should accomplish. For the prompt, oftentimes prompt templates are used. These prompt templates usually specify variables which are filled with inputs when the respective LLM call is invoked. The idea behind this is that the calls can be re-used, e.g., with various user inputs, without having to re-type the entire prompt. Further, these inputs may come from a previous LLM call. One neat feature of langchain is that it allows one to seamlessly chain LLM calls and stream outputs from one call into the next. Specific types of templates (e.g., chat prompt templates) also take care of formatting text in the way expected by the model, e.g., adding the required special tokens and format for chat models.
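To make this concrete, here is a minimal sketch of a prompt template with variables (an illustrative toy example, not part of the dinner system built below):
# a toy prompt template; the variables are only filled in when the template is formatted / invoked
from langchain_core.prompts import PromptTemplate
toy_template = PromptTemplate.from_template("Translate the following word into {language}: {word}")
print(toy_template.format(language="Italian", word="tomato"))
# -> Translate the following word into Italian: tomato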
[Disclaimer: Not sponsored by LangChain – there are other very useful tools for doing such things, for instance, Haystack. This is just one popular example.]
Below, we will first look at an example of a simple sequence of LLM calls. In particular, we will build a system that helps us come up with a dinner menu, given some ingredients that we already have.
We will be using the OpenAI API to get optimal performance (specifically, the GPT-3.5-turbo model). Instructions for retrieving an API key will be provided in class.
# please install the following packages and versions
#!pip install langchain==0.2.2 langchain-core==0.2.4 langchain-openai==0.1.7 wikipedia==1.4.0 langchainhub==0.1.17 langchain-community==0.2.1
import os
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
# set some hyperparameters for the generation
temperature = 0.7
kwargs = {
"max_tokens": 100,
}
ingredients = "cauliflower, tomatoes."
instructions_text_appetizer = "I have the following ingredients in my fridge: \n{ingredients}\n\nWhich Italian appetizer can I make for dinner with these ingredients?"
instructions_text_main = "I am planning to make the following appetizer: \n{appetizer}\n\nWhich Italian main course can I make for my dinner?"
instructions_menu_summary = "I am planning the following recipes for my dinner: \nAppetizer: {appetizer}\nMain course: {main_course}\n\nPlease write a menu summary for my dinner."
# instantiate model
model = ChatOpenAI(
model="gpt-3.5-turbo-0125",
openai_api_key="YOUR_API_KEY", # insert your API key here
temperature=temperature,
**kwargs
)
# construct prompts for our calls
prompt_template_appetizer = PromptTemplate(
template = instructions_text_appetizer,
input_variables = ['ingredients'],
)
prompt_template_main = PromptTemplate(
template = instructions_text_main,
input_variables = ['appetizer'],
)
prompt_template_summary = PromptTemplate(
template = instructions_menu_summary,
input_variables = ['appetizer', 'main_course'],
)
# construct sub-chains for each course
appetizer_chain = prompt_template_appetizer | model | StrOutputParser()
main_chain = {"appetizer": appetizer_chain} | prompt_template_main | model | StrOutputParser()
composed_chain = {"appetizer": appetizer_chain, "main_course": main_chain} | prompt_template_summary | model | StrOutputParser()
# actually call the execution of the entire chain
composed_result = composed_chain.invoke({"ingredients": ingredients})
print("Result: ", composed_result)
Exercise 5.1.1.2#
# set some hyperparameters for the generation
temperature = 0.7
kwargs = {
"max_tokens": 100,
}
ingredients = "cauliflower, tomatoes."
instructions_text_appetizer = "I have the following ingredients in my fridge: \n{ingredients}\n\nWhich Italian appetizer can I make for dinner with these ingredients?"
instructions_text_main = "I am planning to make the following appetizer: \n{appetizer}\n\nWhich Italian main course can I make for my dinner?"
instructions_text_dessert = "I am planning to make the following appetizer: \n{appetizer}\n\nWhich dessert can I make for my dinner?"
instructions_menu_summary = "I am planning the following recipes for my dinner: \nAppetizer: {appetizer}\nMain course: {main_course}\nDessert: {dessert}\n\nPlease write a menu summary for my dinner."
# load env file with the OpenAI API key
from dotenv import load_dotenv
load_dotenv()
# instantiate model
model = ChatOpenAI(
model="gpt-3.5-turbo-0125",
openai_api_key=os.getenv("OPENAI_API_KEY"), # read from the .env file loaded above
temperature=temperature,
**kwargs
)
# construct prompts for our calls
prompt_template_appetizer = PromptTemplate(
template = instructions_text_appetizer,
input_variables = ['ingredients'],
)
prompt_template_main = PromptTemplate(
template = instructions_text_main,
input_variables = ['appetizer'],
)
prompt_template_dessert = PromptTemplate(
template = instructions_text_dessert,
input_variables = ['appetizer'],
)
prompt_template_summary = PromptTemplate(
template = instructions_menu_summary,
input_variables = ['appetizer', 'main_course', 'dessert'],
)
# construct sub-chains for each course
appetizer_chain = prompt_template_appetizer | model | StrOutputParser()
main_chain = {"appetizer": appetizer_chain} | prompt_template_main | model | StrOutputParser()
dessert_chain = {"appetizer": appetizer_chain} | prompt_template_dessert | model | StrOutputParser()
composed_chain = {"appetizer": appetizer_chain, "main_course": main_chain, "dessert": dessert_chain} | prompt_template_summary | model | StrOutputParser()
# actually call the execution of the entire chain
composed_result = composed_chain.invoke({"ingredients": ingredients})
print("Result: ", composed_result)
Agents#
In the system above, we have decomposed the task of creating a dinner menu into “bite-sized” pieces for LLM calls ourselves; i.e., we have specified the order and the specific prompt for the single calls ourselves. Next, we will try to avoid these steps, and use an agent instead: i.e., we will pass our overall task description to an LLM and let it figure out the necessary substeps on its own. Specifically, we will use a ReAct agent.
# same task with agent
from langchain import hub
from langchain.agents import AgentExecutor
from langchain.agents import create_react_agent
# initialize the backbone model for the agent
llm = ChatOpenAI(
model="gpt-3.5-turbo-0125",
temperature=0.5,
openai_api_key="YOUR_API_KEY", # insert your API key here
)
# Get an example prompt from langchain that was constructed for this agent architecture. you can modify this!
prompt = hub.pull("hwchase17/react")
# inspect the prompt
prompt.template
# load the agent
agent = create_react_agent(llm, tools=[], prompt=prompt)
# initialize the agent
agent_executor = AgentExecutor(agent=agent, tools=[], verbose=True)
# actually call the agent with the same task as above
agent_executor.invoke({"input": "Please help me come up with a three course Italian dinner menu. It should be vegetarian. I have cauliflower and tomatoes in my fridge."})
Exercise 5.1.2: LLM agents
Please look at the code above and try to understand what it does. Relevant information about agents can be found here. Please also try to get an overview of the ReAct architecture (see link above).
What steps does the agent (try to) perform in order to accomplish the task? How does it “know” which steps to do when?
Compare the results to the chain above. Do you observe differences?
The model tries out different possibilities to get to the right result. It finds that its experiments are not valid and gives itself a suggestion of what to try out next. Unfortunately, the model is caught in a loop, making the same step over and over again:
I should try searching for vegetarian Italian cauliflower and tomato recipes on a popular recipe website like Allrecipes or Food Network
As a result, it does not find a satisfying answer. Nonetheless, the model ‘understands’ that the result is not correct. In comparison, the first model (the chain) does find a dinner plan – but with different ingredients (e.g., tiramisu for dessert) than the ones I offered.
LangChain agent with tools#
As you might have seen above, the agent instantiation accepts a list of tools (which we have left empty so far), and the agent nevertheless tried to make use of tools. This time, let us add a tool to the agent – specifically, we will provide it with a tool that calls the Wikipedia API for real-time searches.
from langchain.agents import load_tools
tools = load_tools(["wikipedia"], llm=llm)
print('tools', tools)
# create an agent with tools
agent_with_tools = create_react_agent(llm, tools=tools, prompt=prompt)
# instantiate and call the agent
agent_executor = AgentExecutor(agent=agent_with_tools, tools=tools, verbose=True)
agent_executor.invoke({"input": "Please help me come up with a three course Italian dinner menu. It should be vegetarian. I have cauliflower and tomatoes in my fridge."})
Exercise 5.1.3: LLM agents with tools
Please look at the code above and try to understand what it does. A list of various tools can be found here.
What steps does the agent (try to) perform in order to accomplish the task? How does it “know” which steps to do when? When does it execute searches?
Compare the results to the chain above. Do you observe differences?
Is the Wikipedia tool a good choice for the task at hand? What else might we consider?
The model searches stepwise. In the first step it searches for all the information at once, then gets more concrete from step to step. It even focuses on specific ingredients, like a human would do.
The results the model found are vegetarian, use the given ingredients, and are Italian. Therefore, the results are better than the outputs of the previous models.
Wikipedia is interesting for theoretical information about Italian and vegetarian food, but it does not provide detailed recipes, so its possibilities are rather limited. A tool focused on recipes might therefore be more helpful for getting a better variety of recipes. Nonetheless, the results the model found here are better than those of the other models.
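As a starting point for experimenting with such alternatives, you can also define a tool yourself with langchain’s tool decorator. The sketch below is purely illustrative: search_recipes is a hypothetical tool that only returns a placeholder string instead of querying a real recipe API, and it reuses the llm and prompt objects defined above.
# a sketch of a custom tool (hypothetical; no real recipe database is queried)
from langchain_core.tools import tool

@tool
def search_recipes(query: str) -> str:
    """Search a (hypothetical) recipe database for recipes matching the query."""
    # a real tool would call an actual recipe API here; we only return a placeholder
    return f"(placeholder) Pretend these are recipes matching '{query}'."

# the custom tool can be passed to the agent just like the built-in Wikipedia tool
agent_with_custom_tool = create_react_agent(llm, tools=[search_recipes], prompt=prompt)
custom_executor = AgentExecutor(agent=agent_with_custom_tool, tools=[search_recipes], verbose=True)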
For the sake of seeing the best possible performance of agents, we have used OpenAI models. But since they are behind a paywall, we might also want to use open-source models as the backbone of the agent. LangChain also provides integrations with many different LLMs, including HuggingFace models. These can be used via the HF API endpoint, which might not always be available and requires a HuggingFace account, or with a local model loaded via transformers
as we have learned. The latter requires downloading the model; since agent LLM systems require good performance of the backbone LLM, this is best tested with larger models of at least a few billion parameters (mind their size for downloads!).
In case you do have a HuggingFace account or don’t mind signing up for one, you can optionally test the example below which uses the HF API.
from getpass import getpass
HUGGINGFACEHUB_API_TOKEN = getpass()
# set up huggingface LLM
from langchain_community.llms import HuggingFaceEndpoint
# the HuggingFace API token was read via getpass above (DO NOT PUBLICLY SHARE IT!!!!)
# HUGGINGFACEHUB_API_TOKEN = "YOUR_API_TOKEN"  # alternatively, paste it here directly
# define which model to query
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
llm_hf = HuggingFaceEndpoint(
repo_id=repo_id,
max_new_tokens=128,
temperature=0.5,
huggingfacehub_api_token=HUGGINGFACEHUB_API_TOKEN
)
agent_hf = create_react_agent(llm_hf, tools=[], prompt=prompt)
# initialize the agent
agent_hf_executor = AgentExecutor(agent=agent_hf, tools=[], verbose=True)
# actually call the agent with the same task as above
agent_hf_executor.invoke({"input": "Please help me come up with a three course Italian dinner menu. It should be vegetarian. I have cauliflower and tomatoes in my fridge."})
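If you would rather go the local route mentioned above, a minimal sketch could look as follows. This is an assumption-laden example: it presupposes that transformers is installed and that enough disk and GPU/CPU memory for the chosen model is available; any sufficiently capable instruct model could be substituted for the one named here.
# a sketch of the local alternative: load a model via transformers and wrap it for langchain
from langchain_community.llms import HuggingFacePipeline

llm_local = HuggingFacePipeline.from_model_id(
    model_id="mistralai/Mistral-7B-Instruct-v0.2",  # large download; swap in any capable instruct model
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 128, "do_sample": True, "temperature": 0.5},
)

# the local LLM can be plugged into the same agent setup as before
agent_local = create_react_agent(llm_local, tools=[], prompt=prompt)
agent_local_executor = AgentExecutor(agent=agent_local, tools=[], verbose=True)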
Output parsing#
One of the core bottlenecks of chaining LLM calls is the potential necessity to process outputs in a specific (structured) way. This page provides an overview of how this can be approached. This step is key for enabling the integration of LLMs into automatic systems where other components depend on the outputs of LLMs and usually expect particular input types or formats.
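As a small illustration of the idea (a sketch of ours, not taken from the linked page): one can, for instance, instruct the model to answer in JSON and parse the answer into a Python dict. This reuses the model defined above; note that the parser will raise an error if the model does not return valid JSON.
# instruct the model to return JSON and parse it into a Python dict
from langchain_core.output_parsers import JsonOutputParser

json_prompt = PromptTemplate(
    template="Suggest one Italian appetizer using these ingredients: {ingredients}\nAnswer ONLY with a JSON object with the keys 'name' and 'ingredients' (a list of strings).",
    input_variables=["ingredients"],
)
json_chain = json_prompt | model | JsonOutputParser()

# the result is a Python dict that downstream components can consume directly
parsed = json_chain.invoke({"ingredients": "cauliflower, tomatoes"})
print(type(parsed), parsed)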
There are also packages / frameworks specialized in interfacing with LLMs and properly parsing their outputs like, e.g., LMQL and this controller framework from Microsoft.
These resources are intended as optional useful information, in case you will explore and build your own agents; you are not expected to have looked at them in detail.
Memory handling#
One of the main issues of agents is that they are by default stateless; i.e., at each step of execution there is no memory of what happened before. This is handled by adding memory components. An overview of this can be found here.
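As a minimal sketch of this idea (assuming the model defined earlier is still available), one can keep a per-session message history and re-inject it into the prompt on every call, e.g., with RunnableWithMessageHistory; agents / AgentExecutors can be wrapped in the same way.
# a minimal sketch: re-inject previous turns into the prompt on every call
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}  # one history object per session id (toy in-memory session handling)
def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful dinner-planning assistant."),
    MessagesPlaceholder(variable_name="history"),  # previous turns are inserted here
    ("human", "{input}"),
])
chain_with_memory = RunnableWithMessageHistory(
    chat_prompt | model | StrOutputParser(),
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# the second call "remembers" the first because the history is re-injected
print(chain_with_memory.invoke(
    {"input": "I have cauliflower and tomatoes. Suggest an Italian appetizer."},
    config={"configurable": {"session_id": "demo"}},
))
print(chain_with_memory.invoke(
    {"input": "Now suggest a matching main course."},
    config={"configurable": {"session_id": "demo"}},
))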
Exercise 5.1.4: Memory
Take a look at the approach for handling message memory above. Recall the generative agent architecture that was discussed in the lecture. What is the difference between this simple approach and the memory implementation in the generative agents? What are the respective (dis)advantages of either approach?