If you're like me, one thing you hate doing is booking a flight. It's a uniquely personal task with a lot of preferences and criteria, and the reality is that no search interface really gets it right. I want to leave in the morning, take a direct flight, not on a Boeing, and not with United (partly because I live closer to JFK, but also... ew, United). So you toggle through an endless sea of options until, 20 minutes later, you finally find the flight you want.
I am convinced this is a key reason why travel agents still exist. There is something nice about offloading this task to a human to manage. That got me thinking: LLMs are all the rage. Can I build a simple (and functional) flight search agent? The answer is yes, but it's not without its challenges.
And so begins a tale of Janet - the flight search agent.
The Approach
The initial approach was simple: create an agent that executed a search against a flight API, then took the results and sent them back to the user. Here is a little diagram of the flow:
User → Agent (LLM) → Flight API → Agent (LLM) → User
To give it more color, the idea was:
The user would input what they want: “Give me all Delta flights leaving tomorrow at 10pm to SFO.”
That would be passed to the agent/LLM. The LLM's prompt defined a “Find Flights” tool that took a date, time, and airport and executed an API call. The prompt also defined a “respond” command for writing back to the user. This let the agent decide whether it should keep executing its internal loop or respond to the user.
When the agent saw “Find Flights,” it would execute the API search with those parameters and send the results back to the LLM.
The LLM would then process the results and, ideally, issue a “respond” with a nicely synthesized summary for the user. (As a note, this loop of internal tasks is called chaining.)
The entire conversation history was passed with each invocation so that follow-up filtering requests would work. (A minimal sketch of the loop is just below.)
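To make the loop concrete, here is a minimal sketch of the idea in Python. Everything here is illustrative: search_flights and format_prompt are hypothetical stand-ins, and I'm using JSON for the command format for readability (the format V0 actually used was looser, which turned out to be part of the problem):

import json

def search_flights(date, source, destination):
    # Hypothetical stand-in for the real flight API call (flightapi.io in my case).
    return [{"carrier": "DL", "departs": f"{date} 10:00", "from": source, "to": destination}]

def format_prompt(history):
    # Hypothetical stand-in: render the system prompt plus the conversation so far.
    return "\n".join(f"{m['role']}: {m['content']}" for m in history)

def run_agent(llm, user_message, history):
    # One user turn: keep invoking the LLM until it issues a "respond" command.
    history.append({"role": "user", "content": user_message})
    while True:
        command = json.loads(llm(format_prompt(history)))
        if command["action"] == "find_flights":
            results = search_flights(**command["action_input"])
            history.append({"role": "tool", "content": json.dumps(results)})
        elif command["action"] == "respond":
            history.append({"role": "assistant", "content": command["action_input"]})
            return command["action_input"]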
Regarding the actual architecture, I decided to use a quantized Llama-2 13B model from Hugging Face, with LangChain/LlamaCpp as the toolkit. (For flight search, I used a cheap (so read: slow) API called flightapi.io.)
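For reference, wiring up that stack looks roughly like this (a sketch: the model filename is a placeholder for whatever quantized build you download, and the parameters are just sensible defaults):

from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./llama-2-13b-chat.Q4_K_M.gguf",  # placeholder: any quantized Llama-2-13B build
    n_ctx=4096,         # context window - the full chat history has to fit here
    temperature=0.1,    # keep this low; we want predictable command output
    max_tokens=512,
)

print(llm.invoke("User: find me a flight to SFO\nAssistant:"))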
Janet V0 - Flights on Acid
With an approach in hand, I started coding. I set up the LLM and the prompts, and started testing. It quickly became very clear that this approach, as it was, was not going to work. There were a few main issues - a big one being hallucination (i.e., making stuff up):
The LLM would hallucinate commands it could send to find flights, or would write command invocations in an unexpected format, meaning the API bridge would fail to execute a meaningful flight search.
When the API search did work, the LLM would often hallucinate details about the flights (non-direct flights magically became direct, etc.).
The response format was not very useful for an end user - details were dropped, and free-form text was generally a poor way to present flight results.
This led to a thought:
What if I reduced the scope of the LLM to just a few key functions, essentially removing its need to actually understand the flight data itself? Basically, have it focus on:
Understanding what the user is asking and translating that into a deterministic set of API calls.
Generating the conversational fluff around the results to make the interaction feel more organic.
This solved the latter two issues, but how could I force the model to respond in a more structured manner that could be sent to an API?
Context-Free Grammars - the savior
So taking a step back - for the uninitiated - LLMs are actually just giant probability machines. No, literally, that is what they do. Given a sequence of tokens (chunks of words), they predict the likely next token. In reality, they return a probability for every candidate token - then the model just picks the highest-probability one.
A simple example: say you input “I want to walk the.” The model would process that and return something like {"dog": 90%, "cat": 8%, "octopus": 0.1%}.
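In code, that picking step is almost embarrassingly simple (a toy illustration with made-up numbers, not a real model):

# Toy next-token distribution for the prompt "I want to walk the"
next_token_probs = {"dog": 0.90, "cat": 0.08, "octopus": 0.001}

# Greedy decoding: just take the most probable token.
print(max(next_token_probs, key=next_token_probs.get))  # -> "dog"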
Obviously, it's more complex than that in practice (there are lots of rules around what gets picked, etc.) - but that is the gist. The question becomes, is there a way I can change the probabilities to give me something more in line with what I want?
The answer is yes - introducing context-free grammars. A CFG (a type of formal grammar) is a set of rules describing what is syntactically correct in a language. It does not describe meaning, just literally the syntax. For instance, grammars are used heavily in things like linters to ensure your code matches the rules of the language you're coding in.
Returning to our LLM issue - what if we change which token we pick? Instead of picking the highest-probability token outright, we pick the highest-probability token that also matches a CFG. That means we can constrain the LLM's output to follow a strict language. This way, you can get the LLM to generate syntactically correct JSON, for instance, or follow specific rules around how to generate search parameters. This solves our first problem above.
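Conceptually, this is a one-line change to the picking step: mask out every token the grammar would reject, then take the best of what's left. (A toy sketch - real implementations like llama.cpp's walk the grammar's state machine token by token:)

def pick_token(probs, allowed):
    # Greedy decoding, restricted to the tokens the grammar currently permits.
    legal = {tok: p for tok, p in probs.items() if tok in allowed}
    return max(legal, key=legal.get)

probs = {"Sure": 0.6, "{": 0.3, "here": 0.1}
# Say the grammar requires the output to open a JSON object:
print(pick_token(probs, allowed={"{"}))  # -> "{", even though "Sure" scored higher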
Luckily for us, llama.cpp now supports this (https://github.com/ggerganov/llama.cpp/pull/1773) and has a handy grammar definition language, GBNF: https://github.com/ggerganov/llama.cpp/tree/master/grammars
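With the llama-cpp-python bindings, using it looks roughly like this (the yes/no grammar is deliberately trivial, just to show the mechanics, and the model path is a placeholder):

from llama_cpp import Llama, LlamaGrammar

# A tiny GBNF grammar: the model may only ever emit "yes" or "no".
grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')

llm = Llama(model_path="./llama-2-13b-chat.Q4_K_M.gguf")  # placeholder path
out = llm("Is JFK in New York? Answer:", grammar=grammar, max_tokens=4)
print(out["choices"][0]["text"])  # guaranteed to be "yes" or "no"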
Janet V1 - A usable search agent
Armed with this information, I revised my approach in a few key ways:
First, I updated the prompt so it had some very explicit commands:
find_flights: which took a set of dates and locations to look for flights
filter_flights: which would filter flights based on a set of criteria
write: which would write something to the user.
In addition, I forced the entire conversation to follow a JSON format that would result in chats that looked like:
User: Hey, how are you today?
Assistant:
{"actions": [
  {"action": "write",
   "action_input": "I'm good, thanks - how are you?"}]}
User: I'm great, can you find me a flight to SFO from NYC?
Assistant:
{"actions": [
  {"action": "write",
   "action_input": "Sure thing, here are flights from NYC to SFO:"},
  {"action": "find_flights",
   "action_input":
     [{"date": "2024-06-07", "source": "JFK", "destination": "SFO"},
      {"date": "2024-06-10", "source": "SFO", "destination": "JFK"}]}]}
User: Can you filter the leaving flight by anything less than 100 dollars?
Assistant:
{"actions": [
  {"action": "write",
   "action_input": "Sure thing, here are the flights under 100 dollars:"},
  {"action": "filter_flights", "action_input": "cost < 100"}]}
User: Can you show me the flights for Delta?
Assistant:
{"actions": [
  {"action": "write",
   "action_input": "Sure thing, here are the Delta flights:"},
  {"action": "filter_flights",
   "action_input": "carrier = 'delta'"}]}
Then, once the prompt was done, I designed a grammar that forced the responses to match that format (a trimmed-down sketch is below). Finally, I updated the agent logic so the flight data would come back to the end user, and cleaned up the UI.
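For flavor, here is roughly the shape of that grammar in GBNF. This is a simplified sketch, not my exact grammar - the string rule in particular is far too permissive for real use:

root          ::= "{" ws "\"actions\":" ws "[" action ("," ws action)* "]" ws "}"
action        ::= write-action | find-action | filter-action
write-action  ::= "{" ws "\"action\": \"write\"," ws "\"action_input\":" ws string ws "}"
filter-action ::= "{" ws "\"action\": \"filter_flights\"," ws "\"action_input\":" ws string ws "}"
find-action   ::= "{" ws "\"action\": \"find_flights\"," ws "\"action_input\":" ws "[" search ("," ws search)* "]" ws "}"
search        ::= "{" ws "\"date\":" ws string "," ws "\"source\":" ws string "," ws "\"destination\":" ws string ws "}"
string        ::= "\"" [^"]* "\""
ws            ::= [ \t\n]*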
I ran it and….IT WORKED! The major problems were gone; the bot no longer made up flights (yay), and it actually gave me real flights that matched my preferences. Janet V1 was born.
Janet VBeyond
While this was only a 2-day experiment, it showed just how powerful LLMs can be when leveraged properly and kept within guardrails. The ability to enforce format and break apart tasks gives us powerful tools for creating more organic discovery interfaces. And I think that is the key takeaway:
Do not assume the LLM can do everything - it is a tool in the toolbox that can be used to craft new and interesting experiences, but it's not a silver bullet.
If I were to take this further, I think there are several things you could do to drive usefulness at scale:
Evaluate Meta Models - i.e., leveraging multiple specialized LLMs in the main agent loop to do specific travel tasks (flight booking, itinerary adjusting, etc.). An interesting paper on this topic: https://arxiv.org/pdf/2403.13313.pdf
Find a better flight API - that API was good enough for my experiment, but very limited and slow. Doing this at scale would require a better flight API and surrounding system.