Building An AI Agent
Building an AI agent to search Google and summarize the results
I created my first AI Agent!
It can search Google and summarize the results.
I wanted to see for myself what it looks like to build an agent that actually goes out into the world and does something while interacting with ordinary, human-facing software like a web browser. I’ve seen dozens of startups doing agent stuff, and it all seems like BS. Do these agents accomplish tasks yet? Can they interface with other software yet? It didn’t seem like it from my tests, so I wanted to dig in and see for myself what is real and what is possible.
The Task:
1. Open Google.
2. Parse the DOM to find the right elements to fill out and the right buttons to click.
3. Execute the JavaScript the model generates to fill out the form and click the button.
4. Get the search results and summarize them.
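The four steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from this post: `runSearchAgent`, `askModel`, and the prompt wording are hypothetical names chosen for illustration, `page` is assumed to be a Playwright `Page`, and `askModel` stands in for whatever function sends a prompt to the LLM and returns its text reply.

```javascript
// Hypothetical sketch of the four steps; not the post's actual code.
// `page` is assumed to be a Playwright Page. `askModel(prompt)` is an
// assumed helper that sends a prompt to the LLM and resolves to its reply text.
async function runSearchAgent({ page, askModel, query }) {
  // 1. Open Google.
  await page.goto("https://www.google.com");

  // 2. Hand the DOM to the model and ask which JavaScript would run the search.
  const html = await page.content();
  const script = await askModel(
    `Here is the HTML of a search page:\n${html}\n\n` +
      `Reply with only the JavaScript that types ${JSON.stringify(query)} ` +
      `into the search box and submits the search.`
  );

  // 3. Execute the model's JavaScript inside the page.
  await page.evaluate(script);
  await page.waitForLoadState("load");

  // 4. Read the results page and ask the model for a summary.
  const resultsText = await page.innerText("body");
  return askModel(`Summarize these search results:\n${resultsText}`);
}
```

With Playwright installed, `page` could come from `chromium.launch()` followed by `browser.newPage()`. A real run would likely need to trim the HTML before sending it to the model and to wait for the results page more carefully than this sketch does.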
The Code:
It’s 100 lines of JavaScript, and it is probably more robust than hard-coding DOM elements. If Google changed its markup or selectors tomorrow, it should still work. That’s also kind of the point: I want a solution that is not as brittle as normal web scraping solutions that rely on XPaths or class names to navigate.
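To make the "no hard-coded selectors" idea concrete, here is one possible way to describe a page to the model. It is a sketch under assumptions, not necessarily how this agent does it: `collectInteractiveElements` and `describeElements` are hypothetical helpers. The first is meant to run inside the browser (for example through Playwright's `page.evaluate`), the second turns its output into a compact text listing the model can read, so the model decides which element is the search box instead of a fixed XPath or class name.

```javascript
// Hypothetical helper, meant to run inside the browser (it uses `document`),
// e.g. via page.evaluate(collectInteractiveElements) in Playwright.
function collectInteractiveElements() {
  const nodes = document.querySelectorAll("input, textarea, button, [role='button']");
  return Array.from(nodes, (el) => ({
    tag: el.tagName.toLowerCase(),
    type: el.getAttribute("type"),
    name: el.getAttribute("name"),
    ariaLabel: el.getAttribute("aria-label"),
    text: (el.innerText || el.value || "").trim().slice(0, 80),
  }));
}

// Hypothetical helper: format the collected elements as one line each,
// so the prompt contains a short listing instead of the full HTML.
function describeElements(elements) {
  return elements
    .map((el, index) => {
      const attrs = ["type", "name", "ariaLabel"]
        .filter((key) => el[key]) // skip attributes the element does not have
        .map((key) => `${key}=${JSON.stringify(el[key])}`)
        .join(" ");
      const label = el.text ? ` ${JSON.stringify(el.text)}` : "";
      return `[${index}] <${el.tag}${attrs ? " " + attrs : ""}>${label}`;
    })
    .join("\n");
}
```

Nothing here names a specific selector, so a markup change only alters the listing the model sees rather than breaking the script outright.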
This was not as easy as I thought it would be. You need Playwright or some other way to drive a browser. You have to use GPT-4 (GPT-3.5 didn’t cut it; even Llama 3 didn’t cut it). Yes, there is an API that can do this, but the exercise was to get an “agent” to navigate a browser like a person to accomplish a task.
In terms of “planning”, I sort of punted on that. This agent doesn’t plan, and I don’t want it to. It’s only supposed to do one thing: tell me which JavaScript code to execute so that it will fill out the form and click a button. But still, it’s something!
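Since the model's only job is to hand back JavaScript, the reply has to be reduced to bare code before it can be executed. Models often wrap code in a Markdown fence with some chatter around it, so a small cleanup step helps. The sketch below is an assumption about how that step could look, not the post's actual code; `extractJavaScript` is a hypothetical helper name.

```javascript
// Three backticks, built with repeat() so this snippet does not contain
// a literal Markdown fence of its own.
const FENCE = "`".repeat(3);

// Matches an optional "javascript"/"js" language tag, then captures
// everything up to the closing fence.
const FENCED_BLOCK = new RegExp(
  FENCE + "(?:javascript|js)?\\s*\\n([\\s\\S]*?)" + FENCE,
  "i"
);

// Hypothetical helper: return the code inside the first fenced block of the
// model's reply, or the whole reply (trimmed) if there is no fence.
function extractJavaScript(reply) {
  const match = reply.match(FENCED_BLOCK);
  return (match ? match[1] : reply).trim();
}
```

The extracted string is what would then be passed to the browser for execution. Running model-written code unreviewed is only reasonable here because it runs inside a throwaway browser page.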
And it works … about 8 times out of 10.