What Are AI Agents? From Basics to Implementation
In the rapidly evolving landscape of artificial intelligence, terms like “AI agents,” “agentic capabilities,” and “RAG” seem to be everywhere. Yet most explanations tend to be either too technical for the average user or too simplified to be useful. This guide bridges that gap, offering a clear, practical understanding of AI agents for those who regularly use AI tools but lack technical expertise.
The Simple Path to Understanding AI Agents
Our journey follows a straightforward three-level progression that builds on concepts you likely already understand. We’ll start with familiar large language models (LLMs), move to AI workflows, and finally explore true AI agents—all through practical examples you’ll encounter in everyday use.
Check out this explanation video by Jeff Su.
Video Source: Jeff Su’s YouTube Channel
Level 1: Large Language Models – The Foundation
Popular AI chatbots like ChatGPT, Google Gemini, and Claude are built on large language models. These sophisticated systems excel at generating and editing text based on the inputs you provide.
The process is straightforward: you (the human) provide a prompt, and the LLM produces an output based on its training data. For instance, if you ask ChatGPT to draft an email requesting a coffee meeting, your prompt serves as the input, and the resulting email—perhaps more polite than you’d typically write—becomes the output.
However, LLMs have significant limitations. If you were to ask ChatGPT when your next coffee chat is scheduled, it would fail to provide an accurate answer because it lacks access to your calendar. This highlights two crucial characteristics of large language models:
- Knowledge limitations: Despite being trained on vast datasets, an LLM on its own has no access to your personal information or company-specific data.
- Passive nature: LLMs wait for your prompt before responding; they don’t take independent action.
These limitations are important to understand as we progress to more complex systems.
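For readers who like to see ideas as code, here is a minimal Python sketch of Level 1. The function `fake_llm` is a hypothetical stand-in for a real model, not any vendor's API, and its canned replies are invented for illustration. It shows both limitations: the model only knows what is in the prompt, and it does nothing until asked.

```python
def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: it sees only the prompt text."""
    if "draft an email" in prompt.lower():
        return "Hi! Would you be free for a coffee chat next week?"
    # No calendar access, so it cannot answer questions about personal events.
    return "I don't have access to that information."


# Passive: nothing happens until a human supplies a prompt.
email = fake_llm("Draft an email requesting a coffee meeting")
answer = fake_llm("When is my next coffee chat?")
```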
Level 2: AI Workflows – Adding External Capabilities
Building on our example, what if we instructed the LLM, “Every time I ask about a personal event, search my Google Calendar before responding”? With this logic implemented, asking “When is my coffee chat with Elon?” would yield the correct answer because the LLM would first check your calendar.
This approach fails, however, when your follow-up question is “What will the weather be like that day?” The LLM would still search your Google Calendar—following the predefined path—but would find no weather information there.
This illustrates the fundamental nature of AI workflows: they follow predetermined paths set by humans. This predefined path is technically called “control logic.”
We could extend this workflow by adding more steps, such as accessing weather information via an API and using text-to-speech to deliver the answer. However, no matter how many steps we add, this remains an AI workflow as long as a human serves as the decision-maker.
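The calendar example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `CALENDAR` holds made-up data, and `search_calendar` and `workflow` are hypothetical helpers rather than a real Google Calendar integration. The point is that the path (always search the calendar) is fixed by the person who wrote the code.

```python
from typing import Optional

CALENDAR = {"coffee chat with elon": "2025-03-14 10:00"}  # made-up sample data


def search_calendar(question: str) -> Optional[str]:
    """Hypothetical calendar lookup: return the time of a matching event."""
    q = question.lower()
    for event, when in CALENDAR.items():
        if event in q:
            return when
    return None


def workflow(question: str) -> str:
    # Control logic written by a human: ALWAYS search the calendar first.
    found = search_calendar(question)
    if found is None:
        return "Nothing relevant found in the calendar."
    return f"Your event is scheduled for {found}."
```

Asking `workflow("When is my coffee chat with Elon?")` finds the event, while `workflow("What will the weather be like that day?")` still searches the calendar and comes back empty. Fixing that means a human adding a weather branch by hand.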
A real-world example might involve:
- Compiling news article links in Google Sheets
- Using Perplexity to summarise those articles
- Asking Claude to draft LinkedIn and Instagram posts
- Scheduling this sequence to run automatically each day
If the output doesn’t meet your expectations—perhaps the LinkedIn post isn’t humorous enough—you would need to manually adjust the prompt. This human intervention in the decision-making process keeps this firmly in the realm of AI workflows.
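A rough sketch of that daily pipeline might look like the following. The functions `summarise` and `draft_post` are hypothetical stand-ins for the Perplexity and Claude steps (they do simple string handling, not real model calls), and `LINKEDIN_PROMPT` is an invented template. Notice that improving the output means a person editing the template.

```python
from typing import List

# A human edits this template by hand when the posts need to change.
LINKEDIN_PROMPT = "Write a LinkedIn post about: {summary}"


def summarise(article: str) -> str:
    """Stand-in for the summarisation step: keep only the first sentence."""
    return article.split(".")[0] + "."


def draft_post(summary: str, prompt_template: str) -> str:
    """Stand-in for the drafting step: a real workflow would send this prompt to a model."""
    return "[draft] " + prompt_template.format(summary=summary)


def run_pipeline(articles: List[str]) -> List[str]:
    # Fixed order chosen by a human: summarise, then draft. No step decides anything.
    return [draft_post(summarise(a), LINKEDIN_PROMPT) for a in articles]


posts = run_pipeline(["AI agents are here. More text."])
```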
Technical note: Retrieval-augmented generation (RAG) is a type of AI workflow in which the model looks up relevant information before answering, for example by querying your calendar or a weather service.
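As a minimal sketch of the retrieve-then-answer idea, assuming made-up data in `DOCUMENTS` and hypothetical helpers `retrieve` and `build_prompt`: the lookup happens first, and its result is pasted into the prompt the model would receive. Real systems usually retrieve by semantic similarity; the keyword check here is a deliberate simplification.

```python
DOCUMENTS = {
    "calendar": "Coffee chat with Elon: Friday 10:00",
    "weather": "Friday forecast: sunny, 18 C",
}  # made-up sample data


def retrieve(question: str) -> str:
    """Naive keyword retrieval: choose a source based on words in the question."""
    if "weather" in question.lower():
        return DOCUMENTS["weather"]
    return DOCUMENTS["calendar"]


def build_prompt(question: str) -> str:
    # Retrieval happens BEFORE generation; the retrieved text goes into the prompt.
    context = retrieve(question)
    return f"Context: {context}\nQuestion: {question}"


prompt = build_prompt("What will the weather be like?")
```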
Level 3: AI Agents – Autonomous Decision-Making
The transformation from an AI workflow to an AI agent occurs when the human decision-maker is replaced by an LLM. In our social media example, this would mean the LLM must:
- Reason: Determine the best approach to compile news articles, decide whether to summarise them, and plan how to create engaging social posts.
- Act: Use tools like Google Sheets, Perplexity, and Claude to execute the planned steps.
- Iterate: Evaluate the output and make improvements without human intervention.
When creating social media posts from news articles, a human decision-maker would reason about the best approach (compile articles, summarise them, draft posts) and then take action using appropriate tools. An AI agent performs these same reasoning and action steps autonomously.
A common configuration for AI agents is the ReAct framework (short for Reasoning + Acting): the agent reasons about the problem and then acts using the tools available to it.
Unlike workflows, AI agents can also iterate independently. In our example, rather than a human manually rewriting prompts to make content more engaging, an AI agent could add another step to critique its own output based on best practices, then revise and improve through multiple cycles until it meets the quality criteria.
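The reason, act, iterate loop can be sketched as follows. This is a toy under stated assumptions, not a real agent framework: `choose_next_tool` stands in for the LLM's reasoning and uses simple hand-written rules, and the tools `summarise` and `add_hook` are invented for the example. What matters is the shape of the loop: the next step is chosen at run time from the current draft, not fixed in advance.

```python
from typing import Callable, Dict, List, Tuple


def summarise(text: str) -> str:
    """Toy tool: keep only the first sentence."""
    return text.split(".")[0] + "."


def add_hook(text: str) -> str:
    """Toy tool: prepend an attention-grabbing opener."""
    return "Did you know? " + text


TOOLS: Dict[str, Callable[[str], str]] = {"summarise": summarise, "add_hook": add_hook}


def choose_next_tool(draft: str) -> str:
    """Stand-in for the LLM's reasoning: critique the draft and pick the next step."""
    if draft.count(".") > 1:
        return "summarise"  # too long, so shorten it
    if not draft.startswith("Did you know?"):
        return "add_hook"  # not engaging enough, so add an opener
    return "finish"  # quality criteria met


def run_agent(text: str, max_steps: int = 5) -> Tuple[str, List[str]]:
    draft, trace = text, []
    for _ in range(max_steps):  # a step limit keeps the loop from running forever
        tool = choose_next_tool(draft)  # Reason
        if tool == "finish":
            break
        draft = TOOLS[tool](draft)  # Act
        trace.append(tool)  # record the step, then iterate on the new draft
    return draft, trace


final, steps = run_agent("Agents can plan. They can also act.")
```

Here the loop ends with `final` equal to "Did you know? Agents can plan." after two steps, "summarise" then "add_hook". In a real agent, the rules inside `choose_next_tool` would be replaced by a call to a language model.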
Real-World AI Agent Example
A good illustration comes from a website created by Andrew Ng that shows how an AI vision agent works. When you search for a term like “skier,” the AI agent:
- Reasons about what a skier looks like (a person on skis moving through snow)
- Acts by examining video footage to identify matching clips
- Returns relevant clips to the user
This may seem straightforward, but the AI agent did all of this itself. Without it, a person would have to review the footage manually and add tags like “skier,” “mountain,” and “snow.”
The technical complexity remains hidden behind a simple interface, allowing average users to benefit from AI agents without understanding the underlying mechanisms.
The Three Levels Visualised
To summarise the progression we’ve covered:
Level 1 (LLMs):
- You provide an input
- The LLM responds with an output based on its training
Level 2 (AI Workflows):
- You provide an input
- The LLM follows a predefined path that may involve retrieving information from external tools
- A human programs and adjusts the path as needed
Level 3 (AI Agents):
- The AI receives a goal
- The LLM performs reasoning to determine the best approach
- It takes action using available tools
- It observes interim results and decides whether iterations are needed
- It produces a final output that achieves the initial goal
The critical distinction is that in AI agents, the LLM serves as the decision-maker in the workflow.
Practical Applications for Non-Technical Users
Even without technical expertise, understanding these distinctions helps you:
- Set realistic expectations: Know what your current AI tools can and cannot do
- Identify opportunities: Recognise processes in your work that could benefit from AI workflows or agents
- Communicate effectively: Speak confidently with technical teams about AI implementations
- Plan for the future: Prepare for the increasing role of AI agents in everyday work
Getting Started with AI Agents
If you’re intrigued by the potential of AI agents, consider these starting points:
- Experiment with simple AI workflows using accessible platforms like make.com
- Document decision-making processes in your work that could potentially be handled by AI agents
- Learn about one tool that enables agent creation (such as LangChain or AutoGPT)
- Start small by creating a basic agent that handles a specific, well-defined task
Conclusion
AI agents represent a significant evolution from passive language models and predefined workflows. By understanding the fundamental differences between these technologies, you’re better equipped to leverage their capabilities in your personal and professional life.
The key takeaway is that true AI agents can reason, act, and iterate independently—replacing human decision-making in specific processes while freeing you to focus on more creative and strategic tasks. As these technologies become more accessible, those who understand how to implement them effectively will have a distinct advantage in an increasingly AI-enhanced world.
Whether you’re looking to automate repetitive tasks, enhance your productivity, or simply stay informed about emerging technologies, understanding AI agents is becoming an essential skill for the modern professional.