
This is the “no buzz-word primer” on AI Agents you were looking for, I promise!
What are AI Agents?
In the tech world, “AI Agents” are everywhere. Let’s demystify them by going back to first principles! Since the field is evolving rapidly, do yourself a favour: instead of sticking to some definition from “influencers”, define your own playbook.
Having read a lot about AI agents, I really like the Anthropic team’s definition, so let’s borrow from them 🙂.
- Human Agents: Humans have brains to think and break down problems, memory to recall context while solving them, and a certain agency or dexterity to do stuff (hands, legs, etc.) using tools: a laptop/mouse in the digital world and machines in the physical world. Essentially: take a problem, break it down, do something, take feedback, and repeat!
- AI Agents: Because LLMs are incredibly good at thinking and breaking down problems, with a little help from their human masters (pun intended!) they can mimic human agents. That is, LLM-based agents can use the LLM (their brain) to break down problems, use the tools provided to act in the digital/physical world, receive feedback, and keep iterating until the task is complete! This evolving paradigm of intelligent, human-like AI applications is what we call “AI Agents”.
But just like humans, not all AI agents are the same! They range from simple agents with limited autonomy to highly autonomous agents that take on much harder, human-level tasks:
- Single LLM Features: Simple app features enhanced with LLM calls. Example: an LLM summarizing yesterday’s WhatsApp messages instead of you reading them manually. Easy to build but limited in complexity: just fire and forget, with no memory.
- Workflow-based Agents: AI that handles complete digital workflows with predefined steps. Example: given a presentation topic, the agent researches → creates an outline → generates content → adds visuals → provides speaker notes. Each step uses multiple LLM calls in a clear sequence. Gamma.app is a great example.
- Highly Autonomous Agents: Most people believe these are the true agents! They need only a high-level task and the available tools. They break down problems, work iteratively, and adapt their approach autonomously. Magical when they work, but they come with higher cost, latency, and unpredictability. Examples: Claude Code and Deep Research agents, which handle complex tasks without predefined workflows or orchestration. (The loop they run is sketched just below.)
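To make that loop concrete, here is a minimal sketch of the autonomous-agent pattern in plain Python. Everything in it (`call_llm`, the toy tool) is a hypothetical stand-in for a real model and real tools, just to show the shape of the loop:

```python
# A toy agent loop: think -> act (tool call) -> observe feedback -> repeat.
# call_llm is a deterministic stand-in for a real LLM API call.

def call_llm(messages: list[dict]) -> dict:
    """Stand-in model: requests a tool once, then produces a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search_files", "args": "agent"}
    return {"content": "Done: summarized the search results."}

TOOLS = {
    "search_files": lambda query: f"results for {query!r}",  # toy tool
}

def run_agent(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):  # never let the loop run forever
        reply = call_llm(messages)
        if reply.get("tool") in TOOLS:  # the LLM chose to act
            result = TOOLS[reply["tool"]](reply["args"])
            messages.append({"role": "tool", "content": result})  # feedback
        else:
            return reply["content"]  # task complete
    return "Stopped: step budget exhausted"

print(run_agent("Summarize recent changes"))
```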
Agent Primitives
Just like traditional software is composed of frontend, backend, APIs, etc, Agents are composed of the following building blocks:
- LLMs with reasoning & tool calling: LLMs that can reason (to break down tasks), follow instructions, and call tools (i.e. invoke the tools available to them, at will!). Most state-of-the-art LLMs have these capabilities!
- Tool System: This is where application developers come into the picture! Look at your agent and think of the capabilities you want to give the LLM (its “brain”) so it can accomplish whatever tasks it may receive!
- Memory & State Management: Just like humans! To work on long-horizon tasks, an agent needs to use its past learnings, constantly update its memory, and retrieve it when needed!
- Control & Reliability: These are the guardrails, constraints, and exit conditions you as an app developer have to set so that agents don’t go into the zone and just burn time and tokens! With autonomous agents this becomes a very important task! (See the sketch after this list.)
- Evals: Agents are magic when they work! But achieving reliability is hard for such autonomous systems, because in production they can take any path. Hence, evals that bring reliability are really important!
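To make “Control & Reliability” concrete, here is a minimal hypothetical sketch of the kind of guardrails I mean: a step budget, a token budget, and output validation before anything leaves the agent. `run_step` and `validate_output` are illustrative stand-ins, not any SDK’s API:

```python
# Hypothetical guardrails around an agent loop: a step cap, a token budget,
# and output validation. run_step/validate_output are illustrative stand-ins.
MAX_STEPS = 8
TOKEN_BUDGET = 50_000

def run_step(state: dict) -> dict:
    """Stand-in for one agent iteration (an LLM call plus optional tool call)."""
    state["tokens_used"] += 1_000  # pretend each step costs ~1k tokens
    state["done"] = state["tokens_used"] >= 3_000
    state["output"] = {"summary": "ok", "annotations": []}
    return state

def validate_output(output: dict) -> bool:
    """Exit-side guardrail: only well-formed results leave the agent."""
    return isinstance(output.get("summary"), str) and len(output.get("annotations", [])) <= 50

def run_with_guardrails(task: str) -> dict:
    state = {"task": task, "tokens_used": 0, "done": False, "output": None}
    for _ in range(MAX_STEPS):                   # guardrail 1: step budget
        state = run_step(state)
        if state["tokens_used"] > TOKEN_BUDGET:  # guardrail 2: token budget
            raise RuntimeError("token budget exhausted")
        if state["done"]:
            break
    if not state["done"] or not validate_output(state["output"]):
        raise RuntimeError("agent did not produce a valid result")  # guardrail 3
    return state["output"]

print(run_with_guardrails("review PR #1"))
```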
Let’s build a GitHub PR Orchestrator Agent!
As a PM, I brainstorm with ChatGPT/Claude for PRDs, then build with Claude Code. For this project, I explored the OpenAI Agents SDK: it provides core agent primitives (instructions, tool definitions, session management) with the flexibility to customize. Most major LLM providers now offer similar SDKs: Anthropic’s Claude Code SDK, Google’s Agent Development Kit.
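To give a feel for those primitives before the PR agent, here is about the smallest example I could put together from the SDK’s docs, showing instructions and session management. APIs evolve quickly, so treat this as a sketch and check the current documentation:

```python
# Smallest OpenAI Agents SDK example: one agent, one session for stateful turns.
# Assumes `pip install openai-agents` and OPENAI_API_KEY in the environment.
from agents import Agent, Runner, SQLiteSession

agent = Agent(
    name="Assistant",
    instructions="Reply concisely.",
)

# Sessions persist conversation state; here, one session per PR
session = SQLiteSession("pr-123")

result = Runner.run_sync(agent, "Say hello to the PR author.", session=session)
print(result.final_output)
```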
The following was the scope of this agent, as captured in the prd.md file:
# Github PR Orchestrator Agent - PRD
## Introduction
I am building a GitHub PR Orchestrator AI Agent. It is an event-driven agent that reviews PRs in a fixed workflow: summarise, assess risks, and annotate.
**Features**:
- Concise PR summary with impacted files.
- Risk detection + reviewer checklist.
- Inline annotations (≤50 per batch) posted to GitHub Checks.
## High Level Implementation Plan
1. **Trigger**: Create a **GitHub Actions** workflow on `pull_request` (`opened|synchronize|reopened`).
2. Add `OPENAI_API_KEY` to secrets
3. **Fetch context**: In the job, call GitHub REST to **get PR details + list changed files** (and/or fetch the diff).
4. **Install SDK**: Install the **OpenAI Agents SDK** in the job. (We'll use **Agents**, **Sessions** for per-PR state, and **Guardrails** for output validation.)
5. **Define agent**: "Reviewer" agent with tools: `gh_get_pr`, `gh_list_files`, optional `gh_get_diff`. Output schema: `{summary, checklist[], annotations[]}`.
6. **Run**: Pass files/diff to the agent; produce summary, risk checklist, and normalized annotations (`path`, `start_line`, `end_line`, `level`, `message`).
7. Post results:
- **No-server path (easiest):** Emit **workflow annotations** via Actions **workflow commands** (`::notice|::warning|::error file=...,line=...::message`) so they show inline on the PR. Also write the summary to the job summary.
8. **Guard & limits**: Enforce max annotations per run, validate JSON before posting, and retry on API 429/5xx.
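Step 7’s no-server path is worth a quick illustration. This is my own sketch (not part of the PRD) of emitting inline annotations via Actions workflow commands and writing the job summary from Python:

```python
# Sketch: post inline annotations via GitHub Actions workflow commands and
# append to the job summary. Annotation fields follow step 6's normalized schema.
import os

def post_annotation(path: str, line: int, level: str, message: str) -> None:
    # `level` must be one of notice/warning/error for workflow commands.
    print(f"::{level} file={path},line={line}::{message}")

def write_job_summary(summary_md: str) -> None:
    # GITHUB_STEP_SUMMARY is set by the Actions runner.
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        with open(summary_path, "a") as f:
            f.write(summary_md + "\n")

post_annotation("src/app.py", 42, "warning", "Possible hard-coded secret")
write_job_summary("## PR Review\nConcise summary goes here.")
```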
Let’s follow these simple steps to create the Agent:
Step 1: Create Agent with instructions
# Agent instructions
instructions = """
You are a senior code reviewer AI agent specializing in pull request analysis.
Your responsibilities:
1. Use get_pr_details() to understand the PR context
2. Use analyze_file_changes() to examine what files were modified
3. Use security_scan() to check for security risks
4. Use performance_check() to identify performance concerns
5. Generate a structured review with summary, checklist, and annotations
Always be constructive, specific, and helpful in your feedback.
Focus on code quality, security, performance, and maintainability.
Provide your final review in this exact format:
SUMMARY: [2-3 sentence summary of the PR]
RISK_SCORE: [number 1-10]
CHECKLIST:
- [item 1 with priority: high/medium/low]
- [item 2 with priority: high/medium/low]
ANNOTATIONS:
- [filename]:[line]: [level]: [message]
- [filename]:[line]: [level]: [message]
"""
Step 2: Provide Tool Definitions
Define tools for analyzing file changes, running security and performance scans, fetching PR metadata, etc. These tool definitions are provided to the LLM so that, per the task and its instructions, it can call them autonomously and act on their output!
from typing import Any, Dict, List

from agents import Agent, Runner, function_tool

# Module-level state, populated in review_pr() before the agent runs
PR_DATA: Dict[str, Any] = {}
FILES_DATA: List[Dict[str, Any]] = []

def get_file_extension(filename: str) -> str:
    """Helper: return the file extension (e.g. 'py'), or '' if none."""
    return filename.rsplit(".", 1)[-1] if "." in filename else ""

@function_tool
def analyze_file_changes() -> List[Dict[str, Any]]:
    """
    Tool: Analyze individual file changes
    """
    analyzed_files = []
    for file_data in FILES_DATA[:20]:  # Limit to 20 files max
        file_info = {
            "filename": file_data.get("filename", ""),
            "status": file_data.get("status", ""),
            "additions": file_data.get("additions", 0),
            "deletions": file_data.get("deletions", 0),
            "extension": get_file_extension(file_data.get("filename", "")),
            "has_tests": "test" in file_data.get("filename", "").lower(),
            "patch_preview": file_data.get("patch", "")[:800],  # First 800 chars of diff
        }
        analyzed_files.append(file_info)
    return analyzed_files
@function_tool
def security_scan() -> Dict[str, Any]:
    """
    Tool: Basic security analysis of changes
    """
    security_issues = []
    risk_indicators = []
    for file_data in FILES_DATA:
        filename = file_data.get("filename", "")
        patch = file_data.get("patch", "")
        # Check for common security patterns
        if any(pattern in patch.lower() for pattern in ["password", "secret", "api_key", "token"]):
            security_issues.append(f"Potential sensitive data in {filename}")
        if ".env" in filename or "config" in filename:
            risk_indicators.append(f"Configuration file modified: {filename}")
        if any(pattern in patch for pattern in ["eval(", "exec(", "subprocess", "os.system"]):
            security_issues.append(f"Potentially dangerous code execution in {filename}")
    return {
        "security_issues": security_issues,
        "risk_indicators": risk_indicators,
        "risk_level": "high" if security_issues else "medium" if risk_indicators else "low",
    }
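Step 3 below also references `get_pr_details`, `performance_check`, and a `PRReviewOutput` type that I haven’t shown. Here is a hedged sketch of what they might look like, continuing the same module as the tools above; the details are my stand-ins, not the original implementation:

```python
# Illustrative stand-ins for the pieces not shown above: two more tools and
# the structured output type. Uses PR_DATA/FILES_DATA/function_tool from the
# module above. Field names here are my assumptions.
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class PRReviewOutput:
    summary: str
    risk_score: int
    checklist: List[str] = field(default_factory=list)
    annotations: List[Dict[str, Any]] = field(default_factory=list)

@function_tool
def get_pr_details() -> Dict[str, Any]:
    """Tool: Return PR metadata (title, author, branches) for context."""
    return {
        "title": PR_DATA.get("title", ""),
        "author": PR_DATA.get("user", {}).get("login", ""),
        "base": PR_DATA.get("base", {}).get("ref", ""),
        "head": PR_DATA.get("head", {}).get("ref", ""),
    }

@function_tool
def performance_check() -> Dict[str, Any]:
    """Tool: Flag patterns that often hint at performance problems."""
    concerns = []
    for file_data in FILES_DATA:
        patch = file_data.get("patch", "")
        if any(p in patch for p in ["for ", "while "]) and "append(" in patch:
            concerns.append(f"Loop with repeated appends in {file_data.get('filename', '')}")
    return {"performance_concerns": concerns}
```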
Step 3: Review the incoming PR with the Agent
# Create the agent with tools (inside the reviewer class's __init__)
self.agent = Agent(
    name="PR-Reviewer",
    instructions=instructions,
    tools=[get_pr_details, analyze_file_changes, security_scan, performance_check],
)

# Lastly, review the incoming PR with the GitHub Orchestrator Agent
def review_pr(self, pr_data: Dict[str, Any], files_data: List[Dict[str, Any]]) -> PRReviewOutput:
    """
    Main method: Review the PR using the agent
    """
    global PR_DATA, FILES_DATA
    PR_DATA = pr_data
    FILES_DATA = files_data
    try:
        # Create the review prompt
        review_prompt = f"""
        Please review this Pull Request thoroughly:

        The PR is titled: "{pr_data.get('title', 'Unknown')}"
        Author: {pr_data.get('user', {}).get('login', 'unknown')}
        Files changed: {len(files_data)}

        Use your available tools to:
        1. Get detailed PR information
        2. Analyze the file changes
        3. Run security scan
        4. Check performance implications

        Then provide a comprehensive review following the exact format specified in your instructions.
        """
        # Agent processes the request using its tools
        print("🤖 Agent analyzing PR with tools...")
        result = Runner.run_sync(self.agent, review_prompt)
        print("✅ Agent completed analysis")
        response_text = result.final_output
        # Parse and validate the response
        return self._parse_agent_response(response_text, files_data)
    except Exception as exc:
        # Fail loudly but gracefully: surface the error instead of hanging the job
        print(f"❌ Agent review failed: {exc}")
        raise
The rest is just glue code: parsing, debugging, etc. That’s it! Yes, the LLM does most of the magic. But here’s the catch: building a demo is 10X easier than building a reliable production agent that works ≥80% of the time. Users want magic that works consistently, not one-time demos. That’s where real value is created.
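For the curious, here is roughly what that parsing glue could look like. It is my reconstruction based on the output format the instructions demand, not the original helper:

```python
# My reconstruction of the parsing glue (the original helper isn't shown):
# turn the SUMMARY/RISK_SCORE/CHECKLIST/ANNOTATIONS text into PRReviewOutput.
import re

def parse_agent_response(text: str) -> PRReviewOutput:
    summary = re.search(r"SUMMARY:\s*(.+?)\s*(?:RISK_SCORE:|\Z)", text, re.DOTALL)
    score = re.search(r"RISK_SCORE:\s*(\d+)", text)

    checklist_block = re.search(r"CHECKLIST:\s*(.*?)\s*(?:ANNOTATIONS:|\Z)", text, re.DOTALL)
    checklist = re.findall(r"-\s*(.+)", checklist_block.group(1)) if checklist_block else []

    annotations = []
    ann_block = re.search(r"ANNOTATIONS:\s*(.*)", text, re.DOTALL)
    if ann_block:
        # Expected shape per the instructions: "- file.py:42: warning: message"
        for path, line, level, message in re.findall(
            r"-\s*([^:\n]+):(\d+):\s*(\w+):\s*(.+)", ann_block.group(1)
        ):
            annotations.append({"path": path.strip(), "line": int(line),
                                "level": level, "message": message.strip()})

    return PRReviewOutput(
        summary=summary.group(1).strip() if summary else "",
        risk_score=int(score.group(1)) if score else 5,  # default to mid risk
        checklist=checklist,
        annotations=annotations,
    )
```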
Conclusion
In this post, I just wanted to dip my toe into building an agent, whether “workflow-based” or “autonomous”, albeit something very simple! I also wanted to explore the OpenAI Agents SDK! (Confession: Claude Code wrote most of the implementation.) For straightforward agents like this PR reviewer, today’s coding agents are remarkably capable.