Designing Autonomous AI Agents with LangChain and Tool Calling

Architecture patterns for building production-ready AI agents: tool design, memory strategies, ReAct reasoning loops, and guardrails that actually work.

What Makes an Agent Different From a Chatbot

A chatbot responds. An agent acts. The difference is tool use — agents can search the web, query databases, call APIs, write files, and chain those actions together to accomplish multi-step goals without human intervention at each step.

Building agents that are reliable in production requires careful thought about tool design, state management, and failure modes.

The ReAct Loop

Most production agents run a ReAct (Reason + Act) loop, alternating between reasoning about the next step and acting on it:

Thought:  What do I need to do to answer this question?
Action:   Call tool X with parameters Y
Observation: Tool returned Z
Thought:  Given Z, I now know... Next I need to...
Action:   Call tool A with parameters B
...
Final Answer: [response to user]

LangChain's AgentExecutor implements this loop. The key is designing tools that return structured, informative observations — not raw API responses.
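
To make the control flow concrete, here is a minimal sketch of the loop in plain Python. It is not how AgentExecutor is implemented: the TOOLS registry, the lookup_owner tool, and scripted_model (a stand-in for the LLM) are all hypothetical names invented for this illustration.

```python
from typing import Callable

# Hypothetical tool registry: names map to plain Python callables.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_owner": lambda service: {"billing": "team-payments"}.get(service, "unknown"),
}

def scripted_model(transcript: list[str]) -> dict:
    """Stand-in for an LLM: picks the next step from the transcript so far."""
    observations = [line for line in transcript if line.startswith("Observation: ")]
    if not observations:
        return {"action": "lookup_owner", "input": "billing"}
    owner = observations[-1].removeprefix("Observation: ")
    return {"final": f"The billing service is owned by {owner}."}

def run_react(question: str, model=scripted_model, max_iterations: int = 5) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(max_iterations):
        step = model(transcript)
        if "final" in step:
            return step["final"]
        tool = TOOLS.get(step["action"])
        # Unknown tools are reported back as observations so the model can recover.
        result = tool(step["input"]) if tool else f"Error: unknown tool {step['action']}"
        transcript.append(f"Action: {step['action']}({step['input']})")
        transcript.append(f"Observation: {result}")
    # Hard stop: a model that never produces a final answer cannot loop forever.
    return "Stopped: iteration limit reached."
```

The iteration cap plays the same role as max_iterations on a real executor: it bounds cost and latency when the model gets stuck.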

Step 1: Define Tools as Typed Functions

from langchain.tools import tool
from pydantic import BaseModel, Field

class SearchInput(BaseModel):
    query: str = Field(description="The search query to look up")
    max_results: int = Field(default=5, description="Number of results to return")

@tool("search_knowledge_base", args_schema=SearchInput)
def search_knowledge_base(query: str, max_results: int = 5) -> str:
    """
    Search the company knowledge base for relevant documents.
    Use this when the user asks about internal processes, policies, or product details.
    """
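    # vector_store is assumed to be an already-initialized LangChain vector store
    results = vector_store.similarity_search(query, k=max_results)
    if not results:
        return "No relevant documents found for this query."

    # Label each chunk with its source so the agent can cite it
    return "\n\n".join(
        f"[Source: {doc.metadata.get('source', 'unknown')}]\n{doc.page_content}"
        for doc in results
    )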

Critical rule: The docstring IS the tool description the LLM reads. Write it as if you were telling a smart colleague when to reach for this tool, not as API documentation for a developer calling the function.
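
To see why the docstring matters, it helps to picture how a framework could turn a function into a tool spec. The sketch below is a simplified illustration using only the standard library, not LangChain's actual implementation; describe_tool and restart_service are hypothetical names.

```python
import inspect

def describe_tool(func) -> dict:
    """Build a simplified tool spec from a function's name, docstring, and signature."""
    params = {
        name: (p.annotation.__name__ if p.annotation is not inspect.Parameter.empty else "any")
        for name, p in inspect.signature(func).parameters.items()
    }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",  # this text is what the model sees
        "parameters": params,
    }

def restart_service(service_name: str) -> str:
    """Restart a named service. Use only after the user explicitly confirms the restart."""
    return f"restarted {service_name}"

spec = describe_tool(restart_service)
```

Everything the model knows about restart_service comes from spec: the function body is never shown to it, so usage conditions that exist only in the code are invisible to the agent.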

Step 2: Build the Agent

from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

llm = ChatOpenAI(model="gpt-4o", temperature=0)

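# query_database and send_notification are assumed to be @tool functions
# defined elsewhere, in the same style as search_knowledge_base
tools = [
    search_knowledge_base,
    query_database,
    send_notification,
]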

system_prompt = """You are an operations assistant for AdvancedDataScience Solutions.
You help engineers investigate incidents and answer questions about our systems.

Guidelines:
- Always search the knowledge base before answering technical questions
- For database queries, only use SELECT statements — never mutate data
- If you cannot find a definitive answer, say so clearly
- Cite your sources when providing information from the knowledge base
"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=10,
    handle_parsing_errors=True,
)

Step 3: Memory Strategies

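Short-term (conversation) memory: ConversationBufferWindowMemory keeps only the last k exchanges (k=10 is a reasonable starting point) and drops everything older. To retain older context, use ConversationSummaryBufferMemory instead: it keeps recent messages verbatim and, once the buffer exceeds max_token_limit, condenses the oldest ones into a summary with an LLM call. In classic LangChain releases, passing memory=memory to AgentExecutor fills the chat_history placeholder from this object.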

from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=2000,
    memory_key="chat_history",
    return_messages=True,
)
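
The summarize-on-overflow idea can be sketched without any framework. The version below is an illustration only: it counts exchanges rather than tokens, and SummaryWindowMemory and naive_summarize are hypothetical names, with naive_summarize standing in for the LLM summarization call.

```python
from typing import Callable

Exchange = tuple[str, str]  # (user message, assistant reply)

class SummaryWindowMemory:
    """Keeps the last `window` exchanges verbatim and folds older ones into a summary."""

    def __init__(self, summarize: Callable[[str, list[Exchange]], str], window: int = 10):
        self.summarize = summarize
        self.window = window
        self.summary = ""
        self.recent: list[Exchange] = []

    def add_exchange(self, user: str, assistant: str) -> None:
        self.recent.append((user, assistant))
        overflow = len(self.recent) - self.window
        if overflow > 0:
            # Evict the oldest exchanges and merge them into the running summary.
            evicted, self.recent = self.recent[:overflow], self.recent[overflow:]
            self.summary = self.summarize(self.summary, evicted)

    def render(self) -> str:
        lines = [f"Summary of earlier conversation: {self.summary}"] if self.summary else []
        for user, assistant in self.recent:
            lines += [f"User: {user}", f"Assistant: {assistant}"]
        return "\n".join(lines)

def naive_summarize(previous: str, evicted: list[Exchange]) -> str:
    """Placeholder for an LLM call: records only what the user asked about."""
    topics = [user for user, _ in evicted]
    return "; ".join(filter(None, [previous, *topics]))
```

The prompt size stays bounded by the window plus one summary, at the price of losing detail from older turns.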

Long-term memory: Store important facts in a vector store, tagged by session and entity. Retrieve relevant memories at the start of each conversation.

async def build_context(user_id: str, query: str) -> str:
    # memory_store is assumed to be an async vector-store client defined elsewhere
    memories = await memory_store.search(
        query=query,
        filter={"user_id": user_id},  # never return another user's memories
        k=3,
    )
    if not memories:
        return ""

    return "Relevant context from previous conversations:\n" + "\n".join(
        m.content for m in memories
    )

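To try the retrieval flow end to end, a toy store is enough. The sketch below is an assumption-heavy stand-in: InMemoryStore and Memory are hypothetical, and ranking uses shared words instead of embeddings. Only the search(query, filter, k) shape mirrors the snippet above.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Memory:
    user_id: str
    content: str

class InMemoryStore:
    """Hypothetical stand-in for a vector store; ranks by shared words, not embeddings."""

    def __init__(self) -> None:
        self._items: list[Memory] = []

    def add(self, user_id: str, content: str) -> None:
        self._items.append(Memory(user_id, content))

    async def search(self, query: str, filter: dict, k: int = 3) -> list[Memory]:
        words = set(query.lower().split())
        scored = [
            (len(words & set(m.content.lower().split())), m)
            for m in self._items
            if m.user_id == filter["user_id"]  # the per-user filter is applied first
        ]
        # Keep only memories sharing at least one word with the query, best first.
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [m for _, m in ranked[:k]]

memory_store = InMemoryStore()
memory_store.add("u1", "prefers staging deploys on fridays")
memory_store.add("u2", "owns the billing service")

async def build_context(user_id: str, query: str) -> str:
    memories = await memory_store.search(query=query, filter={"user_id": user_id}, k=3)
    if not memories:
        return ""
    return "Relevant context from previous conversations:\n" + "\n".join(
        m.content for m in memories
    )

context = asyncio.run(build_context("u1", "when are staging deploys scheduled"))
```

Because the user_id filter runs before scoring, a query from u1 about the billing service returns nothing rather than leaking what u2 said.
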
Step 4: Guardrails

Production agents need hard constraints. The LLM cannot enforce these — your application code must.

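import sqlparse

ALLOWED_DB_OPERATIONS = {"SELECT"}
MAX_QUERY_ROWS = 1000

def safe_database_query(sql: str) -> str:
    # Parse before execution and drop empty fragments (e.g. after a trailing ";")
    statements = [s for s in sqlparse.parse(sql) if str(s).strip()]

    # Reject empty input and stacked statements such as "SELECT 1; DROP TABLE users",
    # which would slip past a check on the first statement only
    if len(statements) != 1:
        return "Error: Exactly one SQL statement is permitted per query."

    statement_type = statements[0].get_type()
    if statement_type not in ALLOWED_DB_OPERATIONS:
        return f"Error: Only SELECT queries are permitted. Got: {statement_type}"

    # db and format_results are application-specific helpers defined elsewhere
    results = db.execute(sql).fetchmany(MAX_QUERY_ROWS)
    return format_results(results)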

Never let an agent touch production data with write permissions. Use read-only database replicas for agent tool connections.

What to Watch in Production

Tool call latency p95 — slow tools destroy agent UX
Max iterations hit — agents that loop are usually stuck; log and alert
Tool error rate — a flaky tool will make your agent look unreliable
Token usage per session — context windows fill up; monitor costs
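
The first and third metrics can be collected with a small wrapper around each tool function. This is a minimal in-process sketch, assuming you would export the numbers to your real metrics system; ToolMetrics is a hypothetical helper, and the p95 uses the nearest-rank method.

```python
import time
from collections import defaultdict
from functools import wraps

class ToolMetrics:
    def __init__(self) -> None:
        self.latencies = defaultdict(list)  # tool name -> seconds per call
        self.errors = defaultdict(int)      # tool name -> number of failed calls

    def track(self, func):
        """Decorator that records latency and failures for every call to func."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                self.errors[func.__name__] += 1
                raise
            finally:
                # Recorded for successes and failures alike.
                self.latencies[func.__name__].append(time.perf_counter() - start)
        return wrapper

    def p95(self, name: str) -> float:
        samples = sorted(self.latencies[name])
        if not samples:
            return 0.0
        # Nearest-rank percentile: ceil(0.95 * n), computed in integer arithmetic.
        rank = max(1, (95 * len(samples) + 99) // 100)
        return samples[rank - 1]

    def error_rate(self, name: str) -> float:
        calls = len(self.latencies[name])
        return self.errors[name] / calls if calls else 0.0

metrics = ToolMetrics()

@metrics.track
def flaky_lookup(n: int) -> int:
    if n % 2 == 0:
        raise ValueError("upstream timeout")
    return n
```

Apply the decorator beneath the tool decorator so the measurement covers only the tool body, then alert when p95 or error_rate crosses a threshold you choose.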

Build your agents incrementally: start with 2 tools, get them reliable, then add more.