Building an enterprise AI Agent: How to combine Box, MongoDB Atlas, OpenAI, and LangChain for intelligent document search


Organizations are drowning in documents scattered across various platforms. What if you could build an AI agent that not only searches through your enterprise documents but actually understands their content and provides intelligent, conversational responses?

This guide walks you through building exactly that — a sophisticated AI agent that combines the power of Box for document storage, MongoDB Atlas for vector search, OpenAI for language understanding, and LangChain for orchestration.

What we’re building

By the end of this tutorial, you’ll have created an enterprise-grade AI system that can:

  • Intelligently search through documents stored in Box using semantic understanding
  • Provide conversational responses with proper source attribution
  • Remember context across multiple interactions

Our system will process financial earnings reports and answer complex questions like “What are the biggest challenges facing tech companies?” with detailed, source-backed responses.

The technology stack

Box: Intelligent content management

Box serves as our secure document repository. Unlike simple file storage, Box provides enterprise-grade security, version control, and API access that makes it perfect for business applications. We’ll use Box’s Client Credentials Grant (CCG) authentication for seamless, secure access to documents.

MongoDB Atlas: Vector search

MongoDB Atlas acts as our intelligent search engine. Its vector search capabilities allow us to find documents based on meaning, not just keywords. Atlas handles the complex mathematics of embedding similarity while providing the scalability enterprises need.

OpenAI: Language understanding

OpenAI’s models power both our document understanding (through embeddings) and response generation. The combination of embeddings for search and GPT-4 for responses creates a system that truly understands content.

LangChain: The orchestration layer

LangChain ties everything together, providing tools for document processing, agent creation, and workflow management. Its LangGraph framework enables sophisticated multi-step reasoning and tool usage.

Setting up the foundation

Let’s start by establishing connections to all our services. This foundation ensures our AI agent can access documents, store embeddings, and generate intelligent responses.

Installing dependencies

First, create a new Python project and install the required packages:

# Using uv for fast dependency management
uv sync
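If you are starting from an empty project rather than cloning an existing one, you will need to add the dependencies yourself. The package names below are inferred from the imports used throughout this tutorial (treat them as assumptions and verify against each library's documentation):

```shell
# Hypothetical project setup; package names inferred from the imports in this tutorial
uv init enterprise-agent && cd enterprise-agent
uv add box-sdk-gen langchain langchain-box langchain-mongodb \
       langchain-openai langgraph langgraph-checkpoint-mongodb \
       pymongo python-dotenv
```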

Environment configuration

Create a .env file with your service credentials:

# Box Enterprise Configuration
BOX_CLIENT_ID=your_box_client_id
BOX_CLIENT_SECRET=your_box_client_secret  
BOX_SUBJECT_ID=your_box_user_id

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key

# MongoDB Atlas Configuration
MONGODB_URI=mongodb+srv://<username>:<password>@<cluster>.mongodb.net/
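The scripts below read these values with `os.getenv`, which assumes the `.env` file has already been loaded into the process environment, typically via python-dotenv's `load_dotenv()`. As a minimal stdlib-only sketch of what that loading step does:

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader; python-dotenv's load_dotenv() is the usual choice."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments; never overwrite existing environment values
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```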

Establishing Box connection

Box’s CCG authentication provides secure, server-to-server communication perfect for enterprise applications. Here’s how we establish the connection in our a_init_data.py:

import os

from box_sdk_gen import (
    BoxCCGAuth,
    CCGConfig,
    BoxClient,
    FileTokenStorage,
    BoxAPIError,
)

def get_box_client() -> BoxClient:
    """
    Initialize and return a Box client using the Box CCG Auth.
    """
    client_id = os.getenv("BOX_CLIENT_ID")
    client_secret = os.getenv("BOX_CLIENT_SECRET")
    user_id = os.getenv("BOX_SUBJECT_ID")

    # Create a CCGConfig instance
    box_config = CCGConfig(
        client_id=client_id,
        client_secret=client_secret,
        user_id=user_id,
        token_storage=FileTokenStorage(".ccg.db"),
    )
    # Create a BoxCCGAuth instance
    box_auth = BoxCCGAuth(box_config)
    # Create a BoxClient instance
    return BoxClient(box_auth)

This approach ensures your application can access Box resources securely without requiring user login prompts — essential for automated enterprise workflows.

Document ingestion and processing

Now we’ll create a pipeline that uploads documents to Box, processes them with LangChain, and prepares them for intelligent search.

Uploading documents to Box

Box serves as our secure document repository. Here’s how our upload_sample_data function handles document uploads with proper error handling:

from box_sdk_gen import (
    CreateFolderParent,
    UploadFileAttributes,
    UploadFileAttributesParentField,
)

def upload_sample_data(
    box_client: BoxClient,
    parent_folder_id: str = DEMO_FOLDER_PARENT_ID,
    local_folder_path: str = SAMPLE_DATA_LOCAL_PATH,
) -> str:
    try:
        box_folder = box_client.folders.create_folder(
            name=os.path.basename(local_folder_path),
            parent=CreateFolderParent(id=parent_folder_id),
        )
    except BoxAPIError as e:
        if e.response_info.body["status"] == 409:
            # Folder already exists, get its ID from the conflict details
            box_folder = box_client.folders.get_folder_by_id(
                e.response_info.body["context_info"]["conflicts"][0]["id"]
            )
        else:
            raise
    print(f"Created folder: {box_folder.name} ({box_folder.id})")
    # Upload files to the new folder
    local_folder_path = os.path.abspath(local_folder_path)
    for root, _, files in os.walk(local_folder_path):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            parent = UploadFileAttributesParentField(id=box_folder.id, type="folder")
            file_attributes = UploadFileAttributes(
                name=file_name,
                parent=parent,
            )
            with open(file_path, "rb") as file_stream:
                try:
                    box_file = box_client.uploads.upload_file(
                        attributes=file_attributes, file=file_stream
                    ).entries[0]
                    print(f"Uploaded file: {box_file.name} ({box_file.id})")
                except BoxAPIError as e:
                    if e.response_info.body["status"] == 409:
                        print(
                            f"File already exists: {file_name} ({e.response_info.context_info['conflicts']['id']})"
                        )
                    else:
                        raise
    return box_folder.id

Processing documents with LangChain

LangChain’s BoxLoader seamlessly integrates with Box to extract and process document content. Our load_documents_from_box function demonstrates this integration:

from typing import List

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_box.document_loaders import BoxLoader
from langchain_core.documents import Document

def load_documents_from_box(
    box_client: BoxClient,
    box_folder_id: str,
) -> List[Document]:
    auth_token = box_client.auth.retrieve_token().access_token
    loader = BoxLoader(
        box_developer_token=auth_token,
        box_folder_id=box_folder_id,  # type: ignore
    )
    data = loader.load()

    # Split documents into chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
    return text_splitter.split_documents(data)

The chunking strategy is crucial — our configuration uses 200-character chunks with 20-character overlap, optimizing for precise search while maintaining context.
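To see what the overlap buys you, here is a deliberately simplified, stdlib-only sketch of fixed-size chunking. (The real `RecursiveCharacterTextSplitter` splits on separators like paragraphs and sentences first, falling back to character counts, so this is an illustration of the idea rather than its actual algorithm.)

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Naive fixed-window chunking: each chunk starts `overlap` characters
    before the previous chunk ended, so boundary context is never lost."""
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the two neighboring chunks, which keeps retrieved snippets coherent.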

Creating the MongoDB Atlas vector search infrastructure

MongoDB Atlas transforms our document chunks into a searchable knowledge base using vector embeddings — mathematical representations that capture semantic meaning.

Setting up vector storage

Our create_vector_index function creates the complete vector search infrastructure:

from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings
from pymongo import MongoClient

def create_vector_index(docs: List[Document]) -> None:
    # drop collection if it exists
    client = MongoClient(MONGODB_URI)
    db = client["langchain_db"]
    if "earnings_reports" in db.list_collection_names():
        db["earnings_reports"].drop()
    # Initialize the MongoDB Atlas vector search index
    embedding_model = OpenAIEmbeddings()
    # Instantiate vector store
    vector_store = MongoDBAtlasVectorSearch.from_connection_string(
        connection_string=MONGODB_URI,
        namespace="langchain_db.earnings_reports",
        embedding=embedding_model,
        index_name="vector_index",
    )
    # Add data to the vector store
    vector_store.add_documents(docs)
    # Use helper method to create the vector search index
    vector_store.create_vector_search_index(dimensions=1536)

We also create a full-text search index for exact matches:

from langchain_mongodb.index import create_fulltext_search_index

def create_search_index() -> None:
    # Connect to your cluster
    client = MongoClient(MONGODB_URI)
    # Use helper method to create the search index
    create_fulltext_search_index(
        collection=client["langchain_db"]["earnings_reports"],
        field="text",
        index_name="search_index",
    )

Understanding vector search

Vector search works by converting text into high-dimensional mathematical representations (embeddings) that capture semantic meaning. When you search for “company challenges,” the system finds documents about “business obstacles” or “corporate difficulties” even if those exact words aren’t present.
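Under the hood, "similar meaning" is measured as the angle between embedding vectors, most commonly via cosine similarity. A toy illustration with made-up 3-dimensional vectors (real OpenAI embeddings have 1,536 dimensions, matching the index configuration above):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 = similar direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embeddings: the first two phrases mean similar things
challenges = [0.9, 0.1, 0.3]   # "company challenges"
obstacles = [0.8, 0.2, 0.35]   # "business obstacles"
weather = [0.05, 0.9, 0.1]     # "weekend weather"

print(cosine_similarity(challenges, obstacles))  # close to 1.0
print(cosine_similarity(challenges, weather))    # much lower
```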

Building intelligent search tools

Now we’ll create specialized tools that our AI agent can use to search through documents intelligently.

Vector search tool

This tool finds documents based on semantic similarity:

from langchain.agents import tool
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings

@tool
def vector_search(user_query: str) -> str:
    """
    Retrieve information using vector search to answer a user query.
    """
    # Instantiate the vector store
    vector_store = MongoDBAtlasVectorSearch.from_connection_string(
        connection_string=MONGODB_URI,
        namespace="langchain_db.earnings_reports",
        embedding=OpenAIEmbeddings(),
        index_name="vector_index",  # Name of the vector index
    )
    retriever = vector_store.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 5},  # Retrieve top 5 most similar documents
    )
    results = retriever.invoke(user_query)
    # Concatenate the results into a string
    context = "\n\n".join(
        [f"{doc.metadata['title']}: {doc.page_content}" for doc in results]
    )
    return context

Full-text search tool

This tool finds exact text matches for precise queries:

from pymongo import MongoClient

from langchain_mongodb.retrievers.full_text_search import (
    MongoDBAtlasFullTextSearchRetriever,
)

@tool
def full_text_search(user_query: str) -> dict:
    """
    Retrieve earnings report content based on an exact text match.
    """
    client = MongoClient(MONGODB_URI)
    collection = client["langchain_db"]["earnings_reports"]
    # Initialize the retriever
    retriever = MongoDBAtlasFullTextSearchRetriever(
        collection=collection,  # MongoDB Collection in Atlas
        search_field="text",  # Name of the field to search
        search_index_name="search_index",  # Name of the search index
        top_k=1,  # Number of top results to return
    )
    results = retriever.invoke(user_query)
    if not results:
        return {"error": "Document not found"}
    doc = results[0]
    # Drop the raw embedding and include the page content in the metadata
    doc.metadata.pop("embedding", None)
    doc.metadata["page_content"] = doc.page_content
    return doc.metadata

Creating the LangChain agent

The agent serves as the brain of our system, deciding which tools to use and how to combine their results into coherent responses.

Agent configuration

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_core.tools import BaseTool

def get_llm_with_tools():
    # Initialize the LLM
    llm = ChatOpenAI(model="gpt-4o")
    # Create a chat prompt template for the agent, which includes a system prompt and a placeholder for `messages`
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are a helpful AI agent."
                " You are provided with tools to answer questions about tech companies' earnings."
                " Think step-by-step and use these tools to get the information required to answer the user query."
                " Do not re-run tools unless absolutely necessary."
                " If you are not able to get enough information using the tools, reply with I DON'T KNOW."
                " You have access to the following tools: {tool_names}.",
            ),
            MessagesPlaceholder(variable_name="messages"),
        ]
    )
    tools = [vector_search, full_text_search]
    # Provide the tool names to the prompt
    prompt = prompt.partial(tool_names=", ".join([tool.name for tool in tools]))
    # Prepare the LLM by making the tools and prompt available to the model
    bind_tools = llm.bind_tools(tools)
    llm_with_tools = prompt | bind_tools
    return llm_with_tools

The system prompt defines clear guidelines for tool usage and ensures the agent provides reliable, source-backed responses.

Implementing LangGraph workflow

LangGraph enables sophisticated multi-step reasoning workflows. Our d_langgraph.py implements the complete conversational agent with state management.

Graph state management

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

# Define the graph state
class GraphState(TypedDict):
    messages: Annotated[list, add_messages]
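The `Annotated` type attaches `add_messages` as a reducer: when a node returns `{"messages": [...]}`, LangGraph merges the new messages into the existing list instead of overwriting it. A simplified stdlib sketch of that merge behavior (the real reducer operates on message objects, but it likewise appends new messages and replaces ones that share an id):

```python
def merge_messages(existing: list[dict], updates: list[dict]) -> list[dict]:
    """Append new messages; replace any existing message that shares an id."""
    merged = {m["id"]: m for m in existing}
    for m in updates:
        merged[m["id"]] = m
    # dicts preserve insertion order, so replaced messages keep their position
    return list(merged.values())
```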

Agent and tools nodes

from typing import Dict, List

from langchain_core.messages import ToolMessage

def agent(state: GraphState) -> Dict[str, List]:
    """
    Agent node
    Args:
        state (GraphState): Graph state
    Returns:
        Dict[str, List]: Updates to messages
    """
    llm_with_tools = get_llm_with_tools()
    # Get the messages from the graph `state`
    messages = state["messages"]
    # Invoke `llm_with_tools` with `messages`
    result = llm_with_tools.invoke(messages)
    # Write `result` to the `messages` attribute of the graph state
    return {"messages": [result]}

def tools_node(state: GraphState) -> Dict[str, List]:
    # get_tools() returns the same tool list as before: [vector_search, full_text_search]
    tools = get_tools()
    # Create a map of tool name to tool call
    tools_by_name = {tool.name: tool for tool in tools}
    result = []
    # Get the list of tool calls from messages
    tool_calls = state["messages"][-1].tool_calls
    # Iterate through `tool_calls`
    for tool_call in tool_calls:
        # Get the tool from `tools_by_name` using the `name` attribute of the `tool_call`
        tool = tools_by_name[tool_call["name"]]
        # Invoke the `tool` using the `args` attribute of the `tool_call`
        observation = tool.invoke(tool_call["args"])
        # Append the result of executing the tool to the `result` list as a ToolMessage
        result.append(ToolMessage(content=observation, tool_call_id=tool_call["id"]))
    # Write `result` to the `messages` attribute of the graph state
    return {"messages": result}

Workflow routing logic

def route_tools(state: GraphState):
    """
    Uses a conditional_edge to route to the tools node if the last message
    has tool calls. Otherwise, route to the end.
    """
    # Get messages from graph state
    messages = state.get("messages", [])
    if len(messages) > 0:
        # Get the last AI message from messages
        ai_message = messages[-1]
    else:
        raise ValueError(f"No messages found in input state to tool_edge: {state}")
    # Check if the last message has tool calls
    if hasattr(ai_message, "tool_calls") and len(ai_message.tool_calls) > 0:
        return "tools"
    return END

Complete graph assembly and adding memory with MongoDB

Enterprise applications need to remember context across conversations. Our implementation uses MongoDB’s checkpointing for persistent memory:

from langgraph.checkpoint.mongodb import MongoDBSaver
from langgraph.graph.state import CompiledStateGraph

def get_compiled_graph() -> CompiledStateGraph:
    """
    Main function to execute the graph.
    """
    # Instantiate the graph
    graph = StateGraph(GraphState)
    # Add "agent" node using the `add_node` function
    graph.add_node("agent", agent)
    # Add "tools" node using the `add_node` function
    graph.add_node("tools", tools_node)
    # Add an edge from the START node to the `agent` node
    graph.add_edge(START, "agent")
    # Add a conditional edge from the `agent` node to the `tools` node
    graph.add_conditional_edges(
        "agent",
        route_tools,
        {"tools": "tools", END: END},
    )
    # Add an edge from the `tools` node to the `agent` node
    graph.add_edge("tools", "agent")
    # Initialize a MongoDB check_pointer
    client = MongoClient(MONGODB_URI)
    check_pointer = MongoDBSaver(client)
    # Instantiate the graph with the checkpointer
    return graph.compile(checkpointer=check_pointer)
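Conceptually, a checkpointer is a thread-keyed store of graph state: each conversation's `thread_id` maps to its saved message history, so a follow-up question resumes where the last one left off. A toy in-memory stand-in for `MongoDBSaver` (the real one persists every step to MongoDB, which is what makes memory survive restarts):

```python
class InMemorySaver:
    """Toy checkpointer: keeps each conversation's state keyed by thread_id."""

    def __init__(self) -> None:
        self.store: dict[str, dict] = {}

    def put(self, thread_id: str, state: dict) -> None:
        # Save the latest graph state for this conversation thread
        self.store[thread_id] = state

    def get(self, thread_id: str) -> dict:
        # A new thread starts with an empty message history
        return self.store.get(thread_id, {"messages": []})
```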

Adding a method to invoke the agent

def execute_graph(thread_id: str, user_input: str, app: CompiledStateGraph) -> None:
    # The thread_id ties this invocation to a persisted conversation
    config = {"configurable": {"thread_id": thread_id}}
    inputs = {"messages": [("user", user_input)]}
    value = None
    for output in app.stream(inputs, config):
        for key, value in output.items():
            print(f"Node {key}:")
            print(value)
    print("\n---FINAL ANSWER---")
    if value:
        print(value["messages"][-1].content)

Testing the complete system

Let’s see our enterprise AI agent in action. Our main function demonstrates the system with real-world scenarios:

Running the complete demo

def main() -> None:
    app = get_compiled_graph()
    execute_graph("001", "What are the biggest challenges facing tech companies?", app)
    execute_graph("001", "Which earnings reports have a comment from Brett Iversen?", app)

Sample data processing

The system processes real financial documents included in the sample_data/Q4 Tech earnings-Demo/ directory:

  • Apple_analysis.docx
  • Tesla_analysis.docx
  • Microsoft_analysis.docx
  • Meta_analysis.docx
  • NVIDIA_analysis.docx

Expected system response

When you ask “What are the biggest challenges facing tech companies?”, the system:

  1. Agent Reasoning: Decides to use the vector_search tool
  2. Tool Execution: Searches through document embeddings
  3. Result Processing: Finds relevant content from multiple reports
  4. Response Generation: Synthesizes a comprehensive answer
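That four-step cycle is just the agent-tools loop from the graph, which can be sketched as plain control flow. The `llm` and `tools` callables below are fakes standing in for the real model and retrievers:

```python
def agent_loop(llm, tools: dict, messages: list) -> str:
    """Sketch of the graph's cycle: call the model, run any requested
    tools, and repeat until the model stops asking for tools."""
    while True:
        reply = llm(messages)                      # "agent" node
        messages.append(reply)
        if not reply.get("tool_calls"):            # route_tools -> END
            return reply["content"]
        for call in reply["tool_calls"]:           # "tools" node
            observation = tools[call["name"]](call["args"])
            messages.append({"role": "tool", "content": observation})
```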

Sample Response:

---FINAL ANSWER---
The biggest challenges facing tech companies include:

1. **Shifting Preferences for Western Technology**: Companies like Apple 
face challenges related to changing consumer preferences, particularly 
in global markets.

2. **Complex Tech Stack Issues**: Companies such as Microsoft encounter 
challenges across every layer of their technology stack, impacting 
their operational efficiency and capacity for innovation.

3. **Scalability and Cost Efficiency for Inference at Scale**: NVIDIA 
and similar companies face challenges in providing the necessary 
throughput and maintaining cost efficiency to handle the increasing 
complexity of large-scale AI and data processing.

These challenges highlight the constantly evolving nature of the tech 
industry, requiring continuous adaptation and innovation to maintain 
competitiveness and growth.

---FINAL ANSWER---
Earnings reports from Microsoft include comments from Brett Iversen, 
who is the Vice President of Investor Relations.

Conclusion: Your enterprise AI journey

Imagine what you could do by enhancing this with other data sources, structured or unstructured. You would have a complete view of any area of the company.

This system demonstrates how modern AI can transform enterprise document management from a storage problem into an intelligence asset. Your documents are no longer just files — they’re a queryable knowledge base that can provide insights, answer questions, and support decision-making.

Remember: the key to successful enterprise AI isn’t just the technology — it’s understanding how to combine these tools to solve real business problems while maintaining security, scalability, and user trust.