Using Contextual Retrieval with Box and Pinecone


We recently published an article on Contextual Retrieval. The concept is fantastic, but how can you apply these ideas in practice? Let’s walk through an example using Box (as the content source) and Pinecone (as the vector database). Box is an enterprise Intelligent Content Management platform that stores files and provides key content-driven services like AI and eSignature, and Pinecone is a managed vector database for fast similarity search. Together, they allow developers to build a custom RAG pipeline: Box provides the documents and their metadata, Pinecone provides semantic search over those documents, and an LLM (such as OpenAI’s GPT-4 or Anthropic’s Claude) generates answers using the retrieved data.

Scenario: Imagine your company’s internal HR knowledge — PDFs, Word docs, and notes — is stored in Box. You want an AI assistant that can answer employees’ questions using that private data. You could do this with Box AI and Box Hubs, but for this example, you want to combine your Box content with other external data sources, and contextual retrieval will make that search far more effective. Here’s how you could implement it:

Index Your Box Content in Pinecone: Using Box’s API or SDK, you can iterate through files in a Box folder and fetch their text content. Box can provide a text representation of common file types (PDF, DOCX, etc.), which your code can use for processing. Split the text into reasonable chunks (e.g. 200–500 words each).
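Word-based splitting with a little overlap between neighboring chunks is a common starting point. Here is a minimal sketch; `chunk_words` and its defaults are illustrative, not part of any SDK:

```python
def chunk_words(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of roughly `chunk_size` words.

    The overlap keeps sentences that straddle a boundary present in both
    neighboring chunks, so retrieval doesn't miss them.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
    return chunks
```

The character-based loop in the full script below does the same job; a word-based splitter just avoids cutting words in half.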

At this step, you can then use the LLM of your choice to generate and add document context to the chunk or embed file metadata (for instance, prepend the file name or a section title). Then create your embeddings with OpenAI’s Embedding model or the embedding model of your choosing. Finally, upsert these vectors into a Pinecone index. Pinecone will store vectors along with metadata — you can save the Box file ID, chunk text, and any tags as metadata for each entry. This metadata is useful later for filtering or for retrieving the full document reference. By connecting Box and Pinecone, you essentially give your LLM a memory of your enterprise data that it can search.

import os
from time import sleep
from typing import List
from boxsdk.object.file import File
from boxsdk.object.folder import Folder
from pinecone import Pinecone
from box_ai_agents_toolkit import (
    box_folder_list_content,
    box_file_text_extract,
    get_ccg_client,
    BoxClient
)

pinecone: Pinecone = Pinecone(
    api_key=os.getenv("PINECONE_API_KEY"), 
    source_tag="box-contextual-retrieval-demo"
)

pinecone_index_name = os.getenv("PINECONE_INDEX")
pinecone_index = pinecone.Index(pinecone_index_name) # type: ignore

box: BoxClient = get_ccg_client()

items: List[File | Folder] = box_folder_list_content(box, os.getenv("BOX_FOLDER_ID")) # type: ignore

for item in items:
    if item.type == "file":
        text_content: str = box_file_text_extract(box, item.id)
        chunks = []
        start = 0
        while start < len(text_content):
            end = start + int(os.getenv("CHUNK_SIZE", 4000))
            chunk = text_content[start:end]
            contextualized_chunk = f"{item.name}: {chunk}"
            chunks.append(contextualized_chunk)
            start = end - int(os.getenv("CHUNK_OVERLAP", 200))  # Overlap adjustment

        for i, chunk in enumerate(chunks):
            # Get embeddings using the Pinecone inference API
            embeddings = pinecone.inference.embed(
                model="multilingual-e5-large",
                inputs=[chunk],
                parameters={
                    "input_type": "passage",
                    "truncate": "END"
                }
            )

            # Skip this chunk if no embeddings were returned
            if not embeddings:
                print(f"No embeddings returned for chunk {i} of file ID: {item.id}")
                continue

            # Extract the actual embedding values (list of floats) from the result
            vectors = embeddings[0]['values']

            # Store the chunk text in metadata
            minimal_metadata = {
                "chunk_id": i, 
                "file_id": item.id, 
                "file_name": item.name, 
                "box_user_id": os.getenv("BOX_SUBJECT_ID"),
                "chunk_text": chunk  # Store the chunk text as part of the metadata
            }
            # Upsert the vector with the chunk ID and metadata into Pinecone
            pinecone_index.upsert(
                vectors=[(f"{item.id}_chunk_{i}", vectors, minimal_metadata)],
                namespace=os.getenv("BOX_SUBJECT_ID"),
            )
            
            print(f"Upserted chunk {i} for file ID: {item.id} into Pinecone.")

print("All files processed and upserted into Pinecone.")

In this sample, we loop over files in a Box folder, get their text, split them into chunks, contextualize them, and create embeddings. We prepend the file name to the chunk text as a simple form of contextualization. Each vector with metadata is upserted into Pinecone. (In a real app, you might use batch upsert for efficiency and a more advanced chunking strategy.) After this, Pinecone holds a vector index of your Box content. The BM25 or keyword aspect can be handled by Pinecone’s hybrid search if enabled, or you can maintain a parallel index using Box metadata or an external search engine.
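Batching the upserts is a small change: collect the `(id, values, metadata)` tuples and send them in fixed-size groups rather than one call per chunk. A sketch, where `upsert_in_batches` is a hypothetical helper around Pinecone's `Index.upsert` (which accepts a list of vectors):

```python
BATCH_SIZE = 100  # a conservative batch size; the practical limit depends on vector dimension

def upsert_in_batches(index, vectors, namespace, batch_size=BATCH_SIZE):
    """Upsert a list of (id, values, metadata) tuples in fixed-size batches.

    One network round trip per batch instead of one per vector.
    """
    for start in range(0, len(vectors), batch_size):
        batch = vectors[start:start + batch_size]
        index.upsert(vectors=batch, namespace=namespace)
```

In the loop above, you would append each `(f"{item.id}_chunk_{i}", vectors, minimal_metadata)` tuple to a list and call this helper once per file (or once at the end).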

Use an LLM to generate contextual statements: In the last example, we showed a very basic contextual statement. It does add context, but you may want more. In the next example, we’ll edit the code slightly to show a function that implements a technique to have the LLM of your choice evaluate the chunk and the entire document to generate a statement that explains the context and importance of the chunk in the broader document.

First, we need to import and initialize the OpenAI client. We must then replace the line of code that appends the file name to the chunk with a call to a new function we’ll call contextualize_chunk. This function takes the OpenAI client, the chunk, and the full text of the file as arguments.

from openai import OpenAI

openai = OpenAI(api_key=os.getenv("OPENAI_API_KEY", ""))

while start < len(text_content):
    end = start + int(os.getenv("CHUNK_SIZE", 4000))
    chunk = text_content[start:end]
    contextualized_chunk = contextualize_chunk(openai, chunk, text_content)
    chunks.append(contextualized_chunk)
    start = end - int(os.getenv("CHUNK_OVERLAP", 200))  # Overlap adjustment

The next step is to add in our new function. This function will take the full text and the chunk and send it to gpt-4o (though you can use whatever model you like) with a prompt to generate a context statement. The response is then prepended to the chunk and returned.

def contextualize_chunk(openai: OpenAI, chunk: str, content: str) -> str:
    prompt = f"""
        Given the document below, explain what the chunk captures in the context of the whole document.

        <document>
        {content}
        </document>

        Here is the chunk we want to explain:
        <chunk>
        {chunk}
        </chunk>

        Answer ONLY with a succinct explanation of the meaning of the chunk in the context of the whole document above.
    """

    response = openai.chat.completions.create(
        model="gpt-4o", # Or another suitable model
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        max_tokens=100,
        temperature=0.7,
    )
    contextual_explanation = response.choices[0].message.content.strip()

    return f"{contextual_explanation}: {chunk}"

With these tweaks, you’ll now have full contextual awareness in each chunk. Feel free to tweak the prompt and test other models and providers to find the one that works best for you, your content, and your use case.

This Box + Pinecone ingestion setup demonstrates contextual retrieval in action. By leveraging Box’s rich content and Pinecone’s vector search, the AI assistant always has the right context at hand. It retrieves answers from your data (not just generic training data) and uses context (document info, metadata, and query filters) to pinpoint what the user needs. This results in more relevant and credible answers, powered by your organization’s knowledge.
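At query time, the same pieces come back together: embed the user's question with the same model (using `input_type="query"` to pair with the `"passage"` embeddings created at ingestion), query the user's namespace, and hand the stored chunk text to your LLM as grounding context. A sketch reusing the `pinecone` client and `pinecone_index` objects from the ingestion script; `retrieve_chunks` and `build_context` are hypothetical helpers:

```python
import os

def retrieve_chunks(pc, index, question: str, top_k: int = 5):
    """Embed the question and fetch the closest chunks from Pinecone.

    `pc` and `index` are the Pinecone client and index created during ingestion.
    """
    embedding = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[question],
        # "query" pairs with the "passage" input_type used at ingestion
        parameters={"input_type": "query", "truncate": "END"},
    )
    results = index.query(
        vector=embedding[0]["values"],
        top_k=top_k,
        include_metadata=True,
        namespace=os.getenv("BOX_SUBJECT_ID"),
    )
    return results["matches"]

def build_context(matches) -> str:
    """Join the chunk text stored in each match's metadata into one grounding string."""
    return "\n\n".join(m["metadata"]["chunk_text"] for m in matches)
```

The resulting context string, plus the original question, becomes the prompt for your LLM of choice, and the `file_id` in the metadata lets you link each answer back to its source document in Box.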

Conclusion and Next Steps

Contextual retrieval is a powerful addition to any RAG-based AI application. It bridges the gap between isolated information and the rich, contextual knowledge that leads to the best possible results. By keeping track of “the bigger picture” — whether that’s the source context of a document chunk or the situational context of a user’s request — we can significantly boost the quality of AI-generated responses. We’ve seen how methods like adding explanatory context to text chunks or using session data for query filtering can improve retrieval accuracy and user satisfaction. Techniques like contextual embeddings, pioneered in Anthropic’s research, are now within reach for developers to implement in their own projects.

As you build your next AI solution, consider how contextual retrieval can make your system smarter and more aligned with users’ needs. Whether you use an open-source stack or platforms like Box and Pinecone to get started quickly, the principle remains the same: equip your AI with context so that it retrieves the right knowledge at the right time.

To learn more about building intelligent, content-driven applications, follow Box for updates and tutorials. Check out our videos on YouTube, read more guides on Medium, and explore example projects on our GitHub. By staying connected, you’ll be ready to build the next generation of context-aware AI solutions with us!