Create RAG-powered apps with Box loader for LangChain.js

We’re excited to announce the launch of langchainjs-box, a new document loader that brings the power of Box’s Intelligent Content Management platform directly into your LangChain.js applications. This package makes it straightforward to pull your Box files and folders into RAG applications.

Powered by Box Markdown representations

One of the most exciting features of this loader is its built-in support for Box’s new Markdown representations. Box recently introduced the ability to convert complex file formats into clean, structured Markdown — perfect for LLM-powered applications.

Our langchainjs-box loader automatically leverages Markdown representations for:

Microsoft Office: .docx, .pptx, .xls, .xlsx, .xlsm.
Google Workspace: .gdoc, .gslide, .gslides, .gsheet.
PDFs: .pdf.

This means your LangChain.js applications get clean, well-structured content that’s optimized for embedding and retrieval. For other text-based files, the loader seamlessly falls back to Box text representations. Discover how to supercharge your AI applications with Box's new Markdown representation.
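To make the split between the two representation types concrete, here is a minimal sketch that classifies a file name by the extension list above. The loader makes this choice for you; `pickRepresentation` and `MARKDOWN_EXTENSIONS` are hypothetical names used only for illustration and are not part of the langchainjs-box API.

```javascript
// Extensions listed above as having Box Markdown representations.
const MARKDOWN_EXTENSIONS = new Set([
  'docx', 'pptx', 'xls', 'xlsx', 'xlsm', // Microsoft Office
  'gdoc', 'gslide', 'gslides', 'gsheet', // Google Workspace
  'pdf',
]);

// Hypothetical helper (not the loader's internals): reports which
// representation a file would be expected to use, based on its extension.
function pickRepresentation(fileName) {
  const dot = fileName.lastIndexOf('.');
  const ext = dot === -1 ? '' : fileName.slice(dot + 1).toLowerCase();
  return MARKDOWN_EXTENSIONS.has(ext) ? 'markdown' : 'text';
}
```

A helper like this can be handy for predicting, before indexing, which files in a folder will arrive as structured Markdown and which as plain text.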

Getting started

Before you start building, there are some prerequisites. Ensure you have:

Node.js version 20 or higher
@langchain/core package, version >=0.3.78 <1.0.0
A Box account (sign up for a free developer account)
A Box app created in the Box Developer Console:

The langchainjs-box loader supports all available authentication methods.
Server authentication applications using JWT or Client Credentials Grant must be authorized by a Box Admin before use; see details in this developer guide.

Installing the loader is as simple as pulling the package from npm:

npm install langchainjs-box

Authentication options

The langchainjs-box loader supports multiple authentication methods through the BoxAuth helper class:

Developer token (best for quick prototyping):

import { BoxLoader, BoxAuth, BoxAuthType } from 'langchainjs-box';

const auth = new BoxAuth({
 authType: BoxAuthType.TOKEN,
 boxDeveloperToken: 'DEVELOPER_TOKEN'
});

JWT (either with a service account or a specified user):

import { BoxLoader, BoxAuth, BoxAuthType } from 'langchainjs-box';

const auth = new BoxAuth({
 authType: BoxAuthType.JWT,
 boxJwtPath: './path/to/jwt-config.json',
 boxUserId: 'USER_ID'
});

CCG (also with a service account or a specified user):

import { BoxLoader, BoxAuth, BoxAuthType } from 'langchainjs-box';

const auth = new BoxAuth({
 authType: BoxAuthType.CCG,
 boxClientId: 'CLIENT_ID',
 boxClientSecret: 'CLIENT_SECRET',
 boxEnterpriseId: 'ENTERPRISE_ID'
});

Check the project documentation for the remaining authentication code examples.
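If you switch between these methods across environments, one option is to derive the BoxAuth options from environment variables. The sketch below is an assumption-heavy illustration: `authOptionsFromEnv` is a hypothetical helper, and every variable name other than BOX_DEVELOPER_TOKEN (BOX_JWT_PATH, BOX_USER_ID, BOX_CLIENT_ID, BOX_CLIENT_SECRET, BOX_ENTERPRISE_ID) is invented for this example. It returns plain objects and does not call the package itself.

```javascript
// Hypothetical helper: picks an auth method based on which variables are set.
// Returns the method name plus the option fields used in the examples above.
function authOptionsFromEnv(env) {
  if (env.BOX_DEVELOPER_TOKEN) {
    return { kind: 'TOKEN', options: { boxDeveloperToken: env.BOX_DEVELOPER_TOKEN } };
  }
  if (env.BOX_JWT_PATH) {
    const options = { boxJwtPath: env.BOX_JWT_PATH };
    // The user ID is only added when acting as a specific user.
    if (env.BOX_USER_ID) options.boxUserId = env.BOX_USER_ID;
    return { kind: 'JWT', options };
  }
  if (env.BOX_CLIENT_ID && env.BOX_CLIENT_SECRET && env.BOX_ENTERPRISE_ID) {
    return {
      kind: 'CCG',
      options: {
        boxClientId: env.BOX_CLIENT_ID,
        boxClientSecret: env.BOX_CLIENT_SECRET,
        boxEnterpriseId: env.BOX_ENTERPRISE_ID,
      },
    };
  }
  throw new Error('No Box credentials found in environment');
}
```

Assuming BoxAuthType can be indexed by those names, the result could then be passed along as `new BoxAuth({ authType: BoxAuthType[kind], ...options })`.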

Load files or entire folders

Once authentication is set up, it’s time to load content. The langchainjs-box loader gives you flexibility:

You can load specific files by providing an array of file IDs:

import { BoxLoader, BoxAuth, BoxAuthType } from 'langchainjs-box';

const auth = new BoxAuth({
 authType: BoxAuthType.TOKEN,
 boxDeveloperToken: 'DEVELOPER_TOKEN'
});

const loader = new BoxLoader({
 boxAuth: auth,
 boxFileIds: ['FILE_ID_1', 'FILE_ID_2'],
 characterLimit: 10000  // Optional, defaults to no limit
});

const docs = await loader.load();

Load entire folders with a single folder ID.

const loader = new BoxLoader({
 boxAuth: auth,
 boxFolderId: 'FOLDER_ID'
});

const docs = await loader.load();

Recursively load folder content to include all subfolders and files.

const loader = new BoxLoader({
 boxAuth: auth,
 boxFolderId: 'FOLDER_ID',
 recursive: true,  // Optional, defaults to false
 characterLimit: 10000  // Optional, defaults to no limit
});

const docs = await loader.load();

Or lazy-load content, allowing you to process documents one at a time.

const loader = new BoxLoader({
 boxAuth: auth,
 boxFolderId: 'FOLDER_ID'
});

for await (const doc of loader.lazyLoad()) {
 // Process each document
}
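Lazy loading pairs well with batching, so that a large folder never has to sit in memory all at once. The following is a minimal sketch: `inBatches` is a hypothetical helper, not part of langchainjs-box, and it works with any async iterable, including the one returned by `loader.lazyLoad()`.

```javascript
// Hypothetical helper: groups items from an async iterable into arrays
// of at most `size` items, yielding each group as soon as it is full.
async function* inBatches(source, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('size must be a positive integer');
  }
  let batch = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  // Emit whatever is left once the source is exhausted.
  if (batch.length > 0) yield batch;
}
```

For example, `for await (const batch of inBatches(loader.lazyLoad(), 16)) { ... }` would hand you up to 16 documents at a time, which you could split and add to a vector store before fetching the next group.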

Building a RAG application

Let’s take a look at a complete example of building a RAG application with LangChain.js and langchainjs-box. First, define the following environment variables, for example in a .env file that dotenv loads at startup:

OPENAI_API_KEY=YOUR_KEY
BOX_FILE_IDS=FILE_ID
CHAR_LIMIT=1000
BOX_DEVELOPER_TOKEN=YOUR_TOKEN
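Environment values always arrive as strings, so it can help to parse and validate them in one place before handing them to the loader. Here is a minimal sketch under that assumption; `readBoxSettings` is a hypothetical helper, and BOX_FOLDER_ID is the optional folder variable that the example script also reads.

```javascript
// Hypothetical helper: converts raw environment strings into typed settings.
function readBoxSettings(env) {
  // BOX_FILE_IDS may hold several comma-separated IDs.
  const fileIds = (env.BOX_FILE_IDS || '')
    .split(',')
    .map((id) => id.trim())
    .filter(Boolean);
  const folderId = (env.BOX_FOLDER_ID || '').trim() || undefined;

  let characterLimit;
  if (env.CHAR_LIMIT) {
    characterLimit = Number(env.CHAR_LIMIT);
    if (!Number.isInteger(characterLimit) || characterLimit <= 0) {
      throw new Error(`CHAR_LIMIT must be a positive integer, got "${env.CHAR_LIMIT}"`);
    }
  }

  if (fileIds.length === 0 && !folderId) {
    throw new Error('Set BOX_FILE_IDS (comma-separated) or BOX_FOLDER_ID');
  }
  return { fileIds, folderId, characterLimit };
}
```

Calling `readBoxSettings(process.env)` at startup surfaces a malformed CHAR_LIMIT or a missing ID before any request is made to Box.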

This example loads documents from Box, splits them into chunks, creates embeddings, and answers a question using Retrieval-Augmented Generation (RAG) with an OpenAI chat model:

import 'dotenv/config';
import { createRequire } from 'module';
const require = createRequire(import.meta.url);
const { BoxLoader, BoxAuth, BoxAuthType } = require('langchainjs-box');
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';
import { ChatOpenAI, OpenAIEmbeddings } from '@langchain/openai';
import { MemoryVectorStore } from 'langchain/vectorstores/memory';
import { RunnableSequence } from '@langchain/core/runnables';
import { StringOutputParser } from '@langchain/core/output_parsers';

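// Developer-token auth keeps the example short; any other BoxAuth method works too.
async function loadBoxDocuments() {
 const developerToken = process.env.BOX_DEVELOPER_TOKEN;
 const boxFileIds = (process.env.BOX_FILE_IDS || '').split(',').map(s => s.trim()).filter(Boolean);
 const boxFolderId = process.env.BOX_FOLDER_ID?.trim();

 // Fail early instead of passing an undefined auth object to the loader.
 if (!developerToken) {
   throw new Error('Missing BOX_DEVELOPER_TOKEN in environment');
 }
 const auth = new BoxAuth({
   authType: BoxAuthType.TOKEN,
   boxDeveloperToken: developerToken
 });

 // File IDs take precedence; the folder ID is used only when no file IDs are set.
 const loader = new BoxLoader({
   boxAuth: auth,
   boxFileIds: boxFileIds.length ? boxFileIds : undefined,
   boxFolderId: boxFileIds.length ? undefined : (boxFolderId || undefined),
   characterLimit: process.env.CHAR_LIMIT ? Number(process.env.CHAR_LIMIT) : undefined
 });

 return loader.load();
}
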
async function main() {
 const openaiApiKey = process.env.OPENAI_API_KEY;
 if (!openaiApiKey) {
   throw new Error('Missing OPENAI_API_KEY in environment');
 }
 if (!process.env.BOX_DEVELOPER_TOKEN || (!process.env.BOX_FILE_IDS && !process.env.BOX_FOLDER_ID)) {
   throw new Error('Provide BOX_DEVELOPER_TOKEN and either BOX_FILE_IDS (comma-separated) or BOX_FOLDER_ID');
 }

 console.log('Loading documents from Box...');
 const docs = await loadBoxDocuments();
 console.log(`Loaded ${docs.length} documents`);

 const splitter = new RecursiveCharacterTextSplitter({
   chunkSize: 1000,
   chunkOverlap: 200
 });
 const splits = await splitter.splitDocuments(docs);
 const cleanSplits = splits.filter(d => d.pageContent && d.pageContent.trim().length > 0 && !d.pageContent.trim().startsWith('[Error'));
 console.log(`Created ${cleanSplits.length} clean chunks (from ${splits.length})`);

 const embeddings = new OpenAIEmbeddings({ apiKey: openaiApiKey, model: 'text-embedding-3-small' });
 const vectorstore = await MemoryVectorStore.fromDocuments(cleanSplits, embeddings);
 const retriever = vectorstore.asRetriever({ searchType: 'mmr', searchKwargs: { fetchK: 20 }, k: 8 });

 const question = process.env.QUESTION || 'What are the key points in these documents?';

 const prompt = (docsText, q) => `You are a helpful assistant.
Use the provided context to answer the question.
If the answer is not in the context, say you don't know.

Context:
${docsText}

Question: ${q}
Answer:`;

 const model = new ChatOpenAI({ model: 'gpt-4o-mini', temperature: 0, apiKey: openaiApiKey });
 const chain = RunnableSequence.from([
   async (input) => {
     const retrieved = await retriever.invoke(input.question);
     const context = retrieved.map(d => d.pageContent).join('\n\n');
     console.log('Context:', context);
     return { promptText: prompt(context, input.question) };
   },
   (x) => model.invoke(x.promptText),
   new StringOutputParser()
 ]);

 console.log('Asking question:', question);
 const answer = await chain.invoke({ question });
 console.log('\nAnswer:\n', answer);
}

main().catch(err => {
 console.error(err);
 process.exit(1);
});
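To build intuition for the chunkSize and chunkOverlap settings used above, here is a simplified fixed-window chunker. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers to break on separators such as paragraphs and sentences); `chunkWithOverlap` is a hypothetical function that only illustrates how consecutive chunks share characters.

```javascript
// Simplified illustration: slide a window of `chunkSize` characters across
// the text, moving forward by (chunkSize - chunkOverlap) each step.
function chunkWithOverlap(text, chunkSize, chunkOverlap) {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error('Require chunkSize > 0 and 0 <= chunkOverlap < chunkSize');
  }
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkWithOverlap('abcdefghij', 4, 2));
// prints [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```

Each chunk repeats the last two characters of the previous one, which is the same idea behind the 200-character overlap in the example: context that straddles a chunk boundary still appears intact in at least one chunk.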

We are excited to hear your feedback and learn how you’re using the Box loader for LangChain.js. Check out the langchainjs-box package on npm and explore the full documentation. LangChain recently announced a new major version, so the package will soon be updated to support it.

If you have additional requests or suggestions (or you’d like to share examples of apps that you’ve built leveraging this package), let us know on the Box Developer Community forum.

Happy building! 🚀