Create an AI e-signature chatbot with Box Sign, Box Content Preview, and CopilotKit

Managing complex signature workflows can be challenging. Advanced cases might include setting up sequential signing with multiple approvers and signers, configuring custom email messages, setting expiration dates, or enabling automatic reminders. Each of these typically requires navigating multiple screens and filling in numerous specific fields.

What if your users could handle all of this complexity in natural language instead, with a mandatory preview and confirmation step before anything is sent? Or send bulk requests with just one prompt?

Follow along to learn about a demo project: a conversational AI interface built with Box Sign, Box Content Preview, and CopilotKit, an open-source framework for creating AI-native applications. We'll combine Box Sign's enterprise features with natural language processing to handle complex signature workflows through simple conversation.

What is CopilotKit?

Before we dive into the integration, let's talk about CopilotKit. CopilotKit is an open-source framework for building AI-native web apps where the AI is wired directly into your product’s workflows.

You expose your app’s capabilities as a structured set of tools. CopilotKit maintains conversational state, maps natural language intent to those actions, executes them on the user’s behalf, and drives dynamic UI updates based on what’s happening in the flow. This lets you focus on domain logic and UX while CopilotKit handles orchestration, context, and intent-to-action routing.

With CopilotKit, you describe what you want to accomplish, and the AI figures out how to make it happen using the actions you've defined.

Box Sign's flexibility meets complexity

Box Sign is an enterprise-grade e-signature solution that provides comprehensive features for document signing workflows. You can create signature requests with multiple participants in different roles (signers, approvers, final copy readers), set up sequential signing orders, configure automatic expiration dates and reminder cadences, or customize email subjects and messages.

This flexibility is powerful, but it comes with complexity. A typical advanced workflow involves many steps and can be tedious to set up. What normally takes 8 to 15 clicks and 1 to 3 minutes across multiple screens can become one sentence followed by a quick preview and confirmation. The end user could simply say:

"Send the job agreement to [email protected] for approval, then [email protected] to sign. Make it expire in 7 days, enable reminders, and set the subject to 'Engineering manager job agreement at Acme.'"

That's exactly what this demo project delivers.

Transforming Box Sign workflows with AI

By integrating Box Sign and Box Content Preview with CopilotKit, end users can leverage a natural language interface that handles the full complexity of signature workflows. Here's how it works:

Action-based architecture

CopilotKit “actions” are the bridge between the LLM and the Box API. Technically, an action is a typed function (with a name, schema, and handler) that CopilotKit exposes to the model as an available tool.

When the model decides it needs real data or to perform an operation, it returns a structured action invocation. CopilotKit validates the arguments against the action’s schema, runs your handler, and then feeds the result back into the model so it can continue the conversation with grounded context.

type BoxAction =
  | "search_files"
  | "get_file_preview_info"
  | "create_signature_request"
  | "list_signature_requests"
  | "get_signature_request_status"
  | "cancel_signature_request"
  | "resend_signature_request";

In the snippet above you can see the actions used in this demo project. Each action is defined with clear parameters and descriptions that help the LLM understand when and how to use it.
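
To make the idea concrete, here is a minimal, framework-free sketch of an action as a named function with a parameter schema and a handler, plus a dispatcher that validates arguments before running the handler. This is not how CopilotKit is implemented internally; `ActionDef`, `validateArgs`, `dispatch`, and the `signRequestId` parameter are hypothetical names used only for illustration, and the handler is a stub that never calls Box.

```typescript
// Hypothetical, framework-free sketch; not CopilotKit's actual implementation.
type ParamType = "string" | "number" | "boolean" | "string[]";

interface ParamDef {
  name: string;
  type: ParamType;
  required: boolean;
}

interface ActionDef {
  name: string;
  description: string;
  parameters: ParamDef[];
  handler: (args: Record<string, unknown>) => Promise<string>;
}

// Returns true when `value` has the shape the schema declares.
function matchesType(value: unknown, type: ParamType): boolean {
  if (type === "string[]") {
    return Array.isArray(value) && value.every((v) => typeof v === "string");
  }
  return typeof value === type;
}

// Collects human-readable validation errors for a tool call.
function validateArgs(action: ActionDef, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const p of action.parameters) {
    const value = args[p.name];
    if (value === undefined) {
      if (p.required) errors.push(`missing required parameter: ${p.name}`);
    } else if (!matchesType(value, p.type)) {
      errors.push(`parameter ${p.name} must be of type ${p.type}`);
    }
  }
  return errors;
}

// Looks up the action the model asked for, validates, then runs the handler.
// The returned string is what would be fed back to the model.
async function dispatch(
  registry: Record<string, ActionDef>,
  call: { name: string; args: Record<string, unknown> }
): Promise<string> {
  const action = registry[call.name];
  if (!action) return `Unknown action: ${call.name}`;
  const errors = validateArgs(action, call.args);
  if (errors.length > 0) return `Invalid arguments: ${errors.join("; ")}`;
  return action.handler(call.args);
}

// Example registration with a stubbed handler (no Box API call is made).
const statusAction: ActionDef = {
  name: "get_signature_request_status",
  description: "Look up the status of a signature request by ID.",
  parameters: [{ name: "signRequestId", type: "string", required: true }],
  handler: async (args) => `Request ${String(args.signRequestId)} is pending (stubbed).`,
};
const actionRegistry: Record<string, ActionDef> = {
  [statusAction.name]: statusAction,
};
```

Returning validation errors as text, rather than throwing, gives the model a chance to correct its arguments or ask the user for the missing value.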

Natural language to API parameters

CopilotKit's LLM handles the translation from natural language to structured API calls. When a user says: 

"Create a sign request, including one signer at [email protected], valid for 14 days, enable reminders, subject: 'Contract to sign'"

CopilotKit extracts:

  • signerEmails: [[email protected]]
  • daysValid: 14
  • areRemindersEnabled: true
  • emailSubject: "Contract to sign"

The framework understands context and intent, so users don't need to use specific keywords or follow a rigid structure.
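
As a rough illustration of the last step, the sketch below maps extracted parameters like the ones above onto a Box Sign request body. The snake_case field names (`source_files`, `parent_folder`, `signers`, `days_valid`, `are_reminders_enabled`, `email_subject`) follow the Box Sign create-request payload as we understand it. `ExtractedParams`, `toSignRequestBody`, the placeholder file ID, the `example.com` address, and the fallback to folder `"0"` (the Box root folder) are assumptions for illustration, not code from the demo project.

```typescript
// Hypothetical mapping helper; the demo project's own mapping may differ.
interface ExtractedParams {
  fileId: string;
  signerEmails: string[];
  parentFolderId?: string;
  daysValid?: number;
  areRemindersEnabled?: boolean;
  emailSubject?: string;
}

interface SignRequestBody {
  source_files: { id: string; type: "file" }[];
  parent_folder: { id: string; type: "folder" };
  signers: { email: string; role: "signer" }[];
  days_valid?: number;
  are_reminders_enabled?: boolean;
  email_subject?: string;
}

function toSignRequestBody(p: ExtractedParams): SignRequestBody {
  if (p.signerEmails.length === 0) {
    throw new Error("At least one signer is required.");
  }
  const body: SignRequestBody = {
    source_files: [{ id: p.fileId, type: "file" }],
    // Assumption: fall back to the root folder ("0") when no folder is given.
    parent_folder: { id: p.parentFolderId ?? "0", type: "folder" },
    signers: p.signerEmails.map((email) => ({ email, role: "signer" as const })),
  };
  // Only include optional fields the user actually asked for.
  if (p.daysValid !== undefined) body.days_valid = p.daysValid;
  if (p.areRemindersEnabled !== undefined) body.are_reminders_enabled = p.areRemindersEnabled;
  if (p.emailSubject !== undefined) body.email_subject = p.emailSubject;
  return body;
}

// Placeholder values standing in for what the model extracted.
const exampleBody = toSignRequestBody({
  fileId: "1234567890",
  signerEmails: ["signer@example.com"],
  daysValid: 14,
  areRemindersEnabled: true,
  emailSubject: "Contract to sign",
});
```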

Context persistence

The demo project maintains context throughout the conversation. Users can create a request and then ask "What's the status of this request?", or say "Cancel the ten last requests" — referring to the most recently created ones. CopilotKit's context system makes multi-turn conversations natural and intuitive.
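
On the application side, a phrase like "the last ten requests" has to be resolved into concrete request IDs. The sketch below shows one plausible way to do that from a list of tracked requests. `TrackedRequest` and `mostRecentCancellable` are hypothetical, and treating only `sent` and `viewed` requests as cancellable is an assumption made for this example.

```typescript
// Hypothetical shape of a tracked request; the demo's own type may differ.
interface TrackedRequest {
  id: string;
  createdAt: string; // ISO 8601 timestamp
  status: "sent" | "viewed" | "signed" | "cancelled";
}

// Picks the `count` most recently created requests that can still be cancelled.
function mostRecentCancellable(requests: TrackedRequest[], count: number): TrackedRequest[] {
  return requests
    .filter((r) => r.status === "sent" || r.status === "viewed") // filter() copies, so sort() below does not mutate the input
    .sort((a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt)) // newest first
    .slice(0, count);
}

const tracked: TrackedRequest[] = [
  { id: "a", createdAt: "2025-01-01T10:00:00Z", status: "sent" },
  { id: "b", createdAt: "2025-01-03T10:00:00Z", status: "signed" },
  { id: "c", createdAt: "2025-01-02T10:00:00Z", status: "viewed" },
  { id: "d", createdAt: "2025-01-04T10:00:00Z", status: "sent" },
];

// IDs a bulk "cancel the last two requests" prompt would target: ["d", "c"]
const idsToCancel = mostRecentCancellable(tracked, 2).map((r) => r.id);
```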

Box Content Preview and human-in-the-loop (HITL)

Before any signature request is sent, the LLM calls the frontend prepare_signature_request action with the parsed parameters. The app then displays the document using Box Content Preview and shows all request details. Before proceeding, the agent asks for confirmation, and the user can still ask it to update the request.

This human-in-the-loop preview and confirmation step is critical because signature workflows are legally sensitive. Requiring explicit user approval before sending helps prevent compliance issues, avoid high-impact mistakes, and build durable trust in the system’s AI-driven actions.
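
The prepare-then-confirm pattern can be reduced to a small state holder: nothing can be sent unless a request was staged first, and a confirmed request cannot be sent twice. The sketch below is a simplified illustration of that idea; `ConfirmationGate` and `PendingRequest` are hypothetical names, not the demo project's implementation.

```typescript
// Hypothetical sketch of a human-in-the-loop gate.
class ConfirmationGate<T extends object> {
  private pending: T | null = null;

  // Stage a request for review; nothing is sent at this point.
  prepare(request: T): T {
    this.pending = { ...request };
    return this.pending;
  }

  // Let the user adjust the staged request before confirming.
  update(patch: Partial<T>): T {
    if (this.pending === null) throw new Error("Nothing to update: call prepare first.");
    this.pending = { ...this.pending, ...patch };
    return this.pending;
  }

  // Release the staged request exactly once, after explicit user approval.
  confirm(): T {
    if (this.pending === null) throw new Error("Nothing to confirm: call prepare first.");
    const request = this.pending;
    this.pending = null;
    return request;
  }
}

interface PendingRequest {
  fileId: string;
  daysValid: number;
}

const gate = new ConfirmationGate<PendingRequest>();
gate.prepare({ fileId: "1234567890", daysValid: 7 });
gate.update({ daysValid: 14 }); // the user asked for a longer expiration
const confirmedRequest = gate.confirm(); // { fileId: "1234567890", daysValid: 14 }
```

Clearing the staged request on confirmation means a repeated "yes" in the chat cannot trigger a duplicate send.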

Agent instructions: Handling complex workflows

The AI agent instructions define how to handle advanced Box Sign workflows by restricting the assistant to signing-related tasks, requiring a preview step using prepare_signature_request and explicit user confirmation before sending, and ensuring required inputs such as a valid file ID and real participant emails are collected first.

They also describe how to configure complex recipient scenarios: assigning each person the right role (Signer, Approver, or Final copy reader), setting the routing order, and mapping these decisions to the correct Box Sign API parameters. Finally, they cover bulk operations, such as canceling the 10 most recent matching sign requests by finding the relevant requests and applying the action in a batch.
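
To illustrate the role and routing logic, here is a small sketch that turns role-specific email lists into recipient entries with a `role` and, for sequential signing, an `order`. The role values (`approver`, `signer`, `final_copy_reader`) and the `order` field mirror the Box Sign API as we understand it. The function `assignRolesAndOrder` is hypothetical, and placing approvers before signers while leaving final copy readers unordered is an assumption made for this example.

```typescript
type SignRole = "approver" | "signer" | "final_copy_reader";

interface SignerEntry {
  email: string;
  role: SignRole;
  order?: number;
}

interface RecipientInput {
  approverEmails?: string[];
  signerEmails?: string[];
  finalCopyReaderEmails?: string[];
  isSequential?: boolean;
}

// Hypothetical helper: approvers act first, then signers; readers only get the final copy.
function assignRolesAndOrder(input: RecipientInput): SignerEntry[] {
  const acting: SignerEntry[] = [
    ...(input.approverEmails ?? []).map((email) => ({ email, role: "approver" as const })),
    ...(input.signerEmails ?? []).map((email) => ({ email, role: "signer" as const })),
  ];
  const readers: SignerEntry[] = (input.finalCopyReaderEmails ?? []).map((email) => ({
    email,
    role: "final_copy_reader" as const,
  }));
  // For sequential signing, number the acting participants 1, 2, 3, ...
  const ordered = input.isSequential
    ? acting.map((entry, index) => ({ ...entry, order: index + 1 }))
    : acting;
  return [...ordered, ...readers];
}

const recipients = assignRolesAndOrder({
  approverEmails: ["approver@example.com"],
  signerEmails: ["signer@example.com"],
  finalCopyReaderEmails: ["legal@example.com"],
  isSequential: true,
});
// recipients[0] is the approver with order 1, recipients[1] the signer with order 2,
// and recipients[2] the final copy reader with no order.
```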

The happy path: Chat interaction example

Let's see a complete AI e-signature workflow in action. The user enters a prompt asking to create a Box Sign workflow:

Search for my Vendor agreement.pdf, then create a signature request with one approver: [email protected], then pass it to a signer: [email protected]. Make it sequential, valid for 7 days, enable reminders, and set the email subject to "Please Sign the Vendor Agreement".

AI Assistant:

  1. Calls search_files with a provided file name (alternatively it can also be a file ID)
  2. Returns file details from Box File API
  3. Extracts all relevant parameters from the user prompt
  4. Calls prepare_signature_request with data extracted from the prompt (no call to Box API)
  5. Displays document in Box Content Preview and the parsed request parameters in the app UI
  6. Writes a message in the AI chat:
Please confirm if everything looks correct, and I will proceed with sending the signature request.

User confirms:

Looks good!

AI Assistant:

  1. Calls create_signature_request with the same parameters, which in turn calls the Box Sign API
  2. Returns confirmation message in the AI chat
  3. Updates the UI table which displays active e-signature requests

That was an example of the happy path of a Box Sign workflow. However, the demo agent is capable of much more, including canceling the existing requests, suggesting additional parameters for the e-signature workflows, or performing bulk actions. Signatures often involve sensitive data and legal commitments, so be clear about limitations before rolling out an AI-driven workflow.

Implementation highlights

The demo project implementation uses modern web technologies:

  • Next.js: React framework for the application structure
  • CopilotKit React components and hooks: the CopilotKit provider, CopilotSidebar, and the useCopilotAction hook
  • Box Node.js SDK v10: Server-side Box API integration for search, sign requests, and file operations
  • Box Content Preview component: For showing document previews before sending

All Box operations run server-side in Next.js API routes. The frontend only communicates with CopilotKit, which decides when to call Box actions based on user input. This architecture keeps credentials secure and provides a clean separation of concerns.
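
As a rough sketch of what such a server-side call might look like, the function below posts a prepared request body to the Box Sign endpoint (`POST https://api.box.com/2.0/sign_requests`) with a bearer token. The demo itself uses the Box Node.js SDK rather than raw HTTP, so treat this only as an illustration of the boundary: `createSignRequestOnServer`, `HttpPost`, and `HttpResponse` are hypothetical names, and the HTTP function is injected so the example can run without network access.

```typescript
// Minimal response and request shapes so the sketch does not depend on a specific runtime.
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type HttpPost = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string }
) => Promise<HttpResponse>;

// Hypothetical server-side helper: the access token never leaves the server.
async function createSignRequestOnServer(
  body: object,
  accessToken: string,
  post: HttpPost
): Promise<unknown> {
  const res = await post("https://api.box.com/2.0/sign_requests", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`Box Sign API returned HTTP ${res.status}`);
  }
  return res.json();
}
```

In a Node.js 18+ route handler, the built-in `fetch` should be usable as the `post` argument, while tests can pass a stub that records the request instead.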

High-level overview

The BoxSignAssistant component serves as the main orchestrator of the conversational interface. It manages three key pieces of state: previewData for document preview information, activeSignRequests for tracking pending signature workflows, and activeSignRequestsLoading for UI feedback during API calls.

The component wraps the entire experience in CopilotKit’s provider, configured with the runtime URL and error handling. It also establishes a React context to share state across child components.

The BoxSignActions component defines all CopilotKit actions (search, create, cancel, etc.), while CopilotSidebar provides the conversational interface with custom labels and initial instructions.

The UI switches between two states. When previewData exists, it displays the BoxPreviewPanel, showing the document and request details for user confirmation. When no preview is active, it shows a welcome screen with feature cards explaining search, creation, and tracking capabilities.

The ActiveSigningRequestsPanel component remains visible throughout and auto-refreshes to display all active signature requests in a table, including participant roles, status, and expiration information.

This architecture ensures a seamless flow from conversation to preview to confirmation to execution, while maintaining visibility into ongoing workflows.

Here's the simplified component structure:

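// Project-local imports (PreviewContext, BoxSignActions, BoxPreviewPanel,
// ActiveSigningRequestsPanel, handleCopilotError, and the types) are omitted for brevity.
import { useState } from "react";
import { CopilotKit } from "@copilotkit/react-core";
import { CopilotSidebar } from "@copilotkit/react-ui";

export default function BoxSignAssistant() {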
  const [previewData, setPreviewData] = useState<PreviewData | null>(null);
  const [activeSignRequests, setActiveSignRequests] = useState<ActiveSignRequest[]>([]);
  const [activeSignRequestsLoading, setActiveSignRequestsLoading] = useState(false);

  return (
    <CopilotKit
      runtimeUrl="/api/copilotkit"
      onError={handleCopilotError}
    >
      <PreviewContext.Provider
        value={{
          previewData,
          setPreviewData,
          activeSignRequests,
          setActiveSignRequests,
          activeSignRequestsLoading,
          setActiveSignRequestsLoading,
        }}
      >
        {/* Defines all CopilotKit actions (tools for searching files, creating sign requests, etc.) */}
        <BoxSignActions />
        
        {/* Conversational chat interface */}
        <CopilotSidebar
          defaultOpen
          labels={{
            title: "Box Sign AI Assistant",
            initial: "Get help with signing documents in Box..."
          }}
        />
        
        {/* Main content area */}
        <div>
          {previewData ? (
            // Show document preview and request details for confirmation
            <BoxPreviewPanel
              data={previewData}
              onDismiss={() => setPreviewData(null)}
            />
          ) : (
            // Show welcome screen with feature cards
            <main>
              <h1>Box Sign AI Assistant</h1>
              {/* Feature cards: Search, Create Requests, Track Progress */}
            </main>
          )}
          
          {/* Always visible: auto-refreshing list of active requests */}
          <ActiveSigningRequestsPanel
            requests={activeSignRequests}
            loading={activeSignRequestsLoading}
          />
        </div>
      </PreviewContext.Provider>
    </CopilotKit>
  );
}

Example CopilotAction code snippet

Let’s take a look at a useCopilotAction definition. Here's prepare_signature_request, the action that precedes the human-in-the-loop step:

  useCopilotAction({
    name: "prepare_signature_request",
    description:
      "Prepare a signature request by showing the user a document preview and request details. Call this FIRST before create_signature_request. Use the same parameters (fileId, participants, etc.). After the user confirms in chat (e.g. 'yes' or 'confirm'), call create_signature_request with the same parameters.",
    parameters: [
      { name: "fileId", type: "string", description: "Box file ID to be signed", required: true },
      { name: "parentFolderId", type: "string", description: "Box folder ID for signed document. Optional.", required: false },
      { name: "approverEmails", type: "string[]", description: "Emails of approvers.", required: false },
      { name: "signerEmails", type: "string[]", description: "Emails of signers.", required: false },
      { name: "finalCopyReaderEmails", type: "string[]", description: "Emails for final copy.", required: false },
      { name: "participants", type: "object[]", description: "Array of { email, role }. Optional.", required: false },
      { name: "isSequential", type: "boolean", description: "Sequential signing order.", required: false },
      { name: "daysValid", type: "number", description: "Days until expiration.", required: false },
      { name: "areRemindersEnabled", type: "boolean", description: "Enable reminders.", required: false },
      { name: "name", type: "string", description: "Request name.", required: false },
      { name: "emailSubject", type: "string", description: "Email subject.", required: false },
      { name: "emailMessage", type: "string", description: "Email message body.", required: false },
    ],
    handler: async (params) => {
      // Merge the generic participants array and the role-specific email lists into one list.
      const participants = buildParticipantsFromParams({
        participantsRaw: params.participants,
        approverEmailsRaw: params.approverEmails,
        signerEmailsRaw: params.signerEmails,
        finalCopyReaderEmailsRaw: params.finalCopyReaderEmails,
      });
      // Fail fast on missing inputs so the model can ask the user for them.
      const fileId = params.fileId ? String(params.fileId).trim() : "";
      if (!fileId) throw new Error("fileId is required.");
      if (!participants.length)
        throw new Error("Specify who is involved: participants or approverEmails/signerEmails.");
      // Stage the document preview and request details in the UI; nothing is sent yet.
      const { fileName, requestSummary } = await stagePreviewAndDetails(
        params as Record<string, unknown>,
        participants
      );
      // Keep the prepared request so it can be reused once the user confirms.
      pendingCreateSignatureRef.current = buildCreateSignature(
        params as Record<string, unknown>,
        participants
      );
      return `I've prepared the signature request. The document **${fileName}** is shown in the preview panel next to this chat, with the request details below it. Please review and confirm: **Is this the correct document to sign?** Reply **Yes** or **Confirm** to send the signature request; I will then create it with the same parameters (${participants.length} participant(s), ${requestSummary.isSequential ? "sequential signing" : "any order"}${requestSummary.daysValid ? `, expires in ${requestSummary.daysValid} days` : ""}${requestSummary.areRemindersEnabled ? ", reminders enabled" : ""}).`;
    },
  });

The LLM uses the parameter descriptions to understand what information it needs from the user and when to call this action. With those technical details in mind, let’s take a closer look at the interactions between the user, the AI chat, the Box API, and the UI.

Try it yourself

The complete implementation is available as an open-source project. To run it, you’ll need:

  1. Node.js 18+
  2. A Box account with access to the Box Sign API (available on Business plans and above)
  3. A Box Custom App with the read/write files and manage signature requests scopes enabled
  4. Localhost added to the Custom App's CORS settings
  5. An OpenAI API key

To run the project, simply clone the repository, configure your environment variables, install dependencies, and run npm run dev to see the conversational interface in action.

Disclaimer: this project is a proof of concept for demonstration and evaluation only. It is not production-ready and should not be deployed without additional improvements and thorough testing. Finally, consider cost implications, as this solution leverages OpenAI services and results in usage-based charges.

Looking ahead

This pattern applies to any complex workflow, not just e-signatures. CopilotKit provides the infrastructure to build conversational, action-based experiences today, so you can focus on domain logic instead of AI plumbing.

If you build something with this approach or have suggestions, share your experience on the Box Developer Community forum. Happy building!

Resources