Documents Overview

The documents service enables you to upload, process, and semantically search documents within Mirra's knowledge graph. Documents are automatically chunked, embedded, and made searchable across your personal graph or shared graphs (groups and user contacts).

What are Documents?

The documents service provides end-to-end document processing, from upload to semantic search. When you upload a document, Mirra performs several automated operations:

Text Extraction - Extracts text content from the uploaded file based on its MIME type
Chunking - Splits the document into semantic chunks (typically 500-1000 tokens each)
Embedding - Generates vector embeddings for each chunk using OpenAI's text-embedding-3-small model
Storage - Stores document metadata and chunks in Neo4j with relationships to the target graph
Indexing - Creates a vector index for fast semantic similarity search
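The chunking step can be pictured as a size-bounded splitter. This is a minimal sketch, not Mirra's chunker, and it rests on loud assumptions. It counts whitespace-separated words as a stand-in for tokens and cuts fixed windows, while the service produces semantic chunks. The name `chunkText`, the default of 750 (the midpoint of the documented 500-1000 range), and the `overlap` parameter are all hypothetical.

```typescript
// Hypothetical sketch: whitespace-separated words approximate tokens.
// Mirra's real chunker is semantic; this only illustrates size-bounded splitting.
function chunkText(text: string, maxWords = 750, overlap = 50): string[] {
  if (maxWords <= 0 || overlap < 0 || overlap >= maxWords) {
    throw new Error("require maxWords > 0 and 0 <= overlap < maxWords");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = maxWords - overlap; // how far each window advances
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(" "));
    if (start + maxWords >= words.length) break; // this window already reached the end
  }
  return chunks;
}

// Seven words, windows of 3 with 1 word of overlap.
const demoChunks = chunkText("a b c d e f g", 3, 1); // ["a b c", "c d e", "e f g"]
```

The overlap is an illustrative design choice that keeps a sentence straddling a boundary visible from both neighboring chunks. It is not documented Mirra behavior.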

Documents are associated with a graph ID, which determines access permissions. A document can exist in multiple graphs through the sharing mechanism, enabling collaboration while maintaining access control.

Syntax

Uploading a Document

POST /api/sdk/v1/documents/upload

Request Body

{
  file: string;              // Base64-encoded file content
  filename: string;          // Original filename (e.g., "report.pdf")
  mimeType: string;          // MIME type (e.g., "application/pdf")
  graphId?: string;          // Target graph ID (defaults to personal graph)
  title?: string;            // Optional document title
  author?: string;           // Optional document author
  productTags?: string[];    // Optional tags
  accessLevel?: 'internal' | 'public';  // Access level (default: internal)
}
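Preparing the request body mostly comes down to Base64-encoding the file bytes. This is a minimal sketch. It assumes a runtime with the standard `btoa` and `TextEncoder` globals (modern browsers or Node.js 18+). The `UploadRequest` interface name and the `toBase64` and `buildUploadBody` helpers are illustrative. The field names follow the structure above. Authentication and the API base URL are not described in this section, so the sketch stops at producing the JSON string to send to `POST /api/sdk/v1/documents/upload`.

```typescript
// Field names mirror the documented upload body; the interface name is illustrative.
interface UploadRequest {
  file: string;
  filename: string;
  mimeType: string;
  graphId?: string;
  title?: string;
  author?: string;
  productTags?: string[];
  accessLevel?: "internal" | "public";
}

// Base64-encode raw bytes via the standard btoa global (one char per byte).
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Hypothetical helper: build the JSON body for the upload endpoint.
function buildUploadBody(
  bytes: Uint8Array,
  filename: string,
  mimeType: string,
  options: Omit<UploadRequest, "file" | "filename" | "mimeType"> = {},
): string {
  const body: UploadRequest = { file: toBase64(bytes), filename, mimeType, ...options };
  return JSON.stringify(body);
}

// Example: a plain-text file containing "hello", with an optional title.
const notes = new TextEncoder().encode("hello");
const payload = buildUploadBody(notes, "notes.txt", "text/plain", { title: "Notes" });
```

If you leave `graphId` unset, the document defaults to your personal graph.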

Supported File Types

Mirra supports automatic text extraction and processing for common document formats.

PDF

{
  filename: 'report.pdf',
  mimeType: 'application/pdf',
  file: base64EncodedContent
}

PDF processing preserves page numbers and metadata for accurate chunk attribution.

Microsoft Word

{
  filename: 'document.docx',
  mimeType: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  file: base64EncodedContent
}

Plain Text

{
  filename: 'notes.txt',
  mimeType: 'text/plain',
  file: base64EncodedContent
}

Markdown

{
  filename: 'README.md',
  mimeType: 'text/markdown',
  file: base64EncodedContent
}
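A small lookup table keeps the MIME strings above in one place. This is a client-side convenience sketch. `mimeTypeFor` and `SUPPORTED_MIME_TYPES` are hypothetical names. The table lists only the four formats documented here, so other extensions are rejected instead of guessed.

```typescript
// MIME types for the documented formats, keyed by lowercase file extension.
const SUPPORTED_MIME_TYPES: Record<string, string> = {
  pdf: "application/pdf",
  docx: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  txt: "text/plain",
  md: "text/markdown",
};

// Hypothetical helper: resolve a filename's MIME type or throw if unsupported.
function mimeTypeFor(filename: string): string {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  // Own-property check so inherited keys like "constructor" are not matched.
  if (!Object.prototype.hasOwnProperty.call(SUPPORTED_MIME_TYPES, ext)) {
    throw new Error(`Unsupported file type: ${filename}`);
  }
  return SUPPORTED_MIME_TYPES[ext];
}
```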

Graph-Based Access Control

All document operations respect graph-based permissions. Users can only access documents in graphs they belong to:

Personal Graph (user:{userId}) - Private documents accessible only by the owner
Group Graphs (group:{groupId}) - Documents shared with group members
User Contact Graphs (user_contact:{contactId}) - Documents shared in direct conversations
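Graph IDs share a `prefix:id` shape. The sketch below builds and validates them on the client. All of its names (`GraphKind`, `formatGraphId`, `parseGraphId`) are hypothetical. It assumes the part after the first colon is an opaque identifier that may itself contain colons.

```typescript
// The three documented graph kinds, used as ID prefixes.
type GraphKind = "user" | "group" | "user_contact";
const GRAPH_KINDS: readonly GraphKind[] = ["user", "group", "user_contact"];

interface GraphRef {
  kind: GraphKind;
  id: string;
}

function isGraphKind(s: string): s is GraphKind {
  return (GRAPH_KINDS as readonly string[]).indexOf(s) >= 0;
}

// Hypothetical helper: build an ID such as "group:abc123".
function formatGraphId(ref: GraphRef): string {
  return `${ref.kind}:${ref.id}`;
}

// Hypothetical helper: split on the first colon (assumption: the suffix is opaque).
function parseGraphId(graphId: string): GraphRef {
  const sep = graphId.indexOf(":");
  if (sep <= 0 || sep === graphId.length - 1) {
    throw new Error(`Malformed graph ID: ${graphId}`);
  }
  const kind = graphId.slice(0, sep);
  if (!isGraphKind(kind)) {
    throw new Error(`Unknown graph kind: ${kind}`);
  }
  return { kind, id: graphId.slice(sep + 1) };
}
```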

If you upload a document without specifying a graphId, it is stored in your personal graph. To share it with others, use the share operation to add it to a group or contact graph.
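The default-graph rule and the share and unshare operations can be modeled as a set of graph IDs per document. Access is granted when the caller belongs to at least one of those graphs. This in-memory class only illustrates the rules described above. It is not the service's implementation, which stores these relationships in Neo4j. The class and method names are hypothetical.

```typescript
// Illustrative model: each document maps to the set of graph IDs it is shared in.
class DocumentSharingModel {
  private graphsByDoc = new Map<string, Set<string>>();

  // Upload: with no graphId, the document lands in the uploader's personal graph.
  upload(docId: string, userId: string, graphId?: string): string {
    const target = graphId ?? `user:${userId}`;
    this.graphsByDoc.set(docId, new Set([target]));
    return target;
  }

  // Share: make the document visible in one more graph.
  share(docId: string, graphId: string): void {
    this.graphsFor(docId).add(graphId);
  }

  // Unshare: remove access from one graph; other graphs are unaffected.
  unshare(docId: string, graphId: string): void {
    this.graphsFor(docId).delete(graphId);
  }

  // List Graphs: every graph the document is currently shared in, sorted.
  listGraphs(docId: string): string[] {
    return Array.from(this.graphsFor(docId)).sort();
  }

  // A user may read the document if they belong to any graph it is shared in.
  canAccess(docId: string, memberOf: Set<string>): boolean {
    return this.listGraphs(docId).some((g) => memberOf.has(g));
  }

  private graphsFor(docId: string): Set<string> {
    const graphs = this.graphsByDoc.get(docId);
    if (!graphs) throw new Error(`Unknown document: ${docId}`);
    return graphs;
  }
}
```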

Search Capabilities

The documents service provides semantic search across document chunks using vector similarity. Unlike keyword search, semantic search understands the meaning and context of your query, returning relevant results even when exact keywords don't match.

Search queries are embedded using the same model as document chunks, then compared against the vector index using cosine similarity. Results are ranked by similarity score (0.0 to 1.0), with higher scores indicating stronger semantic relevance.
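Ranking by cosine similarity fits in a few lines. The toy two-dimensional vectors below stand in for real embeddings, since text-embedding-3-small returns 1536-dimensional vectors by default. `Chunk` and `rankChunks` are hypothetical names. In the service, the comparison runs against the vector index, not in client code.

```typescript
interface Chunk {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors; treat as no match
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding and keep the best `limit`.
function rankChunks(query: number[], chunks: Chunk[], limit = 5): { id: string; score: number }[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const ranked = rankChunks([1, 0], [
  { id: "a", embedding: [0, 1] },
  { id: "b", embedding: [1, 0] },
  { id: "c", embedding: [1, 1] },
]); // ranked ids: b, c, a
```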

Note: Search is scoped to a single graph ID. To search across multiple graphs, you must make separate search requests for each graph.
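Because each request covers one graph, a client that wants results from several graphs has to fan out and merge. The sketch below rests on several assumptions. The caller supplies a `searchGraph` function that wraps the Search operation, and both that name and the `SearchHit` shape are hypothetical. Scores from different graphs are also assumed to be comparable, since queries and chunks share the same embedding model.

```typescript
// Hypothetical shape of one chunk returned by the Search operation.
interface SearchHit {
  documentId: string;
  chunkId: string;
  score: number;
}

// Keep each chunk once, with its highest score, then sort best-first.
function mergeResults(perGraph: SearchHit[][], limit: number): SearchHit[] {
  const best = new Map<string, SearchHit>();
  for (const hits of perGraph) {
    for (const hit of hits) {
      const seen = best.get(hit.chunkId);
      if (!seen || hit.score > seen.score) best.set(hit.chunkId, hit);
    }
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Issue one search per graph in parallel, then merge the result lists.
async function searchAcrossGraphs(
  query: string,
  graphIds: string[],
  searchGraph: (query: string, graphId: string) => Promise<SearchHit[]>,
  limit = 10,
): Promise<SearchHit[]> {
  const perGraph = await Promise.all(graphIds.map((g) => searchGraph(query, g)));
  return mergeResults(perGraph, limit);
}
```

Deduplicating by chunk ID matters because a shared document can appear in the results for more than one graph.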

Document Operations

The documents service provides ten core operations for managing the document lifecycle:

Upload - Upload and process a new document
Get - Retrieve full document metadata including all chunks
Get Status - Check processing status without retrieving chunks
Get Chunks - Retrieve all chunks for a document
Delete - Delete a document and all associated chunks
Share - Share a document to another graph
Unshare - Remove document access from a specific graph
List Graphs - List all graphs a document is shared in
Search - Perform semantic search across documents
List - List all documents in a graph

Next Steps

Endpoints - Complete API reference for all document operations
Examples - Practical code examples for document management
Technical Notes - Processing pipeline, embeddings, and troubleshooting