Documents Overview

The documents service enables you to upload, process, and semantically search documents within Mirra's knowledge graph. Documents are automatically chunked, embedded, and made searchable across your personal graph or shared graphs (groups and user contacts).

What are Documents?

The documents service provides end-to-end document processing, from upload to semantic search. When you upload a document, Mirra runs five automated steps:

  1. Text Extraction - Extracts text content from the uploaded file based on its MIME type
  2. Chunking - Splits the document into semantic chunks (typically 500-1000 tokens each)
  3. Embedding - Generates vector embeddings for each chunk using OpenAI's text-embedding-3-small model
  4. Storage - Stores document metadata and chunks in Neo4j with relationships to the target graph
  5. Indexing - Creates a vector index for fast semantic similarity search
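
To make the chunking step concrete, here is a minimal sketch of a fixed-size chunker. It is not Mirra's chunking logic, which is described above only as semantic: it swaps in a naive word window, treating whitespace-separated words as a rough proxy for tokens. The function name `chunkByWords` and the `maxWords` parameter are hypothetical.

```typescript
// Naive fixed-size chunker, for illustration only (not Mirra's semantic chunker).
// Assumption: whitespace-separated words stand in for tokens.
function chunkByWords(text: string, maxWords: number): string[] {
  if (maxWords < 1 || Math.floor(maxWords) !== maxWords) {
    throw new Error("maxWords must be a positive integer");
  }
  // Split on any run of whitespace and drop empty fragments.
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}
```

A real pipeline would count tokens with the embedding model's tokenizer and prefer to break at sentence or section boundaries, so treat this only as a picture of what "splitting into chunks" means.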

Each document is stored against a graph ID, which determines who can access it. Sharing can add the same document to further graphs, enabling collaboration while maintaining access control.

Syntax

Uploading a Document

POST /api/sdk/v1/documents/upload

Document Structure

{
  file: string;              // Base64 encoded file content
  filename: string;          // Original filename (e.g., "report.pdf")
  mimeType: string;          // MIME type (e.g., "application/pdf")
  graphId?: string;          // Target graph ID (defaults to personal graph)
  title?: string;            // Optional document title
  author?: string;           // Optional document author
  productTags?: string[];    // Optional tags
  accessLevel?: 'internal' | 'public';  // Access level (default: internal)
}
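
The structure above could be assembled into an HTTP request roughly as follows. This is a sketch under assumptions: the JSON content type is inferred from the Base64 `file` field rather than stated on this page, the base URL is a placeholder, and authentication is not covered here, so any required headers are passed in by the caller. The names `UploadDocumentRequest`, `HttpRequest`, and `buildUploadRequest` are hypothetical.

```typescript
// Request body shape, copied from the structure documented above.
interface UploadDocumentRequest {
  file: string; // Base64 encoded file content
  filename: string;
  mimeType: string;
  graphId?: string;
  title?: string;
  author?: string;
  productTags?: string[];
  accessLevel?: "internal" | "public";
}

// Plain description of an HTTP request, usable with any HTTP client.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: { [name: string]: string };
  body: string;
}

// Assumption: baseUrl is a placeholder and extraHeaders carries whatever
// authentication your Mirra setup requires (not specified on this page).
function buildUploadRequest(
  baseUrl: string,
  extraHeaders: { [name: string]: string },
  doc: UploadDocumentRequest
): HttpRequest {
  if (doc.file.length === 0) {
    throw new Error("file must contain Base64 encoded content");
  }
  return {
    // Strip trailing slashes so the path is joined with exactly one "/".
    url: baseUrl.replace(/\/+$/, "") + "/api/sdk/v1/documents/upload",
    method: "POST",
    // Assumption: the endpoint accepts a JSON body.
    headers: { "Content-Type": "application/json", ...extraHeaders },
    body: JSON.stringify(doc),
  };
}

// Example: "aGVsbG8=" is the Base64 encoding of the text "hello".
const exampleUpload = buildUploadRequest("https://api.example.com/", {}, {
  file: "aGVsbG8=",
  filename: "notes.txt",
  mimeType: "text/plain",
});
```

Because `graphId` and `accessLevel` are omitted in the example, the documented defaults (personal graph, internal access) would apply.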

Supported File Types

Mirra supports automatic text extraction and processing for common document formats.

PDF

{
  filename: 'report.pdf',
  mimeType: 'application/pdf',
  file: base64EncodedContent
}

PDF processing preserves page numbers and metadata for accurate chunk attribution.

Microsoft Word

{
  filename: 'document.docx',
  mimeType: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  file: base64EncodedContent
}

Plain Text

{
  filename: 'notes.txt',
  mimeType: 'text/plain',
  file: base64EncodedContent
}

Markdown

{
  filename: 'README.md',
  mimeType: 'text/markdown',
  file: base64EncodedContent
}
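
Since every upload needs a `mimeType`, a small lookup keyed by file extension can help. The sketch below covers only the four formats shown above, using the extensions from their example filenames; the names `SUPPORTED_MIME_TYPES` and `mimeTypeFor` are hypothetical, and whether Mirra accepts other formats is not covered on this page.

```typescript
// MIME types for the four formats listed above, keyed by lowercase extension.
const SUPPORTED_MIME_TYPES: { [ext: string]: string } = {
  pdf: "application/pdf",
  docx: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  txt: "text/plain",
  md: "text/markdown",
};

// Returns the MIME type for a filename, or undefined when the extension
// is missing or not in the table.
function mimeTypeFor(filename: string): string | undefined {
  const dot = filename.lastIndexOf(".");
  if (dot < 0 || dot === filename.length - 1) return undefined;
  const ext = filename.slice(dot + 1).toLowerCase();
  // hasOwnProperty guards against inherited keys such as "constructor".
  return Object.prototype.hasOwnProperty.call(SUPPORTED_MIME_TYPES, ext)
    ? SUPPORTED_MIME_TYPES[ext]
    : undefined;
}
```

Returning `undefined` for unknown extensions lets the caller reject a file before encoding and uploading it.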

Graph-Based Access Control

All document operations respect graph-based permissions. Users can only access documents in graphs they belong to:

  • Personal Graph (user:{userId}) - Private documents accessible only by the owner
  • Group Graphs (group:{groupId}) - Documents shared with group members
  • User Contact Graphs (user_contact:{contactId}) - Documents shared in direct conversations

If you upload a document without specifying a graphId, it is stored in your personal graph. To share a document with others, use the share operation to add it to a group or contact graph.
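
The three graph ID forms listed above can be built and parsed with small helpers. This sketch assumes the braces in `user:{userId}` mark a placeholder, so a real ID looks like a prefix, a colon, and the identifier. The type and function names are hypothetical.

```typescript
type GraphKind = "user" | "group" | "user_contact";

interface GraphRef {
  kind: GraphKind;
  id: string;
}

// Formats a graph ID with one of the prefixes listed above, e.g. "group:abc123".
function formatGraphId(ref: GraphRef): string {
  return ref.kind + ":" + ref.id;
}

// Maps a raw prefix to a known graph kind, or null if it is not recognised.
function toGraphKind(prefix: string): GraphKind | null {
  switch (prefix) {
    case "user":
      return "user";
    case "group":
      return "group";
    case "user_contact":
      return "user_contact";
    default:
      return null;
  }
}

// Parses "kind:id"; returns null for an unknown prefix or an empty identifier.
function parseGraphId(graphId: string): GraphRef | null {
  const sep = graphId.indexOf(":");
  if (sep <= 0 || sep === graphId.length - 1) return null;
  const kind = toGraphKind(graphId.slice(0, sep));
  if (kind === null) return null;
  return { kind, id: graphId.slice(sep + 1) };
}
```

Parsing a graph ID before sending a request makes it easy to tell a private upload (`user`) from one that other people will see (`group` or `user_contact`).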


Search Capabilities

The documents service provides semantic search across document chunks using vector similarity. Unlike keyword search, semantic search understands the meaning and context of your query, returning relevant results even when exact keywords don't match.

Search queries are embedded using the same model as document chunks, then compared against the vector index using cosine similarity. Results are ranked by similarity score (0.0 to 1.0), with higher scores indicating stronger semantic relevance.
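
The ranking step can be illustrated with a direct cosine similarity computation. This is a sketch of the general technique, not Mirra's implementation, which uses a vector index. Raw cosine similarity ranges from -1 to 1; how the service arrives at the 0.0 to 1.0 scores it reports is not stated on this page. The names `cosineSimilarity`, `rankChunks`, and the chunk shape are hypothetical.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedChunk {
  chunkId: string;
  embedding: number[];
}

interface ScoredChunk {
  chunkId: string;
  score: number;
}

// Scores every chunk against the query embedding, highest score first.
function rankChunks(query: number[], chunks: EmbeddedChunk[]): ScoredChunk[] {
  return chunks
    .map((c) => ({ chunkId: c.chunkId, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

A brute-force scan like this is linear in the number of chunks, which is why the service builds a vector index instead.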

Note: Search is scoped to a single graph ID. To search across multiple graphs, make a separate search request for each graph.
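
One way to work within the single-graph scope is to fan out one request per graph and merge the results by score. The sketch below takes the per-graph search call as a parameter because the search endpoint's path and response shape are not given on this page. The `SearchHit` fields and the function names are hypothetical.

```typescript
// Hypothetical result shape; the real search response may differ.
interface SearchHit {
  graphId: string;
  documentId: string;
  text: string;
  score: number;
}

// Caller-supplied function that performs one search request against one graph.
type SearchFn = (graphId: string, query: string) => Promise<SearchHit[]>;

// Flattens the per-graph results and keeps the top `limit` hits by score.
function mergeHits(perGraph: SearchHit[][], limit: number): SearchHit[] {
  const all: SearchHit[] = [];
  for (const hits of perGraph) {
    for (const hit of hits) all.push(hit);
  }
  all.sort((a, b) => b.score - a.score);
  return all.slice(0, limit);
}

// Issues one search per graph concurrently, then merges the results.
async function searchAcrossGraphs(
  search: SearchFn,
  graphIds: string[],
  query: string,
  limit: number
): Promise<SearchHit[]> {
  const perGraph = await Promise.all(graphIds.map((g) => search(g, query)));
  return mergeHits(perGraph, limit);
}
```

Scores from different graphs should be comparable because queries and chunks are embedded with the same model. A document shared into several graphs may show up once per graph, so you may want to deduplicate by document ID after merging.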


Document Operations

The documents service provides ten core operations for managing the document lifecycle:

  • Upload - Upload and process a new document
  • Get - Retrieve full document metadata including all chunks
  • Get Status - Check processing status without retrieving chunks
  • Get Chunks - Retrieve all chunks for a document
  • Delete - Delete a document and all associated chunks
  • Share - Share a document to another graph
  • Unshare - Remove document access from a specific graph
  • List Graphs - List all graphs a document is shared in
  • Search - Perform semantic search across documents
  • List - List all documents in a graph

Next Steps

  • Endpoints - Complete API reference for all document operations
  • Examples - Practical code examples for document management
  • Technical Notes - Processing pipeline, embeddings, and troubleshooting

See Also

  • Scripts - Build serverless functions that search documents
  • Resources - API integrations for external document sources
  • Templates - Create document management interfaces
