Documents Overview
The documents service enables you to upload, process, and semantically search documents within Mirra's knowledge graph. Documents are automatically chunked, embedded, and made searchable across your personal graph or shared graphs (groups and user contacts).
What are Documents?
The documents service provides end-to-end document processing, from upload to semantic search. When you upload a document, Mirra performs several automated operations:
- Text Extraction - Extracts text content from the uploaded file based on its MIME type
- Chunking - Splits the document into semantic chunks (typically 500-1000 tokens each)
- Embedding - Generates vector embeddings for each chunk using OpenAI's text-embedding-3-small model
- Storage - Stores document metadata and chunks in Neo4j with relationships to the target graph
- Indexing - Creates a vector index for fast semantic similarity search
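The chunking step above can be illustrated with a small sketch. It is not Mirra's actual chunker: it assumes whitespace-separated words are a rough stand-in for tokens, and it packs paragraphs greedily instead of finding semantic boundaries. The function name `chunk_text` and the `max_tokens` parameter are hypothetical.

```python
def chunk_text(text: str, max_tokens: int = 1000) -> list[str]:
    """Pack paragraphs into chunks of at most max_tokens words.

    Assumption: one whitespace-separated word ~ one token. A real
    pipeline would count tokens with the embedding model's tokenizer.
    """
    chunks: list[str] = []
    current: list[str] = []
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue
        # Close the current chunk if this paragraph would overflow it.
        if current and len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current = []
        # A paragraph larger than the budget is cut into fixed-size pieces.
        while len(words) > max_tokens:
            chunks.append(" ".join(words[:max_tokens]))
            words = words[max_tokens:]
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Keeping paragraphs together where possible is one simple way to approximate chunks that follow the document's meaning; each resulting chunk would then be embedded and stored separately.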
Each document is associated with a graph ID, which determines who can access it. Sharing can add the same document to additional graphs, enabling collaboration while maintaining access control.
Syntax
Uploading a Document
Document Structure
Supported File Types
Mirra supports automatic text extraction and processing for common document formats.
PDF
PDF processing preserves page numbers and metadata for accurate chunk attribution.
Microsoft Word
Plain Text
Markdown
Graph-Based Access Control
All document operations respect graph-based permissions. Users can only access documents in graphs they belong to:
- Personal Graph (user:{userId}) - Private documents accessible only by the owner
- Group Graphs (group:{groupId}) - Documents shared with group members
- User Contact Graphs (user_contact:{contactId}) - Documents shared in direct conversations
When uploading a document without specifying a graphId, it defaults to your personal graph. To share a document with others, use the share operation to add it to a group or contact graph.
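The graph ID formats and the personal-graph default described above could be handled on the client side roughly as follows. This is a sketch only: `parse_graph_id` and `resolve_graph_id` are hypothetical helper names, not part of Mirra's API, and the only facts taken from this page are the three prefixes and the default.

```python
from typing import Optional

# Prefixes taken from the graph types listed above.
GRAPH_PREFIXES = ("user", "group", "user_contact")


def parse_graph_id(graph_id: str) -> tuple[str, str]:
    """Split 'kind:identifier' and reject unknown kinds or empty identifiers."""
    kind, sep, ident = graph_id.partition(":")
    if not sep or kind not in GRAPH_PREFIXES or not ident:
        raise ValueError(f"unrecognized graph ID: {graph_id!r}")
    return kind, ident


def resolve_graph_id(user_id: str, graph_id: Optional[str] = None) -> str:
    """Return the target graph, falling back to the caller's personal graph."""
    if graph_id is None:
        return f"user:{user_id}"
    parse_graph_id(graph_id)  # validate the format before using it
    return graph_id
```

Validating the ID before sending a request is a design choice in this sketch; the service itself remains the authority on which graphs a user may access.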
Search Capabilities
The documents service provides semantic search across document chunks using vector similarity. Unlike keyword search, semantic search understands the meaning and context of your query, returning relevant results even when exact keywords don't match.
Search queries are embedded using the same model as document chunks, then compared against the vector index using cosine similarity. Results are ranked by similarity score (0.0 to 1.0), with higher scores indicating stronger semantic relevance.
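To make the ranking step concrete, here is a minimal sketch of cosine-similarity ranking over a handful of in-memory vectors. It assumes the query and chunks are already embedded; `rank_chunks` and the toy two-dimensional vectors are illustrative, and a production system would query the vector index instead of scanning every chunk. Raw cosine similarity can be negative for arbitrary vectors; how scores are mapped into the 0.0 to 1.0 range is not specified on this page.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_chunks(
    query_vec: list[float],
    chunk_vecs: dict[str, list[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return up to top_k (chunk_id, score) pairs, highest score first."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, vec))
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy example: chunk "a" points the same way as the query, "b" is orthogonal.
ranked = rank_chunks([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]})
```

Here `ranked` lists the chunks in the order a, c, b, because "a" is parallel to the query, "c" is at 45 degrees, and "b" is orthogonal.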
Note: Search is scoped to a single graph ID. To search across multiple graphs, you must make separate search requests for each graph.
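Searching several graphs therefore means issuing one request per graph and merging the results client-side. The sketch below shows one way to do that. The `search_fn` callable is a hypothetical stand-in for whatever performs the Search operation, and the result shape (dictionaries with a `score` key) is an assumption, not the documented response format.

```python
from typing import Callable

# Hypothetical signature: (graph_id, query) -> list of hits with a "score" key.
SearchFn = Callable[[str, str], list[dict]]


def search_across_graphs(
    search_fn: SearchFn,
    query: str,
    graph_ids: list[str],
    limit: int = 10,
) -> list[dict]:
    """Run one search per graph and merge the hits by descending score."""
    merged: list[dict] = []
    for graph_id in graph_ids:
        for hit in search_fn(graph_id, query):
            # Record which graph each hit came from.
            merged.append({**hit, "graphId": graph_id})
    merged.sort(key=lambda hit: hit["score"], reverse=True)
    return merged[:limit]


# Usage with an in-memory fake in place of real search requests.
FAKE_RESULTS = {
    "user:u1": [{"chunkId": "c1", "score": 0.62}],
    "group:g7": [{"chunkId": "c2", "score": 0.91}, {"chunkId": "c3", "score": 0.40}],
}


def fake_search(graph_id: str, query: str) -> list[dict]:
    return FAKE_RESULTS.get(graph_id, [])


hits = search_across_graphs(fake_search, "quarterly report", ["user:u1", "group:g7"], limit=2)
```

Because a document can be shared into several graphs, the same chunk may appear more than once in the merged list; deduplicating on a chunk identifier is a reasonable follow-up step.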
Document Operations
The documents service provides ten core operations for managing the document lifecycle:
- Upload - Upload and process a new document
- Get - Retrieve full document metadata including all chunks
- Get Status - Check processing status without retrieving chunks
- Get Chunks - Retrieve all chunks for a document
- Delete - Delete a document and all associated chunks
- Share - Share a document to another graph
- Unshare - Remove document access from a specific graph
- List Graphs - List all graphs a document is shared in
- Search - Perform semantic search across documents
- List - List all documents in a graph
Next Steps
- Endpoints - Complete API reference for all document operations
- Examples - Practical code examples for document management
- Technical Notes - Processing pipeline, embeddings, and troubleshooting