Memory Technical Notes
Graph structure, embeddings, search algorithms, and performance optimization
This page covers advanced topics for the memory system including the knowledge graph structure, embedding system, search algorithms, and best practices for optimal performance.
Knowledge Graph Structure
The memory system is built on Neo4j, a native graph database. All entities are stored as nodes with typed relationships connecting related information.
Node Types
Each memory type corresponds to a specific node label in Neo4j:
- :Topic - General knowledge, concepts, subjects
- :Task - Actionable items with deadlines
- :Note - General notes, observations, reference material
- :Idea - Concepts, plans, creative thoughts
- :ShoppingItem - Shopping list items
- :Contact - People and their contact information
- :Document - Document metadata
- :DocumentChunk - Document content chunks with embeddings
Node Properties
All nodes inherit common base properties:
- id - Unique identifier (e.g., task_abc123)
- type - Entity type for filtering
- content - Main text content
- createdAt - Creation timestamp (Unix milliseconds)
- updatedAt - Last modification timestamp
- graphId - Access control graph identifier
- searchKeywords - Extracted keywords for keyword search
- embedding - 1536-dimensional vector (when available)
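As a rough illustration, the base properties listed above could be modeled in TypeScript as shown here. This is only a sketch: the `MemoryNode` interface and the `hasUsableEmbedding` helper are hypothetical names, and representing `searchKeywords` as a string array and `updatedAt` as Unix milliseconds are assumptions this page does not confirm.

```typescript
// Hypothetical shape for the shared base properties; not the adapter's actual type.
interface MemoryNode {
  id: string;               // unique identifier, e.g. "task_abc123"
  type: string;             // entity type used for filtering
  content: string;          // main text content
  createdAt: number;        // creation timestamp (Unix milliseconds)
  updatedAt: number;        // last modification timestamp (assumed Unix milliseconds)
  graphId: string;          // access control graph identifier
  searchKeywords: string[]; // extracted keywords (assumed to be an array)
  embedding?: number[];     // present only once the embedding has been generated
}

const EMBEDDING_DIMENSIONS = 1536;

// Returns true only when the node carries a full-length embedding vector.
function hasUsableEmbedding(node: MemoryNode): boolean {
  return Array.isArray(node.embedding) && node.embedding.length === EMBEDDING_DIMENSIONS;
}
```

A check like this can help decide whether a node is eligible for semantic search yet or should be found through keyword search instead.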
Relationships
Entities are connected through typed relationships:
- :CREATED - User → Entity (who created the entity)
- :MENTIONS - Entity → Entity (references/links)
- :RELATES_TO - Entity → Entity (semantic similarity)
- :IN_GRAPH - Entity → Graph (access control)
- :CHUNK_OF - DocumentChunk → Document (document structure)
Search System
The memory adapter implements a hybrid search system combining semantic search, keyword search, and filtered queries.
Semantic Search (Vector-Based)
Semantic search uses vector embeddings to find conceptually similar memories.
How it works:
- Query text is embedded using text-embedding-3-small (1536 dimensions)
- Neo4j computes cosine similarity between query embedding and entity embeddings
- Results are ranked by similarity score (0.0 to 1.0)
Scoring:
- 0.9-1.0 - Highly relevant, nearly identical semantic meaning
- 0.8-0.9 - Very relevant, similar concepts
- 0.7-0.8 - Relevant, related topics
- 0.6-0.7 - Somewhat relevant, tangentially related
- < 0.6 - Likely not relevant
Best for:
- Conceptual queries ("tasks related to marketing")
- Natural language questions ("what did I learn about React?")
- Cross-type searches (tasks, notes, ideas together)
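To make the similarity step and the scoring bands concrete, here is a minimal client-side sketch. The function names `cosineSimilarity` and `describeScore` are hypothetical, and in the real system the similarity is computed inside Neo4j, not in application code. Raw cosine similarity ranges from -1 to 1; how the adapter maps it onto the 0.0 to 1.0 range is not specified on this page. Because the documented bands share their boundaries, this sketch assigns a boundary value such as 0.9 to the higher band.

```typescript
// Plain cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Maps a similarity score onto the relevance bands from the Scoring list.
function describeScore(score: number): string {
  if (score >= 0.9) return "highly relevant";
  if (score >= 0.8) return "very relevant";
  if (score >= 0.7) return "relevant";
  if (score >= 0.6) return "somewhat relevant";
  return "likely not relevant";
}
```

A caller might use `describeScore` to drop results below 0.6 before showing them to a user.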
Keyword Search (Text-Based)
Keyword search uses extracted keywords for fast text matching.
How it works:
- Query text is tokenized into keywords
- Neo4j matches keywords against the searchKeywords property
- Results are scored by keyword overlap percentage
Best for:
- Exact phrase matching
- Short queries (1-3 words)
- Known terminology or names
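The overlap scoring described above can be sketched as follows. This is an illustration only: the `tokenize` and `keywordOverlap` names are hypothetical, the ASCII-only tokenizer is a simplification, and the adapter's actual tokenization and scoring rules may differ.

```typescript
// Lowercases the text, splits on non-alphanumeric characters, and removes duplicates.
function tokenize(text: string): string[] {
  const tokens = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
  return tokens.filter((token, index) => tokens.indexOf(token) === index);
}

// Fraction of distinct query tokens that appear in the entity's keywords (0 to 1).
function keywordOverlap(query: string, searchKeywords: string[]): number {
  const queryTokens = tokenize(query);
  if (queryTokens.length === 0) return 0;
  const keywords = searchKeywords.map((keyword) => keyword.toLowerCase());
  const matched = queryTokens.filter((token) => keywords.indexOf(token) !== -1).length;
  return matched / queryTokens.length;
}
```

With this definition, a two-word query in which only one word matches scores 0.5, which is one reason short queries with known terms tend to suit keyword search well.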
Query (Filtered Search)
Query enables precise filtering by type, status, metadata fields, and pagination.
How it works:
- Filters are converted to Cypher WHERE clauses
- Neo4j executes graph traversal with constraints
- Results are returned with pagination metadata
Best for:
- Listing entities by type and status
- Finding entities with specific metadata
- Building UI list views with pagination
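As an illustration of how filters might become a Cypher WHERE clause, the sketch below builds a parameterized fragment from a flat filter object. The `buildWhereClause` function, the `CypherFragment` type, and the node alias `n` are all hypothetical; the adapter's real translation logic is not shown on this page. Values are passed as query parameters instead of being concatenated into the query text, and field names are validated because property names cannot be parameterized in Cypher.

```typescript
type FilterValue = string | number | boolean;

interface CypherFragment {
  where: string;                       // e.g. "WHERE n.type = $type"
  params: Record<string, FilterValue>; // values bound to the $placeholders
}

// Converts equality filters into a parameterized WHERE clause.
function buildWhereClause(
  filters: Record<string, FilterValue>,
  alias: string = "n"
): CypherFragment {
  const conditions: string[] = [];
  const params: Record<string, FilterValue> = {};
  for (const key of Object.keys(filters)) {
    // Reject anything that is not a plain identifier to avoid query injection.
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(key)) {
      throw new Error(`Invalid filter field: ${key}`);
    }
    conditions.push(`${alias}.${key} = $${key}`);
    params[key] = filters[key];
  }
  const where = conditions.length > 0 ? `WHERE ${conditions.join(" AND ")}` : "";
  return { where, params };
}
```

For example, `buildWhereClause({ type: "task", status: "open" })` yields the clause `WHERE n.type = $type AND n.status = $status` together with the matching parameter map; the `status` value here is only an illustrative filter.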
Embedding System
The memory adapter uses OpenAI's text-embedding-3-small model for generating semantic embeddings.
Embedding Properties
- Model: text-embedding-3-small
- Dimensions: 1536
- Max tokens: 8191 tokens per input
- Cost: Optimized for cost/performance balance
When Embeddings Are Generated
Embeddings are created asynchronously when:
- Creating a new memory entity
- Updating memory content
- Uploading documents (for each chunk)
Note: Search may return incomplete results immediately after creation. Allow 1-2 seconds for embedding generation to complete.
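Because embeddings are generated asynchronously, a caller that searches right after creating a memory may want to retry briefly. The helper below is one possible sketch: `retryUntilNonEmpty` is a hypothetical name, the `search` callback stands in for whatever search call the application uses, and the default of three attempts one second apart is an assumption based on the 1-2 second guidance above.

```typescript
// Re-runs a search until it returns results or the attempts are used up.
async function retryUntilNonEmpty<T>(
  search: () => Promise<T[]>,
  attempts: number = 3,
  delayMs: number = 1000
): Promise<T[]> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    const results = await search();
    // Return on the first non-empty result, or whatever the last attempt produced.
    if (results.length > 0 || attempt === attempts) return results;
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  // Only reached when attempts is less than 1.
  return [];
}
```

An empty result after all attempts still needs to be handled by the caller, since it may simply mean that nothing matches.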
Embedding Storage
Embeddings are stored directly on entity nodes as a 1536-dimensional float array. Neo4j's vector index enables efficient cosine similarity search across millions of entities.
Graph-Based Access Control
All memory operations respect graph-based permissions using graphId filtering.
Graph ID Format
- Personal Graph: user:{userId} (e.g., user:507f1f77bcf86cd799439011)
- Group Graph: group:{groupId} (e.g., group:abc123)
- Contact Graph: user_contact:{contactId} (e.g., user_contact:xyz789)
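The three formats above share a `prefix:id` shape, so a small parser can validate a graph ID before it is used. This is a sketch under that assumption; `parseGraphId`, `GraphKind`, and `ParsedGraphId` are hypothetical names and not part of the documented API.

```typescript
type GraphKind = "user" | "group" | "user_contact";

interface ParsedGraphId {
  kind: GraphKind;
  id: string;
}

const GRAPH_KINDS: GraphKind[] = ["user", "group", "user_contact"];

// Splits a graph ID at the first colon and checks the prefix against the known kinds.
// Returns null for anything that does not match one of the documented formats.
function parseGraphId(graphId: string): ParsedGraphId | null {
  const separator = graphId.indexOf(":");
  if (separator <= 0) return null;
  const prefix = graphId.slice(0, separator);
  const id = graphId.slice(separator + 1);
  if (id.length === 0) return null;
  for (const kind of GRAPH_KINDS) {
    if (kind === prefix) return { kind, id };
  }
  return null;
}
```

Rejecting unknown prefixes early can make a "wrong graph ID" mistake easier to spot than an empty search result.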
Access Rules
- Users can only access memories in graphs they belong to
- Personal graphs are private (single user)
- Group graphs are shared (multiple users)
- Contact graphs enable 1:1 memory sharing
Service Context
The memory adapter automatically determines the correct graphId from the service context:
- Standard mode: Uses authenticated user's personal graph
- Delegated mode: Uses visitor's temporary graph (limited access)
- Service mode: Uses specified graph ID (full access)
Memory Types and Metadata
Each memory type has specific metadata fields optimized for its use case.
Task Metadata
Note Metadata
Idea Metadata
Shopping Item Metadata
Performance Optimization
Query Limits
All list-based operations enforce limits to prevent excessive token usage:
- Default limit: 50 items
- Maximum limit: 100 items
- Pagination: Use offset and limit for large result sets
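The limits above can be mirrored on the client so that requests never ask for more than the service allows. The sketch below is illustrative; `normalizePage` and `nextPage` are hypothetical helpers, and only the default of 50, the maximum of 100, and the `offset`/`limit` parameter names come from this page.

```typescript
const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 100;

interface PageRequest {
  offset: number;
  limit: number;
}

// Applies the default limit, caps at the maximum, and rejects negative or invalid offsets.
function normalizePage(offset?: number, limit?: number): PageRequest {
  const safeOffset = offset !== undefined && offset > 0 ? Math.floor(offset) : 0;
  const requested = limit !== undefined && limit >= 1 ? Math.floor(limit) : DEFAULT_LIMIT;
  return { offset: safeOffset, limit: Math.min(requested, MAX_LIMIT) };
}

// Returns the next page to request, or null when the last page was not full.
function nextPage(current: PageRequest, returnedCount: number): PageRequest | null {
  if (returnedCount < current.limit) return null;
  return { offset: current.offset + current.limit, limit: current.limit };
}
```

A loop that calls `nextPage` after each response stops naturally once a page comes back with fewer items than requested.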
Indexing
Neo4j maintains multiple indexes for fast lookups:
- ID index: Unique constraint on entity IDs
- Vector index: For semantic similarity search
- Graph index: For access control filtering
- Keyword index: For text matching
Batch Operations
For creating multiple memories, use Promise.all for concurrent operations:
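A minimal sketch of that pattern follows. The `createMemory` function here is a hypothetical in-memory stand-in, since the real create call and its signature are not shown on this page; only the use of `Promise.all` comes from the text above.

```typescript
interface NewMemory {
  type: string;
  content: string;
}

interface CreatedMemory extends NewMemory {
  id: string;
}

type CreateFn = (input: NewMemory) => Promise<CreatedMemory>;

// Hypothetical stand-in for the real create call; it only fabricates an ID locally.
let nextLocalId = 1;
const createMemory: CreateFn = async (input) => {
  return { ...input, id: `${input.type}_${nextLocalId++}` };
};

// Starts every create call at once and waits for all of them to finish.
// Results come back in the same order as the inputs.
async function createMany(
  inputs: NewMemory[],
  create: CreateFn = createMemory
): Promise<CreatedMemory[]> {
  return Promise.all(inputs.map((input) => create(input)));
}
```

Note that `Promise.all` rejects as soon as any single call fails, so the caller does not learn which of the other creations succeeded. When partial success matters, `Promise.allSettled` reports an outcome for each input.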
Caching
Frequently accessed memories should be cached at the application level:
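One simple way to do this is a small in-process cache with a time-to-live. The `TtlCache` class below is a hypothetical sketch and not part of the memory adapter; the injectable clock exists only to make the expiry behavior easy to test.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

// Minimal time-to-live cache keyed by string (for example, an entity ID).
class TtlCache<V> {
  private entries = new Map<string, CacheEntry<V>>();

  constructor(private ttlMs: number, private now: () => number = () => Date.now()) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired entries are removed lazily on read.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Call after updating or deleting an entity so stale data is not served.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

A short TTL limits how long a cached memory can lag behind an update made elsewhere, and calling `invalidate` after local writes keeps the cache consistent with the caller's own changes.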
Troubleshooting
Search Returns No Results
Possible causes:
- Embeddings not ready - Wait 1-2 seconds after creation
- Wrong graph ID - Verify you're searching the correct graph
- Query too specific - Broaden query terms
- Threshold too high - Accept lower similarity scores
- No matching entities - Verify entities exist in the graph
Solutions:
- Check entity exists: await mirra.memory.findOne({ filters: { id } })
- Verify graph membership
- Use broader search terms
- Try keyword search instead of semantic search
Updates Not Reflected
Possible causes:
- Wrong entity ID - Verify ID is correct
- Concurrent updates - Check for race conditions
- Permission denied - Verify graph access
- Invalid metadata - Check metadata format
Solutions:
- Use findOne to verify entity exists before updating
- Implement optimistic locking for concurrent updates
- Validate metadata schema before calling update
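Optimistic locking, mentioned in the solutions above, can be sketched as a compare-then-write step. This example is an assumption-heavy illustration: it uses an in-memory `Map` instead of the real adapter, treats `updatedAt` as the version token, and the `updateIfUnchanged` name and result shape are hypothetical. Whether the service exposes a conditional update is not stated on this page.

```typescript
interface StoredEntity {
  id: string;
  content: string;
  updatedAt: number;
}

interface UpdateResult {
  status: "updated" | "conflict" | "not_found";
  entity?: StoredEntity;
}

// Writes only if the entity has not changed since the caller last read it.
function updateIfUnchanged(
  store: Map<string, StoredEntity>,
  id: string,
  expectedUpdatedAt: number,
  content: string,
  now: number
): UpdateResult {
  const current = store.get(id);
  if (current === undefined) return { status: "not_found" };
  if (current.updatedAt !== expectedUpdatedAt) {
    // Someone else wrote in between; hand back the latest copy so the caller can retry.
    return { status: "conflict", entity: current };
  }
  const next: StoredEntity = { id: current.id, content, updatedAt: now };
  store.set(id, next);
  return { status: "updated", entity: next };
}
```

On a conflict, the caller would typically re-read the entity, reapply its change to the fresh copy, and try again.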
Performance Issues
Symptoms:
- Slow search responses
- Timeout errors
- High memory usage
Solutions:
- Reduce limit - Fetch fewer results per query
- Add filters - Narrow search scope with type/status filters
- Use pagination - Page through large result sets with offset and limit instead of fetching everything at once
- Cache results - Store frequently accessed memories locally
- Optimize queries - Use specific types instead of searching all entities
Best Practices
Memory Creation
- Use specific types - Choose the most appropriate memory type
- Add metadata - Include priority, tags, and categorization
- Write clear content - Content is used for search relevance
- Set deadlines - Use ISO 8601 format for dates
- Tag appropriately - Tags improve discoverability
Search Strategy
- Start broad - Use semantic search for exploratory queries
- Add filters - Narrow results with type and status filters
- Check scores - Review similarity scores for relevance
- Iterate - Refine queries based on initial results
- Combine methods - Use semantic + keyword + filters together
Data Hygiene
- Delete completed tasks - Remove old tasks after completion
- Archive old notes - Move inactive notes to long-term storage
- Update status - Keep task/item status current
- Deduplicate - Remove duplicate memories periodically
- Prune tags - Consolidate similar tags
Graph Organization
- Use personal graph - Default for private memories
- Share selectively - Only share to relevant group graphs
- Document permissions - Track who has access to what
- Regular audits - Review graph membership periodically
- Clean up - Remove memories from graphs when collaboration ends
Limitations
Vector Search Limits
- Maximum 100 results per search (hard limit)
- Embedding generation may take 1-2 seconds
- Vector index requires minimum 100 entities for optimal performance
Metadata Constraints
- Metadata must be valid JSON
- Metadata size limited to 32KB per entity
- Custom metadata fields must use valid Neo4j property types
Graph Traversal
- Deep relationship traversals (>5 hops) may be slow
- Complex queries may time out (30-second limit)
- Concurrent updates may cause version conflicts
See Also
- Overview - Memory system concepts and types
- Operations - Complete API reference
- Examples - Practical code examples
- Documents - Document embedding system