Memory Technical Notes

Graph structure, embeddings, search algorithms, and performance optimization

This page covers advanced topics for the memory system: the knowledge graph structure, the embedding system, the search algorithms, and best practices for performance.

Knowledge Graph Structure

The memory system is built on Neo4j, a native graph database. All entities are stored as nodes with typed relationships connecting related information.

Node Types

Each memory type corresponds to a specific node label in Neo4j:

  • :Topic - General knowledge, concepts, subjects
  • :Task - Actionable items with deadlines
  • :Note - General notes, observations, reference material
  • :Idea - Concepts, plans, creative thoughts
  • :ShoppingItem - Shopping list items
  • :Contact - People and their contact information
  • :Document - Document metadata
  • :DocumentChunk - Document content chunks with embeddings

Node Properties

All nodes inherit common base properties:

  • id - Unique identifier (e.g., task_abc123)
  • type - Entity type for filtering
  • content - Main text content
  • createdAt - Creation timestamp (Unix milliseconds)
  • updatedAt - Last modification timestamp
  • graphId - Access control graph identifier
  • searchKeywords - Extracted keywords for keyword search
  • embedding - 1536-dimensional vector (when available)

Relationships

Entities are connected through typed relationships:

  • :CREATED - User → Entity (who created the entity)
  • :MENTIONS - Entity → Entity (references/links)
  • :RELATES_TO - Entity → Entity (semantic similarity)
  • :IN_GRAPH - Entity → Graph (access control)
  • :CHUNK_OF - DocumentChunk → Document (document structure)

Search System

The memory adapter implements a hybrid search system combining semantic search, keyword search, and filtered queries.

Semantic Search (Vector-Based)

Semantic search uses vector embeddings to find conceptually similar memories.

How it works:

  1. Query text is embedded using text-embedding-3-small (1536 dimensions)
  2. Neo4j computes cosine similarity between query embedding and entity embeddings
  3. Results are ranked by similarity score (0.0 to 1.0)
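To make step 2 concrete, here is a minimal sketch of cosine similarity in plain JavaScript. In the memory system this computation happens inside Neo4j; the `cosineSimilarity` helper below is illustrative only and is not part of the Mirra SDK. Raw cosine similarity ranges from -1 to 1, and how the service maps it onto the 0.0 to 1.0 range is not specified on this page.

```javascript
// Illustrative helper (not part of the Mirra SDK): cosine similarity of two
// equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    // Similarity is undefined for a zero vector; treat it as no match.
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Vectors that point in the same direction score 1 regardless of their length, which is why cosine similarity suits embeddings whose magnitude carries no meaning.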

Scoring:

  • 0.9-1.0 - Highly relevant, nearly identical semantic meaning
  • 0.8-0.9 - Very relevant, similar concepts
  • 0.7-0.8 - Relevant, related topics
  • 0.6-0.7 - Somewhat relevant, tangentially related
  • < 0.6 - Likely not relevant
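The score bands above can be applied client-side, for example to label results or drop weak matches. This is a sketch: the function names are hypothetical, and it assumes each search result exposes a numeric `score` field, which this page does not confirm.

```javascript
// Map a similarity score to the relevance bands listed above.
function relevanceBand(score) {
  if (score >= 0.9) return "highly relevant";
  if (score >= 0.8) return "very relevant";
  if (score >= 0.7) return "relevant";
  if (score >= 0.6) return "somewhat relevant";
  return "likely not relevant";
}

// Keep only results at or above a minimum score (0.6 by default).
// Assumption: each result object has a numeric `score` property.
function filterByScore(results, minScore = 0.6) {
  return results.filter((result) => result.score >= minScore);
}
```

Lowering `minScore` trades precision for recall, which can help when a search returns too few results.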

Best for:

  • Conceptual queries ("tasks related to marketing")
  • Natural language questions ("what did I learn about React?")
  • Cross-type searches (tasks, notes, ideas together)

Keyword Search (Text-Based)

Keyword search uses extracted keywords for fast text matching.

How it works:

  1. Query text is tokenized into keywords
  2. Neo4j matches keywords against the searchKeywords property
  3. Results are scored by keyword overlap percentage
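The steps above can be sketched as follows. The exact tokenizer and scoring formula are not documented here, so this example makes two assumptions: keywords are lowercase alphanumeric tokens, and the score is the fraction of distinct query keywords found in `searchKeywords`. Both function names are hypothetical.

```javascript
// Assumption: a keyword is a run of lowercase letters or digits.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Fraction (0 to 1) of distinct query keywords present in searchKeywords.
function keywordOverlap(query, searchKeywords) {
  const queryTokens = [...new Set(tokenize(query))];
  if (queryTokens.length === 0) {
    return 0;
  }
  const stored = new Set(searchKeywords.map((keyword) => keyword.toLowerCase()));
  const matched = queryTokens.filter((token) => stored.has(token)).length;
  return matched / queryTokens.length;
}
```

Under these assumptions, a two-word query where one word matches scores 0.5, which illustrates why short, exact queries suit keyword search.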

Best for:

  • Exact phrase matching
  • Short queries (1-3 words)
  • Known terminology or names

Query (Filter-Based)

Query enables precise filtering by type, status, and metadata fields, with pagination.

How it works:

  1. Filters are converted to Cypher WHERE clauses
  2. Neo4j executes graph traversal with constraints
  3. Results are returned with pagination metadata
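To illustrate step 1, the sketch below turns a filter object into a parameterized Cypher WHERE clause. The adapter's real translation is not documented here; `buildWhereClause` is a hypothetical helper that assumes every filter key maps to a flat node property and that only equality comparisons are needed.

```javascript
// Hypothetical helper: build a parameterized WHERE clause from equality filters.
function buildWhereClause(filters, alias = "n") {
  const conditions = [];
  const params = {};
  for (const [field, value] of Object.entries(filters)) {
    // Property names cannot be passed as parameters, so validate them strictly.
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(field)) {
      throw new Error(`Invalid filter field: ${field}`);
    }
    conditions.push(`${alias}.${field} = $${field}`);
    params[field] = value;
  }
  const clause = conditions.length > 0 ? `WHERE ${conditions.join(" AND ")}` : "";
  return { clause, params };
}
```

Passing values as query parameters rather than concatenating them into the query string keeps user input out of the Cypher text.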

Best for:

  • Listing entities by type and status
  • Finding entities with specific metadata
  • Building UI list views with pagination

Embedding System

The memory adapter uses OpenAI's text-embedding-3-small model for generating semantic embeddings.

Embedding Properties

  • Model: text-embedding-3-small
  • Dimensions: 1536
  • Max tokens: 8191 tokens per input
  • Cost: Chosen to balance cost against performance

When Embeddings Are Generated

Embeddings are created asynchronously when:

  1. Creating a new memory entity
  2. Updating memory content
  3. Uploading documents (for each chunk)

Note: Because embeddings are generated asynchronously, semantic search may miss an entity immediately after it is created or updated. Allow 1-2 seconds for embedding generation to complete.
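One way to handle this delay is to retry a search a few times before concluding that nothing matches. The helper below is a sketch, not an SDK feature: `searchWithRetry` is a hypothetical name, and it assumes the function you pass in returns a promise that resolves to an array of results.

```javascript
// Hypothetical helper: re-run an async search until it returns results
// or the attempts run out.
async function searchWithRetry(searchFn, { attempts = 3, delayMs = 1000 } = {}) {
  let results = [];
  for (let attempt = 1; attempt <= attempts; attempt++) {
    results = await searchFn();
    if (results.length > 0) {
      return results;
    }
    if (attempt < attempts) {
      // Give embedding generation time to finish before trying again.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return results;
}
```

Pass in a function that wraps whichever search call you use. The default delay of one second mirrors the 1-2 second window mentioned above.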

Embedding Storage

Embeddings are stored directly on entity nodes as a 1536-dimensional float array. Neo4j's vector index enables efficient cosine similarity search across millions of entities.


Graph-Based Access Control

All memory operations respect graph-based permissions using graphId filtering.

Graph ID Format

  • Personal Graph: user:{userId} (e.g., user:507f1f77bcf86cd799439011)
  • Group Graph: group:{groupId} (e.g., group:abc123)
  • Contact Graph: user_contact:{contactId} (e.g., user_contact:xyz789)
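The three formats above can be told apart by their prefix. The following sketch parses a graph ID into its kind and identifier; `parseGraphId` and the kind labels are hypothetical and not part of the SDK.

```javascript
// Prefixes from the formats listed above, mapped to illustrative kind labels.
const GRAPH_KINDS = new Map([
  ["user", "personal"],
  ["group", "group"],
  ["user_contact", "contact"],
]);

// Hypothetical helper: split "prefix:id" and validate the prefix.
function parseGraphId(graphId) {
  const separator = graphId.indexOf(":");
  const prefix = separator === -1 ? "" : graphId.slice(0, separator);
  const id = separator === -1 ? "" : graphId.slice(separator + 1);
  if (!GRAPH_KINDS.has(prefix) || id === "") {
    throw new Error(`Unrecognized graph ID: ${graphId}`);
  }
  return { kind: GRAPH_KINDS.get(prefix), id };
}
```

Splitting on the first colon only means an identifier that itself contains a colon is preserved intact.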

Access Rules

  1. Users can only access memories in graphs they belong to
  2. Personal graphs are private (single user)
  3. Group graphs are shared (multiple users)
  4. Contact graphs enable 1:1 memory sharing

Service Context

The memory adapter automatically determines the correct graphId from the service context:

  • Standard mode: Uses authenticated user's personal graph
  • Delegated mode: Uses visitor's temporary graph (limited access)
  • Service mode: Uses specified graph ID (full access)

Memory Types and Metadata

Each memory type has specific metadata fields optimized for its use case.

Task Metadata

{
  priority: "low" | "medium" | "high",
  status: "open" | "in_progress" | "completed" | "cancelled",
  deadline: string,  // ISO 8601 timestamp
  effort_level: "low" | "medium" | "high",
  assigned_to_user_id: string,
  completion_notes: string,
  cancellation_notes: string
}
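Task metadata can be checked client-side before it is sent. The sketch below validates the enumerated fields and the deadline; `validateTaskMetadata` is a hypothetical helper, and it assumes every field is optional, which this page does not state.

```javascript
// Allowed values for the enumerated Task fields shown above.
const TASK_ENUMS = {
  priority: ["low", "medium", "high"],
  status: ["open", "in_progress", "completed", "cancelled"],
  effort_level: ["low", "medium", "high"],
};

// Hypothetical helper: returns a list of problems (empty when valid).
function validateTaskMetadata(metadata) {
  const errors = [];
  for (const [field, allowed] of Object.entries(TASK_ENUMS)) {
    const value = metadata[field];
    if (value !== undefined && !allowed.includes(value)) {
      errors.push(`${field} must be one of: ${allowed.join(", ")}`);
    }
  }
  const { deadline } = metadata;
  // Date.parse is lenient, so this only catches clearly unparseable values.
  if (deadline !== undefined && (typeof deadline !== "string" || Number.isNaN(Date.parse(deadline)))) {
    errors.push("deadline must be an ISO 8601 timestamp string");
  }
  return errors;
}
```

Returning a list of errors instead of throwing lets a caller report every problem at once.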

Note Metadata

{
  category: "reference" | "observation" | "meeting" | "learning" | "personal",
  importance: "high" | "medium" | "low",
  tags: string[],
  source: string  // Where the note came from
}

Idea Metadata

{
  idea_type: "business" | "creative" | "technical" | "personal" | "improvement",
  feasibility: "high" | "medium" | "low" | "research_needed",
  development_effort: "weekend_project" | "month_project" | "major_undertaking",
  original_description: string
}

Shopping Item Metadata

{
  priority: "low" | "medium" | "high",
  status: "open" | "purchased" | "cancelled",
  store_preference: string,
  category: string,
  original_purpose: string
}

Performance Optimization

Query Limits

All list-based operations enforce limits to prevent excessive token usage:

  • Default limit: 50 items
  • Maximum limit: 100 items
  • Pagination: Use offset and limit for large result sets
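The limits above translate into a simple paging plan. This sketch clamps a requested limit to the documented bounds and yields the `offset` and `limit` for each page; the helper names are hypothetical, and whether the service itself clamps or rejects an oversized limit is not stated here.

```javascript
const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 100;

// Fall back to the default for missing or invalid limits; cap at the maximum.
function clampLimit(limit = DEFAULT_LIMIT) {
  if (!Number.isInteger(limit) || limit < 1) {
    return DEFAULT_LIMIT;
  }
  return Math.min(limit, MAX_LIMIT);
}

// Yield { offset, limit } for each page needed to cover `total` items.
function* pageRequests(total, limit) {
  const pageSize = clampLimit(limit);
  for (let offset = 0; offset < total; offset += pageSize) {
    yield { offset, limit: pageSize };
  }
}
```

For 250 items at the maximum page size, this yields three requests with offsets 0, 100, and 200.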

Indexing

Neo4j maintains multiple indexes for fast lookups:

  • ID index: Unique constraint on entity IDs
  • Vector index: For semantic similarity search
  • Graph index: For access control filtering
  • Keyword index: For text matching

Batch Operations

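To create several memories at once, issue the calls concurrently with Promise.all. Promise.all rejects as soon as any single call fails, so handle errors accordingly:

// Memories to create in one batch
const memories = [
  { type: "task", content: "Task 1" },
  { type: "task", content: "Task 2" },
  { type: "task", content: "Task 3" }
];

// Start every create call at once and wait for all of them to finish
const results = await Promise.all(
  memories.map(mem => mirra.memory.create(mem))
);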
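For larger batches, Promise.all has two drawbacks: it starts every request at once, and it rejects on the first failure. The sketch below works through the list in fixed-size chunks and uses Promise.allSettled to collect successes and failures separately. `chunk` and `createInBatches` are hypothetical helpers, the batch size of 10 is an arbitrary choice, and the `createFn` argument stands in for a call such as `mirra.memory.create`.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: create memories one chunk at a time and report
// which succeeded and which failed.
async function createInBatches(memories, createFn, batchSize = 10) {
  const created = [];
  const failed = [];
  for (const batch of chunk(memories, batchSize)) {
    // The async wrapper turns synchronous throws into rejections.
    const outcomes = await Promise.allSettled(batch.map(async (mem) => createFn(mem)));
    outcomes.forEach((outcome, index) => {
      if (outcome.status === "fulfilled") {
        created.push(outcome.value);
      } else {
        failed.push({ memory: batch[index], error: outcome.reason });
      }
    });
  }
  return { created, failed };
}
```

A call might look like `createInBatches(memories, (mem) => mirra.memory.create(mem))`, after which `failed` lists the inputs worth retrying.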

Caching

Frequently accessed memories should be cached at the application level:

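// Simple in-memory cache keyed by entity ID.
// Entries are never evicted, so delete or refresh them when a memory changes.
const memoryCache = new Map();

async function getMemory(id) {
  // Serve from the cache when possible
  if (memoryCache.has(id)) {
    return memoryCache.get(id);
  }

  // Cache miss: fetch the entity by ID
  const result = await mirra.memory.findOne({
    filters: { id }
  });

  // Only cache entities that were actually found
  if (result.data.entity) {
    memoryCache.set(id, result.data.entity);
  }

  return result.data.entity;
}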
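A plain Map never expires its entries, so a cached memory can go stale after an update. One common remedy is a time-to-live (TTL) cache, sketched below. `TtlCache` is a hypothetical class, not part of the SDK, and a suitable TTL depends on how often your memories change.

```javascript
// Hypothetical TTL cache: entries expire `ttlMs` milliseconds after being set.
class TtlCache {
  // `now` is injectable so the clock can be controlled in tests.
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      return undefined;
    }
    if (this.now() >= entry.expiresAt) {
      // Expired: drop the entry and report a miss.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Call this after updating or deleting a memory.
  delete(key) {
    this.entries.delete(key);
  }
}
```

Replacing `memoryCache` with `new TtlCache(60000)` would bound staleness to one minute, and calling `delete(id)` after each update keeps the cache consistent with your own writes.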

Troubleshooting

Search Returns No Results

Possible causes:

  1. Embeddings not ready - Wait 1-2 seconds after creation
  2. Wrong graph ID - Verify you're searching the correct graph
  3. Query too specific - Broaden query terms
  4. Threshold too high - Accept lower similarity scores
  5. No matching entities - Verify entities exist in the graph

Solutions:

  • Check entity exists: await mirra.memory.findOne({ filters: { id } })
  • Verify graph membership
  • Use broader search terms
  • Try keyword search instead of semantic search

Updates Not Reflected

Possible causes:

  1. Wrong entity ID - Verify ID is correct
  2. Concurrent updates - Check for race conditions
  3. Permission denied - Verify graph access
  4. Invalid metadata - Check metadata format

Solutions:

  • Use findOne to verify entity exists before updating
  • Implement optimistic locking for concurrent updates
  • Validate metadata schema before calling update

Performance Issues

Symptoms:

  • Slow search responses
  • Timeout errors
  • High memory usage

Solutions:

  1. Reduce limit - Fetch fewer results per query
  2. Add filters - Narrow search scope with type/status filters
  3. Use pagination - Page through large result sets with offset and limit
  4. Cache results - Store frequently accessed memories locally
  5. Optimize queries - Use specific types instead of searching all entities

Best Practices

Memory Creation

  1. Use specific types - Choose the most appropriate memory type
  2. Add metadata - Include priority, tags, and categorization
  3. Write clear content - Content is used for search relevance
  4. Set deadlines - Use ISO 8601 format for dates
  5. Tag appropriately - Tags improve discoverability

Search Strategy

  1. Start broad - Use semantic search for exploratory queries
  2. Add filters - Narrow results with type and status filters
  3. Check scores - Review similarity scores for relevance
  4. Iterate - Refine queries based on initial results
  5. Combine methods - Use semantic + keyword + filters together

Data Hygiene

  1. Delete completed tasks - Remove old tasks after completion
  2. Archive old notes - Move inactive notes to long-term storage
  3. Update status - Keep task/item status current
  4. Deduplicate - Remove duplicate memories periodically
  5. Prune tags - Consolidate similar tags

Graph Organization

  1. Use personal graph - Default for private memories
  2. Share selectively - Only share to relevant group graphs
  3. Document permissions - Track who has access to what
  4. Regular audits - Review graph membership periodically
  5. Clean up - Remove memories from graphs when collaboration ends

Limitations

Vector Search Limits

  • Maximum 100 results per search (hard limit)
  • Embedding generation may take 1-2 seconds
  • Vector index requires a minimum of 100 entities for optimal performance

Metadata Constraints

  • Metadata must be valid JSON
  • Metadata size limited to 32KB per entity
  • Custom metadata fields must use valid Neo4j property types
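The size constraint can be checked before a create or update call. This sketch measures the UTF-8 size of the serialized metadata; the helper names are hypothetical, and it assumes that "32KB" means 32 * 1024 bytes and that the limit applies to the JSON-serialized form.

```javascript
// Assumption: "32KB" means 32 * 1024 bytes of UTF-8 encoded JSON.
const MAX_METADATA_BYTES = 32 * 1024;

// Size in bytes of the metadata once serialized to JSON.
function metadataSizeBytes(metadata) {
  return new TextEncoder().encode(JSON.stringify(metadata)).length;
}

function isMetadataWithinLimit(metadata) {
  return metadataSizeBytes(metadata) <= MAX_METADATA_BYTES;
}
```

Measuring bytes rather than string length matters because non-ASCII characters occupy more than one byte in UTF-8.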

Graph Traversal

  • Deep relationship traversals (>5 hops) may be slow
  • Complex queries may time out (30-second limit)
  • Concurrent updates may cause version conflicts
