AI-Dad Technical Architecture

Deep Dive into the RAG Implementation & Socratic Optimization

Technical Overview

  • 49,223 Total Emails
  • 58,547 Total Content Items
  • 4,838 Q&A Pairs
  • 384 Embedding Dimensions

Core Technologies

  • Language Model: Claude Sonnet for all text generation and responses
  • Vector Database: PostgreSQL 16.10 with pgvector extension for semantic search
  • Embeddings: SentenceTransformer all-MiniLM-L6-v2 (384 dimensions) with cosine similarity
  • Vision Analysis: Claude Vision API for automated image description and metadata extraction (all 2,114 images)
  • Voice Generation: ElevenLabs voice cloning for natural speech interface
  • Programming Paradigm: Natural Language Programming with Claude Code
  • Backend: Python 3.8+ with Flask, psycopg2, and async processing
  • Search Strategy: Multi-tier semantic search with contextual learning and weighted precedence

RAG Pipeline Architecture

User Input → Preprocessing → Multi-Search → Ranking → Generation

Pipeline Steps in Detail

Step 1: Message Analysis

The system performs semantic parsing to extract meaningful terms:

  • Stop word filtering (removes 'the', 'and', etc.)
  • Meaningful word extraction (words > 2 characters)
  • Important term identification (words > 4 characters)
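The three filters above can be sketched in a few lines. This is an illustrative reconstruction, not the actual AI-Dad code; the function name and the stop-word list are assumptions.

```python
# Minimal sketch of Step 1 term extraction (illustrative, not the production code).
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "what", "did", "say", "about"}

def analyze_message(message: str) -> dict:
    """Split a message into meaningful and important terms."""
    words = [w.strip(".,?!").lower() for w in message.split()]
    # Stop word filtering
    filtered = [w for w in words if w not in STOP_WORDS]
    # Meaningful words: longer than 2 characters
    meaningful = [w for w in filtered if len(w) > 2]
    # Important terms: longer than 4 characters
    important = [w for w in filtered if len(w) > 4]
    return {"meaningful": meaningful, "important": important}
```

For the query "What did Dad say about contract negotiations?", this yields `meaningful = ['dad', 'contract', 'negotiations']` and `important = ['contract', 'negotiations']`.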

Step 2: Multi-Tier Search Strategy

Three-tier approach for comprehensive retrieval:

  • Tier 1: Full context search (top 10 meaningful words)
  • Tier 2: Two-word combinations for precision
  • Tier 3: Individual important terms
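The three tiers can be expressed as a query-expansion step over the terms produced in Step 1. A sketch, with assumed names; the real tier construction may differ:

```python
from itertools import combinations

def build_search_tiers(meaningful: list, important: list) -> dict:
    """Expand extracted terms into the three search tiers (assumed structure)."""
    top = meaningful[:10]
    return {
        "tier1": [" ".join(top)],                               # full context query
        "tier2": [" ".join(p) for p in combinations(top, 2)],   # two-word combinations
        "tier3": list(important),                               # individual important terms
    }
```

Each tier's queries are then run against the vector store, with earlier tiers weighted more heavily in ranking.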

Step 3: Evidence Extraction

Searches across multiple data sources simultaneously:

  • Email database (semantic search)
  • PDF documents (full-text search)
  • Image database (AI vision + metadata)
  • User feedback (exact match)

Step 4: Context Assembly

Builds comprehensive context with priority ordering:

  1. User corrections and verified facts
  2. Complete conversation history (no limits)
  3. Retrieved email evidence
  4. Domain-specific context (legal/estate)
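The priority ordering above amounts to concatenating context blocks in a fixed sequence, so higher-priority material appears first in the prompt. A minimal sketch (section titles and function name are assumptions):

```python
def assemble_context(corrections, history, emails, domain_context):
    """Concatenate context blocks in priority order: corrections first, domain last."""
    sections = [
        ("VERIFIED FACTS", corrections),
        ("CONVERSATION HISTORY", history),
        ("EMAIL EVIDENCE", emails),
        ("DOMAIN CONTEXT", domain_context),
    ]
    parts = []
    for title, items in sections:
        if items:  # skip empty sections entirely
            parts.append(f"## {title}\n" + "\n".join(items))
    return "\n\n".join(parts)
```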

Step 5: Response Generation

AI model generates response with personality preservation:

  • Dad's communication style template
  • Context-aware response generation
  • Source attribution for transparency
  • Feedback integration for improvement

PostgreSQL Database Architecture

🗄️ Production Database Schema

AI-Dad uses PostgreSQL 16.10 with the pgvector extension to store and search 58,547+ documents with 384-dimensional semantic vectors.

-- Main documents table with pgvector integration
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    document_id VARCHAR(255) UNIQUE NOT NULL,
    content_type VARCHAR(50) NOT NULL,
    source_path TEXT,
    content TEXT NOT NULL,
    embedding vector(384),  -- pgvector extension
    metadata JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Index for fast vector similarity search (cosine distance)
CREATE INDEX documents_embedding_idx
    ON documents USING ivfflat (embedding vector_cosine_ops);

-- Index for content type filtering
CREATE INDEX documents_content_type_idx ON documents(content_type);

-- Search sessions for contextual learning
CREATE TABLE search_sessions (
    session_id VARCHAR(255) PRIMARY KEY,
    username VARCHAR(100) NOT NULL,
    search_query TEXT,
    search_results JSONB,
    feedback_data JSONB,
    context_data JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Key Database Features

  • pgvector Extension: Native vector similarity search with cosine distance operator (<=>)
  • IVFFlat Index: Inverted file flat index for fast approximate nearest neighbor search
  • JSONB Metadata: Flexible schema for storing email headers, image analysis, Q&A context
  • Concurrent-Safe: MVCC architecture allows parallel builds and real-time queries
  • Contextual Feedback: search_sessions table tracks user feedback per query context
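A similarity query against this schema uses the `<=>` cosine distance operator from pgvector. The sketch below builds the parameterized SQL that would be passed to psycopg2; the function name and LIMIT handling are illustrative, not the production query:

```python
def vector_search_sql(limit: int = 10) -> str:
    """Build a parameterized pgvector query using the cosine distance operator (<=>).

    Intended usage (assumed, requires a live database):
        cur.execute(vector_search_sql(10), (query_vec, "email", query_vec))
    """
    return (
        "SELECT document_id, content, "
        "1 - (embedding <=> %s::vector) AS similarity "  # cosine similarity from distance
        "FROM documents "
        "WHERE content_type = %s "
        "ORDER BY embedding <=> %s::vector "             # ivfflat index accelerates this
        f"LIMIT {int(limit)};"
    )
```

Ordering by `embedding <=> query` lets PostgreSQL use the IVFFlat index for approximate nearest-neighbor retrieval rather than scanning all 58,547 rows.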

Vectorization & Semantic Search

🎯 How Text Becomes Searchable Vectors

Every piece of text (emails, documents, Q&A pairs) is transformed into a 384-dimensional mathematical representation that captures semantic meaning using the SentenceTransformer model.

The Vectorization Pipeline

1. Original Text

"Dad always said to read the fine print in contracts"

2. Tokenization

[Dad] [always] [said] [read] [fine] [print] [contracts]

3. Embedding Model

all-MiniLM-L6-v2
Transformer Neural Network

4. 384-D Vector

[0.12, -0.45, 0.78, ..., 0.23]
Semantic fingerprint

How Vector Search Works

User Query:

"What did Dad say about contract negotiations?"

Semantic Matching Process:

  1. Query converted to vector
  2. Cosine similarity calculated against all stored vectors
  3. Nearest neighbors retrieved (highest similarity)
  4. Results ranked by relevance + user annotation boost
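The matching process above reduces to computing cosine similarity between the query vector and every stored vector, then taking the top hits. A self-contained sketch with toy 3-D vectors (production vectors are 384-D):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, corpus, top_k=3):
    """Rank stored vectors by similarity to the query and return the top_k ids."""
    scored = sorted(corpus.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]
```

In production this brute-force loop is replaced by the pgvector IVFFlat index, but the ranking semantics are the same.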

Why Vectors Are Powerful

🎯
Semantic Understanding
🔍
Finds Related Concepts
Fast Similarity Search
🌐
Language Agnostic

Example: Semantic Similarity in Action

When you search for "contract negotiation", the system finds related content even with different wording:

  • ✅ "agreement discussions" (0.89 similarity)
  • ✅ "deal terms" (0.85 similarity)
  • ✅ "legal negotiations" (0.92 similarity)
  • ✅ "bargaining position" (0.81 similarity)

The numbers represent cosine similarity scores (0-1), where 1 is identical meaning.

Socratic RAG Optimization - The "Big-R"

🎓 What Makes It "Socratic"?

The system learns through dialogue analysis, extracting wisdom not just from what was said but from how it was communicated, across 49,223 email conversations and 4,838 Q&A pairs.

Implementation Details

# Socratic Extraction Process
def extract_socratic_patterns(email_thread):
    """
    Extract Q&A patterns from email conversations,
    preserving full context and communication style.
    """
    # Parse complete thread structure
    thread_context = parse_email_thread(email_thread)

    # Generate multiple vectors for comprehensive search
    vectors = {
        'full_thread': embed(thread_context.full_text),
        'questions': [embed(q) for q in thread_context.questions],
        'answers': [embed(a) for a in thread_context.answers],
        'combined': embed(thread_context.qa_summary)
    }
    return vectors

Multi-Vector Indexing Strategy

  Vector Type        Purpose                           Weight
  Full Thread        Complete conversational context   1.0
  Question Vectors   Query matching                    1.3
  Answer Vectors     Response retrieval                1.5
  Combined Q&A       Semantic relationship             1.2
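When a query matches several vector types for the same thread, the per-type similarities are combined using the weights above. A sketch of that scoring step (the combination rule is assumed; the actual system may also add intersection bonuses):

```python
# Weights from the multi-vector indexing table.
VECTOR_WEIGHTS = {"full_thread": 1.0, "questions": 1.3, "answers": 1.5, "combined": 1.2}

def weighted_score(hits: dict) -> float:
    """Combine per-vector-type cosine similarities for one document.

    `hits` maps vector type -> raw similarity, e.g. {"questions": 0.9, "answers": 0.8}.
    """
    return sum(VECTOR_WEIGHTS[vec_type] * sim for vec_type, sim in hits.items())
```

A document hit on both its question vector (0.9) and answer vector (0.8) scores 1.3 × 0.9 + 1.5 × 0.8 = 2.37, outranking a document that matched only one vector type.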

Thread Preservation Benefits

  • ✅ Maintains temporal context and conversation flow
  • ✅ Preserves Dad's communication patterns and style
  • ✅ Enables understanding of evolving topics
  • ✅ Captures nuanced legal advice in context
  • ✅ Retains emotional tone and personal touches

Learning & Optimization Strategy

Three-Phase Learning Algorithm

Phase 1: Validation-Based Optimization

# Use Q&A pairs as ground truth for optimization
def optimize_weights():
    success_metrics = []
    for qa_pair in socratic_pairs:
        # Search using just the question
        results = search_database(qa_pair.question)

        # Check retrieval accuracy
        if qa_pair.answer in results.top_5:
            success_metrics.append({
                'source': results.source_type,
                'rank': results.rank,
                'score': results.similarity_score
            })

    # Adjust weights based on performance
    update_source_weights(success_metrics)

Phase 2: User Feedback Integration

  • Track user reactions (👍/👎) to responses
  • Store corrections for future reference
  • Increase weights for validated sources
  • Decrease weights for contradicting sources
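The feedback loop above can be sketched as a bounded weight nudge. This is an assumption about the update rule; the bounds (0.1 to 4.0) mirror the extremes of the Data Source Priority Matrix below, and the step size matches the configured learning rate of 0.01:

```python
def update_weight(current: float, feedback: str, learning_rate: float = 0.01) -> float:
    """Nudge a source weight up on a thumbs-up, down on a thumbs-down.

    Weights are clamped to [0.1, 4.0] so one source can never dominate
    or vanish entirely (bounds assumed, not from the production config).
    """
    delta = learning_rate if feedback == "up" else -learning_rate
    return min(4.0, max(0.1, current + delta))
```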

Phase 3: Advanced Pattern Recognition

  • Intersection Ranking: Higher scores for multi-vector hits
  • Thread Coherence: Complete threads weighted over fragments
  • Temporal Proximity: Recent interactions prioritized
  • Legal Patterns: Domain-specific boost for legal queries

Configuration Management

Dynamic Weight Configuration

{
  "precedence": {
    "dad_emails": {
      "weight": 1.5,
      "priority": 1,
      "description": "Primary source - highest authenticity"
    },
    "qa_pairs": {
      "weight": 1.3,
      "priority": 2,
      "description": "Validated conversation patterns"
    },
    "pdf_documents": {
      "weight": 1.0,
      "priority": 3,
      "description": "Legal and financial documents"
    }
  },
  "learning_parameters": {
    "learning_rate": 0.01,
    "momentum": 0.9,
    "optimization_metric": "retrieval_accuracy"
  }
}
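Applying the precedence config at query time means multiplying each result's base similarity by its source weight before the final ranking. A sketch (the function name and result shape are assumptions; the config itself would be loaded with `json.load`):

```python
def apply_precedence(results: list, config: dict) -> list:
    """Scale each result's score by its source weight, then re-rank.

    `results` is a list of dicts like {"source": "dad_emails", "score": 0.7}.
    """
    precedence = config["precedence"]
    for r in results:
        r["score"] *= precedence[r["source"]]["weight"]
    return sorted(results, key=lambda r: r["score"], reverse=True)
```

With the weights above, a Dad email at 0.7 raw similarity (0.7 × 1.5 = 1.05) outranks a PDF at 0.9 (0.9 × 1.0 = 0.90), reflecting the "highest authenticity" priority.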

System Metrics & Performance

  • 49,223 Individual Emails
  • 521 Attachments
  • 2,114 Vision-Analyzed Images
  • 1,450 Referenced URLs

Data Source Priority Matrix

  Priority  Data Source       Count    Weight  Search Method            Use Case
  1         Context Files     5        4.00    Domain-Specific          Estate, Family, Professional
  2         User Corrections  113      3.00    Exact Match              Verified Facts
  3         User Annotations  Growing  2.50    Image Description        User-Provided Context
  4         Chat History      283      N/A     Complete History         Conversation Context
  5         Email Database    49,223   2.20    Semantic (pgvector)      Primary Evidence
  6         Vision AI Images  2,114    1.86    Claude Vision Analysis   Visual Memories
  7         Q&A Pairs         4,838    1.70    Semantic (pgvector)      Socratic Patterns
  8         Attachments       521      1.25    Full-Text (PostgreSQL)   Legal/Financial Docs
  9         Referenced URLs   1,450    0.85    Content Extraction       External Resources
  10        AI Responses      143      0.50    Previous AI Output       Response History

Performance Optimizations

  • ✅ PostgreSQL pgvector cosine similarity with optimized indexes
  • ✅ Intelligent search multipliers (n_results * 2 for diversity)
  • ✅ Sub-second queries: 0.8-0.9s typical response time
  • ✅ Async processing for parallel searches across content types
  • ✅ Complete conversation history (no limits) with context-aware ranking
  • ✅ Real-time file watcher for incremental updates

Enhanced Features (October 2025)

  • Contextual Learning: Query-context-specific feedback system that learns what's relevant for each type of question (e.g., Helen photos vs. legal documents) without affecting unrelated queries
  • Vision AI Integration: Claude Vision API automatically analyzes all 2,114 images, generating searchable descriptions, metadata, and semantic embeddings
  • Smart Image Search: Intelligent filename parsing with user annotation priority (+30 score boost) and context-aware ranking
  • Annotation Interface: In-chat image annotation with instant PostgreSQL updates and delete/insert strategy for seamless editing
  • Review Query Detection: Automatically prioritizes full email content (1.5x boost) over annotations when user asks to "review" or "summarize" documents
  • Adaptive Search: Dynamic content-type balancing ensures diverse results (emails, images, Q&A, PDFs) in every response

Future Enhancements

Planned Improvements

  • Automated A/B Testing: Continuous weight optimization through experimentation
  • Conversational Templates: Pattern mining for common query types
  • Temporal Weighting: Dynamic adjustment based on recency and relevance
  • Picture Similarity Searching: Advanced image search using visual similarity algorithms
  • Multi-User Support: Personalized interactions for different family members
  • Voice Integration: Natural speech interface using ElevenLabs voice cloning (work in progress - aging my voice clone towards my father's voice)
  • Open Framework Release: After refining the system, will make the general framework available for others to preserve their own family legacies