Search Infrastructure Architecture¶
Overview¶
Vellocity's search infrastructure provides agent-queryable search across user documents, platform knowledge bases, and external data sources. The system is built on a pluggable capability model where AI agents autonomously discover, query, and synthesize information from multiple providers.
Architecture Diagram¶
```mermaid
graph TB
    subgraph Clients["Client Layer"]
        UI[Dashboard UI]
        API[REST API]
        Agent[AI Agent]
    end
    subgraph Search["Search Router"]
        GS[General Search<br/>POST /general/search]
        CS[Capability Search<br/>QueryKnowledgeBase<br/>QueryPlatformKnowledge<br/>DiscoverSearchQuestions]
        DS[Dry-Run Sessions<br/>POST /dry-run/sessions]
    end
    subgraph Providers["Search Providers"]
        subgraph Vector["Vector Search"]
            OAI[OpenAI Embeddings<br/>text-embedding-3-small<br/>1536 dimensions]
            BKB[AWS Bedrock KB<br/>Titan Embeddings<br/>Managed Index]
            S3V[AWS S3 Vectors<br/>Float32 / Cosine<br/>1024 dimensions]
        end
        subgraph External["External Search"]
            Serper[SerperDev API<br/>Google SERP + PAA]
            Perplexity[Perplexity API<br/>Deep Research]
            DataForSEO[DataForSEO<br/>SERP Analysis]
        end
        subgraph Graph["Graph Search"]
            LinkedIn[LinkedIn API<br/>Professional Data]
            Slack[Slack API<br/>Channel Knowledge]
        end
    end
    subgraph Storage["Data Layer"]
        PdfData[(pdf_data<br/>Vectors + Content)]
        PlatformKB[(platform_knowledge_bases<br/>KB Configurations)]
        CapMap[(capability_kb_mapping<br/>Capability → KB Links)]
        KbLog[(kb_query_log<br/>Query Analytics)]
    end
    UI --> GS
    API --> CS
    Agent --> CS
    Agent --> DS
    GS --> PdfData
    CS --> OAI
    CS --> BKB
    CS --> S3V
    CS --> Serper
    CS --> Perplexity
    CS --> DataForSEO
    CS --> LinkedIn
    CS --> Slack
    DS --> CS
    OAI --> PdfData
    BKB --> PlatformKB
    S3V --> PlatformKB
    CS --> CapMap
    CS --> KbLog
```
Search Layers¶
Layer 1: General Search¶
Keyword search across the user's workspace, implemented as SQL `LIKE` queries against multiple tables.
| Index | Table | Fields Searched | Use Case |
|---|---|---|---|
| Templates | `openai` | title, description | Find AI workflow templates |
| Workbooks | `user_openai` | title, output | Find generated content |
| Chat Categories | `ai_chat_category` | name | Find chat topics |
| Chat History | `user_openai` (chat type) | input, output | Find past conversations |
Limitations: No ranking, no fuzzy matching, no semantic understanding. Best for exact or partial keyword matches.
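The `LIKE` mechanics above can be sketched in Python (the doc's stack is PHP; the helper and the query string below are illustrative, though the table and column names match the table above). Note that `%` and `_` in user input must be escaped or they act as wildcards:

```python
def like_pattern(term: str) -> str:
    """Escape LIKE wildcards in the user's term, then wrap it for a substring match."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"

# Hypothetical parameterized query against the templates index (table `openai`):
sql = (
    "SELECT id, title, description FROM openai "
    "WHERE title LIKE :term OR description LIKE :term"
)

print(like_pattern("50%_off"))  # %50\%\_off%
```

Because this is a plain substring match, there is no relevance ranking; results come back in table order unless an `ORDER BY` is added.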
Layer 2: Vector Similarity Search¶
Semantic search using embeddings and cosine similarity. Two sub-providers:
OpenAI RAG (User Documents)¶
- Embedding Model: `text-embedding-3-small` (1,536 dimensions)
- Storage: `pdf_data.vector` column as JSON array
- Similarity: Cosine distance calculated in PHP
- Scope: Per-user + team-shared documents
- Filtering: Category, min similarity threshold
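The scoring step above (embeddings stored as a JSON array, cosine similarity computed in application code) can be sketched in Python; the real implementation is PHP, but the math is identical:

```python
import json
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# `pdf_data.vector` holds the embedding as a JSON array string:
stored = json.loads("[0.0, 1.0, 0.0]")
query_embedding = [0.0, 1.0, 0.0]
print(cosine_similarity(stored, query_embedding))  # 1.0
```

Every candidate document's vector must be decoded and scored this way per query, which is why the Scaling Considerations section below steers large corpora toward S3 Vectors or Bedrock KB.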
AWS Bedrock Knowledge Base (Platform Knowledge)¶
- Embedding Model: Amazon Titan or configured model
- Storage: AWS-managed OpenSearch Serverless
- Similarity: Native Bedrock scoring
- Scope: Platform-wide curated knowledge bases
- Filtering: Metadata filters, min score threshold
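For the Bedrock path, the request shape can be sketched in Python. The field names follow the AWS Bedrock Agent Runtime `Retrieve` API (boto3 service `bedrock-agent-runtime`); the helper names are ours, and the min-score threshold is assumed to be applied to the returned scores:

```python
def build_retrieve_request(kb_id: str, query: str, max_results: int) -> dict:
    """Request body for a Bedrock Agent Runtime Retrieve call."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": max_results}
        },
    }

def filter_by_score(results: list[dict], min_score: float) -> list[dict]:
    """Drop retrieval results below the KB's configured score threshold."""
    return [r for r in results if r.get("score", 0.0) >= min_score]
```

Unlike the OpenAI RAG path, similarity scoring happens inside AWS-managed OpenSearch Serverless, so no vectors are loaded into application memory.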
Layer 3: External Discovery¶
Search external sources for market intelligence and content planning.
| Provider | API | Data Returned | Credits |
|---|---|---|---|
| SerperDev | Google SERP | People Also Ask, related searches, snippets | 3 |
| Perplexity | Deep research | Synthesized research with citations | 5+ |
| DataForSEO | SERP analysis | Rankings, keyword volume, competition | 3 |
Layer 4: Graph Search¶
Query structured external data sources.
| Provider | Data | Access Method |
|---|---|---|
| LinkedIn | Professional profiles, companies | LinkedIn API + OAuth |
| Slack | Channel messages, threads | Slack Bot Token |
Capability System Integration¶
Search capabilities are registered in the CapabilityRegistry and can be assigned to any agent. When an agent executes, the WorkflowPlanner determines which capabilities to invoke based on the user's request.
Search Capability Registry¶
```mermaid
classDiagram
    class BaseCapability {
        +getName() string
        +getDescription() string
        +getRequiredParameters() array
        +getEstimatedCredits() int
        +execute(params, context, execution) array
    }
    class QueryKnowledgeBaseCapability {
        -KnowledgeBaseService $kbService
        +execute() results with similarity scores
    }
    class QueryPlatformKnowledgeCapability {
        -PlatformKnowledgeBaseService $platformService
        +execute() results from multiple KBs
    }
    class DiscoverSearchQuestionsCapability {
        -SerperDevSearch $serper
        +execute() PAA questions + intent analysis
    }
    class SlackKnowledgeSourceCapability {
        -SlackService $slack
        +execute() channel search results
    }
    class LinkedInGraphCapability {
        -LinkedInService $linkedin
        +execute() profile/company data
    }
    BaseCapability <|-- QueryKnowledgeBaseCapability
    BaseCapability <|-- QueryPlatformKnowledgeCapability
    BaseCapability <|-- DiscoverSearchQuestionsCapability
    BaseCapability <|-- SlackKnowledgeSourceCapability
    BaseCapability <|-- LinkedInGraphCapability
```
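The contract in the class diagram can be sketched in Python (the real classes are PHP; the stubbed `execute` body and the credit value here are placeholders, not the production logic):

```python
from abc import ABC, abstractmethod

class BaseCapability(ABC):
    """Sketch of the capability contract from the class diagram above."""

    @abstractmethod
    def get_name(self) -> str: ...

    @abstractmethod
    def get_estimated_credits(self) -> int: ...

    @abstractmethod
    def execute(self, params: dict, context: dict, execution: dict) -> dict: ...

class QueryKnowledgeBaseCapability(BaseCapability):
    def get_name(self) -> str:
        return "QueryKnowledgeBase"

    def get_estimated_credits(self) -> int:
        return 1  # placeholder; real cost comes from the registry

    def execute(self, params, context, execution):
        # Production code delegates to KnowledgeBaseService; stubbed here.
        return {"results": [], "query": params["query"]}
```

Because every capability exposes the same `execute` signature, the WorkflowPlanner can chain search providers without knowing which backend answers the query.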
Agent Execution with Search¶
```mermaid
sequenceDiagram
    participant User
    participant Agent as AgentOrchestrator
    participant Planner as WorkflowPlanner
    participant Registry as CapabilityRegistry
    participant KB as QueryKnowledgeBase
    participant Platform as QueryPlatformKnowledge
    participant Bedrock as AWS Bedrock (LLM)
    User->>Agent: "Find best practices for marketplace listings"
    Agent->>Planner: Plan workflow steps
    Planner->>Bedrock: Analyze request, determine capabilities
    Bedrock-->>Planner: Step plan: [KB search, Platform search, Synthesize]
    Planner->>Registry: Resolve QueryKnowledgeBase
    Registry-->>Planner: Capability instance
    Planner->>KB: execute(query: "marketplace listing best practices")
    KB-->>Planner: 3 user document results
    Planner->>Registry: Resolve QueryPlatformKnowledge
    Registry-->>Planner: Capability instance
    Planner->>Platform: execute(query: "marketplace listing best practices")
    Platform-->>Planner: 2 platform KB results
    Planner->>Bedrock: Synthesize all results with brand context
    Bedrock-->>Planner: Final response
    Planner-->>Agent: Workflow complete
    Agent-->>User: Synthesized answer with citations
```
Data Model¶
pdf_data (User Documents + Vectors)¶
| Column | Type | Description |
|---|---|---|
| `id` | bigint | Primary key |
| `user_id` | bigint | Document owner |
| `team_id` | bigint | Team association (nullable) |
| `chat_id` | bigint | Associated chat ID |
| `content` | text | Document text content |
| `vector` | json | Embedding array (1,536 floats) |
| `category` | varchar | Document category |
| `file_name` | varchar | Original filename |
| `is_team_shared` | boolean | Shared with team members |
| `usage_count` | integer | Times queried |
| `last_used_at` | timestamp | Last query timestamp |
Access Scopes:
```sql
-- User's own documents
WHERE user_id = :current_user_id

-- User's docs + team shared
WHERE user_id = :current_user_id
   OR (team_id = :current_team_id AND is_team_shared = true)
```
platform_knowledge_bases¶
| Column | Type | Description |
|---|---|---|
| `id` | bigint | Primary key |
| `knowledge_base_id` | varchar | AWS Bedrock KB identifier |
| `name` | varchar | Display name |
| `description` | text | KB description |
| `scope` | enum | platform, company, team |
| `category` | varchar | KB category |
| `region` | varchar | AWS region |
| `status` | enum | active, syncing, error |
| `embedding_model` | varchar | Model used for embeddings |
| `query_count` | integer | Total queries served |
capability_knowledge_base_mapping¶
| Column | Type | Description |
|---|---|---|
| `id` | bigint | Primary key |
| `capability_slug` | varchar | Capability identifier |
| `knowledge_base_id` | bigint | FK to platform_knowledge_bases |
| `priority` | integer | Query order (lower = first) |
| `max_results` | integer | Override for top_results |
| `min_score_threshold` | float | Override for min_similarity |
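Resolving which knowledge bases a capability should query then reduces to filtering and sorting this mapping table. A minimal Python sketch (the slug value is hypothetical):

```python
def resolve_kbs(mappings: list[dict], capability_slug: str) -> list[dict]:
    """Return the KB mappings for a capability, lowest `priority` first."""
    rows = [m for m in mappings if m["capability_slug"] == capability_slug]
    return sorted(rows, key=lambda m: m["priority"])

mappings = [
    {"capability_slug": "query_platform_knowledge", "knowledge_base_id": 2, "priority": 10},
    {"capability_slug": "query_platform_knowledge", "knowledge_base_id": 1, "priority": 5},
]
order = [m["knowledge_base_id"] for m in resolve_kbs(mappings, "query_platform_knowledge")]
print(order)  # [1, 2]
```

Per-mapping `max_results` and `min_score_threshold` values, when set, override the KB-level defaults for that capability only.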
kb_query_log¶
| Column | Type | Description |
|---|---|---|
| `id` | bigint | Primary key |
| `user_id` | bigint | Who queried |
| `knowledge_base_id` | varchar | Which KB was queried |
| `query` | text | Search query text |
| `results_count` | integer | Number of results returned |
| `top_score` | float | Highest similarity score |
| `provider` | varchar | openai, bedrock, s3vectors |
| `credits_used` | integer | Credits consumed |
| `created_at` | timestamp | Query timestamp |
Credential Resolution¶
Search services support three credential modes for accessing AWS resources:
```mermaid
flowchart TD
    A[Search Request] --> B{bedrock_access_model setting}
    B -->|platform| C[Use Platform Credentials<br/>config services.s3]
    B -->|user_keys| D[Use User's AWS Keys<br/>Encrypted in user settings]
    B -->|user_role| E[Assume IAM Role<br/>via STS AssumeRole]
    C --> F[Execute Search]
    D --> F
    E --> F
```
| Mode | Credentials Source | Use Case |
|---|---|---|
| `platform` | `config('services.s3')` | Default — Vellocity-managed infrastructure |
| `user_keys` | `user.settings.aws_access_key_id` (encrypted) | Users with their own AWS accounts |
| `user_role` | STS AssumeRole with user's Role ARN | Enterprise BYOC (Bring Your Own Cloud) |
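The dispatch on `bedrock_access_model` can be sketched in Python (the real resolution lives in PHP service code; the returned dict shapes here are illustrative, and decryption/STS calls are elided):

```python
def resolve_credentials(settings: dict, platform_config: dict) -> dict:
    """Pick a credentials source based on the user's bedrock_access_model setting."""
    mode = settings.get("bedrock_access_model", "platform")
    if mode == "platform":
        return platform_config  # config('services.s3') on the PHP side
    if mode == "user_keys":
        return {
            "key": settings["aws_access_key_id"],          # stored encrypted
            "secret": settings["aws_secret_access_key"],   # stored encrypted
        }
    if mode == "user_role":
        # Real flow calls STS AssumeRole with the stored role ARN
        return {"assume_role_arn": settings["aws_role_arn"]}
    raise ValueError(f"unknown access model: {mode}")
```

Defaulting to `platform` when the setting is absent keeps existing users on Vellocity-managed credentials without a migration.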
S3 Vectors Integration¶
AWS S3 Vectors provides cost-optimized, FTR-compliant vector storage as an alternative to the pdf_data JSON column.
Configuration¶
```php
[
    'region'        => 'us-east-1',
    'vector_bucket' => 'vell-knowledge-vectors',
    'index_name'    => 'kb-embeddings',
    'dimension'     => 1024,
    'metric'        => 'cosine',
    'data_type'     => 'float32',
]
```
Operations¶
| Operation | Method | Batch Limit | Description |
|---|---|---|---|
| Store vector | `putVector()` | 1 | Store a single embedding with metadata |
| Batch store | `putVectors()` | 100 per request | Auto-chunked batch insert |
| Query | `query()` | — | Cosine similarity search with optional filters |
| Delete | `deleteVector()` | 1 | Remove a single vector |
| Batch delete | `deleteVectors()` | — | Remove multiple vectors |
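The auto-chunking behind `putVectors()` is simple to sketch in Python, assuming the 100-vectors-per-request limit above:

```python
def chunk(vectors: list, size: int = 100) -> list[list]:
    """Split a batch into request-sized slices (100 vectors per PutVectors call)."""
    return [vectors[i:i + size] for i in range(0, len(vectors), size)]

print([len(batch) for batch in chunk(list(range(250)))])  # [100, 100, 50]
```

Each slice becomes one API request, so a 250-vector upload costs three calls.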
Query Response Format¶
```json
[
  {
    "key": "doc_chunk_1234",
    "distance": 0.15,
    "similarity": 0.85,
    "metadata": {
      "source_file": "guide.pdf",
      "category": "aws",
      "chunk_index": 3
    }
  }
]
```
**Similarity Calculation**

S3 Vectors returns `distance` (lower = more similar). The service converts this to `similarity = 1 - distance` for consistency with other providers.
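That conversion is a one-liner; a Python sketch over the response shape shown above (the helper name is ours):

```python
def to_similarity(results: list[dict]) -> list[dict]:
    """Derive a `similarity` field from S3 Vectors' cosine `distance` on each result."""
    return [{**r, "similarity": 1.0 - r["distance"]} for r in results]

scored = to_similarity([{"key": "doc_chunk_1234", "distance": 0.15}])
```

Keeping both fields lets callers rank by `similarity` uniformly across OpenAI, Bedrock, and S3 Vectors providers while preserving the raw provider output.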
Performance Characteristics¶
| Provider | Latency (p50) | Latency (p99) | Max Results | Scaling |
|---|---|---|---|---|
| OpenAI RAG (pdf_data) | ~200ms | ~500ms | Limited by document count | Linear with documents |
| Bedrock KB | ~300ms | ~800ms | Configurable | AWS-managed |
| S3 Vectors | ~150ms | ~400ms | Configurable | AWS-managed |
| SerperDev | ~500ms | ~1.5s | 10 PAA questions | API-limited |
| Slack | ~400ms | ~1s | Channel-dependent | API-limited |
Scaling Considerations¶
**OpenAI RAG Limitation**

The `pdf_data` JSON vector column requires loading all vectors into PHP memory for cosine similarity calculation. This works well for < 10,000 documents per user but degrades beyond that. For high-volume use cases, prefer S3 Vectors or Bedrock KB.
Recommended migration path for scale:
- < 1,000 docs: OpenAI RAG with `pdf_data` (default)
- 1,000–50,000 docs: AWS S3 Vectors
- 50,000+ docs or enterprise: AWS Bedrock Knowledge Base with managed indexing
Known Gaps¶
| Gap | Impact | Planned Resolution |
|---|---|---|
| No full-text keyword search | Cannot find exact strings in documents | Add PostgreSQL tsvector index |
| No cross-KB unified query | Must query each KB separately | Build aggregation layer |
| No saved searches | Users cannot bookmark frequent queries | Add saved search model |
| No search analytics dashboard | No visibility into search patterns | Build on kb_query_log data |
| No MCP server for search | External AI tools cannot query search | Expose via MCP protocol |
| PHP cosine similarity | CPU-bound for large document sets | Migrate to pgvector or S3 Vectors |
Configuration Reference¶
Environment Variables¶
| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key for embeddings | — |
| `AWS_BEDROCK_KB_ID` | Legacy default Bedrock KB ID | — |
| `AWS_BEDROCK_REGION` | Bedrock service region | us-east-1 |
| `SERPER_API_KEY` | SerperDev API key | — |
| `S3_VECTORS_BUCKET` | S3 Vectors bucket name | vell-knowledge-vectors |
| `S3_VECTORS_INDEX` | S3 Vectors index name | kb-embeddings |
User Settings (Database)¶
| Setting Key | Description | Values |
|---|---|---|
| `kb_provider` | Knowledge base provider | openai, bedrock |
| `bedrock_access_model` | Credential mode for Bedrock | platform, user_keys, user_role |
| `aws_access_key_id` | User's AWS key (encrypted) | — |
| `aws_secret_access_key` | User's AWS secret (encrypted) | — |
| `aws_role_arn` | IAM role for BYOC | arn:aws:iam::... |