
Knowledge Base Implementation Guide

AI-Powered Partner GTM Content Generation

Date: 2025-11-17 (Updated: 2026-01-09)
Status: Implementation Complete


Overview

This guide documents the implementation of the Knowledge Base system that allows agents to automatically reference uploaded documents (PDFs, battle cards, AWS docs, etc.) when generating content.

What Was Implemented

  1. QueryKnowledgeBaseCapability - New agent capability for semantic search
  2. Enhanced PdfData Model - Added categorization, team sharing, and usage tracking
  3. GenerateTextCapability Integration - Automatic knowledge base querying during content generation
  4. Database Enhancements - New columns for knowledge base organization

Architecture

How It Works

  1. User uploads a document (PDF, battle card, case study)
  2. Content is chunked into 3500-4000 character segments
  3. Each chunk is converted to a 1536-dimension embedding vector (OpenAI Ada-002)
  4. Chunks are stored in the pdf_data table with metadata (category, file_name, user_id, team_id)
  5. Agent receives a task, e.g. "Create competitive email"
  6. GenerateTextCapability checks whether the agent has the query_knowledge_base capability
  7. If yes, it queries the knowledge base with the topic
  8. Vector similarity search finds the top 3-5 most relevant chunks
  9. Chunks are injected into the AI prompt as reference material
  10. The AI generates content citing specific facts from the knowledge base
  11. Output: context-aware, fact-based content
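The retrieval steps above (embed, score, filter, take top N) can be sketched language-agnostically. This is an illustrative Python sketch, not the actual PHP implementation; it scores stored chunk embeddings against a query embedding with cosine similarity, drops anything below the threshold, and returns the best matches:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_chunks(query_embedding, chunks, top_results=5, min_similarity=0.7):
    """Linear scan over all chunks: score each one, keep those above
    the relevance threshold, and return the most similar first."""
    scored = [
        {**chunk, "similarity": cosine_similarity(query_embedding, chunk["embedding"])}
        for chunk in chunks
    ]
    relevant = [c for c in scored if c["similarity"] >= min_similarity]
    relevant.sort(key=lambda c: c["similarity"], reverse=True)
    return relevant[:top_results]

# Toy 2-dimension embeddings for illustration (real vectors are 1536-dim)
chunks = [
    {"id": 1, "content": "AWS Marketplace requires...", "embedding": [1.0, 0.0]},
    {"id": 2, "content": "Competitor X pricing...",     "embedding": [0.0, 1.0]},
    {"id": 3, "content": "Listing best practices...",   "embedding": [0.9, 0.1]},
]
results = top_chunks([1.0, 0.05], chunks, top_results=2)
```

Chunk 2 falls below the 0.7 threshold and is filtered out; chunks 1 and 3 are returned in similarity order. This linear scan is the O(n) approach discussed under Performance Considerations later in this guide.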

Database Changes

New Migration: 2025_11_17_000002_enhance_pdf_data_for_knowledge_base.php

Adds the following columns to pdf_data table:

| Column         | Type       | Purpose                                                |
|----------------|------------|--------------------------------------------------------|
| user_id        | integer    | Owner of the document                                  |
| team_id        | integer    | Team that owns the document (from user's active team)  |
| category       | string(50) | Document type (sales, technical, partner, compliance)  |
| file_name      | string     | Original filename for reference                        |
| is_team_shared | boolean    | Allow team members to access the document              |
| usage_count    | integer    | Track how often the document is referenced             |
| last_used_at   | timestamp  | Last time the document was queried                     |

Indexes added:

  • (user_id, category) - fast category filtering
  • (team_id, is_team_shared) - team access queries
  • last_used_at - usage analytics

Team-Scoped Document Access

With the team-centric architecture, documents are scoped to teams:

┌─────────────────────────────────────────────────────────────────────┐
│                    DOCUMENT ACCESS MODEL                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  User's Active Team (active_team_id)                                │
│           │                                                          │
│           ▼                                                          │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  Documents visible to user:                                  │    │
│  │                                                              │    │
│  │  1. Own documents (user_id = current_user)                   │    │
│  │  2. Team-shared docs (team_id = active_team AND             │    │
│  │                       is_team_shared = true)                 │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                      │
│  Note: Users belonging to multiple teams only see documents         │
│  from their ACTIVE team context. They can switch teams to           │
│  access other team's shared documents.                              │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Query Logic:

// Get documents accessible by current user in active team context
$activeTeamId = auth()->user()->activeTeam()?->id;

PdfData::where('user_id', auth()->id())
    ->orWhere(function ($query) use ($activeTeamId) {
        $query->where('team_id', $activeTeamId)
              ->where('is_team_shared', true);
    })
    ->get();

Multi-Team Considerations:

  • Uploaded documents are automatically assigned a team_id from the user's active team
  • When a user switches teams, they see that team's shared documents
  • Users can have private documents (not team-shared) that follow them across teams
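The access rule behind the Eloquent query above reduces to a simple predicate. A minimal Python sketch of the same logic (field names taken from the schema; this is illustrative, not the actual implementation):

```python
def is_accessible(doc, user_id, active_team_id):
    """A document is visible if the user owns it, or if it is
    shared with the user's currently active team."""
    if doc["user_id"] == user_id:
        return True
    return doc["team_id"] == active_team_id and doc["is_team_shared"]

docs = [
    {"id": 1, "user_id": 7, "team_id": 1, "is_team_shared": False},  # own private doc
    {"id": 2, "user_id": 9, "team_id": 1, "is_team_shared": True},   # teammate's shared doc
    {"id": 3, "user_id": 9, "team_id": 2, "is_team_shared": True},   # another team's doc
]
visible = [d["id"] for d in docs if is_accessible(d, user_id=7, active_team_id=1)]
```

For a user on team 1, their own private document and the teammate's shared document are visible; the other team's shared document is not, until the user switches their active team.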

Run Migration

php artisan migrate

New Capability Registration

Seeder: QueryKnowledgeBaseCapabilitySeeder.php

Registers the query_knowledge_base capability in the database.

Run Seeder:

php artisan db:seed --class=QueryKnowledgeBaseCapabilitySeeder

Capability Details:

  • Slug: query_knowledge_base
  • Category: knowledge
  • Required Params: query
  • Optional Params: category, top_results, min_similarity
  • Credits: 2 per query


Code Components

1. QueryKnowledgeBaseCapability

Location: app/Extensions/ContentManager/System/Services/Capabilities/QueryKnowledgeBaseCapability.php

Key Features:

  • Vector similarity search using OpenAI embeddings
  • Scoped to the user's own plus team-shared documents
  • Category filtering (sales, technical, partner, compliance)
  • Configurable similarity threshold (default: 0.7)
  • Returns the top N most relevant chunks (default: 5)

Example Usage:

$capability = app(QueryKnowledgeBaseCapability::class);

$result = $capability->execute([
    'query' => 'AWS Marketplace listing best practices',
    'category' => 'compliance',
    'top_results' => 5,
    'min_similarity' => 0.7,
], $context, $execution);

// $result['results'] contains array of relevant document chunks
// Each with: id, content, similarity, source_file, category

2. Enhanced PdfData Model

Location: app/Models/PdfData.php

New Methods:

  • recordUsage() - increment the usage counter
  • scopeCategory($category) - filter by category
  • scopeTeamShared($teamId) - get a team's shared docs
  • scopeAccessibleBy($userId, $teamId) - get the user's accessible docs (own + team-shared)

Relationships:

  • user() - BelongsTo User
  • team() - BelongsTo Team

3. GenerateTextCapability Integration

Location: app/Extensions/ContentManager/System/Services/Capabilities/GenerateTextCapability.php

Changes:

  • Added a queryKnowledgeBase() method
  • Updated buildPrompt() to accept KB results
  • KB context is injected at the top of the prompt (highest priority)
  • The KB is only queried if the agent has the query_knowledge_base capability enabled

How It Works:

  1. Check if agent has query_knowledge_base in capabilities array
  2. If yes, build query from topic + content type
  3. Execute QueryKnowledgeBaseCapability
  4. Format results as "Reference Material" in prompt
  5. Instruct AI to cite facts from references
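Step 4 above, formatting the retrieved chunks into a "Reference Material" block, can be sketched as follows. This is an illustrative Python sketch (the real implementation is PHP) that produces the same shape as the example prompt structure in this guide:

```python
def format_kb_context(results):
    """Render retrieved chunks as a 'Relevant Knowledge Base Content'
    block: one numbered reference per chunk, with relevance and source."""
    if not results:
        return ""
    parts = ["# Relevant Knowledge Base Content\n"]
    for i, r in enumerate(results, start=1):
        parts.append(f"## Reference {i} (Relevance: {r['similarity'] * 100:.1f}%)")
        parts.append(f"Source: {r['source_file']}\n")
        parts.append(r["content"] + "\n")
        parts.append("---\n")
    parts.append("IMPORTANT: Use the above reference material when relevant...")
    return "\n".join(parts)

prompt_context = format_kb_context([
    {"similarity": 0.895, "source_file": "AWS_Marketplace_Guidelines.pdf",
     "content": "[content chunk here...]"},
])
```

The returned string is prepended to the brand-voice context and task sections when building the final prompt; an empty result list yields an empty string so the prompt is unchanged when no references are found.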

Example Prompt Structure:

# Relevant Knowledge Base Content

## Reference 1 (Relevance: 89.5%)
Source: AWS_Marketplace_Guidelines.pdf

[content chunk here...]

---

## Reference 2 (Relevance: 85.2%)
Source: Battle_Card_Competitor_X.pdf

[content chunk here...]

---

IMPORTANT: Use the above reference material when relevant...

# Brand Voice Context
Company: Your Company
Tone: Professional

# Your Task
Generate a medium email about: Competitive positioning against Competitor X
...

Usage Instructions

For Partners/Users

1. Upload Documents to Knowledge Base

Currently, upload via AI PDF Chat interface at: /dashboard/user/openai/generator/ai_pdf/workbook

Categories:

  • sales - battle cards, sales decks, case studies, objection handlers
  • technical - product docs, architecture diagrams, API specs
  • partner - partner materials, joint solution briefs, co-sell docs
  • compliance - AWS Marketplace guidelines, security compliance docs
  • general - anything else

2. Create Agent with Knowledge Base Access

When creating or editing an agent, enable the query_knowledge_base capability:

{
  "name": "Partner Email Writer",
  "capabilities": [
    "generate_text",
    "query_knowledge_base"
  ],
  "settings": {
    "kb_category": "sales",  // Optional: restrict to specific category
    "kb_results": 3,         // Optional: number of references to retrieve
    "kb_min_similarity": 0.7 // Optional: relevance threshold
  }
}

3. Execute Agent Task

Task: "Create a competitive positioning email for prospect Acme Corp highlighting our advantages over Competitor X"

Result: Agent will:
1. Query knowledge base for "competitive email Competitor X"
2. Find relevant battle cards, case studies
3. Generate email citing specific differentiation points
4. Include case study references where applicable

Testing

Test Scenario 1: Sales Email with Battle Card Reference

Setup:

  1. Upload Competitor_X_Battle_Card.pdf with category sales
  2. Create an agent with the generate_text + query_knowledge_base capabilities
  3. Execute the task: "Write sales email about our advantages over Competitor X"

Expected Result:

  • Email contains specific differentiation points from the battle card
  • References actual features/benefits from the uploaded doc
  • Maintains brand voice and tone

Test Scenario 2: AWS Marketplace Listing with Compliance Docs

Setup:

  1. Upload AWS_Marketplace_Guidelines.pdf with category compliance
  2. Upload 3 successful listing examples with category compliance
  3. Create an agent with marketplace rewriter capabilities + KB access
  4. Execute the task: "Rewrite our AWS Marketplace listing"

Expected Result:

  • Listing follows the AWS guidelines from the uploaded docs
  • Incorporates best practices from the successful examples
  • Meets compliance requirements

Test Scenario 3: Co-Sell Content with Partner Materials

Setup:

  1. Partner A uploads their product brochure (category: partner)
  2. Partner B uploads their positioning guide (category: partner)
  3. Create a co-sell agent with KB access
  4. Execute the task: "Generate joint solution brief"

Expected Result:

  • Solution brief combines information from both partners' docs
  • Identifies complementary capabilities
  • Maintains both partners' brand voices


API Reference

QueryKnowledgeBaseCapability

Parameters:

| Parameter      | Type    | Required | Default | Description                   |
|----------------|---------|----------|---------|-------------------------------|
| query          | string  | Yes      | -       | Search query text             |
| category       | string  | No       | null    | Filter by category            |
| top_results    | integer | No       | 5       | Number of results to return   |
| min_similarity | float   | No       | 0.7     | Minimum relevance score (0-1) |

Response:

{
  "success": true,
  "results": [
    {
      "id": 123,
      "content": "AWS Marketplace requires...",
      "similarity": 0.89,
      "source_file": "AWS_Guidelines.pdf",
      "category": "compliance",
      "created_at": "2025-11-01 12:00:00"
    }
  ],
  "context_found": true,
  "num_results": 3,
  "query": "AWS Marketplace listing best practices",
  "credits_used": 2
}

Configuration

Agent Settings

Configure KB behavior in agent settings:

{
  "default_context": {
    "kb_category": "sales",        // Default category filter
    "kb_results": 3,               // Results per query
    "kb_min_similarity": 0.75      // Relevance threshold
  }
}

Per-Request Overrides

Override in task parameters:

$agent->execute([
    'content_type' => 'email',
    'topic' => 'Competitive positioning',
    'kb_category' => 'technical',  // Override default category
    'kb_results' => 5,              // Get more results
    'use_knowledge_base' => true    // Explicitly enable/disable
]);

Performance Considerations

Vector Search Performance

  • Current: linear scan O(n) with cosine similarity
  • Recommended for >10K docs:
    • Use PostgreSQL with the pgvector extension
    • Or migrate to Pinecone/Weaviate
    • Or use the AWS OpenSearch vector engine

Embedding API Costs

  • Cost per query: ~$0.0001 (OpenAI Ada-002)
  • Cost per document upload: ~$0.0004 per 1000 tokens
  • Monthly estimate for 1000 docs + 10K queries: ~$2-3
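The monthly estimate above can be reproduced with back-of-the-envelope arithmetic using the per-unit costs quoted here. The 5,000-tokens-per-document average below is an assumption for illustration, not a figure from the implementation:

```python
# Per-unit costs quoted in this guide (OpenAI Ada-002)
COST_PER_QUERY = 0.0001              # embedding one search query
COST_PER_1K_UPLOAD_TOKENS = 0.0004   # embedding document content

# Assumed workload: 1000 docs at ~5000 tokens each (assumption), 10K queries
docs, avg_tokens_per_doc, queries = 1000, 5000, 10_000

upload_cost = docs * (avg_tokens_per_doc / 1000) * COST_PER_1K_UPLOAD_TOKENS
query_cost = queries * COST_PER_QUERY
monthly_total = upload_cost + query_cost  # roughly $3
```

Uploads dominate only when documents are long; at this workload the split is about $2 for embeddings of uploaded content and $1 for query embeddings, consistent with the ~$2-3 estimate.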

Context Window Management

  • Default: 3 results × 4000 chars = 12K chars context
  • Claude 3 Sonnet: 200K context window (plenty of room)
  • No truncation needed for reasonable KB sizes

Troubleshooting

Issue: Knowledge base not queried

Possible Causes:

  1. The agent doesn't have the query_knowledge_base capability enabled
  2. No documents uploaded for the user/team
  3. Documents don't match the category filter
  4. Similarity threshold too high

Solution:

// Check agent capabilities
$agent = Agent::find($agentId);
dd($agent->capabilities); // Should include 'query_knowledge_base'

// Check documents exist
$docs = PdfData::accessibleBy($userId, $teamId)->count();
dd($docs); // Should be > 0

// Lower similarity threshold
$agent->settings = ['kb_min_similarity' => 0.5];

Issue: Irrelevant results returned

Solution:

  • Increase the min_similarity threshold (default 0.7 → 0.8 or 0.9)
  • Use more specific categories
  • Improve document chunking (split on sections instead of fixed length)
  • Use more specific queries

Issue: Documents not found for team members

Solution:

// Ensure document is marked as team shared
$pdfData = PdfData::find($id);
$pdfData->update([
    'team_id' => $teamId,
    'is_team_shared' => true
]);


Future Enhancements

Phase 2 (Planned)

  1. Knowledge Base Management UI

    • Route: /dashboard/user/content-manager/knowledge-base
    • Upload with category selection
    • Team sharing toggle
    • Usage analytics dashboard
    • Search/filter interface
    • Document versioning

  2. Enhanced Categorization

    • Subcategories (sales → battle-cards, case-studies, decks)
    • Tags/keywords
    • Auto-categorization using AI

  3. Usage Analytics

    • Most referenced documents
    • Category usage breakdown
    • Agent KB effectiveness metrics

  4. Advanced Search

    • Hybrid search (vector + keyword)
    • Multi-query expansion
    • Re-ranking algorithms

Phase 3 (Future)

  1. Enterprise Features

    • Document access control lists (ACLs)
    • Audit logging
    • Document expiration dates
    • Version history

  2. AI Improvements

    • Fine-tuned embeddings for domain-specific content
    • Semantic chunking (vs fixed-length)
    • Cross-document summarization

Migration from Old System

If you have existing PDF chat data:

-- Update existing records with default values
UPDATE pdf_data
SET
    user_id = (SELECT user_id FROM user_openai_chats WHERE id = pdf_data.chat_id LIMIT 1),
    category = 'general',
    is_team_shared = false,
    usage_count = 0
WHERE user_id IS NULL;

FAQ

Q: Do I need to re-upload documents after migration?

A: No, existing documents will continue to work. They'll have category = NULL (or 'general', if you run the backfill SQL above) until you recategorize them.

Q: Can multiple agents share the same knowledge base?

A: Yes! All agents for a user/team access the same knowledge base. Category filtering allows specialization.

Q: How do I prevent an agent from using the knowledge base?

A: Remove query_knowledge_base from the agent's capabilities array, or set use_knowledge_base: false in task parameters.

Q: What file types are supported?

A: Currently PDFs. Future: DOCX, TXT, MD, PPTX, XLSX.

Q: How big can uploaded documents be?

A: No hard limit, but large docs are chunked into 4000-char segments. A 100-page PDF might create 50-100 chunks.
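The chunk math in this answer follows from simple fixed-length splitting. An illustrative Python sketch (the actual chunker targets 3500-4000 characters and may split on different boundaries):

```python
def chunk_text(text, max_chars=4000):
    """Naive fixed-length chunking: split the text into consecutive
    segments of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# A 100-page PDF at an assumed ~3000 chars/page is ~300K chars -> ~75 chunks
pdf_text = "x" * 300_000
chunks = chunk_text(pdf_text)
```

At 4000-character segments, 300K characters yields 75 chunks, squarely inside the 50-100 range quoted above; each chunk then gets its own embedding vector at upload time.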

Q: Are embeddings recalculated when documents are updated?

A: No automatic re-embedding yet. Delete and re-upload for now. Version management coming in Phase 2.


Resources

  • OpenAI Embeddings API: https://platform.openai.com/docs/guides/embeddings
  • Vector Search Best Practices: https://www.pinecone.io/learn/vector-search/
  • AWS Bedrock Claude Models: https://docs.aws.amazon.com/bedrock/latest/userguide/claude.html

Support

For issues or questions:

  1. Check the troubleshooting section above
  2. Review logs in storage/logs/laravel.log
  3. Search for [Query Knowledge Base] log entries
  4. Check the agent execution history for KB query results


Document Version: 1.0
Last Updated: 2025-11-17
Implementation Status: ✅ Complete