
Knowledge Base Implementation Guide

AI-Powered Partner GTM Content Generation

Date: 2025-11-17 (Updated: 2026-01-09)
Status: Implementation Complete


Overview

This guide documents the implementation of the Knowledge Base system that allows agents to automatically reference uploaded documents (PDFs, battle cards, AWS docs, etc.) when generating content.

What Was Implemented

  1. QueryKnowledgeBaseCapability - New agent capability for semantic search
  2. Enhanced PdfData Model - Added categorization, team sharing, and usage tracking
  3. GenerateTextCapability Integration - Automatic knowledge base querying during content generation
  4. Database Enhancements - New columns for knowledge base organization

Architecture

How It Works

  1. User uploads a document (PDF, battle card, case study)
  2. Content is chunked into 3500-4000 character segments
  3. Each chunk is converted to a 1536-dimension embedding vector (OpenAI Ada-002)
  4. Chunks are stored in the pdf_data table with metadata (category, file_name, user_id, team_id)
  5. Agent receives a task, e.g. "Create competitive email"
  6. GenerateTextCapability checks whether the agent has the query_knowledge_base capability
  7. If yes, it queries the knowledge base with the topic
  8. Vector similarity search finds the top 3-5 most relevant chunks
  9. Chunks are injected into the AI prompt as reference material
  10. The AI generates content citing specific facts from the knowledge base
  11. Output: context-aware, fact-based content
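The retrieval steps above (embed, score, filter, take top N) can be sketched language-agnostically. This is an illustrative Python sketch, not the actual PHP implementation; it scores stored chunk embeddings against a query embedding with cosine similarity, drops anything below the threshold, and returns the best matches:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_chunks(query_embedding, chunks, top_results=5, min_similarity=0.7):
    """Linear scan over all chunks: score each one, keep those above
    the relevance threshold, and return the most similar first."""
    scored = [
        {**chunk, "similarity": cosine_similarity(query_embedding, chunk["embedding"])}
        for chunk in chunks
    ]
    relevant = [c for c in scored if c["similarity"] >= min_similarity]
    relevant.sort(key=lambda c: c["similarity"], reverse=True)
    return relevant[:top_results]

# Toy 2-dimension embeddings for illustration (real vectors are 1536-dim)
chunks = [
    {"id": 1, "content": "AWS Marketplace requires...", "embedding": [1.0, 0.0]},
    {"id": 2, "content": "Competitor X pricing...",     "embedding": [0.0, 1.0]},
    {"id": 3, "content": "Listing best practices...",   "embedding": [0.9, 0.1]},
]
results = top_chunks([1.0, 0.05], chunks, top_results=2)
```

Chunk 2 falls below the 0.7 threshold and is filtered out; chunks 1 and 3 are returned in similarity order. This linear scan is the O(n) approach discussed under Performance Considerations later in this guide.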

Database Changes

New Migration: 2025_11_17_000002_enhance_pdf_data_for_knowledge_base.php

Adds the following columns to pdf_data table:

| Column         | Type       | Purpose                                                |
|----------------|------------|--------------------------------------------------------|
| user_id        | integer    | Owner of the document                                  |
| team_id        | integer    | Team that owns the document (from user's active team)  |
| category       | string(50) | Document type (sales, technical, partner, compliance)  |
| file_name      | string     | Original filename for reference                        |
| is_team_shared | boolean    | Allow team members to access the document              |
| usage_count    | integer    | Track how often the document is referenced             |
| last_used_at   | timestamp  | Last time the document was queried                     |

Indexes added:

  • (user_id, category) - fast category filtering
  • (team_id, is_team_shared) - team access queries
  • last_used_at - usage analytics

Team-Scoped Document Access

With the team-centric architecture, documents are scoped to teams:

┌─────────────────────────────────────────────────────────────────────┐
│                    DOCUMENT ACCESS MODEL                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  User's Active Team (active_team_id)                                │
│           │                                                          │
│           ▼                                                          │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  Documents visible to user:                                  │    │
│  │                                                              │    │
│  │  1. Own documents (user_id = current_user)                   │    │
│  │  2. Team-shared docs (team_id = active_team AND             │    │
│  │                       is_team_shared = true)                 │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                      │
│  Note: Users belonging to multiple teams only see documents         │
│  from their ACTIVE team context. They can switch teams to           │
│  access other team's shared documents.                              │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Query Logic:

// Get documents accessible by current user in active team context
$activeTeamId = auth()->user()->activeTeam()?->id;

PdfData::where('user_id', auth()->id())
    ->orWhere(function ($query) use ($activeTeamId) {
        $query->where('team_id', $activeTeamId)
              ->where('is_team_shared', true);
    })
    ->get();

Multi-Team Considerations:

  • Uploaded documents are automatically assigned a team_id from the user's active team
  • When a user switches teams, they see that team's shared documents
  • Users can have private documents (not team-shared) that follow them across teams
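The access rule behind the Eloquent query above reduces to a simple predicate. A minimal Python sketch of the same logic (field names taken from the schema; this is illustrative, not the actual implementation):

```python
def is_accessible(doc, user_id, active_team_id):
    """A document is visible if the user owns it, or if it is
    shared with the user's currently active team."""
    if doc["user_id"] == user_id:
        return True
    return doc["team_id"] == active_team_id and doc["is_team_shared"]

docs = [
    {"id": 1, "user_id": 7, "team_id": 1, "is_team_shared": False},  # own private doc
    {"id": 2, "user_id": 9, "team_id": 1, "is_team_shared": True},   # teammate's shared doc
    {"id": 3, "user_id": 9, "team_id": 2, "is_team_shared": True},   # another team's doc
]
visible = [d["id"] for d in docs if is_accessible(d, user_id=7, active_team_id=1)]
```

For a user on team 1, their own private document and the teammate's shared document are visible; the other team's shared document is not, until the user switches their active team.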

Run Migration

php artisan migrate

New Capability Registration

Seeder: QueryKnowledgeBaseCapabilitySeeder.php

Registers the query_knowledge_base capability in the database.

Run Seeder:

php artisan db:seed --class=QueryKnowledgeBaseCapabilitySeeder

Capability Details:

  • Slug: query_knowledge_base
  • Category: knowledge
  • Required Params: query
  • Optional Params: category, top_results, min_similarity
  • Credits: 2 per query


Code Components

1. QueryKnowledgeBaseCapability

Location: app/Extensions/ContentManager/System/Services/Capabilities/QueryKnowledgeBaseCapability.php

Key Features:

  • Vector similarity search using OpenAI embeddings
  • Scoped to the user's own plus team-shared documents
  • Category filtering (sales, technical, partner, compliance)
  • Configurable similarity threshold (default: 0.7)
  • Returns the top N most relevant chunks (default: 5)

Example Usage:

$capability = app(QueryKnowledgeBaseCapability::class);

$result = $capability->execute([
    'query' => 'AWS Marketplace listing best practices',
    'category' => 'compliance',
    'top_results' => 5,
    'min_similarity' => 0.7,
], $context, $execution);

// $result['results'] contains array of relevant document chunks
// Each with: id, content, similarity, source_file, category

2. Enhanced PdfData Model

Location: app/Models/PdfData.php

New Methods:

  • recordUsage() - increment the usage counter
  • scopeCategory($category) - filter by category
  • scopeTeamShared($teamId) - get a team's shared docs
  • scopeAccessibleBy($userId, $teamId) - get the user's accessible docs (own + team-shared)

Relationships:

  • user() - BelongsTo User
  • team() - BelongsTo Team

3. GenerateTextCapability Integration

Location: app/Extensions/ContentManager/System/Services/Capabilities/GenerateTextCapability.php

Changes:

  • Added a queryKnowledgeBase() method
  • Updated buildPrompt() to accept KB results
  • KB context is injected at the top of the prompt (highest priority)
  • The KB is only queried if the agent has the query_knowledge_base capability enabled

How It Works:

  1. Check if agent has query_knowledge_base in capabilities array
  2. If yes, build query from topic + content type
  3. Execute QueryKnowledgeBaseCapability
  4. Format results as "Reference Material" in prompt
  5. Instruct AI to cite facts from references
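Step 4 above, formatting the retrieved chunks into a "Reference Material" block, can be sketched as follows. This is an illustrative Python sketch (the real implementation is PHP) that produces the same shape as the example prompt structure in this guide:

```python
def format_kb_context(results):
    """Render retrieved chunks as a 'Relevant Knowledge Base Content'
    block: one numbered reference per chunk, with relevance and source."""
    if not results:
        return ""
    parts = ["# Relevant Knowledge Base Content\n"]
    for i, r in enumerate(results, start=1):
        parts.append(f"## Reference {i} (Relevance: {r['similarity'] * 100:.1f}%)")
        parts.append(f"Source: {r['source_file']}\n")
        parts.append(r["content"] + "\n")
        parts.append("---\n")
    parts.append("IMPORTANT: Use the above reference material when relevant...")
    return "\n".join(parts)

prompt_context = format_kb_context([
    {"similarity": 0.895, "source_file": "AWS_Marketplace_Guidelines.pdf",
     "content": "[content chunk here...]"},
])
```

The returned string is prepended to the brand-voice context and task sections when building the final prompt; an empty result list yields an empty string so the prompt is unchanged when no references are found.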

Example Prompt Structure:

# Relevant Knowledge Base Content

## Reference 1 (Relevance: 89.5%)
Source: AWS_Marketplace_Guidelines.pdf

[content chunk here...]

---

## Reference 2 (Relevance: 85.2%)
Source: Battle_Card_Competitor_X.pdf

[content chunk here...]

---

IMPORTANT: Use the above reference material when relevant...

# Brand Voice Context
Company: Your Company
Tone: Professional

# Your Task
Generate a medium email about: Competitive positioning against Competitor X
...

Usage Instructions

For Partners/Users

1. Upload Documents to Knowledge Base

Currently, upload via AI PDF Chat interface at: /dashboard/user/openai/generator/ai_pdf/workbook

Categories:

  • sales - battle cards, sales decks, case studies, objection handlers
  • technical - product docs, architecture diagrams, API specs
  • partner - partner materials, joint solution briefs, co-sell docs
  • compliance - AWS Marketplace guidelines, security compliance docs
  • general - anything else

2. Create Agent with Knowledge Base Access

When creating or editing an agent, enable the query_knowledge_base capability:

{
  "name": "Partner Email Writer",
  "capabilities": [
    "generate_text",
    "query_knowledge_base"
  ],
  "settings": {
    "kb_category": "sales",  // Optional: restrict to specific category
    "kb_results": 3,         // Optional: number of references to retrieve
    "kb_min_similarity": 0.7 // Optional: relevance threshold
  }
}

3. Execute Agent Task

Task: "Create a competitive positioning email for prospect Acme Corp highlighting our advantages over Competitor X"

Result: Agent will:
1. Query knowledge base for "competitive email Competitor X"
2. Find relevant battle cards, case studies
3. Generate email citing specific differentiation points
4. Include case study references where applicable

Testing

Test Scenario 1: Sales Email with Battle Card Reference

Setup:

  1. Upload Competitor_X_Battle_Card.pdf with category sales
  2. Create an agent with the generate_text + query_knowledge_base capabilities
  3. Execute the task: "Write sales email about our advantages over Competitor X"

Expected Result:

  • Email contains specific differentiation points from the battle card
  • References actual features/benefits from the uploaded doc
  • Maintains brand voice and tone

Test Scenario 2: AWS Marketplace Listing with Compliance Docs

Setup:

  1. Upload AWS_Marketplace_Guidelines.pdf with category compliance
  2. Upload 3 successful listing examples with category compliance
  3. Create an agent with marketplace rewriter capabilities + KB access
  4. Execute the task: "Rewrite our AWS Marketplace listing"

Expected Result:

  • Listing follows the AWS guidelines from the uploaded docs
  • Incorporates best practices from the successful examples
  • Meets compliance requirements

Test Scenario 3: Co-Sell Content with Partner Materials

Setup:

  1. Partner A uploads their product brochure (category: partner)
  2. Partner B uploads their positioning guide (category: partner)
  3. Create a co-sell agent with KB access
  4. Execute the task: "Generate joint solution brief"

Expected Result:

  • Solution brief combines information from both partners' docs
  • Identifies complementary capabilities
  • Maintains both partners' brand voices


API Reference

QueryKnowledgeBaseCapability

Parameters:

| Parameter      | Type    | Required | Default | Description                   |
|----------------|---------|----------|---------|-------------------------------|
| query          | string  | Yes      | -       | Search query text             |
| category       | string  | No       | null    | Filter by category            |
| top_results    | integer | No       | 5       | Number of results to return   |
| min_similarity | float   | No       | 0.7     | Minimum relevance score (0-1) |

Response:

{
  "success": true,
  "results": [
    {
      "id": 123,
      "content": "AWS Marketplace requires...",
      "similarity": 0.89,
      "source_file": "AWS_Guidelines.pdf",
      "category": "compliance",
      "created_at": "2025-11-01 12:00:00"
    }
  ],
  "context_found": true,
  "num_results": 3,
  "query": "AWS Marketplace listing best practices",
  "credits_used": 2
}

Configuration

Agent Settings

Configure KB behavior in agent settings:

{
  "default_context": {
    "kb_category": "sales",        // Default category filter
    "kb_results": 3,               // Results per query
    "kb_min_similarity": 0.75      // Relevance threshold
  }
}

Per-Request Overrides

Override in task parameters:

$agent->execute([
    'content_type' => 'email',
    'topic' => 'Competitive positioning',
    'kb_category' => 'technical',  // Override default category
    'kb_results' => 5,              // Get more results
    'use_knowledge_base' => true    // Explicitly enable/disable
]);

Performance Considerations

Vector Search Performance

  • Current: linear scan O(n) with cosine similarity
  • Recommended for >10K docs:
    • Use PostgreSQL with the pgvector extension
    • Or migrate to Pinecone/Weaviate
    • Or use the AWS OpenSearch vector engine

Embedding API Costs

  • Cost per query: ~$0.0001 (OpenAI Ada-002)
  • Cost per document upload: ~$0.0004 per 1000 tokens
  • Monthly estimate for 1000 docs + 10K queries: ~$2-3
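The monthly estimate above can be reproduced with back-of-the-envelope arithmetic using the per-unit costs quoted here. The 5,000-tokens-per-document average below is an assumption for illustration, not a figure from the implementation:

```python
# Per-unit costs quoted in this guide (OpenAI Ada-002)
COST_PER_QUERY = 0.0001              # embedding one search query
COST_PER_1K_UPLOAD_TOKENS = 0.0004   # embedding document content

# Assumed workload: 1000 docs at ~5000 tokens each (assumption), 10K queries
docs, avg_tokens_per_doc, queries = 1000, 5000, 10_000

upload_cost = docs * (avg_tokens_per_doc / 1000) * COST_PER_1K_UPLOAD_TOKENS
query_cost = queries * COST_PER_QUERY
monthly_total = upload_cost + query_cost  # roughly $3
```

Uploads dominate only when documents are long; at this workload the split is about $2 for embeddings of uploaded content and $1 for query embeddings, consistent with the ~$2-3 estimate.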

Context Window Management

  • Default: 3 results × 4000 chars = 12K chars context
  • Claude 3 Sonnet: 200K context window (plenty of room)
  • No truncation needed for reasonable KB sizes

Troubleshooting

Issue: Knowledge base not queried

Possible Causes:

  1. The agent doesn't have the query_knowledge_base capability enabled
  2. No documents uploaded for the user/team
  3. Documents don't match the category filter
  4. Similarity threshold too high

Solution:

// Check agent capabilities
$agent = Agent::find($agentId);
dd($agent->capabilities); // Should include 'query_knowledge_base'

// Check documents exist
$docs = PdfData::accessibleBy($userId, $teamId)->count();
dd($docs); // Should be > 0

// Lower similarity threshold
$agent->settings = ['kb_min_similarity' => 0.5];

Issue: Irrelevant results returned

Solution:

  • Increase the min_similarity threshold (default 0.7 → 0.8 or 0.9)
  • Use more specific categories
  • Improve document chunking (split on sections instead of fixed length)
  • Use more specific queries

Issue: Documents not found for team members

Solution:

// Ensure document is marked as team shared
$pdfData = PdfData::find($id);
$pdfData->update([
    'team_id' => $teamId,
    'is_team_shared' => true
]);


Future Enhancements

Phase 2 (Planned)

  1. Knowledge Base Management UI

    • Route: /dashboard/user/content-manager/knowledge-base
    • Upload with category selection
    • Team sharing toggle
    • Usage analytics dashboard
    • Search/filter interface
    • Document versioning

  2. Enhanced Categorization

    • Subcategories (sales → battle-cards, case-studies, decks)
    • Tags/keywords
    • Auto-categorization using AI

  3. Usage Analytics

    • Most referenced documents
    • Category usage breakdown
    • Agent KB effectiveness metrics

  4. Advanced Search

    • Hybrid search (vector + keyword)
    • Multi-query expansion
    • Re-ranking algorithms

Phase 3 (Future)

  1. Enterprise Features

    • Document access control lists (ACLs)
    • Audit logging
    • Document expiration dates
    • Version history

  2. AI Improvements

    • Fine-tuned embeddings for domain-specific content
    • Semantic chunking (vs fixed-length)
    • Cross-document summarization

Migration from Old System

If you have existing PDF chat data:

-- Update existing records with default values
UPDATE pdf_data
SET
    user_id = (SELECT user_id FROM user_openai_chats WHERE id = pdf_data.chat_id LIMIT 1),
    category = 'general',
    is_team_shared = false,
    usage_count = 0
WHERE user_id IS NULL;

FAQ

Q: Do I need to re-upload documents after migration?

A: No, existing documents will continue to work. They'll have category = NULL (or 'general', if you run the backfill SQL above) until you recategorize them.

Q: Can multiple agents share the same knowledge base?

A: Yes! All agents for a user/team access the same knowledge base. Category filtering allows specialization.

Q: How do I prevent an agent from using the knowledge base?

A: Remove query_knowledge_base from the agent's capabilities array, or set use_knowledge_base: false in task parameters.

Q: What file types are supported?

A: Currently PDFs. Future: DOCX, TXT, MD, PPTX, XLSX.

Q: How big can uploaded documents be?

A: No hard limit, but large docs are chunked into 4000-char segments. A 100-page PDF might create 50-100 chunks.
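The chunk math in this answer follows from simple fixed-length splitting. An illustrative Python sketch (the actual chunker targets 3500-4000 characters and may split on different boundaries):

```python
def chunk_text(text, max_chars=4000):
    """Naive fixed-length chunking: split the text into consecutive
    segments of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# A 100-page PDF at an assumed ~3000 chars/page is ~300K chars -> ~75 chunks
pdf_text = "x" * 300_000
chunks = chunk_text(pdf_text)
```

At 4000-character segments, 300K characters yields 75 chunks, squarely inside the 50-100 range quoted above; each chunk then gets its own embedding vector at upload time.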

Q: Are embeddings recalculated when documents are updated?

A: No automatic re-embedding yet. Delete and re-upload for now. Version management coming in Phase 2.


Resources

  • OpenAI Embeddings API: https://platform.openai.com/docs/guides/embeddings
  • Vector Search Best Practices: https://www.pinecone.io/learn/vector-search/
  • AWS Bedrock Claude Models: https://docs.aws.amazon.com/bedrock/latest/userguide/claude.html

Support

For issues or questions:

  1. Check the troubleshooting section above
  2. Review logs in storage/logs/laravel.log
  3. Search for [Query Knowledge Base] log entries
  4. Check the agent execution history for KB query results


Document Version: 1.0
Last Updated: 2025-11-17
Implementation Status: ✅ Complete