Knowledge Base Implementation Guide¶
AI-Powered Partner GTM Content Generation¶
Date: 2025-11-17 (Updated: 2026-01-09) Status: Implementation Complete
Overview¶
This guide documents the implementation of the Knowledge Base system that allows agents to automatically reference uploaded documents (PDFs, battle cards, AWS docs, etc.) when generating content.
What Was Implemented¶
- QueryKnowledgeBaseCapability - New agent capability for semantic search
- Enhanced PdfData Model - Added categorization, team sharing, and usage tracking
- GenerateTextCapability Integration - Automatic knowledge base querying during content generation
- Database Enhancements - New columns for knowledge base organization
Architecture¶
How It Works¶
User uploads document (PDF, battle card, case study)
↓
Content chunked into 3500-4000 char segments
↓
Each chunk converted to 1536-dim embedding vector (OpenAI Ada-002)
↓
Stored in pdf_data table with metadata (category, file_name, user_id, team_id)
↓
Agent receives task: "Create competitive email"
↓
GenerateTextCapability checks if agent has query_knowledge_base capability
↓
If yes: Query knowledge base with topic
↓
Vector similarity search finds top 3-5 most relevant chunks
↓
Chunks injected into AI prompt as reference material
↓
AI generates content citing specific facts from knowledge base
↓
Output: Context-aware, fact-based content
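The chunking step in the flow above can be sketched as follows. This is a minimal, language-agnostic illustration in Python (the production code is PHP, and the exact boundary logic is an assumption); it targets the 3500-4000 character segment size described above, preferring paragraph boundaries so chunks stay semantically coherent:

```python
def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into segments of at most max_chars characters,
    breaking on paragraph boundaries where possible."""
    chunks = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = (current + "\n\n" + paragraph).strip() if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single oversized paragraph is hard-split at max_chars
            while len(paragraph) > max_chars:
                chunks.append(paragraph[:max_chars])
                paragraph = paragraph[max_chars:]
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then be embedded (1536-dim Ada-002 vector) and stored as one row in pdf_data.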
Database Changes¶
New Migration: 2025_11_17_000002_enhance_pdf_data_for_knowledge_base.php¶
Adds the following columns to pdf_data table:
| Column | Type | Purpose |
|---|---|---|
| user_id | integer | Owner of the document |
| team_id | integer | Team that owns the document (from user's active team) |
| category | string(50) | Document type (sales, technical, partner, compliance) |
| file_name | string | Original filename for reference |
| is_team_shared | boolean | Allow team members to access |
| usage_count | integer | Track how often document is referenced |
| last_used_at | timestamp | Last time document was queried |
Indexes added:
- (user_id, category) - Fast category filtering
- (team_id, is_team_shared) - Team access queries
- last_used_at - Usage analytics
Team-Scoped Document Access¶
With the team-centric architecture, documents are scoped to teams:
┌─────────────────────────────────────────────────────────────────────┐
│ DOCUMENT ACCESS MODEL │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ User's Active Team (active_team_id) │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Documents visible to user: │ │
│ │ │ │
│ │ 1. Own documents (user_id = current_user) │ │
│ │ 2. Team-shared docs (team_id = active_team AND │ │
│ │ is_team_shared = true) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ Note: Users belonging to multiple teams only see documents │
│ from their ACTIVE team context. They can switch teams to │
│ access other team's shared documents. │
│ │
└─────────────────────────────────────────────────────────────────────┘
Query Logic:
// Get documents accessible by current user in active team context
$activeTeamId = auth()->user()->activeTeam()?->id;
PdfData::where('user_id', auth()->id())
->orWhere(function ($query) use ($activeTeamId) {
$query->where('team_id', $activeTeamId)
->where('is_team_shared', true);
})
->get();
Multi-Team Considerations:
- Documents uploaded are automatically assigned team_id from user's active team
- When user switches teams, they see that team's shared documents
- Users can have private documents (not team-shared) that follow them across teams
Run Migration¶
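Assuming a standard Laravel setup, the migration runs with the usual artisan command:

```shell
php artisan migrate
```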
New Capability Registration¶
Seeder: QueryKnowledgeBaseCapabilitySeeder.php¶
Registers the query_knowledge_base capability in the database.
Run Seeder:
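Assuming the standard Laravel seeder workflow, with the seeder class named above:

```shell
php artisan db:seed --class=QueryKnowledgeBaseCapabilitySeeder
```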
Capability Details:
- Slug: query_knowledge_base
- Category: knowledge
- Required Params: query
- Optional Params: category, top_results, min_similarity
- Credits: 2 per query
Code Components¶
1. QueryKnowledgeBaseCapability¶
Location: app/Extensions/ContentManager/System/Services/Capabilities/QueryKnowledgeBaseCapability.php
Key Features:
- Vector similarity search using OpenAI embeddings
- Scoped to user + team-shared documents
- Category filtering (sales, technical, partner, compliance)
- Configurable similarity threshold (default: 0.7)
- Returns top N most relevant chunks (default: 5)
Example Usage:
$capability = app(QueryKnowledgeBaseCapability::class);
$result = $capability->execute([
'query' => 'AWS Marketplace listing best practices',
'category' => 'compliance',
'top_results' => 5,
'min_similarity' => 0.7,
], $context, $execution);
// $result['results'] contains array of relevant document chunks
// Each with: id, content, similarity, source_file, category
2. Enhanced PdfData Model¶
Location: app/Models/PdfData.php
New Methods:
- recordUsage() - Increment usage counter
- scopeCategory($category) - Filter by category
- scopeTeamShared($teamId) - Get team's shared docs
- scopeAccessibleBy($userId, $teamId) - Get user's accessible docs (own + team shared)
Relationships:
- user() - BelongsTo User
- team() - BelongsTo Team
3. GenerateTextCapability Integration¶
Location: app/Extensions/ContentManager/System/Services/Capabilities/GenerateTextCapability.php
Changes:
- Added queryKnowledgeBase() method
- Updated buildPrompt() to accept KB results
- KB context injected at top of prompt (highest priority)
- Only queries KB if agent has query_knowledge_base capability enabled
How It Works:
1. Check if agent has query_knowledge_base in its capabilities array
2. If yes, build a query from topic + content type
3. Execute QueryKnowledgeBaseCapability
4. Format results as "Reference Material" in the prompt
5. Instruct the AI to cite facts from the references
Example Prompt Structure:
# Relevant Knowledge Base Content
## Reference 1 (Relevance: 89.5%)
Source: AWS_Marketplace_Guidelines.pdf
[content chunk here...]
---
## Reference 2 (Relevance: 85.2%)
Source: Battle_Card_Competitor_X.pdf
[content chunk here...]
---
IMPORTANT: Use the above reference material when relevant...
# Brand Voice Context
Company: Your Company
Tone: Professional
# Your Task
Generate a medium email about: Competitive positioning against Competitor X
...
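The "Reference Material" section above could be assembled from KB results with logic along these lines. This is a Python sketch of the formatting step only (the actual implementation is PHP inside GenerateTextCapability); field names follow the response shape documented in the API Reference below:

```python
def format_kb_references(results: list[dict]) -> str:
    """Render knowledge base results as a prompt section, mirroring the
    structure shown above (relevance percentage + source file + chunk)."""
    if not results:
        return ""
    sections = ["# Relevant Knowledge Base Content"]
    for i, r in enumerate(results, start=1):
        sections.append(
            f"## Reference {i} (Relevance: {r['similarity'] * 100:.1f}%)\n"
            f"Source: {r['source_file']}\n"
            f"{r['content']}\n---"
        )
    sections.append("IMPORTANT: Use the above reference material when relevant.")
    return "\n".join(sections)
```

The formatted string is prepended to the rest of the prompt (brand voice, task) so the references get the highest priority in context.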
Usage Instructions¶
For Partners/Users¶
1. Upload Documents to Knowledge Base¶
Currently, upload via AI PDF Chat interface at:
/dashboard/user/openai/generator/ai_pdf/workbook
Categories:
- sales - Battle cards, sales decks, case studies, objection handlers
- technical - Product docs, architecture diagrams, API specs
- partner - Partner materials, joint solution briefs, co-sell docs
- compliance - AWS Marketplace guidelines, security compliance docs
- general - Anything else
2. Create Agent with Knowledge Base Access¶
When creating or editing an agent, enable the query_knowledge_base capability:
{
"name": "Partner Email Writer",
"capabilities": [
"generate_text",
"query_knowledge_base"
],
"settings": {
"kb_category": "sales", // Optional: restrict to specific category
"kb_results": 3, // Optional: number of references to retrieve
"kb_min_similarity": 0.7 // Optional: relevance threshold
}
}
3. Execute Agent Task¶
Task: "Create a competitive positioning email for prospect Acme Corp highlighting our advantages over Competitor X"
Result: Agent will:
1. Query knowledge base for "competitive email Competitor X"
2. Find relevant battle cards, case studies
3. Generate email citing specific differentiation points
4. Include case study references where applicable
Testing¶
Test Scenario 1: Sales Email with Battle Card Reference¶
Setup:
1. Upload Competitor_X_Battle_Card.pdf with category sales
2. Create agent with generate_text + query_knowledge_base capabilities
3. Execute task: "Write sales email about our advantages over Competitor X"
Expected Result:
- Email contains specific differentiation points from the battle card
- References actual features/benefits from the uploaded doc
- Maintains brand voice and tone
Test Scenario 2: AWS Marketplace Listing with Compliance Docs¶
Setup:
1. Upload AWS_Marketplace_Guidelines.pdf with category compliance
2. Upload 3 successful listing examples with category compliance
3. Create agent with marketplace rewriter capabilities + KB
4. Execute task: "Rewrite our AWS Marketplace listing"
Expected Result:
- Listing follows AWS guidelines from uploaded docs
- Incorporates best practices from successful examples
- Meets compliance requirements
Test Scenario 3: Co-Sell Content with Partner Materials¶
Setup:
1. Partner A uploads their product brochure (category: partner)
2. Partner B uploads their positioning guide (category: partner)
3. Create co-sell agent with KB access
4. Execute task: "Generate joint solution brief"
Expected Result:
- Solution brief combines info from both partners' docs
- Identifies complementary capabilities
- Maintains both partners' brand voices
API Reference¶
QueryKnowledgeBaseCapability¶
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | Search query text |
| category | string | No | null | Filter by category |
| top_results | integer | No | 5 | Number of results to return |
| min_similarity | float | No | 0.7 | Minimum relevance score (0-1) |
Response:
{
"success": true,
"results": [
{
"id": 123,
"content": "AWS Marketplace requires...",
"similarity": 0.89,
"source_file": "AWS_Guidelines.pdf",
"category": "compliance",
"created_at": "2025-11-01 12:00:00"
}
],
"context_found": true,
"num_results": 3,
"query": "AWS Marketplace listing best practices",
"credits_used": 2
}
Configuration¶
Agent Settings¶
Configure KB behavior in agent settings:
{
"default_context": {
"kb_category": "sales", // Default category filter
"kb_results": 3, // Results per query
"kb_min_similarity": 0.75 // Relevance threshold
}
}
Per-Request Overrides¶
Override in task parameters:
$agent->execute([
'content_type' => 'email',
'topic' => 'Competitive positioning',
'kb_category' => 'technical', // Override default category
'kb_results' => 5, // Get more results
'use_knowledge_base' => true // Explicitly enable/disable
]);
Performance Considerations¶
Vector Search Performance¶
- Current: Linear scan O(n) with cosine similarity
- Recommended for >10K docs:
  - Use PostgreSQL with the pgvector extension
  - Or migrate to Pinecone/Weaviate
  - Or use AWS OpenSearch vector engine
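The current linear-scan approach can be illustrated with a minimal sketch (Python for brevity; the real implementation runs in PHP against the pdf_data table, and the toy 2-dim embeddings stand in for the 1536-dim Ada-002 vectors):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, chunks, top_k=5, min_similarity=0.7):
    """O(n) scan: score every stored chunk, filter by threshold, keep top K."""
    scored = [
        {**chunk, "similarity": cosine_similarity(query_vec, chunk["embedding"])}
        for chunk in chunks
    ]
    relevant = [c for c in scored if c["similarity"] >= min_similarity]
    return sorted(relevant, key=lambda c: c["similarity"], reverse=True)[:top_k]
```

Since every query touches every row, latency grows linearly with document count, which is why an indexed vector store is recommended beyond ~10K documents.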
Embedding API Costs¶
- Cost per query: ~$0.0001 (OpenAI Ada-002)
- Cost per document upload: ~$0.0004 per 1000 tokens
- Monthly estimate for 1000 docs + 10K queries: ~$2-3
Context Window Management¶
- Default: 3 results × 4000 chars = 12K chars context
- Claude 3 Sonnet: 200K context window (plenty of room)
- No truncation needed for reasonable KB sizes
Troubleshooting¶
Issue: Knowledge base not queried¶
Possible Causes:
1. Agent doesn't have query_knowledge_base capability enabled
2. No documents uploaded for user/team
3. Documents don't match category filter
4. Similarity threshold too high
Solution:
// Check agent capabilities
$agent = Agent::find($agentId);
dd($agent->capabilities); // Should include 'query_knowledge_base'
// Check documents exist
$docs = PdfData::accessibleBy($userId, $teamId)->count();
dd($docs); // Should be > 0
// Lower similarity threshold (merge so other settings are preserved)
$agent->settings = array_merge($agent->settings ?? [], ['kb_min_similarity' => 0.5]);
$agent->save();
Issue: Irrelevant results returned¶
Solution:
- Increase min_similarity threshold (default 0.7 → 0.8 or 0.9)
- Use more specific categories
- Improve document chunking (split on sections vs fixed length)
- Add more specific queries
Issue: Documents not found for team members¶
Solution:
// Ensure document is marked as team shared
$pdfData = PdfData::find($id);
$pdfData->update([
'team_id' => $teamId,
'is_team_shared' => true
]);
Future Enhancements¶
Phase 2 (Planned)¶
- Knowledge Base Management UI
  - Route: /dashboard/user/content-manager/knowledge-base
  - Features:
    - Upload with category selection
    - Team sharing toggle
    - Usage analytics dashboard
    - Search/filter interface
    - Document versioning
- Enhanced Categorization
  - Subcategories (sales → battle-cards, case-studies, decks)
  - Tags/keywords
  - Auto-categorization using AI
- Usage Analytics
  - Most referenced documents
  - Category usage breakdown
  - Agent KB effectiveness metrics
- Advanced Search
  - Hybrid search (vector + keyword)
  - Multi-query expansion
  - Re-ranking algorithms
Phase 3 (Future)¶
- Enterprise Features
  - Document access control lists (ACLs)
  - Audit logging
  - Document expiration dates
  - Version history
- AI Improvements
  - Fine-tuned embeddings for domain-specific content
  - Semantic chunking (vs fixed-length)
  - Cross-document summarization
Migration from Old System¶
If you have existing PDF chat data:¶
-- Update existing records with default values
UPDATE pdf_data
SET
user_id = (SELECT user_id FROM user_openai_chats WHERE id = pdf_data.chat_id LIMIT 1),
category = 'general',
is_team_shared = false,
usage_count = 0
WHERE user_id IS NULL;
FAQ¶
Q: Do I need to re-upload documents after migration?¶
A: No, existing documents will continue to work. Until you run the backfill SQL in "Migration from Old System" they have category = NULL; the backfill sets them to 'general', and you can re-categorize them afterwards.
Q: Can multiple agents share the same knowledge base?¶
A: Yes! All agents for a user/team access the same knowledge base. Category filtering allows specialization.
Q: How do I prevent an agent from using the knowledge base?¶
A: Remove query_knowledge_base from the agent's capabilities array, or set use_knowledge_base: false in task parameters.
Q: What file types are supported?¶
A: Currently PDFs. Future: DOCX, TXT, MD, PPTX, XLSX.
Q: How big can uploaded documents be?¶
A: No hard limit, but large docs are chunked into 3500-4000-char segments. A 100-page PDF might create 50-100 chunks.
Q: Are embeddings recalculated when documents are updated?¶
A: No automatic re-embedding yet. Delete and re-upload for now. Version management coming in Phase 2.
Resources¶
- OpenAI Embeddings API: https://platform.openai.com/docs/guides/embeddings
- Vector Search Best Practices: https://www.pinecone.io/learn/vector-search/
- AWS Bedrock Claude Models: https://docs.aws.amazon.com/bedrock/latest/userguide/claude.html
Support¶
For issues or questions:
1. Check troubleshooting section above
2. Review logs in storage/logs/laravel.log
3. Search for [Query Knowledge Base] log entries
4. Check agent execution history for KB query results
Document Version: 1.0 Last Updated: 2025-11-17 Implementation Status: ✅ Complete