# Documentation Reorganization Plan

## RAG-Optimized Knowledge Architecture

- **Created:** 2026-01-09
- **Status:** Planning
- **Goal:** Transform scattered documentation into a structured knowledge base optimized for both human navigation and AI-powered retrieval (RAG and chatbot training).
## Executive Summary

### Current State
- 103 markdown files in `/docs`
- Mixed naming conventions (CAPS for internal docs vs lowercase for user-facing docs)
- No consistent metadata for RAG indexing
- Scattered across root, extensions, and subdirectories
- Duplicate information across multiple files
- No Q&A extraction for chatbot training
### Target State
- Hierarchical taxonomy with clear categories
- Standardized frontmatter for RAG metadata
- Extracted Q&A pairs for chatbot training
- Audience-tagged content (partner vs internal vs admin)
- Single source of truth per topic
- Automated sync to Bedrock Knowledge Base
## Part 1: Documentation Taxonomy

### Proposed Directory Structure
```text
docs/
├── index.md # Documentation home
├── manifest.json # RAG metadata index
│
├── getting-started/ # Onboarding (Partner-facing)
│ ├── index.md
│ ├── quick-start.md
│ ├── first-listing.md
│ └── account-setup.md
│
├── features/ # Feature Documentation (Partner-facing)
│ ├── index.md
│ ├── marketplace-readiness/
│ │ ├── solution-discovery.md
│ │ ├── solution-brief-generator.md
│ │ ├── icp-builder.md
│ │ ├── listing-optimizer.md
│ │ └── launch-readiness-score.md
│ ├── content-studio/
│ │ ├── listing-content-studio.md
│ │ ├── offering-analyzer.md
│ │ ├── agreement-analyzer.md
│ │ └── product-description-wizard.md
│ ├── marketing-social/
│ │ ├── campaign-dashboard.md
│ │ ├── social-media-posts.md
│ │ ├── video-intelligence.md
│ │ ├── case-study-generator.md
│ │ └── linkedin-insights.md # ✅ Created
│ ├── cosell-hub/
│ │ ├── index.md
│ │ ├── partner-discovery.md
│ │ ├── joint-gtm-plans.md
│ │ ├── ace-opportunity-sync.md
│ │ ├── cppo-proposals.md
│ │ └── partnership-analytics.md
│ ├── aws-connections/
│ │ ├── index.md
│ │ ├── setup-wizard.md
│ │ ├── marketplace-listings.md
│ │ └── pipeline-tracking.md
│ ├── ai-agents/
│ │ ├── index.md
│ │ ├── agent-builder.md
│ │ ├── available-capabilities.md
│ │ └── execution-history.md
│ ├── gtm-workflows/
│ │ ├── index.md
│ │ ├── visual-creator.md
│ │ ├── content-strategist.md
│ │ ├── market-analyst.md
│ │ ├── technical-builder.md
│ │ └── solution-advisor.md
│ └── optimization-lab/
│ ├── ab-testing.md
│ ├── content-variants.md
│ └── performance-simulator.md
│
├── guides/ # How-To Guides (Partner-facing)
│ ├── index.md
│ ├── launching-on-aws-marketplace.md
│ ├── optimizing-listing-seo.md
│ ├── setting-up-cosell-partnerships.md
│ ├── connecting-linkedin.md
│ ├── generating-case-studies.md
│ └── using-brand-voice.md
│
├── comparisons/ # DIY vs Vellocity (Partner-facing)
│ ├── index.md
│ ├── build-vs-buy-overview.md
│ ├── linkedin-analysis-comparison.md
│ ├── listing-optimization-comparison.md
│ ├── cosell-management-comparison.md
│ └── content-generation-comparison.md
│
├── api/ # API Documentation (Developer-facing)
│ ├── index.md
│ ├── authentication.md
│ ├── partner-api.md
│ ├── cosell-api.md
│ ├── webhooks.md
│ └── rate-limits.md
│
├── integrations/ # Third-Party Integrations
│ ├── index.md
│ ├── aws-marketplace.md
│ ├── aws-partner-central.md
│ ├── linkedin.md
│ ├── hubspot.md
│ └── knowledge-bases.md
│
├── admin/ # Admin Documentation (Internal)
│ ├── index.md
│ ├── user-management.md
│ ├── guardrails.md
│ ├── knowledge-bases.md
│ └── capability-configuration.md
│
├── architecture/ # Technical Architecture (Internal)
│ ├── index.md
│ ├── aws-deployment.md
│ ├── bedrock-integration.md
│ ├── data-flow.md
│ └── security-model.md
│
├── internal/ # Internal Only (Not for RAG)
│ ├── patents/
│ ├── audits/
│ ├── migrations/
│ └── roadmap/
│
├── faq/ # FAQ for Chatbot Training
│ ├── index.md
│ ├── general.md
│ ├── pricing.md
│ ├── features.md
│ ├── aws-integration.md
│ └── troubleshooting.md
│
└── changelog.md # Release notes
```
## Part 2: RAG-Optimized Document Format

### Frontmatter Schema
Every document should include standardized frontmatter for RAG indexing:
```yaml
---
# Document Metadata (Required)
title: "LinkedIn Insights"
slug: "linkedin-insights"
description: "Transform LinkedIn activity into GTM intelligence for AWS Marketplace partners"
category: "features/marketing-social"
version: "1.0"
last_updated: "2026-01-09"
# RAG Optimization (Required)
audience: ["partner", "customer"] # partner | customer | admin | developer | internal
keywords: ["linkedin", "social", "analytics", "icp", "attribution", "gtm"]
capability_slug: "linkedin_graph_analysis" # Links to capability system
plan_tier: "command_plus" # free | launch | scale | command | command_plus
# Chatbot Training (Optional)
qa_pairs:
  - q: "What is LinkedIn Insights?"
    a: "LinkedIn Insights analyzes your LinkedIn activity to provide GTM intelligence including content performance, audience alignment, and AWS Marketplace attribution."
  - q: "How much does LinkedIn Insights cost?"
    a: "LinkedIn Insights is included in the Command+ plan at $799/month. Analysis costs 5-15 credits depending on the type."
  - q: "Can LinkedIn Insights show which posts drive AWS Marketplace trials?"
    a: "Yes, the Content-to-Marketplace Correlation analysis shows which LinkedIn posts correlate with PDP traffic spikes, demo requests, and Private Offer inquiries."
# Navigation (Optional)
parent: "features/marketing-social"
siblings: ["video-intelligence", "case-study-generator", "campaign-dashboard"]
related: ["cosell-hub/partner-discovery", "guides/connecting-linkedin"]
# Status (Optional)
status: "published" # draft | review | published | deprecated
feature_status: "ga" # beta | ga | deprecated
---
```
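A minimal validation sketch, assuming the frontmatter has already been parsed into an associative array. It checks that the "(Required)" fields are present and that the enumerated fields only use the values listed in the schema comments. The function name `validateFrontmatter` is hypothetical; it would typically run in CI before the sync step.

```php
<?php
// Hypothetical validator for the frontmatter schema above (input: parsed YAML array).

const REQUIRED_FIELDS = ['title', 'slug', 'description', 'category', 'version',
    'last_updated', 'audience', 'keywords', 'capability_slug', 'plan_tier'];

// Allowed values, copied from the schema comments.
const ALLOWED = [
    'audience' => ['partner', 'customer', 'admin', 'developer', 'internal'],
    'plan_tier' => ['free', 'launch', 'scale', 'command', 'command_plus'],
    'status' => ['draft', 'review', 'published', 'deprecated'],
    'feature_status' => ['beta', 'ga', 'deprecated'],
];

/** Returns a list of problems; an empty list means the frontmatter passes. */
function validateFrontmatter(array $fm): array
{
    $errors = [];
    foreach (REQUIRED_FIELDS as $field) {
        if (!array_key_exists($field, $fm)) {
            $errors[] = "missing required field: $field";
        }
    }
    foreach (ALLOWED as $field => $allowed) {
        if (!isset($fm[$field])) {
            continue; // absent optional fields are fine; absent required ones were reported above
        }
        // (array) lets one loop handle both lists (audience) and scalars (plan_tier)
        foreach ((array) $fm[$field] as $value) {
            if (!in_array($value, $allowed, true)) {
                $errors[] = "invalid $field value: $value";
            }
        }
    }
    foreach ($fm['qa_pairs'] ?? [] as $i => $pair) {
        if (empty($pair['q']) || empty($pair['a'])) {
            $errors[] = "qa_pairs[$i] needs both q and a";
        }
    }
    return $errors;
}
```

For example, a document with `plan_tier: "enterprise"` and no `slug` would yield two errors: one for the missing field and one for the unknown tier.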
## Part 3: Chatbot Training Data Extraction

### Strategy: Multi-Layer Training
```text
┌─────────────────────────────────────────────────────────────────┐
│ CHATBOT KNOWLEDGE LAYERS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer 1: Structured Q&A (Highest Priority) │
│ ├── Extracted from frontmatter qa_pairs │
│ ├── Manually curated FAQ documents │
│ └── High-confidence, direct answers │
│ │
│ Layer 2: Document Chunks (RAG Retrieval) │
│ ├── Full document content, chunked by section │
│ ├── Semantic search for relevant context │
│ └── Used when no direct Q&A match │
│ │
│ Layer 3: Capability Metadata (Structured Data) │
│ ├── Feature names, descriptions, pricing │
│ ├── Plan tier requirements │
│ └── Credit costs and limits │
│ │
└─────────────────────────────────────────────────────────────────┘
```
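The precedence between Layers 1 and 2 can be sketched as follows, assuming a curated Q&A list and a retriever callback that stands in for the Bedrock KB query (the callback shape and function names are hypothetical). Layer 3 capability metadata is left out for brevity; it would be appended to the context in either branch.

```php
<?php
// Sketch: Layer 1 (curated Q&A) answers directly when the question matches;
// otherwise fall back to Layer 2 (document chunks from a retriever callback).

function normalizeQuestion(string $q): string
{
    // Lowercase, drop punctuation and collapse whitespace so trivial variants still match.
    $q = strtolower(preg_replace('/[^\w\s]+/u', '', $q));
    return trim(preg_replace('/\s+/', ' ', $q));
}

/**
 * @param array<int, array{q: string, a: string}> $qaPairs  Layer 1 data
 * @param callable(string): array<int, string>      $retrieve Layer 2 retriever (hypothetical)
 */
function resolveContext(string $query, array $qaPairs, callable $retrieve): array
{
    $needle = normalizeQuestion($query);
    foreach ($qaPairs as $pair) {
        if (normalizeQuestion($pair['q']) === $needle) {
            return ['layer' => 1, 'context' => [$pair['a']]];
        }
    }
    return ['layer' => 2, 'context' => $retrieve($query)];
}
```

Exact matching after normalization keeps Layer 1 "high-confidence"; a fuzzier match (for example embedding similarity above a threshold) would raise recall at the cost of occasionally returning a curated answer to a different question.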
### Q&A Extraction Sources

| Source | Type | Priority | Volume |
|---|---|---|---|
| Frontmatter `qa_pairs` | Explicit Q&A | 1 (Highest) | ~5-10 per doc |
| FAQ documents | Explicit Q&A | 1 | ~50-100 total |
| Document headings | Implicit Q (heading → Q, content → A) | 2 | Auto-extracted |
| Tables | Structured data | 2 | Auto-extracted |
| Comparison sections | DIY vs Vellocity | 2 | ~20-30 |
### Automated Q&A Extraction Script

Create `/scripts/extract-qa-for-chatbot.php`:
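```php
<?php
// Extracts Q&A pairs from documentation for chatbot training (assumes symfony/yaml is installed).
use Symfony\Component\Yaml\Yaml;
class DocumentationQAExtractor
{
    public function __construct(private string $docsRoot) {}

    public function extractFromFrontmatter(string $path): array
    {
        // Parse the YAML between the leading '---' lines and return its qa_pairs
        if (!preg_match('/\A---\R(.*?)\R---(?:\R|\z)/s', (string) file_get_contents($path), $m)) {
            return [];
        }
        return Yaml::parse($m[1])['qa_pairs'] ?? [];
    }

    public function extractFromHeadings(string $content): array
    {
        // H2/H3 heading → question candidate, section body → answer ("## How It Works" → "How does [feature] work?" in a later pass)
        preg_match_all('/^#{2,3}\h+(\V+)\R(.*?)(?=^#{1,3}\h|\z)/ms', $content, $m, PREG_SET_ORDER);
        return array_map(fn (array $s) => ['q' => trim($s[1]), 'a' => trim($s[2])], $m);
    }

    public function extractFromFAQ(string $path): array
    {
        // Parse the template's FAQ format: "**Q: Question?**" followed by "A: Answer"
        preg_match_all('/^\*\*Q:\h*(.+?)\*\*\h*\R\h*A:\h*(\V+)/m', (string) file_get_contents($path), $m, PREG_SET_ORDER);
        return array_map(fn (array $s) => ['q' => trim($s[1]), 'a' => trim($s[2])], $m);
    }

    public function generateTrainingData(): array
    {
        // Collect sources in priority order (frontmatter, FAQ, headings), skip internal/ (not for RAG), dedupe keeping the first copy
        $sources = [[], [], []];
        $files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($this->docsRoot, FilesystemIterator::SKIP_DOTS));
        foreach ($files as $file) {
            $path = $file->getPathname();
            if ($file->getExtension() !== 'md' || str_contains($path, '/internal/')) {
                continue;
            }
            array_push($sources[0], ...$this->extractFromFrontmatter($path));
            array_push($sources[1], ...$this->extractFromFAQ($path));
            array_push($sources[2], ...$this->extractFromHeadings((string) file_get_contents($path)));
        }
        $unique = [];
        foreach (array_merge(...$sources) as $pair) {
            $unique[strtolower(trim($pair['q'] ?? ''))] ??= $pair;
        }
        unset($unique['']);
        return array_values($unique); // rows to hand to ChatbotTrainingController
    }
}
```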
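The rewording step that `extractFromHeadings()` leaves for later could start as a small rule table. This is a sketch with a hypothetical helper, `headingToQuestion`; the patterns are keyed to headings used in the Part 6 template and are illustrative, not exhaustive.

```php
<?php
// Hypothetical helper: turn a section heading into a natural-language question.

function headingToQuestion(string $heading, string $feature): string
{
    $heading = trim(preg_replace('/^#+\s*/', '', $heading)); // drop leading "##"
    $rules = [
        '/^How It Works$/i' => "How does $feature work?",
        '/^Overview$/i' => "What is $feature?",
        '/^Key Capabilities$/i' => "What can $feature do?",
        '/^Getting Started$/i' => "How do I get started with $feature?",
        '/^Prerequisites$/i' => "What do I need before using $feature?",
    ];
    foreach ($rules as $pattern => $question) {
        if (preg_match($pattern, $heading)) {
            return $question;
        }
    }
    // Fallback for headings without a rule; these are good candidates for manual curation.
    return "What should I know about \"$heading\" in $feature?";
}
```

Questions produced by the fallback branch are the ones most worth reviewing during Phase 4's curation pass.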
## Part 4: Migration Plan

### Phase 1: Audit & Categorize (Week 1)
| Task | Files | Effort |
|---|---|---|
| Categorize existing 103 docs | All | 4 hours |
| Identify duplicates | ~20 expected | 2 hours |
| Map docs to new taxonomy | All | 2 hours |
| Identify gaps | New docs needed | 1 hour |
Deliverable: `docs/internal/migrations/categorization-map.json`
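One possible shape for the categorization map is sketched below: one entry per existing file, with its target path, its action, and a flag for CAPS or underscore names that break the lowercase kebab-case convention. The JSON layout and function names are assumptions; the sample rows come from the appendix.

```php
<?php
// Sketch: build categorization-map.json entries (JSON shape is hypothetical).

function toKebabCase(string $filename): string
{
    // CAPABILITIES_MATRIX_CURRENT.md -> capabilities-matrix-current.md
    $base = pathinfo($filename, PATHINFO_FILENAME);
    $ext = pathinfo($filename, PATHINFO_EXTENSION);
    $slug = strtolower(trim(preg_replace('/[^A-Za-z0-9]+/', '-', $base), '-'));
    return $ext === '' ? $slug : "$slug.$ext";
}

/** @param array<int, array{0: string, 1: ?string, 2: string}> $rows [current, target, action] */
function buildCategorizationMap(array $rows): string
{
    $entries = array_map(fn (array $r) => [
        'current' => $r[0],
        'target' => $r[1], // null when the document is deprecated
        'action' => $r[2],
        // true when the current name does not already follow lowercase kebab-case
        'needs_rename' => toKebabCase(basename($r[0])) !== basename($r[0]),
    ], $rows);
    return json_encode(['entries' => $entries], JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
}

$map = buildCategorizationMap([
    ['CAPABILITIES_MATRIX_CURRENT.md', 'internal/capabilities-matrix.md', 'Move'],
    ['partner-value-proposition.md', 'comparisons/build-vs-buy-overview.md', 'Rename + Move'],
    ['CAPABILITIES_MATRIX_AUDIT.md', null, 'Deprecate'],
]);
```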
### Phase 2: Create New Structure (Week 1-2)
| Task | Files | Effort |
|---|---|---|
| Create directory structure | 15 directories | 1 hour |
| Create index.md files | 15 files | 3 hours |
| Move partner-facing docs | ~40 files | 2 hours |
| Move internal docs | ~30 files | 1 hour |
| Archive deprecated docs | ~10 files | 1 hour |
Deliverable: New directory structure with moved files
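The "create directory structure" and "create index.md files" tasks could be scripted along these lines. This is a sketch: `scaffoldDocs` is a hypothetical helper, the directory lists mirror the Part 1 tree, and the stub frontmatter is a placeholder to be filled in during Phase 3.

```php
<?php
// Sketch: create the taxonomy directories and seed index.md stubs where Part 1 lists one.

function scaffoldDocs(string $root, array $dirsWithIndex, array $plainDirs): array
{
    $created = [];
    foreach (array_merge($dirsWithIndex, $plainDirs) as $dir) {
        $path = rtrim($root, '/') . '/' . $dir;
        if (!is_dir($path) && !mkdir($path, 0775, true)) {
            throw new RuntimeException("could not create $path");
        }
        $created[] = $dir;
    }
    foreach ($dirsWithIndex as $dir) {
        $index = rtrim($root, '/') . "/$dir/index.md";
        if (!file_exists($index)) {
            // Placeholder frontmatter; title and description are filled in during Phase 3.
            file_put_contents($index, "---\ntitle: \"\"\nstatus: \"draft\"\n---\n");
        }
    }
    return $created;
}

$withIndex = ['getting-started', 'features', 'features/cosell-hub', 'features/aws-connections',
    'features/ai-agents', 'features/gtm-workflows', 'guides', 'comparisons', 'api',
    'integrations', 'admin', 'architecture', 'faq'];
$withoutIndex = ['features/marketplace-readiness', 'features/content-studio',
    'features/marketing-social', 'features/optimization-lab',
    'internal/patents', 'internal/audits', 'internal/migrations', 'internal/roadmap'];
```

Existing files are never overwritten, so the script is safe to re-run after files have been moved in.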
### Phase 3: Add Frontmatter (Week 2)
| Task | Files | Effort |
|---|---|---|
| Add frontmatter to partner docs | ~50 files | 8 hours |
| Add frontmatter to internal docs | ~30 files | 4 hours |
| Add Q&A pairs to key docs | ~20 files | 4 hours |
| Create manifest.json | 1 file | 2 hours |
Deliverable: All docs have standardized frontmatter
### Phase 4: Q&A Extraction (Week 2-3)
| Task | Effort |
|---|---|
| Build extraction script | 4 hours |
| Run extraction on all docs | 1 hour |
| Review and curate Q&A pairs | 4 hours |
| Import to chatbot training | 2 hours |
Deliverable: Q&A training data ready for chatbot
### Phase 5: Bedrock Knowledge Base Sync (Week 3)
| Task | Effort |
|---|---|
| Configure S3 sync for docs | 2 hours |
| Set up Bedrock KB data source | 2 hours |
| Test RAG retrieval | 2 hours |
| Tune chunking strategy | 2 hours |
Deliverable: Documentation synced to Bedrock KB
## Part 5: RAG Retrieval Architecture

### Integration with Existing Chatbot Pro
```text
┌─────────────────────────────────────────────────────────────────┐
│ CHATBOT PRO + DOCS RAG │
├─────────────────────────────────────────────────────────────────┤
│ │
│ User Query: "How do I set up LinkedIn Insights?" │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Query Classification │ │
│ │ - Is this a FAQ? → Check Q&A pairs first │ │
│ │ - Is this feature-specific? → Check capability docs │ │
│ │ - Is this how-to? → Check guides │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Bedrock Knowledge Base Query │ │
│ │ - Semantic search across documentation │ │
│ │ - Filter by audience (partner/admin/internal) │ │
│ │ - Boost by keyword match │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Context Assembly │ │
│ │ - Top 3-5 relevant chunks │ │
│ │ - Q&A pairs if matched │ │
│ │ - Capability metadata │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Claude Response Generation │ │
│ │ - Grounded in retrieved context │ │
│ │ - Cites sources (doc links) │ │
│ │ - Follows brand voice │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
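The audience filter, keyword boost, and top-K selection from the diagram can be sketched as one function. Assumptions: each retrieved chunk carries a semantic `score` plus the `audience` and `keywords` copied from its document's frontmatter, and a flat 0.05 boost per matching keyword. Both the field names and the weight are illustrative and would need tuning against the RAG targets in Part 9.

```php
<?php
// Sketch of the Context Assembly stage (chunk fields and boost weight are hypothetical).

function assembleContext(string $query, array $chunks, array $allowedAudiences, int $topK = 5): array
{
    // Tokenize the query into lowercase words for keyword matching.
    $terms = array_unique(preg_split('/\W+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY));

    // Audience filter: a chunk is eligible if it shares at least one audience tag.
    $eligible = array_filter(
        $chunks,
        fn (array $c) => array_intersect($allowedAudiences, $c['audience']) !== []
    );

    // Keyword boost layered on top of the semantic-search score.
    $scored = array_map(function (array $c) use ($terms) {
        $c['final_score'] = $c['score'] + 0.05 * count(array_intersect($terms, $c['keywords']));
        return $c;
    }, $eligible);

    usort($scored, fn (array $a, array $b) => $b['final_score'] <=> $a['final_score']);
    return array_slice($scored, 0, $topK);
}
```

Passing a list of allowed audiences, rather than a single one, lets an admin session see partner-facing chunks as well as admin ones.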
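### Bedrock KB Configuration

The block below is a consolidated view of the intended settings, not a literal API payload. In the Bedrock API, the embedding model is set on the knowledge base (`CreateKnowledgeBase`, as a model ARN), while the S3 source and chunking settings belong to the data source (`CreateDataSource`). Hierarchical chunking uses 1,500-token parent chunks and 300-token child chunks, with a 60-token overlap.

```json
{
  "knowledgeBase": {
    "name": "vellocity-documentation",
    "description": "Vellocity platform documentation for partner support",
    "dataSource": {
      "type": "S3",
      "s3Configuration": {
        "bucketArn": "arn:aws:s3:::vellocity-docs",
        "inclusionPrefixes": [
          "docs/features/",
          "docs/guides/",
          "docs/faq/",
          "docs/comparisons/",
          "docs/integrations/"
        ],
        "exclusionPrefixes": [
          "docs/internal/",
          "docs/architecture/"
        ]
      }
    },
    "chunkingConfiguration": {
      "chunkingStrategy": "HIERARCHICAL",
      "hierarchicalChunkingConfiguration": {
        "levelConfigurations": [
          { "maxTokens": 1500 },
          { "maxTokens": 300 }
        ],
        "overlapTokens": 60
      }
    },
    "embeddingModel": "amazon.titan-embed-text-v2:0"
  }
}
```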
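The "Filter by audience" step in Part 5 needs the frontmatter to reach the knowledge base as metadata. A sketch follows, assuming Bedrock's S3 convention of a sidecar file named `<source file>.metadata.json` containing a `metadataAttributes` object next to each document. Confirm the exact file naming and supported value types in the Bedrock documentation before relying on it. `metadataSidecar` and the chosen attribute list are hypothetical.

```php
<?php
// Sketch: derive a Bedrock KB metadata sidecar from a document's frontmatter.

function metadataSidecar(string $docPath, array $frontmatter): array
{
    // Only fields useful for retrieval-time filtering are forwarded.
    $keep = ['audience', 'category', 'plan_tier', 'capability_slug', 'feature_status'];
    $attributes = array_intersect_key($frontmatter, array_flip($keep));
    return [
        'path' => $docPath . '.metadata.json',
        // (object) makes an empty attribute set encode as {} rather than []
        'body' => json_encode(['metadataAttributes' => (object) $attributes], JSON_UNESCAPED_SLASHES),
    ];
}
```

The sync job would upload each sidecar alongside its Markdown file so that a metadata filter on `audience` can be applied at query time.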
## Part 6: Document Templates

### Feature Documentation Template
```markdown
---
title: "[Feature Name]"
slug: "[feature-slug]"
description: "[One-line description]"
category: "features/[subcategory]"
audience: ["partner", "customer"]
keywords: ["keyword1", "keyword2"]
capability_slug: "[capability_slug]"
plan_tier: "[tier]"
qa_pairs:
  - q: "What is [Feature Name]?"
    a: "[Brief answer]"
  - q: "How much does [Feature Name] cost?"
    a: "[Pricing answer]"
---
# [Feature Name]
> **Capability:** `[capability_slug]`
> **Plan Tier:** [Tier Name]
> **Credit Cost:** [X] credits
---
## Overview
[2-3 paragraphs explaining the feature and the problem it solves]
---
## Key Capabilities
### 1. [Capability 1]
[Description]
**What You Get:**
- Point 1
- Point 2
---
## Why This Matters for AWS Marketplace Partners
[Business value explanation]
---
## DIY vs. Vellocity Comparison
| Component | DIY Cost | DIY Time | Vellocity |
|-----------|----------|----------|-----------|
| [Component 1] | $X | Y weeks | Included |
---
## Getting Started
### Prerequisites
1. [Prereq 1]
2. [Prereq 2]
### Step 1: [First Step]
[Instructions]
---
## Frequently Asked Questions
**Q: [Question 1]?**
A: [Answer 1]
**Q: [Question 2]?**
A: [Answer 2]
---
## Related Documentation
- [Related Doc 1](link)
- [Related Doc 2](link)
---
*Last Updated: [Date]*
```
## Part 7: Manifest File (RAG Index)
Create `docs/manifest.json` for programmatic access. The example below is abbreviated: the `// ... more documents` line stands in for the remaining entries and is not valid JSON in the real file.

```jsonc
{
  "version": "1.0",
  "generated": "2026-01-09T12:00:00Z",
  "total_documents": 85,
  "categories": {
    "features": {
      "count": 35,
      "subcategories": ["marketplace-readiness", "content-studio", "marketing-social", "cosell-hub", "aws-connections", "ai-agents", "gtm-workflows", "optimization-lab"]
    },
    "guides": { "count": 10 },
    "comparisons": { "count": 6 },
    "faq": { "count": 6 },
    "api": { "count": 5 },
    "integrations": { "count": 6 }
  },
  "documents": [
    {
      "path": "features/marketing-social/linkedin-insights.md",
      "title": "LinkedIn Insights",
      "slug": "linkedin-insights",
      "audience": ["partner", "customer"],
      "capability_slug": "linkedin_graph_analysis",
      "plan_tier": "command_plus",
      "keywords": ["linkedin", "social", "analytics"],
      "qa_count": 12,
      "word_count": 2500,
      "last_updated": "2026-01-09"
    }
    // ... more documents
  ],
  "qa_pairs_total": 450,
  "capabilities_documented": 32
}
```
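Rather than maintaining the aggregate fields by hand, the manifest could be regenerated from per-document records. A sketch, assuming each record holds its path plus parsed frontmatter and computed counts such as `qa_count`; `buildManifest` is hypothetical and field names follow the example manifest.

```php
<?php
// Sketch: derive manifest aggregates from per-document records.

function buildManifest(array $docs, string $generatedAt): array
{
    $categories = [];
    $qaTotal = 0;
    $capabilities = [];
    foreach ($docs as $doc) {
        $parts = explode('/', $doc['path']);
        $categories[$parts[0]]['count'] = ($categories[$parts[0]]['count'] ?? 0) + 1;
        if (count($parts) > 2) {
            // e.g. features/marketing-social/linkedin-insights.md -> "marketing-social"
            $categories[$parts[0]]['subcategories'][$parts[1]] = true;
        }
        $qaTotal += $doc['qa_count'] ?? 0;
        if (!empty($doc['capability_slug'])) {
            $capabilities[$doc['capability_slug']] = true; // count distinct capabilities
        }
    }
    foreach ($categories as $name => $category) {
        if (isset($category['subcategories'])) {
            $categories[$name]['subcategories'] = array_keys($category['subcategories']);
        }
    }
    return [
        'version' => '1.0',
        'generated' => $generatedAt,
        'total_documents' => count($docs),
        'categories' => $categories,
        'documents' => $docs,
        'qa_pairs_total' => $qaTotal,
        'capabilities_documented' => count($capabilities),
    ];
}
```

Generating `total_documents`, the category counts, and `qa_pairs_total` from the same records keeps them consistent with each other; `json_encode($manifest, JSON_PRETTY_PRINT)` then writes the file.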
## Part 8: Sync to Chatbot Training

### Automated Sync Command

Create an Artisan command, `docs:sync-to-chatbot`:
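```php
<?php
// app/Console/Commands/SyncDocsToKnowledgeBase.php

namespace App\Console\Commands;

use Illuminate\Console\Command;

class SyncDocsToKnowledgeBase extends Command
{
    protected $signature = 'docs:sync-to-chatbot
        {--chatbot= : Target chatbot ID}
        {--audience= : Filter by audience (partner/admin/all)}
        {--dry-run : Preview without importing}';

    protected $description = 'Sync documentation Q&A pairs and content chunks into a chatbot knowledge base';

    public function handle(): int
    {
        if (!$this->option('chatbot')) {
            $this->error('The --chatbot option is required.');
            return self::FAILURE;
        }
        // 1. Parse all docs with frontmatter
        // 2. Extract Q&A pairs
        // 3. Chunk document content
        // 4. Import to ChatbotData table (skip when --dry-run is set)
        // 5. Trigger KnowledgeBaseHelper::buildKnowledgeBase()
        return self::SUCCESS;
    }
}
```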
### Usage

```bash
# Sync partner-facing docs to support chatbot
php artisan docs:sync-to-chatbot --chatbot=1 --audience=partner
# Preview what would be synced
php artisan docs:sync-to-chatbot --chatbot=1 --audience=partner --dry-run
# Sync everything
php artisan docs:sync-to-chatbot --chatbot=1 --audience=all
```
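Step 3 in `handle()` ("Chunk document content") could follow the "chunked by section" rule from Layer 2. A sketch with a hypothetical `chunkBySection` helper: it splits before each H2 heading and further splits long sections on paragraph boundaries. Character counts stand in for the token budget here; a production version would measure tokens instead.

```php
<?php
// Sketch: one chunk per H2 section, with long sections split on blank lines.

function chunkBySection(string $markdown, int $maxChars = 2000): array
{
    // Zero-width split before every line starting with "## " keeps each heading with its body.
    $sections = preg_split('/^(?=## )/m', $markdown, -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];
    foreach ($sections as $section) {
        $current = '';
        foreach (preg_split('/\R{2,}/', trim($section)) as $paragraph) {
            // Flush the running chunk if adding this paragraph would exceed the budget.
            if ($current !== '' && strlen($current) + strlen($paragraph) + 2 > $maxChars) {
                $chunks[] = $current;
                $current = '';
            }
            $current = $current === '' ? $paragraph : "$current\n\n$paragraph";
        }
        if ($current !== '') {
            $chunks[] = $current;
        }
    }
    return $chunks;
}
```

A single paragraph longer than `$maxChars` is kept whole rather than cut mid-sentence; a stricter implementation would split it further.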
## Part 9: Success Metrics

### Documentation Quality
| Metric | Current | Target |
|---|---|---|
| Docs with frontmatter | 0% | 100% |
| Docs with Q&A pairs | 0% | 80% |
| Duplicate documents | ~15 | 0 |
| Missing feature docs | ~10 | 0 |
| Average doc freshness | Unknown | < 30 days |
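"Average doc freshness" can be computed directly from the `last_updated` frontmatter field. A sketch with a hypothetical helper that measures mean age in whole days relative to a given date (pass today's date in practice):

```php
<?php
// Sketch: mean age in days of last_updated values, relative to $asOf.

function averageFreshnessDays(array $lastUpdatedDates, DateTimeImmutable $asOf): ?float
{
    if ($lastUpdatedDates === []) {
        return null; // no documents, no metric
    }
    $totalDays = 0;
    foreach ($lastUpdatedDates as $date) {
        // ->days is the absolute whole-day difference between the two dates
        $totalDays += (new DateTimeImmutable($date))->diff($asOf)->days;
    }
    return $totalDays / count($lastUpdatedDates);
}
```

For example, documents last updated on 2026-01-09 and 2025-12-10 have an average age of 15 days as of 2026-01-09, which is within the 30-day target.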
### RAG Performance
| Metric | Target |
|---|---|
| Query relevance (top-3) | > 85% |
| Answer accuracy | > 90% |
| Response time | < 3 seconds |
| Fallback rate (no answer) | < 10% |
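Two of these targets can be measured automatically against a labelled evaluation set. This sketch assumes each evaluation record lists the expected document and the documents actually retrieved, in rank order; the record shape and function name are hypothetical. Answer accuracy still needs human or model grading.

```php
<?php
// Sketch: top-k relevance and fallback rate over an evaluation set.

function ragMetrics(array $evalResults, int $k = 3): array
{
    $hits = 0;
    $fallbacks = 0;
    foreach ($evalResults as $r) {
        if ($r['retrieved'] === []) {
            $fallbacks++; // nothing retrieved: counts as "no answer"
        } elseif (in_array($r['expected'], array_slice($r['retrieved'], 0, $k), true)) {
            $hits++; // expected document appears within the top k results
        }
    }
    $n = max(count($evalResults), 1); // avoid division by zero on an empty set
    return ['relevance_at_k' => $hits / $n, 'fallback_rate' => $fallbacks / $n];
}
```

Against the targets above, a run would pass when `relevance_at_k` exceeds 0.85 and `fallback_rate` stays below 0.10.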
### Chatbot Training
| Metric | Target |
|---|---|
| Q&A pairs extracted | > 400 |
| Training data coverage | All features |
| Partner question resolution | > 80% |
## Appendix: File Migration Mapping

### High-Priority Moves

| Current Location | New Location | Action |
|---|---|---|
| `CAPABILITIES_MATRIX_CURRENT.md` | `internal/capabilities-matrix.md` | Move |
| `partner-value-proposition.md` | `comparisons/build-vs-buy-overview.md` | Rename + Move |
| `AWS_MARKETPLACE_GTM_STRATEGIC_PLAN.md` | `internal/strategy/gtm-plan.md` | Move |
| `BEDROCK_KNOWLEDGE_BASE_INTEGRATION.md` | `architecture/bedrock-integration.md` | Move |
| `cosell/*.md` | `features/cosell-hub/*.md` | Merge |
| `cloud-connectors/*.md` | `features/aws-connections/*.md` | Merge |
### Documents to Deprecate

| Document | Reason |
|---|---|
| `CAPABILITIES_MATRIX_AUDIT.md` | Superseded by current matrix |
| `CAPABILITIES_MATRIX_UPDATED.md` | Superseded by current matrix |
| `AWS_MARKETPLACE_SELLER_PRIME_COMPLIANCE_AUDIT.md` | Superseded by 2025 version |
| `metronic-migration-plan.md` | Completed work |
| `ui-cleanup-backlog.md` | Stale backlog |
## Next Steps

- **Approve this plan:** review with the team
- **Execute Phase 1:** categorize existing docs
- **Create extraction script:** build the Q&A extractor
- **Migrate priority docs:** start with feature docs
- **Configure Bedrock KB:** set up the sync pipeline
- **Test chatbot integration:** validate RAG retrieval
- **Document Version:** 1.0
- **Plan Owner:** Engineering
- **Estimated Effort:** 40-60 hours over 3 weeks (the Phase 1-5 task estimates in Part 4 sum to 54 hours)