Documentation Reorganization Plan

RAG-Optimized Knowledge Architecture

Created: 2026-01-09
Status: Planning
Goal: Transform scattered documentation into a structured knowledge base optimized for both human navigation and AI-powered retrieval (RAG/Chatbot training)

Executive Summary

Current State

- 103 markdown files in /docs
- Mixed naming conventions (CAPS internal vs lowercase user-facing)
- No consistent metadata for RAG indexing
- Scattered across root, extensions, and subdirectories
- Duplicate information across multiple files
- No Q&A extraction for chatbot training

Target State

- Hierarchical taxonomy with clear categories
- Standardized frontmatter for RAG metadata
- Extracted Q&A pairs for chatbot training
- Audience-tagged content (partner vs internal vs admin)
- Single source of truth per topic
- Automated sync to Bedrock Knowledge Base

Part 1: Documentation Taxonomy

Proposed Directory Structure

docs/
├── index.md                          # Documentation home
├── manifest.json                     # RAG metadata index
│
├── getting-started/                  # Onboarding (Partner-facing)
│   ├── index.md
│   ├── quick-start.md
│   ├── first-listing.md
│   └── account-setup.md
│
├── features/                         # Feature Documentation (Partner-facing)
│   ├── index.md
│   ├── marketplace-readiness/
│   │   ├── solution-discovery.md
│   │   ├── solution-brief-generator.md
│   │   ├── icp-builder.md
│   │   ├── listing-optimizer.md
│   │   └── launch-readiness-score.md
│   ├── content-studio/
│   │   ├── listing-content-studio.md
│   │   ├── offering-analyzer.md
│   │   ├── agreement-analyzer.md
│   │   └── product-description-wizard.md
│   ├── marketing-social/
│   │   ├── campaign-dashboard.md
│   │   ├── social-media-posts.md
│   │   ├── video-intelligence.md
│   │   ├── case-study-generator.md
│   │   └── linkedin-insights.md        # ✅ Created
│   ├── cosell-hub/
│   │   ├── index.md
│   │   ├── partner-discovery.md
│   │   ├── joint-gtm-plans.md
│   │   ├── ace-opportunity-sync.md
│   │   ├── cppo-proposals.md
│   │   └── partnership-analytics.md
│   ├── aws-connections/
│   │   ├── index.md
│   │   ├── setup-wizard.md
│   │   ├── marketplace-listings.md
│   │   └── pipeline-tracking.md
│   ├── ai-agents/
│   │   ├── index.md
│   │   ├── agent-builder.md
│   │   ├── available-capabilities.md
│   │   └── execution-history.md
│   ├── gtm-workflows/
│   │   ├── index.md
│   │   ├── visual-creator.md
│   │   ├── content-strategist.md
│   │   ├── market-analyst.md
│   │   ├── technical-builder.md
│   │   └── solution-advisor.md
│   └── optimization-lab/
│       ├── ab-testing.md
│       ├── content-variants.md
│       └── performance-simulator.md
│
├── guides/                           # How-To Guides (Partner-facing)
│   ├── index.md
│   ├── launching-on-aws-marketplace.md
│   ├── optimizing-listing-seo.md
│   ├── setting-up-cosell-partnerships.md
│   ├── connecting-linkedin.md
│   ├── generating-case-studies.md
│   └── using-brand-voice.md
│
├── comparisons/                      # DIY vs Vellocity (Partner-facing)
│   ├── index.md
│   ├── build-vs-buy-overview.md
│   ├── linkedin-analysis-comparison.md
│   ├── listing-optimization-comparison.md
│   ├── cosell-management-comparison.md
│   └── content-generation-comparison.md
│
├── api/                              # API Documentation (Developer-facing)
│   ├── index.md
│   ├── authentication.md
│   ├── partner-api.md
│   ├── cosell-api.md
│   ├── webhooks.md
│   └── rate-limits.md
│
├── integrations/                     # Third-Party Integrations
│   ├── index.md
│   ├── aws-marketplace.md
│   ├── aws-partner-central.md
│   ├── linkedin.md
│   ├── hubspot.md
│   └── knowledge-bases.md
│
├── admin/                            # Admin Documentation (Internal)
│   ├── index.md
│   ├── user-management.md
│   ├── guardrails.md
│   ├── knowledge-bases.md
│   └── capability-configuration.md
│
├── architecture/                     # Technical Architecture (Internal)
│   ├── index.md
│   ├── aws-deployment.md
│   ├── bedrock-integration.md
│   ├── data-flow.md
│   └── security-model.md
│
├── internal/                         # Internal Only (Not for RAG)
│   ├── patents/
│   ├── audits/
│   ├── migrations/
│   └── roadmap/
│
├── faq/                              # FAQ for Chatbot Training
│   ├── index.md
│   ├── general.md
│   ├── pricing.md
│   ├── features.md
│   ├── aws-integration.md
│   └── troubleshooting.md
│
└── changelog.md                      # Release notes

Part 2: RAG-Optimized Document Format

Frontmatter Schema

Every document should include standardized frontmatter for RAG indexing:

---
# Document Metadata (Required)
title: "LinkedIn Insights"
slug: "linkedin-insights"
description: "Transform LinkedIn activity into GTM intelligence for AWS Marketplace partners"
category: "features/marketing-social"
version: "1.0"
last_updated: "2026-01-09"

# RAG Optimization (Required)
audience: ["partner", "customer"]           # partner | customer | admin | developer | internal
keywords: ["linkedin", "social", "analytics", "icp", "attribution", "gtm"]
capability_slug: "linkedin_graph_analysis"  # Links to capability system
plan_tier: "command_plus"                   # free | launch | scale | command | command_plus

# Chatbot Training (Optional)
qa_pairs:
  - q: "What is LinkedIn Insights?"
    a: "LinkedIn Insights analyzes your LinkedIn activity to provide GTM intelligence including content performance, audience alignment, and AWS Marketplace attribution."
  - q: "How much does LinkedIn Insights cost?"
    a: "LinkedIn Insights is included in the Command+ plan at $799/month. Analysis costs 5-15 credits depending on the type."
  - q: "Can LinkedIn Insights show which posts drive AWS Marketplace trials?"
    a: "Yes, the Content-to-Marketplace Correlation analysis shows which LinkedIn posts correlate with PDP traffic spikes, demo requests, and Private Offer inquiries."

# Navigation (Optional)
parent: "features/marketing-social"
siblings: ["video-intelligence", "case-study-generator", "campaign-dashboard"]
related: ["cosell-hub/partner-discovery", "guides/connecting-linkedin"]

# Status (Optional)
status: "published"                         # draft | review | published | deprecated
feature_status: "ga"                        # beta | ga | deprecated
---
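
The schema above can be checked mechanically before a document is indexed. The following is a minimal sketch in Python, assuming the frontmatter has already been parsed into a dictionary; the `validate_frontmatter` helper and its message wording are hypothetical, while the required fields and allowed values are taken from the schema comments.

```python
# Fields listed under the two "(Required)" headings of the schema.
REQUIRED_FIELDS = (
    "title", "slug", "description", "category", "version", "last_updated",
    "audience", "keywords", "capability_slug", "plan_tier",
)

# Allowed values, copied from the inline comments in the schema.
ALLOWED = {
    "audience": {"partner", "customer", "admin", "developer", "internal"},
    "plan_tier": {"free", "launch", "scale", "command", "command_plus"},
    "status": {"draft", "review", "published", "deprecated"},
    "feature_status": {"beta", "ga", "deprecated"},
}


def validate_frontmatter(meta: dict) -> list:
    """Return a list of problems; an empty list means the metadata passes."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in meta
    ]
    for field, allowed in ALLOWED.items():
        if field not in meta:
            continue
        value = meta[field]
        values = value if isinstance(value, list) else [value]
        for item in values:
            if item not in allowed:
                problems.append(f"{field}: unexpected value {item!r}")
    for index, pair in enumerate(meta.get("qa_pairs", [])):
        if not pair.get("q") or not pair.get("a"):
            problems.append(f"qa_pairs[{index}]: needs both 'q' and 'a'")
    return problems
```

Returning a list of problems rather than raising on the first one lets a migration run report every issue in a file at once.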

Part 3: Chatbot Training Data Extraction

Strategy: Multi-Layer Training

┌─────────────────────────────────────────────────────────────────┐
│                    CHATBOT KNOWLEDGE LAYERS                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Layer 1: Structured Q&A (Highest Priority)                     │
│  ├── Extracted from frontmatter qa_pairs                        │
│  ├── Manually curated FAQ documents                             │
│  └── High-confidence, direct answers                            │
│                                                                 │
│  Layer 2: Document Chunks (RAG Retrieval)                       │
│  ├── Full document content, chunked by section                  │
│  ├── Semantic search for relevant context                       │
│  └── Used when no direct Q&A match                              │
│                                                                 │
│  Layer 3: Capability Metadata (Structured Data)                 │
│  ├── Feature names, descriptions, pricing                       │
│  ├── Plan tier requirements                                     │
│  └── Credit costs and limits                                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
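
The layer ordering can be read as a lookup with fallback. The sketch below, in Python, assumes Layer 1 is a plain question-to-answer mapping and Layer 2 is any search function passed in by the caller; `normalize` and `answer_query` are hypothetical names, and Layer 3 (capability metadata) is left out for brevity.

```python
import re
from typing import Callable


def normalize(question: str) -> str:
    """Lowercase and drop punctuation so minor phrasing differences compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", question.lower()))


def answer_query(query: str, qa_pairs: dict, search_chunks: Callable[[str], list]) -> dict:
    """Try Layer 1 (structured Q&A) first, then fall back to Layer 2 (document chunks)."""
    index = {normalize(q): a for q, a in qa_pairs.items()}
    direct = index.get(normalize(query))
    if direct is not None:
        # High-confidence direct answer: no retrieval needed.
        return {"layer": 1, "answer": direct, "context": []}
    # No direct match: hand retrieved chunks to the generation step as context.
    return {"layer": 2, "answer": None, "context": search_chunks(query)}
```

Exact matching after normalization is deliberately strict; a real deployment would more likely use semantic similarity for Layer 1 as well.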

Q&A Extraction Sources

| Source | Type | Priority | Volume |
|--------|------|----------|--------|
| Frontmatter `qa_pairs` | Explicit Q&A | 1 (Highest) | ~5-10 per doc |
| FAQ documents | Explicit Q&A | 1 | ~50-100 total |
| Document headings | Implicit Q (heading → Q, content → A) | 2 | Auto-extracted |
| Tables | Structured data | 2 | Auto-extracted |
| Comparison sections | DIY vs Vellocity | 2 | ~20-30 |
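
When the same question arrives from several sources, the priority column suggests which answer should win. A minimal sketch in Python, assuming each candidate is a dictionary with `q`, `a` and a numeric `priority` (1 is highest); the `merge_qa` helper and its matching rule are hypothetical.

```python
def merge_qa(candidates: list) -> list:
    """Keep one answer per question, preferring the lowest priority number."""
    best = {}
    for item in candidates:
        # Treat questions as equal if they differ only in case, spacing or a trailing "?".
        key = " ".join(item["q"].lower().split()).rstrip("?")
        current = best.get(key)
        if current is None or item["priority"] < current["priority"]:
            best[key] = item
    return sorted(best.values(), key=lambda item: (item["priority"], item["q"]))
```

On a tie the first candidate seen is kept, so the order in which sources are read still matters for equal priorities.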

Automated Q&A Extraction Script

Create /scripts/extract-qa-for-chatbot.php:

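<?php
// Extracts Q&A pairs from documentation for chatbot training.
// Skeleton only: each method returns an empty array until it is implemented.

declare(strict_types=1);

class DocumentationQAExtractor
{
    /** Returns the qa_pairs entries from a document's YAML frontmatter. */
    public function extractFromFrontmatter(string $path): array
    {
        // Parse YAML frontmatter
        // Return qa_pairs array
        return [];
    }

    /** Derives implicit Q&A pairs from section headings and their content. */
    public function extractFromHeadings(string $content): array
    {
        // Convert H2/H3 headings to questions
        // "## How It Works" → "How does [feature] work?"
        return [];
    }

    /** Returns the explicit Q&A pairs from an FAQ document. */
    public function extractFromFAQ(string $path): array
    {
        // Parse explicit FAQ format
        // **Q:** Question → **A:** Answer
        return [];
    }

    /** Merges all sources into the final training data set. */
    public function generateTrainingData(): array
    {
        // Combine all sources
        // Deduplicate
        // Format for ChatbotTrainingController
        return [];
    }
}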
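
The heading and FAQ rules described in the comments can be prototyped before the PHP class is written. The sketch below uses Python for illustration only. Only the "How It Works" rewrite comes from this plan; the other entries in `HEADING_QUESTIONS` are assumed examples. The FAQ pattern accepts both spellings that appear in this plan (`**Q:** ...` with `**A:** ...`, and `**Q: ...?**` followed by `A: ...`).

```python
import re

HEADING = re.compile(r"^#{2,3}[ \t]+(.+?)[ \t]*$", re.MULTILINE)
FAQ_PAIR = re.compile(
    r"^\*\*Q:(?:\*\*)?[ \t]*(?P<q>.+?)(?:\*\*)?[ \t]*\n"
    r"(?:\*\*)?A:(?:\*\*)?[ \t]*(?P<a>.+)$",
    re.MULTILINE,
)

HEADING_QUESTIONS = {
    "how it works": "How does {feature} work?",                   # rule given in the plan
    "overview": "What is {feature}?",                             # assumed rule
    "getting started": "How do I get started with {feature}?",    # assumed rule
}


def extract_from_headings(content: str, feature: str) -> list:
    """Turn recognized H2/H3 headings into questions; the section body is the answer."""
    pairs = []
    matches = list(HEADING.finditer(content))
    for i, match in enumerate(matches):
        template = HEADING_QUESTIONS.get(match.group(1).strip().lower())
        if template is None:
            continue  # headings without a rewrite rule are skipped
        end = matches[i + 1].start() if i + 1 < len(matches) else len(content)
        answer = content[match.end():end].strip()
        if answer:
            pairs.append({"q": template.format(feature=feature), "a": answer})
    return pairs


def extract_from_faq(content: str) -> list:
    """Collect explicit Q/A pairs; answers are assumed to fit on a single line."""
    return [
        {"q": m.group("q").strip(), "a": m.group("a").strip()}
        for m in FAQ_PAIR.finditer(content)
    ]
```

Skipping headings that have no rewrite rule keeps the auto-extracted set small and reviewable, at the cost of lower coverage.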

Part 4: Migration Plan

Phase 1: Audit & Categorize (Week 1)

| Task | Files | Effort |
|------|-------|--------|
| Categorize existing 103 docs | All | 4 hours |
| Identify duplicates | ~20 expected | 2 hours |
| Map docs to new taxonomy | All | 2 hours |
| Identify gaps | New docs needed | 1 hour |

Deliverable: docs/internal/migrations/categorization-map.json
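
The "Identify duplicates" task can be given a head start with a similarity pass over the files. A minimal sketch in Python using the standard library's `difflib`; the `find_duplicates` helper and the 0.85 threshold are assumptions, and the output is a list of candidates for human review, not a verdict.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_duplicates(docs: dict, threshold: float = 0.85) -> list:
    """Return (path_a, path_b, ratio) for each pair whose text similarity meets the threshold."""
    # Collapse case and whitespace so formatting differences do not hide duplicates.
    normalized = {path: " ".join(text.lower().split()) for path, text in docs.items()}
    hits = []
    for a, b in combinations(sorted(normalized), 2):
        ratio = SequenceMatcher(None, normalized[a], normalized[b]).ratio()
        if ratio >= threshold:
            hits.append((a, b, round(ratio, 2)))
    return hits
```

For 103 files this compares a little over 5,000 pairs, which is workable for a one-off audit but slow on long documents.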

Phase 2: Create New Structure (Week 1-2)

| Task | Files | Effort |
|------|-------|--------|
| Create directory structure | 15 directories | 1 hour |
| Create index.md files | 15 files | 3 hours |
| Move partner-facing docs | ~40 files | 2 hours |
| Move internal docs | ~30 files | 1 hour |
| Archive deprecated docs | ~10 files | 1 hour |

Deliverable: New directory structure with moved files

Phase 3: Add Frontmatter (Week 2)

| Task | Files | Effort |
|------|-------|--------|
| Add frontmatter to partner docs | ~50 files | 8 hours |
| Add frontmatter to internal docs | ~30 files | 4 hours |
| Add Q&A pairs to key docs | ~20 files | 4 hours |
| Create manifest.json | 1 file | 2 hours |

Deliverable: All docs have standardized frontmatter

Phase 4: Q&A Extraction (Week 2-3)

| Task | Effort |
|------|--------|
| Build extraction script | 4 hours |
| Run extraction on all docs | 1 hour |
| Review and curate Q&A pairs | 4 hours |
| Import to chatbot training | 2 hours |

Deliverable: Q&A training data ready for chatbot

Phase 5: Bedrock Knowledge Base Sync (Week 3)

| Task | Effort |
|------|--------|
| Configure S3 sync for docs | 2 hours |
| Set up Bedrock KB data source | 2 hours |
| Test RAG retrieval | 2 hours |
| Tune chunking strategy | 2 hours |

Deliverable: Documentation synced to Bedrock KB
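
For the chunking task it helps to see roughly what parent and child chunks look like at a given size. The sketch below, in Python, imitates a two-level split using the sizes proposed in the Bedrock KB configuration in Part 5 (1500, 300, overlap 60). It is an approximation for reasoning about sizes, not Bedrock's own chunker: whitespace-separated words stand in for model tokens, and `split_tokens` and `hierarchical_chunks` are hypothetical helpers.

```python
def split_tokens(tokens: list, max_tokens: int, overlap: int = 0) -> list:
    """Split tokens into windows of at most max_tokens; consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    step = max_tokens - overlap
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the current window already reaches the end
        start += step
    return windows


def hierarchical_chunks(text: str, parent_max: int = 1500,
                        child_max: int = 300, overlap: int = 60) -> list:
    """Return parent chunks, each with the smaller overlapping child chunks inside it."""
    tokens = text.split()  # assumption: words approximate tokens
    result = []
    for parent in split_tokens(tokens, parent_max):
        children = split_tokens(parent, child_max, overlap)
        result.append({
            "parent": " ".join(parent),
            "children": [" ".join(child) for child in children],
        })
    return result
```

Running this on a few long feature documents gives a quick feel for how many child chunks each section produces before any tuning is done in Bedrock itself.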

Part 5: RAG Retrieval Architecture

Integration with Existing Chatbot Pro

┌─────────────────────────────────────────────────────────────────┐
│                    CHATBOT PRO + DOCS RAG                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  User Query: "How do I set up LinkedIn Insights?"               │
│                           │                                     │
│                           ▼                                     │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Query Classification                        │   │
│  │  - Is this a FAQ? → Check Q&A pairs first               │   │
│  │  - Is this feature-specific? → Check capability docs    │   │
│  │  - Is this how-to? → Check guides                       │   │
│  └─────────────────────────────────────────────────────────┘   │
│                           │                                     │
│                           ▼                                     │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │           Bedrock Knowledge Base Query                   │   │
│  │  - Semantic search across documentation                  │   │
│  │  - Filter by audience (partner/admin/internal)          │   │
│  │  - Boost by keyword match                               │   │
│  └─────────────────────────────────────────────────────────┘   │
│                           │                                     │
│                           ▼                                     │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Context Assembly                            │   │
│  │  - Top 3-5 relevant chunks                              │   │
│  │  - Q&A pairs if matched                                 │   │
│  │  - Capability metadata                                  │   │
│  └─────────────────────────────────────────────────────────┘   │
│                           │                                     │
│                           ▼                                     │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Claude Response Generation                  │   │
│  │  - Grounded in retrieved context                        │   │
│  │  - Cites sources (doc links)                           │   │
│  │  - Follows brand voice                                  │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
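
The retrieval and context-assembly steps in the diagram (filter by audience, boost by keyword match, keep the top 3-5 chunks) can be sketched as a single ranking function. This Python example assumes each retrieved chunk carries a semantic `score`, an `audience` list and a `keywords` list; those field names, the `select_context` helper and the 0.1 boost are all hypothetical.

```python
import re


def select_context(query: str, chunks: list, audience: str,
                   top_k: int = 5, boost: float = 0.1) -> list:
    """Drop chunks for other audiences, boost keyword matches, and keep the top_k."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    ranked = []
    for chunk in chunks:
        if audience not in chunk["audience"]:
            continue  # never surface content tagged for a different audience
        matches = len(terms & {keyword.lower() for keyword in chunk["keywords"]})
        ranked.append((chunk["score"] + boost * matches, chunk))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]
```

Filtering before ranking means an internal chunk can never displace a partner-facing one, however high its semantic score.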

Bedrock KB Configuration

{
  "knowledgeBase": {
    "name": "vellocity-documentation",
    "description": "Vellocity platform documentation for partner support",
    "dataSource": {
      "type": "S3",
      "s3Configuration": {
        "bucketArn": "arn:aws:s3:::vellocity-docs",
        "inclusionPrefixes": [
          "docs/features/",
          "docs/guides/",
          "docs/faq/",
          "docs/comparisons/",
          "docs/integrations/"
        ],
        "exclusionPrefixes": [
          "docs/internal/",
          "docs/architecture/"
        ]
      }
    },
    "chunkingConfiguration": {
      "chunkingStrategy": "HIERARCHICAL",
      "hierarchicalChunkingConfiguration": {
        "levelConfigurations": [
          { "maxTokens": 1500 },
          { "maxTokens": 300 }
        ],
        "overlapTokens": 60
      }
    },
    "embeddingModel": "amazon.titan-embed-text-v2:0"
  }
}

The first `levelConfigurations` entry (1500 tokens) sizes the parent chunks; the second (300 tokens) sizes the child chunks.
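
Before syncing, it is useful to check which files the prefix lists above would actually admit. The Python sketch below encodes the intended semantics as read from the configuration (exclusions win, then a key must match an inclusion prefix); it is an assumption about intent, not a description of how Bedrock evaluates prefixes, and `should_index` is a hypothetical helper.

```python
INCLUSION_PREFIXES = (
    "docs/features/",
    "docs/guides/",
    "docs/faq/",
    "docs/comparisons/",
    "docs/integrations/",
)
EXCLUSION_PREFIXES = ("docs/internal/", "docs/architecture/")


def should_index(key: str) -> bool:
    """True when an object key is under an inclusion prefix and under no exclusion prefix."""
    if key.startswith(EXCLUSION_PREFIXES):
        return False
    return key.startswith(INCLUSION_PREFIXES)
```

Under this reading, `docs/getting-started/`, `docs/api/` and `docs/admin/` are not indexed because they appear in neither list, which is worth confirming as intended.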

Part 6: Document Templates

Feature Documentation Template

---
title: "[Feature Name]"
slug: "[feature-slug]"
description: "[One-line description]"
category: "features/[subcategory]"
audience: ["partner", "customer"]
keywords: ["keyword1", "keyword2"]
capability_slug: "[capability_slug]"
plan_tier: "[tier]"
qa_pairs:
  - q: "What is [Feature Name]?"
    a: "[Brief answer]"
  - q: "How much does [Feature Name] cost?"
    a: "[Pricing answer]"
---

# [Feature Name]

> **Capability:** `[capability_slug]`
> **Plan Tier:** [Tier Name]
> **Credit Cost:** [X] credits

---

## Overview

[2-3 paragraphs explaining the feature and the problem it solves]

---

## Key Capabilities

### 1. [Capability 1]

[Description]

**What You Get:**
- Point 1
- Point 2

---

## Why This Matters for AWS Marketplace Partners

[Business value explanation]

---

## DIY vs. Vellocity Comparison

| Component | DIY Cost | DIY Time | Vellocity |
|-----------|----------|----------|-----------|
| [Component 1] | $X | Y weeks | Included |

---

## Getting Started

### Prerequisites
1. [Prereq 1]
2. [Prereq 2]

### Step 1: [First Step]
[Instructions]

---

## Frequently Asked Questions

**Q: [Question 1]?**
A: [Answer 1]

**Q: [Question 2]?**
A: [Answer 2]

---

## Related Documentation

- [Related Doc 1](link)
- [Related Doc 2](link)

---

*Last Updated: [Date]*
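
Conformance to the template can be checked by looking for its H2 sections. A minimal sketch in Python, assuming every H2 heading in the template is mandatory (the plan does not say which are optional); `missing_sections` is a hypothetical helper.

```python
import re

# H2 headings, in the order they appear in the feature template.
REQUIRED_SECTIONS = (
    "Overview",
    "Key Capabilities",
    "Why This Matters for AWS Marketplace Partners",
    "DIY vs. Vellocity Comparison",
    "Getting Started",
    "Frequently Asked Questions",
    "Related Documentation",
)


def missing_sections(markdown: str) -> list:
    """List the template's H2 sections that a feature doc does not contain."""
    found = {
        match.group(1).strip()
        for match in re.finditer(r"^##[ \t]+(.+?)[ \t]*$", markdown, re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name not in found]
```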

Part 7: Manifest File (RAG Index)

Create docs/manifest.json for programmatic access:

{
  "version": "1.0",
  "generated": "2026-01-09T12:00:00Z",
  "total_documents": 85,
  "categories": {
    "features": {
      "count": 35,
      "subcategories": ["marketplace-readiness", "content-studio", "marketing-social", "cosell-hub", "aws-connections", "ai-agents", "gtm-workflows", "optimization-lab"]
    },
    "guides": { "count": 10 },
    "comparisons": { "count": 6 },
    "faq": { "count": 6 },
    "api": { "count": 5 },
    "integrations": { "count": 6 }
  },
  "documents": [
    {
      "path": "features/marketing-social/linkedin-insights.md",
      "title": "LinkedIn Insights",
      "slug": "linkedin-insights",
      "audience": ["partner", "customer"],
      "capability_slug": "linkedin_graph_analysis",
      "plan_tier": "command_plus",
      "keywords": ["linkedin", "social", "analytics"],
      "qa_count": 12,
      "word_count": 2500,
      "last_updated": "2026-01-09"
    }
    // ... more documents
  ],
  "qa_pairs_total": 450,
  "capabilities_documented": 32
}
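
The aggregate fields in the manifest can be derived from the per-document entries rather than maintained by hand. The Python sketch below assumes each entry has at least a `path` relative to docs/ and optionally `qa_count` and `capability_slug`; `build_manifest` is a hypothetical helper, and it omits the `subcategories` lists shown in the example above.

```python
from collections import Counter
from datetime import datetime, timezone


def build_manifest(documents: list, generated=None) -> dict:
    """Aggregate per-document metadata into the manifest layout."""
    generated = (generated or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # The top-level directory of each path is treated as its category.
    counts = Counter(doc["path"].split("/", 1)[0] for doc in documents)
    capabilities = {
        doc["capability_slug"] for doc in documents if doc.get("capability_slug")
    }
    return {
        "version": "1.0",
        "generated": generated.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "total_documents": len(documents),
        "categories": {name: {"count": n} for name, n in sorted(counts.items())},
        "documents": documents,
        "qa_pairs_total": sum(doc.get("qa_count", 0) for doc in documents),
        "capabilities_documented": len(capabilities),
    }
```

Deriving the totals this way keeps `total_documents` and the category counts consistent with the `documents` array by construction.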

Part 8: Sync to Chatbot Training

Automated Sync Command

Create the Artisan command docs:sync-to-chatbot:

<?php
// app/Console/Commands/SyncDocsToKnowledgeBase.php

namespace App\Console\Commands;

use Illuminate\Console\Command;

class SyncDocsToKnowledgeBase extends Command
{
    protected $signature = 'docs:sync-to-chatbot
                            {--chatbot= : Target chatbot ID}
                            {--audience= : Filter by audience (partner/admin/all)}
                            {--dry-run : Preview without importing}';

    protected $description = 'Sync documentation Q&A pairs and chunks to chatbot training data';

    public function handle(): int
    {
        // 1. Parse all docs with frontmatter
        // 2. Extract Q&A pairs
        // 3. Chunk document content
        // 4. Import to ChatbotData table (skipped when --dry-run is set)
        // 5. Trigger KnowledgeBaseHelper::buildKnowledgeBase()

        return self::SUCCESS;
    }
}

Usage

# Sync partner-facing docs to support chatbot
php artisan docs:sync-to-chatbot --chatbot=1 --audience=partner

# Preview what would be synced
php artisan docs:sync-to-chatbot --chatbot=1 --audience=partner --dry-run

# Sync everything
php artisan docs:sync-to-chatbot --chatbot=1 --audience=all

Part 9: Success Metrics

Documentation Quality

| Metric | Current | Target |
|--------|---------|--------|
| Docs with frontmatter | 0% | 100% |
| Docs with Q&A pairs | 0% | 80% |
| Duplicate documents | ~15 | 0 |
| Missing feature docs | ~10 | 0 |
| Average doc freshness | Unknown | < 30 days |
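
Once documents carry a `last_updated` field, the freshness metric can be computed instead of estimated. A minimal sketch in Python, assuming freshness means days since `last_updated` and that dates are ISO formatted as in the frontmatter schema; `freshness_report` is a hypothetical helper.

```python
from datetime import date


def freshness_report(last_updated: dict, today: date, max_age_days: int = 30) -> dict:
    """Return the average document age in days and the documents at or past the target age."""
    ages = {
        path: (today - date.fromisoformat(value)).days
        for path, value in last_updated.items()
    }
    average = sum(ages.values()) / len(ages) if ages else 0.0
    stale = sorted(path for path, age in ages.items() if age >= max_age_days)
    return {"average_age_days": average, "stale": stale}
```

Listing the stale documents alongside the average makes the metric actionable, since an acceptable average can hide a few very old files.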

RAG Performance

| Metric | Target |
|--------|--------|
| Query relevance (top-3) | > 85% |
| Answer accuracy | > 90% |
| Response time | < 3 seconds |
| Fallback rate (no answer) | < 10% |
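
These four targets can be measured from a labelled set of test queries. The Python sketch below assumes one record per query with the fields `relevant_in_top3`, `answered`, `correct` and `seconds`; those field names and the `rag_metrics` helper are hypothetical, and accuracy is computed over answered queries only, which is one possible reading of the metric.

```python
def rag_metrics(results: list) -> dict:
    """Summarize evaluation records into the four RAG performance metrics."""
    total = len(results)
    if total == 0:
        raise ValueError("no evaluation results")
    answered = [r for r in results if r["answered"]]
    return {
        # Share of queries with at least one relevant chunk in the top 3.
        "top3_relevance": sum(r["relevant_in_top3"] for r in results) / total,
        # Share of answered queries judged correct.
        "answer_accuracy": (
            sum(r["correct"] for r in answered) / len(answered) if answered else 0.0
        ),
        "avg_response_seconds": sum(r["seconds"] for r in results) / total,
        # Share of queries where the chatbot gave no answer.
        "fallback_rate": (total - len(answered)) / total,
    }
```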

Chatbot Training

| Metric | Target |
|--------|--------|
| Q&A pairs extracted | > 400 |
| Training data coverage | All features |
| Partner question resolution | > 80% |

Appendix: File Migration Mapping

High-Priority Moves

| Current Location | New Location | Action |
|------------------|--------------|--------|
| `CAPABILITIES_MATRIX_CURRENT.md` | `internal/capabilities-matrix.md` | Move |
| `partner-value-proposition.md` | `comparisons/build-vs-buy-overview.md` | Rename + Move |
| `AWS_MARKETPLACE_GTM_STRATEGIC_PLAN.md` | `internal/strategy/gtm-plan.md` | Move |
| `BEDROCK_KNOWLEDGE_BASE_INTEGRATION.md` | `architecture/bedrock-integration.md` | Move |
| `cosell/*.md` | `features/cosell-hub/*.md` | Merge |
| `cloud-connectors/*.md` | `features/aws-connections/*.md` | Merge |

Documents to Deprecate

| Document | Reason |
|----------|--------|
| `CAPABILITIES_MATRIX_AUDIT.md` | Superseded by current matrix |
| `CAPABILITIES_MATRIX_UPDATED.md` | Superseded by current matrix |
| `AWS_MARKETPLACE_SELLER_PRIME_COMPLIANCE_AUDIT.md` | Superseded by 2025 version |
| `metronic-migration-plan.md` | Completed work |
| `ui-cleanup-backlog.md` | Stale backlog |

Next Steps

1. Approve this plan - Review with team
2. Execute Phase 1 - Categorize existing docs
3. Create extraction script - Build Q&A extractor
4. Migrate priority docs - Start with feature docs
5. Configure Bedrock KB - Set up sync pipeline
6. Test chatbot integration - Validate RAG retrieval

Document Version: 1.0
Plan Owner: Engineering
Estimated Effort: 40-60 hours over 3 weeks