Documentation Reorganization Plan

RAG-Optimized Knowledge Architecture

Created: 2026-01-09
Status: Planning
Goal: Transform scattered documentation into a structured knowledge base optimized for both human navigation and AI-powered retrieval (RAG and chatbot training).


Executive Summary

Current State

  • 103 markdown files in /docs
  • Mixed naming conventions (UPPERCASE file names for internal docs, lowercase for user-facing docs)
  • No consistent metadata for RAG indexing
  • Files scattered across the root, extensions, and subdirectories
  • Duplicate information across multiple files
  • No Q&A extraction for chatbot training

Target State

  • Hierarchical taxonomy with clear categories
  • Standardized frontmatter for RAG metadata
  • Extracted Q&A pairs for chatbot training
  • Audience-tagged content (partner vs internal vs admin)
  • Single source of truth per topic
  • Automated sync to Bedrock Knowledge Base

Part 1: Documentation Taxonomy

Proposed Directory Structure

docs/
├── index.md                          # Documentation home
├── manifest.json                     # RAG metadata index
├── getting-started/                  # Onboarding (Partner-facing)
│   ├── index.md
│   ├── quick-start.md
│   ├── first-listing.md
│   └── account-setup.md
├── features/                         # Feature Documentation (Partner-facing)
│   ├── index.md
│   ├── marketplace-readiness/
│   │   ├── solution-discovery.md
│   │   ├── solution-brief-generator.md
│   │   ├── icp-builder.md
│   │   ├── listing-optimizer.md
│   │   └── launch-readiness-score.md
│   ├── content-studio/
│   │   ├── listing-content-studio.md
│   │   ├── offering-analyzer.md
│   │   ├── agreement-analyzer.md
│   │   └── product-description-wizard.md
│   ├── marketing-social/
│   │   ├── campaign-dashboard.md
│   │   ├── social-media-posts.md
│   │   ├── video-intelligence.md
│   │   ├── case-study-generator.md
│   │   └── linkedin-insights.md        # ✅ Created
│   ├── cosell-hub/
│   │   ├── index.md
│   │   ├── partner-discovery.md
│   │   ├── joint-gtm-plans.md
│   │   ├── ace-opportunity-sync.md
│   │   ├── cppo-proposals.md
│   │   └── partnership-analytics.md
│   ├── aws-connections/
│   │   ├── index.md
│   │   ├── setup-wizard.md
│   │   ├── marketplace-listings.md
│   │   └── pipeline-tracking.md
│   ├── ai-agents/
│   │   ├── index.md
│   │   ├── agent-builder.md
│   │   ├── available-capabilities.md
│   │   └── execution-history.md
│   ├── gtm-workflows/
│   │   ├── index.md
│   │   ├── visual-creator.md
│   │   ├── content-strategist.md
│   │   ├── market-analyst.md
│   │   ├── technical-builder.md
│   │   └── solution-advisor.md
│   └── optimization-lab/
│       ├── ab-testing.md
│       ├── content-variants.md
│       └── performance-simulator.md
├── guides/                           # How-To Guides (Partner-facing)
│   ├── index.md
│   ├── launching-on-aws-marketplace.md
│   ├── optimizing-listing-seo.md
│   ├── setting-up-cosell-partnerships.md
│   ├── connecting-linkedin.md
│   ├── generating-case-studies.md
│   └── using-brand-voice.md
├── comparisons/                      # DIY vs Vellocity (Partner-facing)
│   ├── index.md
│   ├── build-vs-buy-overview.md
│   ├── linkedin-analysis-comparison.md
│   ├── listing-optimization-comparison.md
│   ├── cosell-management-comparison.md
│   └── content-generation-comparison.md
├── api/                              # API Documentation (Developer-facing)
│   ├── index.md
│   ├── authentication.md
│   ├── partner-api.md
│   ├── cosell-api.md
│   ├── webhooks.md
│   └── rate-limits.md
├── integrations/                     # Third-Party Integrations
│   ├── index.md
│   ├── aws-marketplace.md
│   ├── aws-partner-central.md
│   ├── linkedin.md
│   ├── hubspot.md
│   └── knowledge-bases.md
├── admin/                            # Admin Documentation (Internal)
│   ├── index.md
│   ├── user-management.md
│   ├── guardrails.md
│   ├── knowledge-bases.md
│   └── capability-configuration.md
├── architecture/                     # Technical Architecture (Internal)
│   ├── index.md
│   ├── aws-deployment.md
│   ├── bedrock-integration.md
│   ├── data-flow.md
│   └── security-model.md
├── internal/                         # Internal Only (Not for RAG)
│   ├── patents/
│   ├── audits/
│   ├── migrations/
│   └── roadmap/
├── faq/                              # FAQ for Chatbot Training
│   ├── index.md
│   ├── general.md
│   ├── pricing.md
│   ├── features.md
│   ├── aws-integration.md
│   └── troubleshooting.md
└── changelog.md                      # Release notes

Part 2: RAG-Optimized Document Format

Frontmatter Schema

Every document should include standardized frontmatter for RAG indexing:

---
# Document Metadata (Required)
title: "LinkedIn Insights"
slug: "linkedin-insights"
description: "Transform LinkedIn activity into GTM intelligence for AWS Marketplace partners"
category: "features/marketing-social"
version: "1.0"
last_updated: "2026-01-09"

# RAG Optimization (Required)
audience: ["partner", "customer"]           # partner | customer | admin | developer | internal
keywords: ["linkedin", "social", "analytics", "icp", "attribution", "gtm"]
capability_slug: "linkedin_graph_analysis"  # Links to capability system
plan_tier: "command_plus"                   # free | launch | scale | command | command_plus

# Chatbot Training (Optional)
qa_pairs:
  - q: "What is LinkedIn Insights?"
    a: "LinkedIn Insights analyzes your LinkedIn activity to provide GTM intelligence including content performance, audience alignment, and AWS Marketplace attribution."
  - q: "How much does LinkedIn Insights cost?"
    a: "LinkedIn Insights is included in the Command+ plan at $799/month. Analysis costs 5-15 credits depending on the type."
  - q: "Can LinkedIn Insights show which posts drive AWS Marketplace trials?"
    a: "Yes, the Content-to-Marketplace Correlation analysis shows which LinkedIn posts correlate with PDP traffic spikes, demo requests, and Private Offer inquiries."

# Navigation (Optional)
parent: "features/marketing-social"
siblings: ["video-intelligence", "case-study-generator", "campaign-dashboard"]
related: ["cosell-hub/partner-discovery", "guides/connecting-linkedin"]

# Status (Optional)
status: "published"                         # draft | review | published | deprecated
feature_status: "ga"                        # beta | ga | deprecated
---
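The allowed values listed in the schema comments lend themselves to an automated check. The following is a minimal sketch in Python, not part of the plan's tooling: it assumes the frontmatter has already been parsed into a dictionary (for example by a YAML library), it treats every field under the two "(Required)" headings as mandatory, and the name validate_frontmatter is hypothetical.

```python
from typing import Any

# Allowed values copied from the comments in the schema above.
AUDIENCES = {"partner", "customer", "admin", "developer", "internal"}
PLAN_TIERS = {"free", "launch", "scale", "command", "command_plus"}
STATUSES = {"draft", "review", "published", "deprecated"}
FEATURE_STATUSES = {"beta", "ga", "deprecated"}

# Fields listed under the two "(Required)" headings of the schema.
REQUIRED = [
    "title", "slug", "description", "category", "version", "last_updated",
    "audience", "keywords", "capability_slug", "plan_tier",
]


def _check_enum(meta: dict[str, Any], field: str, allowed: set[str], errors: list[str]) -> None:
    """Record an error if an optional enum field is present but not an allowed value."""
    value = meta.get(field)
    if value is not None and (not isinstance(value, str) or value not in allowed):
        errors.append(f"{field} has unknown value: {value!r}")


def validate_frontmatter(meta: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the frontmatter passed."""
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in meta]

    audience = meta.get("audience", [])
    if not isinstance(audience, list):
        errors.append("audience must be a list")
    else:
        for entry in audience:
            if not isinstance(entry, str) or entry not in AUDIENCES:
                errors.append(f"audience has unknown value: {entry!r}")

    _check_enum(meta, "plan_tier", PLAN_TIERS, errors)
    _check_enum(meta, "status", STATUSES, errors)
    _check_enum(meta, "feature_status", FEATURE_STATUSES, errors)

    # Each optional Q&A pair needs a non-empty question and answer.
    for index, pair in enumerate(meta.get("qa_pairs", [])):
        if not isinstance(pair, dict) or not pair.get("q") or not pair.get("a"):
            errors.append(f"qa_pairs[{index}] needs non-empty 'q' and 'a'")

    return errors
```

A check like this could run in CI during Phase 3 so that documents with missing or misspelled metadata are caught before they reach the knowledge base.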

Part 3: Chatbot Training Data Extraction

Strategy: Multi-Layer Training

┌─────────────────────────────────────────────────────────────────┐
│                    CHATBOT KNOWLEDGE LAYERS                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Layer 1: Structured Q&A (Highest Priority)                     │
│  ├── Extracted from frontmatter qa_pairs                        │
│  ├── Manually curated FAQ documents                             │
│  └── High-confidence, direct answers                            │
│                                                                 │
│  Layer 2: Document Chunks (RAG Retrieval)                       │
│  ├── Full document content, chunked by section                  │
│  ├── Semantic search for relevant context                       │
│  └── Used when no direct Q&A match                              │
│                                                                 │
│  Layer 3: Capability Metadata (Structured Data)                 │
│  ├── Feature names, descriptions, pricing                       │
│  ├── Plan tier requirements                                     │
│  └── Credit costs and limits                                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
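One way to read the layer priorities above is as a lookup order. The sketch below is illustrative Python only: the exact-match rule for Layer 1, the search_chunks callback, and the use of Layer 3 as a last fallback are all assumptions (the Context Assembly step in Part 5 suggests capability metadata may instead be merged into the context alongside the other layers).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Answer:
    text: str
    layer: int  # 1 = curated Q&A, 2 = document chunk, 3 = capability metadata


def normalize(question: str) -> str:
    """Lowercase, trim punctuation, and collapse whitespace for matching."""
    return " ".join(question.lower().strip(" ?!.").split())


def resolve(
    question: str,
    qa_pairs: dict[str, str],
    search_chunks: Callable[[str], list[str]],
    capability_facts: dict[str, str],
) -> Optional[Answer]:
    """Try each knowledge layer in priority order and return the first hit."""
    key = normalize(question)

    # Layer 1: direct match against curated Q&A pairs.
    index = {normalize(q): a for q, a in qa_pairs.items()}
    if key in index:
        return Answer(index[key], layer=1)

    # Layer 2: semantic search over document chunks (supplied by the caller).
    chunks = search_chunks(question)
    if chunks:
        return Answer(chunks[0], layer=2)

    # Layer 3: structured capability data, matched by feature name (assumed rule).
    for feature_name, fact in capability_facts.items():
        if feature_name.lower() in key:
            return Answer(fact, layer=3)

    return None  # Nothing matched: the chatbot would fall back to "no answer".
```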

Q&A Extraction Sources

| Source | Type | Priority | Volume |
|--------|------|----------|--------|
| Frontmatter qa_pairs | Explicit Q&A | 1 (Highest) | ~5-10 per doc |
| FAQ documents | Explicit Q&A | 1 | ~50-100 total |
| Document headings | Implicit Q (heading → Q, content → A) | 2 | Auto-extracted |
| Tables | Structured data | 2 | Auto-extracted |
| Comparison sections | DIY vs Vellocity | 2 | ~20-30 |

Automated Q&A Extraction Script

Create /scripts/extract-qa-for-chatbot.php:

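<?php
// Extracts Q&A pairs from documentation for chatbot training.
// Skeleton only: each method returns an empty array until its parsing
// step is implemented, so the declared array return types are satisfied.

class DocumentationQAExtractor
{
    /** Read one document and return the qa_pairs list from its frontmatter. */
    public function extractFromFrontmatter(string $path): array
    {
        // TODO: Parse YAML frontmatter
        // TODO: Return qa_pairs array
        return [];
    }

    /** Turn section headings into implicit questions. */
    public function extractFromHeadings(string $content): array
    {
        // TODO: Convert H2/H3 headings to questions
        // "## How It Works" → "How does [feature] work?"
        return [];
    }

    /** Read explicit question/answer pairs from an FAQ document. */
    public function extractFromFAQ(string $path): array
    {
        // TODO: Parse explicit FAQ format
        // **Q:** Question → **A:** Answer
        return [];
    }

    /** Merge every source into one training set. */
    public function generateTrainingData(): array
    {
        // TODO: Combine all sources
        // TODO: Deduplicate
        // TODO: Format for ChatbotTrainingController
        return [];
    }
}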
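The two text-based extraction steps can be prototyped before the PHP class is filled in. The Python sketch below is an illustration under assumptions: the HEADING_TEMPLATES mapping is hypothetical (only the "How It Works" example comes from this plan), unknown headings are skipped rather than guessed, and answers are assumed to fit on one line. The FAQ parser strips bold markers so that it accepts both the `**Q:** Question` form named in the script comments and the `**Q: Question?**` form used in the feature template in Part 6.

```python
import re

# Hypothetical heading-to-question templates; extend per documentation style.
HEADING_TEMPLATES = {
    "overview": "What is {feature}?",
    "how it works": "How does {feature} work?",
    "getting started": "How do I get started with {feature}?",
}

# Matches H2 and H3 headings only ("## Title" or "### Title").
HEADING_RE = re.compile(r"^#{2,3}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def questions_from_headings(markdown: str, feature: str) -> list[str]:
    """Map known H2/H3 headings to questions about the given feature."""
    questions = []
    for match in HEADING_RE.finditer(markdown):
        template = HEADING_TEMPLATES.get(match.group(1).lower())
        if template:
            questions.append(template.format(feature=feature))
    return questions


def parse_faq(markdown: str) -> list[dict[str, str]]:
    """Collect single-line Q/A pairs written as 'Q: ...' followed by 'A: ...'."""
    pairs = []
    question = None
    for line in markdown.splitlines():
        text = line.replace("**", "").strip()  # drop bold markers
        if text.startswith("Q:"):
            question = text[2:].strip()
        elif text.startswith("A:") and question:
            pairs.append({"q": question, "a": text[2:].strip()})
            question = None
    return pairs
```

Because removing `**` also removes bold text inside answers, a production version would likely need a stricter parser; this one is only meant to show the shape of the output.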

Part 4: Migration Plan

Phase 1: Audit & Categorize (Week 1)

| Task | Files | Effort |
|------|-------|--------|
| Categorize existing 103 docs | All | 4 hours |
| Identify duplicates | ~20 expected | 2 hours |
| Map docs to new taxonomy | All | 2 hours |
| Identify gaps | New docs needed | 1 hour |

Deliverable: docs/internal/migration/categorization-map.json

Phase 2: Create New Structure (Week 1-2)

| Task | Files | Effort |
|------|-------|--------|
| Create directory structure | 15 directories | 1 hour |
| Create index.md files | 15 files | 3 hours |
| Move partner-facing docs | ~40 files | 2 hours |
| Move internal docs | ~30 files | 1 hour |
| Archive deprecated docs | ~10 files | 1 hour |

Deliverable: New directory structure with moved files

Phase 3: Add Frontmatter (Week 2)

| Task | Files | Effort |
|------|-------|--------|
| Add frontmatter to partner docs | ~50 files | 8 hours |
| Add frontmatter to internal docs | ~30 files | 4 hours |
| Add Q&A pairs to key docs | ~20 files | 4 hours |
| Create manifest.json | 1 file | 2 hours |

Deliverable: All docs have standardized frontmatter

Phase 4: Q&A Extraction (Week 2-3)

| Task | Effort |
|------|--------|
| Build extraction script | 4 hours |
| Run extraction on all docs | 1 hour |
| Review and curate Q&A pairs | 4 hours |
| Import to chatbot training | 2 hours |

Deliverable: Q&A training data ready for chatbot

Phase 5: Bedrock Knowledge Base Sync (Week 3)

| Task | Effort |
|------|--------|
| Configure S3 sync for docs | 2 hours |
| Set up Bedrock KB data source | 2 hours |
| Test RAG retrieval | 2 hours |
| Tune chunking strategy | 2 hours |

Deliverable: Documentation synced to Bedrock KB
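As a sanity check on the plan, the task estimates in the five phase tables can be summed. The short Python snippet below copies the hour values from those tables; it shows that they total 54 hours, which falls inside the 40-60 hour estimate given at the end of this document.

```python
# Hour estimates copied from the task tables of Phases 1 to 5.
phase_hours = {
    "Phase 1: Audit & Categorize": [4, 2, 2, 1],
    "Phase 2: Create New Structure": [1, 3, 2, 1, 1],
    "Phase 3: Add Frontmatter": [8, 4, 4, 2],
    "Phase 4: Q&A Extraction": [4, 1, 4, 2],
    "Phase 5: Bedrock Knowledge Base Sync": [2, 2, 2, 2],
}

totals = {name: sum(hours) for name, hours in phase_hours.items()}
grand_total = sum(totals.values())

for name, hours in totals.items():
    print(f"{name}: {hours} hours")
print(f"Total: {grand_total} hours")  # Total: 54 hours
```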


Part 5: RAG Retrieval Architecture

Integration with Existing Chatbot Pro

┌─────────────────────────────────────────────────────────────────┐
│                    CHATBOT PRO + DOCS RAG                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  User Query: "How do I set up LinkedIn Insights?"               │
│                           │                                     │
│                           ▼                                     │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Query Classification                        │   │
│  │  - Is this a FAQ? → Check Q&A pairs first               │   │
│  │  - Is this feature-specific? → Check capability docs    │   │
│  │  - Is this how-to? → Check guides                       │   │
│  └─────────────────────────────────────────────────────────┘   │
│                           │                                     │
│                           ▼                                     │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │           Bedrock Knowledge Base Query                   │   │
│  │  - Semantic search across documentation                  │   │
│  │  - Filter by audience (partner/admin/internal)          │   │
│  │  - Boost by keyword match                               │   │
│  └─────────────────────────────────────────────────────────┘   │
│                           │                                     │
│                           ▼                                     │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Context Assembly                            │   │
│  │  - Top 3-5 relevant chunks                              │   │
│  │  - Q&A pairs if matched                                 │   │
│  │  - Capability metadata                                  │   │
│  └─────────────────────────────────────────────────────────┘   │
│                           │                                     │
│                           ▼                                     │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Claude Response Generation                  │   │
│  │  - Grounded in retrieved context                        │   │
│  │  - Cites sources (doc links)                           │   │
│  │  - Follows brand voice                                  │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
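The audience filter, keyword boost, and top 3-5 selection in the flow above can be illustrated with an in-memory stand-in for the knowledge base. This Python sketch is not the Bedrock API: the Chunk type, the similarity field (assumed to come from the vector search), and the KEYWORD_BOOST weight are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_path: str
    text: str
    audience: list[str]   # from the document's frontmatter
    keywords: list[str]   # from the document's frontmatter
    similarity: float     # semantic score from vector search (assumed 0..1)


KEYWORD_BOOST = 0.05  # hypothetical weight added per matching keyword


def assemble_context(query: str, chunks: list[Chunk], audience: str, top_k: int = 5) -> list[Chunk]:
    """Filter chunks by audience, boost keyword matches, and keep the top_k."""
    query_words = set(query.lower().replace("?", " ").split())

    # Audience filter: a partner should never receive admin-only content.
    allowed = [chunk for chunk in chunks if audience in chunk.audience]

    def score(chunk: Chunk) -> float:
        hits = sum(1 for keyword in chunk.keywords if keyword.lower() in query_words)
        return chunk.similarity + KEYWORD_BOOST * hits

    return sorted(allowed, key=score, reverse=True)[:top_k]
```

Filtering before ranking matters here: a highly similar chunk from an admin document is dropped outright rather than merely ranked lower.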

Bedrock KB Configuration

In levelConfigurations, the first entry sets the parent chunk size (1,500 tokens) and the second sets the child chunk size (300 tokens).

{
  "knowledgeBase": {
    "name": "vellocity-documentation",
    "description": "Vellocity platform documentation for partner support",
    "dataSource": {
      "type": "S3",
      "s3Configuration": {
        "bucketArn": "arn:aws:s3:::vellocity-docs",
        "inclusionPrefixes": [
          "docs/features/",
          "docs/guides/",
          "docs/faq/",
          "docs/comparisons/",
          "docs/integrations/"
        ],
        "exclusionPrefixes": [
          "docs/internal/",
          "docs/architecture/"
        ]
      }
    },
    "chunkingConfiguration": {
      "chunkingStrategy": "HIERARCHICAL",
      "hierarchicalChunkingConfiguration": {
        "levelConfigurations": [
          { "maxTokens": 1500 },
          { "maxTokens": 300 }
        ],
        "overlapTokens": 60
      }
    },
    "embeddingModel": "amazon.titan-embed-text-v2:0"
  }
}
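To build intuition for the chunk sizes when tuning the strategy in Phase 5, the parent/child split can be approximated locally. The Python sketch below is a rough illustration, not Bedrock's algorithm: it uses whitespace-separated words in place of model tokens and assumes the overlap applies at both levels.

```python
def split_with_overlap(tokens: list[str], max_tokens: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of max_tokens that share `overlap` tokens."""
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


def hierarchical_chunks(
    text: str, parent_max: int = 1500, child_max: int = 300, overlap: int = 60
) -> list[dict[str, list]]:
    """Return parent chunks, each with the child chunks cut from it."""
    tokens = text.split()  # whitespace words stand in for model tokens
    result = []
    for parent in split_with_overlap(tokens, parent_max, overlap):
        children = split_with_overlap(parent, child_max, overlap)
        result.append({"parent": parent, "children": children})
    return result
```

Under these assumptions a 2,000-word document yields two parents (1,500 and 560 words), with six and three children respectively. The usual motivation for this strategy is that small child chunks are matched during search while the larger parent supplies surrounding context.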

Part 6: Document Templates

Feature Documentation Template

---
title: "[Feature Name]"
slug: "[feature-slug]"
description: "[One-line description]"
category: "features/[subcategory]"
audience: ["partner", "customer"]
keywords: ["keyword1", "keyword2"]
capability_slug: "[capability_slug]"
plan_tier: "[tier]"
qa_pairs:
  - q: "What is [Feature Name]?"
    a: "[Brief answer]"
  - q: "How much does [Feature Name] cost?"
    a: "[Pricing answer]"
---

# [Feature Name]

> **Capability:** `[capability_slug]`
> **Plan Tier:** [Tier Name]
> **Credit Cost:** [X] credits

---

## Overview

[2-3 paragraphs explaining the feature and the problem it solves]

---

## Key Capabilities

### 1. [Capability 1]

[Description]

**What You Get:**
- Point 1
- Point 2

---

## Why This Matters for AWS Marketplace Partners

[Business value explanation]

---

## DIY vs. Vellocity Comparison

| Component | DIY Cost | DIY Time | Vellocity |
|-----------|----------|----------|-----------|
| [Component 1] | $X | Y weeks | Included |

---

## Getting Started

### Prerequisites
1. [Prereq 1]
2. [Prereq 2]

### Step 1: [First Step]
[Instructions]

---

## Frequently Asked Questions

**Q: [Question 1]?**
A: [Answer 1]

**Q: [Question 2]?**
A: [Answer 2]

---

## Related Documentation

- [Related Doc 1](link)
- [Related Doc 2](link)

---

*Last Updated: [Date]*

Part 7: Manifest File (RAG Index)

Create docs/manifest.json for programmatic access. The example below is abridged to a single entry in the documents array:

{
  "version": "1.0",
  "generated": "2026-01-09T12:00:00Z",
  "total_documents": 85,
  "categories": {
    "features": {
      "count": 35,
      "subcategories": ["marketplace-readiness", "content-studio", "marketing-social", "cosell-hub", "aws-connections", "ai-agents", "gtm-workflows", "optimization-lab"]
    },
    "guides": { "count": 10 },
    "comparisons": { "count": 6 },
    "faq": { "count": 6 },
    "api": { "count": 5 },
    "integrations": { "count": 6 }
  },
  "documents": [
    {
      "path": "features/marketing-social/linkedin-insights.md",
      "title": "LinkedIn Insights",
      "slug": "linkedin-insights",
      "audience": ["partner", "customer"],
      "capability_slug": "linkedin_graph_analysis",
      "plan_tier": "command_plus",
      "keywords": ["linkedin", "social", "analytics"],
      "qa_count": 12,
      "word_count": 2500,
      "last_updated": "2026-01-09"
    }
  ],
  "qa_pairs_total": 450,
  "capabilities_documented": 32
}
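Since the manifest is derived data, it is probably best generated rather than edited by hand. The Python sketch below shows one way the summary fields could be computed from per-document entries; the function name build_manifest is hypothetical, the category is assumed to be the first path segment, and the subcategories list is omitted for brevity.

```python
from collections import Counter
from datetime import datetime, timezone


def build_manifest(documents: list[dict]) -> dict:
    """Derive the manifest summary fields from a list of document entries."""
    # Category = first path segment, e.g. "features" for "features/.../x.md".
    counts = Counter(doc["path"].split("/")[0] for doc in documents)
    capabilities = {
        doc["capability_slug"] for doc in documents if doc.get("capability_slug")
    }
    return {
        "version": "1.0",
        "generated": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "total_documents": len(documents),
        "categories": {name: {"count": n} for name, n in sorted(counts.items())},
        "documents": documents,
        "qa_pairs_total": sum(doc.get("qa_count", 0) for doc in documents),
        "capabilities_documented": len(capabilities),
    }
```

Writing the result with json.dumps(manifest, indent=2) would produce a file in the shape shown above, with totals that always agree with the documents list.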

Part 8: Sync to Chatbot Training

Automated Sync Command

Create the Artisan command docs:sync-to-chatbot:

<?php
// app/Console/Commands/SyncDocsToKnowledgeBase.php

namespace App\Console\Commands;

use Illuminate\Console\Command;

class SyncDocsToKnowledgeBase extends Command
{
    protected $signature = 'docs:sync-to-chatbot
                            {--chatbot= : Target chatbot ID}
                            {--audience= : Filter by audience (partner/admin/all)}
                            {--dry-run : Preview without importing}';

    protected $description = 'Sync documentation and Q&A pairs to a chatbot knowledge base';

    public function handle(): int
    {
        // Planned steps (not yet implemented):
        // 1. Parse all docs with frontmatter
        // 2. Extract Q&A pairs
        // 3. Chunk document content
        // 4. Import to ChatbotData table
        // 5. Trigger KnowledgeBaseHelper::buildKnowledgeBase()

        return self::SUCCESS;
    }
}

Usage

# Sync partner-facing docs to support chatbot
php artisan docs:sync-to-chatbot --chatbot=1 --audience=partner

# Preview what would be synced
php artisan docs:sync-to-chatbot --chatbot=1 --audience=partner --dry-run

# Sync everything
php artisan docs:sync-to-chatbot --chatbot=1 --audience=all
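The behaviour of --audience and --dry-run can be pinned down with a small model before the command is written. The Python sketch below is an assumption about the intended semantics, not the command's implementation: in particular, it assumes that --audience=all still skips anything under internal/, in line with the "Internal Only (Not for RAG)" note in the directory structure. The names select_docs and sync are hypothetical.

```python
from typing import Callable


def select_docs(docs: list[dict], audience: str) -> list[dict]:
    """Return the docs a sync run would import for the given --audience value."""
    # Assumption: internal/ is never synced, even with --audience=all.
    candidates = [doc for doc in docs if not doc["path"].startswith("internal/")]
    if audience == "all":
        return candidates
    return [doc for doc in candidates if audience in doc.get("audience", [])]


def sync(docs: list[dict], audience: str, dry_run: bool, importer: Callable[[dict], None]) -> int:
    """Import the selected docs, or only list them when dry_run is set."""
    selected = select_docs(docs, audience)
    for doc in selected:
        if dry_run:
            print(f"[dry-run] would import {doc['path']}")
        else:
            importer(doc)
    return len(selected)
```

Returning the count in both modes lets a dry run report exactly how many documents the real run would touch.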

Part 9: Success Metrics

Documentation Quality

| Metric | Current | Target |
|--------|---------|--------|
| Docs with frontmatter | 0% | 100% |
| Docs with Q&A pairs | 0% | 80% |
| Duplicate documents | ~15 | 0 |
| Missing feature docs | ~10 | 0 |
| Average doc freshness | Unknown | < 30 days |

RAG Performance

| Metric | Target |
|--------|--------|
| Query relevance (top-3) | > 85% |
| Answer accuracy | > 90% |
| Response time | < 3 seconds |
| Fallback rate (no answer) | < 10% |
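These targets only become testable once each metric has a precise definition. The Python sketch below proposes one set of definitions as an assumption, not as the plan's agreed method: accuracy is measured over answered queries only, response time is taken as the slowest query, and the EvalResult fields would come from a manually labelled test set.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    relevant_in_top3: bool   # a relevant chunk appeared in the top 3 results
    answered: bool           # the chatbot gave an answer (no fallback)
    correct: bool            # a reviewer judged the answer correct
    latency_seconds: float


def summarize(results: list[EvalResult]) -> dict[str, float]:
    """Compute the four RAG performance metrics from labelled test queries."""
    if not results:
        raise ValueError("at least one evaluation result is required")
    total = len(results)
    answered = [r for r in results if r.answered]
    return {
        "top3_relevance": sum(r.relevant_in_top3 for r in results) / total,
        "answer_accuracy": sum(r.correct for r in answered) / len(answered) if answered else 0.0,
        "fallback_rate": (total - len(answered)) / total,
        "max_latency_seconds": max(r.latency_seconds for r in results),
    }


def meets_targets(summary: dict[str, float]) -> bool:
    """Compare a summary against the targets in the table above."""
    return (
        summary["top3_relevance"] > 0.85
        and summary["answer_accuracy"] > 0.90
        and summary["fallback_rate"] < 0.10
        and summary["max_latency_seconds"] < 3.0
    )
```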

Chatbot Training

| Metric | Target |
|--------|--------|
| Q&A pairs extracted | > 400 |
| Training data coverage | All features |
| Partner question resolution | > 80% |

Appendix: File Migration Mapping

High-Priority Moves

| Current Location | New Location | Action |
|------------------|--------------|--------|
| CAPABILITIES_MATRIX_CURRENT.md | internal/capabilities-matrix.md | Move |
| partner-value-proposition.md | comparisons/build-vs-buy-overview.md | Rename + Move |
| AWS_MARKETPLACE_GTM_STRATEGIC_PLAN.md | internal/strategy/gtm-plan.md | Move |
| BEDROCK_KNOWLEDGE_BASE_INTEGRATION.md | architecture/bedrock-integration.md | Move |
| cosell/*.md | features/cosell-hub/*.md | Merge |
| cloud-connectors/*.md | features/aws-connections/*.md | Merge |

Documents to Deprecate

| Document | Reason |
|----------|--------|
| CAPABILITIES_MATRIX_AUDIT.md | Superseded by current matrix |
| CAPABILITIES_MATRIX_UPDATED.md | Superseded by current matrix |
| AWS_MARKETPLACE_SELLER_PRIME_COMPLIANCE_AUDIT.md | Superseded by 2025 version |
| metronic-migration-plan.md | Completed work |
| ui-cleanup-backlog.md | Stale backlog |

Next Steps

  1. Approve this plan - Review with team
  2. Execute Phase 1 - Categorize existing docs
  3. Create extraction script - Build Q&A extractor
  4. Migrate priority docs - Start with feature docs
  5. Configure Bedrock KB - Set up sync pipeline
  6. Test chatbot integration - Validate RAG retrieval

Document Version: 1.0
Plan Owner: Engineering
Estimated Effort: 40-60 hours over 3 weeks