Refinement Drift Fix + Fine-tuned Models & Platform RAG
Issue Fixed: Refinement Drift/Hallucination
Problem
Step #5 refinement was generating random content about meditation apps, fitness tips, and travel apps instead of refining the actual webinar promotion campaign.
Root Cause
RefineContentCapability was receiving a generic placeholder string ("Generated social posts and email sequences") in its original_content parameter instead of the actual content from steps #2 and #3 (which generated 3 social posts + 3 emails).
The actual content was being passed in _dependency_1 and _dependency_2 parameters with a nested series structure:
[
    'series' => [
        ['piece_number' => 1, 'content' => '...social post 1...'],
        ['piece_number' => 2, 'content' => '...social post 2...'],
        ['piece_number' => 3, 'content' => '...social post 3...'],
    ]
]
Without the actual content, Claude had no context and generated random topics.
Solution
Modified RefineContentCapability.php (commit: 434fb9b7):
- Added extractContentFromDependencies() method that:
  - Scans for _dependency_* parameters from previous workflow steps
  - Extracts content from series structures (from GenerateContentSeriesCapability)
  - Handles direct content fields (from other capabilities)
  - Combines multiple pieces with clear separators
- Modified execute() method to:
  - Call extractContentFromDependencies() before building prompt
  - Override the generic original_content parameter with extracted actual content
  - Log dependency extraction for debugging
- Result: Refinement now receives the actual social posts and emails extracted from the dependencies, instead of just "Generated social posts and email sequences".
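The extraction step described above can be sketched roughly as follows. This is an illustrative sketch only, assuming the dependency shapes shown earlier; the function body, the "Piece N" labels, and the `---` separator are assumptions, not the code from commit 434fb9b7.

```php
<?php

/**
 * Hypothetical sketch: collect text from `_dependency_*` parameters.
 * Handles the nested `series` shape and a flat `content` field.
 */
function extractContentFromDependencies(array $parameters): string
{
    $pieces = [];

    foreach ($parameters as $key => $value) {
        // Only look at outputs injected from earlier workflow steps.
        if (!is_string($key) || !str_starts_with($key, '_dependency_') || !is_array($value)) {
            continue;
        }

        if (isset($value['series']) && is_array($value['series'])) {
            // Series output: one entry per generated piece.
            foreach ($value['series'] as $item) {
                if (is_array($item) && isset($item['content']) && is_string($item['content']) && $item['content'] !== '') {
                    $number = $item['piece_number'] ?? count($pieces) + 1;
                    $pieces[] = "Piece {$number}:\n" . $item['content'];
                }
            }
        } elseif (isset($value['content']) && is_string($value['content']) && $value['content'] !== '') {
            // Direct content field from a non-series capability.
            $pieces[] = $value['content'];
        }
    }

    // A visible separator keeps the pieces distinguishable in the prompt.
    return implode("\n\n---\n\n", $pieces);
}

$parameters = [
    'original_content' => 'Generated social posts and email sequences',
    '_dependency_1' => ['series' => [
        ['piece_number' => 1, 'content' => 'Social post 1'],
        ['piece_number' => 2, 'content' => 'Social post 2'],
    ]],
    '_dependency_2' => ['content' => 'Email 1'],
];

$extracted = extractContentFromDependencies($parameters);
if ($extracted !== '') {
    // Replace the placeholder only when real content was found.
    $parameters['original_content'] = $extracted;
}
```

Keeping the placeholder when nothing is extracted means a workflow without dependencies behaves as before, which makes the change easier to roll out.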
Testing
Run the webinar campaign workflow again. Step #5 refinement should now:
- ✅ Reference the actual social posts and emails generated in steps #2-3
- ✅ Make specific improvements to the webinar content
- ✅ Maintain the webinar topic throughout
- ❌ NOT drift to random topics like meditation, fitness, or travel
Question 1: Can I Create and Use Fine-tuned Models?
Short Answer
Yes, AWS Bedrock supports fine-tuned models, but you'll need to extend the current implementation.
Current Implementation
The BedrockEngine enum (app/Enums/BedrockEngine.php) has hardcoded foundation model IDs:
case CLAUDE_3_SONNET = 'anthropic.claude-3-sonnet-20240229-v1:0';
case CLAUDE_3_5_SONNET = 'anthropic.claude-3-5-sonnet-20240620-v1:0';
case NOVA_PRO = 'amazon.nova-pro-v1:0';
How to Add Fine-tuned Models
Option A: Add to BedrockEngine Enum (Quick)
// In app/Enums/BedrockEngine.php
case CUSTOM_WEBINAR_EXPERT = 'arn:aws:bedrock:us-east-1:123456789:provisioned-model/my-webinar-model';

public function label(): string
{
    return match ($this) {
        // ... existing cases
        self::CUSTOM_WEBINAR_EXPERT => __('Custom Webinar Expert'),
    };
}
Then use in agent configuration:
Option B: Dynamic Custom Model IDs (Flexible)
1. Migration:
Schema::table('ext_content_manager_agents', function (Blueprint $table) {
    $table->string('custom_model_id')->nullable()->after('ai_model');
});
2. Modify capability handlers to check for custom model:
// In GenerateContentSeriesCapability.php, RefineContentCapability.php, etc.
$modelId = $agent->custom_model_id ?? BedrockEngine::CLAUDE_3_SONNET->value;
$response = $this->bedrockService->invokeModel(
    $modelId, // Use custom model if specified
    $prompt,
    $options
);
3. Add UI field in agent create/edit form to specify custom model ARN.
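One detail worth considering for Option B: the `??` fallback only catches `null`. If the form field is saved as an empty string (an assumption about how the UI stores it), the empty value would be passed to `invokeModel`. A small helper along these lines could guard against that; `resolveModelId` is a hypothetical name, not an existing function.

```php
<?php

/**
 * Hypothetical helper: prefer an agent's custom model ID, else a default.
 * Blank or whitespace-only values are treated as "not set".
 */
function resolveModelId(?string $customModelId, string $defaultModelId): string
{
    $candidate = trim((string) $customModelId);

    return $candidate !== '' ? $candidate : $defaultModelId;
}

$default = 'anthropic.claude-3-sonnet-20240229-v1:0';
$custom = 'arn:aws:bedrock:us-east-1:123456789:provisioned-model/my-webinar-model';

$fromCustom = resolveModelId($custom, $default); // custom ARN wins
$fromBlank = resolveModelId('   ', $default);    // falls back to default
$fromNull = resolveModelId(null, $default);      // falls back to default
```

In a capability handler this would replace the `??` expression, with `BedrockEngine::CLAUDE_3_SONNET->value` passed as the default.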
AWS Bedrock Fine-tuning Process
Supported Models
- Claude models: Fine-tuning is offered only for selected models and regions (Claude 3 Haiku at the time of writing); confirm current availability in the Bedrock documentation before choosing a base model
- Amazon Nova: Fine-tuning for specific tasks
- Amazon Titan: Text generation fine-tuning and continued pre-training on your domain data
Steps to Create Fine-tuned Model
1. Prepare training data (JSONL format):
2. Upload to S3:
3. Create fine-tuning job (AWS Console or CLI):
aws bedrock create-model-customization-job \
    --job-name "webinar-content-expert" \
    --custom-model-name "webinar-expert-v1" \
    --base-model-identifier "anthropic.claude-3-sonnet-20240229-v1:0" \
    --training-data-config "s3Uri=s3://my-bedrock-training/webinar-expert/" \
    --output-data-config "s3Uri=s3://my-bedrock-models/webinar-expert-v1/" \
    --role-arn "arn:aws:iam::123456789:role/BedrockCustomizationRole"
4. Create provisioned throughput:
5. Use in Vellocity:
Cost Considerations
- Training: ~$5-50 depending on dataset size and epochs
- Provisioned throughput: ~$75-300/month (1-4 model units); this is billed hourly per model unit, so verify the figure against current Bedrock pricing for your base model
- Inference: Included in provisioned throughput (no per-token charges)
Recommendation: Start with foundation models + RAG (cheaper, faster). Use fine-tuning only if:
- You have >1,000 high-quality training examples
- You need consistent brand voice across all outputs
- RAG isn't providing sufficient domain adaptation
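If fine-tuning does turn out to be warranted, the training examples need to be serialized as JSON Lines, one JSON object per line. The sketch below assumes a `prompt`/`completion` schema, which Bedrock documents for some text models; other base models expect a different schema, so treat the key names as an assumption. The example rows are made up for illustration.

```php
<?php

/**
 * Serialize prompt/completion pairs as JSON Lines (one object per line).
 * The `prompt`/`completion` keys are an assumption; check the schema
 * required by the base model you are customizing.
 */
function toJsonl(array $examples): string
{
    $lines = [];

    foreach ($examples as $example) {
        $lines[] = json_encode(
            ['prompt' => $example['prompt'], 'completion' => $example['completion']],
            JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
        );
    }

    // Each record ends with a newline; an empty dataset yields an empty string.
    return $lines === [] ? '' : implode("\n", $lines) . "\n";
}

// Illustrative, made-up training rows.
$examples = [
    ['prompt' => 'Write a webinar invite subject line.', 'completion' => 'Join us live: automate your workflows'],
    ['prompt' => "Write a reminder\nfor tomorrow's webinar.", 'completion' => 'Starts tomorrow at 10am.'],
];

$jsonl = toJsonl($examples);
```

Because `json_encode` escapes embedded newlines, a multi-line prompt still occupies exactly one line in the output, which is what the JSONL format requires.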
Question 2: Can Models Refer to Core Vellocity Platform Knowledge Base?
Short Answer
Yes, you can create a platform-level RAG that all agents access for Vellocity platform documentation.
Current Knowledge Base Architecture
QueryKnowledgeBaseCapability searches:
1. User-scoped: Documents where user_id = current_user
2. Team-scoped: Documents where team_id = current_team AND is_team_shared = true
This isolates customer data per user/team, which is correct for privacy.
Problem
Agents don't have access to platform-level documentation like:
- Agent capability reference (what parameters does generate_content_series require?)
- Workflow pattern library (how to structure a webinar campaign?)
- Best practices (when to use refinement loops?)
Solution: Two Implementation Paths
Path 1: Platform-Level PdfData (Local RAG with OpenAI Embeddings)
Step 1: Database Migration
// database/migrations/YYYY_MM_DD_add_platform_shared_to_pdf_data.php
Schema::table('pdf_data', function (Blueprint $table) {
    $table->boolean('is_platform_shared')->default(false)->after('is_team_shared');
    $table->index('is_platform_shared');
});
Step 2: Modify QueryKnowledgeBaseCapability
// In searchKnowledgeBase() method around line 149
$query = PdfData::query()
    ->where(function ($q) use ($userId, $teamId) {
        // User-scoped documents
        $q->where('user_id', $userId);
        // Team-shared documents
        if ($teamId) {
            $q->orWhere(function ($tq) use ($teamId) {
                $tq->where('team_id', $teamId)
                    ->where('is_team_shared', true);
            });
        }
        // PLATFORM-LEVEL DOCUMENTS (accessible to all users)
        $q->orWhere('is_platform_shared', true);
    })
    ->whereNotNull('vector');
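The query above only selects candidate rows; the matching itself depends on comparing embeddings. As a rough sketch, assuming the `vector` column holds a JSON-encoded embedding, ranking could look like this. `cosineSimilarity` and `rankBySimilarity` are hypothetical names, and the existing `searchKnowledgeBase()` may work differently; the defaults of 0.7 and 5 mirror the `min_similarity` and `top_results` defaults used elsewhere in this document.

```php
<?php

/** Cosine similarity of two equal-length numeric vectors. */
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // A zero vector has no direction to compare.
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Rank rows (each with a JSON-encoded `vector`) against a query embedding.
 * Rows below $minSimilarity are dropped; the rest are sorted best-first.
 */
function rankBySimilarity(array $rows, array $queryVector, float $minSimilarity = 0.7, int $top = 5): array
{
    $scored = [];

    foreach ($rows as $row) {
        $vector = json_decode($row['vector'], true);
        if (!is_array($vector)) {
            continue; // Skip rows whose vector cannot be decoded.
        }
        $score = cosineSimilarity($queryVector, $vector);
        if ($score >= $minSimilarity) {
            $scored[] = ['file_name' => $row['file_name'], 'score' => $score];
        }
    }

    usort($scored, fn (array $x, array $y): int => $y['score'] <=> $x['score']);

    return array_slice($scored, 0, $top);
}

// Toy two-dimensional vectors; real embeddings have many more dimensions.
$rows = [
    ['file_name' => 'user-brief.pdf', 'vector' => json_encode([1.0, 0.0])],
    ['file_name' => 'agent-capabilities-reference.md', 'vector' => json_encode([0.6, 0.8])],
    ['file_name' => 'workflow-notes.pdf', 'vector' => json_encode([0.0, 1.0])],
];

$ranked = rankBySimilarity($rows, [0.6, 0.8]);
```

Scanning every candidate row in PHP is fine for a handful of platform documents but grows linearly with table size, which is one reason a dedicated vector store becomes attractive later.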
Step 3: Upload Platform Documentation
use App\Models\PdfData;
use OpenAI\Laravel\Facades\OpenAI;

$platformDocs = [
    [
        'file_name' => 'agent-capabilities-reference.md',
        'content' => file_get_contents(storage_path('platform-docs/capabilities-reference.md')),
        'category' => 'platform_docs',
    ],
    [
        'file_name' => 'workflow-patterns-library.md',
        'content' => file_get_contents(storage_path('platform-docs/workflow-patterns.md')),
        'category' => 'platform_docs',
    ],
    // Add more docs...
];

foreach ($platformDocs as $doc) {
    // Generate embedding
    $embedding = OpenAI::embeddings()->create([
        'model' => 'text-embedding-ada-002',
        'input' => $doc['content'],
    ])->embeddings[0]->embedding;

    // Save to database
    PdfData::create([
        'user_id' => 1, // Admin user
        'team_id' => null,
        'is_platform_shared' => true, // <-- Key flag
        'file_name' => $doc['file_name'],
        'content' => $doc['content'],
        'vector' => json_encode($embedding),
        'category' => $doc['category'],
    ]);
}
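The loop above embeds each file in a single request. Embedding models limit how much text one request can carry, so longer reference documents would likely need to be split first, with one row stored per chunk. The helper below is a minimal sketch under that assumption; `chunkText` is a hypothetical name, and character count is only a rough stand-in for tokens.

```php
<?php

/**
 * Split text into chunks of at most $maxChars characters, breaking on
 * blank lines where possible. Character count is a rough stand-in for
 * tokens; a real tokenizer would be more accurate.
 */
function chunkText(string $text, int $maxChars = 4000): array
{
    $chunks = [];
    $current = '';

    foreach (preg_split('/\R{2,}/', trim($text)) as $paragraph) {
        // Hard-split any single paragraph that is itself too long.
        foreach (mb_str_split($paragraph, $maxChars) as $part) {
            $candidate = $current === '' ? $part : $current . "\n\n" . $part;
            if (mb_strlen($candidate) <= $maxChars) {
                $current = $candidate;
            } else {
                $chunks[] = $current;
                $current = $part;
            }
        }
    }

    if ($current !== '') {
        $chunks[] = $current;
    }

    return $chunks;
}

$doc = "# Capabilities\n\nFirst paragraph.\n\nSecond paragraph.\n\n" . str_repeat('x', 50);
$chunks = chunkText($doc, 40);
```

Each chunk would then go through the same embedding and `PdfData::create` calls as a whole document does above, so retrieval returns the relevant passage rather than an entire file.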
Pros:
- ✅ Simple implementation (minimal code changes)
- ✅ No additional AWS resources needed
- ✅ Uses existing OpenAI embedding infrastructure
Cons:
- ❌ Mixes platform docs with user data (same table)
- ❌ OpenAI embedding costs for every query
- ❌ Harder to manage/version platform docs
Path 2: Dedicated Bedrock Knowledge Base (Recommended)
Create a separate AWS Bedrock KB specifically for Vellocity platform documentation.
Step 1: Create Platform KB in AWS
Via AWS Console:
1. Go to Amazon Bedrock > Knowledge Bases
2. Click Create knowledge base
3. Configure:
   - Name: vellocity-platform-kb
   - IAM role: Create new or use existing
   - Data source: S3
   - S3 URI: s3://vellocity-platform-docs/
   - Embedding model: Amazon Titan Embeddings G1
4. Click Create
5. Copy the Knowledge Base ID (e.g., ABCDEFGHIJ)
Via AWS CLI:
# vectorIndexName and fieldMapping are required by the API; the values shown here are placeholders
aws bedrock-agent create-knowledge-base \
    --name "vellocity-platform-kb" \
    --role-arn "arn:aws:iam::123456789:role/BedrockKBRole" \
    --knowledge-base-configuration '{
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v1"
        }
    }' \
    --storage-configuration '{
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": "arn:aws:aoss:us-east-1:123456789:collection/platform-kb",
            "vectorIndexName": "platform-kb-index",
            "fieldMapping": {
                "vectorField": "vector",
                "textField": "text",
                "metadataField": "metadata"
            }
        }
    }'
Step 2: Upload Platform Documentation to S3
Create platform docs:
agent-capabilities-reference.md:
````markdown
# Vellocity Agent Capabilities Reference
## generate_content_series
**Description**: Generates coordinated series of content pieces that work together strategically.
**Required Parameters**:
- `content_type` (string): "email", "social", "blog", "video_script"
- `topic` (string): Description of the content theme
- `series_count` (integer): Number of pieces to generate (typically 3-5)
**Optional Parameters**:
- `series_type` (string): Pre-defined strategy
  - `"email_sequence"`: Announcement → Value → Urgency
  - `"social_campaign"`: Awareness → Engagement → Conversion
  - `"webinar_sequence"`: Invite → Reminder → Last-chance
  - `"product_launch"`: Teaser → Launch → Follow-up
- `length` (string): "short" (50-100 words), "medium" (150-250 words), "long" (300-500 words)
**Example Usage**:
```json
{
  "capability": "generate_content_series",
  "parameters": {
    "content_type": "email",
    "topic": "AI-powered webinar on workflow automation",
    "series_count": 3,
    "series_type": "webinar_sequence",
    "length": "medium"
  }
}
```
**Output Structure**:
```json
{
  "success": true,
  "series": [
    {"piece_number": 1, "content": "Email 1 text...", "word_count": 200},
    {"piece_number": 2, "content": "Email 2 text...", "word_count": 220},
    {"piece_number": 3, "content": "Email 3 text...", "word_count": 180}
  ],
  "series_count": 3
}
```
## refine_content
**Description**: Iteratively refines content based on analysis feedback to improve quality scores.
**Required Parameters**:
- `original_content` (string): The content to refine
- `analysis_results` (array): Output from `analyze_content` capability
**Optional Parameters**:
- `refinement_instructions` (string): Specific improvement focus
**When to Use**:
- Content analysis shows scores below targets (e.g., SEO < 80)
- Strategic evaluation reveals missing keywords or weak persona alignment
- `meets_targets.overall = false` in analysis results
**Example Workflow**:
```json
[
  {
    "step": 1,
    "capability": "generate_content_series",
    "parameters": {"content_type": "email", "topic": "...", "series_count": 3}
  },
  {
    "step": 2,
    "capability": "analyze_content",
    "parameters": {
      "content": "Generated emails",
      "target_seo_score": 80,
      "target_brand_alignment": 85
    }
  },
  {
    "step": 3,
    "capability": "refine_content",
    "parameters": {
      "original_content": "Generated emails",
      "analysis_results": "Analysis from step 2"
    },
    "depends_on": [1, 2],
    "condition": "step_2.meets_targets.overall === false"
  }
]
```
````
workflow-patterns-library.md:
````markdown
# Vellocity Workflow Patterns Library
## Pattern: Webinar Promotion Campaign
**Use Case**: Promote an upcoming webinar with multi-channel content
**Workflow Steps**:
1. **Query Knowledge Base**: Get webinar details (topic, speakers, date, target audience)
2. **Generate Social Series**: 3 social posts (announcement, value prop, urgency)
3. **Generate Email Series**: 3 emails (invite, reminder, last-chance)
4. **Analyze Content**: Check SEO, brand alignment, persona fit
5. **Refine Content** (conditional): If quality targets not met, improve content
6. **Package Results**: Organize outputs for handoff to marketing team
**Example JSON Workflow**:
```json
{
  "task": "Create webinar promotion campaign",
  "steps": [
    {
      "step": 1,
      "capability": "query_knowledge_base",
      "parameters": {
        "query": "webinar details, speakers, agenda, target audience",
        "category": "webinar_briefs"
      }
    },
    {
      "step": 2,
      "capability": "generate_content_series",
      "parameters": {
        "content_type": "social",
        "topic": "Webinar: [Topic from KB]",
        "series_count": 3,
        "series_type": "social_campaign"
      },
      "depends_on": [1]
    },
    {
      "step": 3,
      "capability": "generate_content_series",
      "parameters": {
        "content_type": "email",
        "topic": "Webinar: [Topic from KB]",
        "series_count": 3,
        "series_type": "webinar_sequence"
      },
      "depends_on": [1]
    },
    {
      "step": 4,
      "capability": "analyze_content",
      "parameters": {
        "content": "All generated social + email content",
        "target_seo_score": 80,
        "target_brand_alignment": 85
      },
      "depends_on": [2, 3]
    },
    {
      "step": 5,
      "capability": "refine_content",
      "parameters": {
        "original_content": "All generated content",
        "analysis_results": "Analysis from step 4"
      },
      "depends_on": [2, 3, 4],
      "condition": "step_4.meets_targets.overall === false"
    }
  ]
}
```
**Expected Output**:
- 3 social posts (LinkedIn, Twitter-style)
- 3 emails (invite, reminder, urgency)
- Analysis scores (SEO 80+, Brand 85+)
- Refined versions if needed
- Total execution time: 40-60 seconds
## Pattern: Product Launch Content Suite
**Use Case**: Generate comprehensive launch content across channels
**Workflow Steps**:
1. Query product details from KB
2. Generate blog post series (3 posts: intro, benefits, use cases)
3. Generate social campaign (5 posts: teaser, launch, testimonial, feature spotlight, CTA)
4. Generate email sequence (4 emails: announcement, deep-dive, case study, offer)
5. Analyze all content
6. Refine if needed
- **Duration**: 60-90 seconds
- **Credits**: ~60-80 credits
## Pattern: Iterative Quality Improvement Loop
**Use Case**: Ensure content meets strict quality requirements
**Workflow**:
```
Loop (max 3 iterations):
  1. Generate content
  2. Analyze content
  3. If targets met: BREAK
  4. If targets not met: Refine content, CONTINUE
```
**Example**:
```json
{
  "task": "Generate high-quality blog post with quality assurance",
  "steps": [
    {"step": 1, "capability": "generate_text", "parameters": {"prompt": "..."}},
    {"step": 2, "capability": "analyze_content", "parameters": {"content": "step_1", "target_seo_score": 90}},
    {"step": 3, "capability": "refine_content", "depends_on": [1, 2], "condition": "step_2.meets_targets.overall === false"},
    {"step": 4, "capability": "analyze_content", "parameters": {"content": "step_3"}, "depends_on": [3]},
    {"step": 5, "capability": "refine_content", "depends_on": [3, 4], "condition": "step_4.meets_targets.overall === false"}
  ]
}
```
**Best Practices**:
- Set realistic target scores (SEO 70-85, not 95+)
- Limit refinement loops to 2-3 iterations (diminishing returns)
- Always include strategic context (keywords, personas, GTM goals)
````
Upload to S3:
aws s3 cp agent-capabilities-reference.md s3://vellocity-platform-docs/
aws s3 cp workflow-patterns-library.md s3://vellocity-platform-docs/
aws s3 cp best-practices.md s3://vellocity-platform-docs/
Sync KB:
aws bedrock-agent start-ingestion-job \
    --knowledge-base-id ABCDEFGHIJ \
    --data-source-id <data-source-id>
Step 3: Add Configuration to .env
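The capability in Step 4 reads `config('services.bedrock.platform_kb_id')` and `config('services.bedrock.platform_kb_region')`, and names `BEDROCK_PLATFORM_KB_ID` as the expected environment variable. One way to wire those keys to the environment is sketched below; the file location and the `BEDROCK_PLATFORM_KB_REGION` variable name are assumptions.

```php
<?php

// config/services.php (fragment). Assumes Laravel's env() helper.
return [
    // ... existing service entries

    'bedrock' => [
        // Knowledge Base ID copied in Step 1, e.g. ABCDEFGHIJ.
        'platform_kb_id' => env('BEDROCK_PLATFORM_KB_ID'),
        // Variable name is an assumption; only the config key is given in Step 4.
        'platform_kb_region' => env('BEDROCK_PLATFORM_KB_REGION', 'us-east-1'),
    ],
];
```

With this mapping, leaving `BEDROCK_PLATFORM_KB_ID` unset makes the capability return its "Platform Knowledge Base not configured" error rather than failing at call time.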
Step 4: Create Platform Knowledge Capability
Create new file: app/Extensions/ContentManager/System/Services/Capabilities/QueryPlatformKnowledgeCapability.php
<?php

namespace App\Extensions\ContentManager\System\Services\Capabilities;

use App\Extensions\ContentManager\System\Models\AgentExecution;
use App\Services\Bedrock\BedrockKnowledgeBaseService;
use App\Models\User;

/**
 * QueryPlatformKnowledgeCapability
 *
 * Searches Vellocity platform documentation (capabilities, patterns, best practices)
 * Available to all agents regardless of user/team
 */
class QueryPlatformKnowledgeCapability extends BaseCapability
{
    public function getName(): string
    {
        return 'Query Platform Knowledge';
    }

    public function getDescription(): string
    {
        return 'Search Vellocity platform documentation for agent guidance, capability reference, and workflow patterns';
    }

    public function getRequiredParameters(): array
    {
        return ['query'];
    }

    public function getEstimatedCredits(array $parameters): int
    {
        return 2; // Same as regular KB query
    }

    public function execute(array $parameters, array $context, AgentExecution $execution): array
    {
        $query = $parameters['query'];
        $platformKbId = config('services.bedrock.platform_kb_id');
        $region = config('services.bedrock.platform_kb_region', 'us-east-1');

        if (!$platformKbId) {
            $this->log('Platform KB not configured', [
                'env_var' => 'BEDROCK_PLATFORM_KB_ID',
            ]);

            return [
                'success' => false,
                'results' => [],
                'error' => 'Platform Knowledge Base not configured',
            ];
        }

        $this->log('Querying platform knowledge base', [
            'query' => $query,
            'kb_id' => $platformKbId,
        ]);

        try {
            $user = User::find($execution->user_id);
            $bedrockKB = new BedrockKnowledgeBaseService($user, $region);

            // Retrieve from platform KB
            $results = $bedrockKB->retrieve(
                $platformKbId,
                $query,
                $parameters['top_results'] ?? 5,
                $parameters['min_similarity'] ?? 0.7
            );

            $this->log('Platform KB query completed', [
                'results_found' => count($results),
            ]);

            return [
                'success' => true,
                'results' => $results,
                'query' => $query,
                'platform_kb' => true,
                'kb_id' => $platformKbId,
                'num_results' => count($results),
                'credits_used' => $this->getEstimatedCredits($parameters),
            ];
        } catch (\Exception $e) {
            $this->logError('Platform KB query failed', [
                'error' => $e->getMessage(),
                'query' => $query,
            ]);

            // Return empty results to allow workflow to continue
            return [
                'success' => false,
                'results' => [],
                'query' => $query,
                'error' => $e->getMessage(),
                'credits_used' => 0,
            ];
        }
    }
}
Step 5: Register Capability
In CapabilityRegistry.php bootstrapDefaults():
[
    'slug' => 'query_platform_knowledge',
    'name' => 'Query Platform Knowledge',
    'description' => 'Search Vellocity platform documentation for agent guidance, capability reference, and workflow patterns',
    'category' => AgentCapability::CATEGORY_ANALYSIS,
    'handler_class' => \App\Extensions\ContentManager\System\Services\Capabilities\QueryPlatformKnowledgeCapability::class,
    'required_settings' => ['query'],
    'default_settings' => ['top_results' => 5, 'min_similarity' => 0.7],
    'requires_credits' => true,
    'estimated_credits' => 2,
    'is_active' => true,
],
Run registration:
Step 6: Update WorkflowPlanner Instructions
In WorkflowPlanner.php buildPlanningPrompt():
$sections[] = "\n# Platform Knowledge Access";
$sections[] = "You have access to Vellocity platform documentation via query_platform_knowledge.";
$sections[] = "Use this to learn about:";
$sections[] = "- Available capabilities and their parameters";
$sections[] = "- Workflow patterns and best practices";
$sections[] = "- How to structure complex multi-step workflows";
$sections[] = "\nExample: Before planning a webinar campaign, query: \"How to structure a webinar promotion workflow with email and social series\"";
Step 7: Test Platform KB Access
Via Tinker:
php artisan tinker
use App\Extensions\ContentManager\System\Models\Agent;
use App\Extensions\ContentManager\System\Models\AgentExecution;
use App\Extensions\ContentManager\System\Services\Capabilities\QueryPlatformKnowledgeCapability;
use App\Models\User;

$user = User::first();
$agent = Agent::first();
$execution = AgentExecution::create([
    'agent_id' => $agent->id,
    'user_id' => $user->id,
    'task_description' => 'Test platform KB',
    'status' => 'running',
]);
$capability = new QueryPlatformKnowledgeCapability();
$result = $capability->execute(
    ['query' => 'How to use generate_content_series capability?'],
    [],
    $execution
);
dd($result);
// Should return results from platform documentation
Pros:
- ✅ Complete separation from user data
- ✅ Centralized platform documentation management
- ✅ AWS-optimized vector search (faster, more accurate)
- ✅ Easy to update (upload new docs to S3, then re-run the ingestion job)
- ✅ Can track platform KB usage separately
- ✅ Scales to thousands of agents querying simultaneously
Cons:
- ❌ Requires AWS Bedrock setup
- ❌ Additional monthly cost (~$50-100 for small KB)
- ❌ More initial configuration
Recommendation: Path 2 (Dedicated Bedrock KB)
I strongly recommend Path 2 because:
- Separation of Concerns: Platform docs shouldn't mix with user data
- Performance: Bedrock KB is optimized for high-volume RAG
- Management: Update S3, then re-run the ingestion job to sync the KB (no manual re-embedding)
- Observability: Monitor platform KB queries separately from user KB
- Security: Platform docs can have different access controls
Example Platform KB Query Flow
User task: "Create a webinar promotion campaign"
Agent workflow planning with platform KB:
Step 0 (Planning Phase):
  → WorkflowPlanner queries platform KB: "How to structure a webinar promotion workflow?"
  → Platform KB returns: Workflow Patterns > Webinar Promotion Campaign pattern
  → Planner now knows: query_knowledge_base → generate_content_series (social) → generate_content_series (email) → analyze_content → refine_content
Step 1: query_knowledge_base
  Parameters: {"query": "webinar details, speakers, agenda"}
  → Searches USER KB for customer-specific webinar info
Step 2: query_platform_knowledge
  Parameters: {"query": "generate_content_series parameters for webinar email sequence"}
  → Searches PLATFORM KB for capability documentation
  → Returns: "series_type: webinar_sequence means Email 1 = Invite, Email 2 = Reminder, Email 3 = Last-chance"
Step 3: generate_content_series
  Parameters: {
    "content_type": "email",
    "series_type": "webinar_sequence", // ← Learned from platform KB
    "series_count": 3
  }
Result: Agent uses platform KB to learn HOW to use capabilities correctly, then uses user KB to get customer-specific context.
Next Steps
Immediate (Test the Fix)
- Run a new webinar campaign execution
- Verify Step #5 refinement stays on topic (webinar content)
- Check execution trace shows actual content being refined
Short-term (Fine-tuned Models)
- Evaluate if RAG + foundation models meet your needs (try first!)
- If fine-tuning needed, prepare training dataset (>1,000 examples)
- Create fine-tuning job in AWS Bedrock
- Add custom model ID to agent configuration
Medium-term (Platform RAG)
- Create Bedrock Knowledge Base for platform docs
- Upload documentation:
  - Capability reference (all parameters, examples)
  - Workflow patterns (webinar, product launch, etc.)
  - Best practices (when to refine, target scores, etc.)
- Implement QueryPlatformKnowledgeCapability
- Update WorkflowPlanner to query platform KB during planning
- Test: Agent should reference platform docs when structuring workflows
Long-term (Advanced)
- Multi-modal platform KB: Include screenshots, diagrams, video transcripts
- Version control: Track platform KB changes, rollback if needed
- Agent feedback loop: Agents report unclear docs, auto-improve platform KB
- Cross-company patterns: Learn from successful workflows, add to platform KB
Summary
What Was Fixed
✅ Refinement drift resolved - agents now refine actual content, not random topics
Questions Answered
✅ Fine-tuned models: Yes, supported via custom model IDs (add to BedrockEngine or agent.custom_model_id field)
✅ Platform RAG: Yes, implement via dedicated Bedrock KB with QueryPlatformKnowledgeCapability
Recommended Implementation Order
1. Test the refinement fix (ready now)
2. Implement platform RAG (high value, moderate effort)
3. Consider fine-tuning (only if RAG insufficient, high cost)
Files Modified:
- app/Extensions/ContentManager/System/Services/Capabilities/RefineContentCapability.php
Files to Create (Platform RAG):
- app/Extensions/ContentManager/System/Services/Capabilities/QueryPlatformKnowledgeCapability.php
- Platform documentation files (markdown → S3 → Bedrock KB)
Configuration Needed: