[prompt-clustering] Copilot Agent Prompt Clustering Analysis - December 2025 #5688
🔬 Copilot Agent Prompt Clustering Analysis
Analysis Date: 2025-12-06
Data Period: 2025-10-22 to 2025-12-06
Repository: githubnext/gh-aw
Summary
This analysis applies advanced NLP clustering techniques to identify common patterns across 1,620 copilot agent task prompts. Using TF-IDF vectorization and K-means clustering, we discovered 9 distinct task categories with varying success rates and complexity levels.
Key Metrics:
- Prompts analyzed: 1,620
- Distinct clusters identified: 9
- Overall success rate: 75.4% (1,222 of 1,620 PRs merged)
🎯 Key Insights
- Code refactoring shows the highest success rate (83.3%); network & security work the lowest (62.0%).
- Complexity correlates inversely with success: the most complex cluster (MCP server integration, 4.5 commits avg) sits well below the 75.4% average.
- Prompts built around specific verbs (refactor, update, fix) outperform open-ended ones (add, create).
Full Clustering Analysis Report
Methodology
This analysis uses:
- TF-IDF vectorization of prompt text (200 features, 1-3 grams)
- K-means clustering (k=9)
- PR outcome data (merge status, files changed, commits, additions, comments, reviews) to characterize each cluster
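The report does not reproduce its code. A minimal sketch of the described TF-IDF + K-means pipeline, assuming scikit-learn and a `prompts` list holding the 1,620 prompt texts (the sample strings below are placeholders, not real prompts), might look like this:

```python
# Minimal sketch of the described pipeline: TF-IDF features + K-means.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

prompts = ["..."]  # placeholder: replace with the 1,620 prompt texts

# 200 TF-IDF features over unigrams through trigrams, as the report states
vectorizer = TfidfVectorizer(max_features=200, ngram_range=(1, 3), stop_words="english")
X = vectorizer.fit_transform(prompts)

# k=9 clusters, matching the report's configuration
km = KMeans(n_clusters=9, random_state=0, n_init=10)
labels = km.fit_predict(X)

# Top keywords per cluster: highest-weight terms in each centroid
terms = vectorizer.get_feature_names_out()
for i, centroid in enumerate(km.cluster_centers_):
    top = [terms[j] for j in centroid.argsort()[::-1][:8]]
    print(f"Cluster {i + 1}: {', '.join(top)}")
```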
Cluster Analysis
Each cluster represents a distinct pattern of task types based on NLP analysis of prompt text.
Cluster 1: New Features 🆕
Size: 419 tasks (25.9% of total)
Success Rate: 69.9% (293 merged)
Top Keywords: add, agent, safe, job, output, javascript, step, pr
Characteristics:
Analysis: This is the largest cluster, representing feature additions and new functionality. The below-average success rate (69.9% vs 75.4%) suggests these tasks are more ambitious and may encounter scope creep. The high file count (26.9 files) indicates broad system impact.
Example PRs:
Cluster 2: Bug Fixes & Issue Resolution 🐛
Size: 303 tasks (18.7% of total)
Success Rate: 76.2% (231 merged)
Top Keywords: aw, gh, gh aw, section, issue, comments, workflows
Characteristics:
Analysis: Second-largest cluster with above-average success rate. These tasks are more focused (10.6 files vs 17.0 average) and require fewer iterations. The lower additions count suggests surgical fixes rather than broad changes.
Example PRs:
Cluster 3: Documentation Updates 📝
Size: 271 tasks (16.7% of total)
Success Rate: 81.5% (221 merged)
Top Keywords: update, workflows, github, use, command, md, actions, error
Characteristics:
Analysis: High success rate (81.5%) with moderate complexity. Documentation tasks are well-defined and measurable, leading to strong completion rates. More reviews suggest collaborative refinement.
Example PRs:
Cluster 4: Workflow Management 🔄
Size: 178 tasks (11.0% of total)
Success Rate: 77.5% (138 merged)
Top Keywords: agentic, workflow, agentic workflow, daily, shared, add, create
Characteristics:
Analysis: Despite fewer files changed (7.7), these tasks have high line additions (1,834), suggesting creation of new workflow files. Good success rate indicates well-understood patterns.
Example PRs:
Cluster 5: Code Refactoring 🔨
Size: 156 tasks (9.6% of total)
Success Rate: 83.3% (130 merged) ⭐ BEST
Top Keywords: pkg, pkg workflow, workflow, functions, code, duplicate, function
Characteristics:
Analysis: Highest success rate across all clusters! Refactoring tasks are well-scoped with clear objectives. Lower comment count suggests less back-and-forth, indicating well-understood requirements.
Example PRs:
Cluster 6: Release & Version Management 📦
Size: 91 tasks (5.6% of total)
Success Rate: 78.0% (71 merged)
Top Keywords: version, cli, changes, package, updates, release, update
Characteristics:
Analysis: Simplest tasks by commit count (2.8) with high file count (23.2) but low additions (209). This pattern suggests bulk version bumps across many files with minimal per-file changes.
Example PRs:
Cluster 7: MCP Server Integration 🔌
Size: 85 tasks (5.2% of total)
Success Rate: 67.1% (57 merged)
Top Keywords: mcp, server, mcp server, tools, json, tool, github
Characteristics:
Analysis: Most complex cluster with highest commit count (4.5) and most additions (2,837 lines). High comment count (3.3) suggests more iteration and clarification needed. Below-average success rate indicates these are challenging tasks.
Example PRs:
Cluster 8: Network & Security Features 🔐
Size: 71 tasks (4.4% of total)
Success Rate: 62.0% (44 merged)
Top Keywords: firewall, network, logs, add, workflow, agentic, engine
Characteristics:
Analysis: Lowest success rate (62.0%) suggests these are experimental or challenging features. Network and security changes often require more testing and validation.
Example PRs:
Cluster 9: Test & Quality Improvements ✅
Size: 46 tasks (2.8% of total)
Success Rate: 80.4% (37 merged)
Top Keywords: fix, tests, javascript, docs, test, issues
Characteristics:
Analysis: Small but high-performing cluster. Test fixes are typically straightforward with clear success criteria. Low review/comment counts indicate minimal back-and-forth.
Example PRs:
Cluster Summary Table

| # | Cluster | Size | % of Total | Success Rate | Merged |
|---|---------|------|------------|--------------|--------|
| 1 | New Features 🆕 | 419 | 25.9% | 69.9% | 293 |
| 2 | Bug Fixes & Issue Resolution 🐛 | 303 | 18.7% | 76.2% | 231 |
| 3 | Documentation Updates 📝 | 271 | 16.7% | 81.5% | 221 |
| 4 | Workflow Management 🔄 | 178 | 11.0% | 77.5% | 138 |
| 5 | Code Refactoring 🔨 | 156 | 9.6% | 83.3% | 130 |
| 6 | Release & Version Management 📦 | 91 | 5.6% | 78.0% | 71 |
| 7 | MCP Server Integration 🔌 | 85 | 5.2% | 67.1% | 57 |
| 8 | Network & Security Features 🔐 | 71 | 4.4% | 62.0% | 44 |
| 9 | Test & Quality Improvements ✅ | 46 | 2.8% | 80.4% | 37 |
Key Findings
🎯 Success Patterns
High Performers (>80% success):
- Code Refactoring: 83.3%
- Documentation Updates: 81.5%
- Test & Quality Improvements: 80.4%
Moderate Performers (75-80% success):
- Release & Version Management: 78.0%
- Workflow Management: 77.5%
- Bug Fixes & Issue Resolution: 76.2%
Lower Performers (<70% success):
- New Features: 69.9%
- MCP Server Integration: 67.1%
- Network & Security Features: 62.0%
📊 Complexity Analysis
Most Complex (by commits & files):
- MCP Server Integration: 4.5 commits, 27.0 files, 2,837 additions on average
- New Features: 26.9 files on average
Simplest (by commits):
- Release & Version Management: 2.8 commits on average (though touching 23.2 files)
Key Insight: Simplicity ≠ Few Files. Release management touches many files (23.2) but with minimal changes (209 lines). MCP integration touches a similar number of files (27.0) but with massive additions (2,837 lines).
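A quick way to separate these two patterns is change density (added lines per touched file). The figures below are taken directly from the report; the helper function itself is illustrative:

```python
# Change density separates bulk edits (version bumps) from deep changes
# (new integrations). Input figures come from the report's cluster stats.
def change_density(additions: int, files: float) -> float:
    """Average added lines per touched file."""
    return additions / files

# Release management: many files, shallow edits (~9 lines/file)
print(f"Release mgmt:    {change_density(209, 23.2):6.1f} lines/file")
# MCP integration: similar file count, deep edits (~105 lines/file)
print(f"MCP integration: {change_density(2837, 27.0):6.1f} lines/file")
```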
💡 Task Type Insights
Documentation & Refactoring Win: Tasks with clear scope and measurable outcomes show highest success rates (80%+).
Feature Development Struggles: New features and integrations show lower success (62-70%), suggesting a need for better scoping or multi-phase approaches.
Complexity vs. Success: Success correlates inversely with complexity; the most complex cluster (MCP integration, 4.5 commits avg) shows one of the lowest success rates (67.1%).
Review Patterns: Documentation tasks have most reviews (1.7 avg), suggesting collaborative refinement improves quality.
Recommendations
✅ Leverage High-Performing Patterns
Prioritize Refactoring Tasks (83.3% success)
Documentation Excellence (81.5% success)
Test Fixes Are Reliable (80.4% success)
🔍 Improve Struggling Categories
Break Down New Features (69.9% success)
Scope MCP Integration Tasks (67.1% success)
De-risk Network/Security Work (62.0% success)
🎯 General Best Practices
Prompt Clarity Matters: Prompts built around specific verbs (refactor, update, fix) outperform vague ones (add, create)
Scope Inversely Tracks Success: Tasks touching fewer than 15 files succeed at roughly 76-78%; tasks touching more than 25 files fall below 70% (a toy heuristic codifying these thresholds follows this list)
Iteration Sweet Spot: 3-4 commits appears optimal; fewer than 3 may indicate an overly simple task, while more than 4 suggests unclear requirements
Review Correlation: Documentation (1.7 reviews avg) succeeds more often than features (1.3 reviews avg), suggesting collaborative refinement helps
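The file-count and commit-count thresholds above can be codified as a simple triage heuristic. The thresholds come from the report; the function itself is hypothetical, not part of the original analysis:

```python
# Illustrative triage heuristic built from the report's thresholds.
def risk_flags(files_changed: int, commits: int) -> list[str]:
    """Flag PR-sized tasks whose shape correlates with low merge rates."""
    flags = []
    if files_changed > 25:
        flags.append("broad scope: >25 files correlates with <70% success")
    if commits > 4:
        flags.append("high iteration: >4 commits suggests unclear requirements")
    return flags

# Example: an MCP-integration-sized task trips both flags
print(risk_flags(files_changed=27, commits=5))
```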
📋 Prompt Engineering Tips
High-Success Prompt Patterns:
- Specific action verbs drawn from the high-performing clusters: refactor, update, fix
- Clear, bounded scope (a named package, document, or failing test) with a measurable outcome
Low-Success Prompt Patterns:
- Vague, open-ended verbs: add, create
- Broad scope implying many files or new external integrations without intermediate milestones
Improvement Strategy:
- Rewrite open-ended feature prompts as sequences of narrowly scoped tasks, per the recommendations above
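A toy linter based on these verb lists is sketched below. The verb sets mirror the report's high- and low-success keywords; the function and its messages are illustrative assumptions:

```python
# Toy prompt linter: flags vague verbs, approves specific ones.
SPECIFIC_VERBS = {"refactor", "update", "fix"}  # high-success keywords
VAGUE_VERBS = {"add", "create"}                 # low-success keywords

def lint_prompt(prompt: str) -> str:
    words = set(prompt.lower().split())
    if words & SPECIFIC_VERBS:
        return "ok: uses a specific, high-success verb"
    if words & VAGUE_VERBS:
        return "warn: vague verb; narrow the scope or split the task"
    return "info: no known verb pattern matched"

print(lint_prompt("Add MCP server support"))                        # warn
print(lint_prompt("Refactor duplicate functions in pkg/workflow"))  # ok
```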
Visualizations
The analysis generated three visualization charts, available in the workflow artifacts.
Methodology & Data Quality
Analysis generated: 2025-12-06 19:22:03 UTC
Repository: githubnext/gh-aw
Total PRs analyzed: 1,620
Clustering algorithm: K-means (k=9)
Feature extraction: TF-IDF (200 features, 1-3 grams)
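The report does not state how k=9 was selected. One common approach, shown here as a hedged sketch assuming scikit-learn and the same placeholder `prompts` list as above, is to compare silhouette scores across candidate values of k:

```python
# Hedged sketch: choosing k by mean silhouette score (not necessarily
# how the report picked k=9).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

prompts = ["..."]  # placeholder: replace with the 1,620 prompt texts
X = TfidfVectorizer(max_features=200, ngram_range=(1, 3)).fit_transform(prompts)

for k in range(5, 13):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    # Higher mean silhouette = tighter, better-separated clusters
    print(k, silhouette_score(X, labels))
```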