[prompt-clustering] Copilot Agent Prompt Clustering Analysis - December 2025 #5688
🔬 Copilot Agent Prompt Clustering Analysis
Analysis Date: 2025-12-06
Data Period: 2025-10-22 to 2025-12-06
Repository: githubnext/gh-aw
Summary
This analysis applies advanced NLP clustering techniques to identify common patterns across 1,620 copilot agent task prompts. Using TF-IDF vectorization and K-means clustering, we discovered 9 distinct task categories with varying success rates and complexity levels.
Key Metrics:
- Prompts analyzed: 1,620
- Distinct clusters identified: 9
- Overall success rate: 75.4% (1,222 of 1,620 PRs merged)
🎯 Key Insights
- Code refactoring shows the highest success rate (83.3%); network & security work the lowest (62.0%).
- Complexity correlates inversely with success: the most complex cluster (MCP server integration, 4.5 commits avg) sits well below the 75.4% average.
- Prompts built around specific verbs (refactor, update, fix) outperform open-ended ones (add, create).
Full Clustering Analysis Report
Methodology
This analysis uses:
- TF-IDF vectorization of prompt text (200 features, 1-3 grams)
- K-means clustering (k=9)
- PR outcome data (merge status, files changed, commits, additions, comments, reviews) to characterize each cluster
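The report does not reproduce its code. A minimal sketch of the described TF-IDF + K-means pipeline, assuming scikit-learn and a `prompts` list holding the 1,620 prompt texts (the sample strings below are placeholders, not real prompts), might look like this:

```python
# Minimal sketch of the described pipeline: TF-IDF features + K-means.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

prompts = ["..."]  # placeholder: replace with the 1,620 prompt texts

# 200 TF-IDF features over unigrams through trigrams, as the report states
vectorizer = TfidfVectorizer(max_features=200, ngram_range=(1, 3), stop_words="english")
X = vectorizer.fit_transform(prompts)

# k=9 clusters, matching the report's configuration
km = KMeans(n_clusters=9, random_state=0, n_init=10)
labels = km.fit_predict(X)

# Top keywords per cluster: highest-weight terms in each centroid
terms = vectorizer.get_feature_names_out()
for i, centroid in enumerate(km.cluster_centers_):
    top = [terms[j] for j in centroid.argsort()[::-1][:8]]
    print(f"Cluster {i + 1}: {', '.join(top)}")
```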
Cluster Analysis
Each cluster represents a distinct pattern of task types based on NLP analysis of prompt text.
Cluster 1: New Features 🆕
Size: 419 tasks (25.9% of total)
Success Rate: 69.9% (293 merged)
Top Keywords: add, agent, safe, job, output, javascript, step, pr
Characteristics:
Analysis: This is the largest cluster, representing feature additions and new functionality. The below-average success rate (69.9% vs 75.4%) suggests these tasks are more ambitious and may encounter scope creep. The high file count (26.9 files) indicates broad system impact.
Example PRs:
Cluster 2: Bug Fixes & Issue Resolution 🐛
Size: 303 tasks (18.7% of total)
Success Rate: 76.2% (231 merged)
Top Keywords: aw, gh, gh aw, section, issue, comments, workflows
Characteristics:
Analysis: Second-largest cluster with above-average success rate. These tasks are more focused (10.6 files vs 17.0 average) and require fewer iterations. The lower additions count suggests surgical fixes rather than broad changes.
Example PRs:
Cluster 3: Documentation Updates 📝
Size: 271 tasks (16.7% of total)
Success Rate: 81.5% (221 merged)
Top Keywords: update, workflows, github, use, command, md, actions, error
Characteristics:
Analysis: High success rate (81.5%) with moderate complexity. Documentation tasks are well-defined and measurable, leading to strong completion rates. More reviews suggest collaborative refinement.
Example PRs:
Cluster 4: Workflow Management 🔄
Size: 178 tasks (11.0% of total)
Success Rate: 77.5% (138 merged)
Top Keywords: agentic, workflow, agentic workflow, daily, shared, add, create
Characteristics:
Analysis: Despite fewer files changed (7.7), these tasks have high line additions (1,834), suggesting creation of new workflow files. Good success rate indicates well-understood patterns.
Example PRs:
Cluster 5: Code Refactoring 🔨
Size: 156 tasks (9.6% of total)
Success Rate: 83.3% (130 merged) ⭐ BEST
Top Keywords: pkg, pkg workflow, workflow, functions, code, duplicate, function
Characteristics:
Analysis: Highest success rate across all clusters! Refactoring tasks are well-scoped with clear objectives. Lower comment count suggests less back-and-forth, indicating well-understood requirements.
Example PRs:
Cluster 6: Release & Version Management 📦
Size: 91 tasks (5.6% of total)
Success Rate: 78.0% (71 merged)
Top Keywords: version, cli, changes, package, updates, release, update
Characteristics:
Analysis: Simplest tasks by commit count (2.8) with high file count (23.2) but low additions (209). This pattern suggests bulk version bumps across many files with minimal per-file changes.
Example PRs:
Cluster 7: MCP Server Integration 🔌
Size: 85 tasks (5.2% of total)
Success Rate: 67.1% (57 merged)
Top Keywords: mcp, server, mcp server, tools, json, tool, github
Characteristics:
Analysis: Most complex cluster with highest commit count (4.5) and most additions (2,837 lines). High comment count (3.3) suggests more iteration and clarification needed. Below-average success rate indicates these are challenging tasks.
Example PRs:
Cluster 8: Network & Security Features 🔐
Size: 71 tasks (4.4% of total)
Success Rate: 62.0% (44 merged)
Top Keywords: firewall, network, logs, add, workflow, agentic, engine
Characteristics:
Analysis: Lowest success rate (62.0%) suggests these are experimental or challenging features. Network and security changes often require more testing and validation.
Example PRs:
Cluster 9: Test & Quality Improvements ✅
Size: 46 tasks (2.8% of total)
Success Rate: 80.4% (37 merged)
Top Keywords: fix, tests, javascript, docs, test, issues
Characteristics:
Analysis: Small but high-performing cluster. Test fixes are typically straightforward with clear success criteria. Low review/comment counts indicate minimal back-and-forth.
Example PRs:
Cluster Summary Table

| # | Cluster | Size | % of Total | Success Rate | Merged |
|---|---------|------|------------|--------------|--------|
| 1 | New Features 🆕 | 419 | 25.9% | 69.9% | 293 |
| 2 | Bug Fixes & Issue Resolution 🐛 | 303 | 18.7% | 76.2% | 231 |
| 3 | Documentation Updates 📝 | 271 | 16.7% | 81.5% | 221 |
| 4 | Workflow Management 🔄 | 178 | 11.0% | 77.5% | 138 |
| 5 | Code Refactoring 🔨 | 156 | 9.6% | 83.3% | 130 |
| 6 | Release & Version Management 📦 | 91 | 5.6% | 78.0% | 71 |
| 7 | MCP Server Integration 🔌 | 85 | 5.2% | 67.1% | 57 |
| 8 | Network & Security Features 🔐 | 71 | 4.4% | 62.0% | 44 |
| 9 | Test & Quality Improvements ✅ | 46 | 2.8% | 80.4% | 37 |
Key Findings
🎯 Success Patterns
High Performers (>80% success):
- Code Refactoring: 83.3%
- Documentation Updates: 81.5%
- Test & Quality Improvements: 80.4%
Moderate Performers (75-80% success):
- Release & Version Management: 78.0%
- Workflow Management: 77.5%
- Bug Fixes & Issue Resolution: 76.2%
Lower Performers (<70% success):
- New Features: 69.9%
- MCP Server Integration: 67.1%
- Network & Security Features: 62.0%
📊 Complexity Analysis
Most Complex (by commits & files):
- MCP Server Integration: 4.5 commits, 27.0 files, 2,837 additions on average
- New Features: 26.9 files on average
Simplest (by commits):
- Release & Version Management: 2.8 commits on average (though touching 23.2 files)
Key Insight: Simplicity ≠ Few Files. Release management touches many files (23.2) but with minimal changes (209 lines). MCP integration touches a similar number of files (27.0) but with massive additions (2,837 lines).
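A quick way to separate these two patterns is change density (added lines per touched file). The figures below are taken directly from the report; the helper function itself is illustrative:

```python
# Change density separates bulk edits (version bumps) from deep changes
# (new integrations). Input figures come from the report's cluster stats.
def change_density(additions: int, files: float) -> float:
    """Average added lines per touched file."""
    return additions / files

# Release management: many files, shallow edits (~9 lines/file)
print(f"Release mgmt:    {change_density(209, 23.2):6.1f} lines/file")
# MCP integration: similar file count, deep edits (~105 lines/file)
print(f"MCP integration: {change_density(2837, 27.0):6.1f} lines/file")
```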
💡 Task Type Insights
Documentation & Refactoring Win: Tasks with clear scope and measurable outcomes show highest success rates (80%+).
Feature Development Struggles: New features and integrations show lower success (62-70%), suggesting a need for better scoping or multi-phase approaches.
Complexity vs. Success: Success correlates inversely with complexity; the most complex cluster (MCP integration, 4.5 commits avg) shows one of the lowest success rates (67.1%).
Review Patterns: Documentation tasks have most reviews (1.7 avg), suggesting collaborative refinement improves quality.
Recommendations
✅ Leverage High-Performing Patterns
Prioritize Refactoring Tasks (83.3% success)
Documentation Excellence (81.5% success)
Test Fixes Are Reliable (80.4% success)
🔍 Improve Struggling Categories
Break Down New Features (69.9% success)
Scope MCP Integration Tasks (67.1% success)
De-risk Network/Security Work (62.0% success)
🎯 General Best Practices
Prompt Clarity Matters: Prompts built around specific verbs (refactor, update, fix) outperform vague ones (add, create)
Scope Inversely Tracks Success: Tasks touching fewer than 15 files succeed at roughly 76-78%; tasks touching more than 25 files fall below 70% (a toy heuristic codifying these thresholds follows this list)
Iteration Sweet Spot: 3-4 commits appears optimal; fewer than 3 may indicate an overly simple task, while more than 4 suggests unclear requirements
Review Correlation: Documentation (1.7 reviews avg) succeeds more often than features (1.3 reviews avg), suggesting collaborative refinement helps
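The file-count and commit-count thresholds above can be codified as a simple triage heuristic. The thresholds come from the report; the function itself is hypothetical, not part of the original analysis:

```python
# Illustrative triage heuristic built from the report's thresholds.
def risk_flags(files_changed: int, commits: int) -> list[str]:
    """Flag PR-sized tasks whose shape correlates with low merge rates."""
    flags = []
    if files_changed > 25:
        flags.append("broad scope: >25 files correlates with <70% success")
    if commits > 4:
        flags.append("high iteration: >4 commits suggests unclear requirements")
    return flags

# Example: an MCP-integration-sized task trips both flags
print(risk_flags(files_changed=27, commits=5))
```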
📋 Prompt Engineering Tips
High-Success Prompt Patterns:
- Specific action verbs drawn from the high-performing clusters: refactor, update, fix
- Clear, bounded scope (a named package, document, or failing test) with a measurable outcome
Low-Success Prompt Patterns:
- Vague, open-ended verbs: add, create
- Broad scope implying many files or new external integrations without intermediate milestones
Improvement Strategy:
- Rewrite open-ended feature prompts as sequences of narrowly scoped tasks, per the recommendations above
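A toy linter based on these verb lists is sketched below. The verb sets mirror the report's high- and low-success keywords; the function and its messages are illustrative assumptions:

```python
# Toy prompt linter: flags vague verbs, approves specific ones.
SPECIFIC_VERBS = {"refactor", "update", "fix"}  # high-success keywords
VAGUE_VERBS = {"add", "create"}                 # low-success keywords

def lint_prompt(prompt: str) -> str:
    words = set(prompt.lower().split())
    if words & SPECIFIC_VERBS:
        return "ok: uses a specific, high-success verb"
    if words & VAGUE_VERBS:
        return "warn: vague verb; narrow the scope or split the task"
    return "info: no known verb pattern matched"

print(lint_prompt("Add MCP server support"))                        # warn
print(lint_prompt("Refactor duplicate functions in pkg/workflow"))  # ok
```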
Visualizations
The analysis generated three visualization charts, available in the workflow artifacts.
Methodology & Data Quality
Analysis generated: 2025-12-06 19:22:03 UTC
Repository: githubnext/gh-aw
Total PRs analyzed: 1,620
Clustering algorithm: K-means (k=9)
Feature extraction: TF-IDF (200 features, 1-3 grams)
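The report does not state how k=9 was selected. One common approach, shown here as a hedged sketch assuming scikit-learn and the same placeholder `prompts` list as above, is to compare silhouette scores across candidate values of k:

```python
# Hedged sketch: choosing k by mean silhouette score (not necessarily
# how the report picked k=9).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

prompts = ["..."]  # placeholder: replace with the 1,620 prompt texts
X = TfidfVectorizer(max_features=200, ngram_range=(1, 3)).fit_transform(prompts)

for k in range(5, 13):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    # Higher mean silhouette = tighter, better-separated clusters
    print(k, silhouette_score(X, labels))
```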