darwin-skill is an experimental framework for automatically improving AI agent "skills" through iterative evaluation and optimization loops, inspired by machine learning training. Instead of treating prompts or skill definitions as static assets, the system runs a continuous improvement cycle: it evaluates current performance, proposes a change, tests the outcome, and either keeps or reverts the modification.

A multi-dimensional scoring system makes skill quality quantifiable, and a "ratchet mechanism," analogous to protected branches in version control, ensures that scores never regress as iterations progress. To reduce bias, the agent that edits a skill is kept separate from the agent that evaluates it, which improves the reliability of optimization results.
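The cycle above can be sketched as a simple keep-or-revert loop. This is a minimal illustration, not the framework's actual API: the `evaluate` and `propose` callables, and the function name `optimize_skill`, are hypothetical stand-ins for the evaluator and editor agents.

```python
def optimize_skill(skill, evaluate, propose, iterations=10):
    """Hypothetical keep-or-revert loop with a ratchet:
    a candidate survives only if it scores strictly better."""
    best = skill
    best_score = evaluate(best)
    for _ in range(iterations):
        candidate = propose(best)    # editor agent proposes a change
        score = evaluate(candidate)  # separate evaluator scores it
        if score > best_score:       # ratchet: keep only improvements
            best, best_score = candidate, score
        # otherwise revert: best is left unchanged
    return best, best_score
```

The ratchet lives in the comparison: because `best` is replaced only when the candidate scores higher, the tracked score is monotonically non-decreasing across iterations.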
## Features
- Iterative skill optimization loop with evaluation cycles
- Keep-or-revert logic that retains only improvements
- Multi-dimensional scoring for quantitative quality assessment
- Independent evaluation to reduce bias
- Human-in-the-loop validation checkpoints
- Version-controlled evolution of skill definitions
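Multi-dimensional scoring reduces several quality dimensions to one comparable scalar. A possible shape for this, assuming a weighted average and illustrative dimension names (the framework's real dimensions and weights are not specified here):

```python
def aggregate_score(scores, weights=None):
    """Combine per-dimension scores (dimension names like
    'correctness' and 'clarity' are illustrative) into one
    scalar via a weighted average; equal weights by default."""
    weights = weights or {dim: 1.0 for dim in scores}
    total = sum(weights.values())
    return sum(scores[d] * weights[d] for d in scores) / total
```

A single scalar like this is what the keep-or-revert comparison would act on, while the per-dimension breakdown remains available for human-in-the-loop review.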