Worst-scoring seeds: golden95-cfai-v6
91 successful parses, 4 parse failures. Seeds are ranked by mean judge score across the available numeric judges.
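The ranking rule above can be sketched as follows. This is a minimal illustration, not the report's actual implementation: the function name `rank_worst`, the input shape (seed name mapped to per-judge scores), and the sample seeds are all hypothetical. It assumes a judge that returned no numeric score is stored as `None` and is left out of that seed's mean.

```python
from statistics import mean

def rank_worst(scores_by_seed):
    """Rank seeds from worst to best by mean score over available numeric judges.

    scores_by_seed maps a seed name to {judge name: score or None}.
    Non-numeric scores are skipped; seeds with no numeric score are omitted.
    (Hypothetical helper, not taken from the report's code.)
    """
    ranked = []
    for seed, scores in scores_by_seed.items():
        numeric = [
            v for v in scores.values()
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if numeric:
            ranked.append((seed, mean(numeric)))
    # Lowest mean first, so the worst-scoring seeds lead the list.
    return sorted(ranked, key=lambda pair: pair[1])

# Illustrative data only; these seeds and scores are made up.
sample = {
    "example.seed.b": {"contract-independence": 6.0, "grammar": 8.0},
    "example.seed.a": {"contract-independence": 2.0, "grammar": 4.0, "token-density": None},
    "example.seed.c": {"grammar": None},
}

for position, (seed, avg) in enumerate(rank_worst(sample), start=1):
    print(f"{position}. {seed}: avg {avg:.1f}/10")
```

Averaging only the judges that produced a number keeps a seed with one missing score comparable to the others, at the cost of means computed over different judge sets.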
Hard parse failures
- agent.env.status: parse failed
- gmail.api.ready: parse failed
- gmail.inbox.scan: parse failed
- local.files.ready: parse failed
Worst successful seeds
1. project.retro.run - avg 4.4/10
- parent-child-coherence: 0.0/10 - The parent contract only states 'parent', which provides no meaningful guarantee about what context or state exists for the child seed to build upon.
- contract-independence: 2.0/10 - The contract directly parrots the prompt's structure and terminology ('three sections', 'what went well', 'what didn't go well', 'what to try differently', 'action owner', 'top-3 improvement list') with only 'is present in context' appended.
- referential-clarity: 4.0/10 - Contract uses 'is present in context' and assumes external context exists.
2. project.email.update - avg 4.9/10
- contract-independence: 2.0/10 - Directly parrots the prompt's requirements, then appends 'exists in the context' without asserting real-world state.
- distinctiveness: 2.0/10 - Judge saw it as weakly differentiated from neighboring seeds.
- parent-child-coherence: 2.0/10 - Child assumes a status report exists, but the parent contract had collapsed.
3. data.report.narrative - avg 4.9/10
- parent-child-coherence: 0.0/10 - Child assumes data findings exist, but parent guarantee is effectively empty.
- contract-independence: 2.0/10 - Contract parrots the prompt's wording and constraints.
- navigability: 3.0/10 - Hard to verify whether the summary is truly plain-language and audience-appropriate.
4. research.competitor.scan - avg 5.0/10
- contract-independence: 2.0/10 - Prompt restatement with 'exists in context' added.
- token-density: 2.0/10 - Too much padding: 'structured', 'containing', 'exists in the context', etc.
- navigability: 3.0/10 - Contract lacks crisp observable pass/fail criteria.
5. project.status.report - avg 5.0/10
- parent-child-coherence: 0.0/10 - Parent contract provides no usable project state.
- contract-independence: 2.0/10 - Essentially a copy-paste summary of the prompt.
- referential-clarity: 3.0/10 - Relies on 'exists in context' and vague report framing.
6. gmail.priority.brief - avg 5.1/10
- contract-independence: 2.0/10 - Parrots the seed prompt almost verbatim.
- referential-clarity: 2.0/10 - Uses definite references like 'the top three unread emails'.
- navigability: 3.0/10 - Too subjective to verify cleanly.
7. research.market.size - avg 5.2/10
- contract-independence: 2.0/10 - Prompt terminology (TAM/SAM/SOM, confidence, citations) copied straight through.
- parent-child-coherence: 3.0/10 - Parent only guarantees a general topic, not market-sizing context.
- referential-clarity: 4.0/10 - Still leans on 'exists in context'.
8. web.news.digest - avg 5.2/10
- contract-independence: 2.0/10 - Repeats article count + fields + grouping requirements.
- token-density: 3.0/10 - Verbose and padded.
- referential-clarity: 4.0/10 - Vague deliverable framing.
9. agent.workflow.log - avg 5.2/10
- contract-independence: 2.0/10 - Reads like a summary of the prompt, not an independent post-state.
- parent-child-coherence: 2.0/10 - Parent doesn't guarantee a completed workflow exists.
- token-density: 3.0/10 - Too many filler phrases.
10. project.tasks.breakdown - avg 5.3/10
- parent-child-coherence: 0.0/10 - Parent contract collapsed, so child assumptions float.
- contract-independence: 2.0/10 - Restates the prompt field-by-field.
- token-density: 3.0/10 - Padded phrasing.
11. email.thread.load - avg 5.4/10
- contract-independence: 2.0/10 - Copies prompt details into the contract.
- referential-clarity: 2.0/10 - Uses definite refs like 'the full email thread'.
- token-density: 4.0/10 - Extra passive phrasing.
12. regex.pattern.build - avg 5.4/10
- contract-independence: 2.0/10 - Direct prompt restatement.
- distinctiveness: 3.0/10 - Judge saw weak differentiation.
- referential-clarity: 4.0/10 - Overuses user-bound phrasing.
Bottom 5 by judge
contract-independence
- project.retro.run: 2.0/10
- project.email.update: 2.0/10
- data.report.narrative: 2.0/10
- research.competitor.scan: 2.0/10
- project.status.report: 2.0/10
referential-clarity
- gmail.priority.brief: 2.0/10
- email.thread.load: 2.0/10
- disk.reclaim.audit: 2.0/10
- meeting.notes.process: 2.0/10
- project.status.report: 3.0/10
navigability
- data.report.narrative: 3.0/10
- research.competitor.scan: 3.0/10
- gmail.priority.brief: 3.0/10
- profile.intro.write: 3.0/10
- project.retro.run: 4.0/10
contract-concreteness
- project.retro.run: 8.0/10
- project.email.update: 8.0/10
- data.report.narrative: 8.0/10
- research.market.size: 8.0/10
- agent.workflow.log: 8.0/10
token-density
- research.competitor.scan: 2.0/10
- research.sales.battlecard: 2.0/10
- data.report.narrative: 3.0/10
- web.news.digest: 3.0/10
- agent.workflow.log: 3.0/10
slug-compression
- web.news.digest: 4.0/10
- topic.focus.set: 4.0/10
- api.endpoint.test: 4.0/10
- meeting.transcript.clean: 4.0/10
- data.report.narrative: 6.0/10
distinctiveness
- project.email.update: 2.0/10
- api.endpoint.test: 2.0/10
- seed.seed.seed: 2.0/10
- regex.pattern.build: 3.0/10
- web.price.compare: 3.0/10
parent-child-coherence
- project.retro.run: 0.0/10
- data.report.narrative: 0.0/10
- project.status.report: 0.0/10
- project.tasks.breakdown: 0.0/10
- project.email.update: 2.0/10
grammar
- gmail.priority.brief: 4.0/10
- web.news.digest: 4.0/10
- agent.workflow.log: 4.0/10
- regex.pattern.build: 4.0/10
- meeting.transcript.clean: 4.0/10
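A per-judge "bottom 5" listing like the ones above could be produced along these lines. This is a sketch under assumptions: the function name `bottom_n_by_judge` and the input shape are hypothetical, and the report does not say how ties are broken, so the example breaks ties by input order (here, overall rank). The sample uses only the parent-child-coherence scores quoted in this report.

```python
def bottom_n_by_judge(scores_by_seed, n=5):
    """Return {judge: [(seed, score), ...]} with the n lowest scores per judge.

    Hypothetical helper. Ties keep input order because Python's sort is
    stable; the report's real tie-break rule is not stated.
    """
    per_judge = {}
    for seed, scores in scores_by_seed.items():
        for judge, value in scores.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                per_judge.setdefault(judge, []).append((seed, value))
    return {
        judge: sorted(pairs, key=lambda pair: pair[1])[:n]
        for judge, pairs in per_judge.items()
    }

# parent-child-coherence scores quoted in the report, in overall-rank order.
coherence = {
    "project.retro.run": {"parent-child-coherence": 0.0},
    "project.email.update": {"parent-child-coherence": 2.0},
    "data.report.narrative": {"parent-child-coherence": 0.0},
    "project.status.report": {"parent-child-coherence": 0.0},
    "research.market.size": {"parent-child-coherence": 3.0},
    "agent.workflow.log": {"parent-child-coherence": 2.0},
    "project.tasks.breakdown": {"parent-child-coherence": 0.0},
}

for judge, rows in bottom_n_by_judge(coherence).items():
    print(judge)
    for seed, score in rows:
        print(f"- {seed}: {score:.1f}/10")
```

With these inputs the five lowest entries come out in the same order as the parent-child-coherence list in this report, though that agreement depends on the assumed tie-break.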