GenAI Content Governance: Evaluation Rubric

The Context: Generative AI accelerates content production, but speed without governance creates technical debt. This rubric serves as the architectural guardrail for all LLM-generated documentation, ensuring that automated output meets enterprise standards for accuracy, brand alignment, and structural integrity before it reaches the deployment pipeline.

Scoring Methodology: This is a binary, pass/fail validation system. Generated content must pass every check in all four dimensions to be approved for deployment; a single failed check blocks release.
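
The all-pass gate can be sketched in a few lines. This is an illustrative model only; the `CheckResult` type and `approve_for_deployment` function are hypothetical names, not part of any real pipeline:

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    dimension: str   # e.g. "Technical & Architectural Integrity"
    check: str       # e.g. "Code Validity"
    passed: bool

def approve_for_deployment(results: list[CheckResult]) -> bool:
    """Binary gate: every check in every dimension must pass."""
    return all(r.passed for r in results)

results = [
    CheckResult("Technical", "Code Validity", True),
    CheckResult("Structural", "Markdown Hygiene", False),
]
print(approve_for_deployment(results))  # False: one failed check blocks deployment
```

Because the gate is `all(...)`, there is no weighting or partial credit to tune; adding a new check automatically makes it blocking.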

Dimension 1: Technical & Architectural Integrity

The content must be factually flawless and structurally sound.

  • Code Validity: Are all generated code snippets, JSON payloads, and CLI commands syntactically correct and tested against the current API version?
  • Feature Accuracy: Does the text accurately describe the product's actual capabilities rather than hallucinated features?
  • Prerequisite Clarity: Are all necessary dependencies, permissions, and environmental setups explicitly stated before instructions begin?
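
Parts of the Code Validity check can be automated before human review. As a minimal sketch, assuming snippets are tagged as JSON or Python (the `validate_snippet` helper and its `kind` labels are hypothetical, and a real pipeline would also run snippets against the live API):

```python
import ast
import json

def validate_snippet(text: str, kind: str) -> bool:
    """Return True if the snippet parses cleanly for its declared kind."""
    try:
        if kind == "json":
            json.loads(text)   # raises ValueError on malformed JSON
        elif kind == "python":
            ast.parse(text)    # raises SyntaxError on invalid Python
        else:
            return False       # unknown kinds fail closed
        return True
    except (ValueError, SyntaxError):
        return False

print(validate_snippet('{"retries": 3}', "json"))  # True
print(validate_snippet('{"retries": }', "json"))   # False
```

Syntax checks catch malformed output cheaply; semantic accuracy against the current API version still requires execution or review.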

Dimension 2: Structural & Formatting Compliance

The content must adhere to strict Docs-as-Code hygiene.

  • Markdown Hygiene: Is the document formatted in clean, standard Markdown without injected HTML or broken rendering tags?
  • Information Hierarchy: Does the output follow a logical flow (e.g., H1 for Title, H2 for Major Sections, H3 for Sub-steps) without skipping heading levels?
  • Modularity: Can this content be easily broken down into smaller, reusable components for the larger content ecosystem?
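
The heading-hierarchy check lends itself to a linter rule. A minimal sketch (the function names are illustrative, and this covers only ATX-style `#` headings, not Setext underlines):

```python
import re

def heading_levels(markdown: str) -> list[int]:
    """Extract heading depths (1 for H1, 2 for H2, ...) in document order."""
    return [len(m.group(1)) for m in re.finditer(r"^(#{1,6})\s", markdown, re.M)]

def no_skipped_levels(markdown: str) -> bool:
    """A heading may descend at most one level at a time (H2 -> H3, never H2 -> H4)."""
    levels = heading_levels(markdown)
    return all(b - a <= 1 for a, b in zip(levels, levels[1:]))

doc = "# Title\n## Setup\n### Step 1\n"
print(no_skipped_levels(doc))  # True
```

Jumping back up the hierarchy (H3 to H2) is allowed; only downward skips are flagged, matching the rubric's "without skipping heading levels" rule.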

Dimension 3: Brand Voice & Semantic Alignment

The content must sound like our brand, not a generic LLM.

  • Fluff Elimination: Has all standard AI filler language (e.g., "In conclusion," "It is important to note," "Delve," "Transformative") been aggressively removed?
  • Active Voice: Are instructions written in the imperative mood and active voice (e.g., "Click the button" instead of "The button should be clicked")?
  • Density: Is the information density high? Does every sentence serve a functional purpose for the user?
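
The Fluff Elimination check is the easiest to automate as a banned-phrase scan. A minimal sketch using the examples from the rubric (the `FILLER` list and `find_filler` name are illustrative; a production list would be much longer):

```python
# Example banned phrases drawn from the rubric; extend per style guide.
FILLER = ["in conclusion", "it is important to note", "delve", "transformative"]

def find_filler(text: str) -> list[str]:
    """Return every banned filler phrase found in the text (case-insensitive)."""
    lowered = text.lower()
    return [phrase for phrase in FILLER if phrase in lowered]

print(find_filler("Let's delve into the transformative API."))
# ['delve', 'transformative']
```

A simple substring scan like this over-matches (e.g. "delved" inside a longer word); word-boundary regexes tighten it, but even the naive version surfaces most AI filler.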

Dimension 4: Inclusivity & Accessibility

The content must be usable by a global, diverse engineering audience.

  • Plain Language: Is the vocabulary simple and direct, avoiding unnecessary idioms, jargon, or complex regional metaphors?
  • Inclusive Terminology: Has the content been scrubbed of legacy exclusionary terms (e.g., using "allowlist/blocklist" instead of "whitelist/blacklist")?
  • Visual Accessibility: Do all generated tables have clear headers, and do all suggested images include descriptive alt-text?
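
The terminology scrub can likewise be mechanized. A minimal sketch, assuming a flat replacement map (the `REPLACEMENTS` table and `scrub_terms` name are hypothetical; real scrubbing should flag matches for review rather than rewrite blindly, since terms like "blacklist" may appear inside quoted API identifiers that must not change):

```python
# Legacy term -> preferred term, per the rubric's examples.
REPLACEMENTS = {"whitelist": "allowlist", "blacklist": "blocklist"}

def scrub_terms(text: str) -> str:
    """Replace legacy exclusionary terms, preserving simple capitalization."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
        text = text.replace(old.capitalize(), new.capitalize())
    return text

print(scrub_terms("Add the IP to the whitelist."))
# Add the IP to the allowlist.
```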