Rules

groundlens.rules

Rule-based interpretable layer — deterministic, auditable, no LLM.

This module provides a checklist-style rule engine that complements the geometric SGI/DGI scores with human-readable audit evidence. A trained auditor or compliance officer can read the textual explanation produced by a :class:RuleSet evaluation and verify, item by item, why a response passed or failed.

The rule engine is intentionally rule-based rather than learning-based:

  • Deterministic. Same inputs → same outputs, byte-identical.
  • Auditable. Every pass/fail decision cites the rule, the weight, and the matched evidence span in the response text.
  • No LLM. Pattern matching, substring tests, and regular expressions. Compatible with the no-second-LLM constraint of groundlens.
  • Domain-specific. Built-in factories (:func:banking_rules) exist for regulated domains; custom rule sets can be assembled from :class:ChecklistRule instances or loaded from configuration.

Sub-scores follow the structure of compliance rationale evaluation in regulated AI literature: specificity (does the response cite concrete case details?), explanatory linkage (does it explain the reasoning?), and boundary shift (does it state what would resolve the case?). Each is in [0, 1] and aggregated via a non-compensatory geometric mean so a zero sub-score collapses the overall quality signal — a rationale that names parameters but offers no resolution path is not partial credit, it is structurally incomplete.
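To make the non-compensatory aggregation concrete, here is a minimal standalone sketch of a geometric mean over sub-scores. The function name `geometric_quality` and the sample values are illustrative assumptions, not part of the library; the rounding to four decimals mirrors what the `evaluate` source shown later on this page does.

```python
import math


def geometric_quality(sub_scores: dict[str, float]) -> float:
    """Geometric mean of sub-scores; any zero collapses the result to 0.0."""
    values = list(sub_scores.values())
    if not values or any(v <= 0.0 for v in values):
        return 0.0
    return round(math.prod(values) ** (1.0 / len(values)), 4)


# Hypothetical sub-score values, chosen only for illustration.
complete = {"spec": 0.8, "expl": 0.5, "bshift": 0.2}
missing_path = {"spec": 0.8, "expl": 0.5, "bshift": 0.0}

print(geometric_quality(complete))      # geometric mean of three non-zero values
print(geometric_quality(missing_path))  # 0.0, because bshift is zero
print(round(sum(missing_path.values()) / 3, 4))  # an arithmetic mean would still award partial credit
```

The contrast in the last two lines is the point of the design: an averaging scheme lets strong specificity mask a missing resolution path, while the geometric mean does not.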

References

Toulmin, S. E. (2003). The Uses of Argument. Cambridge University Press.

McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381-392.

Karwowski, J., et al. (2024). Goodhart's Law in Reinforcement Learning. ICLR 2024.

De la Chica Rodríguez, J. M., & Martí-González, C. (2026). Mechanical Enforcement for LLM Governance. arXiv:2605.14744.

Classes

RuleEvidence(matched: bool, span: str, explanation: str) dataclass

A single piece of evidence supporting a rule's pass/fail decision.

Attributes:

Name Type Description
matched bool

Whether the rule pattern matched the input text.

span str

The substring (lowercased) that triggered the match, or "" if no match was found.

explanation str

Short human-readable note describing what was checked.

ChecklistRule(id: str, description: str, weight: float, sub_score: str, check: Callable[[str, str, str | None, dict[str, Any]], RuleEvidence], citation: str = '') dataclass

A single rule with an id, a pattern check, and a weight.

Rules are designed to be readable: id and description are surfaced verbatim in the audit explanation. The check callable returns a :class:RuleEvidence so the audit trail records why the rule fired, not just that it did.

Attributes:

Name Type Description
id str

Stable identifier (e.g. "spec.reg_flag"). Used in audit logs.

description str

One-line human-readable description of the rule.

weight float

Contribution to the parent sub-score when matched, in [0, 1]. Sub-scores are capped at 1.0 even when weights sum higher.

sub_score str

Which sub-score this rule contributes to. For the legacy banking_rules() set: "spec", "expl", or "bshift". For the current groundlens_banking_rules() set: "groundedness", "completeness", "calibration", "traceability", or "robustness". Custom rule sets may define additional categories.

check Callable[[str, str, str | None, dict[str, Any]], RuleEvidence]

Pure function (question, response, context, metadata) -> RuleEvidence. Must be deterministic.

citation str

Free-text academic / industry / regulatory provenance for the rule, suitable for inclusion in an audit explanation or a regulatory submission. Empty string when no citation is provided. Example: "RAGAs (Es et al., EACL 2024) §3 Faithfulness".

RuleResult(rule_id: str, sub_score: str, weight: float, matched: bool, evidence_span: str, explanation: str) dataclass

Outcome of evaluating a single rule.

Attributes:

Name Type Description
rule_id str

The :attr:ChecklistRule.id that produced this result.

sub_score str

Which sub-score this rule contributes to.

weight float

The weight of the rule (echo of :attr:ChecklistRule.weight).

matched bool

Whether the rule fired.

evidence_span str

The substring that triggered the match, if any.

explanation str

The rule's human-readable explanation.

RuleSetResult(sub_scores: dict[str, float], quality: float, flagged: bool, rule_results: tuple[RuleResult, ...], audit_explanation: str) dataclass

Aggregated result of evaluating a :class:RuleSet against a response.

Each sub-score is a capped weight sum of matched rules in that category, stored in the :attr:sub_scores mapping. quality is the geometric mean of all sub-score values: any zero sub-score yields quality = 0.0, reflecting that a rationale missing any audited dimension is structurally incomplete for human review.
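The capped weight sum can be sketched in a few lines. This is a simplified standalone version that follows the aggregation loop in the `evaluate` source shown later on this page; the helper name `capped_sub_scores` and the sample weights are assumptions made for the example.

```python
def capped_sub_scores(
    matched_rules: list[tuple[str, float]],
    categories: tuple[str, ...],
) -> dict[str, float]:
    """Sum matched weights per category and cap each total at 1.0."""
    totals = dict.fromkeys(categories, 0.0)
    for category, weight in matched_rules:
        if category in totals:  # categories outside the tuple are not aggregated
            totals[category] += weight
    return {name: round(min(1.0, totals[name]), 4) for name in categories}


# Hypothetical matched rules as (sub_score, weight) pairs.
matched = [("spec", 0.7), ("spec", 0.6), ("expl", 0.35), ("other", 0.9)]
scores = capped_sub_scores(matched, ("spec", "expl", "bshift"))
print(scores)  # spec is capped at 1.0, bshift stays 0.0, "other" is ignored
```

Because `bshift` has no matched rule in this example, the geometric-mean quality for these scores would be 0.0 regardless of how strong the other two categories are.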

Backward-compatible read accessors are exposed for the legacy De-La-Chica style sub-scores (spec, expl, bshift) and for the current GroundLens five-category skeleton (groundedness, completeness, calibration, traceability, robustness). Accessors return 0.0 when the underlying ruleset did not define the requested sub-score.

Attributes:

Name Type Description
sub_scores dict[str, float]

Mapping from sub-score name to its capped value in [0, 1]. By convention, do not mutate.

quality float

Geometric mean of all sub-score values in :attr:sub_scores.

flagged bool

True when the ruleset's flag predicate is triggered.

rule_results tuple[RuleResult, ...]

One :class:RuleResult per rule that was evaluated.

audit_explanation str

Multi-line human-readable summary suitable for inclusion in an audit log.

Attributes
spec: float property

Legacy specificity sub-score. Returns 0.0 if not defined by ruleset.

expl: float property

Legacy explanatory-linkage sub-score. Returns 0.0 if not defined by ruleset.

bshift: float property

Legacy boundary-shift sub-score. Returns 0.0 if not defined by ruleset.

groundedness: float property

Groundedness sub-score. Returns 0.0 if not defined by ruleset.

completeness: float property

Completeness sub-score. Returns 0.0 if not defined by ruleset.

calibration: float property

Calibration sub-score. Returns 0.0 if not defined by ruleset.

traceability: float property

Traceability sub-score. Returns 0.0 if not defined by ruleset.

robustness: float property

Robustness sub-score. Returns 0.0 if not defined by ruleset.
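The accessor behaviour described above amounts to a dictionary lookup with a 0.0 default. The class below is a hypothetical stand-in (named `ResultSketch`, with only two of the eight properties) written to show the pattern, not the library's implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResultSketch:
    """Hypothetical stand-in for the documented result object."""

    sub_scores: dict[str, float] = field(default_factory=dict)

    @property
    def spec(self) -> float:
        # Falls back to 0.0 when the ruleset did not define "spec".
        return self.sub_scores.get("spec", 0.0)

    @property
    def groundedness(self) -> float:
        # Falls back to 0.0 when the ruleset did not define "groundedness".
        return self.sub_scores.get("groundedness", 0.0)


legacy = ResultSketch(sub_scores={"spec": 0.45, "expl": 0.3, "bshift": 0.0})
print(legacy.spec, legacy.groundedness)
print("groundedness" in legacy.sub_scores)
```

One consequence worth keeping in mind: an accessor value of 0.0 can mean either "scored zero" or "not defined by this ruleset". Checking membership in `sub_scores`, as in the last line, distinguishes the two cases.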

RuleSet(name: str, rules: tuple[ChecklistRule, ...], sub_scores: tuple[str, ...] = ('spec', 'expl', 'bshift'), quality_floor: float = _DEFAULT_QUALITY_FLOOR, flag_predicate: Callable[[dict[str, float]], bool] | None = None) dataclass

A collection of rules evaluated together against a (q, r, ctx) triple.

Use :func:groundlens_banking_rules for the current canonical five-category ruleset, :func:banking_rules for the legacy three-category ruleset, or construct your own by passing a sequence of :class:ChecklistRule along with the list of sub-score categories the rules contribute to.

Attributes:

Name Type Description
name str

Identifier (e.g. "groundlens_banking_v1"). Surfaced in audit logs.

rules tuple[ChecklistRule, ...]

The rules to evaluate.

sub_scores tuple[str, ...]

Ordered tuple of sub-score category names this ruleset produces. Rules whose sub_score field is not in this tuple are ignored at aggregation time (their evidence is still recorded in :attr:RuleSetResult.rule_results). Default ("spec", "expl", "bshift") preserves legacy behavior.

quality_floor float

Threshold used by the default flag predicate: a sub-score below it triggers the audit-deficiency flag. It is checked against spec and expl only, and only when :attr:flag_predicate is None.

flag_predicate Callable[[dict[str, float]], bool] | None

Optional pure function dict[str, float] -> bool that decides whether the aggregated result is flagged. When None, the default legacy predicate is used: flagged iff spec < quality_floor or expl < quality_floor.
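A short sketch of the two kinds of predicate may help. `legacy_flag` mirrors the documented default (it takes the floor as an explicit argument here only to stay standalone), and `any_below_half` is a hypothetical custom predicate with the `dict[str, float] -> bool` shape that `flag_predicate` expects.

```python
def legacy_flag(sub_scores: dict[str, float], quality_floor: float) -> bool:
    """Mirror of the documented default: flagged iff spec or expl is below the floor."""
    return (
        sub_scores.get("spec", 0.0) < quality_floor
        or sub_scores.get("expl", 0.0) < quality_floor
    )


def any_below_half(sub_scores: dict[str, float]) -> bool:
    """Hypothetical custom predicate: flag when any sub-score is under 0.5."""
    return any(value < 0.5 for value in sub_scores.values())


scores = {"spec": 0.6, "expl": 0.4, "bshift": 0.1}
print(legacy_flag(scores, 0.3))   # bshift is not consulted by the default
print(any_below_half(scores))     # the custom predicate looks at every sub-score
print(legacy_flag({"groundedness": 1.0}, 0.3))  # no spec/expl defined
```

The last call shows why a custom predicate matters for non-legacy category names: under the default, missing `spec` and `expl` entries read as 0.0, so such a ruleset would be flagged whenever the floor is positive.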

Methods:
evaluate(*, question: str, response: str, context: str | None = None, metadata: dict[str, Any] | None = None) -> RuleSetResult

Evaluate the ruleset against a single (question, response) pair.

Parameters:

Name Type Description Default
question str

The user query / prompt the LLM received.

required
response str

The LLM's rationale text being audited.

required
context str | None

Optional retrieved context (RAG-style). May be None when no retrieval was performed.

None
metadata dict[str, Any] | None

Optional dict carrying domain-specific structured data that some rules may consult (e.g. the case parameters in a banking decision: risk score, flags, amount, etc.).

None

Returns:

Type Description
RuleSetResult

A :class:RuleSetResult with all sub-scores, the aggregated quality, and a full audit explanation.

Raises:

Type Description
ValueError

If response is empty or contains only whitespace.

Source code in src/groundlens/rules.py
def evaluate(
    self,
    *,
    question: str,
    response: str,
    context: str | None = None,
    metadata: dict[str, Any] | None = None,
) -> RuleSetResult:
    """Evaluate the ruleset against a single (question, response) pair.

    Args:
        question: The user query / prompt the LLM received.
        response: The LLM's rationale text being audited.
        context: Optional retrieved context (RAG-style). May be ``None``
            when no retrieval was performed.
        metadata: Optional dict carrying domain-specific structured data
            that some rules may consult (e.g. the case parameters in a
            banking decision: risk score, flags, amount, etc.).

    Returns:
        A :class:`RuleSetResult` with all sub-scores, the aggregated
        quality, and a full audit explanation.

    Raises:
        ValueError: If ``response`` is empty or contains only whitespace.
    """
    if not response.strip():
        msg = "response must be a non-empty string."
        raise ValueError(msg)

    meta = metadata or {}

    results: list[RuleResult] = []
    weights_by_sub: dict[str, float] = dict.fromkeys(self.sub_scores, 0.0)

    for rule in self.rules:
        evidence = rule.check(question, response, context, meta)
        results.append(
            RuleResult(
                rule_id=rule.id,
                sub_score=rule.sub_score,
                weight=rule.weight,
                matched=evidence.matched,
                evidence_span=evidence.span,
                explanation=evidence.explanation,
            )
        )
        if evidence.matched and rule.sub_score in weights_by_sub:
            weights_by_sub[rule.sub_score] += rule.weight

    sub_scores: dict[str, float] = {
        name: round(min(1.0, weights_by_sub[name]), 4) for name in self.sub_scores
    }

    product = 1.0
    for value in sub_scores.values():
        product *= value
    n = len(sub_scores)
    quality = round(product ** (1.0 / n), 4) if product > 0 and n > 0 else 0.0

    if self.flag_predicate is not None:
        flagged = bool(self.flag_predicate(sub_scores))
    else:
        # Legacy default: flagged iff spec or expl below quality_floor.
        flagged = (sub_scores.get("spec", 0.0) < self.quality_floor) or (
            sub_scores.get("expl", 0.0) < self.quality_floor
        )

    audit = _format_audit_explanation(
        ruleset_name=self.name,
        sub_scores=sub_scores,
        quality=quality,
        flagged=flagged,
        quality_floor=self.quality_floor,
        results=results,
    )

    return RuleSetResult(
        sub_scores=sub_scores,
        quality=quality,
        flagged=flagged,
        rule_results=tuple(results),
        audit_explanation=audit,
    )

Functions:

banking_rules(quality_floor: float = _DEFAULT_QUALITY_FLOOR) -> RuleSet

Curated ruleset for regulated banking governance decisions.

The rules cover the three sub-scores that an auditor or compliance officer typically inspects in a deferral or escalation rationale:

  • Specificity (spec): does the rationale cite the case parameters that triggered the decision? Flags, risk score, numeric thresholds, gates, completeness, jurisdictional details, sufficient length, and specificity-marking language.
  • Explanatory linkage (expl): does the rationale link the case facts to the decision? Conditional structure, pending actions, causal connectives, epistemic limits, domain references, modal verbs, length, and temporal ordering.
  • Boundary shift (bshift): does the rationale state what would change the decision? Conditional approval pathways, information requests, risk-reduction proposals, alternative framings, threshold references, and length.

The default quality_floor=0.3 follows the cosmetic-deadlock threshold introduced in the financial-decisions governance literature. A response that falls below this floor on either spec or expl is flagged as audit-deficient even if the geometric SGI/DGI score looks acceptable in isolation — a structurally typical "false negative" of embedding-based detection.

Parameters:

Name Type Description Default
quality_floor float

Threshold below which a sub-score triggers the cosmetic-deadlock flag. Tune per deployment risk tolerance.

_DEFAULT_QUALITY_FLOOR

Returns:

Type Description
RuleSet

A :class:RuleSet named "banking_v1".

Source code in src/groundlens/rules.py
def banking_rules(quality_floor: float = _DEFAULT_QUALITY_FLOOR) -> RuleSet:
    """Curated ruleset for regulated banking governance decisions.

    The rules cover the three sub-scores that an auditor or compliance
    officer typically inspects in a deferral or escalation rationale:

    - **Specificity (spec):** does the rationale cite the case parameters
      that triggered the decision? Flags, risk score, numeric thresholds,
      gates, completeness, jurisdictional details, sufficient length, and
      specificity-marking language.
    - **Explanatory linkage (expl):** does the rationale link the case
      facts to the decision? Conditional structure, pending actions, causal
      connectives, epistemic limits, domain references, modal verbs,
      length, and temporal ordering.
    - **Boundary shift (bshift):** does the rationale state what would
      change the decision? Conditional approval pathways, information
      requests, risk-reduction proposals, alternative framings, threshold
      references, and length.

    The default ``quality_floor=0.3`` follows the cosmetic-deadlock
    threshold introduced in the financial-decisions governance literature.
    A response that falls below this floor on either ``spec`` or ``expl``
    is flagged as audit-deficient even if the geometric SGI/DGI score
    looks acceptable in isolation — a structurally typical "false
    negative" of embedding-based detection.

    Args:
        quality_floor: Threshold below which a sub-score triggers the
            cosmetic-deadlock flag. Tune per deployment risk tolerance.

    Returns:
        A :class:`RuleSet` named ``"banking_v1"``.
    """
    rules: tuple[ChecklistRule, ...] = (
        # Specificity sub-rules
        ChecklistRule("spec.reg_flag", "regulatory flag", 0.20, "spec", _check_regulatory_flag),
        ChecklistRule("spec.risk_ref", "risk reference", 0.15, "spec", _check_risk_reference),
        ChecklistRule("spec.numeric", "numeric value", 0.10, "spec", _check_numeric_value),
        ChecklistRule("spec.gate", "gate / threshold", 0.10, "spec", _check_gate_name),
        ChecklistRule("spec.info_gap", "information gap", 0.15, "spec", _check_information_gap),
        ChecklistRule(
            "spec.case_detail", "case-specific detail", 0.10, "spec", _check_case_specific_detail
        ),
        ChecklistRule(
            "spec.length", "substantive length", 0.10, "spec", _check_substantive_length
        ),
        ChecklistRule(
            "spec.spec_language",
            "specificity language",
            0.10,
            "spec",
            _check_specificity_language,
        ),
        # Explanatory linkage sub-rules
        ChecklistRule(
            "expl.conditional", "conditional structure", 0.20, "expl", _check_conditional_structure
        ),
        ChecklistRule("expl.pending", "pending action", 0.15, "expl", _check_pending_action),
        ChecklistRule("expl.causal", "causal connective", 0.15, "expl", _check_causal_connective),
        ChecklistRule(
            "expl.epistemic", "epistemic limitation", 0.15, "expl", _check_epistemic_limit
        ),
        ChecklistRule("expl.domain", "domain reference", 0.10, "expl", _check_domain_reference),
        ChecklistRule("expl.modal", "modal verb", 0.10, "expl", _check_modal_verb),
        ChecklistRule("expl.length", "minimum length", 0.10, "expl", _check_minimum_length),
        ChecklistRule(
            "expl.temporal", "temporal ordering", 0.05, "expl", _check_temporal_ordering
        ),
        # Boundary shift sub-rules
        ChecklistRule(
            "bshift.cond_approval",
            "conditional approval",
            0.25,
            "bshift",
            _check_conditional_approval,
        ),
        ChecklistRule(
            "bshift.info_request",
            "information request",
            0.20,
            "bshift",
            _check_information_request,
        ),
        ChecklistRule(
            "bshift.risk_reduction", "risk reduction", 0.15, "bshift", _check_risk_reduction
        ),
        ChecklistRule(
            "bshift.alternative", "alternative framing", 0.10, "bshift", _check_alternative_framing
        ),
        ChecklistRule(
            "bshift.threshold_ref",
            "threshold reference",
            0.10,
            "bshift",
            _check_threshold_reference,
        ),
        ChecklistRule(
            "bshift.length", "resolution-path length", 0.05, "bshift", _check_resolution_length
        ),
    )
    return RuleSet(name="banking_v1", rules=rules, quality_floor=quality_floor)
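As a quick arithmetic check on the listing above, the following sketch totals the weights per sub-score and derives the best-case quality when every rule matches. The weights are copied from the `banking_rules()` source; the variable names are illustrative.

```python
import math

# Weights copied from the banking_rules() source above, grouped by sub-score.
WEIGHTS = {
    "spec": [0.20, 0.15, 0.10, 0.10, 0.15, 0.10, 0.10, 0.10],
    "expl": [0.20, 0.15, 0.15, 0.15, 0.10, 0.10, 0.10, 0.05],
    "bshift": [0.25, 0.20, 0.15, 0.10, 0.10, 0.05],
}

# Best case per sub-score: every rule matches, then the 1.0 cap applies.
best_case = {name: round(min(1.0, sum(ws)), 4) for name, ws in WEIGHTS.items()}

# Geometric mean of the best-case sub-scores, rounded as in evaluate().
max_quality = round(math.prod(best_case.values()) ** (1.0 / len(best_case)), 4)

print(best_case)
print(max_quality)
```

The `spec` and `expl` weights each total 1.0, while the six `bshift` weights total 0.85. With this legacy set the quality signal therefore tops out at roughly 0.947 rather than 1.0, which is worth knowing before choosing a fixed quality threshold.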

groundlens_banking_rules(quality_floor: float = _DEFAULT_QUALITY_FLOOR) -> RuleSet

Canonical rule set for LLM rationale evaluation in banking governance.

Returns the 20-rule reference set whose provenance is triangulated across five independent research tracks: peer-reviewed NLP literature, tier-1 bank public reports, banking regulator whitepapers, cross-industry frameworks, and financial-domain NLP benchmarks. The rules are organized into five empirically-emergent sub-score categories:

  • groundedness (5 rules): claims linked to and supported by source.
  • completeness (3 rules): coverage of the governance question.
  • calibration (4 rules): uncertainty expression and abstention.
  • traceability (5 rules): citation, audit trail, validation references.
  • robustness (3 rules): resistance to noise, conflict, injection.

Each rule carries a citation field pointing to at least one of its academic, industrial, or regulatory provenance sources. The companion paper (Marin, 2026) documents the full per-rule provenance.

The default flag predicate :func:_groundlens_banking_flag_predicate triggers when any regulator-non-negotiable sub-score falls below its threshold (groundedness < 0.5, calibration < 0.3, or traceability < 0.4).
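The three thresholds can be expressed as a small predicate. This is a sketch written from the description above, not the body of `_groundlens_banking_flag_predicate`; in particular, treating a missing sub-score as 0.0 is an assumption, and the name `banking_flag_sketch` is hypothetical.

```python
# Thresholds as stated in the description above.
THRESHOLDS = {"groundedness": 0.5, "calibration": 0.3, "traceability": 0.4}


def banking_flag_sketch(sub_scores: dict[str, float]) -> bool:
    """Flag when any thresholded sub-score falls below its threshold.

    Assumption: a missing sub-score counts as 0.0 (the library may differ).
    """
    return any(sub_scores.get(name, 0.0) < floor for name, floor in THRESHOLDS.items())


# Hypothetical sub-score values sitting exactly on each threshold.
passing = {
    "groundedness": 0.5,
    "completeness": 0.0,
    "calibration": 0.3,
    "traceability": 0.4,
    "robustness": 0.0,
}
failing = dict(passing, traceability=0.35)

print(banking_flag_sketch(passing))  # values equal to a threshold are not "below" it
print(banking_flag_sketch(failing))  # traceability 0.35 is below 0.4
```

Note that `completeness` and `robustness` do not take part in the flag, yet a zero in either still drives the geometric-mean quality to 0.0, so the flag and the quality value carry different information.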

Parameters:

Name Type Description Default
quality_floor float

Legacy floor exposed for users who want a uniform threshold across sub-scores. Not used by the default flag predicate; kept for compatibility with the legacy banking_rules() signature so deployers can A/B both rulesets with one parameter.

_DEFAULT_QUALITY_FLOOR

Returns:

Type Description
RuleSet

A :class:RuleSet named "groundlens_banking_v1" with five sub-scores and 20 rules.

Source code in src/groundlens/rules.py
def groundlens_banking_rules(quality_floor: float = _DEFAULT_QUALITY_FLOOR) -> RuleSet:
    """Canonical rule set for LLM rationale evaluation in banking governance.

    Returns the 20-rule reference set whose provenance is triangulated across
    five independent research tracks: peer-reviewed NLP literature, tier-1
    bank public reports, banking regulator whitepapers, cross-industry
    frameworks, and financial-domain NLP benchmarks. The rules are organized
    into five empirically-emergent sub-score categories:

    - **groundedness** (5 rules): claims linked to and supported by source.
    - **completeness** (3 rules): coverage of the governance question.
    - **calibration** (4 rules): uncertainty expression and abstention.
    - **traceability** (5 rules): citation, audit trail, validation references.
    - **robustness** (3 rules): resistance to noise, conflict, injection.

    Each rule carries a ``citation`` field pointing to at least one of its
    academic, industrial, or regulatory provenance sources. The companion
    paper (Marin, 2026) documents the full per-rule provenance.

    The default flag predicate :func:`_groundlens_banking_flag_predicate`
    triggers when any regulator-non-negotiable sub-score falls below its
    threshold (groundedness < 0.5, calibration < 0.3, or traceability < 0.4).

    Args:
        quality_floor: Legacy floor exposed for users who want a uniform
            threshold across sub-scores. Not used by the default flag
            predicate; kept for compatibility with the legacy ``banking_rules()``
            signature so deployers can A/B both rulesets with one parameter.

    Returns:
        A :class:`RuleSet` named ``"groundlens_banking_v1"`` with five
        sub-scores and 20 rules.
    """
    rules: tuple[ChecklistRule, ...] = (
        # ── Groundedness (5 rules) ──────────────────────────────────────────
        ChecklistRule(
            id="grnd.claim_supported_by_context",
            description="every claim inferable from context",
            weight=0.25,
            sub_score="groundedness",
            check=_check_grounded_in_context,
            citation="RAGAs (Es et al., EACL 2024) §3; NIST AI 600-1 (2024) §2.2 Confabulation",
        ),
        ChecklistRule(
            id="grnd.atomic_decomposition",
            description="rationale decomposable into atomic claims",
            weight=0.20,
            sub_score="groundedness",
            check=_check_atomic_decomposable,
            citation="FactScore (Min et al., EMNLP 2023) §3; RAGAs (Es et al., EACL 2024) §3",
        ),
        ChecklistRule(
            id="grnd.no_unsupported_extensions",
            description="no claims beyond what context supports",
            weight=0.20,
            sub_score="groundedness",
            check=_check_no_unsupported_extensions,
            citation=(
                "HaluEval (Li et al., EMNLP 2023); Ji et al. ACM CSUR 2023; NIST AI 600-1 (2024)"
            ),
        ),
        ChecklistRule(
            id="grnd.regulatory_flag",
            description="names a specific regulatory flag or policy clause",
            weight=0.20,
            sub_score="groundedness",
            check=_check_regulatory_flag,
            citation="REV (Chen et al., ACL 2023); SR 26-2 (Fed/OCC/FDIC 2026) §VI Documentation",
        ),
        ChecklistRule(
            id="grnd.counterfactual_robust",
            description="screened against wrong-retrieval scenarios",
            weight=0.15,
            sub_score="groundedness",
            check=_check_counterfactual_robustness,
            citation="RGB (Chen et al., AAAI 2024); EU AI Act 2024/1689 Art. 15(4)",
        ),
        # ── Completeness (3 rules) ──────────────────────────────────────────
        ChecklistRule(
            id="comp.addresses_all_parts",
            description="response length scales with question parts",
            weight=0.40,
            sub_score="completeness",
            check=_check_addresses_all_parts,
            citation="RAGAs (Es et al., EACL 2024) §3; EU AI Act 2024/1689 Art. 13(2)",
        ),
        ChecklistRule(
            id="comp.governance_dimensions",
            description="references multiple governance dimensions",
            weight=0.35,
            sub_score="completeness",
            check=_check_governance_dimensions,
            citation="EBA GL/2020/06 §4.3.3; SR 26-2 (Fed/OCC/FDIC 2026) §IV Model Development",
        ),
        ChecklistRule(
            id="comp.information_integration",
            description="integrates multiple sources",
            weight=0.25,
            sub_score="completeness",
            check=_check_information_integration,
            citation="RGB (Chen et al., AAAI 2024); TRUE (Honovich et al., NAACL 2022)",
        ),
        # ── Calibration (4 rules) ───────────────────────────────────────────
        ChecklistRule(
            id="cal.abstains_when_insufficient",
            description="explicitly abstains when evidence is insufficient",
            weight=0.35,
            sub_score="calibration",
            check=_check_abstains_when_insufficient,
            citation=(
                "RAGAs (Es et al., EACL 2024) §3; FinanceBench (Islam et al., 2023); "
                "SR 26-2 §V Model Validation"
            ),
        ),
        ChecklistRule(
            id="cal.explicit_hedging",
            description="uses hedging language for uncertain claims",
            weight=0.30,
            sub_score="calibration",
            check=_check_explicit_hedging,
            citation=(
                "TruthfulQA (Lin et al., ACL 2022); Hyland (1998) hedging taxonomy; "
                "SR 26-2 §IV Model Use"
            ),
        ),
        ChecklistRule(
            id="cal.confidence_score",
            description="includes a numeric confidence or probability",
            weight=0.20,
            sub_score="calibration",
            check=_check_confidence_score,
            citation="G-Eval (Liu et al., EMNLP 2023); EU AI Act Art. 13(3)(b)(ii)",
        ),
        ChecklistRule(
            id="cal.self_consistency",
            description="pipeline screened for self-consistency",
            weight=0.15,
            sub_score="calibration",
            check=_check_self_consistency,
            citation="SelfCheckGPT (Manakul et al., EMNLP 2023); Morgan Stanley + OpenAI (2024)",
        ),
        # ── Traceability (5 rules) ──────────────────────────────────────────
        ChecklistRule(
            id="trace.specific_source_span",
            description="cites a specific page / section / paragraph",
            weight=0.25,
            sub_score="traceability",
            check=_check_specific_source_span,
            citation=(
                "e-SNLI (Camburu et al., NeurIPS 2018); EU AI Act Art. 13(3)(b)(iv); "
                "FinanceBench (Islam et al., 2023)"
            ),
        ),
        ChecklistRule(
            id="trace.natural_language_rationale",
            description="provides a substantive natural-language rationale",
            weight=0.20,
            sub_score="traceability",
            check=_check_substantive_length,
            citation=(
                "e-SNLI (Camburu et al., NeurIPS 2018); EU AI Act Art. 13(3)(b)(iv); "
                "PRA SS1/23 Principle 3"
            ),
        ),
        ChecklistRule(
            id="trace.falsifiable_actionable",
            description="couples numeric claim with causal mechanism",
            weight=0.20,
            sub_score="traceability",
            check=_check_falsifiable_actionable,
            citation="REV (Chen et al., ACL 2023); SR 26-2 §V Conceptual Soundness",
        ),
        ChecklistRule(
            id="trace.numeric_value",
            description="includes a numeric value or metric",
            weight=0.15,
            sub_score="traceability",
            check=_check_numeric_value,
            citation=(
                "FinQA (Chen et al., EMNLP 2021); EU AI Act Art. 13(3)(b)(ii); "
                "SR 26-2 §V Outcomes Analysis"
            ),
        ),
        ChecklistRule(
            id="trace.audit_logged",
            description="rationale persisted to audit log",
            weight=0.20,
            sub_score="traceability",
            check=_check_audit_logged,
            citation=(
                "EU AI Act Art. 12 Record-Keeping; SR 26-2 §VI Documentation; "
                "ISO/IEC 42001:2023 §8.2"
            ),
        ),
        # ── Robustness (3 rules) ────────────────────────────────────────────
        ChecklistRule(
            id="rob.independent_validation",
            description="references independent validation / effective challenge",
            weight=0.40,
            sub_score="robustness",
            check=_check_independent_validation,
            citation=(
                "SR 26-2 §III Effective Challenge; PRA SS1/23 Principle 4; "
                "ECB Guide to Internal Models §9.3 ¶43(a)"
            ),
        ),
        ChecklistRule(
            id="rob.prompt_injection_robust",
            description="pipeline screened for prompt-injection robustness",
            weight=0.35,
            sub_score="robustness",
            check=_check_prompt_injection_robust,
            citation="RGB (Chen et al., AAAI 2024); EU AI Act Art. 15; MAS MindForge (2024)",
        ),
        ChecklistRule(
            id="rob.cross_source_conflict",
            description="acknowledges cross-source conflicts",
            weight=0.25,
            sub_score="robustness",
            check=_check_cross_source_conflict,
            citation=(
                "ConflictBank (Su et al., 2024); EU AI Act Art. 15(4); RGB (Chen et al., 2024)"
            ),
        ),
    )

    return RuleSet(
        name="groundlens_banking_v1",
        rules=rules,
        sub_scores=("groundedness", "completeness", "calibration", "traceability", "robustness"),
        quality_floor=quality_floor,
        flag_predicate=_groundlens_banking_flag_predicate,
    )

decision_rationale_rules(domain: str = 'finance', regulations: tuple[str, ...] = (), quality_floor: float = _DEFAULT_QUALITY_FLOOR) -> RuleSet

Rule set for decision-rationale agents (credit / AML / KYC / sanctions).

Canonical factory for the 20-rule, 5-sub-score decision-rationale rule set. Replaces :func:groundlens_banking_rules under the archetype-as-function naming convention introduced in ADR 0001 (release 2026.6.13).

Parameters:

Name Type Description Default
domain str

Deployment domain. Currently only "finance" (default) is supported; calling with any other value raises ValueError so the caller knows the verticalization is not yet shipped. Insurance, healthcare, and legal vertical decision-rationale sets are on the roadmap.

'finance'
regulations tuple[str, ...]

Optional tuple of regulation keys. When non-empty, audit_explanation lines whose rule citation does not mention any of the requested regulations are suppressed from the rendered audit text. Does not add or remove rules. Valid keys include: "eu_ai_act", "sr_26_2", "sr_11_7", "nist_ai_600_1", "nist_ai_rmf", "iso_42001", "ecb_internal_models", "eba_gl_2020_06", "pra_ss1_23", "hipaa", "gdpr".

Implementation note (2026.6.13): the kwarg is accepted and validated, but provenance-filtered rendering of audit_explanation will land in a follow-up release. For now the audit text is unmodified; the rule set is returned unchanged. A UserWarning is emitted when the kwarg is non-empty so the caller is aware the filter is not yet active.

()
quality_floor float

Threshold below which a sub-score triggers the cosmetic-deadlock flag. Kept for compatibility with the legacy banking_rules() signature.

_DEFAULT_QUALITY_FLOOR

Returns:

Type Description
RuleSet

A :class:RuleSet named "decision_rationale_v1_finance" with five sub-scores and 20 rules. The rules and weights are identical to those of :func:groundlens_banking_rules; only the rule-set name is updated.

Raises:

Type Description
ValueError

If domain is not in :data:_VALID_DECISION_RATIONALE_DOMAINS.

Example::

from groundlens import decision_rationale_rules

rs = decision_rationale_rules(
    domain="finance",
    regulations=("eu_ai_act", "sr_26_2"),
)
result = rs.evaluate(question=q, response=r, context=ctx)

Source code in src/groundlens/rules.py
def decision_rationale_rules(
    domain: str = "finance",
    regulations: tuple[str, ...] = (),
    quality_floor: float = _DEFAULT_QUALITY_FLOOR,
) -> RuleSet:
    """Rule set for decision-rationale agents (credit / AML / KYC / sanctions).

    Canonical factory for the 20-rule, 5-sub-score decision-rationale
    rule set. Replaces :func:`groundlens_banking_rules` under the
    archetype-as-function naming convention introduced in ADR 0001
    (release 2026.6.13).

    Args:
        domain: Deployment domain. Currently only ``"finance"`` (default)
            is supported; calling with any other value raises
            ``ValueError`` so the caller knows the verticalization is not
            yet shipped. Insurance, healthcare, and legal vertical
            decision-rationale sets are on the roadmap.
        regulations: Optional tuple of regulation keys. Once filtering is
            active (see the implementation note below), a non-empty tuple
            suppresses ``audit_explanation`` lines whose rule citation
            does not mention any of the requested regulations from the
            rendered audit text. It does not add or remove rules. Valid
            keys include: ``"eu_ai_act"``, ``"sr_26_2"``, ``"sr_11_7"``,
            ``"nist_ai_600_1"``, ``"nist_ai_rmf"``, ``"iso_42001"``,
            ``"ecb_internal_models"``, ``"eba_gl_2020_06"``,
            ``"pra_ss1_23"``, ``"hipaa"``, ``"gdpr"``.

            *Implementation note (2026.6.13):* the kwarg is accepted and
            validated, but provenance-filtered rendering of
            ``audit_explanation`` will land in a follow-up release. For
            now the audit text is unmodified; the rule set is returned
            unchanged. A ``UserWarning`` is emitted when the kwarg is
            non-empty so the caller is aware the filter is not yet active.
        quality_floor: Threshold below which a sub-score triggers the
            cosmetic-deadlock flag. Kept for compatibility with the
            legacy ``banking_rules()`` signature.

    Returns:
        A :class:`RuleSet` named ``"decision_rationale_v1_finance"`` with
        five sub-scores and 20 rules. The rules and weights are identical
        to those of :func:`groundlens_banking_rules`; only the rule-set
        name is updated.

    Raises:
        ValueError: If ``domain`` is not in
            :data:`_VALID_DECISION_RATIONALE_DOMAINS`, or if
            ``regulations`` contains a key not in
            :data:`_REGULATION_CITATION_KEYS`.

    Example::

        from groundlens import decision_rationale_rules

        rs = decision_rationale_rules(
            domain="finance",
            regulations=("eu_ai_act", "sr_26_2"),
        )
        result = rs.evaluate(question=q, response=r, context=ctx)
    """
    if domain not in _VALID_DECISION_RATIONALE_DOMAINS:
        msg = (
            f"decision_rationale_rules(domain={domain!r}) — supported domains "
            f"are {_VALID_DECISION_RATIONALE_DOMAINS}. Other verticalizations "
            "are on the roadmap; open an issue at "
            "https://github.com/groundlens-dev/groundlens/issues to request "
            "one."
        )
        raise ValueError(msg)

    unknown = tuple(r for r in regulations if r not in _REGULATION_CITATION_KEYS)
    if unknown:
        msg = (
            f"decision_rationale_rules(regulations={regulations!r}) — unknown "
            f"keys {unknown}. Known keys: "
            f"{tuple(_REGULATION_CITATION_KEYS.keys())}."
        )
        raise ValueError(msg)
    if regulations:
        warnings.warn(
            "decision_rationale_rules(regulations=...) is accepted but the "
            "provenance-filtered audit_explanation rendering is not yet "
            "active (slated for a follow-up release). The returned RuleSet "
            "is unchanged.",
            UserWarning,
            stacklevel=2,
        )

    base = groundlens_banking_rules(quality_floor=quality_floor)
    # Replace the legacy name with the archetype-aware canonical name.
    object.__setattr__(base, "name", f"decision_rationale_v1_{domain}")
    return base
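
The last lines of the function rename the rule set through `object.__setattr__`, a pattern usually needed when an instance rejects ordinary attribute assignment. The sketch below assumes, without confirmation from this page, that RuleSet behaves like a frozen dataclass. `FrozenRuleSet` and the legacy name `"legacy_banking"` are hypothetical stand-ins. It contrasts the in-place rename with `dataclasses.replace`, which returns a renamed copy.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenRuleSet:
    """Hypothetical stand-in; the real RuleSet definition is not shown here."""

    name: str
    rules: tuple[str, ...] = ()


base = FrozenRuleSet(name="legacy_banking", rules=("rule_a", "rule_b"))

# Ordinary assignment is rejected on a frozen dataclass.
try:
    base.name = "decision_rationale_v1_finance"
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

# object.__setattr__ bypasses the frozen check and mutates the instance in place.
in_place = FrozenRuleSet(name="legacy_banking")
object.__setattr__(in_place, "name", "decision_rationale_v1_finance")

# dataclasses.replace builds a renamed copy and leaves the original untouched.
renamed = replace(base, name="decision_rationale_v1_finance")
```

The in-place form is safe as long as the factory hands back a fresh object on every call, as the function above appears to rely on. If instances were ever cached or shared, the copy-based form would avoid renaming an object that other callers still hold.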