Rules

`groundlens.rules`

Rule-based interpretable layer — deterministic, auditable, no LLM.

This module provides a checklist-style rule engine that complements the geometric SGI/DGI scores with human-readable audit evidence. A trained auditor or compliance officer can read the textual explanation produced by a `RuleSet` evaluation and verify, item by item, why a response passed or failed.

The rule engine is intentionally rule-based rather than learning-based:

- **Deterministic.** Same inputs → same outputs, byte-identical.
- **Auditable.** Every pass/fail decision cites the rule, the weight, and the matched evidence span in the response text.
- **No LLM.** Pattern matching, substring tests, and regular expressions. Compatible with the no-second-LLM constraint of groundlens.
- **Domain-specific.** Built-in factories (`banking_rules`, `groundlens_banking_rules`, `decision_rationale_rules`) exist for regulated domains; custom rule sets can be assembled from `ChecklistRule` instances or loaded from configuration.

Sub-scores follow the structure of compliance-rationale evaluation in the regulated-AI literature: specificity (does the response cite concrete case details?), explanatory linkage (does it explain the reasoning?), and boundary shift (does it state what would resolve the case?). Each sub-score lies in [0, 1], and the sub-scores are aggregated with a non-compensatory geometric mean, so a single zero sub-score collapses the overall quality signal to zero. A rationale that names parameters but offers no resolution path earns no partial credit: it is structurally incomplete.
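
Reading the aggregation off `RuleSet.evaluate` (and ignoring its four-decimal rounding), it can be written as below. The symbols are introduced here only for illustration: $R_k$ is the set of rules assigned to sub-score $k$, $w_i$ is rule $i$'s weight, $m_i \in \{0, 1\}$ records whether it matched, and $n$ is the number of sub-scores.

$$
s_k = \min\Big(1,\; \sum_{i \in R_k} w_i\, m_i\Big),
\qquad
Q =
\begin{cases}
\Big(\prod_{k=1}^{n} s_k\Big)^{1/n} & \text{if } \prod_{k=1}^{n} s_k > 0,\\[2pt]
0 & \text{otherwise.}
\end{cases}
$$

A worked comparison with an arithmetic mean shows why the aggregation is non-compensatory:

$$
(0.6,\,0.5,\,0.0) \;\Rightarrow\; \tfrac{1}{3}(0.6+0.5+0.0) \approx 0.367,\quad Q = 0;
\qquad
(0.6,\,0.5,\,0.4) \;\Rightarrow\; Q = (0.12)^{1/3} \approx 0.493.
$$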

References

Toulmin, S. E. (2003). The Uses of Argument. Cambridge University Press.

McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381-392.

Karwowski, J., et al. (2024). Goodhart's Law in Reinforcement Learning. ICLR 2024.

De la Chica Rodríguez, J. M., & Martí-González, C. (2026). Mechanical Enforcement for LLM Governance. arXiv:2605.14744.

Classes

`RuleEvidence(matched: bool, span: str, explanation: str)` `dataclass`

A single piece of evidence supporting a rule's pass/fail decision.

Attributes:

| Name | Type | Description |
|---|---|---|
| `matched` | `bool` | Whether the rule pattern matched the input text. |
| `span` | `str` | The substring (lowercased) that triggered the match, or `""` if no match was found. |
| `explanation` | `str` | Short human-readable note describing what was checked. |

`ChecklistRule(id: str, description: str, weight: float, sub_score: str, check: Callable[[str, str, str | None, dict[str, Any]], RuleEvidence], citation: str = '')` `dataclass`

A single rule with an id, a pattern check, and a weight.

Rules are designed to be readable: `id` and `description` are surfaced verbatim in the audit explanation. The `check` callable returns a `RuleEvidence`, so the audit trail records why the rule fired, not just that it did.

Attributes:

| Name | Type | Description |
|---|---|---|
| `id` | `str` | Stable identifier (e.g. `"spec.reg_flag"`). Used in audit logs. |
| `description` | `str` | One-line human-readable description of the rule. |
| `weight` | `float` | Contribution to the parent sub-score when matched, in [0, 1]. Sub-scores are capped at 1.0 even when weights sum higher. |
| `sub_score` | `str` | Which sub-score this rule contributes to. For the legacy `banking_rules()` set: `"spec"`, `"expl"`, or `"bshift"`. For the current `groundlens_banking_rules()` set: `"groundedness"`, `"completeness"`, `"calibration"`, `"traceability"`, or `"robustness"`. Custom rule sets may define additional categories. |
| `check` | `Callable[[str, str, str \| None, dict[str, Any]], RuleEvidence]` | Pure function `(question, response, context, metadata) -> RuleEvidence`. Must be deterministic. |
| `citation` | `str` | Free-text academic / industry / regulatory provenance for the rule, suitable for inclusion in an audit explanation or a regulatory submission. Empty string when no citation is provided. Example: `"RAGAs (Es et al., EACL 2024) §3 Faithfulness"`. |
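
A custom rule needs only a pure check function with the documented `(question, response, context, metadata) -> RuleEvidence` signature. The sketch below is hypothetical: the account-reference pattern and the name `check_account_reference` are illustrative, and `RuleEvidence` is re-declared locally with the documented fields so the snippet runs without groundlens installed (in real code it would come from `groundlens.rules`).

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Any


# Local stand-in with the documented RuleEvidence fields, so this sketch is
# runnable on its own. Not the groundlens class itself.
@dataclass(frozen=True)
class RuleEvidence:
    matched: bool
    span: str
    explanation: str


# Hypothetical pattern: a two-letter prefix followed by a 6-8 digit account
# reference, e.g. "GB12345678". Illustrative only.
_ACCOUNT_REF = re.compile(r"\b[a-z]{2}\d{6,8}\b")


def check_account_reference(
    question: str,
    response: str,
    context: str | None,
    metadata: dict[str, Any],
) -> RuleEvidence:
    """Deterministic check: does the response cite an account reference?"""
    # Lowercase before matching so the recorded span follows the documented
    # "substring (lowercased)" convention for RuleEvidence.span.
    match = _ACCOUNT_REF.search(response.lower())
    if match:
        return RuleEvidence(True, match.group(0), "cites an account reference")
    return RuleEvidence(False, "", "no account reference found")
```

Under the documented constructor, such a function could then be registered as `ChecklistRule("spec.account_ref", "account reference", 0.15, "spec", check_account_reference)`. Because the check ignores `metadata` and has no side effects, it satisfies the determinism requirement.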

`RuleResult(rule_id: str, sub_score: str, weight: float, matched: bool, evidence_span: str, explanation: str)` `dataclass`

Outcome of evaluating a single rule.

Attributes:

| Name | Type | Description |
|---|---|---|
| `rule_id` | `str` | The `ChecklistRule.id` that produced this result. |
| `sub_score` | `str` | Which sub-score this rule contributes to. |
| `weight` | `float` | The weight of the rule (echo of `ChecklistRule.weight`). |
| `matched` | `bool` | Whether the rule fired. |
| `evidence_span` | `str` | The substring that triggered the match; empty when the rule did not match. |
| `explanation` | `str` | The rule's human-readable explanation. |

`RuleSetResult(sub_scores: dict[str, float], quality: float, flagged: bool, rule_results: tuple[RuleResult, ...], audit_explanation: str)` `dataclass`

Aggregated result of evaluating a `RuleSet` against a response.

Each sub-score is a capped weight sum of the matched rules in that category, stored in the `sub_scores` mapping. `quality` is the geometric mean of all sub-score values: any zero sub-score yields `quality = 0.0`, reflecting that a rationale missing any audited dimension is structurally incomplete for human review. Both the sub-scores and `quality` are rounded to four decimal places.

Backward-compatible read accessors are exposed for the legacy De la Chica-style sub-scores (`spec`, `expl`, `bshift`) and for the current GroundLens five-category skeleton (`groundedness`, `completeness`, `calibration`, `traceability`, `robustness`). Accessors return 0.0 when the underlying ruleset did not define the requested sub-score.

Attributes:

| Name | Type | Description |
|---|---|---|
| `sub_scores` | `dict[str, float]` | Mapping from sub-score name to its capped value in [0, 1]. By convention, do not mutate. |
| `quality` | `float` | Geometric mean of all sub-score values in `sub_scores`. |
| `flagged` | `bool` | `True` when the ruleset's flag predicate is triggered. |
| `rule_results` | `tuple[RuleResult, ...]` | One `RuleResult` per rule that was evaluated. |
| `audit_explanation` | `str` | Multi-line human-readable summary suitable for inclusion in an audit log. |

Attributes

`spec: float` `property`

Legacy specificity sub-score. Returns 0.0 if not defined by the ruleset.

`expl: float` `property`

Legacy explanatory-linkage sub-score. Returns 0.0 if not defined by the ruleset.

`bshift: float` `property`

Legacy boundary-shift sub-score. Returns 0.0 if not defined by the ruleset.

`groundedness: float` `property`

Groundedness sub-score. Returns 0.0 if not defined by the ruleset.

`completeness: float` `property`

Completeness sub-score. Returns 0.0 if not defined by the ruleset.

`calibration: float` `property`

Calibration sub-score. Returns 0.0 if not defined by the ruleset.

`traceability: float` `property`

Traceability sub-score. Returns 0.0 if not defined by the ruleset.

`robustness: float` `property`

Robustness sub-score. Returns 0.0 if not defined by the ruleset.
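
One way to obtain the "returns 0.0 if not defined" behavior is a dictionary lookup with a default. The sketch below is a hypothetical, reduced stand-in (`ResultSketch` is not a groundlens class) that shows the accessor semantics; it is not the library's implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResultSketch:
    """Hypothetical stand-in for RuleSetResult, reduced to sub-score accessors."""

    sub_scores: dict[str, float] = field(default_factory=dict)

    def _get(self, name: str) -> float:
        # Categories the ruleset never defined read as 0.0 instead of raising.
        return self.sub_scores.get(name, 0.0)

    @property
    def spec(self) -> float:
        return self._get("spec")

    @property
    def groundedness(self) -> float:
        return self._get("groundedness")
```

Custom categories have no dedicated accessor, so for those the `sub_scores` mapping itself is the place to read values, e.g. `result.sub_scores.get("my_category", 0.0)`.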

`RuleSet(name: str, rules: tuple[ChecklistRule, ...], sub_scores: tuple[str, ...] = ('spec', 'expl', 'bshift'), quality_floor: float = _DEFAULT_QUALITY_FLOOR, flag_predicate: Callable[[dict[str, float]], bool] | None = None)` `dataclass`

A collection of rules evaluated together against a (question, response, context) triple.

Use `groundlens_banking_rules` for the current canonical five-category ruleset, `banking_rules` for the legacy three-category ruleset, or construct your own by passing a tuple of `ChecklistRule` instances along with the tuple of sub-score categories the rules contribute to.

Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | Identifier (e.g. `"groundlens_banking_v1"`). Surfaced in audit logs. |
| `rules` | `tuple[ChecklistRule, ...]` | The rules to evaluate. |
| `sub_scores` | `tuple[str, ...]` | Ordered tuple of sub-score category names this ruleset produces. Rules whose `sub_score` field is not in this tuple are ignored at aggregation time (their evidence is still recorded in `RuleSetResult.rule_results`). Default `("spec", "expl", "bshift")` preserves legacy behavior. |
| `quality_floor` | `float` | Default flag-predicate threshold below which a sub-score triggers the audit-deficiency flag. Applied to `spec` and `expl` only when `flag_predicate` is `None`. |
| `flag_predicate` | `Callable[[dict[str, float]], bool] \| None` | Optional pure function `dict[str, float] -> bool` that decides whether the aggregated result is flagged. When `None`, the default legacy predicate is used: flagged iff `spec < quality_floor or expl < quality_floor`. Because a missing sub-score reads as 0.0, a ruleset without a `spec` or `expl` category and without a `flag_predicate` flags every response (for any positive `quality_floor`). |
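
Because `flag_predicate` is just a pure `dict[str, float] -> bool`, a custom flagging policy is a plain function. The sketch below shows a hypothetical uniform-floor predicate (`make_uniform_floor_predicate` is not part of groundlens) next to a re-creation of the legacy default that `evaluate` applies when `flag_predicate` is `None`, assuming the documented default floor of 0.3.

```python
from collections.abc import Callable


def make_uniform_floor_predicate(floor: float) -> Callable[[dict[str, float]], bool]:
    """Hypothetical helper: flag when ANY sub-score falls strictly below `floor`."""

    def predicate(sub_scores: dict[str, float]) -> bool:
        return any(value < floor for value in sub_scores.values())

    return predicate


def legacy_default_predicate(sub_scores: dict[str, float], quality_floor: float = 0.3) -> bool:
    """Re-creation of the default branch in RuleSet.evaluate (spec/expl only)."""
    return sub_scores.get("spec", 0.0) < quality_floor or (
        sub_scores.get("expl", 0.0) < quality_floor
    )
```

A custom ruleset could then pass `flag_predicate=make_uniform_floor_predicate(0.4)` to `RuleSet`. The legacy re-creation also shows that a mapping with no `spec` or `expl` key is always flagged, since missing keys read as 0.0.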

Methods:

`evaluate(*, question: str, response: str, context: str | None = None, metadata: dict[str, Any] | None = None) -> RuleSetResult`

Evaluate the ruleset against a single (question, response) pair.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `question` | `str` | The user query / prompt the LLM received. | required |
| `response` | `str` | The LLM's rationale text being audited. | required |
| `context` | `str \| None` | Optional retrieved context (RAG-style). May be `None` when no retrieval was performed. | `None` |
| `metadata` | `dict[str, Any] \| None` | Optional dict carrying domain-specific structured data that some rules may consult (e.g. the case parameters in a banking decision: risk score, flags, amount, etc.). When `None`, rules receive an empty dict. | `None` |

Returns:

| Type | Description |
|---|---|
| `RuleSetResult` | A `RuleSetResult` with all sub-scores, the aggregated quality, and a full audit explanation. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `response` is empty or contains only whitespace. |

Source code in src/groundlens/rules.py

```python
def evaluate(
    self,
    *,
    question: str,
    response: str,
    context: str | None = None,
    metadata: dict[str, Any] | None = None,
) -> RuleSetResult:
    """Evaluate the ruleset against a single (question, response) pair.

    Args:
        question: The user query / prompt the LLM received.
        response: The LLM's rationale text being audited.
        context: Optional retrieved context (RAG-style). May be ``None``
            when no retrieval was performed.
        metadata: Optional dict carrying domain-specific structured data
            that some rules may consult (e.g. the case parameters in a
            banking decision: risk score, flags, amount, etc.).

    Returns:
        A :class:`RuleSetResult` with all sub-scores, the aggregated
        quality, and a full audit explanation.

    Raises:
        ValueError: If ``response`` is empty.
    """
    if not response.strip():
        msg = "response must be a non-empty string."
        raise ValueError(msg)

    meta = metadata or {}

    results: list[RuleResult] = []
    weights_by_sub: dict[str, float] = dict.fromkeys(self.sub_scores, 0.0)

    for rule in self.rules:
        evidence = rule.check(question, response, context, meta)
        results.append(
            RuleResult(
                rule_id=rule.id,
                sub_score=rule.sub_score,
                weight=rule.weight,
                matched=evidence.matched,
                evidence_span=evidence.span,
                explanation=evidence.explanation,
            )
        )
        if evidence.matched and rule.sub_score in weights_by_sub:
            weights_by_sub[rule.sub_score] += rule.weight

    sub_scores: dict[str, float] = {
        name: round(min(1.0, weights_by_sub[name]), 4) for name in self.sub_scores
    }

    product = 1.0
    for value in sub_scores.values():
        product *= value
    n = len(sub_scores)
    quality = round(product ** (1.0 / n), 4) if product > 0 and n > 0 else 0.0

    if self.flag_predicate is not None:
        flagged = bool(self.flag_predicate(sub_scores))
    else:
        # Legacy default: flagged iff spec or expl below quality_floor.
        flagged = (sub_scores.get("spec", 0.0) < self.quality_floor) or (
            sub_scores.get("expl", 0.0) < self.quality_floor
        )

    audit = _format_audit_explanation(
        ruleset_name=self.name,
        sub_scores=sub_scores,
        quality=quality,
        flagged=flagged,
        quality_floor=self.quality_floor,
        results=results,
    )

    return RuleSetResult(
        sub_scores=sub_scores,
        quality=quality,
        flagged=flagged,
        rule_results=tuple(results),
        audit_explanation=audit,
    )
```
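
The aggregation half of `evaluate` can be traced without any rule checks by feeding it pre-computed match outcomes. The sketch below re-creates that step for illustration; `aggregate` and its `(sub_score, weight, matched)` input format are hypothetical, and the legacy flag assumes the default floor of 0.3.

```python
import math


def aggregate(
    matches: list[tuple[str, float, bool]],
    categories: tuple[str, ...] = ("spec", "expl", "bshift"),
    quality_floor: float = 0.3,
) -> tuple[dict[str, float], float, bool]:
    """Re-creates the capping, geometric mean, and legacy flag of evaluate()."""
    weights = dict.fromkeys(categories, 0.0)
    for sub_score, weight, matched in matches:
        # Rules outside `categories` are ignored, mirroring evaluate().
        if matched and sub_score in weights:
            weights[sub_score] += weight
    # Cap each sub-score at 1.0 and round to four decimals.
    sub_scores = {name: round(min(1.0, weights[name]), 4) for name in categories}
    product = math.prod(sub_scores.values())
    n = len(sub_scores)
    quality = round(product ** (1.0 / n), 4) if product > 0 and n > 0 else 0.0
    flagged = sub_scores.get("spec", 0.0) < quality_floor or (
        sub_scores.get("expl", 0.0) < quality_floor
    )
    return sub_scores, quality, flagged


example = [
    ("spec", 0.20, True),
    ("spec", 0.15, True),
    ("spec", 0.10, True),
    ("expl", 0.20, True),
    ("expl", 0.15, False),
    ("bshift", 0.25, True),
    ("custom", 0.50, True),  # not a declared category: recorded, not scored
]
sub_scores, quality, flagged = aggregate(example)
# sub_scores == {"spec": 0.45, "expl": 0.2, "bshift": 0.25}
# quality == 0.2823 (cube root of 0.45 * 0.2 * 0.25); flagged is True (expl < 0.3)
```

Note that the flag and the quality are computed independently: a response whose matched weights give `spec = 1.0` and `expl = 0.5` but nothing for `bshift` has `quality = 0.0` yet is not flagged by the legacy predicate.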

Functions:

`banking_rules(quality_floor: float = _DEFAULT_QUALITY_FLOOR) -> RuleSet`

Curated ruleset for regulated banking governance decisions.

The rules cover the three sub-scores that an auditor or compliance officer typically inspects in a deferral or escalation rationale:

- **Specificity (`spec`):** does the rationale cite the case parameters that triggered the decision? Flags, risk score, numeric thresholds, gates, completeness, jurisdictional details, sufficient length, and specificity-marking language.
- **Explanatory linkage (`expl`):** does the rationale link the case facts to the decision? Conditional structure, pending actions, causal connectives, epistemic limits, domain references, modal verbs, length, and temporal ordering.
- **Boundary shift (`bshift`):** does the rationale state what would change the decision? Conditional approval pathways, information requests, risk-reduction proposals, alternative framings, threshold references, and length.

The default `quality_floor=0.3` follows the cosmetic-deadlock threshold introduced in the financial-decisions governance literature. A response that falls below this floor on either `spec` or `expl` is flagged as audit-deficient even if the geometric SGI/DGI score looks acceptable in isolation. Such responses are a structurally typical "false negative" of embedding-based detection.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `quality_floor` | `float` | Threshold below which a sub-score triggers the cosmetic-deadlock flag. Tune per deployment risk tolerance. | `_DEFAULT_QUALITY_FLOOR` |

Returns:

| Type | Description |
|---|---|
| `RuleSet` | A `RuleSet` named `"banking_v1"`. |

Source code in src/groundlens/rules.py

```python
def banking_rules(quality_floor: float = _DEFAULT_QUALITY_FLOOR) -> RuleSet:
    """Curated ruleset for regulated banking governance decisions.

    The rules cover the three sub-scores that an auditor or compliance
    officer typically inspects in a deferral or escalation rationale:

    - **Specificity (spec):** does the rationale cite the case parameters
      that triggered the decision? Flags, risk score, numeric thresholds,
      gates, completeness, jurisdictional details, sufficient length, and
      specificity-marking language.
    - **Explanatory linkage (expl):** does the rationale link the case
      facts to the decision? Conditional structure, pending actions, causal
      connectives, epistemic limits, domain references, modal verbs,
      length, and temporal ordering.
    - **Boundary shift (bshift):** does the rationale state what would
      change the decision? Conditional approval pathways, information
      requests, risk-reduction proposals, alternative framings, threshold
      references, and length.

    The default ``quality_floor=0.3`` follows the cosmetic-deadlock
    threshold introduced in the financial-decisions governance literature.
    A response that falls below this floor on either ``spec`` or ``expl``
    is flagged as audit-deficient even if the geometric SGI/DGI score
    looks acceptable in isolation — a structurally typical "false
    negative" of embedding-based detection.

    Args:
        quality_floor: Threshold below which a sub-score triggers the
            cosmetic-deadlock flag. Tune per deployment risk tolerance.

    Returns:
        A :class:`RuleSet` named ``"banking_v1"``.
    """
    rules: tuple[ChecklistRule, ...] = (
        # Specificity sub-rules
        ChecklistRule("spec.reg_flag", "regulatory flag", 0.20, "spec", _check_regulatory_flag),
        ChecklistRule("spec.risk_ref", "risk reference", 0.15, "spec", _check_risk_reference),
        ChecklistRule("spec.numeric", "numeric value", 0.10, "spec", _check_numeric_value),
        ChecklistRule("spec.gate", "gate / threshold", 0.10, "spec", _check_gate_name),
        ChecklistRule("spec.info_gap", "information gap", 0.15, "spec", _check_information_gap),
        ChecklistRule(
            "spec.case_detail", "case-specific detail", 0.10, "spec", _check_case_specific_detail
        ),
        ChecklistRule(
            "spec.length", "substantive length", 0.10, "spec", _check_substantive_length
        ),
        ChecklistRule(
            "spec.spec_language",
            "specificity language",
            0.10,
            "spec",
            _check_specificity_language,
        ),
        # Explanatory linkage sub-rules
        ChecklistRule(
            "expl.conditional", "conditional structure", 0.20, "expl", _check_conditional_structure
        ),
        ChecklistRule("expl.pending", "pending action", 0.15, "expl", _check_pending_action),
        ChecklistRule("expl.causal", "causal connective", 0.15, "expl", _check_causal_connective),
        ChecklistRule(
            "expl.epistemic", "epistemic limitation", 0.15, "expl", _check_epistemic_limit
        ),
        ChecklistRule("expl.domain", "domain reference", 0.10, "expl", _check_domain_reference),
        ChecklistRule("expl.modal", "modal verb", 0.10, "expl", _check_modal_verb),
        ChecklistRule("expl.length", "minimum length", 0.10, "expl", _check_minimum_length),
        ChecklistRule(
            "expl.temporal", "temporal ordering", 0.05, "expl", _check_temporal_ordering
        ),
        # Boundary shift sub-rules
        ChecklistRule(
            "bshift.cond_approval",
            "conditional approval",
            0.25,
            "bshift",
            _check_conditional_approval,
        ),
        ChecklistRule(
            "bshift.info_request",
            "information request",
            0.20,
            "bshift",
            _check_information_request,
        ),
        ChecklistRule(
            "bshift.risk_reduction", "risk reduction", 0.15, "bshift", _check_risk_reduction
        ),
        ChecklistRule(
            "bshift.alternative", "alternative framing", 0.10, "bshift", _check_alternative_framing
        ),
        ChecklistRule(
            "bshift.threshold_ref",
            "threshold reference",
            0.10,
            "bshift",
            _check_threshold_reference,
        ),
        ChecklistRule(
            "bshift.length", "resolution-path length", 0.05, "bshift", _check_resolution_length
        ),
    )
    return RuleSet(name="banking_v1", rules=rules, quality_floor=quality_floor)
```
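
The `banking_v1` weights above do not sum to 1.0 in every category. The sketch below copies them from the source and computes the best achievable sub-scores and quality, using the same capping, zero-collapse, and rounding as `evaluate`; `BANKING_V1_WEIGHTS` and the helper names are illustrative, not groundlens exports.

```python
import math

# Weights copied from the banking_rules() source, grouped by sub-score
# (illustrative constant, not exported by groundlens).
BANKING_V1_WEIGHTS: dict[str, list[float]] = {
    "spec": [0.20, 0.15, 0.10, 0.10, 0.15, 0.10, 0.10, 0.10],
    "expl": [0.20, 0.15, 0.15, 0.15, 0.10, 0.10, 0.10, 0.05],
    "bshift": [0.25, 0.20, 0.15, 0.10, 0.10, 0.05],
}


def best_case_sub_scores(weights: dict[str, list[float]]) -> dict[str, float]:
    """Sub-scores when every rule in each category matches (capped, rounded)."""
    return {name: round(min(1.0, sum(ws)), 4) for name, ws in weights.items()}


def geometric_quality(sub_scores: dict[str, float]) -> float:
    """Geometric mean with the same zero-collapse and rounding as evaluate()."""
    product = math.prod(sub_scores.values())
    n = len(sub_scores)
    return round(product ** (1.0 / n), 4) if product > 0 and n > 0 else 0.0


best = best_case_sub_scores(BANKING_V1_WEIGHTS)
ceiling = geometric_quality(best)
```

The boundary-shift weights total 0.85, so even a response that fires every `banking_v1` rule reaches at most `bshift = 0.85` and a quality of about 0.9473. Comparisons of `quality` across rulesets may want to account for that ceiling.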

`groundlens_banking_rules(quality_floor: float = _DEFAULT_QUALITY_FLOOR) -> RuleSet`

Canonical rule set for LLM rationale evaluation in banking governance.

Returns the 20-rule reference set whose provenance is triangulated across five independent research tracks: peer-reviewed NLP literature, tier-1 bank public reports, banking regulator whitepapers, cross-industry frameworks, and financial-domain NLP benchmarks. The rules are organized into five empirically emergent sub-score categories:

- **groundedness** (5 rules): claims linked to and supported by source.
- **completeness** (3 rules): coverage of the governance question.
- **calibration** (4 rules): uncertainty expression and abstention.
- **traceability** (5 rules): citation, audit trail, validation references.
- **robustness** (3 rules): resistance to noise, conflict, injection.

Each rule carries a `citation` field pointing to at least one of its academic, industrial, or regulatory provenance sources. The companion paper (Marin, 2026) documents the full per-rule provenance.

The default flag predicate, `_groundlens_banking_flag_predicate`, triggers when any sub-score that regulators treat as non-negotiable falls below its threshold (groundedness < 0.5, calibration < 0.3, or traceability < 0.4).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `quality_floor` | `float` | Legacy floor exposed for users who want a uniform threshold across sub-scores. Not used by the default flag predicate; kept for compatibility with the legacy `banking_rules()` signature so deployers can A/B both rulesets with one parameter. | `_DEFAULT_QUALITY_FLOOR` |

Returns:

| Type | Description |
|---|---|
| `RuleSet` | A `RuleSet` named `"groundlens_banking_v1"` with five sub-scores and 20 rules. |
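
The thresholds stated above can be expressed as a predicate over the sub-score mapping. This is a hypothetical re-creation from the docstring, not the source of `_groundlens_banking_flag_predicate` (which is not shown on this page); the names below are illustrative, and treating a missing category as 0.0 is an assumption.

```python
# Thresholds as stated in the groundlens_banking_rules() docstring.
_NON_NEGOTIABLE_FLOORS: dict[str, float] = {
    "groundedness": 0.5,
    "calibration": 0.3,
    "traceability": 0.4,
}


def groundlens_style_flag(sub_scores: dict[str, float]) -> bool:
    """Flag when any non-negotiable sub-score is strictly below its floor."""
    return any(
        sub_scores.get(name, 0.0) < floor  # assumption: missing reads as 0.0
        for name, floor in _NON_NEGOTIABLE_FLOORS.items()
    )
```

Completeness and robustness do not take part in this predicate, so a response with `completeness = 0.0` is not flagged by it even though its geometric-mean quality is 0.0. The `flagged` and `quality` fields therefore carry different information and may both need checking.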

Source code in src/groundlens/rules.py

```python
def groundlens_banking_rules(quality_floor: float = _DEFAULT_QUALITY_FLOOR) -> RuleSet:
    """Canonical rule set for LLM rationale evaluation in banking governance.

    Returns the 20-rule reference set whose provenance is triangulated across
    five independent research tracks: peer-reviewed NLP literature, tier-1
    bank public reports, banking regulator whitepapers, cross-industry
    frameworks, and financial-domain NLP benchmarks. The rules are organized
    into five empirically-emergent sub-score categories:

    - **groundedness** (5 rules): claims linked to and supported by source.
    - **completeness** (3 rules): coverage of the governance question.
    - **calibration** (4 rules): uncertainty expression and abstention.
    - **traceability** (5 rules): citation, audit trail, validation references.
    - **robustness** (3 rules): resistance to noise, conflict, injection.

    Each rule carries a ``citation`` field pointing to at least one of its
    academic, industrial, or regulatory provenance sources. The companion
    paper (Marin, 2026) documents the full per-rule provenance.

    The default flag predicate :func:`_groundlens_banking_flag_predicate`
    triggers when any regulator-non-negotiable sub-score falls below its
    threshold (groundedness < 0.5, calibration < 0.3, or traceability < 0.4).

    Args:
        quality_floor: Legacy floor exposed for users who want a uniform
            threshold across sub-scores. Not used by the default flag
            predicate; kept for compatibility with the legacy ``banking_rules()``
            signature so deployers can A/B both rulesets with one parameter.

    Returns:
        A :class:`RuleSet` named ``"groundlens_banking_v1"`` with five
        sub-scores and 20 rules.
    """
    rules: tuple[ChecklistRule, ...] = (
        # ── Groundedness (5 rules) ──────────────────────────────────────────
        ChecklistRule(
            id="grnd.claim_supported_by_context",
            description="every claim inferable from context",
            weight=0.25,
            sub_score="groundedness",
            check=_check_grounded_in_context,
            citation="RAGAs (Es et al., EACL 2024) §3; NIST AI 600-1 (2024) §2.2 Confabulation",
        ),
        ChecklistRule(
            id="grnd.atomic_decomposition",
            description="rationale decomposable into atomic claims",
            weight=0.20,
            sub_score="groundedness",
            check=_check_atomic_decomposable,
            citation="FactScore (Min et al., EMNLP 2023) §3; RAGAs (Es et al., EACL 2024) §3",
        ),
        ChecklistRule(
            id="grnd.no_unsupported_extensions",
            description="no claims beyond what context supports",
            weight=0.20,
            sub_score="groundedness",
            check=_check_no_unsupported_extensions,
            citation=(
                "HaluEval (Li et al., EMNLP 2023); Ji et al. ACM CSUR 2023; NIST AI 600-1 (2024)"
            ),
        ),
        ChecklistRule(
            id="grnd.regulatory_flag",
            description="names a specific regulatory flag or policy clause",
            weight=0.20,
            sub_score="groundedness",
            check=_check_regulatory_flag,
            citation="REV (Chen et al., ACL 2023); SR 26-2 (Fed/OCC/FDIC 2026) §VI Documentation",
        ),
        ChecklistRule(
            id="grnd.counterfactual_robust",
            description="screened against wrong-retrieval scenarios",
            weight=0.15,
            sub_score="groundedness",
            check=_check_counterfactual_robustness,
            citation="RGB (Chen et al., AAAI 2024); EU AI Act 2024/1689 Art. 15(4)",
        ),
        # ── Completeness (3 rules) ──────────────────────────────────────────
        ChecklistRule(
            id="comp.addresses_all_parts",
            description="response length scales with question parts",
            weight=0.40,
            sub_score="completeness",
            check=_check_addresses_all_parts,
            citation="RAGAs (Es et al., EACL 2024) §3; EU AI Act 2024/1689 Art. 13(2)",
        ),
        ChecklistRule(
            id="comp.governance_dimensions",
            description="references multiple governance dimensions",
            weight=0.35,
            sub_score="completeness",
            check=_check_governance_dimensions,
            citation="EBA GL/2020/06 §4.3.3; SR 26-2 (Fed/OCC/FDIC 2026) §IV Model Development",
        ),
        ChecklistRule(
            id="comp.information_integration",
            description="integrates multiple sources",
            weight=0.25,
            sub_score="completeness",
            check=_check_information_integration,
            citation="RGB (Chen et al., AAAI 2024); TRUE (Honovich et al., NAACL 2022)",
        ),
        # ── Calibration (4 rules) ───────────────────────────────────────────
        ChecklistRule(
            id="cal.abstains_when_insufficient",
            description="explicitly abstains when evidence is insufficient",
            weight=0.35,
            sub_score="calibration",
            check=_check_abstains_when_insufficient,
            citation=(
                "RAGAs (Es et al., EACL 2024) §3; FinanceBench (Islam et al., 2023); "
                "SR 26-2 §V Model Validation"
            ),
        ),
        ChecklistRule(
            id="cal.explicit_hedging",
            description="uses hedging language for uncertain claims",
            weight=0.30,
            sub_score="calibration",
            check=_check_explicit_hedging,
            citation=(
                "TruthfulQA (Lin et al., ACL 2022); Hyland (1998) hedging taxonomy; "
                "SR 26-2 §IV Model Use"
            ),
        ),
        ChecklistRule(
            id="cal.confidence_score",
            description="includes a numeric confidence or probability",
            weight=0.20,
            sub_score="calibration",
            check=_check_confidence_score,
            citation="G-Eval (Liu et al., EMNLP 2023); EU AI Act Art. 13(3)(b)(ii)",
        ),
        ChecklistRule(
            id="cal.self_consistency",
            description="pipeline screened for self-consistency",
            weight=0.15,
            sub_score="calibration",
            check=_check_self_consistency,
            citation="SelfCheckGPT (Manakul et al., EMNLP 2023); Morgan Stanley + OpenAI (2024)",
        ),
        # ── Traceability (5 rules) ──────────────────────────────────────────
        ChecklistRule(
            id="trace.specific_source_span",
            description="cites a specific page / section / paragraph",
            weight=0.25,
            sub_score="traceability",
            check=_check_specific_source_span,
            citation=(
                "e-SNLI (Camburu et al., NeurIPS 2018); EU AI Act Art. 13(3)(b)(iv); "
                "FinanceBench (Islam et al., 2023)"
            ),
        ),
        ChecklistRule(
            id="trace.natural_language_rationale",
            description="provides a substantive natural-language rationale",
            weight=0.20,
            sub_score="traceability",
            check=_check_substantive_length,
            citation=(
                "e-SNLI (Camburu et al., NeurIPS 2018); EU AI Act Art. 13(3)(b)(iv); "
                "PRA SS1/23 Principle 3"
            ),
        ),
        ChecklistRule(
            id="trace.falsifiable_actionable",
            description="couples numeric claim with causal mechanism",
            weight=0.20,
            sub_score="traceability",
            check=_check_falsifiable_actionable,
            citation="REV (Chen et al., ACL 2023); SR 26-2 §V Conceptual Soundness",
        ),
        ChecklistRule(
            id="trace.numeric_value",
            description="includes a numeric value or metric",
            weight=0.15,
            sub_score="traceability",
            check=_check_numeric_value,
            citation=(
                "FinQA (Chen et al., EMNLP 2021); EU AI Act Art. 13(3)(b)(ii); "
                "SR 26-2 §V Outcomes Analysis"
            ),
        ),
        ChecklistRule(
            id="trace.audit_logged",
            description="rationale persisted to audit log",
            weight=0.20,
            sub_score="traceability",
            check=_check_audit_logged,
            citation=(
                "EU AI Act Art. 12 Record-Keeping; SR 26-2 §VI Documentation; "
                "ISO/IEC 42001:2023 §8.2"
            ),
        ),
        # ── Robustness (3 rules) ────────────────────────────────────────────
        ChecklistRule(
            id="rob.independent_validation",
            description="references independent validation / effective challenge",
            weight=0.40,
            sub_score="robustness",
            check=_check_independent_validation,
            citation=(
                "SR 26-2 §III Effective Challenge; PRA SS1/23 Principle 4; "
                "ECB Guide to Internal Models §9.3 ¶43(a)"
            ),
        ),
        ChecklistRule(
            id="rob.prompt_injection_robust",
            description="pipeline screened for prompt-injection robustness",
            weight=0.35,
            sub_score="robustness",
            check=_check_prompt_injection_robust,
            citation="RGB (Chen et al., AAAI 2024); EU AI Act Art. 15; MAS MindForge (2024)",
        ),
        ChecklistRule(
            id="rob.cross_source_conflict",
            description="acknowledges cross-source conflicts",
            weight=0.25,
            sub_score="robustness",
            check=_check_cross_source_conflict,
            citation=(
                "ConflictBank (Su et al., 2024); EU AI Act Art. 15(4); RGB (Chen et al., 2024)"
            ),
        ),
    )

    return RuleSet(
        name="groundlens_banking_v1",
        rules=rules,
        sub_scores=("groundedness", "completeness", "calibration", "traceability", "robustness"),
        quality_floor=quality_floor,
        flag_predicate=_groundlens_banking_flag_predicate,
    )
```

`decision_rationale_rules(domain: str = 'finance', regulations: tuple[str, ...] = (), quality_floor: float = _DEFAULT_QUALITY_FLOOR) -> RuleSet` ¶

Rule set for decision-rationale agents (credit / AML / KYC / sanctions).

Canonical factory for the 20-rule, 5-sub-score decision-rationale rule set. Replaces :func:groundlens_banking_rules under the archetype-as-function naming convention introduced in ADR 0001 (release 2026.6.13).

Parameters:

Name	Type	Description	Default
`domain`	`str`	Deployment domain. Currently only `"finance"` (default) is supported; calling with any other value raises `ValueError` so the caller knows the verticalization is not yet shipped. Insurance, healthcare, and legal vertical decision-rationale sets are on the roadmap.	`'finance'`
`regulations`	`tuple[str, ...]`	Optional tuple of regulation keys. When non-empty, `audit_explanation` lines whose rule citation does not mention any of the requested regulations are suppressed from the rendered audit text. Does not add or remove rules. Valid keys include: `"eu_ai_act"`, `"sr_26_2"`, `"sr_11_7"`, `"nist_ai_600_1"`, `"nist_ai_rmf"`, `"iso_42001"`, `"ecb_internal_models"`, `"eba_gl_2020_06"`, `"pra_ss1_23"`, `"hipaa"`, `"gdpr"`. Implementation note (2026.6.13): the kwarg is accepted and validated, but provenance-filtered rendering of `audit_explanation` will land in a follow-up release. For now the audit text is unmodified; the rule set is returned unchanged. A `UserWarning` is emitted when the kwarg is non-empty so the caller is aware the filter is not yet active.	`()`
`quality_floor`	`float`	Threshold below which a sub-score triggers the cosmetic-deadlock flag. Kept for compatibility with the legacy `banking_rules()` signature.	`_DEFAULT_QUALITY_FLOOR`

Returns:

Type	Description
`RuleSet`	A :class:`RuleSet` named `"decision_rationale_v1_finance"` with five sub-scores and 20 rules. The rules and weights are identical to those of :func:`groundlens_banking_rules`; only the rule-set name is updated.

Raises:

Type	Description
`ValueError`	If `domain` is not in :data:`_VALID_DECISION_RATIONALE_DOMAINS`, or if `regulations` contains a key not in :data:`_REGULATION_CITATION_KEYS`.
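The source validates in a fixed order: domain first, then regulation keys, then the interim warning. The sketch below exercises that order from the caller's side using a self-contained stand-in, `check_arguments`, with hypothetical copies of the private constants so it runs without groundlens installed. With the real library, the same `warnings.catch_warnings` and `try`/`except ValueError` pattern would wrap a call to `decision_rationale_rules`. The key `"basel_iii"` is made up to trigger the unknown-key path.

```python
import warnings

# Hypothetical stand-ins for the private constants. The real
# _REGULATION_CITATION_KEYS is a mapping whose values are not shown here.
VALID_DOMAINS = ("finance",)
KNOWN_REGULATION_KEYS = (
    "eu_ai_act", "sr_26_2", "sr_11_7", "nist_ai_600_1", "nist_ai_rmf",
    "iso_42001", "ecb_internal_models", "eba_gl_2020_06", "pra_ss1_23",
    "hipaa", "gdpr",
)


def check_arguments(domain: str, regulations: tuple[str, ...] = ()) -> None:
    """Mirror the documented order: domain, then regulation keys, then warning."""
    if domain not in VALID_DOMAINS:
        raise ValueError(f"unsupported domain {domain!r}")
    unknown = tuple(r for r in regulations if r not in KNOWN_REGULATION_KEYS)
    if unknown:
        raise ValueError(f"unknown regulation keys {unknown}")
    if regulations:
        warnings.warn(
            "regulations=... accepted but filtering is not yet active",
            UserWarning,
            stacklevel=2,
        )


# Record the interim warning instead of letting it reach the console.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_arguments("finance", ("eu_ai_act", "sr_26_2"))
print([w.category.__name__ for w in caught])

# Both invalid inputs surface as ValueError before any warning is emitted.
for bad_call in (
    lambda: check_arguments("insurance"),
    lambda: check_arguments("finance", ("basel_iii",)),
):
    try:
        bad_call()
    except ValueError as exc:
        print("ValueError:", exc)
```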

Example::

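from groundlens import decision_rationale_rules

# Placeholder inputs; substitute the real question, agent response, and case context.
q = "Why was the credit application declined?"
r = "Declined because the debt-to-income ratio exceeds the policy limit."
ctx = "Applicant debt-to-income ratio: 52%. Policy limit: 40%."

rs = decision_rationale_rules(
    domain="finance",
    regulations=("eu_ai_act", "sr_26_2"),  # currently emits a UserWarning
)
result = rs.evaluate(question=q, response=r, context=ctx)
print(result.quality, result.flagged)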
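The example stops at `result`. One way to turn `result.rule_results` into a reviewer-facing list of gaps is sketched below. To keep the snippet runnable on its own it uses a local stand-in, `RuleResultStub`, that copies the field names listed for `RuleResult` on this page; the helper `failed_rules_by_sub_score`, the rule ids, and the evidence spans are hypothetical. With the real library you would pass `result.rule_results` instead of `sample`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleResultStub:
    """Local stand-in with the field names listed for groundlens RuleResult."""

    rule_id: str
    sub_score: str
    weight: float
    matched: bool
    evidence_span: str
    explanation: str


def failed_rules_by_sub_score(results: tuple[RuleResultStub, ...]) -> dict[str, list[str]]:
    """Group the ids of unmatched rules under the sub-score they feed."""
    grouped: dict[str, list[str]] = {}
    for res in results:
        if not res.matched:
            grouped.setdefault(res.sub_score, []).append(res.rule_id)
    return grouped


# Hypothetical rule ids and spans, for illustration only.
sample = (
    RuleResultStub("spec_amount", "spec", 1.0, True, "EUR 250,000", "cites the exposure amount"),
    RuleResultStub("bshift_path", "bshift", 1.0, False, "", "no resolution path stated"),
)
print(failed_rules_by_sub_score(sample))
```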

Source code in src/groundlens/rules.py

def decision_rationale_rules(
    domain: str = "finance",
    regulations: tuple[str, ...] = (),
    quality_floor: float = _DEFAULT_QUALITY_FLOOR,
) -> RuleSet:
    """Rule set for decision-rationale agents (credit / AML / KYC / sanctions).

    Canonical factory for the 20-rule, 5-sub-score decision-rationale
    rule set. Replaces :func:`groundlens_banking_rules` under the
    archetype-as-function naming convention introduced in ADR 0001
    (release 2026.6.13).

    Args:
        domain: Deployment domain. Currently only ``"finance"`` (default)
            is supported; calling with any other value raises
            ``ValueError`` so the caller knows the verticalization is not
            yet shipped. Insurance, healthcare, and legal vertical
            decision-rationale sets are on the roadmap.
        regulations: Optional tuple of regulation keys. When non-empty,
            ``audit_explanation`` lines whose rule citation does not
            mention any of the requested regulations are suppressed from
            the rendered audit text. Does not add or remove rules. Valid
            keys include: ``"eu_ai_act"``, ``"sr_26_2"``, ``"sr_11_7"``,
            ``"nist_ai_600_1"``, ``"nist_ai_rmf"``, ``"iso_42001"``,
            ``"ecb_internal_models"``, ``"eba_gl_2020_06"``,
            ``"pra_ss1_23"``, ``"hipaa"``, ``"gdpr"``.

            *Implementation note (2026.6.13):* the kwarg is accepted and
            validated, but provenance-filtered rendering of
            ``audit_explanation`` will land in a follow-up release. For
            now the audit text is unmodified; the rule set is returned
            unchanged. A ``UserWarning`` is emitted when the kwarg is
            non-empty so the caller is aware the filter is not yet active.
        quality_floor: Threshold below which a sub-score triggers the
            cosmetic-deadlock flag. Kept for compatibility with the
            legacy ``banking_rules()`` signature.

    Returns:
        A :class:`RuleSet` named ``"decision_rationale_v1_finance"`` with
        five sub-scores and 20 rules. The rules and weights are identical
        to those of :func:`groundlens_banking_rules`; only the rule-set
        name is updated.

    Raises:
        ValueError: If ``domain`` is not in
            :data:`_VALID_DECISION_RATIONALE_DOMAINS`.

    Example::

        from groundlens import decision_rationale_rules

        rs = decision_rationale_rules(
            domain="finance",
            regulations=("eu_ai_act", "sr_26_2"),
        )
        result = rs.evaluate(question=q, response=r, context=ctx)
    """
    if domain not in _VALID_DECISION_RATIONALE_DOMAINS:
        msg = (
            f"decision_rationale_rules(domain={domain!r}) — supported domains "
            f"are {_VALID_DECISION_RATIONALE_DOMAINS}. Other verticalizations "
            "are on the roadmap; open an issue at "
            "https://github.com/groundlens-dev/groundlens/issues to request "
            "one."
        )
        raise ValueError(msg)

    unknown = tuple(r for r in regulations if r not in _REGULATION_CITATION_KEYS)
    if unknown:
        msg = (
            f"decision_rationale_rules(regulations={regulations!r}) — unknown "
            f"keys {unknown}. Known keys: "
            f"{tuple(_REGULATION_CITATION_KEYS.keys())}."
        )
        raise ValueError(msg)
    if regulations:
        warnings.warn(
            "decision_rationale_rules(regulations=...) is accepted but the "
            "provenance-filtered audit_explanation rendering is not yet "
            "active (slated for a follow-up release). The returned RuleSet "
            "is unchanged.",
            UserWarning,
            stacklevel=2,
        )

    base = groundlens_banking_rules(quality_floor=quality_floor)
    # Replace the legacy name with the archetype-aware canonical name.
    object.__setattr__(base, "name", f"decision_rationale_v1_{domain}")
    return base
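The final step renames the returned `RuleSet` with `object.__setattr__`, the usual escape hatch for assigning to a frozen dataclass; that `RuleSet` is frozen is inferred from this call rather than stated on this page. The sketch below contrasts that in-place rename with `dataclasses.replace`, which builds a renamed copy and leaves the original untouched. `ToyRuleSet`, `legacy_factory`, and the legacy name `"banking_v1"` are hypothetical stand-ins modelling only two fields.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRuleSet:
    """Hypothetical stand-in for RuleSet; only two fields are modelled."""

    name: str
    rules: tuple[str, ...]


def legacy_factory() -> ToyRuleSet:
    # Returns a fresh instance on every call; "banking_v1" is a made-up legacy name.
    return ToyRuleSet(name="banking_v1", rules=("r01", "r02"))


# Option 1: in-place rename, as the source does. object.__setattr__ bypasses
# the frozen guard, so it is only safe when the instance is not shared.
base = legacy_factory()
object.__setattr__(base, "name", "decision_rationale_v1_finance")

# Plain assignment is still rejected on the frozen instance.
try:
    base.name = "other"
except dataclasses.FrozenInstanceError:
    print("plain assignment is rejected on a frozen dataclass")

# Option 2: dataclasses.replace builds a new instance; the rules tuple is
# carried over by reference and the original keeps its name.
original = legacy_factory()
renamed = dataclasses.replace(original, name="decision_rationale_v1_finance")

print(base.name, renamed.name, original.name)
print(renamed.rules is original.rules)
```

Both options carry the rules tuple over unchanged, which is what keeps the rules and weights identical to the legacy factory. The in-place form relies on the factory returning a fresh instance per call; `dataclasses.replace` drops that dependency at the cost of one extra object.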

