Propose

groundlens.propose

Active-learning helper for bootstrapping a DGI calibration dataset.

A new deployment needs a verified-grounded corpus before groundlens.compute_dgi can produce meaningful scores. Curating that corpus from scratch is the practical bottleneck most teams hit first. This module implements the propose half of an active-learning loop: given a list of self-contained SeedExample(context, question, grounded) triples and a text-generation callable, it produces a ranked batch of candidate (question, response) pairs for a human reviewer to label.

The loop is intentionally non-circular: the DGI score orders the candidates by uncertainty, but the label itself is supplied by the human reviewer. Calibrating on the labelled batch then sharpens the same DGI that proposes the next batch.

Public types

  • SeedExample -- one verified-grounded triple (context, question, grounded) you supply as input.
  • ProposedLabel -- one candidate ready for review.
  • PropositionBatch -- the batch returned by groundlens.DGI.propose_labels.

All three are exposed at the top level of the groundlens package.

References:

Marin, J. (2026). A Methodology for Building Human-Confabulated Hallucination Benchmarks. groundlens-dev/grounding-benchmark. CC BY 4.0.

Classes

SeedExample(context: str, question: str, grounded: str) dataclass

One verified-grounded triple you supply to DGI.propose_labels.

A SeedExample binds a FAQ paragraph (context) to a question that paragraph answers (question) and the verified-grounded response to that question (grounded). Bundling the three together is what keeps the candidate generation coherent: the confabulation prompt receives the same context, question and grounded answer rather than randomly-paired pieces.

Attributes:

  • context (str): A paragraph from the deployment's FAQ corpus that supports the grounded response.
  • question (str): A question whose answer is contained in context.
  • grounded (str): The verified-grounded response to question given context. The confabulation strategies rewrite this response under specific failure modes.

Raises:

  • ValueError: If any field is not a string, or is empty or whitespace-only.

Methods:
__post_init__() -> None

Validate that every field is a non-empty, non-whitespace string.

Source code in src/groundlens/propose.py
def __post_init__(self) -> None:
    """Validate that every field is a non-empty, non-whitespace string."""
    for name in ("context", "question", "grounded"):
        value = getattr(self, name)
        if not isinstance(value, str) or not value.strip():
            msg = f"SeedExample.{name} must be a non-empty string."
            raise ValueError(msg)
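
As a usage illustration, the sketch below re-creates the validation on a local stand-in class so it runs without groundlens installed. The names SeedExampleSketch and try_build, the frozen=True flag, and the sample strings are assumptions made for this example; only the __post_init__ check is taken from the listing above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedExampleSketch:
    """Local stand-in mirroring the documented SeedExample fields."""

    context: str
    question: str
    grounded: str

    def __post_init__(self) -> None:
        # Same check as the listing: every field must be a non-blank string.
        for name in ("context", "question", "grounded"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                msg = f"SeedExample.{name} must be a non-empty string."
                raise ValueError(msg)


def try_build(context: str, question: str, grounded: str) -> str:
    """Return "ok" or the validation message (demonstration helper)."""
    try:
        SeedExampleSketch(context, question, grounded)
    except ValueError as exc:
        return str(exc)
    return "ok"


valid = try_build(
    "Refunds are issued within 14 days of a return.",
    "How long do refunds take?",
    "Refunds are issued within 14 days of a return.",
)
# A whitespace-only question is rejected at construction time.
blank = try_build("Refunds are issued within 14 days.", "   ", "Within 14 days.")
```

Because the check runs in __post_init__, a malformed seed fails at construction rather than later, during candidate generation.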

ProposedLabel(question: str, candidate_response: str, dgi_score: float, strategy: str, context_excerpt: str, uncertainty: float) dataclass

One candidate (question, response) pair ready for human review.

Attributes:

  • question (str): A question grounded in one of the FAQ-corpus entries.
  • candidate_response (str): A confabulated response written by the generation LLM under the named strategy.
  • dgi_score (float): The DGI normalized score of the candidate against the current mu_hat. Lower scores mean stronger deferral signal.
  • strategy (str): The name of the confabulation strategy that produced this candidate (e.g. "redefinition").
  • context_excerpt (str): The FAQ excerpt the question was anchored to.
  • uncertainty (float): Distance of dgi_score from the threshold used for ranking. Smaller = more uncertain = higher priority.
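
The link between dgi_score and uncertainty can be pictured with a small sketch. The page only says "distance ... from the threshold", so the absolute-difference form, the helper name uncertainty_from_score, and the threshold value 0.5 are all assumptions for illustration, not the library's confirmed formula.

```python
def uncertainty_from_score(dgi_score: float, threshold: float) -> float:
    """Assumed form: absolute distance of the score from the threshold."""
    return abs(dgi_score - threshold)


# Hypothetical threshold and scores, chosen only for illustration.
THRESHOLD = 0.5
scores = {"cand_a": 0.48, "cand_b": 0.90, "cand_c": 0.55}

# Smaller uncertainty value = closer to the threshold = reviewed first.
review_order = sorted(
    scores,
    key=lambda name: uncertainty_from_score(scores[name], THRESHOLD),
)
```

Under these assumptions cand_a sits closest to the threshold and would be reviewed first, while cand_b, which the score already separates clearly, comes last.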

PropositionBatch(items: tuple[ProposedLabel, ...], review_template: str, all_candidates: tuple[ProposedLabel, ...] = tuple(), strategies_used: tuple[str, ...] = tuple()) dataclass

A batch of candidates returned by groundlens.DGI.propose_labels.

Attributes:

  • items (tuple[ProposedLabel, ...]): Candidates ordered by acquisition score (most useful to label first). Length up to n_to_label.
  • review_template (str): A Markdown template instructing the human reviewer how to label the items in the batch.
  • all_candidates (tuple[ProposedLabel, ...]): Every candidate generated in the round, ordered by acquisition score. Useful for audit and debugging. Defaults to an empty tuple.
  • strategies_used (tuple[str, ...]): The tuple of strategy names actually used. Defaults to an empty tuple.

Functions:

rank_for_labelling(candidates: list[ProposedLabel], *, n_to_label: int, diverse_fraction: float = 0.3) -> list[ProposedLabel]

Pick the n_to_label most useful candidates for a human reviewer.

The default acquisition mixes two signals:

  • Uncertainty (70% by default): the max(1, round((1 - diverse_fraction) * n_to_label)) candidates with the smallest distance to the threshold. These are the candidates the current model finds hardest to classify, so a label on them shifts mu_hat the most.
  • Diversity (30% by default): the remaining slots are filled from the leftover candidates, preferring strategies not yet represented in the uncertainty subset and breaking ties by uncertainty. This raises strategy coverage but does not guarantee that every strategy appears in the batch.
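
The split between the two signals is plain arithmetic, sketched below. The helper name split_sizes is hypothetical; its two lines mirror the sizing lines in the source listing for rank_for_labelling, which uses round() and a floor of one uncertainty slot.

```python
def split_sizes(n_to_label: int, diverse_fraction: float = 0.3) -> tuple[int, int]:
    """Return (n_uncertain, n_diverse) as computed in the source listing."""
    # At least one slot always goes to the uncertainty signal.
    n_uncertain = max(1, round((1.0 - diverse_fraction) * n_to_label))
    # Whatever is left of the batch goes to the diversity signal.
    n_diverse = max(0, n_to_label - n_uncertain)
    return n_uncertain, n_diverse


# Batch sizes chosen for illustration, all with the default fraction.
sizes = {n: split_sizes(n) for n in (1, 2, 4, 10)}
```

With the default fraction, a batch of 10 splits into 7 uncertainty slots and 3 diversity slots, while a batch of 1 has no diversity slot at all. Python's round() rounds exact halves to the nearest even integer, so the realised split can differ slightly from a strict 70/30 on some batch sizes.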

Parameters:

  • candidates (list[ProposedLabel], required): Candidates to rank. Each carries its own uncertainty score; smaller is more uncertain.
  • n_to_label (int, required): How many candidates to return.
  • diverse_fraction (float, default 0.3): Fraction of the batch reserved for diversity.

Returns:

  • list[ProposedLabel]: The selected candidates in ranked order, uncertainty picks first, then diversity picks.

Source code in src/groundlens/propose.py
def rank_for_labelling(
    candidates: list[ProposedLabel],
    *,
    n_to_label: int,
    diverse_fraction: float = 0.3,
) -> list[ProposedLabel]:
    """Pick the ``n_to_label`` most useful candidates for a human reviewer.

    The default acquisition mixes two signals:

    - **Uncertainty (70% by default):** the
      ``max(1, round((1 - diverse_fraction) * n_to_label))`` candidates
      with the smallest distance to the threshold. These are the
      candidates the current model finds hardest to classify, so a
      label on them shifts ``mu_hat`` the most.
    - **Diversity (30% by default):** the remaining slots are filled
      from the leftover candidates, preferring strategies not yet
      represented in the uncertainty subset. Coverage of every strategy
      is not guaranteed.

    Args:
        candidates: Candidates to rank. Each carries its own
            ``uncertainty`` score; smaller is more uncertain.
        n_to_label: How many candidates to return.
        diverse_fraction: Fraction of the batch reserved for diversity.
            ``0.3`` by default.

    Returns:
        List of selected candidates in ranked order.
    """
    if n_to_label <= 0:
        return []
    if not candidates:
        return []

    n_uncertain = max(1, round((1.0 - diverse_fraction) * n_to_label))
    n_diverse = max(0, n_to_label - n_uncertain)

    # 1) Uncertainty top-n.
    by_uncertainty = sorted(candidates, key=lambda c: c.uncertainty)
    uncertain_pick = by_uncertainty[:n_uncertain]

    # 2) Diversity: among the remaining candidates, prefer strategies
    #    NOT represented in `uncertain_pick`.
    used_strategies = {c.strategy for c in uncertain_pick}
    rest = list(by_uncertainty[n_uncertain:])
    rest.sort(
        key=lambda c: (
            0 if c.strategy not in used_strategies else 1,
            c.uncertainty,
        )
    )
    diverse_pick = rest[:n_diverse]

    return [*uncertain_pick, *diverse_pick]
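
To see the diversity step override pure uncertainty ordering, the sketch below runs a condensed copy of the selection logic on a toy pool. Candidate is a stand-in carrying only the fields the ranking reads, rank_sketch is a hypothetical name, and "fabrication" is an invented strategy name used for contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Stand-in with only the fields the ranking reads, plus a label."""

    name: str
    strategy: str
    uncertainty: float


def rank_sketch(candidates, *, n_to_label, diverse_fraction=0.3):
    """Condensed copy of the selection logic in the listing above."""
    if n_to_label <= 0 or not candidates:
        return []
    n_uncertain = max(1, round((1.0 - diverse_fraction) * n_to_label))
    n_diverse = max(0, n_to_label - n_uncertain)
    by_uncertainty = sorted(candidates, key=lambda c: c.uncertainty)
    uncertain_pick = by_uncertainty[:n_uncertain]
    used = {c.strategy for c in uncertain_pick}
    # False sorts before True, so unseen strategies come first;
    # ties fall back to uncertainty.
    rest = sorted(
        by_uncertainty[n_uncertain:],
        key=lambda c: (c.strategy in used, c.uncertainty),
    )
    return [*uncertain_pick, *rest[:n_diverse]]


pool = [
    Candidate("a", "redefinition", 0.01),
    Candidate("b", "redefinition", 0.02),
    Candidate("c", "redefinition", 0.03),
    Candidate("d", "redefinition", 0.04),
    Candidate("e", "fabrication", 0.20),  # hypothetical strategy name
]
picked = [c.name for c in rank_sketch(pool, n_to_label=4)]
```

With n_to_label=4 the split is three uncertainty slots and one diversity slot. Candidate "d" is more uncertain than "e", yet "e" takes the last slot because its strategy is absent from the first three picks.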

build_review_template(items: list[ProposedLabel]) -> str

Render the Markdown template for a batch of proposed labels.

Source code in src/groundlens/propose.py
def build_review_template(items: list[ProposedLabel]) -> str:
    """Render the Markdown template for a batch of proposed labels."""
    body_parts = []
    for idx, it in enumerate(items, start=1):
        body_parts.append(
            _ITEM_TEMPLATE.format(
                idx=idx,
                total=len(items),
                strategy=it.strategy,
                context_excerpt=it.context_excerpt[:500],
                question=it.question,
                candidate_response=it.candidate_response[:1000],
                dgi_score=it.dgi_score,
                uncertainty=it.uncertainty,
            )
        )
    return _REVIEW_TEMPLATE.format(n=len(items), items="\n".join(body_parts))
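
The rendering step can be tried in isolation with stand-in templates. The real _ITEM_TEMPLATE and _REVIEW_TEMPLATE are private to groundlens.propose and are not shown on this page, so the ITEM_TEMPLATE and REVIEW_TEMPLATE strings below, the render helper, and the use of plain dicts as items are all hypothetical; only the placeholder names and the 500/1000-character caps come from the listing above.

```python
# Hypothetical stand-ins for the private module-level templates.
ITEM_TEMPLATE = (
    "### Item {idx} of {total} ({strategy})\n"
    "Context: {context_excerpt}\n"
    "Question: {question}\n"
    "Candidate: {candidate_response}\n"
    "DGI: {dgi_score:.3f} | uncertainty: {uncertainty:.3f}\n"
    "Label: [ ] grounded  [ ] not grounded\n"
)
REVIEW_TEMPLATE = "# Review batch ({n} items)\n\n{items}"


def render(items: list[dict]) -> str:
    """Same shape as build_review_template, using plain dicts as items."""
    parts = []
    for idx, it in enumerate(items, start=1):
        parts.append(
            ITEM_TEMPLATE.format(
                idx=idx,
                total=len(items),
                strategy=it["strategy"],
                context_excerpt=it["context_excerpt"][:500],  # cap as in the listing
                question=it["question"],
                candidate_response=it["candidate_response"][:1000],
                dgi_score=it["dgi_score"],
                uncertainty=it["uncertainty"],
            )
        )
    return REVIEW_TEMPLATE.format(n=len(items), items="\n".join(parts))


sample = [{
    "strategy": "redefinition",
    "context_excerpt": "x" * 600,  # longer than the cap, to show truncation
    "question": "How long do refunds take?",
    "candidate_response": "Refunds take 30 days.",
    "dgi_score": 0.41,
    "uncertainty": 0.09,
}]
page = render(sample)
```

The slices keep each review item bounded: the 600-character context above is cut to 500 characters before it reaches the template, and responses are capped at 1000.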