Propose

`groundlens.propose`

Active-learning helper for bootstrapping a DGI calibration dataset.

A new deployment needs a verified-grounded corpus before `groundlens.compute_dgi` can produce meaningful scores. Curating that corpus from scratch is the practical bottleneck most teams hit first. This module implements the "propose" half of an active-learning loop: given a list of self-contained `SeedExample(context, question, grounded)` triples and a text-generation callable, it produces a ranked batch of candidate (question, response) pairs for a human reviewer to label.

The loop is intentionally non-circular: the DGI score only orders the candidates by uncertainty, and the label itself is supplied by the human reviewer. Calibrating on the labelled batch then sharpens the same DGI that proposes the next batch.

Public types

`SeedExample`: one verified-grounded triple (`context`, `question`, `grounded`) that you supply as input.
`ProposedLabel`: one candidate ready for review.
`PropositionBatch`: the batch returned by `groundlens.DGI.propose_labels`.

All three are exported at the top level of the package.

References:

Marin, J. (2026). A Methodology for Building Human-Confabulated Hallucination Benchmarks. groundlens-dev/grounding-benchmark. CC BY 4.0.

Classes

`SeedExample(context: str, question: str, grounded: str)` `dataclass`

One verified-grounded triple you supply to `DGI.propose_labels`.

A `SeedExample` binds an FAQ paragraph (`context`) to a question that paragraph answers (`question`) and the verified-grounded response to that question (`grounded`). Bundling the three together keeps candidate generation coherent: the confabulation prompt receives a context, question and grounded answer that belong together rather than randomly paired pieces.

Attributes:

Name	Type	Description
`context`	`str`	A paragraph from the deployment's FAQ corpus that supports the grounded response.
`question`	`str`	A question whose answer is contained in `context`.
`grounded`	`str`	The verified-grounded response to `question` given `context`. The confabulation strategies rewrite this response under specific failure modes.

Raises:

Type	Description
`ValueError`	If any field is empty or whitespace-only.

Methods:

`__post_init__() -> None`

Validate that every field is a non-empty, non-whitespace string.

Source code in src/groundlens/propose.py

def __post_init__(self) -> None:
    """Validate that every field is a non-empty, non-whitespace string."""
    for name in ("context", "question", "grounded"):
        value = getattr(self, name)
        if not isinstance(value, str) or not value.strip():
            msg = f"SeedExample.{name} must be a non-empty string."
            raise ValueError(msg)
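
The validation above can be exercised with a small stand-in. This is a minimal sketch, assuming a plain `@dataclass` that mirrors the three documented fields and the `__post_init__` listing; the FAQ text is invented for illustration, and the real class may carry additional dataclass options not shown here.

```python
from dataclasses import dataclass


@dataclass
class SeedExample:
    """Stand-in mirroring the documented fields and validation."""

    context: str
    question: str
    grounded: str

    def __post_init__(self) -> None:
        # Same check as the source listing: every field must be a
        # non-empty, non-whitespace string.
        for name in ("context", "question", "grounded"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                msg = f"SeedExample.{name} must be a non-empty string."
                raise ValueError(msg)


# A well-formed seed: the context supports the grounded answer.
seed = SeedExample(
    context="Refunds are issued within 14 days of a return request.",
    question="How long do refunds take?",
    grounded="Refunds are issued within 14 days of the return request.",
)

# A whitespace-only field is rejected at construction time.
try:
    SeedExample(context="   ", question="How long do refunds take?", grounded="14 days.")
except ValueError as exc:
    error_message = str(exc)
```

Note that a non-string value (for example `None`) also raises `ValueError` rather than `TypeError`, because both conditions share one branch.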

`ProposedLabel(question: str, candidate_response: str, dgi_score: float, strategy: str, context_excerpt: str, uncertainty: float)` `dataclass`

One candidate (question, response) pair ready for human review.

Attributes:

Name	Type	Description
`question`	`str`	A question grounded in one of the FAQ-corpus entries.
`candidate_response`	`str`	A confabulated response written by the generation LLM under the named `strategy`.
`dgi_score`	`float`	The DGI normalized score of the candidate against the current `mu_hat`. Lower scores mean stronger deferral signal.
`strategy`	`str`	The name of the confabulation strategy that produced this candidate (e.g. `"redefinition"`).
`context_excerpt`	`str`	The FAQ excerpt the question was anchored to.
`uncertainty`	`float`	Distance of `dgi_score` from the threshold used for ranking. Smaller = more uncertain = higher priority.
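
To make the `uncertainty` field concrete, here is a hedged sketch of how such a value could be derived. It assumes the "distance" is the absolute difference between the score and the threshold; the helper name `uncertainty_from_score` and the threshold value `0.5` are hypothetical and do not come from the library.

```python
def uncertainty_from_score(dgi_score: float, threshold: float) -> float:
    """Absolute distance from the decision threshold (assumed metric)."""
    return abs(dgi_score - threshold)


threshold = 0.5  # hypothetical value, chosen only for illustration
scores = {"near": 0.52, "far": 0.9, "below": 0.3}

# Smaller distance = more uncertain = reviewed first.
ranked = sorted(
    scores,
    key=lambda name: uncertainty_from_score(scores[name], threshold),
)
```

Under this assumption a candidate just above the threshold and one just below it are equally uncertain; only the magnitude of the gap matters.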

`PropositionBatch(items: tuple[ProposedLabel, ...], review_template: str, all_candidates: tuple[ProposedLabel, ...] = tuple(), strategies_used: tuple[str, ...] = tuple())` `dataclass`

A batch of candidates returned by `groundlens.DGI.propose_labels`.

Attributes:

Name	Type	Description
`items`	`tuple[ProposedLabel, ...]`	Candidates ordered by acquisition score (most useful to label first). Length up to `n_to_label`.
`review_template`	`str`	A Markdown template instructing the human reviewer how to label the items in the batch.
`all_candidates`	`tuple[ProposedLabel, ...]`	Every candidate generated in the round, ordered by acquisition score. Useful for audit and debugging.
`strategies_used`	`tuple[str, ...]`	The tuple of strategy names actually used.
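
The relationship between `items` and `all_candidates` is easiest to see when auditing a batch. The following is a sketch using stand-in dataclasses that mirror the documented fields; the helper `make`, the strategy name `"negation"`, and the placeholder template text are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class ProposedLabel:
    question: str
    candidate_response: str
    dgi_score: float
    strategy: str
    context_excerpt: str
    uncertainty: float


@dataclass
class PropositionBatch:
    items: tuple[ProposedLabel, ...]
    review_template: str
    all_candidates: tuple[ProposedLabel, ...] = ()
    strategies_used: tuple[str, ...] = ()


def make(strategy: str, uncertainty: float) -> ProposedLabel:
    """Hypothetical helper that fills the text fields with placeholders."""
    return ProposedLabel("q", "r", 0.5, strategy, "ctx", uncertainty)


a = make("redefinition", 0.01)
b = make("redefinition", 0.05)
c = make("negation", 0.30)  # "negation" is an invented strategy name

batch = PropositionBatch(
    items=(a, c),
    review_template="(template text)",
    all_candidates=(a, b, c),
    strategies_used=("redefinition", "negation"),
)

# How many selected items each strategy contributed.
selected = Counter(label.strategy for label in batch.items)

# Candidates that were generated but not sent for review.
left_out = [label for label in batch.all_candidates if label not in batch.items]
```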

Functions:

`rank_for_labelling(candidates: list[ProposedLabel], *, n_to_label: int, diverse_fraction: float = 0.3) -> list[ProposedLabel]`

Pick the `n_to_label` most useful candidates for a human reviewer.

The default acquisition mixes two signals:

Uncertainty (70% with the default `diverse_fraction`): the `max(1, round((1 - diverse_fraction) * n_to_label))` candidates with the smallest distance to the threshold, as computed in the source below. These are the candidates the current model finds hardest to classify, so a label on them shifts `mu_hat` the most.
Diversity (30% with the default `diverse_fraction`): the remaining slots are filled from the leftover candidates, preferring strategies that are not represented in the uncertainty subset and ordering by uncertainty within each group. This favours under-represented strategies but does not guarantee that every strategy appears in the batch.
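
The slot arithmetic can be checked in isolation. The sketch below copies the two lines that compute the split from the `rank_for_labelling` source listing into a helper; the name `split_slots` is hypothetical and exists only for this illustration.

```python
from __future__ import annotations


def split_slots(n_to_label: int, diverse_fraction: float = 0.3) -> tuple[int, int]:
    """Return (uncertainty slots, diversity slots) for a batch size."""
    # At least one slot always goes to the uncertainty pick.
    n_uncertain = max(1, round((1.0 - diverse_fraction) * n_to_label))
    # Whatever is left (never negative) goes to the diversity pick.
    n_diverse = max(0, n_to_label - n_uncertain)
    return n_uncertain, n_diverse


default_split = split_slots(10)    # default 70/30 mix
single_slot = split_slots(1)       # a one-item batch has no diversity slot
all_diverse = split_slots(3, 1.0)  # even at 1.0, one uncertainty slot remains
```

Because the split uses `round`, which rounds exact halves to the nearest even integer in Python, batch sizes where the product lands on `.5` may give one slot more or fewer to the uncertainty pick than a ceiling would.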

Parameters:

Name	Type	Description	Default
`candidates`	`list[ProposedLabel]`	Candidates to rank. Each carries its own `uncertainty` score; smaller is more uncertain.	required
`n_to_label`	`int`	How many candidates to return.	required
`diverse_fraction`	`float`	Fraction of the batch reserved for diversity. `0.3` by default.	`0.3`

Returns:

Type	Description
`list[ProposedLabel]`	List of selected candidates in ranked order.

Source code in src/groundlens/propose.py

def rank_for_labelling(
    candidates: list[ProposedLabel],
    *,
    n_to_label: int,
    diverse_fraction: float = 0.3,
) -> list[ProposedLabel]:
    """Pick the ``n_to_label`` most useful candidates for a human reviewer.

    The default acquisition mixes two signals:

    - **Uncertainty (70% by default):** the
      ``max(1, round((1 - diverse_fraction) * n_to_label))`` candidates
      with the smallest distance to the threshold. These are the
      candidates the current model finds hardest to classify, so a
      label on them shifts ``mu_hat`` the most.
    - **Diversity (30% by default):** the remaining slots are filled
      from the leftover candidates, preferring strategies not
      represented in the uncertainty subset. This favours
      under-represented strategies but does not guarantee that every
      strategy appears in the batch.

    Args:
        candidates: Candidates to rank. Each carries its own
            ``uncertainty`` score; smaller is more uncertain.
        n_to_label: How many candidates to return.
        diverse_fraction: Fraction of the batch reserved for diversity.
            ``0.3`` by default.

    Returns:
        List of selected candidates in ranked order.
    """
    if n_to_label <= 0:
        return []
    if not candidates:
        return []

    n_uncertain = max(1, round((1.0 - diverse_fraction) * n_to_label))
    n_diverse = max(0, n_to_label - n_uncertain)

    # 1) Uncertainty top-n.
    by_uncertainty = sorted(candidates, key=lambda c: c.uncertainty)
    uncertain_pick = by_uncertainty[:n_uncertain]

    # 2) Diversity: among the remaining candidates, prefer strategies
    #    NOT represented in `uncertain_pick`.
    used_strategies = {c.strategy for c in uncertain_pick}
    rest = list(by_uncertainty[n_uncertain:])
    rest.sort(
        key=lambda c: (
            0 if c.strategy not in used_strategies else 1,
            c.uncertainty,
        )
    )
    diverse_pick = rest[:n_diverse]

    return [*uncertain_pick, *diverse_pick]
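
A small worked run shows how the diversity step can skip a low-uncertainty candidate. This is a sketch under stated assumptions: `ProposedLabel` is a stand-in with only the two fields the ranking reads, the function body is a condensed copy of the listing above, and the strategy names `"omission"` and `"negation"` are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProposedLabel:
    """Stand-in with only the two fields the ranking reads."""

    strategy: str
    uncertainty: float


def rank_for_labelling(candidates, *, n_to_label, diverse_fraction=0.3):
    """Same logic as the source listing, condensed for the example."""
    if n_to_label <= 0 or not candidates:
        return []
    n_uncertain = max(1, round((1.0 - diverse_fraction) * n_to_label))
    n_diverse = max(0, n_to_label - n_uncertain)
    by_uncertainty = sorted(candidates, key=lambda c: c.uncertainty)
    uncertain_pick = by_uncertainty[:n_uncertain]
    used = {c.strategy for c in uncertain_pick}
    # False sorts before True, so unseen strategies come first.
    rest = sorted(
        by_uncertainty[n_uncertain:],
        key=lambda c: (c.strategy in used, c.uncertainty),
    )
    return [*uncertain_pick, *rest[:n_diverse]]


candidates = [
    ProposedLabel("redefinition", 0.01),
    ProposedLabel("redefinition", 0.02),
    ProposedLabel("redefinition", 0.03),
    ProposedLabel("omission", 0.40),
    ProposedLabel("negation", 0.50),
]

# With diverse_fraction=0.5 and four slots: two uncertainty, two diversity.
picked = rank_for_labelling(candidates, n_to_label=4, diverse_fraction=0.5)
picked_pairs = [(c.strategy, c.uncertainty) for c in picked]
```

The third `"redefinition"` candidate (uncertainty `0.03`) is passed over in favour of two far less uncertain candidates, because its strategy is already represented in the uncertainty pick. Note also that the returned order is the uncertainty pick followed by the diversity pick, not a single global sort.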

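`build_review_template(items: list[ProposedLabel]) -> str`

Render the Markdown review template for a batch of proposed labels. Items are numbered from 1; each context excerpt is truncated to 500 characters and each candidate response to 1000 characters.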

Source code in src/groundlens/propose.py

def build_review_template(items: list[ProposedLabel]) -> str:
    """Render the Markdown template for a batch of proposed labels."""
    body_parts = []
    for idx, it in enumerate(items, start=1):
        body_parts.append(
            _ITEM_TEMPLATE.format(
                idx=idx,
                total=len(items),
                strategy=it.strategy,
                context_excerpt=it.context_excerpt[:500],
                question=it.question,
                candidate_response=it.candidate_response[:1000],
                dgi_score=it.dgi_score,
                uncertainty=it.uncertainty,
            )
        )
    return _REVIEW_TEMPLATE.format(n=len(items), items="\n".join(body_parts))
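
The listing relies on two module-level templates, `_ITEM_TEMPLATE` and `_REVIEW_TEMPLATE`, whose text is not shown on this page. The sketch below substitutes hypothetical templates that use the same placeholder names, purely to illustrate the rendering and truncation; the wording and layout of the library's real templates may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProposedLabel:
    question: str
    candidate_response: str
    dgi_score: float
    strategy: str
    context_excerpt: str
    uncertainty: float


# HYPOTHETICAL templates: only the placeholder names come from the source.
_ITEM_TEMPLATE = (
    "## Item {idx}/{total} ({strategy})\n"
    "Context: {context_excerpt}\n"
    "Question: {question}\n"
    "Candidate: {candidate_response}\n"
    "DGI score: {dgi_score:.3f} | uncertainty: {uncertainty:.3f}\n"
    "Label: \n"
)
_REVIEW_TEMPLATE = "# Review batch ({n} items)\n\n{items}"


def build_review_template(items: list[ProposedLabel]) -> str:
    """Render the Markdown template for a batch of proposed labels."""
    body_parts = []
    for idx, it in enumerate(items, start=1):
        body_parts.append(
            _ITEM_TEMPLATE.format(
                idx=idx,
                total=len(items),
                strategy=it.strategy,
                context_excerpt=it.context_excerpt[:500],  # cap at 500 chars
                question=it.question,
                candidate_response=it.candidate_response[:1000],  # cap at 1000
                dgi_score=it.dgi_score,
                uncertainty=it.uncertainty,
            )
        )
    return _REVIEW_TEMPLATE.format(n=len(items), items="\n".join(body_parts))


label = ProposedLabel(
    question="How long do refunds take?",
    candidate_response="Refunds take 30 days.",
    dgi_score=0.48,
    strategy="redefinition",
    context_excerpt="x" * 600,  # deliberately longer than the 500-char cap
    uncertainty=0.02,
)
sheet = build_review_template([label])
```

The truncation is a plain slice, so an over-long excerpt is cut without an ellipsis or any other marker; a reviewer cannot tell from the sheet alone that text was dropped.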
