Propose
groundlens.propose
Active-learning helper for bootstrapping a DGI calibration dataset.
A new deployment needs a verified-grounded corpus before `groundlens.compute_dgi` can produce meaningful scores. Curating that corpus from scratch is the practical bottleneck most teams hit first. This module implements the propose half of an active-learning loop: given a list of self-contained `SeedExample(context, question, grounded)` triples and a text-generation callable, it produces a ranked batch of candidate `(question, response)` pairs for a human reviewer to label.
The loop is intentionally non-circular: the DGI score only orders the candidates by uncertainty, while the label itself is supplied by the human reviewer. Calibrating on the labelled batch then sharpens the same DGI that proposes the next batch.
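The propose step described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: `Seed`, `propose_sketch`, the prompt wording, the `score` callable and the `threshold` argument are all hypothetical stand-ins introduced here, and the sketch generates only one candidate per seed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Seed:
    """Hypothetical stand-in for SeedExample (not the library class)."""
    context: str
    question: str
    grounded: str


def propose_sketch(
    seeds: list[Seed],
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    threshold: float,
    n_to_label: int,
) -> list[tuple[float, str, str]]:
    """Return (uncertainty, question, response) tuples, most uncertain first."""
    candidates = []
    for seed in seeds:
        # Assumed prompt wording; the real confabulation prompt is not shown in the docs.
        prompt = (
            f"Context: {seed.context}\n"
            f"Question: {seed.question}\n"
            f"Grounded answer: {seed.grounded}\n"
            "Write a plausible answer that the context does not support."
        )
        response = generate(prompt)
        # Uncertainty is taken here as the distance of the score to the threshold.
        uncertainty = abs(score(seed.question, response) - threshold)
        candidates.append((uncertainty, seed.question, response))
    # The score only orders the candidates; a human supplies the label afterwards.
    candidates.sort(key=lambda c: c[0])
    return candidates[:n_to_label]
```

The sketch stops before labelling on purpose: nothing in it assigns a grounded or confabulated label, which is the property that keeps the loop non-circular.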
Public types
- `SeedExample`: one verified-grounded triple `(context, question, grounded)` you supply as input.
- `ProposedLabel`: one candidate ready for review.
- `PropositionBatch`: the batch returned by `groundlens.DGI.propose_labels`.
All three are exposed at the top level of the package.
References:
Marin, J. (2026). A Methodology for Building Human-Confabulated Hallucination Benchmarks. groundlens-dev/grounding-benchmark. CC BY 4.0.
Classes
SeedExample(context: str, question: str, grounded: str)
dataclass
One verified-grounded triple you supply to DGI.propose_labels.
A `SeedExample` binds a FAQ paragraph (`context`) to a question
that paragraph answers (`question`) and the verified-grounded
response to that question (`grounded`). Bundling the three
together keeps candidate generation coherent: the confabulation
prompt receives a context, question and grounded answer that
belong together rather than randomly paired pieces.
Attributes:
| Name | Type | Description |
|---|---|---|
| `context` | `str` | A paragraph from the deployment's FAQ corpus that supports the grounded response. |
| `question` | `str` | A question whose answer is contained in `context`. |
| `grounded` | `str` | The verified-grounded response to `question`. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If any field is empty or whitespace-only. |
Methods:
__post_init__() -> None
Validate that every field is a non-empty, non-whitespace string.
Source code in src/groundlens/propose.py
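The validation contract documented above (every field a non-empty, non-whitespace string, otherwise `ValueError`) could look roughly like the following. This is a sketch under assumptions, not the code in `src/groundlens/propose.py`: the class name `SeedExampleSketch` and the error message are hypothetical, and treating non-string values as invalid is an added assumption.

```python
from dataclasses import dataclass, fields


@dataclass
class SeedExampleSketch:
    """Illustrative stand-in mirroring the documented SeedExample fields."""
    context: str
    question: str
    grounded: str

    def __post_init__(self) -> None:
        # Check every declared field, so a new field is validated automatically.
        for f in fields(self):
            value = getattr(self, f.name)
            # Assumption: non-string values are rejected the same way as blank ones.
            if not isinstance(value, str) or not value.strip():
                raise ValueError(
                    f"{f.name} must be a non-empty, non-whitespace string"
                )
```

Validating in `__post_init__` means a malformed seed fails at construction time, before any generation call is spent on it.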
ProposedLabel(question: str, candidate_response: str, dgi_score: float, strategy: str, context_excerpt: str, uncertainty: float)
dataclass
One candidate (question, response) pair ready for human review.
Attributes:
| Name | Type | Description |
|---|---|---|
| `question` | `str` | A question grounded in one of the FAQ-corpus entries. |
| `candidate_response` | `str` | A confabulated response written by the generation LLM under the named `strategy`. |
| `dgi_score` | `float` | The DGI normalized score of the candidate against the current … |
| `strategy` | `str` | The name of the confabulation strategy that produced this candidate. |
| `context_excerpt` | `str` | The FAQ excerpt the question was anchored to. |
| `uncertainty` | `float` | Distance of `dgi_score` from the threshold. |
PropositionBatch(items: tuple[ProposedLabel, ...], review_template: str, all_candidates: tuple[ProposedLabel, ...] = tuple(), strategies_used: tuple[str, ...] = tuple())
dataclass
A batch of candidates returned by `groundlens.DGI.propose_labels`.
Attributes:
| Name | Type | Description |
|---|---|---|
| `items` | `tuple[ProposedLabel, ...]` | Candidates ordered by acquisition score (most useful to label first). Length up to `n_to_label`. |
| `review_template` | `str` | A Markdown template instructing the human reviewer how to label the items in the batch. |
| `all_candidates` | `tuple[ProposedLabel, ...]` | Every candidate generated in the round, ordered by acquisition score. Useful for audit and debugging. |
| `strategies_used` | `tuple[str, ...]` | The tuple of strategy names actually used. |
Functions:
rank_for_labelling(candidates: list[ProposedLabel], *, n_to_label: int, diverse_fraction: float = 0.3) -> list[ProposedLabel]
Pick the `n_to_label` most useful candidates for a human reviewer.
The default acquisition mixes two signals:
- Uncertainty (70% at the default `diverse_fraction`): the `ceil((1 - diverse_fraction) * n_to_label)` candidates with the smallest distance to the threshold. These are the candidates the current model finds hardest to classify, so a label on them shifts `mu_hat` the most.
- Diversity (30% at the default `diverse_fraction`): the remaining slots are filled with candidates from strategies under-represented in the uncertainty subset, so that all strategies surface in the batch.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `candidates` | `list[ProposedLabel]` | Candidates to rank. Each carries its own … | required |
| `n_to_label` | `int` | How many candidates to return. | required |
| `diverse_fraction` | `float` | Fraction of the batch reserved for diversity. | `0.3` |
Returns:
| Type | Description |
|---|---|
| `list[ProposedLabel]` | List of selected candidates in ranked order. |
Source code in src/groundlens/propose.py
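The two-signal acquisition rule documented for `rank_for_labelling` can be illustrated with a self-contained sketch. It is one plausible reading of the description, not the shipped code: `Candidate` and `rank_sketch` are hypothetical names, and the way diversity slots are filled (always taking from the strategy with the fewest picks so far, ties broken by uncertainty) is an assumption, since the documentation only says under-represented strategies are preferred.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Stand-in carrying only the fields this ranking sketch needs."""
    question: str
    strategy: str
    uncertainty: float


def rank_sketch(
    candidates: list[Candidate],
    *,
    n_to_label: int,
    diverse_fraction: float = 0.3,
) -> list[Candidate]:
    # Smallest distance to the threshold first.
    by_uncertainty = sorted(candidates, key=lambda c: c.uncertainty)
    n_uncertain = min(
        len(by_uncertainty),
        math.ceil((1 - diverse_fraction) * n_to_label),
    )
    selected = by_uncertainty[:n_uncertain]
    remaining = by_uncertainty[n_uncertain:]
    counts = Counter(c.strategy for c in selected)
    # Fill the remaining slots from the least-represented strategies (assumed rule).
    while remaining and len(selected) < n_to_label:
        pick = min(remaining, key=lambda c: (counts[c.strategy], c.uncertainty))
        remaining.remove(pick)
        selected.append(pick)
        counts[pick.strategy] += 1
    return selected
```

With five candidates where one strategy dominates the low-uncertainty end, `n_to_label=4` and `diverse_fraction=0.5` keep the two most uncertain candidates and then pull one candidate from each of the two strategies that the uncertainty subset missed.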
build_review_template(items: list[ProposedLabel]) -> str
Render the Markdown template for a batch of proposed labels.
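The layout of the real template is not shown in this reference, so the following is only a guess at the general shape such a renderer might take. `review_template_sketch`, the heading text and the `grounded` / `confabulated` label vocabulary are all assumptions made for illustration, and the sketch takes plain `(question, response)` tuples rather than `ProposedLabel` objects.

```python
def review_template_sketch(items: list[tuple[str, str]]) -> str:
    """Render (question, response) pairs as a Markdown checklist for a reviewer."""
    lines = [
        "# Review batch",
        "",
        # Assumed label vocabulary; the library's wording may differ.
        "Mark each response as `grounded` or `confabulated`.",
        "",
    ]
    for index, (question, response) in enumerate(items, start=1):
        lines += [
            f"## Item {index}",
            "",
            f"**Question:** {question}",
            "",
            f"**Response:** {response}",
            "",
            "**Label:** ",  # left blank for the human reviewer
            "",
        ]
    return "\n".join(lines)
```

Keeping the label field empty in the rendered text reflects the documented design: the model proposes and orders, and only the reviewer fills in the label.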