API Reference

This page provides the complete API reference for groundlens. All public classes and functions are documented with their signatures, parameters, return types, and examples.

For auto-generated documentation from source docstrings, ensure mkdocstrings is configured in your MkDocs build.

Core Functions

compute_sgi

groundlens.sgi.compute_sgi(question: str, context: str, response: str, *, model: str = DEFAULT_MODEL) -> SGIResult

Compute the Semantic Grounding Index for a response.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| question | str | The input query. | required |
| context | str | Source document, retrieved chunks, or reference text. | required |
| response | str | The LLM output to evaluate. | required |
| model | str | Sentence transformer model name. Default all-MiniLM-L6-v2. | DEFAULT_MODEL |

Returns:

| Type | Description |
| --- | --- |
| SGIResult | SGIResult with raw score, normalized score, and flag status. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If any input string is empty. |

Example

>>> from groundlens import compute_sgi
>>> result = compute_sgi(
...     question="What is the capital of France?",
...     context="France is in Western Europe. Its capital is Paris.",
...     response="The capital of France is Paris.",
... )
>>> result.flagged
False

Source code in src/groundlens/sgi.py
def compute_sgi(
    question: str,
    context: str,
    response: str,
    *,
    model: str = DEFAULT_MODEL,
) -> SGIResult:
    """Compute the Semantic Grounding Index for a response.

    Args:
        question: The input query.
        context: Source document, retrieved chunks, or reference text.
        response: The LLM output to evaluate.
        model: Sentence transformer model name. Default ``all-MiniLM-L6-v2``.

    Returns:
        SGIResult with raw score, normalized score, and flag status.

    Raises:
        ValueError: If any input string is empty.

    Example:
        >>> from groundlens import compute_sgi
        >>> result = compute_sgi(
        ...     question="What is the capital of France?",
        ...     context="France is in Western Europe. Its capital is Paris.",
        ...     response="The capital of France is Paris.",
        ... )
        >>> result.flagged
        False
    """
    if not question.strip():
        msg = "question must be a non-empty string."
        raise ValueError(msg)
    if not context.strip():
        msg = "context must be a non-empty string."
        raise ValueError(msg)
    if not response.strip():
        msg = "response must be a non-empty string."
        raise ValueError(msg)

    embeddings = encode_texts([question, context, response], model_name=model)
    q_emb, ctx_emb, resp_emb = embeddings[0], embeddings[1], embeddings[2]

    q_dist = euclidean_distance(resp_emb, q_emb)
    ctx_dist = euclidean_distance(resp_emb, ctx_emb)

    # Degenerate case: response identical to context.
    if ctx_dist < 1e-8:
        return SGIResult(
            value=10.0,
            normalized=1.0,
            flagged=False,
            q_dist=round(q_dist, 4),
            ctx_dist=round(ctx_dist, 4),
        )

    # Degenerate case: response identical to question.
    if q_dist < 1e-8:
        return SGIResult(
            value=0.0,
            normalized=0.0,
            flagged=True,
            q_dist=round(q_dist, 4),
            ctx_dist=round(ctx_dist, 4),
        )

    raw = q_dist / ctx_dist
    normalized = normalize_sgi(raw)

    return SGIResult(
        value=round(raw, 4),
        normalized=round(normalized, 4),
        flagged=raw < SGI_REVIEW,
        q_dist=round(q_dist, 4),
        ctx_dist=round(ctx_dist, 4),
    )

compute_dgi

groundlens.dgi.compute_dgi(question: str, response: str, *, model: str = DEFAULT_MODEL, reference_csv: str | None = None) -> DGIResult

Compute the Directional Grounding Index for a response.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| question | str | The input query. | required |
| response | str | The LLM output to evaluate. | required |
| model | str | Sentence transformer model name. | DEFAULT_MODEL |
| reference_csv | str \| None | Path to domain-specific calibration CSV. If None, uses the bundled dataset. | None |

Returns:

| Type | Description |
| --- | --- |
| DGIResult | DGIResult with raw score, normalized score, and flag status. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If question or response is empty. |

Example

>>> from groundlens import compute_dgi
>>> result = compute_dgi(
...     question="What causes seasons on Earth?",
...     response="Seasons are caused by Earth's 23.5-degree axial tilt.",
... )
>>> result.flagged
False

Source code in src/groundlens/dgi.py
def compute_dgi(
    question: str,
    response: str,
    *,
    model: str = DEFAULT_MODEL,
    reference_csv: str | None = None,
) -> DGIResult:
    """Compute the Directional Grounding Index for a response.

    Args:
        question: The input query.
        response: The LLM output to evaluate.
        model: Sentence transformer model name.
        reference_csv: Path to domain-specific calibration CSV.
            If ``None``, uses the bundled dataset.

    Returns:
        DGIResult with raw score, normalized score, and flag status.

    Raises:
        ValueError: If question or response is empty.

    Example:
        >>> from groundlens import compute_dgi
        >>> result = compute_dgi(
        ...     question="What causes seasons on Earth?",
        ...     response="Seasons are caused by Earth's 23.5-degree axial tilt.",
        ... )
        >>> result.flagged
        False
    """
    if not question.strip():
        msg = "question must be a non-empty string."
        raise ValueError(msg)
    if not response.strip():
        msg = "response must be a non-empty string."
        raise ValueError(msg)

    mu_hat = _get_mu_hat(model, reference_csv)
    embeddings = encode_texts([question, response], model_name=model)
    q_emb, r_emb = embeddings[0], embeddings[1]

    delta = displacement_vector(q_emb, r_emb)
    magnitude = float(np.linalg.norm(delta))

    # Degenerate case: response identical to question.
    if magnitude < 1e-8:
        return DGIResult(value=0.0, normalized=0.0, flagged=True)

    delta_hat = delta / magnitude
    gamma = float(np.dot(delta_hat, mu_hat))

    if math.isnan(gamma):
        logger.warning("DGI produced NaN — check embedding dimensions.")
        return DGIResult(value=0.0, normalized=0.0, flagged=True)

    normalized = round(normalize_dgi(gamma), 4)

    return DGIResult(
        value=round(gamma, 4),
        normalized=normalized,
        flagged=gamma < DGI_PASS,
    )

evaluate

groundlens.evaluate.evaluate(question: str, response: str, context: str | None = None, *, model: str = DEFAULT_MODEL, reference_csv: str | None = None) -> GroundlensScore

Evaluate a single LLM response for hallucination risk.

Auto-selects the scoring method:
  • SGI when context is provided (grounded verification).
  • DGI when context is None (context-free verification).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| question | str | The input query. | required |
| response | str | The LLM output to evaluate. | required |
| context | str \| None | Source document or retrieved text. If provided, SGI is used. | None |
| model | str | Sentence transformer model name. | DEFAULT_MODEL |
| reference_csv | str \| None | DGI calibration CSV path (only used when context is None). | None |

Returns:

| Type | Description |
| --- | --- |
| GroundlensScore | GroundlensScore with method, value, flag, and explanation. |

Example

>>> from groundlens import evaluate
>>> # With context → SGI
>>> score = evaluate("Q?", "A.", context="Source text.")
>>> score.method
'sgi'
>>> # Without context → DGI
>>> score = evaluate("Q?", "A.")
>>> score.method
'dgi'

Source code in src/groundlens/evaluate.py
def evaluate(
    question: str,
    response: str,
    context: str | None = None,
    *,
    model: str = DEFAULT_MODEL,
    reference_csv: str | None = None,
) -> GroundlensScore:
    """Evaluate a single LLM response for hallucination risk.

    Auto-selects scoring method:
        - **SGI** when ``context`` is provided (grounded verification).
        - **DGI** when ``context`` is ``None`` (context-free verification).

    Args:
        question: The input query.
        response: The LLM output to evaluate.
        context: Source document or retrieved text. If provided, SGI is used.
        model: Sentence transformer model name.
        reference_csv: DGI calibration CSV path (only used when context is None).

    Returns:
        GroundlensScore with method, value, flag, and explanation.

    Example:
        >>> from groundlens import evaluate
        >>> # With context → SGI
        >>> score = evaluate("Q?", "A.", context="Source text.")
        >>> score.method
        'sgi'
        >>> # Without context → DGI
        >>> score = evaluate("Q?", "A.")
        >>> score.method
        'dgi'
    """
    result: SGIResult | DGIResult
    if context is not None and context.strip():
        result = compute_sgi(
            question=question,
            context=context,
            response=response,
            model=model,
        )
    else:
        result = compute_dgi(
            question=question,
            response=response,
            model=model,
            reference_csv=reference_csv,
        )

    return GroundlensScore(
        value=result.value,
        normalized=result.normalized,
        flagged=result.flagged,
        method=result.method,
        explanation=result.explanation,
        detail=result,
    )

evaluate_batch

groundlens.evaluate.evaluate_batch(items: list[dict[str, str]], *, model: str = DEFAULT_MODEL, reference_csv: str | None = None) -> list[GroundlensScore]

Evaluate a batch of LLM responses.

Each item in the list is a dict with keys:
  • question (required)
  • response (required)
  • context (optional — triggers SGI when present)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| items | list[dict[str, str]] | List of dicts, each containing question, response, and optionally context. | required |
| model | str | Sentence transformer model name. | DEFAULT_MODEL |
| reference_csv | str \| None | DGI calibration CSV path. | None |

Returns:

| Type | Description |
| --- | --- |
| list[GroundlensScore] | List of GroundlensScore results, one per input item. |

Raises:

| Type | Description |
| --- | --- |
| KeyError | If any item is missing question or response. |

Example

>>> from groundlens import evaluate_batch
>>> items = [
...     {"question": "Q1?", "response": "A1.", "context": "C1."},
...     {"question": "Q2?", "response": "A2."},
... ]
>>> results = evaluate_batch(items)
>>> len(results)
2

Source code in src/groundlens/evaluate.py
def evaluate_batch(
    items: list[dict[str, str]],
    *,
    model: str = DEFAULT_MODEL,
    reference_csv: str | None = None,
) -> list[GroundlensScore]:
    """Evaluate a batch of LLM responses.

    Each item in the list is a dict with keys:
        - ``question`` (required)
        - ``response`` (required)
        - ``context`` (optional — triggers SGI when present)

    Args:
        items: List of dicts, each containing question, response, and
            optionally context.
        model: Sentence transformer model name.
        reference_csv: DGI calibration CSV path.

    Returns:
        List of GroundlensScore results, one per input item.

    Raises:
        KeyError: If any item is missing ``question`` or ``response``.

    Example:
        >>> from groundlens import evaluate_batch
        >>> items = [
        ...     {"question": "Q1?", "response": "A1.", "context": "C1."},
        ...     {"question": "Q2?", "response": "A2."},
        ... ]
        >>> results = evaluate_batch(items)
        >>> len(results)
        2
    """
    results: list[GroundlensScore] = []

    for i, item in enumerate(items):
        if "question" not in item:
            msg = f"Item {i} missing required key 'question'."
            raise KeyError(msg)
        if "response" not in item:
            msg = f"Item {i} missing required key 'response'."
            raise KeyError(msg)

        score = evaluate(
            question=item["question"],
            response=item["response"],
            context=item.get("context"),
            model=model,
            reference_csv=reference_csv,
        )
        results.append(score)

    logger.info(
        "Evaluated %d items (%d flagged).", len(results), sum(1 for r in results if r.flagged)
    )

    return results
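
A minimal triage sketch built on evaluate_batch; the print-based routing here is illustrative, not part of the library:

from groundlens import evaluate_batch

items = [
    {"question": "Q1?", "response": "A1.", "context": "C1."},
    {"question": "Q2?", "response": "A2."},
]

# Pair each score with its originating item and route flagged
# responses to human review.
for item, score in zip(items, evaluate_batch(items)):
    if score.flagged:
        print(f"REVIEW [{score.method}] {item['question']}: {score.explanation}")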

calibrate

groundlens.calibrate.calibrate(pairs: list[tuple[str, str]] | None = None, csv_path: str | None = None, *, model: str = DEFAULT_MODEL, metadata: dict[str, str] | None = None) -> CalibrationResult

Compute a DGI reference direction from calibration data.

Provide either pairs directly or a csv_path to a file with verified grounded (question, response) pairs.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pairs | list[tuple[str, str]] \| None | List of (question, response) tuples. | None |
| csv_path | str \| None | Path to a CSV file with question and response columns. | None |
| model | str | Sentence transformer model to use for embedding. | DEFAULT_MODEL |
| metadata | dict[str, str] \| None | Optional metadata to attach (domain name, date, notes). | None |

Returns:

| Type | Description |
| --- | --- |
| CalibrationResult | CalibrationResult with computed reference direction and statistics. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If neither pairs nor csv_path is provided, or if the data contains fewer than 5 pairs. |

Example

>>> result = calibrate(pairs=[("Q?", "A.") for _ in range(20)])
>>> result.n_pairs
20

Source code in src/groundlens/calibrate.py
def calibrate(
    pairs: list[tuple[str, str]] | None = None,
    csv_path: str | None = None,
    *,
    model: str = DEFAULT_MODEL,
    metadata: dict[str, str] | None = None,
) -> CalibrationResult:
    """Compute a DGI reference direction from calibration data.

    Provide either ``pairs`` directly or a ``csv_path`` to a file
    with verified grounded (question, response) pairs.

    Args:
        pairs: List of (question, response) tuples.
        csv_path: Path to a CSV file with ``question`` and ``response`` columns.
        model: Sentence transformer model to use for embedding.
        metadata: Optional metadata to attach (domain name, date, notes).

    Returns:
        CalibrationResult with computed reference direction and statistics.

    Raises:
        ValueError: If neither ``pairs`` nor ``csv_path`` is provided,
            or if the data contains fewer than 5 pairs.

    Example:
        >>> result = calibrate(pairs=[("Q?", "A.") for _ in range(20)])
        >>> result.n_pairs
        20
    """
    if csv_path is not None:
        from groundlens._internal.csv_loader import load_reference_pairs

        pairs = load_reference_pairs(csv_path)
    elif pairs is None:
        msg = "Provide either 'pairs' or 'csv_path'."
        raise ValueError(msg)

    if len(pairs) < 5:
        msg = (
            f"Calibration requires at least 5 pairs, got {len(pairs)}. "
            "More pairs (20-100) produce better reference directions."
        )
        raise ValueError(msg)

    logger.info("Calibrating DGI with %d pairs using model %s.", len(pairs), model)

    mu_hat = _compute_reference_direction(pairs, model)

    # Estimate concentration parameter (kappa) from resultant length.
    # This is a rough estimate — the true MLE for von Mises-Fisher is
    # more complex, but the resultant length R-bar is a sufficient
    # indicator of calibration quality.
    from groundlens._internal.embeddings import encode_texts
    from groundlens._internal.geometry import displacement_vector, unit_normalize

    texts: list[str] = []
    for q, r in pairs:
        texts.extend([q, r])
    embeddings = encode_texts(texts, model_name=model)

    unit_displacements = []
    for i in range(len(pairs)):
        delta = displacement_vector(embeddings[i * 2], embeddings[i * 2 + 1])
        norm = float(np.linalg.norm(delta))
        if norm > 1e-8:
            unit_displacements.append(unit_normalize(delta))

    if unit_displacements:
        r_bar = float(np.linalg.norm(np.mean(np.stack(unit_displacements), axis=0)))
    else:
        r_bar = 0.0

    # Approximate kappa from R-bar (Sra, 2012).
    d = mu_hat.shape[0]
    kappa = r_bar * (d - r_bar**2) / (1 - r_bar**2) if r_bar < 0.99 else 100.0

    return CalibrationResult(
        model=model,
        n_pairs=len(pairs),
        embedding_dim=int(mu_hat.shape[0]),
        mu_hat=mu_hat,
        concentration=round(kappa, 2),
        metadata=metadata or {},
    )
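
The csv_path loader expects the documented question and response columns. A minimal sketch of producing such a file with the standard library; how the internal loader treats extra columns or unusual delimiters is not documented here, so keep to this shape:

import csv

from groundlens.calibrate import calibrate

# Placeholder verified pairs; real calibration data should contain
# 20-100 domain-specific (question, response) pairs.
rows = [(f"Question {i}?", f"Grounded answer {i}.") for i in range(20)]

with open("my_domain_pairs.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["question", "response"])  # documented column names
    writer.writerows(rows)

result = calibrate(csv_path="my_domain_pairs.csv")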

Core Classes

SGI

groundlens.sgi.SGI(model: str = DEFAULT_MODEL)

Reusable SGI scorer with a pre-configured embedding model.

Use this class when evaluating multiple responses with the same model to avoid repeating the model parameter.

Example

>>> sgi = SGI(model="all-MiniLM-L6-v2")
>>> result = sgi.score(
...     question="What is X?",
...     context="X is Y.",
...     response="X is Y.",
... )
>>> result.flagged
False

Initialize SGI scorer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | str | Sentence transformer model name or path. | DEFAULT_MODEL |

Source code in src/groundlens/sgi.py
def __init__(self, model: str = DEFAULT_MODEL) -> None:
    """Initialize SGI scorer.

    Args:
        model: Sentence transformer model name or path.
    """
    self.model = model

Functions

score(question: str, context: str, response: str) -> SGIResult

Compute SGI for a single response.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| question | str | The input query. | required |
| context | str | Source document or reference text. | required |
| response | str | The LLM output to evaluate. | required |

Returns:

| Type | Description |
| --- | --- |
| SGIResult | SGIResult with score and flag status. |

Source code in src/groundlens/sgi.py
def score(
    self,
    question: str,
    context: str,
    response: str,
) -> SGIResult:
    """Compute SGI for a single response.

    Args:
        question: The input query.
        context: Source document or reference text.
        response: The LLM output to evaluate.

    Returns:
        SGIResult with score and flag status.
    """
    return compute_sgi(
        question=question,
        context=context,
        response=response,
        model=self.model,
    )

DGI

groundlens.dgi.DGI(model: str = DEFAULT_MODEL, reference_csv: str | None = None)

Reusable DGI scorer with pre-configured model and calibration.

Use this class when evaluating multiple responses against the same reference direction. Supports both bundled and custom calibration.

Example

>>> dgi = DGI()
>>> result = dgi.score(
...     question="What is ML?",
...     response="ML is a branch of AI.",
... )
>>> result.flagged
False

>>> dgi = DGI(reference_csv="my_domain_pairs.csv")
>>> result = dgi.score(question="...", response="...")

Initialize DGI scorer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | str | Sentence transformer model name. | DEFAULT_MODEL |
| reference_csv | str \| None | Path to domain-specific calibration CSV. | None |

Source code in src/groundlens/dgi.py
def __init__(
    self,
    model: str = DEFAULT_MODEL,
    reference_csv: str | None = None,
) -> None:
    """Initialize DGI scorer.

    Args:
        model: Sentence transformer model name.
        reference_csv: Path to domain-specific calibration CSV.
    """
    self.model = model
    self.reference_csv = reference_csv

Functions

calibrate(pairs: list[tuple[str, str]] | None = None, csv_path: str | None = None) -> None

Set custom calibration data.

Either provide pairs directly or a path to a CSV file. This replaces any previously cached reference direction.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pairs | list[tuple[str, str]] \| None | List of verified (question, response) tuples. | None |
| csv_path | str \| None | Path to a calibration CSV file. | None |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If neither pairs nor csv_path is provided. |

Source code in src/groundlens/dgi.py
def calibrate(
    self,
    pairs: list[tuple[str, str]] | None = None,
    csv_path: str | None = None,
) -> None:
    """Set custom calibration data.

    Either provide pairs directly or a path to a CSV file.
    This replaces any previously cached reference direction.

    Args:
        pairs: List of verified (question, response) tuples.
        csv_path: Path to a calibration CSV file.

    Raises:
        ValueError: If neither ``pairs`` nor ``csv_path`` is provided.
    """
    if csv_path is not None:
        self.reference_csv = csv_path
        # Force recomputation on next score() call.
        cache_key = (self.model, csv_path)
        _mu_hat_cache.pop(cache_key, None)
        return

    if pairs is not None:
        # Compute and cache the reference direction directly.
        mu = _compute_reference_direction(pairs, self.model)
        cache_key = (self.model, "__inline__")
        _mu_hat_cache[cache_key] = mu
        self.reference_csv = "__inline__"
        return

    msg = "Provide either 'pairs' or 'csv_path' for calibration."
    raise ValueError(msg)
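
A short usage sketch of inline calibration followed by scoring; the pair values are placeholders:

dgi = DGI()
dgi.calibrate(pairs=[(f"Question {i}?", f"Grounded answer {i}.") for i in range(20)])
result = dgi.score(question="What is ML?", response="ML is a branch of AI.")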

score(question: str, response: str) -> DGIResult

Compute DGI for a single response.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| question | str | The input query. | required |
| response | str | The LLM output to evaluate. | required |

Returns:

| Type | Description |
| --- | --- |
| DGIResult | DGIResult with score and flag status. |

Source code in src/groundlens/dgi.py
def score(self, question: str, response: str) -> DGIResult:
    """Compute DGI for a single response.

    Args:
        question: The input query.
        response: The LLM output to evaluate.

    Returns:
        DGIResult with score and flag status.
    """
    ref = self.reference_csv if self.reference_csv != "__inline__" else None
    if self.reference_csv == "__inline__":
        # Use the inline-calibrated mu_hat.
        cache_key = (self.model, "__inline__")
        if cache_key not in _mu_hat_cache:
            msg = "Call calibrate() before score() when using inline pairs."
            raise RuntimeError(msg)

    return compute_dgi(
        question=question,
        response=response,
        model=self.model,
        reference_csv=ref,
    )

Result Types

SGIResult

groundlens.score.SGIResult(value: float, normalized: float, flagged: bool, q_dist: float, ctx_dist: float, method: str = 'sgi', explanation: str = '') dataclass

Result of Semantic Grounding Index computation.

SGI measures whether a response engaged with the provided context or stayed anchored to the question. Higher values indicate stronger context engagement (grounded).

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| value | float | Raw SGI score = dist(response, question) / dist(response, context). |
| normalized | float | Score mapped to [0, 1] via tanh normalization. |
| flagged | bool | True if the score is below the review threshold. |
| q_dist | float | Euclidean distance from response to question embedding. |
| ctx_dist | float | Euclidean distance from response to context embedding. |
| method | str | Always "sgi". |
| explanation | str | Human-readable interpretation of the score. |
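
The raw value is exactly the ratio in the table above. A minimal sketch of the geometry, using toy vectors in place of real sentence embeddings:

import numpy as np

# Toy stand-ins for question, context, and response embeddings.
q = np.array([1.0, 0.0])
ctx = np.array([0.0, 1.0])
resp = np.array([0.1, 0.9])  # response sits near the context

q_dist = float(np.linalg.norm(resp - q))
ctx_dist = float(np.linalg.norm(resp - ctx))
raw_sgi = q_dist / ctx_dist  # > 1 here: stronger context engagement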

Functions

__post_init__() -> None

Generate explanation from score if not provided.

Source code in src/groundlens/score.py
def __post_init__(self) -> None:
    """Generate explanation from score if not provided."""
    if not self.explanation:
        if self.value >= SGI_STRONG_PASS:
            expl = f"SGI={self.value:.3f} — strong context engagement (pass)"
        elif self.value >= SGI_REVIEW:
            expl = f"SGI={self.value:.3f} — partial engagement (review recommended)"
        else:
            expl = f"SGI={self.value:.3f} — weak context engagement (flagged)"
        object.__setattr__(self, "explanation", expl)

DGIResult

groundlens.score.DGIResult(value: float, normalized: float, flagged: bool, method: str = 'dgi', explanation: str = '') dataclass

Result of Directional Grounding Index computation.

DGI measures whether the question-to-response displacement vector aligns with the mean displacement of verified grounded pairs. Higher values indicate alignment with grounded patterns.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| value | float | Raw DGI score = cosine similarity to reference direction. Range: [-1, 1]. |
| normalized | float | Score mapped to [0, 1] via linear normalization. |
| flagged | bool | True if the score is below the pass threshold. |
| method | str | Always "dgi". |
| explanation | str | Human-readable interpretation of the score. |
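
The raw value is the cosine alignment described above. A minimal sketch with toy vectors standing in for embeddings and for a calibrated reference direction:

import numpy as np

# Toy stand-ins: question/response embeddings and a unit reference
# direction mu_hat obtained from calibration.
q = np.array([0.0, 0.0, 1.0])
r = np.array([0.6, 0.0, 1.8])
mu_hat = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)

delta = r - q                            # question-to-response displacement
delta_hat = delta / np.linalg.norm(delta)
gamma = float(delta_hat @ mu_hat)        # raw DGI in [-1, 1]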

Functions

__post_init__() -> None

Generate explanation from score if not provided.

Source code in src/groundlens/score.py
def __post_init__(self) -> None:
    """Generate explanation from score if not provided."""
    if not self.explanation:
        if self.value >= DGI_PASS:
            expl = f"DGI={self.value:.3f} — aligns with grounded patterns (pass)"
        elif self.value >= 0.0:
            expl = f"DGI={self.value:.3f} — weak alignment (flagged)"
        else:
            expl = f"DGI={self.value:.3f} — opposes grounded direction (high risk)"
        object.__setattr__(self, "explanation", expl)

GroundlensScore

groundlens.score.GroundlensScore(value: float, normalized: float, flagged: bool, method: str, explanation: str, detail: SGIResult | DGIResult) dataclass

Unified score container returned by high-level evaluate() calls.

Wraps either an SGIResult or DGIResult with additional metadata.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| value | float | Raw score from the underlying method. |
| normalized | float | Score in [0, 1]. |
| flagged | bool | Whether human review is recommended. |
| method | str | "sgi" or "dgi". |
| explanation | str | Human-readable interpretation. |
| detail | SGIResult \| DGIResult | The full SGIResult or DGIResult for method-specific fields. |

CalibrationResult

groundlens.calibrate.CalibrationResult(model: str, n_pairs: int, embedding_dim: int, mu_hat: NDArray[np.float32], concentration: float, metadata: dict[str, str] = dict()) dataclass

Result of DGI calibration.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| model | str | Sentence transformer model used for calibration. |
| n_pairs | int | Number of (question, response) pairs used. |
| embedding_dim | int | Dimensionality of the embedding space. |
| mu_hat | NDArray[float32] | The computed reference direction vector. |
| concentration | float | Estimated concentration parameter (kappa) of the von Mises-Fisher distribution. Higher values indicate more consistent displacement directions in the reference data. |

Functions

save(path: str | Path) -> None

Save calibration result to JSON.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str \| Path | Output file path. The mu_hat vector is stored as a list. | required |

Source code in src/groundlens/calibrate.py
def save(self, path: str | Path) -> None:
    """Save calibration result to JSON.

    Args:
        path: Output file path. The mu_hat vector is stored as a list.
    """
    data = {
        "model": self.model,
        "n_pairs": self.n_pairs,
        "embedding_dim": self.embedding_dim,
        "mu_hat": self.mu_hat.tolist(),
        "concentration": self.concentration,
        "metadata": self.metadata,
    }
    Path(path).write_text(json.dumps(data, indent=2), encoding="utf-8")
    logger.info("Calibration saved to %s.", path)

load(path: str | Path) -> CalibrationResult classmethod

Load a saved calibration result.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str \| Path | Path to JSON calibration file. | required |

Returns:

| Type | Description |
| --- | --- |
| CalibrationResult | CalibrationResult instance with restored mu_hat vector. |

Source code in src/groundlens/calibrate.py
@classmethod
def load(cls, path: str | Path) -> CalibrationResult:
    """Load a saved calibration result.

    Args:
        path: Path to JSON calibration file.

    Returns:
        CalibrationResult instance with restored mu_hat vector.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return cls(
        model=data["model"],
        n_pairs=data["n_pairs"],
        embedding_dim=data["embedding_dim"],
        mu_hat=np.array(data["mu_hat"], dtype=np.float32),
        concentration=data["concentration"],
        metadata=data.get("metadata", {}),
    )
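
A save/load round-trip sketch; the file name is arbitrary:

from groundlens.calibrate import CalibrationResult, calibrate

result = calibrate(pairs=[(f"Question {i}?", f"Grounded answer {i}.") for i in range(20)])
result.save("calibration.json")

restored = CalibrationResult.load("calibration.json")
assert restored.n_pairs == result.n_pairs
assert restored.embedding_dim == result.embedding_dim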

Providers

GroundlensOpenAI

groundlens.providers.openai.GroundlensOpenAI(api_key: str, model: str = 'gpt-4o', groundlens_model: str = 'all-MiniLM-L6-v2', groundlens_threshold: float = 0.45)

OpenAI LLM provider with built-in groundlens scoring.

Wraps the OpenAI chat completions API and automatically evaluates each response for hallucination risk.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| api_key | str | OpenAI API key. | required |
| model | str | Chat model to use for generation. Defaults to "gpt-4o". | 'gpt-4o' |
| groundlens_model | str | Sentence-transformer model for groundlens scoring. Defaults to "all-MiniLM-L6-v2". | 'all-MiniLM-L6-v2' |
| groundlens_threshold | float | Score threshold override (reserved for future use). Defaults to 0.45. | 0.45 |

Example

>>> llm = GroundlensOpenAI(api_key="sk-...")
>>> resp = llm.chat("Summarize this document.", context="The document text.")
>>> print(resp.groundlens_score.explanation)

Source code in src/groundlens/providers/openai.py
def __init__(
    self,
    api_key: str,
    model: str = "gpt-4o",
    groundlens_model: str = "all-MiniLM-L6-v2",
    groundlens_threshold: float = 0.45,
) -> None:
    self._client = _get_openai_client(api_key)
    self._model = model
    self._groundlens_model = groundlens_model
    self._groundlens_threshold = groundlens_threshold

Functions

chat(prompt: str, context: str | None = None, **kwargs: Any) -> LLMResponse

Send a chat completion request and score the response.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | str | The user message content. | required |
| context | str \| None | Optional source document. When provided, SGI scoring is used; otherwise DGI scoring is applied. | None |
| **kwargs | Any | Additional keyword arguments forwarded to the OpenAI chat.completions.create call. | {} |

Returns:

| Type | Description |
| --- | --- |
| LLMResponse | LLMResponse containing the generated text, model identifier, usage metadata, and a groundlens hallucination score. |

Raises:

| Type | Description |
| --- | --- |
| OpenAIError | If the API call fails. |

Example

>>> llm = GroundlensOpenAI(api_key="sk-...")
>>> resp = llm.chat("What causes tides?")
>>> resp.text
'Tides are primarily caused by...'

Source code in src/groundlens/providers/openai.py
def chat(
    self,
    prompt: str,
    context: str | None = None,
    **kwargs: Any,
) -> LLMResponse:
    """Send a chat completion request and score the response.

    Args:
        prompt: The user message content.
        context: Optional source document. When provided, SGI scoring
            is used; otherwise DGI scoring is applied.
        **kwargs: Additional keyword arguments forwarded to the
            OpenAI ``chat.completions.create`` call.

    Returns:
        LLMResponse containing the generated text, model identifier,
        usage metadata, and a groundlens hallucination score.

    Raises:
        openai.OpenAIError: If the API call fails.

    Example:
        >>> llm = GroundlensOpenAI(api_key="sk-...")
        >>> resp = llm.chat("What causes tides?")
        >>> resp.text
        'Tides are primarily caused by...'
    """
    messages: list[dict[str, str]] = [{"role": "user", "content": prompt}]

    logger.debug("Calling OpenAI model=%s prompt_len=%d", self._model, len(prompt))

    completion = self._client.chat.completions.create(
        model=self._model,
        messages=messages,
        **kwargs,
    )

    choice = completion.choices[0]
    text = choice.message.content or ""

    usage: dict[str, Any] = {}
    if completion.usage is not None:
        usage = {
            "prompt_tokens": completion.usage.prompt_tokens,
            "completion_tokens": completion.usage.completion_tokens,
            "total_tokens": completion.usage.total_tokens,
        }

    score = evaluate(
        question=prompt,
        response=text,
        context=context,
        model=self._groundlens_model,
    )

    logger.info(
        "OpenAI response scored: method=%s value=%.3f flagged=%s",
        score.method,
        score.value,
        score.flagged,
    )

    return LLMResponse(
        text=text,
        model=self._model,
        usage=usage,
        groundlens_score=score,
    )

complete(prompt: str, context: str | None = None) -> LLMResponse

Generate a completion for the given prompt.

Convenience method that delegates to chat().

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | str | The user prompt or instruction. | required |
| context | str \| None | Optional source document for grounded evaluation. | None |

Returns:

| Type | Description |
| --- | --- |
| LLMResponse | LLMResponse with generated text and groundlens score. |

Source code in src/groundlens/providers/openai.py
def complete(
    self,
    prompt: str,
    context: str | None = None,
) -> LLMResponse:
    """Generate a completion for the given prompt.

    Convenience method that delegates to :meth:`chat`.

    Args:
        prompt: The user prompt or instruction.
        context: Optional source document for grounded evaluation.

    Returns:
        LLMResponse with generated text and groundlens score.
    """
    return self.chat(prompt, context=context)
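
A gating sketch built on the attached score; document_text, send_to_review, and publish are hypothetical placeholders, not part of the library:

llm = GroundlensOpenAI(api_key="sk-...")
resp = llm.chat("Summarize the report.", context=document_text)

# Route flagged output to a reviewer instead of publishing it directly.
if resp.groundlens_score.flagged:
    send_to_review(resp.text, resp.groundlens_score.explanation)
else:
    publish(resp.text)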

GroundlensAnthropic

groundlens.providers.anthropic.GroundlensAnthropic(api_key: str, model: str = 'claude-sonnet-4-20250514', groundlens_model: str = 'all-MiniLM-L6-v2', groundlens_threshold: float = 0.45)

Anthropic Claude provider with built-in groundlens scoring.

Wraps the Anthropic messages API and automatically evaluates each response for hallucination risk.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| api_key | str | Anthropic API key. | required |
| model | str | Claude model to use for generation. Defaults to "claude-sonnet-4-20250514". | 'claude-sonnet-4-20250514' |
| groundlens_model | str | Sentence-transformer model for groundlens scoring. Defaults to "all-MiniLM-L6-v2". | 'all-MiniLM-L6-v2' |
| groundlens_threshold | float | Score threshold override (reserved for future use). Defaults to 0.45. | 0.45 |

Example

>>> llm = GroundlensAnthropic(api_key="sk-ant-...")
>>> resp = llm.chat("Summarize this.", context="Source text here.")
>>> print(resp.groundlens_score.explanation)

Source code in src/groundlens/providers/anthropic.py
def __init__(
    self,
    api_key: str,
    model: str = "claude-sonnet-4-20250514",
    groundlens_model: str = "all-MiniLM-L6-v2",
    groundlens_threshold: float = 0.45,
) -> None:
    self._client = _get_anthropic_client(api_key)
    self._model = model
    self._groundlens_model = groundlens_model
    self._groundlens_threshold = groundlens_threshold

Functions

chat(prompt: str, context: str | None = None, **kwargs: Any) -> LLMResponse

Send a message to Claude and score the response.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | str | The user message content. | required |
| context | str \| None | Optional source document. When provided, SGI scoring is used; otherwise DGI scoring is applied. | None |
| **kwargs | Any | Additional keyword arguments forwarded to the Anthropic messages.create call. | {} |

Returns:

| Type | Description |
| --- | --- |
| LLMResponse | LLMResponse containing the generated text, model identifier, usage metadata, and a groundlens hallucination score. |

Raises:

| Type | Description |
| --- | --- |
| APIError | If the API call fails. |

Example

>>> llm = GroundlensAnthropic(api_key="sk-ant-...")
>>> resp = llm.chat("Explain photosynthesis.")
>>> resp.text
'Photosynthesis is the process by which...'

Source code in src/groundlens/providers/anthropic.py
def chat(
    self,
    prompt: str,
    context: str | None = None,
    **kwargs: Any,
) -> LLMResponse:
    """Send a message to Claude and score the response.

    Args:
        prompt: The user message content.
        context: Optional source document. When provided, SGI scoring
            is used; otherwise DGI scoring is applied.
        **kwargs: Additional keyword arguments forwarded to the
            Anthropic ``messages.create`` call.

    Returns:
        LLMResponse containing the generated text, model identifier,
        usage metadata, and a groundlens hallucination score.

    Raises:
        anthropic.APIError: If the API call fails.

    Example:
        >>> llm = GroundlensAnthropic(api_key="sk-ant-...")
        >>> resp = llm.chat("Explain photosynthesis.")
        >>> resp.text
        'Photosynthesis is the process by which...'
    """
    messages: list[dict[str, str]] = [{"role": "user", "content": prompt}]

    logger.debug("Calling Anthropic model=%s prompt_len=%d", self._model, len(prompt))

    max_tokens = kwargs.pop("max_tokens", 4096)

    message = self._client.messages.create(
        model=self._model,
        max_tokens=max_tokens,
        messages=messages,
        **kwargs,
    )

    text = ""
    for block in message.content:
        if hasattr(block, "text"):
            text += block.text

    usage: dict[str, Any] = {
        "input_tokens": message.usage.input_tokens,
        "output_tokens": message.usage.output_tokens,
    }

    score = evaluate(
        question=prompt,
        response=text,
        context=context,
        model=self._groundlens_model,
    )

    logger.info(
        "Anthropic response scored: method=%s value=%.3f flagged=%s",
        score.method,
        score.value,
        score.flagged,
    )

    return LLMResponse(
        text=text,
        model=self._model,
        usage=usage,
        groundlens_score=score,
    )

complete(prompt: str, context: str | None = None) -> LLMResponse

Generate a completion for the given prompt.

Convenience method that delegates to chat().

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | str | The user prompt or instruction. | required |
| context | str \| None | Optional source document for grounded evaluation. | None |

Returns:

| Type | Description |
| --- | --- |
| LLMResponse | LLMResponse with generated text and groundlens score. |

Source code in src/groundlens/providers/anthropic.py
def complete(
    self,
    prompt: str,
    context: str | None = None,
) -> LLMResponse:
    """Generate a completion for the given prompt.

    Convenience method that delegates to :meth:`chat`.

    Args:
        prompt: The user prompt or instruction.
        context: Optional source document for grounded evaluation.

    Returns:
        LLMResponse with generated text and groundlens score.
    """
    return self.chat(prompt, context=context)

GroundlensGemini

groundlens.providers.google.GroundlensGemini(api_key: str, model: str = 'gemini-2.0-flash', groundlens_model: str = 'all-MiniLM-L6-v2', groundlens_threshold: float = 0.45)

Google Gemini provider with built-in groundlens scoring.

Wraps the Google Generative AI SDK and automatically evaluates each response for hallucination risk.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| api_key | str | Google AI API key. | required |
| model | str | Gemini model to use for generation. Defaults to "gemini-2.0-flash". | 'gemini-2.0-flash' |
| groundlens_model | str | Sentence-transformer model for groundlens scoring. Defaults to "all-MiniLM-L6-v2". | 'all-MiniLM-L6-v2' |
| groundlens_threshold | float | Score threshold override (reserved for future use). Defaults to 0.45. | 0.45 |

Example

>>> llm = GroundlensGemini(api_key="AI...")
>>> resp = llm.chat("Summarize this.", context="Source text here.")
>>> print(resp.groundlens_score.explanation)

Source code in src/groundlens/providers/google.py
def __init__(
    self,
    api_key: str,
    model: str = "gemini-2.0-flash",
    groundlens_model: str = "all-MiniLM-L6-v2",
    groundlens_threshold: float = 0.45,
) -> None:
    self._genai = _configure_genai(api_key)
    self._model_name = model
    self._generative_model = self._genai.GenerativeModel(model)
    self._groundlens_model = groundlens_model
    self._groundlens_threshold = groundlens_threshold

Functions

chat(prompt: str, context: str | None = None, **kwargs: Any) -> LLMResponse

Send a prompt to Gemini and score the response.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | str | The user message content. | required |
| context | str \| None | Optional source document. When provided, SGI scoring is used; otherwise DGI scoring is applied. | None |
| **kwargs | Any | Additional keyword arguments forwarded to the Gemini generate_content call. | {} |

Returns:

| Type | Description |
| --- | --- |
| LLMResponse | LLMResponse containing the generated text, model identifier, usage metadata, and a groundlens hallucination score. |

Raises:

| Type | Description |
| --- | --- |
| GoogleAPIError | If the API call fails. |

Example

>>> llm = GroundlensGemini(api_key="AI...")
>>> resp = llm.chat("Explain gravity.")
>>> resp.text
'Gravity is a fundamental force...'

Source code in src/groundlens/providers/google.py
def chat(
    self,
    prompt: str,
    context: str | None = None,
    **kwargs: Any,
) -> LLMResponse:
    """Send a prompt to Gemini and score the response.

    Args:
        prompt: The user message content.
        context: Optional source document. When provided, SGI scoring
            is used; otherwise DGI scoring is applied.
        **kwargs: Additional keyword arguments forwarded to the
            Gemini ``generate_content`` call.

    Returns:
        LLMResponse containing the generated text, model identifier,
        usage metadata, and a groundlens hallucination score.

    Raises:
        google.api_core.exceptions.GoogleAPIError: If the API call fails.

    Example:
        >>> llm = GroundlensGemini(api_key="AI...")
        >>> resp = llm.chat("Explain gravity.")
        >>> resp.text
        'Gravity is a fundamental force...'
    """
    logger.debug("Calling Gemini model=%s prompt_len=%d", self._model_name, len(prompt))

    response = self._generative_model.generate_content(prompt, **kwargs)

    text = response.text or ""

    usage: dict[str, Any] = {}
    if hasattr(response, "usage_metadata") and response.usage_metadata is not None:
        usage = {
            "prompt_token_count": response.usage_metadata.prompt_token_count,
            "candidates_token_count": response.usage_metadata.candidates_token_count,
            "total_token_count": response.usage_metadata.total_token_count,
        }

    score = evaluate(
        question=prompt,
        response=text,
        context=context,
        model=self._groundlens_model,
    )

    logger.info(
        "Gemini response scored: method=%s value=%.3f flagged=%s",
        score.method,
        score.value,
        score.flagged,
    )

    return LLMResponse(
        text=text,
        model=self._model_name,
        usage=usage,
        groundlens_score=score,
    )

complete(prompt: str, context: str | None = None) -> LLMResponse

Generate a completion for the given prompt.

Convenience method that delegates to chat().

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | str | The user prompt or instruction. | required |
| context | str \| None | Optional source document for grounded evaluation. | None |

Returns:

| Type | Description |
| --- | --- |
| LLMResponse | LLMResponse with generated text and groundlens score. |

Source code in src/groundlens/providers/google.py
def complete(
    self,
    prompt: str,
    context: str | None = None,
) -> LLMResponse:
    """Generate a completion for the given prompt.

    Convenience method that delegates to :meth:`chat`.

    Args:
        prompt: The user prompt or instruction.
        context: Optional source document for grounded evaluation.

    Returns:
        LLMResponse with generated text and groundlens score.
    """
    return self.chat(prompt, context=context)

Integrations

GroundlensEvaluator (LangChain)

groundlens.integrations.langchain.evaluator.GroundlensEvaluator(groundlens_model: str = 'all-MiniLM-L6-v2', input_key: str = 'question', output_key: str = 'output', context_key: str = 'context')

LangSmith run evaluator that scores outputs with groundlens.

Extracts input, output, and optional context from LangSmith runs and examples, then computes SGI (when context is available) or DGI (context-free) scores.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| groundlens_model | str | Sentence-transformer model for groundlens scoring. Defaults to "all-MiniLM-L6-v2". | 'all-MiniLM-L6-v2' |
| input_key | str | Key to extract the question from run inputs. Defaults to "question". | 'question' |
| output_key | str | Key to extract the response from run outputs. Defaults to "output". | 'output' |
| context_key | str | Key to extract context from example inputs. Defaults to "context". | 'context' |

Example

>>> evaluator = GroundlensEvaluator()
>>> # Typically used with LangSmith evaluate():
>>> from langsmith import evaluate
>>> evaluate(chain, data="dataset", evaluators=[evaluator])

Source code in src/groundlens/integrations/langchain/evaluator.py
def __init__(
    self,
    groundlens_model: str = "all-MiniLM-L6-v2",
    input_key: str = "question",
    output_key: str = "output",
    context_key: str = "context",
) -> None:
    self._groundlens_model = groundlens_model
    self._input_key = input_key
    self._output_key = output_key
    self._context_key = context_key

Functions

evaluate_run(run: Run, example: Example | None = None) -> Any

Evaluate a LangSmith run for hallucination risk.

Extracts the question from run inputs, the response from run outputs, and optionally context from the example inputs. Returns a LangSmith EvaluationResult with the groundlens score.

Parameters:

Name Type Description Default
run Run

The LangSmith run to evaluate. Must have inputs and outputs dicts.

required
example Example | None

Optional LangSmith example providing ground truth or context for SGI evaluation.

None

Returns:

| Type | Description |
| --- | --- |
| Any | An EvaluationResult with key "groundlens", the normalized score, and a comment containing the explanation. |

Example

>>> evaluator = GroundlensEvaluator()
>>> result = evaluator.evaluate_run(run, example)
>>> result.key
'groundlens'

Source code in src/groundlens/integrations/langchain/evaluator.py
def evaluate_run(
    self,
    run: Run,
    example: Example | None = None,
) -> Any:
    """Evaluate a LangSmith run for hallucination risk.

    Extracts the question from run inputs, the response from run
    outputs, and optionally context from the example inputs. Returns
    a LangSmith ``EvaluationResult`` with the groundlens score.

    Args:
        run: The LangSmith run to evaluate. Must have ``inputs``
            and ``outputs`` dicts.
        example: Optional LangSmith example providing ground truth
            or context for SGI evaluation.

    Returns:
        An ``EvaluationResult`` with key ``"groundlens"``, the normalized
        score, and a comment containing the explanation.

    Example:
        >>> evaluator = GroundlensEvaluator()
        >>> result = evaluator.evaluate_run(run, example)
        >>> result.key
        'groundlens'
    """
    (evaluation_result_cls,) = _import_langsmith_types()

    inputs = run.inputs or {}
    outputs = run.outputs or {}

    question = inputs.get(self._input_key, "")
    response = outputs.get(self._output_key, "")

    if not question:
        for key in ("input", "query", "prompt"):
            question = inputs.get(key, "")
            if question:
                break

    if not response:
        for key in ("answer", "result", "text", "response"):
            response = outputs.get(key, "")
            if response:
                break

    context: str | None = None
    if example is not None and example.inputs:
        context = example.inputs.get(self._context_key)

    if not question or not response:
        logger.warning(
            "GroundlensEvaluator: missing question or response for run %s",
            run.id,
        )
        return evaluation_result_cls(
            key="groundlens",
            score=None,
            comment="Missing question or response — could not evaluate.",
        )

    score: GroundlensScore = evaluate(
        question=str(question),
        response=str(response),
        context=str(context) if context else None,
        model=self._groundlens_model,
    )

    logger.info(
        "GroundlensEvaluator run=%s method=%s value=%.3f flagged=%s",
        run.id,
        score.method,
        score.value,
        score.flagged,
    )

    return evaluation_result_cls(
        key="groundlens",
        score=score.normalized,
        comment=score.explanation,
    )

GroundlensCallback (LangChain)

groundlens.integrations.langchain.callback.GroundlensCallback(groundlens_model: str = 'all-MiniLM-L6-v2', context_key: str = 'context')

LangChain callback handler that scores every LLM response with groundlens.

Stores prompts on on_llm_start and evaluates responses on on_llm_end. Flagged results are logged as warnings. Scores are accumulated in the scores attribute for later inspection.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| groundlens_model | str | Sentence-transformer model for groundlens scoring. Defaults to "all-MiniLM-L6-v2". | 'all-MiniLM-L6-v2' |
| context_key | str | Metadata key to look for context in kwargs. Defaults to "context". | 'context' |

Example

>>> cb = GroundlensCallback()
>>> # Use as a LangChain callback
>>> from langchain_openai import ChatOpenAI
>>> llm = ChatOpenAI(callbacks=[cb])
>>> result = llm.invoke("Summarize the document.")
>>> # Inspect scores after execution
>>> for run_id, score in cb.scores.items():
...     print(f"{run_id}: {score.explanation}")

Source code in src/groundlens/integrations/langchain/callback.py
def __init__(
    self,
    groundlens_model: str = "all-MiniLM-L6-v2",
    context_key: str = "context",
) -> None:
    self._groundlens_model = groundlens_model
    self._context_key = context_key
    self._prompts: dict[UUID, list[str]] = {}
    self._contexts: dict[UUID, str | None] = {}
    self.scores: dict[UUID, GroundlensScore] = {}

Functions

on_llm_start(serialized: dict[str, Any], prompts: list[str], *, run_id: UUID, **kwargs: Any) -> None

Store prompts when an LLM call begins.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| serialized | dict[str, Any] | Serialized LLM configuration. | required |
| prompts | list[str] | List of prompt strings sent to the LLM. | required |
| run_id | UUID | Unique identifier for this LLM run. | required |
| **kwargs | Any | Additional keyword arguments from LangChain. | {} |

Source code in src/groundlens/integrations/langchain/callback.py
def on_llm_start(
    self,
    serialized: dict[str, Any],
    prompts: list[str],
    *,
    run_id: UUID,
    **kwargs: Any,
) -> None:
    """Store prompts when an LLM call begins.

    Args:
        serialized: Serialized LLM configuration.
        prompts: List of prompt strings sent to the LLM.
        run_id: Unique identifier for this LLM run.
        **kwargs: Additional keyword arguments from LangChain.
    """
    self._prompts[run_id] = prompts
    metadata = kwargs.get("metadata") or {}
    self._contexts[run_id] = metadata.get(self._context_key)
    logger.debug("on_llm_start run_id=%s prompts=%d", run_id, len(prompts))

on_llm_end(response: LLMResult, *, run_id: UUID, **kwargs: Any) -> None

Evaluate the LLM response for hallucination risk.

Parameters:

Name Type Description Default
response LLMResult

The LLM result containing generated text.

required
run_id UUID

Unique identifier for this LLM run.

required
**kwargs Any

Additional keyword arguments from LangChain.

{}
Source code in src/groundlens/integrations/langchain/callback.py
def on_llm_end(
    self,
    response: LLMResult,
    *,
    run_id: UUID,
    **kwargs: Any,
) -> None:
    """Evaluate the LLM response for hallucination risk.

    Args:
        response: The LLM result containing generated text.
        run_id: Unique identifier for this LLM run.
        **kwargs: Additional keyword arguments from LangChain.
    """
    prompts = self._prompts.pop(run_id, [])
    context = self._contexts.pop(run_id, None)

    if not prompts or not response.generations:
        logger.debug("on_llm_end run_id=%s — no prompts or generations", run_id)
        return

    prompt = prompts[0]
    generation = response.generations[0]
    if not generation:
        return

    text = generation[0].text

    score = evaluate(
        question=prompt,
        response=text,
        context=context,
        model=self._groundlens_model,
    )

    self.scores[run_id] = score

    if score.flagged:
        logger.warning(
            "Groundlens FLAGGED run_id=%s method=%s value=%.3f%s",
            run_id,
            score.method,
            score.value,
            score.explanation,
        )
    else:
        logger.info(
            "Groundlens OK run_id=%s method=%s value=%.3f",
            run_id,
            score.method,
            score.value,
        )

on_llm_error(error: BaseException, *, run_id: UUID, **kwargs: Any) -> None

Clean up state when an LLM call fails.

Parameters:

Name Type Description Default
error BaseException

The exception that caused the LLM call to fail.

required
run_id UUID

Unique identifier for this LLM run.

required
**kwargs Any

Additional keyword arguments from LangChain.

{}
Source code in src/groundlens/integrations/langchain/callback.py
def on_llm_error(
    self,
    error: BaseException,
    *,
    run_id: UUID,
    **kwargs: Any,
) -> None:
    """Clean up state when an LLM call fails.

    Args:
        error: The exception that caused the LLM call to fail.
        run_id: Unique identifier for this LLM run.
        **kwargs: Additional keyword arguments from LangChain.
    """
    self._prompts.pop(run_id, None)
    self._contexts.pop(run_id, None)
    logger.error("on_llm_error run_id=%s error=%s", run_id, error)

GroundlensTool (CrewAI)

groundlens.integrations.crewai.tool.GroundlensTool(name: str = 'groundlens_verify', description: str | None = None, groundlens_model: str = 'all-MiniLM-L6-v2')

CrewAI tool for verifying LLM outputs using groundlens.

Extends the CrewAI tool pattern to let agents self-verify their outputs. The tool evaluates a question-response pair (with optional context) and returns a human-readable verification summary.

Parameters:

Name Type Description Default
name str

Tool name visible to the agent. Defaults to "groundlens_verify".

'groundlens_verify'
description str | None

Tool description for agent tool selection.

None
groundlens_model str

Sentence-transformer model for groundlens scoring. Defaults to "all-MiniLM-L6-v2".

'all-MiniLM-L6-v2'
Example

>>> from groundlens.integrations.crewai import GroundlensTool
>>> tool = GroundlensTool()
>>> # Agent uses the tool to verify its own output
>>> result = tool._run(
...     question="What causes rain?",
...     response="Rain is caused by condensation.",
...     context="Water cycle: evaporation, condensation, precipitation.",
... )
>>> "PASS" in result or "FLAGGED" in result
True

Source code in src/groundlens/integrations/crewai/tool.py
def __init__(
    self,
    name: str = "groundlens_verify",
    description: str | None = None,
    groundlens_model: str = "all-MiniLM-L6-v2",
) -> None:
    self.name = name
    if description is not None:
        self.description = description
    self._groundlens_model = groundlens_model

GroundlensFilter (Semantic Kernel)

groundlens.integrations.semantic_kernel.filter.GroundlensFilter(groundlens_model: str = 'all-MiniLM-L6-v2', input_key: str = 'input', context_key: str = 'context')

Semantic Kernel function invocation filter with groundlens scoring.

Intercepts function invocation results and evaluates them for hallucination risk. Scores are attached to the invocation context metadata under the "groundlens_score" key and stored in the scores attribute for later inspection.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| groundlens_model | str | Sentence-transformer model for groundlens scoring. Defaults to "all-MiniLM-L6-v2". | 'all-MiniLM-L6-v2' |
| input_key | str | Key to extract the question from function arguments. Defaults to "input". | 'input' |
| context_key | str | Key to extract context from function arguments. Defaults to "context". | 'context' |

Example

>>> filt = GroundlensFilter()
>>> # Register with a Semantic Kernel instance
>>> kernel.add_filter("function_invocation", filt)
>>> # After invocation, inspect scores:
>>> for fn_name, score in filt.scores:
...     print(f"{fn_name}: {score.explanation}")

Source code in src/groundlens/integrations/semantic_kernel/filter.py
def __init__(
    self,
    groundlens_model: str = "all-MiniLM-L6-v2",
    input_key: str = "input",
    context_key: str = "context",
) -> None:
    self._groundlens_model = groundlens_model
    self._input_key = input_key
    self._context_key = context_key
    self.scores: list[tuple[str, GroundlensScore]] = []

Functions

on_function_invocation(context: Any, next_handler: Callable[..., Awaitable[None]]) -> None async

Intercept a function invocation and evaluate the result.

Calls the next filter/function in the pipeline, then evaluates the result with groundlens. Attaches the score to the context metadata.

Parameters:

Name Type Description Default
context Any

The Semantic Kernel FunctionInvocationContext containing function arguments and result.

required
next_handler Callable[..., Awaitable[None]]

The next handler in the filter pipeline.

required
Example
This method is called automatically by Semantic Kernel
when registered as a function invocation filter.
Source code in src/groundlens/integrations/semantic_kernel/filter.py
async def on_function_invocation(
    self,
    context: Any,
    next_handler: Callable[..., Awaitable[None]],
) -> None:
    """Intercept a function invocation and evaluate the result.

    Calls the next filter/function in the pipeline, then evaluates
    the result with groundlens. Attaches the score to the context
    metadata.

    Args:
        context: The Semantic Kernel ``FunctionInvocationContext``
            containing function arguments and result.
        next_handler: The next handler in the filter pipeline.

    Example:
        >>> # This method is called automatically by Semantic Kernel
        >>> # when registered as a function invocation filter.
    """
    await next_handler(context)

    function_name = getattr(context, "function_name", "unknown")
    arguments = getattr(context, "arguments", {}) or {}
    result = getattr(context, "result", None)

    if result is None:
        logger.debug("GroundlensFilter: no result for function %s", function_name)
        return

    result_value = getattr(result, "value", None)
    result_value = str(result) if result_value is None else str(result_value)

    question = str(arguments.get(self._input_key, ""))
    context_text: str | None = arguments.get(self._context_key)
    if context_text is not None:
        context_text = str(context_text)

    if not question:
        logger.debug(
            "GroundlensFilter: no input found for function %s, skipping",
            function_name,
        )
        return

    score: GroundlensScore = evaluate(
        question=question,
        response=result_value,
        context=context_text,
        model=self._groundlens_model,
    )

    self.scores.append((function_name, score))

    metadata = getattr(context, "metadata", None)
    if metadata is not None and isinstance(metadata, dict):
        metadata["groundlens_score"] = score

    if score.flagged:
        logger.warning(
            "GroundlensFilter FLAGGED function=%s method=%s value=%.3f%s",
            function_name,
            score.method,
            score.value,
            score.explanation,
        )
    else:
        logger.info(
            "GroundlensFilter OK function=%s method=%s value=%.3f",
            function_name,
            score.method,
            score.value,
        )
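
Because scores accumulate in filt.scores across invocations, a simple post-run triage loop can surface only the flagged results. A minimal sketch, assuming the filter was registered and the kernel has since invoked one or more functions:

# Assumes `filt` was registered via kernel.add_filter("function_invocation", filt).
flagged = [(name, score) for name, score in filt.scores if score.flagged]
for name, score in flagged:
    print(f"review: {name} method={score.method} value={score.value:.3f}")
    print(f"  {score.explanation}")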

GroundlensChecker (AutoGen)

groundlens.integrations.autogen.checker.GroundlensChecker(groundlens_model: str = 'all-MiniLM-L6-v2', context_key: str = 'context')

AutoGen reply checker that evaluates messages with groundlens.

Designed to be used as a reply validation step in AutoGen agent conversations. Evaluates the last assistant message against the preceding user message for hallucination risk.

Parameters:

groundlens_model (str): Sentence-transformer model for groundlens scoring. Defaults to "all-MiniLM-L6-v2".
context_key (str): Key to look for context in message metadata. Defaults to "context".
Example

>>> checker = GroundlensChecker()
>>> messages = [
...     {"role": "user", "content": "Summarize this document."},
...     {"role": "assistant", "content": "The document discusses..."},
... ]
>>> result = checker.check(messages, sender=None)
>>> result["method"]
'dgi'
>>> result["flagged"]
False

Source code in src/groundlens/integrations/autogen/checker.py
def __init__(
    self,
    groundlens_model: str = "all-MiniLM-L6-v2",
    context_key: str = "context",
) -> None:
    self._groundlens_model = groundlens_model
    self._context_key = context_key

Functions

check(messages: list[dict[str, Any]], sender: Any, **kwargs: Any) -> dict[str, Any]

Evaluate the last message in the conversation.

Extracts the last assistant message as the response and the most recent preceding user message as the question. If context is found in message metadata, SGI scoring is used; otherwise DGI is applied.

Parameters:

messages (list[dict[str, Any]]): List of conversation message dicts. Each dict should have "role" and "content" keys. Required.
sender (Any): The AutoGen agent that sent the last message. Used for logging; can be None. Required.
**kwargs (Any): Additional keyword arguments. If a "context" key is present, it is used for SGI evaluation.

Returns:

dict[str, Any]: A dict containing:

  • "score": The raw groundlens score value.
  • "normalized": Score mapped to [0, 1].
  • "flagged": Whether human review is recommended.
  • "method": Scoring method used ("sgi" or "dgi").
  • "explanation": Human-readable interpretation.

Example

>>> checker = GroundlensChecker()
>>> result = checker.check(
...     messages=[
...         {"role": "user", "content": "What is 2+2?"},
...         {"role": "assistant", "content": "2+2 equals 4."},
...     ],
...     sender=None,
... )
>>> isinstance(result["score"], float)
True

Source code in src/groundlens/integrations/autogen/checker.py
def check(
    self,
    messages: list[dict[str, Any]],
    sender: Any,
    **kwargs: Any,
) -> dict[str, Any]:
    """Evaluate the last message in the conversation.

    Extracts the last assistant message as the response and the
    most recent preceding user message as the question. If context
    is found in message metadata, SGI scoring is used; otherwise
    DGI is applied.

    Args:
        messages: List of conversation message dicts. Each dict should
            have ``"role"`` and ``"content"`` keys.
        sender: The AutoGen agent that sent the last message.
            Used for logging; can be ``None``.
        **kwargs: Additional keyword arguments. If a ``"context"``
            key is present, it is used for SGI evaluation.

    Returns:
        A dict containing:
            - ``"score"``: The raw groundlens score value.
            - ``"normalized"``: Score mapped to [0, 1].
            - ``"flagged"``: Whether human review is recommended.
            - ``"method"``: Scoring method used (``"sgi"`` or ``"dgi"``).
            - ``"explanation"``: Human-readable interpretation.

    Example:
        >>> checker = GroundlensChecker()
        >>> result = checker.check(
        ...     messages=[
        ...         {"role": "user", "content": "What is 2+2?"},
        ...         {"role": "assistant", "content": "2+2 equals 4."},
        ...     ],
        ...     sender=None,
        ... )
        >>> isinstance(result["score"], float)
        True
    """
    if not messages:
        logger.warning("GroundlensChecker.check called with empty messages")
        return {
            "score": None,
            "normalized": None,
            "flagged": None,
            "method": None,
            "explanation": "No messages to evaluate.",
        }

    last_message = messages[-1]
    response = str(last_message.get("content", ""))

    question = ""
    for msg in reversed(messages[:-1]):
        if msg.get("role") == "user":
            question = str(msg.get("content", ""))
            break

    if not question:
        question = response

    context: str | None = kwargs.get(self._context_key)

    if context is None:
        for msg in reversed(messages):
            msg_metadata = msg.get("metadata", {})
            if isinstance(msg_metadata, dict) and self._context_key in msg_metadata:
                context = str(msg_metadata[self._context_key])
                break

    sender_name = getattr(sender, "name", str(sender)) if sender else "unknown"
    logger.debug(
        "GroundlensChecker.check sender=%s messages=%d context=%s",
        sender_name,
        len(messages),
        "provided" if context else "none",
    )

    score: GroundlensScore = evaluate(
        question=question,
        response=response,
        context=context,
        model=self._groundlens_model,
    )

    if score.flagged:
        logger.warning(
            "GroundlensChecker FLAGGED sender=%s method=%s value=%.3f%s",
            sender_name,
            score.method,
            score.value,
            score.explanation,
        )
    else:
        logger.info(
            "GroundlensChecker OK sender=%s method=%s value=%.3f",
            sender_name,
            score.method,
            score.value,
        )

    return {
        "score": score.value,
        "normalized": score.normalized,
        "flagged": score.flagged,
        "method": score.method,
        "explanation": score.explanation,
    }
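
A common pattern is to gate replies on the "flagged" field before accepting them. A minimal sketch; the import path mirrors the CrewAI integration above (adjust if your install exposes the checker elsewhere), and the escalation step is a hypothetical placeholder:

from groundlens.integrations.autogen import GroundlensChecker  # assumed path

checker = GroundlensChecker()

def accept_reply(messages, sender, context=None):
    """Return the last reply if it passes, else None to signal a retry."""
    kwargs = {"context": context} if context is not None else {}
    result = checker.check(messages, sender=sender, **kwargs)
    if result["flagged"]:
        # Hypothetical escalation: regenerate, or route to a human reviewer.
        return None
    return messages[-1]["content"]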

Internal Modules

Internal API

The following modules are internal implementation details. They are documented here for completeness but are not part of the public API and may change without notice.

Geometry Primitives

groundlens._internal.geometry

Geometric primitives for embedding space operations.

This module provides the mathematical building blocks used by SGI and DGI. All operations are on vectors in R^n (the embedding space of a sentence transformer), which can be understood geometrically on the unit hypersphere S^(n-1) when vectors are L2-normalized.

Key concepts:

  • Euclidean distance in R^n is used by SGI to compare how far the response embedding is from the question vs. the context.

  • Displacement vectors (r - q) capture the semantic "movement" from question to response. DGI projects these onto a reference direction.

  • Unit normalization maps vectors to S^(n-1). On the unit hypersphere, dot product equals cosine similarity, and Euclidean distance is a monotonic function of angular distance.
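
These identities are easy to verify numerically. A self-contained numpy sketch (independent of groundlens internals) showing that, for unit vectors, the dot product equals cosine similarity and the squared chord distance is a monotonic function of the angle:

import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=384)  # 384 matches all-MiniLM-L6-v2's embedding width
b = rng.normal(size=384)

# Unit normalization: project onto S^(n-1).
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)

cos_theta = float(a_hat @ b_hat)               # dot product == cosine similarity
chord = float(np.linalg.norm(a_hat - b_hat))   # Euclidean distance on the sphere

# ||a_hat - b_hat||^2 = 2 - 2*cos(theta): distance grows monotonically with angle.
assert np.isclose(chord**2, 2.0 - 2.0 * cos_theta)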

References

Marin (2025). Semantic Grounding Index. arXiv:2512.13771.
Marin (2026). A Geometric Taxonomy of Hallucinations. arXiv:2602.13224v3.

Functions

euclidean_distance(a: EmbeddingVector, b: EmbeddingVector) -> float

Compute Euclidean distance between two embedding vectors.

Parameters:

a (EmbeddingVector): First embedding vector, shape (d,). Required.
b (EmbeddingVector): Second embedding vector, shape (d,). Required.

Returns:

float: Non-negative scalar distance.

Source code in src/groundlens/_internal/geometry.py
def euclidean_distance(a: EmbeddingVector, b: EmbeddingVector) -> float:
    """Compute Euclidean distance between two embedding vectors.

    Args:
        a: First embedding vector, shape (d,).
        b: Second embedding vector, shape (d,).

    Returns:
        Non-negative scalar distance.
    """
    return float(np.linalg.norm(a - b))

unit_normalize(v: EmbeddingVector) -> EmbeddingVector

Project vector onto the unit hypersphere S^(n-1).

Parameters:

v (EmbeddingVector): Input vector, shape (d,). Required.

Returns:

EmbeddingVector: Unit vector v / ||v||, or the zero vector if ||v|| < epsilon.

Source code in src/groundlens/_internal/geometry.py
def unit_normalize(v: EmbeddingVector) -> EmbeddingVector:
    """Project vector onto the unit hypersphere S^(n-1).

    Args:
        v: Input vector, shape (d,).

    Returns:
        Unit vector v / ||v||, or the zero vector if ||v|| < epsilon.
    """
    norm = float(np.linalg.norm(v))
    if norm < _EPSILON:
        return v
    return v / norm

displacement_vector(question_emb: EmbeddingVector, response_emb: EmbeddingVector) -> EmbeddingVector

Compute the displacement from question to response in embedding space.

The displacement delta = phi(response) - phi(question) captures the semantic transformation applied by the LLM when generating a response. In grounded responses, this displacement aligns with a characteristic reference direction.

Parameters:

question_emb (EmbeddingVector): Question embedding, shape (d,). Required.
response_emb (EmbeddingVector): Response embedding, shape (d,). Required.

Returns:

EmbeddingVector: Displacement vector, shape (d,).

Source code in src/groundlens/_internal/geometry.py
def displacement_vector(
    question_emb: EmbeddingVector,
    response_emb: EmbeddingVector,
) -> EmbeddingVector:
    """Compute the displacement from question to response in embedding space.

    The displacement delta = phi(response) - phi(question) captures the
    semantic transformation applied by the LLM when generating a response.
    In grounded responses, this displacement aligns with a characteristic
    reference direction.

    Args:
        question_emb: Question embedding, shape (d,).
        response_emb: Response embedding, shape (d,).

    Returns:
        Displacement vector, shape (d,).
    """
    return response_emb - question_emb

cosine_similarity(a: EmbeddingVector, b: EmbeddingVector) -> float

Compute cosine similarity between two vectors.

Parameters:

a (EmbeddingVector): First vector, shape (d,). Required.
b (EmbeddingVector): Second vector, shape (d,). Required.

Returns:

float: Cosine similarity in [-1, 1]. Returns 0.0 if either vector has near-zero norm.

Source code in src/groundlens/_internal/geometry.py
def cosine_similarity(a: EmbeddingVector, b: EmbeddingVector) -> float:
    """Compute cosine similarity between two vectors.

    Args:
        a: First vector, shape (d,).
        b: Second vector, shape (d,).

    Returns:
        Cosine similarity in [-1, 1]. Returns 0.0 if either vector
        has near-zero norm.
    """
    norm_a = float(np.linalg.norm(a))
    norm_b = float(np.linalg.norm(b))
    if norm_a < _EPSILON or norm_b < _EPSILON:
        return 0.0
    return float(np.dot(a, b) / (norm_a * norm_b))

mean_direction(vectors: list[EmbeddingVector]) -> EmbeddingVector

Compute the mean direction of a set of unit vectors.

This is the maximum-likelihood estimate of the mean direction parameter mu of a von Mises-Fisher distribution on S^(n-1).

Parameters:

vectors (list[EmbeddingVector]): List of unit-normalized vectors, each shape (d,). Required.

Returns:

EmbeddingVector: Unit-normalized mean direction, shape (d,). Zero vector if the input vectors cancel out.

Raises:

ValueError: If the input list is empty.

Source code in src/groundlens/_internal/geometry.py
def mean_direction(vectors: list[EmbeddingVector]) -> EmbeddingVector:
    """Compute the mean direction of a set of unit vectors.

    This is the maximum-likelihood estimate of the mean direction
    parameter mu of a von Mises-Fisher distribution on S^(n-1).

    Args:
        vectors: List of unit-normalized vectors, each shape (d,).

    Returns:
        Unit-normalized mean direction, shape (d,). Zero vector if
        the input vectors cancel out.

    Raises:
        ValueError: If the input list is empty.
    """
    if not vectors:
        msg = "Cannot compute mean direction of empty vector list."
        raise ValueError(msg)

    mu: EmbeddingVector = np.mean(np.stack(vectors), axis=0)
    return unit_normalize(mu)
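
For intuition, the computation reduces to averaging and renormalizing. A standalone numpy check (not importing groundlens) on two orthogonal unit vectors, whose mean direction bisects them:

import numpy as np

vectors = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
mu = np.mean(np.stack(vectors), axis=0)   # (0.5, 0.5)
mu_hat = mu / np.linalg.norm(mu)          # renormalize to the unit circle

assert np.allclose(mu_hat, [1 / np.sqrt(2), 1 / np.sqrt(2)])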

Thresholds

groundlens._internal.thresholds

Threshold constants and normalization functions.

All thresholds are derived empirically from the experiments reported in arXiv:2512.13771 (SGI) and arXiv:2602.13224v3 (DGI).

These constants define the decision boundaries for flagging LLM outputs as potential hallucinations. They are intentionally conservative: the default behavior is to flag for human review rather than silently pass.

Attributes

SGI_STRONG_PASS: float = 1.2 module-attribute

SGI score indicating strong context engagement. Green zone.

SGI_REVIEW: float = 0.95 module-attribute

SGI score below which output is flagged for human review. Red zone.

DGI_PASS: float = 0.3 module-attribute

DGI score indicating alignment with grounded reference direction. Green zone.

Functions

normalize_sgi(raw_sgi: float) -> float

Normalize raw SGI score to [0, 1] range.

Uses a tanh mapping with offset to produce a smooth sigmoid curve:

normalized = tanh(max(0, raw - 0.3))

This maps the raw SGI range (~0.5 to ~2.0) into a [0, 1] range suitable for dashboards and threshold comparison.

Mapping reference points

SGI 0.30 → 0.000 (floor)
SGI 0.95 → 0.572 (review threshold)
SGI 1.20 → 0.716 (strong pass)
SGI 2.00 → 0.935 (very strong)

Parameters:

raw_sgi (float): The raw SGI ratio (q_dist / ctx_dist). Required.

Returns:

float: Score in [0.0, 1.0].

Source code in src/groundlens/_internal/thresholds.py
def normalize_sgi(raw_sgi: float) -> float:
    """Normalize raw SGI score to [0, 1] range.

    Uses tanh mapping with offset to produce a smooth sigmoid curve:
        normalized = tanh(max(0, raw - 0.3))

    This maps the raw SGI range (~0.5 to ~2.0) into a [0, 1] range
    suitable for dashboards and threshold comparison.

    Mapping reference points:
        SGI 0.30 → 0.000 (floor)
        SGI 0.95 → 0.572 (review threshold)
        SGI 1.20 → 0.716 (strong pass)
        SGI 2.00 → 0.935 (very strong)

    Args:
        raw_sgi: The raw SGI ratio (q_dist / ctx_dist).

    Returns:
        Score in [0.0, 1.0].
    """
    shifted = max(0.0, raw_sgi - 0.3)
    return min(1.0, max(0.0, math.tanh(shifted)))

normalize_dgi(raw_dgi: float) -> float

Normalize raw DGI score from [-1, 1] to [0, 1] range.

Simple linear mapping: normalized = (raw + 1) / 2.

Mapping reference points

DGI -1.0 → 0.000 (opposite to grounded direction)
DGI  0.0 → 0.500 (orthogonal)
DGI  0.3 → 0.650 (pass threshold)
DGI  1.0 → 1.000 (perfectly aligned)

Parameters:

raw_dgi (float): The raw DGI cosine similarity to reference direction. Required.

Returns:

float: Score in [0.0, 1.0].

Source code in src/groundlens/_internal/thresholds.py
def normalize_dgi(raw_dgi: float) -> float:
    """Normalize raw DGI score from [-1, 1] to [0, 1] range.

    Simple linear mapping: normalized = (raw + 1) / 2.

    Mapping reference points:
        DGI -1.0 → 0.000 (opposite to grounded direction)
        DGI  0.0 → 0.500 (orthogonal)
        DGI  0.3 → 0.650 (pass threshold)
        DGI  1.0 → 1.000 (perfectly aligned)

    Args:
        raw_dgi: The raw DGI cosine similarity to reference direction.

    Returns:
        Score in [0.0, 1.0].
    """
    return min(1.0, max(0.0, (raw_dgi + 1.0) / 2.0))
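
The reference points in both tables follow directly from the formulas above. A standalone sketch that re-implements the two normalizers verbatim for checking:

import math

def normalize_sgi(raw: float) -> float:
    # tanh mapping with a 0.3 offset, clamped to [0, 1].
    return min(1.0, max(0.0, math.tanh(max(0.0, raw - 0.3))))

def normalize_dgi(raw: float) -> float:
    # Linear mapping from [-1, 1] to [0, 1], clamped.
    return min(1.0, max(0.0, (raw + 1.0) / 2.0))

for raw in (0.30, 0.95, 1.20, 2.00):
    print(f"SGI {raw:.2f} -> {normalize_sgi(raw):.3f}")  # 0.000, 0.572, 0.716, 0.935
for raw in (-1.0, 0.0, 0.3, 1.0):
    print(f"DGI {raw:+.1f} -> {normalize_dgi(raw):.3f}")  # 0.000, 0.500, 0.650, 1.000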

Constants

Constant          Value                Module                           Description
SGI_STRONG_PASS   1.20                 groundlens._internal.thresholds  SGI strong pass threshold
SGI_REVIEW        0.95                 groundlens._internal.thresholds  SGI review/flag threshold
DGI_PASS          0.30                 groundlens._internal.thresholds  DGI pass threshold
DEFAULT_MODEL     "all-MiniLM-L6-v2"   groundlens._internal.embeddings  Default sentence-transformer model

Type Summary

Type                Description                 Key fields
SGIResult           SGI computation result      value, normalized, flagged, q_dist, ctx_dist
DGIResult           DGI computation result      value, normalized, flagged
GroundlensScore     Unified evaluation result   value, normalized, flagged, method, explanation, detail
CalibrationResult   DGI calibration output      model, n_pairs, embedding_dim, mu_hat, concentration
LLMResponse         Provider response wrapper   text, model, usage, groundlens_score