API Reference¶
This page provides the complete API reference for groundlens. All public classes and functions are documented with their signatures, parameters, return types, and examples.
For auto-generated documentation from source docstrings, ensure mkdocstrings is configured in your MkDocs build.
Core Functions¶
compute_sgi¶
```python
groundlens.sgi.compute_sgi(question: str, context: str, response: str, *, model: str = DEFAULT_MODEL) -> SGIResult
```

Compute the Semantic Grounding Index for a response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `question` | `str` | The input query. | *required* |
| `context` | `str` | Source document, retrieved chunks, or reference text. | *required* |
| `response` | `str` | The LLM output to evaluate. | *required* |
| `model` | `str` | Sentence transformer model name. | `DEFAULT_MODEL` |

Returns:

| Type | Description |
|---|---|
| `SGIResult` | SGIResult with raw score, normalized score, and flag status. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If any input string is empty. |

Example

```python
>>> from groundlens import compute_sgi
>>> result = compute_sgi(
...     question="What is the capital of France?",
...     context="France is in Western Europe. Its capital is Paris.",
...     response="The capital of France is Paris.",
... )
>>> result.flagged
False
```
Source code in src/groundlens/sgi.py
compute_dgi¶
```python
groundlens.dgi.compute_dgi(question: str, response: str, *, model: str = DEFAULT_MODEL, reference_csv: str | None = None) -> DGIResult
```

Compute the Directional Grounding Index for a response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `question` | `str` | The input query. | *required* |
| `response` | `str` | The LLM output to evaluate. | *required* |
| `model` | `str` | Sentence transformer model name. | `DEFAULT_MODEL` |
| `reference_csv` | `str \| None` | Path to domain-specific calibration CSV. If `None`, the bundled calibration is used. | `None` |

Returns:

| Type | Description |
|---|---|
| `DGIResult` | DGIResult with raw score, normalized score, and flag status. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If question or response is empty. |

Example

```python
>>> from groundlens import compute_dgi
>>> result = compute_dgi(
...     question="What causes seasons on Earth?",
...     response="Seasons are caused by Earth's 23.5-degree axial tilt.",
... )
>>> result.flagged
False
```
Source code in src/groundlens/dgi.py
evaluate¶
```python
groundlens.evaluate.evaluate(question: str, response: str, context: str | None = None, *, model: str = DEFAULT_MODEL, reference_csv: str | None = None) -> GroundlensScore
```

Evaluate a single LLM response for hallucination risk.

Auto-selects the scoring method:

- SGI when `context` is provided (grounded verification).
- DGI when `context` is `None` (context-free verification).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `question` | `str` | The input query. | *required* |
| `response` | `str` | The LLM output to evaluate. | *required* |
| `context` | `str \| None` | Source document or retrieved text. If provided, SGI is used. | `None` |
| `model` | `str` | Sentence transformer model name. | `DEFAULT_MODEL` |
| `reference_csv` | `str \| None` | DGI calibration CSV path (only used when context is None). | `None` |

Returns:

| Type | Description |
|---|---|
| `GroundlensScore` | GroundlensScore with method, value, flag, and explanation. |

Example

```python
>>> from groundlens import evaluate

>>> # With context → SGI
>>> score = evaluate("Q?", "A.", context="Source text.")
>>> score.method
'sgi'

>>> # Without context → DGI
>>> score = evaluate("Q?", "A.")
>>> score.method
'dgi'
```
Source code in src/groundlens/evaluate.py
evaluate_batch¶
```python
groundlens.evaluate.evaluate_batch(items: list[dict[str, str]], *, model: str = DEFAULT_MODEL, reference_csv: str | None = None) -> list[GroundlensScore]
```

Evaluate a batch of LLM responses.

Each item in the list is a dict with keys:

- `question` (required)
- `response` (required)
- `context` (optional; triggers SGI when present)

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `items` | `list[dict[str, str]]` | List of dicts, each containing question, response, and optionally context. | *required* |
| `model` | `str` | Sentence transformer model name. | `DEFAULT_MODEL` |
| `reference_csv` | `str \| None` | DGI calibration CSV path. | `None` |

Returns:

| Type | Description |
|---|---|
| `list[GroundlensScore]` | List of GroundlensScore results, one per input item. |

Raises:

| Type | Description |
|---|---|
| `KeyError` | If any item is missing `question` or `response`. |

Example

```python
>>> from groundlens import evaluate_batch
>>> items = [
...     {"question": "Q1?", "response": "A1.", "context": "C1."},
...     {"question": "Q2?", "response": "A2."},
... ]
>>> results = evaluate_batch(items)
>>> len(results)
2
```
Source code in src/groundlens/evaluate.py
calibrate¶
```python
groundlens.calibrate.calibrate(pairs: list[tuple[str, str]] | None = None, csv_path: str | None = None, *, model: str = DEFAULT_MODEL, metadata: dict[str, str] | None = None) -> CalibrationResult
```

Compute a DGI reference direction from calibration data.

Provide either pairs directly or a csv_path to a file with verified grounded (question, response) pairs.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `pairs` | `list[tuple[str, str]] \| None` | List of (question, response) tuples. | `None` |
| `csv_path` | `str \| None` | Path to a CSV file with verified (question, response) pairs. | `None` |
| `model` | `str` | Sentence transformer model to use for embedding. | `DEFAULT_MODEL` |
| `metadata` | `dict[str, str] \| None` | Optional metadata to attach (domain name, date, notes). | `None` |

Returns:

| Type | Description |
|---|---|
| `CalibrationResult` | CalibrationResult with computed reference direction and statistics. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If neither `pairs` nor `csv_path` is provided. |

Example

```python
>>> result = calibrate(pairs=[("Q?", "A.") for _ in range(20)])
>>> result.n_pairs
20
```
Source code in src/groundlens/calibrate.py
Core Classes¶
SGI¶
```python
groundlens.sgi.SGI(model: str = DEFAULT_MODEL)
```

Reusable SGI scorer with a pre-configured embedding model.

Use this class when evaluating multiple responses with the same model to avoid repeating the `model` parameter.

Example

```python
>>> sgi = SGI(model="all-MiniLM-L6-v2")
>>> result = sgi.score(
...     question="What is X?",
...     context="X is Y.",
...     response="X is Y.",
... )
>>> result.flagged
False
```

Initialize SGI scorer.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str` | Sentence transformer model name or path. | `DEFAULT_MODEL` |
Source code in src/groundlens/sgi.py
Functions¶
```python
score(question: str, context: str, response: str) -> SGIResult
```

Compute SGI for a single response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `question` | `str` | The input query. | *required* |
| `context` | `str` | Source document or reference text. | *required* |
| `response` | `str` | The LLM output to evaluate. | *required* |

Returns:

| Type | Description |
|---|---|
| `SGIResult` | SGIResult with score and flag status. |
Source code in src/groundlens/sgi.py
DGI¶
```python
groundlens.dgi.DGI(model: str = DEFAULT_MODEL, reference_csv: str | None = None)
```

Reusable DGI scorer with pre-configured model and calibration.

Use this class when evaluating multiple responses against the same reference direction. Supports both bundled and custom calibration.

Example

```python
>>> dgi = DGI()
>>> result = dgi.score(
...     question="What is ML?",
...     response="ML is a branch of AI.",
... )
>>> result.flagged
False

>>> dgi = DGI(reference_csv="my_domain_pairs.csv")
>>> result = dgi.score(question="...", response="...")
```

Initialize DGI scorer.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str` | Sentence transformer model name. | `DEFAULT_MODEL` |
| `reference_csv` | `str \| None` | Path to domain-specific calibration CSV. | `None` |
Source code in src/groundlens/dgi.py
Functions¶
```python
calibrate(pairs: list[tuple[str, str]] | None = None, csv_path: str | None = None) -> None
```

Set custom calibration data.

Either provide pairs directly or a path to a CSV file. This replaces any previously cached reference direction.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `pairs` | `list[tuple[str, str]] \| None` | List of verified (question, response) tuples. | `None` |
| `csv_path` | `str \| None` | Path to a calibration CSV file. | `None` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If neither `pairs` nor `csv_path` is provided. |
Source code in src/groundlens/dgi.py
```python
score(question: str, response: str) -> DGIResult
```

Compute DGI for a single response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `question` | `str` | The input query. | *required* |
| `response` | `str` | The LLM output to evaluate. | *required* |

Returns:

| Type | Description |
|---|---|
| `DGIResult` | DGIResult with score and flag status. |
Source code in src/groundlens/dgi.py
Result Types¶
SGIResult¶
```python
groundlens.score.SGIResult(value: float, normalized: float, flagged: bool, q_dist: float, ctx_dist: float, method: str = 'sgi', explanation: str = '')
```

*dataclass*

Result of Semantic Grounding Index computation.

SGI measures whether a response engaged with the provided context or stayed anchored to the question. Higher values indicate stronger context engagement (grounded).

Attributes:

| Name | Type | Description |
|---|---|---|
| `value` | `float` | Raw SGI score = dist(response, question) / dist(response, context). |
| `normalized` | `float` | Score mapped to [0, 1] via tanh normalization. |
| `flagged` | `bool` | Whether the response is flagged for human review. |
| `q_dist` | `float` | Euclidean distance from response to question embedding. |
| `ctx_dist` | `float` | Euclidean distance from response to context embedding. |
| `method` | `str` | Always `'sgi'`. |
| `explanation` | `str` | Human-readable interpretation of the score. |
Functions¶
```python
__post_init__() -> None
```
Generate explanation from score if not provided.
Source code in src/groundlens/score.py
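The relationship between the raw and normalized fields can be illustrated with plain arithmetic. This is a sketch, not the library's code: the distance values are made up, and the exact tanh mapping is an assumption based on the attribute description above.

```python
import math

# Hypothetical distances (the library derives these from sentence-transformer
# embeddings; the numbers here are illustrative only).
q_dist = 0.9    # response-to-question distance
ctx_dist = 0.5  # response-to-context distance

value = q_dist / ctx_dist      # raw SGI: > 1 means closer to context than question
normalized = math.tanh(value)  # one plausible tanh mapping into [0, 1)

print(round(value, 3), round(normalized, 3))  # 1.8 0.947
```

A ratio above 1 pushes the normalized score toward 1 (grounded); a ratio near 0 keeps it near 0 (question-anchored, a hallucination signal).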
DGIResult¶
```python
groundlens.score.DGIResult(value: float, normalized: float, flagged: bool, method: str = 'dgi', explanation: str = '')
```

*dataclass*

Result of Directional Grounding Index computation.

DGI measures whether the question-to-response displacement vector aligns with the mean displacement of verified grounded pairs. Higher values indicate alignment with grounded patterns.

Attributes:

| Name | Type | Description |
|---|---|---|
| `value` | `float` | Raw DGI score = cosine similarity to reference direction. Range: [-1, 1]. |
| `normalized` | `float` | Score mapped to [0, 1] via linear normalization. |
| `flagged` | `bool` | Whether the response is flagged for human review. |
| `method` | `str` | Always `'dgi'`. |
| `explanation` | `str` | Human-readable interpretation of the score. |
Functions¶
```python
__post_init__() -> None
```
Generate explanation from score if not provided.
Source code in src/groundlens/score.py
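Since the raw DGI value is a cosine similarity in [-1, 1], the linear normalization is simple arithmetic. A minimal sketch (the exact affine form `(value + 1) / 2` is an assumption; the library documents only that the map is linear onto [0, 1]):

```python
# Raw DGI is a cosine similarity in [-1, 1]. A linear map to [0, 1]
# (assumed here to be (value + 1) / 2) gives the normalized score.
value = 0.62
normalized = (value + 1) / 2

assert -1.0 <= value <= 1.0
assert 0.0 <= normalized <= 1.0
```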
GroundlensScore¶
```python
groundlens.score.GroundlensScore(value: float, normalized: float, flagged: bool, method: str, explanation: str, detail: SGIResult | DGIResult)
```

*dataclass*

Unified score container returned by high-level evaluate() calls.

Wraps either an SGIResult or DGIResult with additional metadata.

Attributes:

| Name | Type | Description |
|---|---|---|
| `value` | `float` | Raw score from the underlying method. |
| `normalized` | `float` | Score in [0, 1]. |
| `flagged` | `bool` | Whether human review is recommended. |
| `method` | `str` | `'sgi'` or `'dgi'`, depending on which method was used. |
| `explanation` | `str` | Human-readable interpretation. |
| `detail` | `SGIResult \| DGIResult` | The full SGIResult or DGIResult for method-specific fields. |
CalibrationResult¶
```python
groundlens.calibrate.CalibrationResult(model: str, n_pairs: int, embedding_dim: int, mu_hat: NDArray[np.float32], concentration: float, metadata: dict[str, str] = dict())
```

*dataclass*

Result of DGI calibration.

Attributes:

| Name | Type | Description |
|---|---|---|
| `model` | `str` | Sentence transformer model used for calibration. |
| `n_pairs` | `int` | Number of (question, response) pairs used. |
| `embedding_dim` | `int` | Dimensionality of the embedding space. |
| `mu_hat` | `NDArray[float32]` | The computed reference direction vector. |
| `concentration` | `float` | Estimated concentration parameter (kappa) of the von Mises-Fisher distribution. Higher values indicate more consistent displacement directions in the reference data. |
| `metadata` | `dict[str, str]` | Optional metadata attached at calibration time (domain name, date, notes). |
Functions¶
```python
save(path: str | Path) -> None
```

Save calibration result to JSON.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Output file path. The `mu_hat` vector is stored as a list. | *required* |
Source code in src/groundlens/calibrate.py
```python
load(path: str | Path) -> CalibrationResult
```

*classmethod*

Load a saved calibration result.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to JSON calibration file. | *required* |

Returns:

| Type | Description |
|---|---|
| `CalibrationResult` | CalibrationResult instance with restored `mu_hat` vector. |
Source code in src/groundlens/calibrate.py
Providers¶
GroundlensOpenAI¶
```python
groundlens.providers.openai.GroundlensOpenAI(api_key: str, model: str = 'gpt-4o', groundlens_model: str = 'all-MiniLM-L6-v2', groundlens_threshold: float = 0.45)
```

OpenAI LLM provider with built-in groundlens scoring.

Wraps the OpenAI chat completions API and automatically evaluates each response for hallucination risk.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `api_key` | `str` | OpenAI API key. | *required* |
| `model` | `str` | Chat model to use for generation. | `'gpt-4o'` |
| `groundlens_model` | `str` | Sentence-transformer model for groundlens scoring. | `'all-MiniLM-L6-v2'` |
| `groundlens_threshold` | `float` | Score threshold override (reserved for future use). | `0.45` |

Example

```python
>>> llm = GroundlensOpenAI(api_key="sk-...")
>>> resp = llm.chat("Summarize this document.", context="The document text.")
>>> print(resp.groundlens_score.explanation)
```
Source code in src/groundlens/providers/openai.py
Functions¶
```python
chat(prompt: str, context: str | None = None, **kwargs: Any) -> LLMResponse
```

Send a chat completion request and score the response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` | The user message content. | *required* |
| `context` | `str \| None` | Optional source document. When provided, SGI scoring is used; otherwise DGI scoring is applied. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments forwarded to the underlying OpenAI chat completions call. | `{}` |

Returns:

| Type | Description |
|---|---|
| `LLMResponse` | LLMResponse containing the generated text, model identifier, usage metadata, and a groundlens hallucination score. |

Raises:

| Type | Description |
|---|---|
| `OpenAIError` | If the API call fails. |

Example

```python
>>> llm = GroundlensOpenAI(api_key="sk-...")
>>> resp = llm.chat("What causes tides?")
>>> resp.text
'Tides are primarily caused by...'
```
Source code in src/groundlens/providers/openai.py
```python
complete(prompt: str, context: str | None = None) -> LLMResponse
```

Generate a completion for the given prompt.

Convenience method that delegates to `chat`.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` | The user prompt or instruction. | *required* |
| `context` | `str \| None` | Optional source document for grounded evaluation. | `None` |

Returns:

| Type | Description |
|---|---|
| `LLMResponse` | LLMResponse with generated text and groundlens score. |
Source code in src/groundlens/providers/openai.py
GroundlensAnthropic¶
```python
groundlens.providers.anthropic.GroundlensAnthropic(api_key: str, model: str = 'claude-sonnet-4-20250514', groundlens_model: str = 'all-MiniLM-L6-v2', groundlens_threshold: float = 0.45)
```

Anthropic Claude provider with built-in groundlens scoring.

Wraps the Anthropic messages API and automatically evaluates each response for hallucination risk.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `api_key` | `str` | Anthropic API key. | *required* |
| `model` | `str` | Claude model to use for generation. | `'claude-sonnet-4-20250514'` |
| `groundlens_model` | `str` | Sentence-transformer model for groundlens scoring. | `'all-MiniLM-L6-v2'` |
| `groundlens_threshold` | `float` | Score threshold override (reserved for future use). | `0.45` |

Example

```python
>>> llm = GroundlensAnthropic(api_key="sk-ant-...")
>>> resp = llm.chat("Summarize this.", context="Source text here.")
>>> print(resp.groundlens_score.explanation)
```
Source code in src/groundlens/providers/anthropic.py
Functions¶
```python
chat(prompt: str, context: str | None = None, **kwargs: Any) -> LLMResponse
```

Send a message to Claude and score the response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` | The user message content. | *required* |
| `context` | `str \| None` | Optional source document. When provided, SGI scoring is used; otherwise DGI scoring is applied. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments forwarded to the underlying Anthropic messages call. | `{}` |

Returns:

| Type | Description |
|---|---|
| `LLMResponse` | LLMResponse containing the generated text, model identifier, usage metadata, and a groundlens hallucination score. |

Raises:

| Type | Description |
|---|---|
| `APIError` | If the API call fails. |

Example

```python
>>> llm = GroundlensAnthropic(api_key="sk-ant-...")
>>> resp = llm.chat("Explain photosynthesis.")
>>> resp.text
'Photosynthesis is the process by which...'
```
Source code in src/groundlens/providers/anthropic.py
```python
complete(prompt: str, context: str | None = None) -> LLMResponse
```

Generate a completion for the given prompt.

Convenience method that delegates to `chat`.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` | The user prompt or instruction. | *required* |
| `context` | `str \| None` | Optional source document for grounded evaluation. | `None` |

Returns:

| Type | Description |
|---|---|
| `LLMResponse` | LLMResponse with generated text and groundlens score. |
Source code in src/groundlens/providers/anthropic.py
GroundlensGemini¶
```python
groundlens.providers.google.GroundlensGemini(api_key: str, model: str = 'gemini-2.0-flash', groundlens_model: str = 'all-MiniLM-L6-v2', groundlens_threshold: float = 0.45)
```

Google Gemini provider with built-in groundlens scoring.

Wraps the Google Generative AI SDK and automatically evaluates each response for hallucination risk.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `api_key` | `str` | Google AI API key. | *required* |
| `model` | `str` | Gemini model to use for generation. | `'gemini-2.0-flash'` |
| `groundlens_model` | `str` | Sentence-transformer model for groundlens scoring. | `'all-MiniLM-L6-v2'` |
| `groundlens_threshold` | `float` | Score threshold override (reserved for future use). | `0.45` |

Example

```python
>>> llm = GroundlensGemini(api_key="AI...")
>>> resp = llm.chat("Summarize this.", context="Source text here.")
>>> print(resp.groundlens_score.explanation)
```
Source code in src/groundlens/providers/google.py
Functions¶
```python
chat(prompt: str, context: str | None = None, **kwargs: Any) -> LLMResponse
```

Send a prompt to Gemini and score the response.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` | The user message content. | *required* |
| `context` | `str \| None` | Optional source document. When provided, SGI scoring is used; otherwise DGI scoring is applied. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments forwarded to the underlying Gemini generation call. | `{}` |

Returns:

| Type | Description |
|---|---|
| `LLMResponse` | LLMResponse containing the generated text, model identifier, usage metadata, and a groundlens hallucination score. |

Raises:

| Type | Description |
|---|---|
| `GoogleAPIError` | If the API call fails. |

Example

```python
>>> llm = GroundlensGemini(api_key="AI...")
>>> resp = llm.chat("Explain gravity.")
>>> resp.text
'Gravity is a fundamental force...'
```
Source code in src/groundlens/providers/google.py
```python
complete(prompt: str, context: str | None = None) -> LLMResponse
```

Generate a completion for the given prompt.

Convenience method that delegates to `chat`.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` | The user prompt or instruction. | *required* |
| `context` | `str \| None` | Optional source document for grounded evaluation. | `None` |

Returns:

| Type | Description |
|---|---|
| `LLMResponse` | LLMResponse with generated text and groundlens score. |
Source code in src/groundlens/providers/google.py
Integrations¶
GroundlensEvaluator (LangChain)¶
```python
groundlens.integrations.langchain.evaluator.GroundlensEvaluator(groundlens_model: str = 'all-MiniLM-L6-v2', input_key: str = 'question', output_key: str = 'output', context_key: str = 'context')
```

LangSmith run evaluator that scores outputs with groundlens.

Extracts input, output, and optional context from LangSmith runs and examples, then computes SGI (when context is available) or DGI (context-free) scores.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `groundlens_model` | `str` | Sentence-transformer model for groundlens scoring. | `'all-MiniLM-L6-v2'` |
| `input_key` | `str` | Key to extract the question from run inputs. | `'question'` |
| `output_key` | `str` | Key to extract the response from run outputs. | `'output'` |
| `context_key` | `str` | Key to extract context from example inputs. | `'context'` |

Example

```python
>>> evaluator = GroundlensEvaluator()

>>> # Typically used with LangSmith evaluate():
>>> from langsmith import evaluate
>>> evaluate(chain, data="dataset", evaluators=[evaluator])
```
Source code in src/groundlens/integrations/langchain/evaluator.py
Functions¶
```python
evaluate_run(run: Run, example: Example | None = None) -> Any
```

Evaluate a LangSmith run for hallucination risk.

Extracts the question from run inputs, the response from run outputs, and optionally context from the example inputs. Returns a LangSmith `EvaluationResult` with the groundlens score.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `run` | `Run` | The LangSmith run to evaluate. Must have `inputs` and `outputs` populated. | *required* |
| `example` | `Example \| None` | Optional LangSmith example providing ground truth or context for SGI evaluation. | `None` |

Returns:

| Type | Description |
|---|---|
| `Any` | An `EvaluationResult` with the key `'groundlens'`, the normalized score, and a comment containing the explanation. |

Example

```python
>>> evaluator = GroundlensEvaluator()
>>> result = evaluator.evaluate_run(run, example)
>>> result.key
'groundlens'
```
Source code in src/groundlens/integrations/langchain/evaluator.py
GroundlensCallback (LangChain)¶
```python
groundlens.integrations.langchain.callback.GroundlensCallback(groundlens_model: str = 'all-MiniLM-L6-v2', context_key: str = 'context')
```

LangChain callback handler that scores every LLM response with groundlens.

Stores prompts on `on_llm_start` and evaluates responses on `on_llm_end`. Flagged results are logged as warnings. Scores are accumulated in `scores` for later inspection.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `groundlens_model` | `str` | Sentence-transformer model for groundlens scoring. | `'all-MiniLM-L6-v2'` |
| `context_key` | `str` | Metadata key used to look up context. | `'context'` |

Example

```python
>>> cb = GroundlensCallback()

>>> # Use as a LangChain callback
>>> from langchain_openai import ChatOpenAI
>>> llm = ChatOpenAI(callbacks=[cb])
>>> result = llm.invoke("Summarize the document.")

>>> # Inspect scores after execution
>>> for run_id, score in cb.scores.items():
...     print(f"{run_id}: {score.explanation}")
```
Source code in src/groundlens/integrations/langchain/callback.py
Functions¶
```python
on_llm_start(serialized: dict[str, Any], prompts: list[str], *, run_id: UUID, **kwargs: Any) -> None
```

Store prompts when an LLM call begins.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `serialized` | `dict[str, Any]` | Serialized LLM configuration. | *required* |
| `prompts` | `list[str]` | List of prompt strings sent to the LLM. | *required* |
| `run_id` | `UUID` | Unique identifier for this LLM run. | *required* |
| `**kwargs` | `Any` | Additional keyword arguments from LangChain. | `{}` |
Source code in src/groundlens/integrations/langchain/callback.py
```python
on_llm_end(response: LLMResult, *, run_id: UUID, **kwargs: Any) -> None
```

Evaluate the LLM response for hallucination risk.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `response` | `LLMResult` | The LLM result containing generated text. | *required* |
| `run_id` | `UUID` | Unique identifier for this LLM run. | *required* |
| `**kwargs` | `Any` | Additional keyword arguments from LangChain. | `{}` |
Source code in src/groundlens/integrations/langchain/callback.py
```python
on_llm_error(error: BaseException, *, run_id: UUID, **kwargs: Any) -> None
```

Clean up state when an LLM call fails.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `error` | `BaseException` | The exception that caused the LLM call to fail. | *required* |
| `run_id` | `UUID` | Unique identifier for this LLM run. | *required* |
| `**kwargs` | `Any` | Additional keyword arguments from LangChain. | `{}` |
Source code in src/groundlens/integrations/langchain/callback.py
GroundlensTool (CrewAI)¶
```python
groundlens.integrations.crewai.tool.GroundlensTool(name: str = 'groundlens_verify', description: str | None = None, groundlens_model: str = 'all-MiniLM-L6-v2')
```

CrewAI tool for verifying LLM outputs using groundlens.

Extends the CrewAI tool pattern to let agents self-verify their outputs. The tool evaluates a question-response pair (with optional context) and returns a human-readable verification summary.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Tool name visible to the agent. | `'groundlens_verify'` |
| `description` | `str \| None` | Tool description for agent tool selection. | `None` |
| `groundlens_model` | `str` | Sentence-transformer model for groundlens scoring. | `'all-MiniLM-L6-v2'` |

Example

```python
>>> from groundlens.integrations.crewai import GroundlensTool
>>> tool = GroundlensTool()

>>> # Agent uses the tool to verify its own output
>>> result = tool._run(
...     question="What causes rain?",
...     response="Rain is caused by condensation.",
...     context="Water cycle: evaporation, condensation, precipitation.",
... )
>>> "PASS" in result or "FLAGGED" in result
True
```
Source code in src/groundlens/integrations/crewai/tool.py
GroundlensFilter (Semantic Kernel)¶
```python
groundlens.integrations.semantic_kernel.filter.GroundlensFilter(groundlens_model: str = 'all-MiniLM-L6-v2', input_key: str = 'input', context_key: str = 'context')
```

Semantic Kernel function invocation filter with groundlens scoring.

Intercepts function invocation results and evaluates them for hallucination risk. Scores are attached to the invocation context metadata under the `"groundlens_score"` key and stored in `scores` for later inspection.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `groundlens_model` | `str` | Sentence-transformer model for groundlens scoring. | `'all-MiniLM-L6-v2'` |
| `input_key` | `str` | Key to extract the question from function arguments. | `'input'` |
| `context_key` | `str` | Key to extract context from function arguments. | `'context'` |

Example

```python
>>> filt = GroundlensFilter()

>>> # Register with a Semantic Kernel instance
>>> kernel.add_filter("function_invocation", filt)

>>> # After invocation, inspect scores:
>>> for fn_name, score in filt.scores:
...     print(f"{fn_name}: {score.explanation}")
```
Source code in src/groundlens/integrations/semantic_kernel/filter.py
Functions¶
```python
on_function_invocation(context: Any, next_handler: Callable[..., Awaitable[None]]) -> None
```

*async*

Intercept a function invocation and evaluate the result.

Calls the next filter/function in the pipeline, then evaluates the result with groundlens. Attaches the score to the context metadata.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `context` | `Any` | The Semantic Kernel function invocation context. | *required* |
| `next_handler` | `Callable[..., Awaitable[None]]` | The next handler in the filter pipeline. | *required* |

Note

This method is called automatically by Semantic Kernel when registered as a function invocation filter.
Source code in src/groundlens/integrations/semantic_kernel/filter.py
GroundlensChecker (AutoGen)¶
```python
groundlens.integrations.autogen.checker.GroundlensChecker(groundlens_model: str = 'all-MiniLM-L6-v2', context_key: str = 'context')
```

AutoGen reply checker that evaluates messages with groundlens.

Designed to be used as a reply validation step in AutoGen agent conversations. Evaluates the last assistant message against the preceding user message for hallucination risk.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `groundlens_model` | `str` | Sentence-transformer model for groundlens scoring. | `'all-MiniLM-L6-v2'` |
| `context_key` | `str` | Key to look for context in message metadata. | `'context'` |

Example

```python
>>> checker = GroundlensChecker()
>>> messages = [
...     {"role": "user", "content": "Summarize this document."},
...     {"role": "assistant", "content": "The document discusses..."},
... ]
>>> result = checker.check(messages, sender=None)
>>> result["method"]
'dgi'
>>> result["flagged"]
False
```
Source code in src/groundlens/integrations/autogen/checker.py
Functions¶
```python
check(messages: list[dict[str, Any]], sender: Any, **kwargs: Any) -> dict[str, Any]
```

Evaluate the last message in the conversation.

Extracts the last assistant message as the response and the most recent preceding user message as the question. If context is found in message metadata, SGI scoring is used; otherwise DGI is applied.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `messages` | `list[dict[str, Any]]` | List of conversation message dicts. Each dict should have `role` and `content` keys. | *required* |
| `sender` | `Any` | The AutoGen agent that sent the last message. Used for logging; can be `None`. | *required* |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | A dict containing at least `score`, `method`, and `flagged`. |

Example

```python
>>> checker = GroundlensChecker()
>>> result = checker.check(
...     messages=[
...         {"role": "user", "content": "What is 2+2?"},
...         {"role": "assistant", "content": "2+2 equals 4."},
...     ],
...     sender=None,
... )
>>> isinstance(result["score"], float)
True
```
Source code in src/groundlens/integrations/autogen/checker.py
Internal Modules¶
Internal API
The following modules are internal implementation details. They are documented here for completeness but are not part of the public API and may change without notice.
Geometry Primitives¶
`groundlens._internal.geometry`
Geometric primitives for embedding space operations.
This module provides the mathematical building blocks used by SGI and DGI. All operations are on vectors in R^n (the embedding space of a sentence transformer), which can be understood geometrically on the unit hypersphere S^(n-1) when vectors are L2-normalized.
Key concepts:

- Euclidean distance in R^n is used by SGI to compare how far the response embedding is from the question vs. the context.
- Displacement vectors (r - q) capture the semantic "movement" from question to response. DGI projects these onto a reference direction.
- Unit normalization maps vectors to S^(n-1). On the unit hypersphere, dot product equals cosine similarity, and Euclidean distance is a monotonic function of angular distance.
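The last point can be checked numerically. A minimal NumPy sketch with illustrative vectors (not library code):

```python
import numpy as np

a = np.array([0.3, 0.4, 0.5])
b = np.array([0.1, 0.9, 0.2])
ua = a / np.linalg.norm(a)  # project onto the unit hypersphere
ub = b / np.linalg.norm(b)

cos_sim = float(ua @ ub)                 # dot product == cosine similarity on S^(n-1)
euclid = float(np.linalg.norm(ua - ub))  # chord length between the unit vectors

# ||ua - ub||^2 = 2 - 2*cos_sim, so Euclidean distance grows monotonically
# as the angle between the vectors grows.
assert abs(euclid**2 - (2 - 2 * cos_sim)) < 1e-12
```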
References

- Marin (2025). Semantic Grounding Index. arXiv:2512.13771.
- Marin (2026). A Geometric Taxonomy of Hallucinations. arXiv:2602.13224v3.
Functions¶
```python
euclidean_distance(a: EmbeddingVector, b: EmbeddingVector) -> float
```

Compute Euclidean distance between two embedding vectors.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `a` | `EmbeddingVector` | First embedding vector, shape (d,). | *required* |
| `b` | `EmbeddingVector` | Second embedding vector, shape (d,). | *required* |

Returns:

| Type | Description |
|---|---|
| `float` | Non-negative scalar distance. |
Source code in src/groundlens/_internal/geometry.py
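A minimal numpy stand-in for this primitive (the actual implementation lives in src/groundlens/_internal/geometry.py and may differ in detail):

```python
import numpy as np

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Non-negative L2 distance between two shape-(d,) embedding vectors."""
    return float(np.linalg.norm(a - b))

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
print(euclidean_distance(x, y))  # sqrt(2) ≈ 1.4142
```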
unit_normalize(v: EmbeddingVector) -> EmbeddingVector
¶
Project vector onto the unit hypersphere S^(n-1).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| v | EmbeddingVector | Input vector, shape (d,). | required |

Returns:

| Type | Description |
|---|---|
| EmbeddingVector | Unit vector v / \|\|v\|\|, or the zero vector if \|\|v\|\| < epsilon. |
Source code in src/groundlens/_internal/geometry.py
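A sketch of the documented behavior, including the near-zero guard; the epsilon value here is an assumption, not groundlens's actual cutoff:

```python
import numpy as np

EPS = 1e-12  # assumed epsilon; the real cutoff may differ

def unit_normalize(v: np.ndarray) -> np.ndarray:
    """Project v onto S^(n-1): return v / ||v||, or zeros if ||v|| < EPS."""
    norm = np.linalg.norm(v)
    if norm < EPS:
        return np.zeros_like(v)
    return v / norm

u = unit_normalize(np.array([3.0, 4.0]))
print(u)  # [0.6 0.8] -- a unit vector
```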
displacement_vector(question_emb: EmbeddingVector, response_emb: EmbeddingVector) -> EmbeddingVector
¶
Compute the displacement from question to response in embedding space.
The displacement delta = phi(response) - phi(question) captures the semantic transformation applied by the LLM when generating a response. In grounded responses, this displacement aligns with a characteristic reference direction.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| question_emb | EmbeddingVector | Question embedding, shape (d,). | required |
| response_emb | EmbeddingVector | Response embedding, shape (d,). | required |

Returns:

| Type | Description |
|---|---|
| EmbeddingVector | Displacement vector, shape (d,). |
Source code in src/groundlens/_internal/geometry.py
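The displacement itself is a one-line subtraction. The sketch below also shows a DGI-style projection of that displacement onto a reference direction mu; here mu is an arbitrary placeholder, not a calibrated direction:

```python
import numpy as np

def displacement_vector(question_emb: np.ndarray, response_emb: np.ndarray) -> np.ndarray:
    """delta = phi(response) - phi(question): the semantic movement in embedding space."""
    return response_emb - question_emb

q = np.array([0.1, 0.2, 0.3])
r = np.array([0.4, 0.2, 0.1])
delta = displacement_vector(q, r)

# DGI-style alignment: cosine of the displacement against mu
mu = np.array([1.0, 0.0, 0.0])
alignment = float(delta @ mu / (np.linalg.norm(delta) * np.linalg.norm(mu)))
```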
cosine_similarity(a: EmbeddingVector, b: EmbeddingVector) -> float
¶
Compute cosine similarity between two vectors.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| a | EmbeddingVector | First vector, shape (d,). | required |
| b | EmbeddingVector | Second vector, shape (d,). | required |

Returns:

| Type | Description |
|---|---|
| float | Cosine similarity in [-1, 1]. Returns 0.0 if either vector has near-zero norm. |
Source code in src/groundlens/_internal/geometry.py
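A sketch matching the documented contract, including the 0.0 return for near-zero norms (the EPS cutoff is an assumption):

```python
import numpy as np

EPS = 1e-12  # assumed near-zero-norm cutoff

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity in [-1, 1]; 0.0 when either norm is near zero."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na < EPS or nb < EPS:
        return 0.0
    return float(a @ b / (na * nb))

x = np.array([1.0, 0.0])
print(cosine_similarity(x, np.array([0.0, 2.0])))  # 0.0 (orthogonal)
print(cosine_similarity(x, np.zeros(2)))           # 0.0 (degenerate input)
```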
mean_direction(vectors: list[EmbeddingVector]) -> EmbeddingVector
¶
Compute the mean direction of a set of unit vectors.
This is the maximum-likelihood estimate of the mean direction parameter mu of a von Mises-Fisher distribution on S^(n-1).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| vectors | list[EmbeddingVector] | List of unit-normalized vectors, each shape (d,). | required |

Returns:

| Type | Description |
|---|---|
| EmbeddingVector | Unit-normalized mean direction, shape (d,). Zero vector if the input vectors cancel out. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the input list is empty. |
Source code in src/groundlens/_internal/geometry.py
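A sketch of the vMF mean-direction estimate: sum the unit vectors, then re-normalize. The cancellation guard and its epsilon are assumptions based on the documented return contract:

```python
import numpy as np

def mean_direction(vectors: list[np.ndarray]) -> np.ndarray:
    """MLE of the von Mises-Fisher mean direction mu on S^(n-1)."""
    if not vectors:
        raise ValueError("mean_direction requires a non-empty list")
    total = np.sum(vectors, axis=0)
    norm = np.linalg.norm(total)
    if norm < 1e-12:  # opposing vectors cancel out
        return np.zeros_like(total)
    return total / norm

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])
print(mean_direction([e1, e2]))  # the 45-degree bisector, ~[0.7071 0.7071]
```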
Thresholds¶
groundlens._internal.thresholds
¶
Threshold constants and normalization functions.
All thresholds are derived empirically from the experiments reported in arXiv:2512.13771 (SGI) and arXiv:2602.13224v3 (DGI).
These constants define the decision boundaries for flagging LLM outputs as potential hallucinations. They are intentionally conservative: the default behavior is to flag for human review rather than silently pass.
Attributes¶
SGI_STRONG_PASS: float = 1.2
module-attribute
¶
SGI score indicating strong context engagement. Green zone.
SGI_REVIEW: float = 0.95
module-attribute
¶
SGI score below which output is flagged for human review. Red zone.
DGI_PASS: float = 0.3
module-attribute
¶
DGI score indicating alignment with grounded reference direction. Green zone.
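One way these constants carve the score axis into zones. This is a sketch only: the band between SGI_REVIEW and SGI_STRONG_PASS is not named in the source, so "borderline" is this sketch's label, not groundlens terminology:

```python
SGI_STRONG_PASS = 1.2   # green zone floor
SGI_REVIEW = 0.95       # red zone ceiling
DGI_PASS = 0.3

def sgi_zone(raw_sgi: float) -> str:
    """Classify a raw SGI score against the documented thresholds."""
    if raw_sgi >= SGI_STRONG_PASS:
        return "green"       # strong context engagement
    if raw_sgi >= SGI_REVIEW:
        return "borderline"  # between the documented zones (sketch's own label)
    return "red"             # flag for human review

def dgi_passes(raw_dgi: float) -> bool:
    """True when the displacement aligns with the grounded reference direction."""
    return raw_dgi >= DGI_PASS

print(sgi_zone(1.5), sgi_zone(1.0), sgi_zone(0.8))  # green borderline red
```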
Functions¶
normalize_sgi(raw_sgi: float) -> float
¶
Normalize raw SGI score to [0, 1] range.
Uses a tanh mapping with an offset to produce a smooth sigmoid curve:

normalized = tanh(max(0, raw - 0.3))
This maps the raw SGI range (~0.5 to ~2.0) into a [0, 1] range suitable for dashboards and threshold comparison.
Mapping reference points
SGI 0.30 → 0.000 (floor)
SGI 0.95 → 0.457 (review threshold)
SGI 1.20 → 0.604 (strong pass)
SGI 2.00 → 0.885 (very strong)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| raw_sgi | float | The raw SGI ratio (q_dist / ctx_dist). | required |

Returns:

| Type | Description |
|---|---|
| float | Score in [0.0, 1.0]. |
Source code in src/groundlens/_internal/thresholds.py
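A direct transcription of the formula as stated; the shipped implementation may apply additional scaling, so only the formula itself is sketched here:

```python
import math

def normalize_sgi(raw_sgi: float) -> float:
    """tanh mapping with offset: tanh(max(0, raw - 0.3)), landing in [0, 1)."""
    return math.tanh(max(0.0, raw_sgi - 0.3))

print(normalize_sgi(0.3))  # 0.0 -- the floor
```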
normalize_dgi(raw_dgi: float) -> float
¶
Normalize raw DGI score from [-1, 1] to [0, 1] range.
Simple linear mapping: normalized = (raw + 1) / 2.
Mapping reference points
DGI -1.0 → 0.000 (opposite to grounded direction)
DGI 0.0 → 0.500 (orthogonal)
DGI 0.3 → 0.650 (pass threshold)
DGI 1.0 → 1.000 (perfectly aligned)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| raw_dgi | float | The raw DGI cosine similarity to reference direction. | required |

Returns:

| Type | Description |
|---|---|
| float | Score in [0.0, 1.0]. |
Source code in src/groundlens/_internal/thresholds.py
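The linear map is a one-liner, and the reference points above fall out directly:

```python
def normalize_dgi(raw_dgi: float) -> float:
    """Map the cosine range [-1, 1] linearly onto [0, 1]."""
    return (raw_dgi + 1.0) / 2.0

print(normalize_dgi(-1.0), normalize_dgi(0.0), normalize_dgi(0.3), normalize_dgi(1.0))
# 0.0 0.5 0.65 1.0
```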
Constants¶
| Constant | Value | Module | Description |
|---|---|---|---|
| SGI_STRONG_PASS | 1.20 | groundlens._internal.thresholds | SGI strong pass threshold |
| SGI_REVIEW | 0.95 | groundlens._internal.thresholds | SGI review/flag threshold |
| DGI_PASS | 0.30 | groundlens._internal.thresholds | DGI pass threshold |
| DEFAULT_MODEL | "all-MiniLM-L6-v2" | groundlens._internal.embeddings | Default sentence-transformer model |
Type Summary¶
| Type | Description | Key fields |
|---|---|---|
| SGIResult | SGI computation result | value, normalized, flagged, q_dist, ctx_dist |
| DGIResult | DGI computation result | value, normalized, flagged |
| GroundlensScore | Unified evaluation result | value, normalized, flagged, method, explanation, detail |
| CalibrationResult | DGI calibration output | model, n_pairs, embedding_dim, mu_hat, concentration |
| LLMResponse | Provider response wrapper | text, model, usage, groundlens_score |
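The result types can be pictured as simple dataclasses carrying the key fields above. This is a sketch of the shape only; field order, defaults, and extra attributes in groundlens may differ:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SGIResult:
    value: float        # raw SGI ratio (q_dist / ctx_dist)
    normalized: float   # score mapped into [0, 1]
    flagged: bool       # True when the output needs human review
    q_dist: float       # response-to-question distance
    ctx_dist: float     # response-to-context distance

@dataclass(frozen=True)
class DGIResult:
    value: float        # raw cosine to the calibrated reference direction
    normalized: float   # (value + 1) / 2
    flagged: bool       # True when value falls below DGI_PASS

result = SGIResult(value=1.3, normalized=0.6, flagged=False, q_dist=0.91, ctx_dist=0.70)
print(result.flagged)  # False
```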