LangChain Integration

groundlens provides two LangChain integration components: GroundlensEvaluator for LangSmith experiment evaluation, and GroundlensCallback for real-time chain monitoring.

Installation

pip install "groundlens[langchain]"

This installs langsmith and langchain-core.

GroundlensEvaluator

The evaluator implements the LangSmith RunEvaluator protocol, enabling groundlens as an evaluator in LangSmith experiment pipelines.

Basic Usage

from groundlens.integrations.langchain import GroundlensEvaluator
from langsmith import evaluate as ls_evaluate

evaluator = GroundlensEvaluator()

# Run evaluation against a LangSmith dataset
results = ls_evaluate(
    my_chain,
    data="my-qa-dataset",
    evaluators=[evaluator],
)

Configuration

evaluator = GroundlensEvaluator(
    groundlens_model="all-MiniLM-L6-v2",  # Embedding model
    input_key="question",                 # Key for question in run inputs
    output_key="output",                  # Key for response in run outputs
    context_key="context",                # Key for context in example inputs
)
  • groundlens_model (default "all-MiniLM-L6-v2") -- sentence-transformer model used for scoring
  • input_key (default "question") -- key for extracting the question from run inputs
  • output_key (default "output") -- key for extracting the response from run outputs
  • context_key (default "context") -- key for extracting the context from example inputs

Key Extraction

The evaluator looks up the question in the run inputs using input_key. If that key is absent, it falls back to checking the "input", "query", and "prompt" keys. Similarly, for the response it checks output_key first, then "answer", "result", "text", and "response".
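The fallback order described above can be sketched with a small helper. This is a simplified illustration of the documented lookup behavior, not the library's actual implementation; first_present is a hypothetical name:

```python
def first_present(d, keys):
    """Return the value for the first key present in d, else None."""
    for key in keys:
        if key in d:
            return d[key]
    return None

# Question: configured input_key first, then the documented fallbacks.
question = first_present(
    {"query": "What is the capital of France?"},
    ["question", "input", "query", "prompt"],
)

# Response: configured output_key first, then the documented fallbacks.
response = first_present(
    {"result": "Paris."},
    ["output", "answer", "result", "text", "response"],
)
```

Earlier keys in the list always win, so a run that contains both "question" and "query" resolves to "question".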

Context is extracted from the example.inputs dict (the ground-truth/reference data in LangSmith). When context is present, SGI scoring is used; when absent, DGI scoring is applied.
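The context-driven dispatch amounts to a single rule. The sketch below restates it in code; the function name is illustrative, not part of the groundlens API:

```python
def select_method(example_inputs: dict, context_key: str = "context") -> str:
    """Pick the scoring method based on whether reference context is present."""
    context = example_inputs.get(context_key)
    return "SGI" if context else "DGI"

select_method({"context": "reference passage", "question": "..."})  # SGI
select_method({"question": "no reference context"})                 # DGI
```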

Result Format

The evaluator returns a LangSmith EvaluationResult with:

  • key: "groundlens"
  • score: The normalized groundlens score (0--1)
  • comment: The human-readable explanation

GroundlensCallback

The callback handler intercepts every LLM call in a LangChain chain and scores the response in real time. Flagged responses generate log warnings.

Basic Usage

from groundlens.integrations.langchain import GroundlensCallback
from langchain_openai import ChatOpenAI

cb = GroundlensCallback()
llm = ChatOpenAI(callbacks=[cb])

# Every LLM call is automatically scored
result = llm.invoke("What is the capital of France?")

# Inspect scores after execution
for run_id, score in cb.scores.items():
    print(f"{run_id}: {score.method} = {score.value:.3f} ({score.explanation})")

Configuration

cb = GroundlensCallback(
    groundlens_model="all-MiniLM-L6-v2",  # Embedding model
    context_key="context",                # Metadata key for context
)

Providing Context

Pass context via the metadata parameter:

result = llm.invoke(
    "Summarize this document.",
    metadata={"context": "The document discusses renewable energy trends..."},
)

When context is found in metadata, SGI scoring is used. Otherwise, DGI scoring is applied.

Accessing Scores

All scores are stored in cb.scores, keyed by LangChain run UUID:

cb = GroundlensCallback()
llm = ChatOpenAI(callbacks=[cb])

llm.invoke("Question 1?")
llm.invoke("Question 2?")

# Iterate all scores
for run_id, score in cb.scores.items():
    if score.flagged:
        print(f"WARNING: Run {run_id} flagged -- {score.explanation}")
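Because cb.scores is a plain dict, it is easy to aggregate after a batch of calls. The sketch below uses a stand-in dataclass for the score objects shown above (only the .value and .flagged attributes from the earlier examples are assumed); summarize is a hypothetical helper, not part of groundlens:

```python
from dataclasses import dataclass

# Stand-in for the score objects shown above; real scores come from
# groundlens, but only .value and .flagged are used here.
@dataclass
class Score:
    value: float
    flagged: bool

def summarize(scores: dict) -> dict:
    """Aggregate a run_id -> score mapping into a quick health summary."""
    values = [s.value for s in scores.values()]
    return {
        "runs": len(values),
        "flagged": sum(1 for s in scores.values() if s.flagged),
        "mean_score": sum(values) / len(values) if values else 0.0,
    }

summarize({"run-1": Score(0.9, False), "run-2": Score(0.4, True)})
```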

Logging

The callback uses Python's logging module:

  • WARNING level for flagged responses
  • INFO level for passing responses
  • DEBUG level for lifecycle events (start, end, error)

import logging
logging.basicConfig(level=logging.INFO)

# Now groundlens callback events appear in logs
cb = GroundlensCallback()

Using Both Together

For comprehensive monitoring, use the callback for real-time alerting and the evaluator for batch experiment evaluation:

from groundlens.integrations.langchain import GroundlensCallback, GroundlensEvaluator
from langchain_openai import ChatOpenAI
from langsmith import evaluate as ls_evaluate

# Real-time monitoring
cb = GroundlensCallback()
llm = ChatOpenAI(callbacks=[cb])

# Batch evaluation
evaluator = GroundlensEvaluator()
results = ls_evaluate(llm, data="dataset", evaluators=[evaluator])