# Evaluation

The effectiveness of a RAG pipeline is assessed through four key evaluation benchmarks: Context Relevance, Answer Relevance, Groundedness, and Ground Truth. Each benchmark is scored on a scale from 0 to 10.
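
Every snippet below assumes that a `query` string, a `retriever`, and an `llm` are already in scope from the earlier core-component steps. As a minimal sketch of that setup (the file path, chunking values, and Gemini models here are illustrative assumptions; any supported source, embedding model, and LLM will do):

```python
from beyondllm import source, retrieve, embeddings, llms, generator

# Ingest and chunk a document (path and chunk sizes are placeholders).
data = source.fit(path="sample.pdf", dtype="pdf", chunk_size=512, chunk_overlap=50)

# Build the auto_retriever that the evaluation benchmarks score.
retriever = retrieve.auto_retriever(data, embed_model=embeddings.GeminiEmbeddings(),
                                    type="normal", top_k=4)

# Any supported LLM works; this assumes GOOGLE_API_KEY is set in the environment.
llm = llms.GeminiModel()

query = "What is retrieval augmented generation?"
```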

## Context Relevance

Measures how relevant the chunks retrieved by the `auto_retriever` are to the user's query. This determines how effectively the retriever fetches contextually relevant information, ensuring a solid foundation for generating responses. A score from 0 (least relevant) to 10 (most relevant) rates the retriever's performance in sourcing relevant data.

**Parameters**

* **User Query**: The question or query to generate and evaluate a response for.

**Code snippet**

```python
# Score how relevant the retrieved chunks are to the query (0-10).
pipeline = generator.Generate(question=query, retriever=retriever, llm=llm)
print(pipeline.get_context_relevancy())
```

## Answer Relevance

Evaluates the relevance of the LLM's response to the user query. It assesses the LLM's ability to generate useful and appropriate answers, reflecting its utility in practical scenarios. A score from 0 (irrelevant) to 10 (highly relevant) quantifies how well the response addresses the query.

**Parameters**

* **User Query**: The question or query to generate and evaluate a response for.

**Code snippet**

```python
# Score how relevant the generated answer is to the query (0-10).
pipeline = generator.Generate(question=query, retriever=retriever, llm=llm)
print(pipeline.get_answer_relevancy())
```

## Groundedness

Determines the extent to which the LLM's responses are grounded in the information retrieved by the `auto_retriever`, with the aim of identifying hallucinated content and ensuring that outputs are based on factual information. The response is split into individual statements, each of which is cross-referenced against the retrieved chunks; the result is scored from 0 (completely hallucinated) to 10 (fully grounded).

**Parameters**

* **User Query**: The question or query to generate and evaluate a response for.

**Code snippet**

```python
# Score how well the answer is supported by the retrieved chunks (0-10).
pipeline = generator.Generate(question=query, retriever=retriever, llm=llm)
print(pipeline.get_groundedness())
```
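
To make the statement-level idea concrete, here is a rough conceptual sketch of such a check. This is not the library's actual implementation; the sentence splitting and the word-overlap heuristic are illustrative assumptions only.

```python
# Conceptual sketch only -- NOT beyondllm's actual groundedness implementation.
def naive_groundedness(response: str, chunks: list[str]) -> float:
    """Score 0-10: fraction of response statements with strong word overlap in some chunk."""
    statements = [s.strip() for s in response.split(".") if s.strip()]
    if not statements:
        return 0.0
    grounded = 0
    for statement in statements:
        words = set(statement.lower().split())
        # A statement counts as grounded if most of its words appear in one chunk.
        if any(len(words & set(c.lower().split())) >= 0.7 * len(words) for c in chunks):
            grounded += 1
    return 10.0 * grounded / len(statements)
```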

## Ground Truth

Measures the alignment between the LLM's response and a predefined correct answer provided by the user. This evaluates the overall effectiveness of the pipeline in understanding and answering queries as intended, serving as an end-to-end benchmark of performance: it considers the entire pipeline's ability to produce the expected outcome, with a score from 0 to 10 reflecting the degree of match to the ground truth answer.

**Parameters**

* **User Query**: The question or query to generate and evaluate a response for.
* **Ground Truth**: The reference answer to the user query, supplied by the user.

**Code snippet**

```python
# `ground_truth` holds the reference answer to compare the response against.
ground_truth = "..."  # placeholder; supply your own reference answer
pipeline = generator.Generate(question=query, retriever=retriever, llm=llm)
print(pipeline.get_ground_truth(ground_truth))
```
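
For intuition, one common way to quantify "degree of match" between a generated answer and a reference is token-level F1, as used in QA benchmarks such as SQuAD. The sketch below rescales it to the 0-10 range used here; it is an illustration of the idea, not beyondllm's scoring method.

```python
from collections import Counter

def token_f1_score(prediction: str, reference: str) -> float:
    """Token-overlap F1 between prediction and reference, rescaled to 0-10."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 10.0 * 2 * precision * recall / (precision + recall)
```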

## RAG Triad

Computes and returns the relevancy (Context and Answer) and groundedness scores for the response generated by the pipeline. This method directly calculates all three key evaluation metrics:

* Context Relevancy
* Answer Relevancy
* Groundedness

**Code snippet**

```python
# Run context relevancy, answer relevancy, and groundedness in one call.
pipeline = generator.Generate(question=query, retriever=retriever, llm=llm)
print(pipeline.get_rag_triad_evals())
```
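
When comparing pipeline configurations, it can be convenient to run the triad over a batch of test questions. A minimal sketch, where the questions are placeholders for your own test set:

```python
# Evaluate the triad across a small, illustrative set of test questions.
test_queries = [
    "What is retrieval augmented generation?",
    "How does the auto_retriever chunk documents?",
]
for q in test_queries:
    pipeline = generator.Generate(question=q, retriever=retriever, llm=llm)
    print(f"Query: {q}")
    print(pipeline.get_rag_triad_evals())
```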

