# Evaluate retriever

### Retriever Evaluation in BeyondLLM

Evaluating your chosen retriever shows whether your BeyondLLM application is surfacing the right context for each query. Retriever evaluation helps you:

* **Measure Retrieval Quality:** Quantify how well the retriever identifies relevant information from the knowledge base based on user queries.
* **Compare Different Retrievers:** Assess and compare the performance of the available retriever types (Normal, Reranker, Hybrid) to determine the best option for your specific application.
* **Optimize Retrieval Parameters:** Tune parameters such as `top_k` and the choice of reranker model to improve retrieval effectiveness.

### Evaluation Metrics

BeyondLLM offers two key metrics for retriever evaluation:

* **Hit Rate:** This metric represents the percentage of queries where the retriever successfully retrieves at least one relevant document from the knowledge base. A higher hit rate indicates better overall retrieval performance.
* **Mean Reciprocal Rank (MRR):** This metric considers the ranking of relevant documents within the retrieved results. It calculates the reciprocal of the rank of the first relevant document for each query and averages these values across all queries. A higher MRR signifies that relevant documents are ranked higher in the retrieval results.
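Both metrics can be computed from a single list: the 1-based rank at which the first relevant document appeared for each query. The following is a minimal sketch of the definitions above, not BeyondLLM's internal implementation; the function names `hit_rate` and `mean_reciprocal_rank` and the use of `None` for "not retrieved" are assumptions made for illustration.

```python
from typing import Optional, Sequence


def hit_rate(ranks: Sequence[Optional[int]]) -> float:
    """Fraction of queries for which a relevant document was retrieved at all."""
    if not ranks:
        return 0.0
    return sum(rank is not None for rank in ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries; a miss contributes 0."""
    if not ranks:
        return 0.0
    return sum(1.0 / rank for rank in ranks if rank is not None) / len(ranks)


# Hypothetical ranks for four queries (None means no relevant document retrieved).
ranks = [1, 3, None, 2]
print(f"Hit Rate: {hit_rate(ranks):.4f}")          # 3 of 4 queries hit: 0.7500
print(f"MRR: {mean_reciprocal_rank(ranks):.4f}")   # (1 + 1/3 + 0 + 1/2) / 4: 0.4583
```

Note that the two metrics can disagree: a retriever that always finds the relevant document but places it last has a perfect hit rate and a low MRR.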

### Evaluation Process with `retriever.evaluate(llm)`

The `retriever.evaluate(llm)` function facilitates the evaluation process by automatically generating question-answer pairs from your data using the provided Large Language Model (LLM). These QA pairs are then used to assess the retriever's performance based on the hit rate and MRR metrics.

**Here's how it works:**

1. **QA Pair Generation:** The LLM is prompted to generate questions based on the content of your knowledge base. For each piece of text (node) in your data, the LLM creates a set of questions that are likely to be answered by that specific text segment.
2. **Retrieval and Evaluation:** Each generated question is used as a query to the retriever. The retrieved documents are then compared to the expected relevant document (the one from which the question was generated). The hit rate and MRR are calculated based on whether the retriever successfully identified the correct document and its ranking within the results.
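The two steps above can be sketched as a small evaluation loop. This is an illustrative approximation under stated assumptions, not the code behind `retriever.evaluate(llm)`: the toy `nodes` corpus, the `keyword_retrieve` function, and the hand-written `qa_pairs` (standing in for LLM-generated questions) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Toy knowledge base: node id -> text (hypothetical data).
nodes: Dict[str, str] = {
    "n1": "hit rate measures how often the relevant node is retrieved",
    "n2": "mean reciprocal rank rewards relevant nodes ranked near the top",
    "n3": "rerankers reorder retrieved candidates with a cross encoder",
}


def keyword_retrieve(query: str) -> List[str]:
    """Stand-in retriever: rank node ids by word overlap with the query."""
    query_words = set(query.lower().split())
    return sorted(
        nodes,
        key=lambda node_id: len(query_words & set(nodes[node_id].split())),
        reverse=True,
    )


def evaluate_retriever(
    retrieve: Callable[[str], List[str]],
    qa_pairs: List[Tuple[str, str]],
    top_k: int,
) -> Dict[str, float]:
    """Score a retriever against (question, source node id) pairs."""
    if not qa_pairs:
        return {"hit_rate": 0.0, "mrr": 0.0}
    hits, reciprocal_rank_sum = 0, 0.0
    for question, expected_id in qa_pairs:
        retrieved_ids = retrieve(question)[:top_k]
        if expected_id in retrieved_ids:
            hits += 1
            reciprocal_rank_sum += 1.0 / (retrieved_ids.index(expected_id) + 1)
    total = len(qa_pairs)
    return {"hit_rate": hits / total, "mrr": reciprocal_rank_sum / total}


# Step 1 stand-in: questions paired with the node each one was written from.
qa_pairs = [
    ("what does hit rate measure", "n1"),
    ("how does mean reciprocal rank work", "n2"),
    ("what do rerankers reorder", "n3"),
    ("how often is the top node retrieved", "n2"),  # ambiguous on purpose
]

# Step 2: query the retriever with each question and score the results.
print(evaluate_retriever(keyword_retrieve, qa_pairs, top_k=2))
```

Because each question has exactly one expected source node, a question that happens to match another node better (like the last one here) lowers the scores even when the retrieved text is reasonable. This is one reason the quality of the generated questions matters.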

### **Important Considerations:**

* **LLM Calls:** Generating QA pairs requires multiple LLM calls, which can be time-consuming and resource-intensive, depending on the size of your knowledge base and the number of questions generated per text segment.
* **LLM Capabilities:** The quality of the generated QA pairs significantly impacts the evaluation results. Ensure your chosen LLM has adequate question generation capabilities and is aligned with the domain and content of your knowledge base.

### **Example Usage:**

```python
from beyondllm.source import fit
from beyondllm.retrieve import auto_retriever
from beyondllm.llms import ChatOpenAIModel

# Load and chunk the source documents
data = fit(path="<your-doc-path-here>", dtype="<your-dtype>")

# Build the retriever; uses the default FastEmbedEmbeddings model
retriever = auto_retriever(data=data, type="normal", top_k=5)

# LLM used to generate the QA pairs for evaluation
llm = ChatOpenAIModel(
    model="gpt-3.5-turbo",
    api_key="",  # supply your OpenAI API key
    model_kwargs={"max_tokens": 512, "temperature": 0.1},
)

results = retriever.evaluate(llm)

print(f"Hit Rate: {results['hit_rate']}")
print(f"MRR: {results['mrr']}")
```

