📊 Evaluation
The effectiveness of a RAG pipeline is assessed through four key evaluation benchmarks: Context Relevance, Answer Relevance, Groundedness, and Ground Truth. Each benchmark is scored on a scale from 0 to 10.
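Every snippet on this page operates on a pipeline object that wraps the retriever and the LLM. The following is a minimal setup sketch assuming a BeyondLLM-style API; the module and function names (source.fit, retrieve.auto_retriever, generator.Generate, ChatOpenAIModel) and their parameters are assumptions inferred from the descriptions here, not guaranteed by this page.

```python
# Minimal setup sketch; all names below are assumptions (see note above).
from beyondllm import source, retrieve, generator
from beyondllm.llms import ChatOpenAIModel

# Ingest a document and split it into chunks for retrieval.
data = source.fit(path="sample.pdf", dtype="pdf", chunk_size=512, chunk_overlap=50)

# auto_retriever fetches the chunks most similar to the user query.
retriever = retrieve.auto_retriever(data, type="normal", top_k=4)

# The LLM that answers the query; it is also used to score the benchmarks.
llm = ChatOpenAIModel(api_key="<OPENAI_API_KEY>")

user_query = "What is the document about?"
pipeline = generator.Generate(question=user_query, retriever=retriever, llm=llm)

print(pipeline.call())  # the generated answer for the user query
```

The evaluation calls shown in the sections below are then made on this pipeline object.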
Context Relevance
Measures the relevance of the chunks retrieved by the auto_retriever in relation to the user's query. This determines how effectively the auto_retriever fetches contextually relevant information, ensuring that the foundation for generating responses is solid. A score from 0 (least relevant) to 10 (most relevant) evaluates the retriever's performance in sourcing relevant data.
Parameters
User Query: The question/query for which a response is generated.
Code snippet
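A minimal sketch, assuming the pipeline object from the setup above exposes a get_context_relevancy() method (the method name is an assumption):

```python
# `pipeline` is the Generate object from the setup sketch above.
# Scores how relevant the retrieved chunks are to the user query (0-10).
print(pipeline.get_context_relevancy())
```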
Answer Relevance
Evaluates the relevance of the LLM's response to the user query. It assesses the LLM's ability to generate useful and appropriate answers, reflecting its utility in practical scenarios. A score from 0 (irrelevant) to 10 (highly relevant) quantifies the relevance of responses to user queries.
Parameters
User Query: The question/query for which a response is generated.
Code snippet
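A minimal sketch, assuming an analogous get_answer_relevancy() method on the same pipeline object (again an assumed name):

```python
# Scores how relevant the generated answer is to the user query (0-10).
print(pipeline.get_answer_relevancy())
```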
Groundedness
Determines the extent to which the language model's response is grounded in the information retrieved by the auto_retriever. By identifying hallucinated content, it ensures that outputs are based on factual information. The response is divided into statements, each of which is cross-referenced with the retrieved chunks and scored from 0 (completely hallucinated) to 10 (fully grounded).
Parameters
User Query: The question/query for which a response is generated.
Code snippet
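A minimal sketch, assuming a get_groundedness() method on the pipeline object (an assumed name):

```python
# Splits the answer into statements, cross-references each with the
# retrieved chunks, and returns a 0-10 groundedness score.
print(pipeline.get_groundedness())
```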
Ground Truth
Measures the alignment between the LLM's response and a predefined correct answer provided by the user. This evaluates the overall effectiveness of the pipeline in understanding and responding to queries as intended, serving as a comprehensive, end-to-end benchmark of performance. A score from 0 (no match) to 10 (exact match) reflects how closely the response matches the ground truth answer.
Parameters
User Query: The question/query for which a response is generated.
Ground Truth: The actual (expected) answer to the user query passed earlier.
Code snippet
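A minimal sketch, assuming a get_ground_truth() method that takes the reference answer as its argument; the method name and the example answer below are assumptions:

```python
# A user-supplied reference answer (hypothetical example).
ground_truth = "The expected, correct answer to the user query."

# Scores how closely the generated answer matches the reference (0-10).
print(pipeline.get_ground_truth(ground_truth))
```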
RAG Triad
Computes and returns the relevancy (Context and Answer) and Groundedness scores for the response generated by the pipeline. This method calculates all three key evaluation metrics in a single call:
Context Relevancy
Answer Relevancy
Groundedness
Code snippet
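A minimal sketch, assuming a single get_rag_triad_evals() method that bundles the three metrics (the method name is an assumption):

```python
# Computes context relevancy, answer relevancy, and groundedness in one call.
print(pipeline.get_rag_triad_evals())
```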