Evaluation

The effectiveness of a RAG pipeline is assessed through four key evaluation benchmarks: Context Relevance, Answer Relevance, Groundedness, and Ground Truth. Each benchmark uses a scoring range from 0 to 10.
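
The code snippets on this page reuse a query, retriever, and llm that are assumed to have been built already. As a point of reference, a minimal setup along the lines of the Quickstart Guide might look like the sketch below; the file path, chunking values, and default Gemini models are illustrative placeholders, so substitute your own data and keys.

import os
from beyondllm import source, retrieve, embeddings, llms, generator

os.environ['GOOGLE_API_KEY'] = "your-google-api-key"  # placeholder key for the default Gemini models

# Illustrative setup following the Quickstart Guide; adjust path, dtype, and chunking to your data.
data = source.fit(path="sample.pdf", dtype="pdf", chunk_size=512, chunk_overlap=50)
embed_model = embeddings.GeminiEmbeddings()
retriever = retrieve.auto_retriever(data, embed_model, type="normal", top_k=4)
llm = llms.GeminiModel()
query = "..."  # the user question you want to evaluate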

Context Relevance

Measures how relevant the chunks retrieved by the auto_retriever are to the user's query. This indicates how effectively the auto_retriever fetches contextually relevant information, ensuring a solid foundation for generating responses. A score from 0 (least relevant) to 10 (most relevant) rates the retriever's performance in sourcing relevant data.

Parameters

  • User Query: the question/query for which the response is generated.

Code snippet

pipeline = generator.Generate(question=query, retriever=retriever, llm=llm)
print(pipeline.get_context_relevancy())

Answer Relevance

Evaluates the relevance of the LLM's response to the user query. This assesses the LLM's ability to generate useful and appropriate answers, reflecting its utility in practical scenarios. A score from 0 (irrelevant) to 10 (highly relevant) quantifies how relevant the response is to the user query.

Parameters

  • User Query: the question/query for which the response is generated.

Code snippet

pipeline = generator.Generate(question=query, retriever=retriever, llm=llm)
print(pipeline.get_answer_relevancy())
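
To see the answer that the score refers to, you can print the generated response alongside it; pipeline.call() is the call used in the Quickstart Guide to obtain the LLM's response, paired here purely for illustration.

pipeline = generator.Generate(question=query, retriever=retriever, llm=llm)
print(pipeline.call())                   # the generated answer being scored
print(pipeline.get_answer_relevancy())   # how relevant that answer is to the query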

Groundedness

Determines the extent to which the language model's responses are grounded in the information retrieved by the auto_retriever, with the aim of identifying any hallucinated content and ensuring that outputs are based on factual information. The response is split into individual statements, which are cross-referenced against the retrieved chunks and scored from 0 (completely hallucinated) to 10 (fully grounded).

Parameters

  • User Query: the question/query for which the response is generated.

Code snippet

pipeline = generator.Generate(question=query, retriever=retriever, llm=llm)
print(pipeline.get_groundedness())

Ground Truth

Measures the alignment between the LLM's response and a predefined correct answer provided by the user. This evaluates the overall effectiveness of the pipeline in understanding and responding to queries as intended, serving as a comprehensive benchmark of performance: it considers the entire processing pipeline's ability to produce the expected outcome, with scores reflecting the degree of match to the ground truth answer. A score from 0 to 10 quantifies how well the LLM is performing.

Parameters

  • User Query: the question/query for which the response is generated.

  • Ground Truth: the expected (correct) answer to the user query passed above.

Code snippet

ground_truth = "..."  # the expected answer for the query (placeholder)
pipeline = generator.Generate(question=query, retriever=retriever, llm=llm)
print(pipeline.get_ground_truth(ground_truth))

RAG Triad

Computes and returns the context relevancy, answer relevancy, and groundedness scores for the response generated by the pipeline. This method calculates all three key evaluation metrics in a single call:

  • Context Relevancy

  • Answer Relevancy

  • Groundedness

Code snippet

pipeline = generator.Generate(question=query, retriever=retriever, llm=llm)
print(pipeline.get_rag_triad_evals())
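
To get a quick picture across several questions, the same call can be looped over a small evaluation set. This is plain Python around the documented get_rag_triad_evals() method; the example queries are placeholders.

eval_queries = ["What is ...?", "How does ...?"]  # placeholder questions
for q in eval_queries:
    p = generator.Generate(question=q, retriever=retriever, llm=llm)
    print(q)
    print(p.get_rag_triad_evals())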