🦜️🔗 Langchain
Introduction
This section walks through integrating BeyondLLM with LangChain, a toolkit for building applications around language models. Using the two together, we'll build a document retrieval and question-answering (QA) system powered by Retrieval-Augmented Generation (RAG), and then evaluate its responses with BeyondLLM's metrics.
Installation
The following code snippet installs the essential Python packages required for this integration:
!pip install langchain sentence-transformers chromadb llama-cpp-python langchain_community pypdf langchain-groq
!pip install beyondllm
!pip install faiss-cpu
Importing Necessary Libraries
Next, we import the necessary libraries to work with LangChain, document loading, text processing, embeddings, vector stores, language models, prompts, and evaluation metrics:
from beyondllm.utils import CONTEXT_RELEVANCE, GROUNDEDNESS, ANSWER_RELEVANCE
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain_groq import ChatGroq
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
import re
import numpy as np
import pysbd
API Keys
Here, you'll need to replace <your groq api key>
with your actual Groq API key to establish a connection with the language model:
GROQ_API_KEY = "<your groq api key>"
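If you prefer not to hard-code the key, you can also read it from an environment variable. A minimal sketch, assuming you have exported a GROQ_API_KEY environment variable beforehand:
import os
# Read the key from the environment instead of hard-coding it
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
if GROQ_API_KEY is None:
    raise ValueError("GROQ_API_KEY environment variable is not set")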
Loading PDF Documents
This code snippet employs the PyPDFDirectoryLoader
class from LangChain to load PDF documents situated within a specified directory:
loader = PyPDFDirectoryLoader("/content/sample_data/Data")
docs = loader.load()
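Optionally, you can confirm the documents loaded correctly. PyPDFDirectoryLoader returns one document per PDF page, so the counts below depend on your own files:
# Number of pages loaded across all PDFs in the directory
print(len(docs))
# Source file and page number of the first loaded page
print(docs[0].metadata)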
Text Splitting
For efficient processing, we leverage the RecursiveCharacterTextSplitter
class to partition the loaded documents into manageable chunks. The chunk_size
parameter controls the maximum size of each chunk, and chunk_overlap
determines the character overlap between consecutive chunks:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=756, chunk_overlap=50)
chunks = text_splitter.split_documents(docs)
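As a quick check, you can inspect how many chunks were produced and preview one of them (the exact numbers depend on your documents):
print(len(chunks))
print(chunks[0].page_content[:200])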
Document Embeddings
We generate document embeddings using the HuggingFaceEmbeddings
class. This creates numerical representations that capture the semantic meaning of each document chunk. Here, we're using the pre-trained BAAI/bge-base-en-v1.5 model:
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
print(embeddings)
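To verify the embedding model is working, you can embed a sample string and check the vector dimensionality; BAAI/bge-base-en-v1.5 produces 768-dimensional vectors:
sample_vector = embeddings.embed_query("what causes heart diseases")
print(len(sample_vector))  # 768 for BAAI/bge-base-en-v1.5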
Vector Store Creation
The document chunk embeddings are used to construct a vector store with FAISS (Facebook AI Similarity Search). This enables efficient retrieval of the documents most semantically similar to a given query:
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore
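If you want to reuse the index across sessions, FAISS vector stores can be persisted to disk and reloaded later. A minimal sketch, assuming a local folder named faiss_index; note that recent LangChain versions require the allow_dangerous_deserialization flag when loading a pickled index:
# Persist the index to disk
vectorstore.save_local("faiss_index")

# Reload it later with the same embedding model
vectorstore = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True,
)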
Querying and Retrieval
Formulate the Query: Define a query that represents the user's information need. For instance,
query = "what causes heart diseases"
Similarity Search: Use the vector store's similarity_search method to find documents that are semantically similar to the query. When wrapping the store as a retriever with as_retriever, the search_kwargs argument lets you configure the search, such as the number of nearest neighbors (k) to retrieve:
query = "what causes heart diseases"
search = vectorstore.similarity_search(query)
# Set up the retriever for similarity search
retriever = vectorstore.as_retriever(search_kwargs={'k': 3})
retriever.invoke(query)
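To see what the retriever actually returns, you can print a short preview of each retrieved chunk along with its source metadata:
for doc in retriever.invoke(query):
    print(doc.metadata.get("source"), doc.metadata.get("page"))
    print(doc.page_content[:150])
    print("---")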
Language Model Initialization
We initialize a language model instance using the ChatGroq
class from LangChain. This gives us access to a hosted Llama 3 model capable of generating text and answering questions. It uses the GROQ_API_KEY you set earlier:
from langchain_groq import ChatGroq
llm = ChatGroq(
    model="llama3-8b-8192",
    groq_api_key=GROQ_API_KEY,
    temperature=0.1,  # low temperature keeps the answers focused and consistent
)
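A quick sanity check that the Groq connection works (the exact wording of the reply will vary):
print(llm.invoke("Say hello in one short sentence.").content)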
Defining the Prompt Template
We'll create a prompt template using ChatPromptTemplate
to structure the interaction between the retrieved context, the user query, and the language model. The {context} and {query} placeholders are filled in by the RAG chain defined below:
template = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are an AI assistant that follows instruction extremely well. Please be truthful and give direct answers.<|eot_id|><|start_header_id|>user<|end_header_id|>
Context: {context}

Question: {query}<|eot_id|>
"""
prompt = ChatPromptTemplate.from_template(template)
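To inspect what the model will actually receive, you can format the prompt with placeholder values (the values here are only for illustration):
print(prompt.format(context="<retrieved document text>", query="what causes heart diseases"))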
Creating the RAG Chain
Now, we construct the Retrieval-Augmented Generation (RAG) chain. This chain orchestrates the retrieval of relevant documents based on the user query, formats the query and retrieved documents into a prompt, feeds it to the language model, and processes the model's response:
# Join the retrieved document chunks into a single context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "query": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
Extracting Numbers from Response
A helper function to extract numerical values from the generated response:
def extract_number(response):
    match = re.search(r'\b(10|[0-9]+)\b', response)
    if match:
        return int(match.group(0))
    return np.nan
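For example, with a couple of hypothetical model replies:
print(extract_number("Score: 8 out of 10"))          # 8
print(extract_number("I cannot provide a score."))   # nan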
Tokenizing Sentences
Another helper function to split the response into sentences for further analysis:
def sent_tokenize(text: str):
    seg = pysbd.Segmenter(language="en", clean=False)
    return seg.segment(text)
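For example:
print(sent_tokenize("Heart disease has many causes. Smoking is one of them."))
# returns a list with two sentences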
Evaluating the RAG Chain with BeyondLLM Metrics
Context Relevancy
This function assesses how relevant the retrieved context is to the given query:
def get_context_relevancy(llm, query, context):
    total_score = 0
    score_count = 0
    for content in context:
        score_response = llm.invoke(CONTEXT_RELEVANCE.format(question=query, context=content))
        # Access the content attribute directly
        score_str = score_response.content
        # Accumulate the score
        score = float(extract_number(score_str))
        total_score += score
        score_count += 1
    average_score = total_score / score_count if score_count > 0 else 0
    return f"Context Relevancy Score: {round(average_score, 1)}"
Answer Relevancy
This function evaluates how relevant the generated answer is to the given query:
def get_answer_relevancy(llm, query, response):
    score_response = llm.invoke(ANSWER_RELEVANCE.format(question=query, context=response))
    # Access the content attribute directly
    return f"Answer Relevancy Score: {score_response.content}"
Groundedness
This function assesses how grounded the generated answer is in the provided context:
def get_groundedness(llm, response, context):
    total_score = 0
    score_count = 0
    # Tokenize the response into sentences
    statements = sent_tokenize(response)
    for statement in statements:
        score_response = llm.invoke(GROUNDEDNESS.format(statement=statement, context=" ".join(context)))
        # Access the content attribute directly
        score_str = score_response.content
        # Accumulate the score
        score = float(extract_number(score_str))
        total_score += score
        score_count += 1
    average_groundedness = total_score / score_count if score_count > 0 else 0
    return f"Groundedness Score: {round(average_groundedness, 1)}"
Example Usage
# Example query
query = "what causes heart diseases?"
# Retrieve relevant documents based on the user query
retrieved_docs = retriever.invoke(query)
# Prepare the context from the retrieved documents
context = [doc.page_content for doc in retrieved_docs]
# Get context relevancy score
print(get_context_relevancy(llm, query, context))
# Generate response using RAG chain
response = rag_chain.invoke(query)
# Get answer relevancy score
print(get_answer_relevancy(llm, query, response))
# Get groundedness score
print(get_groundedness(llm, response, context))
This will give us the following output:
Context Relevancy Score: 7.7
Answer Relevancy Score: 9
Groundedness Score: 7.9
In this way, we combine BeyondLLM's evaluation capabilities with LangChain's RAG framework to assess the quality of generated responses along three dimensions: context relevancy, answer relevancy, and groundedness.