🦜️🔗 LangChain
Introduction
This section delves into the seamless integration of BeyondLLM with LangChain, a powerful toolkit for constructing and evaluating intelligent systems. By harnessing the combined capabilities of these tools, we'll demonstrate the creation of a robust document retrieval and question-answering (QA) system empowered by Retrieval-Augmented Generation (RAG).
Installation
The following code snippet installs the essential Python packages required for this integration:
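A representative install command, inferred from the libraries used in the steps below (exact package names and versions may vary):

```bash
pip install langchain langchain-community langchain-groq beyondllm pypdf sentence-transformers faiss-cpu nltk
```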
Importing Necessary Libraries
Next, we import the necessary libraries to work with LangChain, document loading, text processing, embeddings, vector stores, language models, prompts, and evaluation metrics:
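A sketch of the imports implied by the steps that follow (module paths can differ slightly between LangChain releases):

```python
import os
import re

import nltk

from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_groq import ChatGroq
```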
API Keys
Here, you'll need to replace `<your groq api key>` with your actual Groq API key to establish a connection with the language model:
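One way to do this is via an environment variable, which `ChatGroq` reads automatically:

```python
# Store the key in the environment; ChatGroq picks up GROQ_API_KEY by default.
os.environ["GROQ_API_KEY"] = "<your groq api key>"
```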
Loading PDF Documents
This code snippet employs the `PyPDFDirectoryLoader` class from LangChain to load PDF documents situated within a specified directory:
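A minimal sketch (the `data/` path is a placeholder for your own PDF directory):

```python
# Load every PDF found in the target directory.
loader = PyPDFDirectoryLoader("data/")
docs = loader.load()
```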
Text Splitting
For efficient processing, we leverage the `RecursiveCharacterTextSplitter` class to partition the loaded documents into manageable chunks. The `chunk_size` parameter controls the maximum size of each chunk, and `chunk_overlap` determines the character overlap between consecutive chunks:
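For example, with illustrative chunking values:

```python
# Split the loaded documents into overlapping chunks.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(docs)
```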
Document Embeddings
We generate document embeddings using the `HuggingFaceEmbeddings` class. This creates numerical representations that capture the semantic meaning of each document chunk. Here, we're using the pre-trained BAAI/bge-base-en-v1.5 model:
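```python
# Embed chunks with the BAAI/bge-base-en-v1.5 model from Hugging Face.
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
```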
Vector Store Creation
The document chunk embeddings are used to construct a vector store employing FAISS (Facebook AI Similarity Search). This facilitates efficient retrieval of documents similar to a given query based on their semantic closeness:
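```python
# Build a FAISS index over the chunk embeddings.
vectorstore = FAISS.from_documents(chunks, embeddings)
```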
Querying and Retrieval
- **Formulate the Query:** Define a query that represents the user's information need, for instance `query = "what causes heart diseases"`.
- **Similarity Search:** Use the vector store's `similarity_search` method to find documents that are semantically similar to the query. When the store is wrapped as a retriever, the `search_kwargs` argument lets you configure search parameters such as the number of nearest neighbors (`k`) to retrieve; both are shown in the sketch after this list:
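A sketch of both steps (the value of `k` is illustrative):

```python
# Define the user query.
query = "what causes heart diseases"

# Direct similarity search: fetch the k most similar chunks.
relevant_docs = vectorstore.similarity_search(query, k=3)

# Wrap the store as a retriever for use in the RAG chain below;
# search_kwargs configures retrieval parameters such as k.
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
```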
Language Model Initialization
We initialize a language model instance using the `ChatGroq` class from LangChain. This provides access to a powerful language model capable of generating text, translating languages, and answering questions. Remember to replace `<your groq api key>` with your actual API key:
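A sketch; the model name here is an example, so substitute any chat model Groq currently serves:

```python
# Initialize the Groq-hosted chat model (reads GROQ_API_KEY from the environment).
llm = ChatGroq(model_name="mixtral-8x7b-32768", temperature=0)
```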
Defining the Prompt Template
We'll create a prompt template using `ChatPromptTemplate` to structure the interaction between the user query and the language model. This template provides clear instructions to the model:
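The exact wording below is illustrative; any template that injects the retrieved context and the question will do:

```python
prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the following context:

{context}

Question: {question}
"""
)
```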
Creating the RAG Chain
Now, we construct the Retrieval-Augmented Generation (RAG) chain. This chain orchestrates the retrieval of relevant documents based on the user query, formats the query and retrieved documents into a prompt, feeds it to the language model, and processes the model's response:
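A sketch using the LangChain Expression Language; `format_docs` is a small helper of our own (not part of LangChain) that joins the retrieved chunks into a single string:

```python
def format_docs(docs):
    # Concatenate retrieved chunks into one context string.
    return "\n\n".join(doc.page_content for doc in docs)

# Retrieve context, fill the prompt, call the LLM, and parse the reply to a string.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```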
Extracting Numbers from Response
A helper function to extract numerical values from the generated response:
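A minimal sketch; the function name and the zero fallback are assumptions:

```python
def extract_number(response: str) -> float:
    """Return the first number found in a judge response such as 'Score: 8', or 0.0."""
    match = re.search(r"\d+(?:\.\d+)?", response)
    return float(match.group()) if match else 0.0
```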
Tokenizing Sentences
Another helper function to split the response into sentences for further analysis:
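A sketch using NLTK's Punkt tokenizer (the choice of NLTK is an assumption; a simple regex split would also work):

```python
nltk.download("punkt", quiet=True)  # one-time download of the sentence tokenizer

def tokenize_sentences(text: str) -> list[str]:
    """Split the response text into individual sentences."""
    return nltk.sent_tokenize(text)
```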
Evaluating the RAG Chain with BeyondLLM Metrics
Context Relevancy
This function assesses how relevant the retrieved context is to the given query:
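BeyondLLM ships its own evaluation prompts for this metric; the sketch below approximates it with an LLM-as-judge prompt of our own, scoring each retrieved chunk out of 10 and averaging:

```python
def measure_context_relevancy(query: str, retrieved_docs) -> float:
    """Average 0-10 judge score of how relevant each retrieved chunk is to the query."""
    scores = []
    for doc in retrieved_docs:
        judge_prompt = (
            "On a scale of 0 to 10, rate how relevant the following context is to the "
            f"question. Reply with the number only.\n\nQuestion: {query}\n\n"
            f"Context: {doc.page_content}"
        )
        scores.append(extract_number(llm.invoke(judge_prompt).content))
    return sum(scores) / len(scores) if scores else 0.0
```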
Answer Relevancy
This function evaluates how relevant the generated answer is to the given query:
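Again approximated here with a hypothetical judge prompt rather than BeyondLLM's exact one:

```python
def measure_answer_relevancy(query: str, answer: str) -> float:
    """0-10 judge score of how well the generated answer addresses the query."""
    judge_prompt = (
        "On a scale of 0 to 10, rate how well the following answer addresses the "
        f"question. Reply with the number only.\n\nQuestion: {query}\n\nAnswer: {answer}"
    )
    return extract_number(llm.invoke(judge_prompt).content)
```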
Groundedness
This function assesses how grounded the generated answer is in the provided context:
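This sketch scores each sentence of the answer against the retrieved context and averages the results, using the sentence tokenizer defined earlier (again an approximation of BeyondLLM's metric):

```python
def measure_groundedness(answer: str, retrieved_docs) -> float:
    """Average 0-10 judge score of how well each answer sentence is supported by context."""
    context = "\n\n".join(doc.page_content for doc in retrieved_docs)
    scores = []
    for sentence in tokenize_sentences(answer):
        judge_prompt = (
            "On a scale of 0 to 10, rate how well the following statement is supported "
            f"by the context. Reply with the number only.\n\nContext: {context}\n\n"
            f"Statement: {sentence}"
        )
        scores.append(extract_number(llm.invoke(judge_prompt).content))
    return sum(scores) / len(scores) if scores else 0.0
```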
Example Usage
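Putting it all together (function names follow the sketches above):

```python
# Generate an answer and gather the documents used as context.
answer = rag_chain.invoke(query)
docs = retriever.invoke(query)

print("Answer:", answer)
print("Context Relevancy:", measure_context_relevancy(query, docs))
print("Answer Relevancy:", measure_answer_relevancy(query, answer))
print("Groundedness:", measure_groundedness(answer, docs))
```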
Running this prints the generated answer along with the three evaluation scores.
In this way, we combine BeyondLLM's evaluation capabilities with LangChain's RAG framework to effectively assess the quality of generated responses based on context relevancy, answer relevancy, and groundedness.