➕How to add a new Loader?
The first phase of building a RAG pipeline is sourcing data from various origins and preparing it for use. This takes two steps: loading the data, then splitting (chunking) it so it can be handled effectively. To add a new loader, follow these three steps:
Identify and define the specific type of loader using the LlamaIndex module.
Configure the parameters of the loader accordingly.
Utilize the fit function for subsequent data processing tasks.
Here's an example of how to add a new loader for your Notion pages.
Note: Each Loader has its own documentation. We should refer to their documentation to learn how to use them.
Configure Parameters
Incorporating a new loader into the RAG pipeline requires consideration of the necessary configurations and user inputs. To achieve this, we define a dataclass that encapsulates the parameters required for configuring the loader. Within the load function, we typically initialize the loader, ensuring its readiness for subsequent operations. Additionally, if the loader necessitates retrieving a secret token from an environment variable, such configuration can be seamlessly handled within the dataclass. This standardized format ensures consistency across various loaders, such as urlLoader and youtubeLoader.
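As a minimal sketch of the idea above, the dataclass below holds the loader parameters and falls back to an environment variable for the secret token. The class name NotionLoaderConfig, the field names, the default values, and the NOTION_INTEGRATION_TOKEN variable name are assumptions made for illustration, not the Enterprise RAG's actual definitions.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class NotionLoaderConfig:
    """Hypothetical parameter container for a Notion loader."""

    chunk_size: int = 512        # assumed default, not taken from the source
    chunk_overlap: int = 100     # assumed default, not taken from the source
    integration_token: Optional[str] = None

    def __post_init__(self) -> None:
        # Fall back to the environment when no token is passed explicitly.
        # The variable name is an assumption for this sketch.
        if self.integration_token is None:
            self.integration_token = os.environ.get("NOTION_INTEGRATION_TOKEN")
        if not self.integration_token:
            raise ValueError(
                "Pass integration_token or set NOTION_INTEGRATION_TOKEN."
            )
        # An overlap as large as the chunk itself would never advance.
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size).")
```

Validating in __post_init__ keeps every loader's configuration checks in one place, so a missing token fails at construction time rather than midway through loading.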
Initialize the loader
The load function in the Enterprise RAG uses the LlamaIndex loaders. In this case, it uses the NotionPageReader.
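A load step along these lines might look like the sketch below. It assumes the llama-index-readers-notion package is installed and that NotionPageReader accepts an integration_token argument and exposes load_data(page_ids=...). The function name load_notion_page and the reader_factory parameter are hypothetical, added only so the function can be exercised without network access.

```python
from typing import Any, Callable, List, Optional


def load_notion_page(
    page_id: str,
    integration_token: str,
    reader_factory: Optional[Callable[..., Any]] = None,
) -> List[Any]:
    """Load one Notion page and return the reader's list of documents."""
    if reader_factory is None:
        # Import lazily so the dependency is only needed when actually used.
        try:
            from llama_index.readers.notion import NotionPageReader
        except ImportError as exc:
            raise ImportError(
                "Install the reader first: pip install llama-index-readers-notion"
            ) from exc
        reader_factory = NotionPageReader

    reader = reader_factory(integration_token=integration_token)
    return reader.load_data(page_ids=[page_id])
```

Check the reader's own documentation for the exact arguments it expects, since each loader differs.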
Split the Document
The split
method divides the loaded document into smaller chunks based on specified size and overlap parameters, allowing efficient processing.
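To make the size and overlap parameters concrete, here is a simplified character-based splitter. It is an illustration only: the real split method presumably relies on a LlamaIndex node parser that works on tokens and sentence boundaries, and the function name split_text is hypothetical.

```python
from typing import List


def split_text(text: str, chunk_size: int = 512, chunk_overlap: int = 100) -> List[str]:
    """Split text into character windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive.")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size).")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, which is how overlap preserves context across chunk boundaries.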
Implement the Loader
This method combines all the different methods within the dataclass and uses the base implementation to execute the loader.
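Putting the pieces together, one possible shape for such a loader is sketched below. BaseLoader here is a hypothetical stand-in for the base implementation, the split logic is a simplified character-based version, and reader_factory is an illustrative hook for testing. The sketch assumes NotionPageReader from llama-index-readers-notion and documents that expose a text attribute.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


class BaseLoader(ABC):
    """Hypothetical base class: fit() runs load() and then split()."""

    @abstractmethod
    def load(self, path: str) -> List[Any]: ...

    @abstractmethod
    def split(self, documents: List[Any]) -> List[str]: ...

    def fit(self, path: str) -> List[str]:
        documents = self.load(path)
        return self.split(documents)


@dataclass
class NotionLoader(BaseLoader):
    integration_token: str
    chunk_size: int = 512      # assumed default
    chunk_overlap: int = 100   # assumed default
    reader_factory: Optional[Callable[..., Any]] = None  # illustrative test hook

    def __post_init__(self) -> None:
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size).")

    def load(self, path: str) -> List[Any]:
        factory = self.reader_factory
        if factory is None:
            # Lazy import: only required when the real reader is used.
            from llama_index.readers.notion import NotionPageReader
            factory = NotionPageReader
        reader = factory(integration_token=self.integration_token)
        return reader.load_data(page_ids=[path])

    def split(self, documents: List[Any]) -> List[str]:
        step = self.chunk_size - self.chunk_overlap
        chunks: List[str] = []
        for doc in documents:
            text = doc.text
            for start in range(0, len(text), step):
                chunks.append(text[start:start + self.chunk_size])
                if start + self.chunk_size >= len(text):
                    break
        return chunks
```

With this layout, a caller only needs something like NotionLoader(integration_token=...).fit(page_id), and a new loader only has to supply its own load and split.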