📐 Finetune Embeddings

Beyondllm lets you fine-tune embedding models on your own data to achieve more accurate, domain-specific results. You can fine-tune any embedding model available on the Hugging Face Hub.

Step 1 : Importing Modules

You need an LLM to generate QA pairs for fine-tuning, and the FineTuneEmbeddings module to fine-tune the model.

from beyondllm.llms import GeminiModel
from beyondllm.embeddings import FineTuneEmbeddings

# Initializing llm
llm = GeminiModel()

# Initializing the fine-tuning engine
fine_tuned_model = FineTuneEmbeddings()

Step 2 : Data to FineTune

You need data to fine-tune your model. It can be one or more files, so make a list of all the files you want to train your model on.

list_of_files = ['your-file-here-1', 'your-file-here-2']
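If your training documents live in a directory, you can build this list programmatically instead of by hand. A minimal standard-library sketch (the directory name and file extensions are assumptions, not part of the Beyondllm API):

```python
from pathlib import Path

def collect_training_files(directory, extensions={".pdf", ".txt"}):
    # Gather every matching file in the given directory, sorted for reproducibility
    return [str(p) for p in sorted(Path(directory).glob("*")) if p.suffix in extensions]

# Hypothetical data directory; replace with your own
list_of_files = collect_training_files("data")
```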

Step 3 : Training the Model

Once everything is ready, start training with the train function of FineTuneEmbeddings.

Parameters:

  • Files : The list of files you want to train your model on.

  • Model name : The model you want to fine-tune.

  • LLM : Language model to generate the dataset for fine-tuning.

  • Output path : The path where your embedding model will be saved.

# Training the embedding model
embed_model = fine_tuned_model.train(list_of_files, "BAAI/bge-small-en-v1.5", llm, "fintune")

(Optional) Step 4 : Loading the model

Optionally, if you have already fine-tuned your model and want to use it again, you can load it with the load_model function.

Parameters:

  • Path : The path where you saved the model after fine-tuning

# Option to load an already fine-tuned model
embed_model = fine_tuned_model.load_model("fintune")
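If you are unsure whether a fine-tuned model has already been saved at a given path, a quick standard-library check avoids a failed load. A minimal sketch (the helper name is hypothetical; the path matches the output path used above):

```python
import os

def saved_model_exists(path):
    # A fine-tuned model is saved as a directory containing model files
    return os.path.isdir(path) and len(os.listdir(path)) > 0

if saved_model_exists("fintune"):
    print("Model found, loading it.")
else:
    print("No saved model, fine-tune first.")
```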

Step 5 : Voilà, Use Your Embedding Model

Set up your retriever using the fine-tuned model and use it in your use case.

from beyondllm import retrieve

# 'data' is your loaded source data (e.g. the output of source.fit)
retriever = retrieve.auto_retriever(data, embed_model, type="normal", top_k=4)
