Sentence Transformer Model Setup Guide

This guide covers setup for the all-MiniLM-L6-v2 Sentence Transformers model, which maps sentences and short paragraphs to a 384-dimensional dense vector space. The resulting embeddings are useful for tasks such as semantic search and clustering.

Installation Steps

  1. Install Git LFS, then run its one-time setup: git lfs install
  2. Clone the model repository: git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
  3. Create a new folder for your project and navigate into it: mkdir your-folder-name, then cd your-folder-name
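  4. Create and activate a virtual environment, for example: python -m venv venv, then source venv/bin/activate (on Windows: venv\Scripts\activate)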
  5. Install Sentence Transformers: pip install -U sentence-transformers
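
Once the steps above are done, a quick check can confirm that the virtual environment is active and the package can be found. The following is a minimal sketch using only the Python standard library. The helper name check_setup is made up for illustration and is not part of Sentence Transformers, and the environment test assumes the environment was created with venv (it will not detect, for example, a conda environment).

```python
import importlib.util
import sys


def check_setup(package="sentence_transformers"):
    """Report whether a venv is active and whether a package can be found.

    check_setup is a hypothetical helper, not part of any library.
    """
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    in_venv = sys.prefix != sys.base_prefix
    # find_spec returns None when a top-level package cannot be located.
    installed = importlib.util.find_spec(package) is not None
    return {"in_venv": in_venv, "installed": installed}


if __name__ == "__main__":
    status = check_setup()
    print(f"Virtual environment active: {status['in_venv']}")
    print(f"sentence_transformers installed: {status['installed']}")
```

If either value is False, revisit steps 4 and 5 before running the script.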

Running main.py

Create a file named main.py inside your project folder and paste in the following script:

        
            from sentence_transformers import SentenceTransformer
            import numpy as np

            # Function to calculate cosine similarity
            def cosine_similarity(embedding1, embedding2):
                return np.dot(embedding1, embedding2) / (np.linalg.norm(embedding1) * np.linalg.norm(embedding2))

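            # Initialize the model. The Hub ID below is downloaded and cached on first use;
            # to load the repository cloned earlier, pass the path to that folder instead.
            model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')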

            # User input for the primary sentence
            primary_sentence = input("Enter the primary sentence: ")

            # User input for the number of secondary sentences
            num_secondary_sentences = int(input("How many secondary sentences do you want to enter? "))

            # Initialize a list to hold secondary sentences
            secondary_sentences = []

            # Loop to get the secondary sentences
            for i in range(num_secondary_sentences):
                sentence = input(f"Enter secondary sentence {i+1}: ")
                secondary_sentences.append(sentence)

            # Embed the primary sentence
            primary_embedding = model.encode(primary_sentence)

            # Iterate over each secondary sentence, calculate, and print similarity
            for i, secondary_sentence in enumerate(secondary_sentences):
                secondary_embedding = model.encode(secondary_sentence)
                similarity = cosine_similarity(primary_embedding, secondary_embedding)
                print(f"Similarity score between \"{primary_sentence}\" and \"{secondary_sentence}\": {similarity:.2f}")
        
    

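Run the script from your project folder with python main.py (or python3 main.py, depending on your system). It prompts for a primary sentence and a number of secondary sentences, then prints the cosine similarity between the primary sentence and each secondary sentence, rounded to two decimal places.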
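
The score the script prints is the cosine similarity: the dot product of two vectors divided by the product of their lengths. It is 1.0 for vectors pointing in the same direction and 0.0 for orthogonal ones. The sketch below reproduces that formula in plain Python on small made-up vectors, then ranks the candidates by score, as one might for semantic search. The vectors, their labels, and the rank_by_similarity helper are illustrative assumptions, not output of the model; real embeddings from this model have 384 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (label, score) pairs sorted from most to least similar to query."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real embeddings (made up for illustration).
query = [1.0, 0.0, 0.0]
candidates = {
    "same direction": [2.0, 0.0, 0.0],
    "partly aligned": [1.0, 1.0, 0.0],
    "orthogonal": [0.0, 3.0, 0.0],
}

for label, score in rank_by_similarity(query, candidates):
    print(f"{label}: {score:.2f}")
# Prints:
# same direction: 1.00
# partly aligned: 0.71
# orthogonal: 0.00
```

Because the score depends only on direction, scaling a vector does not change it, which is why [2.0, 0.0, 0.0] scores 1.00 against [1.0, 0.0, 0.0].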