This guide covers setup for the all-MiniLM-L6-v2 Sentence Transformers model, which maps sentences and short paragraphs to a 384-dimensional dense vector space, making it useful for tasks such as semantic search and clustering.
brew install git-lfs
git lfs install
git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
cd all-MiniLM-L6-v2
On macOS/Linux:
python -m venv .env
source .env/bin/activate
On Windows:
python -m venv .env
.env\Scripts\activate
pip install -U sentence-transformers
Create a file named main.py inside your project folder and paste in the following script:
from sentence_transformers import SentenceTransformer
import numpy as np

# Function to calculate cosine similarity
def cosine_similarity(embedding1, embedding2):
    return np.dot(embedding1, embedding2) / (np.linalg.norm(embedding1) * np.linalg.norm(embedding2))

# Initialize the model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# User input for the primary sentence
primary_sentence = input("Enter the primary sentence: ")

# User input for the number of secondary sentences
num_secondary_sentences = int(input("How many secondary sentences do you want to enter? "))

# Collect the secondary sentences
secondary_sentences = []
for i in range(num_secondary_sentences):
    sentence = input(f"Enter secondary sentence {i+1}: ")
    secondary_sentences.append(sentence)

# Embed the primary sentence once, then compare it to each secondary sentence
primary_embedding = model.encode(primary_sentence)
for secondary_sentence in secondary_sentences:
    secondary_embedding = model.encode(secondary_sentence)
    similarity = cosine_similarity(primary_embedding, secondary_embedding)
    print(f"Similarity score between \"{primary_sentence}\" and \"{secondary_sentence}\": {similarity:.2f}")
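The cosine_similarity function above returns a value between -1 and 1: identical directions score 1.0, unrelated (orthogonal) vectors score 0.0, and opposite directions score -1.0. A quick standalone sanity check of these properties, using plain NumPy and small stand-in vectors so it runs without downloading the model:

```python
import numpy as np

def cosine_similarity(embedding1, embedding2):
    return np.dot(embedding1, embedding2) / (np.linalg.norm(embedding1) * np.linalg.norm(embedding2))

a = np.array([1.0, 2.0, 3.0])
b = np.array([-3.0, 0.0, 1.0])  # chosen so that a . b == 0 (orthogonal)

print(round(cosine_similarity(a, a), 2))   # identical vectors  -> 1.0
print(round(cosine_similarity(a, -a), 2))  # opposite vectors   -> -1.0
print(round(cosine_similarity(a, b), 2))   # orthogonal vectors -> 0.0
```

With real sentence embeddings, scores typically fall between 0 and 1, and higher scores indicate closer meaning.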
Run the script from your terminal with python main.py (or python3 main.py, depending on your system).
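The same comparison logic extends naturally to a small semantic search: embed a query and a set of documents, then rank the documents by similarity to the query. A minimal sketch, where hypothetical hand-made 4-dimensional vectors stand in for the model's 384-dimensional embeddings (in the real script, each vector would come from model.encode):

```python
import numpy as np

def cosine_similarity(embedding1, embedding2):
    return np.dot(embedding1, embedding2) / (np.linalg.norm(embedding1) * np.linalg.norm(embedding2))

# Stand-in embeddings; real ones would come from model.encode(...)
query = np.array([1.0, 0.0, 1.0, 0.0])
corpus = {
    "doc A": np.array([0.9, 0.1, 1.1, 0.0]),  # points in nearly the same direction as the query
    "doc B": np.array([0.0, 1.0, 0.0, 1.0]),  # orthogonal to the query
    "doc C": np.array([1.0, 0.5, 0.5, 0.5]),  # partially aligned with the query
}

# Rank documents by similarity to the query, highest first
ranked = sorted(corpus, key=lambda name: cosine_similarity(query, corpus[name]), reverse=True)
for name in ranked:
    print(name, round(cosine_similarity(query, corpus[name]), 3))
```

This prints the documents in order of decreasing similarity, with the nearly-aligned "doc A" first and the orthogonal "doc B" last.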