This guide covers setup for the all-MiniLM-L6-v2 Sentence Transformers model, which maps sentences and short paragraphs to a 384-dimensional dense vector space, making it useful for tasks such as semantic search and clustering.
brew install git-lfs
git lfs install
git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
cd all-MiniLM-L6-v2
On macOS/Linux:
python -m venv .env
source .env/bin/activate
On Windows:
python -m venv .env
.env\Scripts\activate
pip install -U sentence-transformers
Create a file named main.py inside your project folder and paste in the following script:
from sentence_transformers import SentenceTransformer
import numpy as np

# Function to calculate cosine similarity
def cosine_similarity(embedding1, embedding2):
    return np.dot(embedding1, embedding2) / (np.linalg.norm(embedding1) * np.linalg.norm(embedding2))

# Initialize the model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# User input for the primary sentence
primary_sentence = input("Enter the primary sentence: ")

# User input for the number of secondary sentences
num_secondary_sentences = int(input("How many secondary sentences do you want to enter? "))

# Collect the secondary sentences
secondary_sentences = []
for i in range(num_secondary_sentences):
    sentence = input(f"Enter secondary sentence {i+1}: ")
    secondary_sentences.append(sentence)

# Embed the primary sentence once, then compare it to each secondary sentence
primary_embedding = model.encode(primary_sentence)
for secondary_sentence in secondary_sentences:
    secondary_embedding = model.encode(secondary_sentence)
    similarity = cosine_similarity(primary_embedding, secondary_embedding)
    print(f"Similarity score between \"{primary_sentence}\" and \"{secondary_sentence}\": {similarity:.2f}")
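The cosine_similarity function above returns a value between -1 and 1: identical directions score 1.0, unrelated (orthogonal) vectors score 0.0, and opposite directions score -1.0. A quick standalone sanity check of these properties, using plain NumPy and small stand-in vectors so it runs without downloading the model:

```python
import numpy as np

def cosine_similarity(embedding1, embedding2):
    return np.dot(embedding1, embedding2) / (np.linalg.norm(embedding1) * np.linalg.norm(embedding2))

a = np.array([1.0, 2.0, 3.0])
b = np.array([-3.0, 0.0, 1.0])  # chosen so that a . b == 0 (orthogonal)

print(round(cosine_similarity(a, a), 2))   # identical vectors  -> 1.0
print(round(cosine_similarity(a, -a), 2))  # opposite vectors   -> -1.0
print(round(cosine_similarity(a, b), 2))   # orthogonal vectors -> 0.0
```

With real sentence embeddings, scores typically fall between 0 and 1, and higher scores indicate closer meaning.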
Run the script from your terminal with python main.py (or python3 main.py, depending on your system).
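The same comparison logic extends naturally to a small semantic search: embed a query and a set of documents, then rank the documents by similarity to the query. A minimal sketch, where hypothetical hand-made 4-dimensional vectors stand in for the model's 384-dimensional embeddings (in the real script, each vector would come from model.encode):

```python
import numpy as np

def cosine_similarity(embedding1, embedding2):
    return np.dot(embedding1, embedding2) / (np.linalg.norm(embedding1) * np.linalg.norm(embedding2))

# Stand-in embeddings; real ones would come from model.encode(...)
query = np.array([1.0, 0.0, 1.0, 0.0])
corpus = {
    "doc A": np.array([0.9, 0.1, 1.1, 0.0]),  # points in nearly the same direction as the query
    "doc B": np.array([0.0, 1.0, 0.0, 1.0]),  # orthogonal to the query
    "doc C": np.array([1.0, 0.5, 0.5, 0.5]),  # partially aligned with the query
}

# Rank documents by similarity to the query, highest first
ranked = sorted(corpus, key=lambda name: cosine_similarity(query, corpus[name]), reverse=True)
for name in ranked:
    print(name, round(cosine_similarity(query, corpus[name]), 3))
```

This prints the documents in order of decreasing similarity, with the nearly-aligned "doc A" first and the orthogonal "doc B" last.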