similarity-vector
Other​
calculateCosineSimilarity()​
function calculateCosineSimilarity(vectorA: number[], vectorB: number[]): number;
Defined in: vectorize/similarity-vector.js:165
Cosine similarity measures how similar two vectors are by comparing their direction rather than their magnitude. It is often used with text representations to compare how similar two documents or sentences are to each other. The output ranges from -1 to 1, where 1 means the vectors point in the same direction (maximum similarity), 0 means they are orthogonal (unrelated), and -1 means they point in opposite directions.
Parameters​
Parameter | Type | Description |
---|---|---|
vectorA | number[] | |
vectorB | number[] | |
Returns​
number
-1 to 1 similarity score
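The score described above is the dot product of the two vectors divided by the product of their lengths. The following is a minimal sketch of that formula, not the library's actual source; the name `cosineSimilaritySketch`, the length check, and the handling of zero vectors are assumptions made for illustration.

```javascript
// Minimal sketch of cosine similarity; not the library's implementation.
function cosineSimilaritySketch(vectorA, vectorB) {
  if (vectorA.length !== vectorB.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < vectorA.length; i++) {
    dot += vectorA[i] * vectorB[i];
    normA += vectorA[i] * vectorA[i];
    normB += vectorB[i] * vectorB[i];
  }
  // Assumption: a zero vector has no direction, so report 0 instead of NaN.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilaritySketch([1, 0], [1, 0]));  // 1  (same direction)
console.log(cosineSimilaritySketch([1, 0], [0, 1]));  // 0  (orthogonal)
console.log(cosineSimilaritySketch([1, 0], [-1, 0])); // -1 (opposite direction)
```

Because only direction matters, scaling either vector by a positive constant leaves the score unchanged.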
Similarity​
convertTextToEmbedding()​
function convertTextToEmbedding(text: string, options?: object): Promise<{
embeddingsDict: {
};
embedding: number[];
}>;
Defined in: vectorize/similarity-vector.js:23
Text embeddings convert words or phrases into numerical vectors in a high-dimensional space, where the dimensions encode semantic features learned by a model such as MiniLM-L6-v2. In this concept space, texts with similar meanings have vectors that are close together, which allows quantitative comparison of semantic similarity. These vector representations support natural language processing applications such as semantic search, text classification, and clustering. See also: Text Embeddings, Classification, and Semantic Search (YouTube).

Parameters​
Parameter | Type | Description |
---|---|---|
text | string | The text to embed. |
options? | object | Optional settings: the pipeline to use for embedding, and the number of decimal places to round to (default 4). |
Returns​
Promise<{ embeddingsDict: {}; embedding: number[] }>
getEmbeddingModel()​
function getEmbeddingModel(options?: object): Promise<AutoTokenizer>;
Defined in: vectorize/similarity-vector.js:47
Initializes a Hugging Face Transformers pipeline for embedding text.

Parameters​
Parameter | Type | Description |
---|---|---|
options? | object | Optional settings: a pipeline option that defaults to "feature-extraction", and the name of the model to use (default "Xenova/all-MiniLM-L6-v2"). |
Returns​
Promise<AutoTokenizer>
The pipeline.
searchVectorIndex()​
function searchVectorIndex(
index: HierarchicalNSW,
query: string,
options?: object): Promise<object[]>;
Defined in: vectorize/similarity-vector.js:129
Searches the vector index for the nearest neighbors of a given query.


Parameters​
Parameter | Type | Description |
---|---|---|
index | HierarchicalNSW | The HNSW index to search. |
query | string | The query string to search for. |
options? | object | Optional parameters for the search, including the number of nearest neighbors to return. |
Returns​
Promise<object[]>
A promise that resolves to an array of nearest neighbors, each with an id and distance.
Throws​
If there's an error during the search process.
Example​
const index = await addEmbeddingVectorsToIndex(documentVectors);
const results = await searchVectorIndex(index, 'example query');
console.log(results); // [{id: 3, distance: 0.1}, {id: 7, distance: 0.2}, ...]
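To make the result shape concrete, the sketch below performs an exhaustive nearest-neighbor search over a few toy vectors and returns `{id, distance}` pairs sorted nearest first. It is only an illustration: an HNSW index answers the same question approximately and far faster, the query here is already a vector instead of a string, and the distance metric of the real index is not stated on this page. The names `bruteForceSearch`, `cosineDistance`, and `numNeighbors` are hypothetical, and cosine distance (1 minus cosine similarity) is an assumption.

```javascript
// Hypothetical brute-force stand-in for an HNSW lookup; illustrative only.
// Assumes non-zero vectors of equal length.
function cosineDistance(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored vector against the query, then keep the closest ones.
function bruteForceSearch(vectors, queryVector, numNeighbors = 2) {
  return vectors
    .map((vector, id) => ({ id, distance: cosineDistance(vector, queryVector) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, numNeighbors);
}

const toyVectors = [[1, 0], [0, 1], [1, 1]];
const neighbors = bruteForceSearch(toyVectors, [1, 0]);
console.log(neighbors.map((n) => n.id)); // [0, 2]
```

Here the id is simply the position of the vector in the input array, which mirrors how integer labels are commonly assigned when vectors are added to an index in order.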
getAllEmbeddings()​
function getAllEmbeddings(index: HierarchicalNSW, precision: number): number[][];
Defined in: vectorize/similarity-vector.js:145
Retrieves all embeddings from the HNSW index.
Parameters​
Parameter | Type | Default value | Description |
---|---|---|---|
index | HierarchicalNSW | | The HNSW index containing the embeddings. |
precision | number | 8 | The number of decimal places to round to. |
Returns​
number[][]
An array of embedding vectors.
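The effect of the `precision` parameter can be pictured as rounding every component of every vector to a fixed number of decimal places. The helper below is a hypothetical sketch of that rounding step only; it does not read from an HNSW index, and the name `roundEmbeddings` is not part of the documented API.

```javascript
// Hypothetical helper: round each component of each vector to `precision` decimals.
function roundEmbeddings(vectors, precision = 8) {
  const factor = 10 ** precision;
  return vectors.map((vector) =>
    vector.map((value) => Math.round(value * factor) / factor)
  );
}

const rounded = roundEmbeddings([[0.123456789, -0.5], [1 / 3, 2]], 4);
console.log(rounded); // [ [ 0.1235, -0.5 ], [ 0.3333, 2 ] ]
```

Lower precision gives smaller serialized output at the cost of small changes in later similarity scores.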
weighRelevanceConceptVector()​
function weighRelevanceConceptVector(
documents: string[],
query: string,
options?: Object): Promise<object[]>;
Defined in: vectorize/similarity-vector.js:186
Reranks document chunks by relevance to the query, using the cosine similarity of their concept vectors, which are generated by a 20MB MiniLM transformer model downloaded locally.
See also: A Complete Overview of Word Embeddings.
Parameters​
Parameter | Type | Description |
---|---|---|
documents | string[] | |
query | string | |
options? | Object | |
Returns​
Promise<object[]>
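The reranking idea can be sketched with precomputed toy vectors: score each chunk against the query by cosine similarity, then sort from most to least similar. In the documented function the vectors would come from the embedding model; here they are hard-coded so the sketch runs on its own. The names `rerankByCosine` and `cosine`, and the `text` and `similarity` fields of the result objects, are assumptions, since the page does not specify the shape of the returned objects.

```javascript
// Illustrative rerank over precomputed vectors; not the library's implementation.
// Assumes non-zero vectors of equal length.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pair each chunk with its similarity to the query, most similar first.
function rerankByCosine(chunks, chunkVectors, queryVector) {
  return chunks
    .map((text, i) => ({ text, similarity: cosine(chunkVectors[i], queryVector) }))
    .sort((x, y) => y.similarity - x.similarity);
}

const ranked = rerankByCosine(
  ["about cats", "about cars", "about kittens"],
  [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]], // toy 2-D stand-ins for real embeddings
  [1, 0]
);
console.log(ranked.map((r) => r.text)); // [ 'about cats', 'about kittens', 'about cars' ]
```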