embeddings-to-graph

Similarity

convertEmbeddingsToUMAP()

function convertEmbeddingsToUMAP(embeddingsDict, options?): Promise<PlotDataPoint[]>

UMAP: Convert Embeddings to 2D or 3D Graph

UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction technique that takes high-dimensional embeddings and converts them into lower-dimensional coordinates for visualization.

  1. Input: The process starts with high-dimensional embeddings. These could be word embeddings, image feature vectors, or any other type of high-dimensional data representation.

  2. Dimensionality reduction: UMAP algorithmically reduces the number of dimensions while trying to preserve the structure of the data. It typically reduces the data to 2 or 3 dimensions for easy visualization.

  3. Topological approach: UMAP uses concepts from topological data analysis and manifold learning to perform this reduction. It constructs a weighted nearest-neighbor graph of the data in the high-dimensional space, then optimizes a low-dimensional layout whose graph is as similar to it as possible.

  4. Output: The result is a set of 2D or 3D coordinates for each input embedding. These can be plotted on a scatter plot, where each point represents an original high-dimensional datapoint.

  5. Preservation of structure: UMAP aims to keep similar items close together in the lower-dimensional space. It preserves local neighborhoods most faithfully; global structure, such as the distances between well-separated clusters, is retained only approximately.

  6. Visualization: The resulting UMAP coordinates can reveal clusters, patterns, and relationships in the data that were not easily visible in the original high-dimensional space.
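The graph-construction step described above can be illustrated with a minimal sketch. It assumes Euclidean distance and a brute-force search, and every name in it (`Vector`, `euclidean`, `kNearestNeighbors`) is hypothetical rather than part of this library. Real UMAP implementations typically use an approximate nearest-neighbor search and then weight the edges, which this sketch omits.

```typescript
// Illustrative only: a brute-force k-nearest-neighbor graph, the kind of
// structure UMAP-style methods build before optimizing a low-dimensional layout.
// None of these names belong to the library's API.

type Vector = number[];

/** Euclidean distance between two vectors of equal length. */
function euclidean(a: Vector, b: Vector): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

/** For each point, return the indices of its k nearest neighbors (excluding itself). */
function kNearestNeighbors(points: Vector[], k: number): number[][] {
  return points.map((p, i) =>
    points
      .map((q, j) => ({ j, dist: euclidean(p, q) }))
      .filter(({ j }) => j !== i)
      .sort((a, b) => a.dist - b.dist)
      .slice(0, k)
      .map(({ j }) => j)
  );
}

// Two tight pairs of 4-dimensional points, far apart from each other.
const points: Vector[] = [
  [0, 0, 0, 0],
  [0.1, 0, 0, 0],
  [5, 5, 5, 5],
  [5.1, 5, 5, 5],
];

// With k = 1, each point's single neighbor is the other member of its pair.
const neighbors = kNearestNeighbors(points, 1);
console.log(neighbors);
```

The `numberNeighbors` option plays the role of `k` here: a larger value ties each point to more of its surroundings, which tends to emphasize broader structure over fine local detail.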

Understanding UMAP

UMAP Algorithm Overview

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| embeddingsDict | {} | The dictionary of embeddings. |
| options? | { numberDimensions: number; numberDistance: number; numberNeighbors: number; } | |
| options.numberDimensions? | number | [default=2] - The number of dimensions for UMAP output. |
| options.numberDistance? | number | [default=0.1] - The minimum distance parameter for UMAP. |
| options.numberNeighbors? | number | [default=15] - The number of nearest neighbors for UMAP. |
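As a rough illustration of how the documented defaults could be applied, the sketch below fills in any option the caller leaves out. The option names and default values come from the parameter list above; the `UmapOptions` interface and `resolveUmapOptions` helper are hypothetical and are not claimed to exist in the library.

```typescript
// Hypothetical helper: applies the documented defaults to a partial options object.
// Only the option names and default values are taken from the documentation.

interface UmapOptions {
  numberDimensions?: number; // output dimensions, documented default 2
  numberDistance?: number;   // minimum distance, documented default 0.1
  numberNeighbors?: number;  // nearest neighbors, documented default 15
}

function resolveUmapOptions(options: UmapOptions = {}): Required<UmapOptions> {
  return {
    // `??` falls back to the default for both missing and explicitly undefined values.
    numberDimensions: options.numberDimensions ?? 2,
    numberDistance: options.numberDistance ?? 0.1,
    numberNeighbors: options.numberNeighbors ?? 15,
  };
}

// Request a 3D layout and keep the other defaults.
const resolved = resolveUmapOptions({ numberDimensions: 3 });
console.log(resolved);
```

These three options appear to correspond to the `n_components`, `min_dist`, and `n_neighbors` parameters of the reference Python implementation (umap-learn), though the page does not state that mapping explicitly.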

Returns

Promise<PlotDataPoint[]>

An array of plot data points.
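The fields of `PlotDataPoint` are not listed on this page, so the following is only a sketch of what pairing each embedding's key with its reduced coordinates might look like. The `SketchPlotPoint` shape (`label`, `x`, `y`, optional `z`) and the `toPlotPoints` helper are assumptions made for illustration, not the library's actual types.

```typescript
// Assumed shape for illustration; the real PlotDataPoint type may differ.
interface SketchPlotPoint {
  label: string;
  x: number;
  y: number;
  z?: number; // present only for 3D output
}

/** Pair each label with its 2D or 3D coordinates. */
function toPlotPoints(labels: string[], coords: number[][]): SketchPlotPoint[] {
  if (labels.length !== coords.length) {
    throw new Error("labels and coords must have the same length");
  }
  return labels.map((label, i) => {
    const [x, y, z] = coords[i];
    // Omit z entirely for 2D coordinates instead of storing undefined.
    return z === undefined ? { label, x, y } : { label, x, y, z };
  });
}

// Labels would be the keys of the embeddings dictionary (an assumption),
// and coords the reduced coordinates in the same order.
const plotPoints = toPlotPoints(["cat", "dog"], [[0.2, 1.5], [0.3, 1.4]]);
console.log(plotPoints);
```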

Author

McInnes et al. (2018)
Coenen et al. (2019)