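Parameters:
The dictionary of embeddings.
options (Optional): {
  [default=2] - The number of dimensions for UMAP output.
  [default=15] - The number of nearest neighbors UMAP considers for each point. Larger values emphasize global structure; smaller values emphasize local structure.
  [default=0.1] - The minimum distance between points in the UMAP output. Smaller values pack similar points into tighter clusters.
}
Returns: An array of plot data points.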
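The parameter list above can be sketched as a TypeScript options type with its documented defaults. The option names `nComponents`, `nNeighbors` and `minDist`, and the helpers `resolveUmapOptions`, `toPlotPoints` and `PlotPoint`, are hypothetical: the original option keys are not shown, so this is a sketch under those assumptions. The projection step is passed in as a function rather than tied to a particular UMAP library.

```typescript
// Hypothetical option names: the real option keys are not shown in the docs above.
interface UmapOptions {
  nComponents?: number; // output dimensions, default 2
  nNeighbors?: number; // nearest neighbors per point, default 15
  minDist?: number; // minimum distance in the output, default 0.1
}

type ResolvedUmapOptions = Required<UmapOptions>;

// Hypothetical plot point shape: one point per dictionary key.
interface PlotPoint {
  label: string;
  coords: number[]; // length equals nComponents
}

// Fill in the documented defaults for any option the caller omits.
function resolveUmapOptions(options: UmapOptions = {}): ResolvedUmapOptions {
  return {
    nComponents: options.nComponents ?? 2,
    nNeighbors: options.nNeighbors ?? 15,
    minDist: options.minDist ?? 0.1,
  };
}

// Turn a dictionary of embeddings into plot points, given any projection
// function that maps n rows of d values to n rows of nComponents values.
function toPlotPoints(
  embeddings: Record<string, number[]>,
  project: (rows: number[][], opts: ResolvedUmapOptions) => number[][],
  options?: UmapOptions,
): PlotPoint[] {
  const opts = resolveUmapOptions(options);
  const labels = Object.keys(embeddings);
  const rows = labels.map((label) => embeddings[label]);
  const projected = project(rows, opts);
  return labels.map((label, i) => ({ label, coords: projected[i] }));
}
```

Using `??` rather than `||` keeps an explicit `minDist: 0` instead of silently replacing it with the default. In practice, `project` would wrap a UMAP implementation; for example, the umap-js package provides a `UMAP` class whose constructor accepts `nComponents`, `nNeighbors` and `minDist` and whose `fit` method returns the projected rows (check its documentation for the exact signature).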
UMAP: Convert Embeddings to 2D or 3D Graph
UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction technique that takes high-dimensional embeddings and converts them into low-dimensional coordinates for visualization.
Input: The process starts with high-dimensional embeddings. These could be word embeddings, image feature vectors, or any other type of high-dimensional data representation.
Dimensionality reduction: UMAP reduces the number of dimensions while trying to preserve the structure of the data, typically down to 2 or 3 dimensions so the result can be plotted.
Topological approach: UMAP draws on topological data analysis and manifold learning. It first builds a weighted nearest-neighbor graph that approximates the structure of the data in the original space, then optimizes a low-dimensional layout whose own graph is as similar as possible to the high-dimensional one.
Output: The result is a set of 2D or 3D coordinates for each input embedding. These can be plotted on a scatter plot, where each point represents an original high-dimensional data point.
Preservation of structure: UMAP aims to keep similar items close together and dissimilar items far apart in the lower-dimensional space. It preserves local neighborhoods well and retains some global structure, though distances between well-separated clusters should be interpreted with caution.
Visualization: The resulting UMAP coordinates can reveal clusters, patterns, and relationships in the data that were not easily visible in the original high-dimensional space.
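The topological approach starts from a weighted k-nearest-neighbor graph. The following is a minimal brute-force sketch of that first stage only, following the smoothed-distance scheme from the UMAP paper: each point's nearest-neighbor distance `rho` and a per-point scale `sigma` (chosen so that the point's weights sum to log2(k)) turn distances into edge weights, and the two directions of each edge are merged with a fuzzy union. It assumes exact neighbors, Euclidean distance and small inputs; the function names are illustrative. Production implementations use approximate nearest-neighbor search and then run the layout optimization, which is not shown here.

```typescript
// Brute-force sketch of UMAP's first stage: a weighted k-nearest-neighbor graph.
// Assumptions: small inputs, exact neighbors, Euclidean distance.

interface Edge {
  i: number; // smaller point index
  j: number; // larger point index
  weight: number; // fuzzy membership strength in [0, 1]
}

function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let d = 0; d < a.length; d++) {
    const diff = a[d] - b[d];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Indices and distances of each point's k nearest neighbors, excluding itself.
function kNearest(points: number[][], k: number): { idx: number[]; dist: number[] }[] {
  return points.map((p, i) => {
    const nearest = points
      .map((q, j) => ({ j, d: euclidean(p, q) }))
      .filter((c) => c.j !== i)
      .sort((a, b) => a.d - b.d)
      .slice(0, k);
    return { idx: nearest.map((c) => c.j), dist: nearest.map((c) => c.d) };
  });
}

// Binary search for sigma such that sum_j exp(-max(0, d_j - rho) / sigma) = log2(k).
function findSigma(dists: number[], rho: number, k: number): number {
  const target = Math.log2(k);
  let lo = 0;
  let hi = Infinity;
  let sigma = 1;
  for (let iter = 0; iter < 64; iter++) {
    let sum = 0;
    for (const d of dists) sum += Math.exp(-Math.max(0, d - rho) / sigma);
    if (Math.abs(sum - target) < 1e-5) break;
    if (sum > target) {
      hi = sigma; // sum too large: shrink sigma
      sigma = (lo + hi) / 2;
    } else {
      lo = sigma; // sum too small: grow sigma
      sigma = hi === Infinity ? sigma * 2 : (lo + hi) / 2;
    }
  }
  return sigma;
}

// Directed weights exp(-(d_ij - rho_i) / sigma_i), symmetrized with the fuzzy union a + b - a*b.
function fuzzyGraph(points: number[][], k: number): Edge[] {
  const directed = new Map<string, number>();
  kNearest(points, k).forEach(({ idx, dist }, i) => {
    const rho = dist[0]; // distance to the nearest neighbor
    const sigma = findSigma(dist, rho, k);
    idx.forEach((j, n) => {
      directed.set(`${i},${j}`, Math.exp(-Math.max(0, dist[n] - rho) / sigma));
    });
  });
  const undirected = new Map<string, Edge>();
  directed.forEach((a, key) => {
    const [i, j] = key.split(",").map(Number);
    const b = directed.get(`${j},${i}`) ?? 0; // weight of the reverse edge, if any
    const lo = Math.min(i, j);
    const hi = Math.max(i, j);
    undirected.set(`${lo},${hi}`, { i: lo, j: hi, weight: a + b - a * b });
  });
  const result: Edge[] = [];
  undirected.forEach((e) => result.push(e));
  return result;
}
```

Every point gets weight 1 to its nearest neighbor, which is how UMAP guarantees local connectivity. In the full algorithm these weights become the target that the low-dimensional layout is optimized against, and the minimum-distance option shapes how similarity is measured in the output space.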
Understanding UMAP
UMAP Algorithm Overview