Function convertEmbeddingsToUMAP

  • UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction technique that takes high-dimensional embeddings and converts them into lower-dimensional coordinates for visualization.

    1. Input: The process starts with high-dimensional embeddings. These could be word embeddings, image feature vectors, or any other type of high-dimensional data representation.

    2. Dimensionality reduction: UMAP algorithmically reduces the number of dimensions while trying to preserve the structure of the data. It typically reduces the data to 2 or 3 dimensions for easy visualization.

    3. Topological approach: UMAP uses concepts from topological data analysis and manifold learning to perform this reduction. It constructs a weighted nearest-neighbor graph of the data in the high-dimensional space and then optimizes a low-dimensional layout whose graph is as similar to it as possible.

    4. Output: The result is a set of 2D or 3D coordinates for each input embedding. These can be plotted on a scatter plot, where each point represents an original high-dimensional datapoint.

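    5. Preservation of structure: UMAP aims to keep similar items close together and dissimilar items far apart in the lower-dimensional space. It preserves local neighborhoods most faithfully and global structure to a lesser extent, so distances between far-apart clusters should be interpreted with caution.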

    6. Visualization: The resulting UMAP coordinates can reveal clusters, patterns, and relationships in the data that were not easily visible in the original high-dimensional space.
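
    As an illustration of the graph construction mentioned in step 3, the following is a minimal sketch of a brute-force nearest-neighbor search, which is the kind of neighbor information UMAP starts from. It is not a UMAP implementation and is not taken from this library; the names `Vector`, `euclidean`, `kNearestNeighbors`, and `toyEmbeddings` are hypothetical and exist only for this example.

```typescript
// Sketch of the neighbor-search step only; this is NOT a full UMAP implementation.
// All names below are hypothetical and chosen for illustration.
type Vector = number[];

// Euclidean distance between two vectors of equal length.
function euclidean(a: Vector, b: Vector): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// For each point, return the indices of its k nearest neighbors (excluding
// the point itself), ordered from nearest to farthest.
// Brute force: O(n^2) distance computations, fine for small inputs only.
function kNearestNeighbors(points: Vector[], k: number): number[][] {
  return points.map((p, i) =>
    points
      .map((q, j) => ({ index: j, distance: euclidean(p, q) }))
      .filter((entry) => entry.index !== i)
      .sort((x, y) => x.distance - y.distance)
      .slice(0, k)
      .map((entry) => entry.index)
  );
}

// Toy data: two well-separated pairs of points in 4 dimensions.
const toyEmbeddings: Vector[] = [
  [0, 0, 0, 0],
  [0.1, 0, 0, 0],
  [5, 5, 5, 5],
  [5, 5.1, 5, 5],
];

const neighborGraph = kNearestNeighbors(toyEmbeddings, 1);
console.log(neighborGraph); // [[1], [0], [3], [2]]
```

    Each point's single nearest neighbor is the other member of its pair, which is the local structure a low-dimensional layout would then try to keep intact.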

    Understanding UMAP
    UMAP Algorithm Overview

    Parameters

    • embeddingsDict: {}

      The dictionary of embeddings.

      • Optional options: {
            numberDimensions: number;
            numberNeighbors: number;
            numberDistance: number;
        } = {}
        • numberDimensions: number

          [default=2] - The number of dimensions for UMAP output.

        • numberNeighbors: number

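          [default=15] - The number of nearest neighbors UMAP considers for each point. Smaller values emphasize local structure; larger values emphasize global structure.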

        • numberDistance: number

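          [default=0.1] - The minimum distance parameter for UMAP, which controls how tightly points may be packed together in the output. Smaller values produce denser clusters.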

      Returns Promise<PlotDataPoint[]>

      An array of plot data points.
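
      The sketch below shows one way the documented defaults could be applied and the input dictionary prepared before reduction. It is an illustration under assumptions, not this function's actual implementation: the option names and default values come from the parameter list above, while `UmapOptions`, `resolveUmapOptions`, `EmbeddingsDict`, and `toLabeledMatrix` are hypothetical names. In particular, the source types `embeddingsDict` only as `{}`, so the label-to-vector shape used here is an assumption.

```typescript
// Option names and default values are taken from the parameter list above.
// The interface and helper names are hypothetical.
interface UmapOptions {
  numberDimensions: number;
  numberNeighbors: number;
  numberDistance: number;
}

const DEFAULT_UMAP_OPTIONS: UmapOptions = {
  numberDimensions: 2,
  numberNeighbors: 15,
  numberDistance: 0.1,
};

// Hypothetical helper: fill in any omitted option with its documented default.
function resolveUmapOptions(options: Partial<UmapOptions> = {}): UmapOptions {
  return { ...DEFAULT_UMAP_OPTIONS, ...options };
}

// ASSUMPTION: the dictionary maps a label to its embedding vector.
type EmbeddingsDict = Record<string, number[]>;

// Hypothetical helper: split the dictionary into parallel arrays so that
// row i of `matrix` is the embedding for `labels[i]`.
function toLabeledMatrix(dict: EmbeddingsDict): { labels: string[]; matrix: number[][] } {
  const labels = Object.keys(dict);
  const matrix = labels.map((label) => dict[label]);
  return { labels, matrix };
}

// Usage: override one option and keep the other defaults.
const resolvedOptions = resolveUmapOptions({ numberNeighbors: 5 });
const labeled = toLabeledMatrix({
  cat: [0.1, 0.2, 0.3],
  dog: [0.2, 0.1, 0.4],
});

console.log(resolvedOptions); // numberDimensions 2, numberNeighbors 5, numberDistance 0.1
console.log(labeled.labels);  // ["cat", "dog"]
```

      Keeping labels and vectors in parallel arrays makes it straightforward to reattach each label to its reduced coordinates once the projection has been computed.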