Function extractTopicTermGroupsLDA

  • Latent Dirichlet allocation (pronounced Dee-ree-klay) is used in natural language processing to discover abstract topics in a collection of documents. It is a generative probabilistic model that assumes each document is a mixture of topics, where a topic is a probability distribution over words. LDA uses Bayesian inference to learn the topics and the per-document topic mixtures simultaneously, in an unsupervised manner; a sketch of the underlying Gibbs sampler follows the links below.

    Latent Dirichlet Allocation (LDA) with Gibbs Sampling Explained
    Latent Dirichlet Allocation
    Topic Models (YouTube)
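
    A minimal sketch of a collapsed Gibbs sampler for LDA, in TypeScript, showing how the alpha, beta, numberOfIterations, valueBurnIn, and valueSampleLag options documented below typically interact. This is illustrative only, not this library's implementation; every identifier in it is an assumption.

    // Sketch of collapsed Gibbs sampling for LDA (assumed identifiers throughout).
    function gibbsSampleLDA(
      docs: number[][],  // documents as arrays of word ids in [0, vocabSize)
      vocabSize: number,
      topicCount: number,
      alpha: number,     // Dirichlet prior on document-topic distributions
      beta: number,      // Dirichlet prior on topic-word distributions
      iterations: number,
      burnIn: number,    // burnIn must be < iterations so samples are collected
      sampleLag: number
    ): number[][] {
      // Count matrices: per-document topic counts, per-topic word counts, topic totals.
      const docTopic = docs.map(() => new Array(topicCount).fill(0));
      const topicWord = Array.from({ length: topicCount }, () => new Array(vocabSize).fill(0));
      const topicTotal = new Array(topicCount).fill(0);

      // Randomly initialize a topic assignment for every token.
      const z = docs.map((doc, d) =>
        doc.map((w) => {
          const k = Math.floor(Math.random() * topicCount);
          docTopic[d][k]++; topicWord[k][w]++; topicTotal[k]++;
          return k;
        })
      );

      // Accumulator for topic-word distributions, averaged over retained samples.
      const phiSum = Array.from({ length: topicCount }, () => new Array(vocabSize).fill(0));
      let samplesTaken = 0;

      for (let iter = 0; iter < iterations; iter++) {
        for (let d = 0; d < docs.length; d++) {
          for (let i = 0; i < docs[d].length; i++) {
            const w = docs[d][i];
            const old = z[d][i];
            // Remove this token's current assignment from the counts.
            docTopic[d][old]--; topicWord[old][w]--; topicTotal[old]--;

            // Full conditional: p(k | rest) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
            const p: number[] = new Array(topicCount);
            let total = 0;
            for (let k = 0; k < topicCount; k++) {
              p[k] = (docTopic[d][k] + alpha) *
                     (topicWord[k][w] + beta) / (topicTotal[k] + vocabSize * beta);
              total += p[k];
            }

            // Draw the new topic from the (unnormalized) conditional.
            let u = Math.random() * total;
            let k = 0;
            while ((u -= p[k]) > 0 && k < topicCount - 1) k++;

            docTopic[d][k]++; topicWord[k][w]++; topicTotal[k]++;
            z[d][i] = k;
          }
        }

        // Discard burn-in iterations; afterwards keep every sampleLag-th state,
        // so retained samples are approximately independent.
        if (iter >= burnIn && (iter - burnIn) % sampleLag === 0) {
          for (let k = 0; k < topicCount; k++)
            for (let w = 0; w < vocabSize; w++)
              phiSum[k][w] += (topicWord[k][w] + beta) / (topicTotal[k] + vocabSize * beta);
          samplesTaken++;
        }
      }

      // Averaged topic-word distributions; the top terms per row form each topic.
      return phiSum.map((row) => row.map((v) => v / samplesTaken));
    }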

    Parameters

    • sentences: string[]

      Array of input sentences.

    • Optional options: {
          topicCount: number;
          numberOfTermsPerTopic: number;
          alpha: number;
          beta: number;
          numberOfIterations: number;
          valueBurnIn: number;
          valueSampleLag: number;
      } = {}

      Configuration options for LDA.

      • topicCount: number

        default=10 - Number of topics to extract.

      • numberOfTermsPerTopic: number

        default=10 - Number of terms to show for each topic.

      • alpha: number

        default=0.1 - Dirichlet prior on document-topic distributions.

      • beta: number

        default=0.01 - Dirichlet prior on topic-word distributions.

      • numberOfIterations: number

        default=1000 - Total number of Gibbs sampling iterations to run.

      • valueBurnIn: number

        default=100 - Number of initial iterations to discard before samples are collected (burn-in).

      • valueSampleLag: number

        default=10 - Number of iterations to skip between retained samples (thinning), so successive samples are less correlated.

    Returns any[]

    • Array of topics, each containing term-probability pairs.
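
    Example usage, as a sketch: the option values shown are just the documented defaults, and since the declared return type is any[], the exact shape of each term-probability pair (the term and probability field names below) is an assumption.

    const sentences = [
      'The cat sat on the mat.',
      'Dogs and cats are popular pets.',
      'Stock markets rallied after the announcement.',
    ];

    const topics = extractTopicTermGroupsLDA(sentences, {
      topicCount: 2,
      numberOfTermsPerTopic: 5,
      alpha: 0.1,
      beta: 0.01,
      numberOfIterations: 1000,
      valueBurnIn: 100,
      valueSampleLag: 10,
    });

    // Each entry is one topic, e.g. [{ term: 'cat', probability: 0.12 }, ...]
    // (field names assumed, since the declared return type is any[]).
    console.log(topics);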