Seminar on ITMS and LVS: Quantitative Methods and Text Mining

The main objective of this workshop is to introduce researchers to user-friendly analytical tools. ITMS (Interactive Text Mining Suite) and LVS (Language Variation Suite) are two web-based tools for visualization and quantitative analysis. In contrast to existing software programs (e.g., SAS, SPSS, and Tableau), these two applications are built in R and require no installation or programming skills.

This hands-on workshop will provide an overview of the statistical and text-mining techniques available in these tools. You will learn how to import CSV, text, and PDF files, create plots, and run statistical analyses, including conditional trees and random forests. You will also learn about natural language pre-processing techniques, such as stopword removal and stemming. Finally, you will be able to perform topic modeling and cluster analysis.
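
To make the two pre-processing steps named above concrete, here is a minimal Python sketch of stopword removal followed by stemming. It is not the code used by ITMS or LVS, which are built in R. The stopword list, the suffix list, and the function names (`tokenize`, `remove_stopwords`, `naive_stem`, `preprocess`) are all assumptions made for illustration, and the suffix stripper is far cruder than a real stemmer such as Porter's.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "were"}

# Suffixes stripped by this naive stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stopwords, then stem each remaining token."""
    return [naive_stem(t) for t in remove_stopwords(tokenize(text))]


print(preprocess("The researchers were clustering the imported documents"))
# prints ['researcher', 'cluster', 'import', 'document']
```

The order matters: stopwords are removed before stemming so that the list can hold ordinary word forms rather than stems.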

Part 1: Quantitative Methods – Language Variation Suite (LVS) slides

Part 2: Text Mining Methods – Interactive Text Mining Suite (ITMS) slides

Workshop exercise materials:

  • sample of categorical data (CSV file) – link
  • sample of continuous data (CSV file) – link

Interactive Topic Modeling – ITMS

Topic modeling refers to a family of algorithms that explain “an observed corpus with a small set of distributions over terms” and provide “models for uncovering underlying semantic structure of a document collection” (Blei et al. 2003; Blei et al. 2009; Blei 2012). Several algorithms have been proposed for building a probabilistic topic model, e.g., mixture of unigrams (Nigam et al. 2000), Latent Semantic Indexing (Deerwester et al. 1990; Hofmann 1999), and Latent Dirichlet Allocation, or LDA (Blei et al. 2003). For more information, see Matthew Jockers and David Blei.

The Interactive Text Mining Suite applies various LDA algorithms (the topicmodels, lda, and stm R packages). In addition, it allows users to interactively choose the number of topics and the number of iterations, and to select the best model.
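
To show what the number of topics and the number of iterations control, here is a minimal Python sketch of LDA fitted by collapsed Gibbs sampling, one common way to estimate the model. It is not the implementation in topicmodels, lda, or stm, and it is not ITMS code. The function names (`lda_gibbs`, `top_words`), the hyperparameter defaults `alpha` and `beta`, and the toy documents are assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (doc_topic, topic_word, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


docs = [
    ["vowel", "dialect", "speaker", "vowel"],
    ["dialect", "speaker", "variation", "vowel"],
    ["forest", "tree", "regression", "tree"],
    ["regression", "forest", "tree", "model"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2, iterations=200)
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {k}: {words}")
```

Rerunning `lda_gibbs` with different values of `num_topics` and `iterations` and comparing the resulting word lists is a rough, manual analogue of the interactive exploration described above. A principled comparison between models would need a fit criterion, which this sketch does not compute.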


We welcome suggestions and feedback.