This workshop covers R basics from learning about RStudio architecture to creating your own graphs with ggplot.
- Slides Introduction
- CSV file – Movie_metadata.csv (IU BOX)
- R script – intro.r (IU BOX)
- R script – plotting.r (IU BOX)
“The purpose of visualization is insight, not pictures”
(Ben Shneiderman, 1999)
Information visualization affords new opportunities for corpus linguistics. In addition to interpretable data synthesis (Keim et al., 2006), visualization allows researchers to uncover linguistic patterns through data exploration and discovery. Until recently, the full integration of visual analytics into corpus tools was not feasible. For example, web-based corpora (e.g., COCA and BNC) were limited to pre-defined text collections and functionalities, whereas software applications were mainly built for a specific purpose (e.g., AntConc for concordances, TigerSearch for syntactic queries). The recent development of the Shiny web framework makes it possible to integrate visualization tools into corpus analysis. Shiny is a reactive system that allows for interactive data analysis and visualization. Because it is built with R, the Shiny web framework also provides access to advanced text mining and quantitative algorithms, thus advancing corpus linguistics studies. In this workshop, you will learn the fundamentals of reactive web frameworks and visual analytics for corpus analysis.
Venue: Corpora2017 Conference in Saint Petersburg, Russia, June 27-30, 2017
Acknowledgment: This workshop is partially sponsored by the Cyberinfrastructure for Network Science Center
“The impact of data scientists’ work depends on how well others can understand their insights to take further actions” (blog)
In this workshop, I will introduce you to declarative reactive web frameworks, which allow for interactive, user-friendly data visualization and data analytics, with a focus on Shiny. Shiny is an R package for building interactive applications for data visualization.
You will learn some Shiny basics: how to build your own reactive app and deploy it to a server.
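A minimal sketch of such a reactive app, assuming the shiny package is installed. The input ID `n`, the output ID `hist`, and the random data are placeholders invented for illustration; they are not taken from the workshop materials.

```r
library(shiny)

# UI: one input control and one output placeholder.
ui <- fluidPage(
  titlePanel("Minimal reactive app (illustrative)"),
  sliderInput("n", "Number of observations:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server: renderPlot() re-runs automatically whenever input$n changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Random normal sample", xlab = "Value")
  })
}

app <- shinyApp(ui = ui, server = server)

# Launch locally only when running in an interactive session.
if (interactive()) runApp(app)
```

The reactivity lies in the link between `input$n` and `renderPlot()`: moving the slider invalidates the plot, and Shiny redraws it without any explicit event handling. For deployment, one common route is `rsconnect::deployApp()`; which server the workshop targets is not stated here.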
Credits: Some ideas are based on the great tutorial by Dean Attali.
The main objective of this workshop is to introduce researchers to user-friendly analytical tools. ITMS and LVS are two web-based tools for visualization and quantitative analysis. In contrast to existing software programs (e.g., SAS, SPSS, and Tableau), these two applications are built in R and require no installation or programming skills.
This hands-on workshop will provide an overview of the statistical and text-mining techniques available in these tools. You will learn how to import CSV, text, and PDF files, create plots, and run statistical analyses, including conditional inference trees and random forests. You will also learn about natural language pre-processing techniques, such as stopword removal and stemming. Finally, you will be able to perform topic modeling and cluster analysis.
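To make the pre-processing steps concrete, here is a small sketch of stopword removal and stemming in R. It does not show how ITMS implements these steps: the sentence and the stopword list are invented for illustration, and the stemming step assumes the SnowballC package (it is skipped if the package is not installed).

```r
# Example sentence and stopword list are invented for illustration.
text <- "The cats are running in the gardens"
stopword_list <- c("the", "are", "in", "a", "an", "of", "and")

# Tokenize: lowercase, then split on anything that is not an ASCII letter
# (a simplification; real corpora need Unicode-aware tokenization).
tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
tokens <- tokens[nzchar(tokens)]

# Stopword removal: keep only tokens that are not in the list.
content_tokens <- tokens[!tokens %in% stopword_list]

# Stemming: use the Snowball stemmer if SnowballC is available.
if (requireNamespace("SnowballC", quietly = TRUE)) {
  stems <- SnowballC::wordStem(content_tokens, language = "english")
} else {
  stems <- content_tokens  # fallback: leave tokens unstemmed
}

print(content_tokens)
print(stems)
```

Removing stopwords before stemming keeps the stopword list simple, since it only needs to contain surface forms.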
Part 1: Quantitative Methods – Language Variation Suite LVS slides
Part 2: Text Mining Methods – Interactive Text Mining Suite ITMS slides
Workshop exercise materials: zip file
Language Variation Suite has been released with more customizable features for language variation analysis.
New features include:
Plot Customization – titles, labels, colors
Redesigned user-friendly interface
Tuning parameters for cluster analysis
Do not hesitate to contact us if you have any issues or if you would like to request new features.
ITMS (Interactive Text Mining Suite) is a web application for text analysis. It combines the computational and statistical power of R with the interactivity of the Shiny web framework.
The new release includes the following features:
Contributors: Jefferson Davis, Irina Trapido and Jay Lee
As always, please do not hesitate to contact us if you have any issues or would like to request new features!
Our workshop Optimizing Language Variation Analysis: Language Variation Suite will be held on 11/03/16 at Simon Fraser University, Vancouver, Canada!
“Mastery of quantitative methods is increasingly becoming a vital component of linguistic training” (Johnson 2008:1)
“The science of analytical reasoning facilitated by visual interactive interfaces” (Thomas et al. 2005)
Please come and learn how to apply advanced statistical methods with a user-friendly interactive toolkit for (socio)linguistic analysis. Do not hesitate to contact us if you have any questions or suggestions (obscrivn AT indiana DOT edu).
Olga, Manuel and Rafael
LVS provides three types of model comparison using the package MASS: the likelihood ratio test (LRT), AIC, and BIC. The stepwise regression runs in both directions (step up and step down) and selects the best model, that is, the best set of predictors.
All three criteria assess model fit. LRT is based on the log-likelihood ratio, with the threshold k = qchisq(1-p, df=1); for p = 0.05, k = 3.84. For more information on AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion), see http://www.jmp.com/support/help/Likelihood_AICc_and_BIC.shtml.
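The three criteria can be sketched with `stepAIC()` from MASS, where the penalty per degree of freedom is passed as `k`. This is only an illustration under assumptions: how LVS calls MASS internally is not shown here, and the `birthwt` data and the chosen predictors are simply a built-in MASS example.

```r
library(MASS)  # provides stepAIC() and the birthwt example data

# Penalty per degree of freedom for each criterion.
p <- 0.05
k_lrt <- qchisq(1 - p, df = 1)  # LRT threshold at significance level p
k_aic <- 2                      # AIC
k_bic <- log(nrow(birthwt))     # BIC: log of the number of observations

# Full logistic model on the built-in birthwt data (illustration only).
full <- glm(low ~ age + lwt + smoke + ht + ui,
            data = birthwt, family = binomial)

# Stepwise selection in both directions, once per criterion.
fit_lrt <- stepAIC(full, direction = "both", k = k_lrt, trace = FALSE)
fit_aic <- stepAIC(full, direction = "both", k = k_aic, trace = FALSE)
fit_bic <- stepAIC(full, direction = "both", k = k_bic, trace = FALSE)

print(round(k_lrt, 2))
print(formula(fit_aic))
```

A larger `k` penalizes each extra parameter more heavily, so the BIC and LRT runs tend to keep fewer predictors than the AIC run. Because the search starts from the full model without a wider scope, it can drop predictors and re-add ones it dropped earlier, but it cannot add terms that were never in the full model.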
Steps to perform stepwise regression in LVS:
As always, your feedback and suggestions are greatly appreciated! (LVS Team)
Language Variation Suite now includes a Varbrul analysis. Varbrul is “an implementation of logistic regression that is used by many sociolinguists” (Johnson 2008:174). At present, LVS calculates Varbrul weights for a binary dependent variable and categorical independent variables. The Varbrul output format is based on chapter 5.7 of Johnson (2008): the inverse logit function (inv.logit) from the package gtools is used, and the contrasts option in the logistic regression is set to contr.sum. For a binary independent variable, the calculation is inv.logit(coefficient*1) and inv.logit(coefficient*-1), which outputs weights for the two values (e.g., men and women).
The Shiny application allows for interactive quantitative analysis and is based on the R programming language. Since this toolkit is still under development, we would greatly appreciate your evaluation, comments, and feedback!
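The weight calculation can be sketched as follows. This is an illustration under assumptions, not the LVS source code: the toy data are invented, and base R's `plogis()` stands in for `gtools::inv.logit()` (both compute the inverse logit) so that the example runs without extra packages.

```r
# Toy data invented for illustration:
# 6 of 8 men and 2 of 8 women use the variant.
toy <- data.frame(
  variant = c(1, 1, 1, 1, 1, 1, 0, 0,   # men
              1, 1, 0, 0, 0, 0, 0, 0),  # women
  gender  = factor(rep(c("men", "women"), each = 8))
)

# Logistic regression with sum-to-zero contrasts (contr.sum).
fit <- glm(variant ~ gender, data = toy, family = binomial,
           contrasts = list(gender = "contr.sum"))

# With contr.sum, the first level (men) is coded +1 and the last (women) -1,
# so a single coefficient describes both levels.
b <- unname(coef(fit)["gender1"])

# Inverse logit of +b and -b gives the weight for each level.
weights <- c(men = plogis(b * 1), women = plogis(b * -1))
print(round(weights, 2))
```

Because the two weights are inverse logits of b and -b, they always sum to 1; a weight above 0.5 means the level favors the variant, and a weight below 0.5 means it disfavors it.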