Machine Learning 101 with Jupyter Notebook

Where: Woodburn Hall 200, IU

When: 02/22/20 10:30am-4pm

The Machine Learning 101 crash course is open to anyone who is curious about Machine Learning and has limited or no prior experience with Python.


PART 1: Jupyter Notebook

Jupyter Notebook is a widely used open-source platform for creating interactive code, visualizations, and documents.

You will learn how to 1) perform basic data description and visualization, 2) run missing data analysis and imputation, and 3) prepare data for Machine Learning (splitting into training and testing sets).
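The three steps above can be sketched in a few lines of plain Python. This is a minimal, standard-library-only sketch that uses hypothetical petal-length values with two missing entries. The workshop notebooks use the Iris CSV and would more likely rely on pandas.

A few choices here are assumptions made for illustration:
- mean imputation;
- a 75/25 split with a fixed seed;
- the hand-rolled `train_test_split` helper, which only mirrors the idea of scikit-learn's `sklearn.model_selection.train_test_split` and is not that function.

```python
import random
import statistics

# Hypothetical toy data: petal lengths (cm) with two missing values (None).
# In the workshop you would load the Iris CSV instead.
petal_length = [1.4, 1.3, None, 4.7, 4.5, 6.0, None, 5.1]

# 1) Basic description of the observed (non-missing) values
observed = [x for x in petal_length if x is not None]
summary = {
    "count": len(observed),
    "mean": statistics.mean(observed),
    "stdev": statistics.stdev(observed),
    "min": min(observed),
    "max": max(observed),
}

# 2) Missing data analysis: how many values are missing, and what share
n_missing = sum(x is None for x in petal_length)
share_missing = n_missing / len(petal_length)

# Mean imputation (an assumed strategy): replace each gap with the observed mean
imputed = [summary["mean"] if x is None else x for x in petal_length]

# 3) Hypothetical helper: shuffle indices with a fixed seed, hold out a test share
def train_test_split(rows, test_size=0.25, seed=42):
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test

train, test = train_test_split(imputed)
print(summary)
print(f"missing: {n_missing} ({share_missing:.0%})")
print(f"train={len(train)} test={len(test)}")
```

Fixing the seed makes the split reproducible, so rerunning a notebook cell gives the same training and testing sets.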

PART 2: Machine Learning Concepts

You will review ML terminology and common techniques (classification, clustering, and regression).

[Figure: Common machine learning techniques]

Source: Common Machine Learning Techniques – scikit-learn.
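To make classification concrete, the sketch below implements k-nearest neighbors in plain Python. It is only one of many classifiers, and whether the slides cover it is an assumption. The Iris-like measurements are hypothetical, and `knn_predict` is an illustrative helper rather than a scikit-learn API.

```python
import math
from collections import Counter

# Hypothetical training points: (petal_length_cm, petal_width_cm) -> species label
train = [
    ((1.4, 0.2), "setosa"),
    ((1.3, 0.2), "setosa"),
    ((4.7, 1.4), "versicolor"),
    ((4.5, 1.5), "versicolor"),
    ((6.0, 2.5), "virginica"),
    ((5.1, 1.9), "virginica"),
]

def knn_predict(point, train, k=3):
    """Classify `point` by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query point
    neighbors = sorted(train, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    # Equal counts keep first-seen order, so ties favor the closer label
    return votes.most_common(1)[0][0]

print(knn_predict((1.5, 0.3), train))
print(knn_predict((5.8, 2.2), train))
```

With k=3, the query (1.5, 0.3) has two setosa points among its three nearest neighbors, so the vote returns setosa. In scikit-learn the same idea is available as `sklearn.neighbors.KNeighborsClassifier`.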


Jupyter Notebooks:

Exploring Google Colaboratory notebook (adapted from Tairi Delgado. 2018. Hands-On Data Analytics for Beginners with Google Colaboratory) [Download Link]

Exploring Your Data notebook [Download Link]

Data: Iris csv [Download Link]

Slides [Download Link]

Notebook Cheat Sheet [Download Link]


Guest Lecture: Introduction to Gephi for Data Science

Venue: Luddy Hall 4012

Time: 02/17/20 4pm-5pm EST

Zoom link:

Not all data comes in nicely structured relational databases and tables, and not all problems are best approached with statistics and machine learning. Many problems naturally lend themselves to graph approaches… (Jason Widjaja, 2019)

In this hands-on workshop, you will learn basic concepts of network analysis, gather network data from Wikipedia, and build a network visualization using Gephi, an open-source platform for network analysis and visualization.
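Degree is one of the first measures a network analysis reports. This is a minimal sketch of normalized degree centrality, using a hypothetical undirected edge list of Wikipedia article links. Each node's number of links is divided by n - 1, the largest degree possible in a network of n nodes. The `degree_centrality` helper is illustrative and not part of Gephi.

```python
from collections import defaultdict

# Hypothetical undirected edges, e.g. Wikipedia articles that link to each other
edges = [
    ("Network science", "Graph theory"),
    ("Network science", "Social network"),
    ("Graph theory", "Leonhard Euler"),
    ("Social network", "Sociology"),
    ("Network science", "Gephi"),
]

def degree_centrality(edges):
    """Return each node's degree divided by (n - 1), the maximum possible degree."""
    degree = defaultdict(int)
    for u, v in edges:
        # Undirected: each edge adds one link to both endpoints
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()}

# Print nodes from most to least central
for node, c in sorted(degree_centrality(edges).items(), key=lambda kv: -kv[1]):
    print(f"{node:16} {c:.2f}")
```

Here "Network science" has 3 of a possible 5 links, giving 0.60. An edge list like this can typically be saved as a CSV with Source and Target columns and loaded through Gephi's spreadsheet import.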



Slides [Download pdf]


Faculty Accelerator Crash Course: R Markdown with an Introduction to R

Venue: Woodburn 200, IU, Bloomington

Time: 02/01/20 10:30am-5pm EST

In this crash course we will focus on Markdown using RStudio and R; however, the skills you learn can easily be applied elsewhere (Jupyter notebooks, README files, and wiki pages on GitHub). We will start with R basics, working with real data (data import, processing, visualization, and descriptive statistics). Then we will learn how to combine code and Markdown to create and publish a written document.

We will be using the following materials:






Knowles, Thea. 2019. Dissertating with RMarkdown and Bookdown. R-Ladies. Available at

Grolemund, Garrett and Hadley Wickham. 2017. R for Data Science. O’Reilly. Available at

Introduction to Interactive Shiny Web Applications

If a picture is worth a thousand words, interactive data visualization with Shiny web apps must be worth millions [Anonymous].

Venue: University of Cincinnati, Langsam Library 480

Date: 11/28 1:00-3:00pm EST

In this hands-on workshop you will learn about interactive web applications built with R and Shiny, explore Shiny widgets, and create and deploy your first web app.


Useful Links:

  • Shiny Gallery – link
  • Show Me Shiny – link
  • Example of Interactive Dashboard – link


Introduction to R for Business Analytics

Venue: Kelley School of Business K303

Time: 10/26/18 10am-12pm

In this workshop, we will cover basic descriptive statistics, scatterplots, boxplots, and time series plots.
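Boxplots are drawn from quartiles, and time series plots are often smoothed with a moving average. The workshop itself uses R; this sketch is in Python only for consistency with the other examples here. It computes the quartiles, the 1.5 × IQR outlier fences a boxplot uses, and a 3-point moving average. The values are hypothetical and do not come from pirates.csv or eustockmarkets.csv, and `moving_average` is an illustrative helper.

```python
import statistics

# Hypothetical sample (e.g. ages); one unusually large value is included on purpose
ages = [20, 22, 25, 27, 30, 31, 35, 40, 45, 80]

# Quartiles with linear interpolation ("inclusive" matches R's default quantile type 7)
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1

# A boxplot flags points more than 1.5 * IQR beyond the box as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in ages if x < lower_fence or x > upper_fence]

print(f"min={min(ages)} Q1={q1} median={median} Q3={q3} max={max(ages)}")
print("outliers:", outliers)

def moving_average(series, window=3):
    """Simple moving average over consecutive windows; len(series) - window + 1 values."""
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]

# Hypothetical daily closing values for a time series plot
closing = [100, 102, 101, 105, 107]
print(moving_average(closing))
```

R's `boxplot()` draws the box from Tukey's hinges (via `fivenum()`) rather than from `quantile()`, so its box edges can differ slightly from these quartiles.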


The materials for the workshop [link to the folder on IUBOX]:

  1. Slides [ppt and pdf]
  2. pirates.csv
  3. eustockmarkets.csv
  4. kelley303.r

Recommended reading: Phillips, N. (2018). YaRrr! The Pirate’s Guide to R. Available at

Recommended courses: Time Series in R – DataCamp

NWAV47 Workshop: Language Variation Suite – Web Application for Quantitative Data Analysis

Date and Venue: New York, CUNY, October 18, 2-4pm

Workshop Materials:

  1. Language Variation Suite – web application –
  2. Slides –
  3. Data:

Workshop: Optimizing (Socio-)linguistic Analysis: Language Variation Suite Toolkit

Time: 04/12/18 1:30pm EST – 3:00pm EST

Venue: Woodburn Hall 200 (SSRC Grand Hall)

In this hands-on session, the workshop will introduce participants to the Language Variation Suite (LVS), a user-friendly interactive web application built in R. LVS provides access to advanced statistical methods and visualization techniques, such as mixed-effects modeling, conditional and random tree analyses, and cluster analysis. These methods enable researchers to handle imbalanced data, measure individual and group variation, estimate significance, and rank variables according to their significance.

Workshop files:

  1. Categorical data csv – Use of /r/ in New York (Labov 1966)
  2. Continuous data csv – Intervocalic /d/ (Díaz-Campos et al. 2016)
  3. Language Variation Suite
  4. Slides


Workshop: Data Visualization for Corpus Linguistics via Shiny Framework

“The purpose of visualization is insight, not pictures”
(Ben Shneiderman, 1999)

Information visualization affords new opportunities for corpus linguistics. In addition to interpretable data synthesis (Keim et al., 2006), visualization allows researchers to uncover linguistic patterns through data exploration and discovery. Until recently, the full integration of visual analytics into corpus tools was not feasible. For example, web-based corpora (e.g., COCA and BNC) were limited to predefined text collections and functionalities, whereas software applications were mainly built for a specific purpose (e.g., AntConc for concordances, TigerSearch for syntactic queries). The recent development of the Shiny web framework makes it possible to integrate visualization tools into corpus analysis. Shiny is a reactive system that allows for interactive data analysis and visualization. Because it is built on R, Shiny also provides access to advanced text mining and quantitative algorithms, thus advancing corpus linguistics research. In this workshop, you will learn the fundamentals of reactive web frameworks and visual analytics for corpus analysis.
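Concordance (keyword-in-context, KWIC) views like AntConc's are a common starting point for corpus exploration. This is a minimal KWIC sketch in plain Python. It assumes a crude lowercase `\w+` tokenizer and a hypothetical sample text. It illustrates the idea only and is not how AntConc or any Shiny app implements concordancing; `kwic` is an illustrative helper.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens on each side of each hit."""
    # Assumed tokenizer: lowercase word characters only, punctuation dropped
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keyword column lines up
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

sample = ("Corpus tools build concordances. A concordance shows each word in context, "
          "and a corpus linguist reads the context to find patterns.")
for line in kwic(sample, "context"):
    print(line)
```

Aligning every hit in one column is what makes patterns in the surrounding words easy to scan, and a Shiny app could re-run such a function reactively whenever the user changes the keyword.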

Workshop Materials:

Additional Sources:


Venue: Corpora2017 Conference, Saint Petersburg, Russia, June 27-30, 2017

Acknowledgment: This workshop was partially sponsored by the Cyberinfrastructure for Network Science Center.