Where: Woodburn Hall 200, IU
When: 02/22/20 10:30am-4pm
The Machine Learning 101 crash course is open to anyone who is curious about Machine Learning and has limited or no prior experience with Python.
PART 1: Jupyter Notebook
Jupyter Notebook (https://jupyter.org/) is a widely used open-source platform for creating interactive code, visualization, and documents.
You will learn how to 1) perform basic data description and visualization, 2) run missing data analysis and imputation, and 3) prepare data for Machine Learning (splitting into training and testing sets).
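The three steps listed above can be sketched in a few lines of Python. This is only an illustration, not the workshop notebook: the measurements are made-up values, `train_test_split` is a hypothetical helper written for this example, and only the standard library is used so the sketch runs anywhere. In practice, libraries such as pandas and scikit-learn provide equivalent functionality.

```python
import random
import statistics

# Hypothetical sepal-length measurements; None marks a missing value.
sepal_length = [5.1, 4.9, None, 4.6, 5.0, 5.4, None, 5.0, 4.4, 4.9]

# 1) Basic description, computed on the observed values only.
observed = [x for x in sepal_length if x is not None]
summary = {
    "count": len(observed),
    "missing": len(sepal_length) - len(observed),
    "mean": statistics.mean(observed),
    "median": statistics.median(observed),
}

# 2) Mean imputation: replace each missing value with the observed mean.
imputed = [summary["mean"] if x is None else x for x in sepal_length]


# 3) Shuffle the rows, then hold out a fraction of them for testing.
def train_test_split(rows, test_fraction=0.2, seed=42):
    """Hypothetical helper: return (train, test) lists after a seeded shuffle."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


train, test = train_test_split(imputed)
print(summary)
print(len(train), "training rows,", len(test), "test rows")
```

Mean imputation is the simplest option and was chosen here only for brevity; it shrinks the variance of the column, so other strategies may suit real data better.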
PART 2: Machine Learning Concepts
You will review the ML terminology and common techniques (classification, clustering, and regression).
Source: Common Machine Learning Techniques – scikit-learn.
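To make the three techniques concrete, here is a minimal sketch of each one in plain Python. The data points, labels, and function names (`classify_1nn`, `fit_line`, `two_means`) are assumptions made up for illustration; they are not taken from the workshop materials or from scikit-learn, which offers full implementations of all three.

```python
import math
import statistics

# Hypothetical toy data: (petal length, petal width) pairs with species labels.
points = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.3), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
labels = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]


def classify_1nn(query):
    """Classification: predict the label of the closest labeled point."""
    nearest = min(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return labels[nearest]


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denominator = sum((x - mx) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, my - slope * mx


def two_means(data, iterations=10):
    """Clustering: a bare-bones k-means with k = 2 that never sees the labels."""
    centers = [min(data), max(data)]  # simple deterministic starting centers
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in data:
            closer = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            groups[closer].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            tuple(statistics.mean(column) for column in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups


print(classify_1nn((1.6, 0.4)))
print(fit_line([p[0] for p in points], [p[1] for p in points]))
print(two_means(points))
```

Classification and regression both learn from labeled examples (a category in the first case, a number in the second), while clustering groups the points without using any labels.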
Exploring Google Colaboratory notebook (adapted from Tairi Delgado. 2018. Hands-On Data Analytics for Beginners with Google Colaboratory) [Download Link]
Exploring Your Data notebook [Download Link]
Data: Iris csv [Download Link]
Slides – [Download Link]
Notebook CheatSheet [Download Link]
Venue: Luddy Hall 4012 02/17/20 4pm-5pm EST
Zoom link: https://iu.zoom.us/j/581940047
Not all data comes in nicely structured relational databases and tables, and not all problems are best approached with statistics and machine learning. Many problems naturally lend themselves to graph approaches… (Jason Widjaja, 2019)
In this hands-on workshop, you will learn basic concepts of network analysis, gather network data from Wikipedia, and build a network visualization using Gephi, an open-source platform for network analysis and visualization.
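As a rough illustration of what network data looks like, the sketch below stores a few links as an edge list, counts each node's degree, and serializes the edges as CSV. The article pairs are hypothetical examples rather than data gathered in the workshop, and `to_edge_csv` is a helper name invented here. The `Source` and `Target` column headers follow the layout Gephi's spreadsheet import expects for edge tables.

```python
import csv
import io
from collections import Counter

# Hypothetical edges: each pair means "the first article links to the second".
edges = [
    ("Machine learning", "Statistics"),
    ("Machine learning", "Data mining"),
    ("Data mining", "Statistics"),
    ("Network science", "Graph theory"),
    ("Network science", "Statistics"),
]

# Degree = number of edges touching a node (direction is ignored here).
degree = Counter()
for source, target in edges:
    degree[source] += 1
    degree[target] += 1


def to_edge_csv(edge_list):
    """Serialize edges as CSV text with Source and Target header columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edge_list)
    return buffer.getvalue()


print(degree.most_common(3))
print(to_edge_csv(edges))
```

Writing the returned text to a `.csv` file gives an edge table that can be loaded into a network tool for layout and styling.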
Slides [Download pdf]
Venue: Woodburn 200, IU, Bloomington
Time: 02/01/20 10:30am-5pm EST
In this crash course we will focus on Markdown using RStudio and R; however, the skills you learn can be easily applied to other tools (Jupyter Notebook, README files, and wiki pages on GitHub). We will start with R basics, working with actual data (data import, processing, visualization, descriptive statistics). Then we will learn how to combine code and Markdown to create and publish a written document.
We will be using the following materials:
Knowles, Thea. 2019. Dissertating with RMarkdown and Bookdown. R-Ladies. Available at https://bookdown.org/thea_knowles/dissertating_rmd_presentation/
Grolemund, Garrett and Hadley Wickham. 2017. R for Data Science. O’Reilly. Available at https://r4ds.had.co.nz/
If a picture is worth a thousand words, interactive data visualization with Shiny web apps must be worth millions [Anonymous].
Venue: University of Cincinnati, Langsam Library 480
Date: 11/28 1:00-3:00pm EST
In this hands-on workshop you will learn about interactive web applications built with R and Shiny, explore Shiny widgets, and create and deploy your first web app.
- Shiny Gallery – link
- Show Me Shiny – link
- Example of Interactive Dashboard – link
Venue: Kelley School of Business K303 10am-12pm 10/26/18
In this workshop, we will look at basic statistics, scatterplots, boxplots, and time series plots.
The materials for the workshop [link to the folder on IUBOX]:
- Slides [ppt and pdf]
Recommended reading: Phillips, N. (2018). YaRrr! The Pirate’s Guide to R. Available at https://bookdown.org/ndphillips/YaRrr/
Recommended courses: Time Series in R – DataCamp
Date and Venue: New York, CUNY, October 18, 2-4pm
- Language Variation Suite – web application – http://languagevariationsuite.com
- Slides – https://www.slideshare.net/obscrivn/workshop-nwav-47-lvs-tool-for-quantitative-data-analysis
Time: 04/12/18 1:30pm EST – 3:00pm EST
Venue: Woodburn Hall, 200 (SSRC Grand Hall)
In a hands-on format, this workshop will introduce participants to the Language Variation Suite (LVS), a user-friendly interactive web application built in R. LVS provides access to advanced statistical methods and visualization techniques, such as mixed-effects modeling, conditional and random tree analyses, and cluster analysis. These methods enable researchers to handle imbalanced data, measure individual and group variation, estimate significance, and rank variables according to their significance.
- Categorical data csv – Use of R in New York (Labov 1966)
- Continuous data csv – Intervocalic /d/ (Díaz-Campos et al. 2016)
- Language Variation Suite
This workshop covers R basics from learning about RStudio architecture to creating your own graphs with ggplot.
- Slides Introduction
- CSV file – Movie_metadata.csv (IU BOX)
- R script – intro.r (IU BOX)
- R script – plotting.r (IU BOX)
“The purpose of visualization is insight, not pictures”
(Ben Shneiderman, 1999)
Information visualization affords new opportunities for corpus linguistics. In addition to interpretable data synthesis (Keim et al., 2006), visualization allows researchers to unveil linguistic patterns through data exploration and discovery. Until recently, the full integration of visual analytics into corpus tools was not feasible. For example, web-based corpora (e.g., COCA and BNC) were limited to pre-defined text collections and functionalities, whereas software applications were mainly built for a specific purpose (e.g., AntConc – concordances, TigerSearch – syntactic query). The recent development of the Shiny web framework makes it possible to integrate visualization tools into corpus analysis. Shiny is a reactive system that allows for interactive data analysis and visualization. Built with R, the Shiny web framework also provides access to advanced text mining and quantitative algorithms, thus advancing corpus linguistics studies. In this workshop, you will learn the fundamentals of the reactive web framework and visual analytics for corpus analysis.
Venue: Corpora2017 Conference at Saint Petersburg, Russia, June 27-30, 2017
Acknowledgment: this workshop is partially sponsored by the Cyberinfrastructure for Network Science Center.