R Resources

Artwork by @allison_horst

My short list of useful books, courses, and expert blogs. Many are free. These could be departure points for your own learning journey. My hope is that you take and copy sections for sharing with your own community of practice.

Yes, I am giving away all of the secrets. One of the things that I spend a lot of time thinking about is how to communicate about data science with other human beings. It is often too easy to focus on “just” the technical skills like being able to wrangle, explore, analyze, and visualize data. Describing, documenting, planning, collaborating, questioning, and communicating effectively are harder than they sound. When team members each have a baseline of shared competencies, we work better together on the problems we face.

I revisit the content of this list a few times a year as my work takes me further away from coding and more into defining how best to build production solutions. Many of you, too, will eventually be drawn into roles somewhere between educator, developer advocate, R admin, and DevOps engineer. Every R user also needs to have some exposure to a handful of Python libraries, YAML, DevOps, and other tools that I mark with stars ⭐. Tell me about what is good here, and what was missed.

Inspired by these excellent pages:

Paul van der Laken’s R resources

Nathan Stephens’s Professional R Tooling and Integration

Oscar Baruffa’s Big Book of R

Martin Monkman’s Data Science with R: A Resource Compendium


The R Community

R is incredible software for statistics and data science. But while the bits and bytes of software are an essential component of its usefulness, software needs a community to be successful. And that’s an area where R really shines, as Shannon Ellis explains in this lovely rOpenSci blog post. For software, a thriving community offers developers, expertise, collaborators, writers and documentation, testers, agitators (to keep the community and software on track!), and so much more.

A brief list of sources that have helped me feel informed and included in the community:

R for Data Science Online Learning Community R4DS hosts interactive Slack channels (button in the upper right hand corner) for community news, book clubs, meetup events, birds-of-a-feather groups, and career tips.
bookdown.org Bookdown is an open source R package that makes writing and publishing technical books easy. This website is a collection of recent books.
#TidyTuesday R4DS’s weekly data project aimed at creating opportunities to develop an understanding of how to summarize data to make meaningful charts. You will find the hashtag in use by the community over on Twitter.
Data Science StreamRs Several data science professionals sharing their knowledge via video stream

Introductory Books

Every one of us starts our journey from somewhere.

R for Data Science by Hadley Wickham and Garrett Grolemund is an excellent introduction to the Tidyverse. The R for Data Science Online Learning Community hosts book club-style weekly chapter discussions through their Slack channels. Vebash Naidoo has assembled a related solutions guide supplement.
R Cookbook, 2nd Edition by JD Long and Paul Teetor is full of how-to recipes, each of which solves a specific problem. Each recipe includes a quick introduction followed by a discussion that unpacks the solution and provides insight into how it works.

Other books for those starting from Excel and basic statistics:

Data Science: A First Introduction by Tiffany-Anne Timbers, Trevor Campbell, and Melissa Lee
R Programming for Data Science by Roger Peng
YaRrr! The Pirate’s Guide to R by Nathaniel D Phillips
ModernDive Statistical Inference via Data Science by Chester Ismay and Albert Kim
Learning Statistics with R by Danielle Navarro
Statistical Inference via Data Science (2018) by Chester Ismay and Albert Y. Kim
Answering questions with data by Matt Crump
Statistical Thinking for the 21st Century by Russ Poldrack

Read the basic R manual, or at least the early chapters. It’s not as well written as more modern documentation, but it is important for understanding the tenets of legacy code.


Online Courses

Many of us want to try before we buy. Making the investment of time into reading a book (even a free one) could still be too much to ask. These materials, some even with interactive notebooks and project examples, may be a better path for you.

Kaggle An online community of data scientists and machine learning practitioners, now part of Google.
LinkedIn Learning: Fred Nwanganga A sequence of introductory ⭐ Python ML video courses
LinkedIn Learning: Keith McCormick Look for Keith’s Data Mining, CRISP-DM, and executive training materials
The RStudio Cloud Primers
Data Science in a Box
STAT545 by Jenny Bryan
Tidy Data Science with the Tidyverse and Tidymodels by Jake Thompson
Dublin City University R Tutorials
R Programming by Roger Peng
Statistical Computing with R Programming Language: a Gentle Introduction by Max Reuter and Chris Barnes
Happy Git and GitHub for the useR by Jenny Bryan
STAT 447 : Data Science Programming Methods by Dirk Eddelbuettel
Contribute to an open-source project on GitHub Microsoft
A Very Short Course on Time Series Analysis by Roger Peng
Learn R by R-Exercises
YaRrr! The Pirate’s Guide to R (Video) by Nathaniel D Phillips
Johns Hopkins Chromebook Data Science (CBDS)
University of Cincinnati Introduction to R by Bradley Boehmke
University of Cincinnati Intermediate R by Bradley Boehmke

Data Visualization

If your toolkit has been limited to PowerPoint and Excel, you might not yet appreciate that there is so much more to crafting effective visual communication materials. In addition to the learning resources listed here, look for the recent books by Alberto Cairo and David McCandless at your local library.

Data Visualization, a Practical Introduction in R by Kieran Healy
ggplot2: Elegant Graphics for Data Analysis the online version of the work-in-progress 3rd edition by Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen
Hands-On Data Visualization by Jack Dougherty and Ilya Ilyankou
Custom fonts and plot quality with ggplot on Windows by W R Chase
How to Create BBC Style Graphics
R Graph Gallery
Fundamentals of Data Visualization by Claus Wilke
Exploratory Data Analysis & Visualization by Zach Bogart and Joyce Robbins
The Complete ggplot2 Tutorial
flowingdata.com
How to standardize group colors in data visualizations in R by Paul van der Laken

Empirical Bayes

In this little book, David Robinson introduces empirical Bayes, a powerful tool for handling uncertainty across observations. It teaches both the math behind the method and the code that you can adapt to your own data through the detail of a single case study: batting averages in baseball. He wrote it for people (like me) who need to understand and apply methods, but would rather work with real data than face down pages of equations.

So why is Empirical Bayes worth learning? These methods are especially well suited for many modern applications of data science.

Introduction to Empirical Bayes
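
For a feel of the core idea before opening the book, here is a minimal sketch in Python (one of the ⭐ tools), assuming a beta-binomial model of batting averages. The records are made up for illustration, the function names are hypothetical, and the prior is fitted with a simple method of moments on the raw rates as a stand-in. The book itself works in R and is the reference for how to fit the prior properly.

```python
from statistics import mean, pvariance

# Made-up (hits, at_bats) records for illustration only; not real players.
records = {
    "player_a": (3, 10),
    "player_b": (30, 100),
    "player_c": (270, 1000),
    "player_d": (1, 4),
    "player_e": (140, 500),
    "player_f": (45, 200),
}


def fit_beta_prior(rates):
    """Method-of-moments estimate of Beta(alpha, beta) from observed rates.

    Assumption: a simplification that treats every raw rate as equally
    reliable, which a careful analysis would not do.
    """
    m = mean(rates)
    v = pvariance(rates)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def shrink(hits, at_bats, alpha, beta):
    """Posterior mean of a Beta(alpha, beta) prior updated with binomial data."""
    return (hits + alpha) / (at_bats + alpha + beta)


# Raw batting average per player.
raw = {name: h / ab for name, (h, ab) in records.items()}

# Estimate the prior from the data itself: the "empirical" part.
alpha, beta = fit_beta_prior(list(raw.values()))
prior_mean = alpha / (alpha + beta)

# Pull each raw average toward the prior mean; less data means a stronger pull.
shrunk = {name: shrink(h, ab, alpha, beta) for name, (h, ab) in records.items()}

for name, (h, ab) in records.items():
    print(f"{name}: at_bats={ab:4d} raw={raw[name]:.3f} shrunk={shrunk[name]:.3f}")
```

Two of the made-up players have the same raw average from very different numbers of at-bats. The one with fewer at-bats ends up closer to the group average, which is the behavior that makes the method useful when observation counts vary widely.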


Shiny

Shiny is an R package that makes it easy to build interactive web apps for non-programmers. You can host standalone apps on a webpage, embed them in R Markdown documents, or build dashboards to be served from a cloud facility. You can also extend Shiny apps with CSS themes, htmlwidgets, and JavaScript actions.

Mastering Shiny
Engineering Production-Grade Shiny Apps by Colin Fay, Sébastien Rochette, Vincent Guyader, & Cervan Girard on the {golem} package
R Views Enterprise-ready dashboards with Shiny and Databases
R Shiny upgrade packages
Advanced Shiny Tips and Tricks by Dean Attali

RMarkdown and Quarto

R Markdown is a file format for building dynamic documents that contain both code and document text. The package has been extended to provide systems for authoring and publishing books, presentations, blogs, and dashboards.

Quarto is a newer open-source, multi-language scientific and technical publishing system built on Pandoc. Like R Markdown, Quarto uses knitr to execute R code, and it can also run code in other languages through Jupyter.

R Markdown Cookbook
R Markdown: The Definitive Guide by Yihui Xie, JJ Allaire, and Garrett Grolemund
RMarkdown Tips and Tricks by Richard Hanna
Blogdown: Creating Websites with R Markdown by Yihui Xie, Amber Thomas and Alison Presmanes Hill

NLP: Text Mining, Document Classification, Sentiment Analysis, and Topic Modeling

Supervised Machine Learning for Text Analysis in R by Julia Silge and Emil Hvitfeldt
Text mining by Julia Silge and David Robinson
Learn Tidytext by Julia Silge
Text as Data: An Overview by Ken Benoit
word2vec in R Belgium Network of Open Source Analytical Consultants
Learn Regular Expressions

More Advanced Books

Hands-On Programming with R by Garrett Grolemund
Advanced R (Wickham, 2018) by Hadley Wickham
R Packages by Hadley Wickham and Jenny Bryan
Efficient R programming by Colin Gillespie and Robin Lovelace
Advanced R by Bradley Boehmke
A data.table and dplyr tour

Machine Learning

Mastery of the wide array of Machine Learning techniques in real business contexts requires broad and deep study. Long-time engineers like me often make the mistake of skipping ahead to the Kaggle award-winning algorithms and the CV buzzwords. We’re smart. Just wing it, right? This is a bad idea.

The leading edge of thinking in many ML areas is changing rapidly. These are just a starting point:

Data Science for Business by Provost and Fawcett (2013, O’Reilly)
Machine Learning Engineering in Action Ben Wilson
Designing Machine Learning Systems Chip Huyen
Reliable Machine Learning Cathy Chen, Niall Murphy, Kranti Parisa, D. Sculley, Todd Underwood
Introduction to Computational Thinking and Data Science Grimson, Guttag, and Bell
Tidy Modeling with R by Max Kuhn and Julia Silge
Feature Engineering and Selection: A Practical Approach for Predictive Models by Max Kuhn and Kjell Johnson
An Introduction to Statistical Learning 2nd edition
An Introduction to Statistical Learning Labs tidymodels examples
Practical Machine Learning in R by Nwanganga and Chapple (2020) reviewed here.

Find other materials on AI ethics in your specific working domain and be prepared to consider the impacts of validating and generalizing your work. No computing tool does this for you by itself, even if it claims to be automatic.

Machine learning modeling frameworks offer streamlined solutions for pre-processing, scoring, and publishing models. The most popular is arguably scikit-learn over in the Python world. There are also fully supported proprietary systems available from SAS and MathWorks, at a cost. The most popular deep learning neural-net frameworks at this point in time are TensorFlow and Torch, with interfaces from several programming languages. In R the widely used frameworks are caret, tidymodels, and mlr3. Any machine learning solution put into production will require proper orchestration and monitoring to ensure that delivery meets the enterprise’s service-level requirements.

Building Machine Learning Powered Applications: Going from Idea to Product by Emmanuel Ameisen
The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford
MLOPS with R: An end-to-end process for building machine learning applications on Azure
Interpretable Machine Learning
Deep Learning

Geospatial

Map of travel-times to trauma centers by Census tract in Iowa

Analyzing US Census Data: Methods, Maps, and Models in R The goal of this book, by Kyle Walker, is to illustrate the utility of R to prepare, map, and present data products
Map Plots Created With R And Ggmap by Laura Ellis
Making Maps with R by Eric Anderson for (NOAA/SWFSC)
Spatial Data Science by Edzer Pebesma and Roger Bivand
Introduction to Spatial Data Programming with R by Michael Dorman
Predictive Soil Mapping with R by Tomislav Hengl and Robert A. MacMillan
Reproducible road safety research with R
Spatial Data Science by Angela Li
The GDSL Big List of Teaching Links University of Liverpool
Geocomputation with R by Robin Lovelace
Another collection of geospatial sources by sshuair

Meetups, Blogs, and User Groups

A seasoned useR group organizer reminded me recently that a non-trivial amount of effort is required to organize meetups, find speakers, get locations (in non-COVID times), market the meetups, etc. One way we all can help is to volunteer to speak, volunteer a location, help market, farm for speakers, etc. There is a moral motivation in the open source R community to lift one another up and recognize efforts. The software is “free” as in speech, not as in beer. If you learn something useful and encounter a “buy me a coffee” button, be sure to offer a cup of java to the presenter in return.

R Views the RStudio blog
TidyTuesday Tweets curated by Silvia Canelón
Simply Statistics by Rafa Irizarry, Roger Peng, and Jeff Leek
R User Group Meetups Worldwide ~89 chapters sponsored by the R Consortium
themockup by Thomas Mock
Win Vector blog by Dr. John Mount and Dr. Nina Zumel
RevolutionAnalytics by David Smith
Business Science Blog by Matt Dancho
Julia Silge
Variance Explained by David Robinson
Ken Benoit on Text and NLP
Data Imaginist by Thomas Lin Pedersen on ggplot2
Alison Hill by Alison Hill on education
Thomas Neitmann Biostatistics
Edward Visel by Edward Visel
Plant out of Place by Andrew Kniss on agriculture
Ag Data News by Aaron Smith
AllYourBayes by Domenic Di Francesco
Data Meets Narrative by Rebecca Barter
Jake Rozran Learns Data Science by Jake Rozran
RCrastinate by Sascha Wolfer
Colin Fraser
rgeomatic geospatial topics
Arnaud Amsellem for quantitative finance
Ted Laderas on Bioinformatics and Computational Biology
r4stats.com
R-bloggers
Karl Broman by Karl Broman on Biostatistics
Bruno Rodrigues on Education Statistics
Abdul Majed by Abdul Majed


Engineer and analyst