OToPS/Data visualization

Data visualization

Visualization is a way of getting a "picture" of what is going on with the data. Charts, graphs, and infographics are all examples of types of visualization.

We can group visualization methods based on the intended audience (internal or external), on how long it takes to make or read them (fast or labor intensive), and on how much information they present (simple or dense).

Exploratory visualizations are graphics designed to explore data quickly. They are fast to make, often rich and dense, and best suited for internal audiences: the analyst, their team, or other experts. Little time needs to be spent preparing them; the goal is to work fast, gain a rapid understanding, check assumptions, shape ideas, and detect patterns that might be worth modeling in more detail.

Explanatory visualizations show the results of an analysis, and the best ones are often built with a story or key result in mind. They are a way of communicating with an external audience. Effective explanatory visualizations are designed with that audience in mind and, ideally, rehearsed or "focus grouped" ahead of time with stakeholders, such as audience members, especially if the stakes of the presentation are high.

Most business software includes many chart formats. Statistical packages include these as well as more specialized, less well-known methods for exploration.

Visualizations in R

R has a wide and deep range of options for visualization. Base R handles the workhorse exploratory visualizations, such as histograms, box-and-whisker plots, and scatterplots. Good introductions to exploratory data analysis (EDA) visualization in R are available, including the swirl package as well as books.
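As a minimal sketch of those three workhorse plots, the lines below use only base R and the built-in mtcars data set. The choice of data set and variables is an assumption made for illustration; any data frame with numeric columns would work the same way.

```r
# Exploratory plots in base R, using the built-in mtcars data set
# (chosen here only for illustration).
data(mtcars)

# Histogram: distribution of a single numeric variable.
h <- hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")

# Box-and-whisker plot: mpg split by number of cylinders.
b <- boxplot(mpg ~ cyl, data = mtcars,
             xlab = "Cylinders", ylab = "mpg")

# Scatterplot: relationship between two numeric variables.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "mpg")
```

Each call is a single line with sensible defaults, which is what makes these plots quick enough for exploration.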

ggplot2, plotly, and other packages provide rich and powerful options for explanatory visualization, and additional packages can build GIFs and other animated visualizations. These can be elegant. They can also be quite involved to make, with a lot of code to write and debug. ggplot2 is intended to be a full "grammar of graphics": a vocabulary for telling the computer exactly what you want the visualization to look like.
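To give a feel for the "grammar" idea, here is a small sketch that builds one plot by adding layers with `+`. It assumes the ggplot2 package is installed; the mtcars data set, the variables, and the labels are illustrative choices, not part of the original text.

```r
library(ggplot2)

# Start with the data and the mapping from variables to visual properties,
# then add layers one at a time.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +                    # layer 1: the raw points
  geom_smooth(method = "lm", se = FALSE) +  # layer 2: a straight-line fit per group
  labs(title = "Heavier cars use more fuel",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       colour = "Cylinders") +
  theme_minimal()

print(p)
```

Every piece (data, mapping, geometry, labels, theme) is a separate "word", so changing the look of the plot usually means editing or adding one line.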

A fast way to get started with the more advanced options is to find an example you like that shares its code and save a copy. Then either change pieces of it to see how they work, or connect it to a new data frame and replace the variable names in the example with the new variables of interest.
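The swap described above might look like the following sketch. It assumes ggplot2 is installed; the "found" example and the built-in mtcars and iris data sets are stand-ins chosen for illustration.

```r
library(ggplot2)

# The example as found: built on mtcars, weight against fuel economy.
p_original <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight", y = "Miles per gallon")

# The same code pointed at a different data frame (iris).
# Only the data frame, the variable names, and the labels change.
p_adapted <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point() +
  labs(x = "Sepal length", y = "Petal length")

print(p_original)
print(p_adapted)
```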

psych::pairs.panels() is a visualization worth knowing

Want to see a lot of data with a single line of code? Check out pairs.panels() from the psych package by William Revelle. It delivers a super-graphic that combines scatterplots, histograms, nonlinear smoothing, measures of central tendency, and more in a single figure. If the data object is set up correctly, it takes two "words" of code to build one, like "graph(this)". In less than a second, it delivers an astonishingly detailed picture. Really digesting one takes some time and familiarity with the components (scatterplots, histograms, and LOESS smoothing) that make up the picture it builds with the default settings.
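A minimal sketch of the "graph(this)" pattern, assuming the psych package is installed. The four mtcars columns are an illustrative choice; "set up correctly" here simply means a data frame containing only the numeric variables of interest.

```r
library(psych)

# Keep a handful of numeric variables so the panels stay readable.
vars <- mtcars[, c("mpg", "hp", "wt", "qsec")]

# One function, one data object, default settings.
pairs.panels(vars)
```

With many variables the panels get small quickly, so it often helps to plot a subset of columns at a time.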