This is the main navigation page for Unit 6 of the course Introduction to Statistical Analysis, developed using openly licensed materials from Saylor.org's Introduction to Statistics. Below you will find a full description of Unit 6 in general, as well as for each subunit. Follow the links within each subunit description to access particular topics, or proceed directly to the Unit 6 Content Page.

## UNIT 6: CORRELATION, REGRESSION, AND ANOVA

One of the main reasons we conduct statistical analysis is to understand how two variables are related to one another. The most common type of relationship studied is a linear relationship. For example, we may want to know what happens to one variable when we increase or decrease the other: does one increase as the other increases, or does it decrease? How does drinking soda relate to weight gain for teenagers? Does drinking more soda really relate to more weight gain? In this unit, you will learn to measure the degree of a relationship between two or more variables. Both correlation and regression are tools for comparing variables, but they are quite different from one another. Correlation quantifies the strength and direction of a linear relationship between two variables and is a descriptive measure of existing data. Regression, on the other hand, models the linear relationship between an independent and a dependent variable, and can be used to predict the value of the dependent variable when the value of the independent variable is known. A note of caution: be careful not to automatically interpret correlation and regression as establishing cause-and-effect relationships!
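
As a rough illustration of the difference between the two ideas, the following Python sketch computes a correlation coefficient and a least-squares line of best fit. The data are made up for illustration (hypothetical sodas per day and weight gain), and the function names `pearson_r` and `least_squares_line` are our own choices, not part of the course materials.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, invented for this example only.
sodas_per_day = [0, 1, 2, 3, 4]
weight_gain = [1.0, 1.5, 2.5, 3.0, 4.5]

r = pearson_r(sodas_per_day, weight_gain)
slope, intercept = least_squares_line(sodas_per_day, weight_gain)

# Correlation describes the existing data; regression gives a prediction
# for a known value of the independent variable (here, x = 5).
predicted = intercept + slope * 5

print(f"r = {r:.3f}")
print(f"line: y = {intercept:.2f} + {slope:.2f} x")
print(f"predicted y at x = 5: {predicted:.2f}")
```

Even a correlation close to 1 in data like these would not show that soda causes weight gain; it only describes how closely the points follow a line.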

Finally, you will learn about a method called “Analysis of Variance” (abbreviated ANOVA), which is used for hypothesis tests comparing the means of more than two groups. ANOVA examines the amount of variability in the y variable and tries to determine where that variability comes from. You will study the simplest form of ANOVA, called single-factor or one-way ANOVA. You will also briefly study the F distribution, which is used for ANOVA, and the test of two variances.
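
To make the idea of "where the variability comes from" concrete, here is a minimal sketch of the one-way ANOVA F statistic in Python. The three groups are invented numbers, and the function name `one_way_anova_f` is a hypothetical helper chosen for this example.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variability between groups: how far each group mean is from the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Variability within groups: how far each observation is from its own group mean.
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Hypothetical data, invented for this example only.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

A large F statistic suggests that the group means differ by more than the within-group variability would explain. Turning F into a p-value requires the F distribution with the two degrees of freedom shown; this sketch stops short of that step.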

Time Advisory: This unit will take you 11 hours to complete.

### Learning Outcomes

Upon completion of this unit, you will be able to:

• Discuss basic ideas of linear regression and correlation.
• Create and interpret a line of best fit.
• Calculate and interpret the correlation coefficient.
• Identify and interpret outliers.
• Interpret the F probability distribution as the number of groups and the sample size change.
• Discuss two uses for the F distribution, ANOVA and the test of two variances.
• Conduct and interpret ANOVA.
• Conduct and interpret hypothesis tests of two variances (optional).

### Subunits

Unit six consists of three main topics:

## About the Resources in This Course

This course project draws upon three main types of resources:

The first are readings and video lectures from Barbara Illowsky and Susan Dean’s Collaborative Statistics, which is available freely under a Creative Commons Attribution 2.0 Generic (CC BY 2.0) license from the following location: http://cnx.org/content/col10522/latest/

The second type of resource in this course is lectures from Khan Academy. These lectures are available under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) license. Khan Academy has many lectures available from http://www.khanacademy.org/

Finally, the above resources have been woven together and organized into a format analogous to a traditional college-level course by professional consultants who work as experts within the subject area. This process was facilitated by The Saylor Foundation. If you have worked through all of the material contained in this project, you may be interested in taking the final exam provided by Saylor.org or completing other courses available there that are not yet on Wikiversity.