Finding Associations Between Variables
Finding associations between categorical variables
Consider the example from the previous page (exam results).
We want to determine if there is an association between the two variables in the table.
To do this we must determine if the students' exam results (pass or fail) are dependent on their level of preparation (study or did not study).
First, we need to identify the explanatory variable.
This is the variable which explains changes in the other variable.
In this case, this is the level of preparation (study or did not study).
Reading down each column of the table, we can see a clear difference in the percentages of students who passed or failed in each category (studied or did not study).
This suggests that there is an association between the level of preparation and exam results.
If the explanatory variable forms the column headings, use a column percentages table to determine whether there is an association.
If the explanatory variable forms the row headings, use a row percentages table.
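The column percentages calculation can be sketched in Python as follows. The counts are hypothetical, invented only for illustration (the actual exam table is on the previous page), and the function name column_percentages is likewise an assumption rather than part of the lesson.

```python
# Hypothetical counts, invented for illustration; these are NOT the
# figures from the exam table on the previous page.
# Outer keys are the column headings (explanatory variable: preparation).
# Inner keys are the row headings (response variable: exam result).
counts = {
    "Studied":       {"Pass": 45, "Fail": 5},
    "Did not study": {"Pass": 12, "Fail": 18},
}


def column_percentages(table):
    """Express each count as a percentage of its own column total."""
    percentages = {}
    for column, cells in table.items():
        total = sum(cells.values())
        percentages[column] = {
            row: round(100 * count / total, 1) for row, count in cells.items()
        }
    return percentages


col_pct = column_percentages(counts)
for column, cells in col_pct.items():
    print(column, cells)
```

With these made-up counts, the pass percentage differs considerably between the two columns, which is the kind of difference that suggests an association.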
Describing the association
There are three steps we must follow when describing an association.
1. State whether or not there is an association.
2. Describe the association. In this example, we might say that the level of preparation affects the students' results.
3. Give an example to support the claim. We might, for example, say that "the more prepared the student is, the more likely they are to pass the exam."
Categorical variables: 100% stacked column graphs
Associations between variables can often be seen more clearly in a stacked column graph.
Above is a stacked column graph for the exam example.
Since the proportions of each colour differ between the two columns, we know that there is an association.
If there were no association, each coloured block would be the same size in every column.
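The comparison of block sizes can also be done numerically. The sketch below assumes hypothetical column percentages (not the real exam figures) and a helper named largest_gap, which is an illustrative name only. It reports how much the height of each coloured block differs across columns. How large a gap is needed before we call it an association is a matter of judgement that this sketch does not settle.

```python
# Hypothetical column percentages, invented for illustration.
# Each inner dict holds the block heights of one 100% stacked column.
col_pct = {
    "Studied":       {"Pass": 90.0, "Fail": 10.0},
    "Did not study": {"Pass": 40.0, "Fail": 60.0},
}


def largest_gap(percentages):
    """Return the largest difference in block height across the columns.

    A result of 0 means every coloured block is the same size in all
    columns, which is what we would expect with no association.
    """
    columns = list(percentages.values())
    categories = columns[0].keys()
    return max(
        max(col[cat] for col in columns) - min(col[cat] for col in columns)
        for cat in categories
    )


gap = largest_gap(col_pct)
print(f"Largest difference in block height: {gap} percentage points")
```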
Finding associations between numerical variables
Scattergraphs are a visual representation of paired numerical data. They allow us to see whether there is a relationship between two numerical variables.
The explanatory variable is on the horizontal axis.
The response variable is on the vertical axis.
Consider the example below.
The above table shows how monthly revenue ($) changes as monthly advertising spend changes.
In this example, monthly advertising spend is the explanatory variable, while monthly revenue is the response variable.
We can use the above data to construct a scattergraph, with the explanatory variable on the x-axis and the response variable on the y-axis.
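A minimal sketch of constructing the scattergraph in Python is given below. The data values are hypothetical, invented for illustration, and do not come from the table. The function plot_scattergraph is an illustrative name and assumes the matplotlib library is installed.

```python
# Hypothetical data, invented for illustration; these are NOT the
# values from the table in this lesson.
ad_spend = [1000, 1500, 2000, 2500, 3000]      # explanatory variable ($ per month)
revenue = [12000, 15500, 18000, 22500, 25000]  # response variable ($ per month)

# Each point pairs an explanatory value (x) with its response value (y).
points = list(zip(ad_spend, revenue))


def plot_scattergraph(x, y, x_label, y_label):
    """Draw a scattergraph with x on the horizontal axis and y on the vertical axis.

    Assumes matplotlib is installed; it is imported here so that the
    data preparation above works without it.
    """
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel(x_label)  # explanatory variable
    ax.set_ylabel(y_label)  # response variable
    plt.show()
    return fig
```

Calling plot_scattergraph(ad_spend, revenue, "Monthly advertising spend ($)", "Monthly revenue ($)") would draw the graph with advertising spend on the x-axis and revenue on the y-axis.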