“Correlation does not imply causation”. If you have ever done data analysis yourself, this phrase must be ingrained in your brain. As overused as it seems, it is probably still not said enough. There is a reason content about correlation vs causation is so popular (isn’t there?).

Simply speaking, correlation means there is a mutual relationship or connection between two variables. If the values of both variables increase together, the correlation is positive; if one increases while the other decreases, the correlation is negative.
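To make "positive" and "negative" concrete, here is a minimal sketch that computes the Pearson correlation coefficient, one common way of measuring correlation (the article does not name a specific measure, so that choice is an assumption). The variable names and numbers are invented purely for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables move together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented toy data: one series rises with x, the other falls.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

print(pearson(x, rising))   # close to +1: positive correlation
print(pearson(x, falling))  # close to -1: negative correlation
```

The coefficient ranges from -1 to +1, and values near zero mean there is little linear relationship. Note that nothing in the calculation says anything about which variable, if either, causes the other.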

Direction of the causality

Although the existence of that connection alone does not mean that one variable is caused by the other, that is not to say it never is (just to make it a bit more confusing for you). Sometimes (!) one indeed causes the other. A common mistake in that case is getting the direction of the causality wrong. Think of when you were a little child and thought that trees made the wind blow.

On a more serious note, in the Middle Ages people believed that lice were good for your health because they almost never found lice on sick people. As we later discovered, lice are very sensitive to a rise in human body temperature, so they left the “host” before it was even obvious that the person was sick. That led everybody to believe that people got sick after the lice left them, while it was the other way around. So in that case the correlation did mean causation, but people misinterpreted its direction. (source: https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation)

But even when one event directly causes the other, that conclusion cannot be drawn from a single correlation. Take smoking and lung cancer, for example. There is a clear correlation between the two, but before it was accepted as fact that smoking indeed causes lung cancer, numerous other hypotheses were tested. The correlation was also supported by other correlations and tests from extensive research involving many doctors (source: https://sciencebasedmedicine.org/evidence-in-medicine-correlation-and-causation/).

Correlation caused by a third factor

Very often there is a third factor that causes correlation between two events. Without that third factor there would be no correlation between the other two.

Image: correlation vs causation, funny example

Image source: https://www.youtube.com/watch?v=bnIIPqHZnAk

I’m afraid, Barney, unless you’re at a prom or a wedding, wearing a boutonniere will not help you much with the ladies.

One very famous example on this subject is the correlation between ice cream sales and drowning rates. The third factor that makes both variables increase is the time of year, and with it the temperature. So don’t worry and enjoy your ice cream; it will not make you drown.
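The ice cream example can be simulated in a few lines. The sketch below uses entirely made-up numbers: a hypothetical temperature value drives both ice cream sales and drownings, and the two never influence each other. It then compares the raw correlation with the partial correlation, a standard way to ask what is left once the linear effect of a third variable is removed. The model and its coefficients are assumptions for illustration, not real data.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Made-up model: temperature drives both quantities, plus independent noise.
temperature = [rng.uniform(0, 30) for _ in range(1000)]
ice_cream_sales = [10 * t + rng.gauss(0, 30) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

raw = pearson(ice_cream_sales, drownings)
controlled = partial_corr(ice_cream_sales, drownings, temperature)

print(f"raw correlation:             {raw:.2f}")
print(f"controlling for temperature: {controlled:.2f}")
```

In this toy setup the raw correlation comes out strongly positive, while the value that controls for temperature sits close to zero, which is what you would expect when the third factor is doing all the work.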

Coincidental correlations

Very often correlations are just coincidental, as in the example below.

Image source: http://www.tylervigen.com/spurious-correlations

As you can see, if you dig long enough through large data sets you can always find some correlations. Whilst amusing, these correlations are found after the event, between unrelated datasets. The danger for anyone exploring data is assuming that any correlation found will be meaningful. Worse still, if people approach data expecting specific factors to be related, confirmation bias can kick in, leading them to find correlations that support their expectations.
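The "dig long enough" effect is easy to reproduce. As a rough sketch, the code below generates a pile of short series of pure random noise (the counts of 200 series and 10 points each are arbitrary assumptions), compares every pair, and reports the strongest correlation it finds. None of the series has anything to do with any other.

```python
import random
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# 200 unrelated "datasets", each just 10 points of random noise.
n_series, n_points = 200, 10
series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]

# Compare every pair and keep the strongest correlation, positive or negative.
best_r, best_pair = max(
    (abs(pearson(series[i], series[j])), (i, j))
    for i, j in combinations(range(n_series), 2)
)

n_pairs = n_series * (n_series - 1) // 2
print(f"pairs compared: {n_pairs}")
print(f"strongest |correlation|: {best_r:.2f} between series {best_pair}")
```

With thousands of pairs and only a handful of points per series, at least one pair ends up looking strongly related by chance alone. Short series and many comparisons are exactly the conditions under which coincidental correlations thrive.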

In conclusion, correlations can occur in very unexpected combinations, and it is important to always treat them with caution. The fact that a correlation exists is not a conclusion in itself; it is an indication that further investigation is needed.

If you have any queries about data or statistics, please contact us.

If you want to see more examples of funny correlations, check out this website: http://www.tylervigen.com/spurious-correlations.

Featured image source: https://www.pinterest.co.uk/pin/312296555380025188