Learning statistics is one of the most fundamental things you can do to improve your life. Beyond helping with career paths, mastering statistical concepts will help you better understand almost every other field, including business, economics, psychology, and more.
Businesses use stats to determine how much merchandise to order, how many employees are needed, and whether or not to invest in new equipment. Economists use stats to study trends in the economy and how best to respond to them.
Psychologists rely heavily on understanding averages and variance when assessing personality traits and behaviors. And while some people may laugh at you for learning math, knowing basic stats like mean, median, and mode helps dispel common myths about numbers.
The more you know about numbers, the better off you will be! So what are we going to learn? Let’s get started by looking at something simple: means.
An average (or mean) is simply a sum divided by a count. For example, if there were three classes with 2, 2, and 3 students, the total would be 2+2+3=7 students, and the mean class size would be 7 divided by 3, or about 2.33 students per class. That is all the mean is: add up every individual value, then divide by how many values there are.
This seems very straightforward, but there are times when it does not work well: a single extreme value (an outlier) can drag the mean far away from what is typical, and in those cases the median, the middle value when the data are sorted, is often a better summary.
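As a minimal sketch in Python (using the standard-library `statistics` module; the class sizes and the outlier value are made up for illustration):

```python
import statistics

# Class sizes from the example: three classes with 2, 2, and 3 students.
sizes = [2, 2, 3]
mean_size = statistics.mean(sizes)  # (2 + 2 + 3) / 3
print(mean_size)                    # about 2.33

# The mean breaks down with outliers: one huge lecture class
# drags it far above the "typical" class.
sizes_with_outlier = [2, 2, 3, 200]
print(statistics.mean(sizes_with_outlier))    # 51.75
print(statistics.median(sizes_with_outlier))  # 2.5 -- still typical
```

Notice how the median barely moves when the outlier is added, while the mean jumps wildly.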
Another way to compare sets of data is through box plots. A box plot is a way to visualize the median, the quartiles, and the interquartile range (IQR) of a set of numbers at a glance.
Imagine asking a group of people to estimate the length of a 12-inch ruler. The median estimate might land right around 12 inches, but a wide IQR would tell us that individual guesses vary a lot: some people are very accurate while others are far off.
By looking at the whole distribution instead of just an individual number, we can see how consistent the estimates really are.
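Here is a small pure-Python sketch of the median-plus-IQR idea, the same numbers a box plot would draw; the estimates of a 12-inch ruler are invented for illustration:

```python
import statistics

# Hypothetical estimates (in inches) of a 12-inch ruler's length.
estimates = [4, 9, 11, 12, 13, 15, 22]

# quantiles(n=4) returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(estimates, n=4)
iqr = q3 - q1

print(median)  # 12.0 -- the typical guess is spot on
print(iqr)     # 6.0  -- but the middle half of guesses spans 6 inches
```

The median alone would suggest everyone is accurate; the IQR reveals how spread out the guesses actually are.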
A scatter plot is one of the most important statistical tools. It gives you an easy way to compare two variables side by side, one plotted against the other. In fact, it is such a fundamental tool that many other graphical statistics are built on top of it.
A basic scatterplot has an x-axis (the horizontal axis) and a y-axis (the vertical axis). Each axis carries a numeric scale, and each should be labeled with the name of the variable it represents.
The points in the graph are the data themselves. These points are called ‘observations’, and each one is a pair of values: one read against the x-axis and one against the y-axis.
Three kinds of patterns commonly appear in a scatterplot: isolated points, clusters, and linear trends.
An isolated point (an outlier) sits far away from every other observation. Clusters happen when several observations bunch close together. Linear trends appear when the points roughly line up, hinting at a relationship between the two variables.
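A minimal sketch of building a scatterplot, assuming the third-party matplotlib library is available; the hours-vs-scores numbers are invented for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display window needed
import matplotlib.pyplot as plt

# Hypothetical data: hours studied vs. test score for ten students.
hours  = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8]
scores = [52, 55, 60, 61, 68, 70, 74, 78, 85, 90]

fig, ax = plt.subplots()
ax.scatter(hours, scores)       # each point is one (x, y) observation
ax.set_xlabel("Hours studied")  # label the axes so readers know
ax.set_ylabel("Test score")     # which variable is which
fig.savefig("scatter.png")
```

With this data the points line up from lower left to upper right, the linear-trend pattern described above.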
This article will go into more detail about what kind of graphs people make use of, how to make your own graphs, and some examples.
A regression model is one used to predict or determine how one variable influences another. In other words, it looks at two variables and estimates whether, and by how much, one affects the other.
There are three main types of regression models: linear regression, logistic regression, and multiple regression (regression with more than one predictor variable).
Linear regression is the most common type of regression used in business and economics. It fits a straight line that predicts a dependent variable from an independent variable, and the line’s coefficient (its slope) represents how much the dependent variable changes for each one-unit change in the independent variable.
A good example is looking at why some schools have higher test scores than others. You would not necessarily look only at income as a factor, but you would include factors like education spending and quality of teaching. Having more educated teachers can increase student learning, which is a major determinant of test score performance.
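As a sketch of the idea, here is a one-predictor least-squares fit written out in plain Python; the spending and score figures are invented, not real school data:

```python
# Hypothetical data: per-student spending (thousands of dollars)
# vs. average test score for six schools.
spending = [6, 7, 8, 9, 10, 11]
scores   = [62, 66, 69, 71, 76, 80]

def ols(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

slope, intercept = ols(spending, scores)
print(round(slope, 2))                   # score points gained per extra $1k
print(round(intercept + slope * 12, 1))  # predicted score at $12k spending
```

The slope is the coefficient described above: with this toy data, each extra thousand dollars of spending is associated with roughly three and a half more score points.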
Businesses use regressions all the time to see if one variable has an influence on sales or customer satisfaction. If one does, they can then develop strategies around it. For instance, if slow service causes people to buy less coffee, speeding up the line may be a way to boost sales!
Logistic regression is similar to linear regression, except the outcome is binary: something happens or it doesn’t. Rather than predicting a number directly, the model predicts the probability of the outcome. It is typically used in risk prediction tools, such as determining whether a transaction is likely to be fraudulent using statistics and information about it.
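To make the distinction concrete, here is a from-scratch sketch of logistic regression fit by gradient descent; the transaction amounts and fraud labels are entirely made up:

```python
import math

# Toy data: scaled transaction amount and whether it was fraud (1) or not (0).
x = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
y = [0,   0,   0,   0,   1,   1,   1,   1]

def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# Fit a weight and bias by gradient descent on the log-loss.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    grad_w = sum((sigmoid(w * xi + b) - yi) * xi for xi, yi in zip(x, y)) / len(x)
    grad_b = sum((sigmoid(w * xi + b) - yi) for xi, yi in zip(x, y)) / len(x)
    w -= lr * grad_w
    b -= lr * grad_b

# The model outputs a probability, not a value on the original scale.
print(sigmoid(w * 1.0 + b))  # low fraud probability for a small transaction
print(sigmoid(w * 4.0 + b))  # high fraud probability for a large one
```

The key contrast with linear regression: the output is always a probability between 0 and 1, never an unbounded number.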
In psychology, there is an interesting statistic used to determine if one group of people are more or less likely to show a certain behavior than another. This statistical measure is called effect size, and it compares the average difference between groups with a standard deviation.
A small effect size means that the differences in the behaviors being studied are not very large and therefore do not represent much of a change. Larger effect sizes mean that the differences are greater and more likely to matter in practice. (Effect size measures the magnitude of a difference; whether that difference is statistically significant is a separate question.)
In statistics, we use standardized effect sizes because they allow us to make comparisons across different studies. One common way to calculate this effect size is by using what’s known as Cohen’s D.
Cohen’s D compares two groups, say group A and group B. It takes the difference between the two group means and divides it by the pooled standard deviation of the groups: d = (mean of A − mean of B) / pooled SD. By convention, a d of about 0.2 is considered a small effect, 0.5 a medium one, and 0.8 a large one.
Note that Cohen’s D is one of several standardized effect sizes; others, such as the correlation coefficient r and eta squared (the proportion of variance explained), are related to it but computed differently.
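Here is a short sketch of the standard Cohen’s d calculation (difference in group means divided by the pooled standard deviation); the group scores are invented for illustration:

```python
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical scores for two groups on some behavioral measure.
a = [14, 15, 15, 16, 17, 18]
b = [12, 13, 13, 14, 15, 15]

d = cohens_d(a, b)
print(round(d, 2))  # compare against ~0.2 small, ~0.5 medium, ~0.8 large
```

With these made-up numbers the result is well above 0.8, so by the usual convention it would count as a large effect.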
One-way analysis of variance (or simply, one-way ANOVA) is an important tool to use when you need to know whether there was a significant difference among two or more groups. This test is used to determine whether there are differences in mean values among several groups.
In this case, suppose there are three different groups being analyzed, each with its own mean value. For example, let’s say we were testing how much sugar each group liked and found that the first group likes about 2 tablespoons of sugar per day, the second prefers about 4, and the third about 6.
One-way ANOVA asks a single question about these numbers: are the group means further apart than the variation within each group would explain by chance? If the answer is yes, then at least one group differs significantly from the others, although the ANOVA itself does not say which one. That is what one-way ANOVA can tell us.
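A sketch of the one-way ANOVA F statistic computed by hand in plain Python; the sugar-intake numbers are invented to mirror the example:

```python
import statistics

# Hypothetical daily sugar intake (tablespoons) for three groups.
groups = [
    [2, 2, 3, 2, 3],   # group 1, mean near 2
    [4, 5, 4, 3, 4],   # group 2, mean near 4
    [6, 5, 6, 7, 6],   # group 3, mean near 6
]

k = len(groups)                       # number of groups
n = sum(len(g) for g in groups)       # total observations
grand_mean = sum(sum(g) for g in groups) / n

# Between-group spread vs. within-group spread.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f_stat, 1))  # a large F suggests the group means really differ
```

Here the between-group differences dwarf the within-group noise, so the F statistic comes out large, exactly the "yes, something differs" signal described above.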
Interpretation of results
If the mean values for all three groups are close together relative to the variability within each group, then there is not enough evidence to conclude that their differences are statistically significant. The further apart the means are compared with that within-group spread, the stronger the evidence of a real difference.
Once an ANOVA shows that a difference exists somewhere, a popular follow-up is a “post hoc” analysis. It gets that name because it occurs after the fact, when we already know that at least one group was different from another!
So what are post hoc analyses? They are statistical tests that compare only two groups at a time. For example, suppose we timed readers on several books and the ANOVA told us that reading times differ; a post hoc analysis would determine which specific books take the most time to read.
Our statistician would perform these tests by comparing the average reading time of Book 1 with that of Book 2, then Book 1 with Book 3, then Book 2 with Book 3, and so on and so forth until every pair among the four books has been analyzed.
This is an important concept in statistics, for two reasons. Each pairwise test depends on there being enough variability information within the groups being compared; without it, the comparison is unreliable. And running many tests at once inflates the chance of a false positive, which is why post hoc procedures typically adjust for multiple comparisons (for example, with a Bonferroni correction or Tukey’s HSD).
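A sketch of pairwise post hoc comparisons with a Bonferroni-adjusted alpha, using invented reading times; the Welch t statistics printed here would still need to be compared against critical values to declare significance:

```python
import statistics
from itertools import combinations

# Hypothetical reading times (hours) for three books.
times = {
    "Book 1": [5.1, 4.8, 5.5, 5.0],
    "Book 2": [6.9, 7.2, 6.8, 7.1],
    "Book 3": [5.2, 5.0, 5.4, 4.9],
}

pairs = list(combinations(times, 2))
alpha = 0.05 / len(pairs)  # Bonferroni: split alpha across all comparisons
print(f"per-comparison alpha: {alpha:.4f}")

for a, b in pairs:
    xa, xb = times[a], times[b]
    # Welch's t statistic: mean difference over its standard error.
    se = (statistics.variance(xa) / len(xa)
          + statistics.variance(xb) / len(xb)) ** 0.5
    t = (statistics.mean(xa) - statistics.mean(xb)) / se
    print(f"{a} vs {b}: t = {t:.2f}")
```

With three groups there are three pairwise tests, so each one is held to a stricter threshold (0.05 / 3) than a single test would be.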