When I said I’m not a data scientist, I meant it. My last chart-happy post about male versus female confidence in skydiving was heavy on pretty graphs and light on actual data analysis. Fearless reader (and actual data scientist) Clifford Richardson ran the data through some heavy-duty industrial grinders and came up with a more rigorous walk-through.
Long story short (TL;DR): “In conclusion, freefall confidence isn’t statistically different between genders, canopy flight confidence seems to be, and the number of jumps it takes females and males to get comfortable under that patchwork of life-saving nylon shows no statistical difference.”
And for the true stats lovers, here is the full explanation. Huge thanks and kudos to Clifford!
Synopsis of Skydiving Confidence Data
by Clifford Richardson
In response to the recent article about confidence in skydiving, I have taken the raw data and given it a more in-depth analysis to help draw proper conclusions. I use some common statistical tests and terms to reach formal conclusions, but I will explain the purposes behind them for those who are not mathematically inclined. I apologize in advance for the lengthy wording. I usually write for scientific audiences, and boring prose is all I know.
First, I wanted to address the main question: Is there a statistical difference between men and women when looking at confidence for either canopy or freefall flight?
To examine this question, let’s look at the original survey questions:
- Male or Female?
- I felt confident in my canopy-flying and canopy-landing skills at jump #___.
- I felt confident in my freefall skills at jump #___.
Male or female is an easy answer to interpret. For the two confidence questions, however, I wanted solid “yes/no” answers so I could test whether the answer depends on gender (I’ll explain why shortly). I therefore interpreted any unambiguous numerical answer to the canopy and freefall questions to mean that the jumper is now confident in those skills. Ambiguous answers that could have meant yes or no were dropped from the data set; there were only 4 such cases.
Once the data were formatted the way I wanted, I used R, a powerful statistical computing program, to perform a chi-squared (χ²) test on the tabulated data. I know we all like graphics, but there are no fancy graphs for this portion of the analysis. Instead, Tables 1 and 2 show the categorical data for confidence in canopy and freefall skills.
The chi-squared test of independence is a non-parametric test for categorical data: it determines whether one categorical variable depends on another, in this case whether confidence depends on gender. If the p-value is less than or equal to 0.05, we reject the idea that the two variables are independent; put loosely, a difference this large would turn up less than 5% of the time if confidence and gender were unrelated. The 0.05 cutoff is just a convention that statisticians settled on as good enough to call a result “statistically significant.” I promise I’m not just making this up as I go along. A benefit of the chi-squared test is that it compares the observed counts with the counts expected from the row and column totals, so the fact that more respondents were male is accounted for.
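The recoding step above can be sketched in a few lines of Python (this is not the R workflow used for the analysis). The author only says that unambiguous jump numbers were read as “confident” and ambiguous answers were dropped; the `NOT_YET` list of answers counted as “not confident yet”, the regular expression, and the example rows are hypothetical assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical list of answers read as "not confident yet"; the real
# survey coding rules for "no" answers are not given in the text.
NOT_YET = {"not yet", "no", "never", "still not"}


def code_answer(raw):
    """Map a free-text 'confident at jump #___' answer to True/False/None.

    True  -> a plain jump number, read as "yes, now confident"
    False -> an explicit "not yet"-style answer (hypothetical rule)
    None  -> anything ambiguous; such rows are dropped
    """
    text = raw.strip().lower()
    if re.fullmatch(r"#?\s*\d+", text):
        return True
    if text in NOT_YET:
        return False
    return None


def tabulate(rows):
    """Count (gender, confident) pairs, skipping ambiguous answers."""
    counts = Counter()
    for gender, answer in rows:
        coded = code_answer(answer)
        if coded is not None:
            counts[(gender, coded)] += 1
    return counts


# Made-up example rows, not the survey responses.
rows = [("F", "120"), ("M", "#45"), ("M", "not yet"), ("F", "maybe 50ish?")]
print(tabulate(rows))
```

The resulting counts are exactly the kind of gender-by-confidence table that the chi-squared test works on.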
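To make the test less of a black box, here is a minimal Python sketch of Pearson’s chi-squared test for a 2×2 table (gender by confident/not yet). The counts in the example are invented for illustration, because the actual table values are not reproduced in this text. One assumption worth flagging: R’s `chisq.test()` applies Yates’ continuity correction to 2×2 tables by default, so its p-values will be somewhat larger than this uncorrected version.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts.

    table[i][j] = respondents with gender i and confidence answer j.
    Returns (statistic, p_value); no Yates continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Count we would expect if gender and confidence were independent;
            # this is where unequal numbers of men and women are accounted for.
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Illustrative counts only; rows = female, male; columns = confident, not yet.
stat, p = chi_squared_2x2([[10, 20], [20, 10]])
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

With these made-up counts the statistic is 20/3 and the p-value falls below 0.05, so the sketch would call the association significant.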
Table 1: Confidence versus Gender for Canopy Flight.
Numbers represent the number of respondents that fit the table criteria.
Pearson’s chi-squared p-value=7.259e-05.
Table 2: Confidence versus Gender for Freefall Flight.
Numbers represent the number of respondents that fit the table criteria.
Pearson’s chi-squared p-value=0.3962.
Confidence shows no statistically significant dependence on gender for freefall skills, but it does appear to be gender dependent for canopy flight. However, we cannot say with great certainty that the difference is due to gender alone. What we can say is that, in this data set, a gender difference in canopy flight confidence as large as the one observed would be very unlikely (p < 0.001) if confidence and gender were truly unrelated. The analysis doesn’t account for other possible factors such as type of instruction, age, etc. This is why it is hard to draw concrete conclusions from surveys, but in situations like this a survey is the best way to collect data, because we don’t want to experiment on people in skydiving (throwing an inexperienced jumper under a high-performance canopy to test their canopy skills is frowned upon). To reduce confounding, we have to collect data on the variables that might be doing the confounding. Survey analysis is really a chess game.
Should I stop there? No. I want to give this as much effort as possible, since Blue Skies Mag deserves it for collecting the data and making every datum available (free information is a blessing). I will, however, be strategic about it. For instance, because the simple chi-squared test found no significant dependence between freefall confidence and gender, there is no reason to pursue further analysis there. For canopy flight confidence, however, I did.
Bear with me, because what follows involves more complicated stuff: probability distributions, probability density functions, and some fancy stats tests. We have something cool available to us: the number of jumps it took each jumper to become confident in their canopy skills.
Right, so why is this cool? Because we can model both the male and female data with a probability distribution and compare the two models to see whether males become confident under canopy sooner than females, or vice versa. Fitting a probability distribution to the data is extremely important because it hints at which tests we can use to compare the two data sets, and for some tests it is mandatory. There are cheap and easy ways around this (the Kruskal-Wallis test, for instance), but in my opinion doing a little more work is better. The simplest form of model fitting is linear regression. Add in some special parameters based on probability theory and you get more complicated but useful equations that can describe your data. A probability density function is one of those equations: a curve, shaped by a few parameters, that describes how likely each value is. Because fitting more complicated models is, for lack of a better word, complicated, I let R do the work for me by using a function to fit several distributions and seeing which fit best.
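For the curious, here is a rough Python sketch of what “fit a distribution and see which fits best” can look like. It is not the R function used in this analysis. It fits a Weibull distribution by maximum likelihood (solving the standard shape equation by bisection) and compares it with an exponential model using AIC, which is one common way to rank fits; the choice of AIC and the simulated data are assumptions for illustration, not the survey responses.

```python
import math
import random


def weibull_loglik(data, shape, scale):
    """Log-likelihood of a Weibull(shape, scale) model for positive data."""
    n = len(data)
    return (n * math.log(shape) - n * shape * math.log(scale)
            + (shape - 1) * sum(math.log(x) for x in data)
            - sum((x / scale) ** shape for x in data))


def fit_weibull(data, tol=1e-10):
    """Maximum-likelihood Weibull fit by bisection; returns (shape, scale)."""
    if min(data) <= 0 or len(set(data)) < 2:
        raise ValueError("need positive data with at least two distinct values")
    top = max(data)
    y = [x / top for x in data]  # rescale to (0, 1] so y ** k never overflows
    logs = [math.log(v) for v in y]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Shape equation from the likelihood; increasing in k, zero at the MLE.
        w = [v ** k for v in y]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1 / k - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0:  # widen the bracket until it contains the root
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = (lo + hi) / 2
    scale = top * (sum(v ** shape for v in y) / len(y)) ** (1 / shape)
    return shape, scale


def exponential_loglik(data):
    """Log-likelihood of the best-fitting exponential model (rate = 1/mean)."""
    n = len(data)
    return -n * math.log(sum(data) / n) - n


def aic(loglik, n_params):
    """Akaike information criterion; lower is a better fit/complexity trade-off."""
    return 2 * n_params - 2 * loglik


# Simulated stand-in for "jumps until confident under canopy"; NOT survey data.
random.seed(1)
jumps = [random.weibullvariate(60, 1.4) for _ in range(200)]
shape, scale = fit_weibull(jumps)
print(f"Weibull:     shape={shape:.2f} scale={scale:.1f} "
      f"AIC={aic(weibull_loglik(jumps, shape, scale), 2):.1f}")
print(f"Exponential: AIC={aic(exponential_loglik(jumps), 1):.1f}")
```

The exponential is just a Weibull with shape fixed at 1, so the comparison asks whether the extra shape parameter earns its keep.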
I am pretty comfortable reading data, so I suspected the best bet would be the Weibull distribution. However, because I can’t just assume, I ran a goodness-of-fit test called the Anderson-Darling test. I chose this test because it gives extra weight to the tails of the distribution, which makes it sensitive to skewed (tail skewed, not “cheezed” skewed) data. You can Google the details, but for validating Weibull distributions it is fantastico. You can see the results below (Figures 1, 2, & 3).
Unlike the chi-squared test, where a small p-value was the interesting result, the Anderson-Darling test starts from the assumption that the data do follow the fitted distribution. A p-value less than or equal to 0.05 means the distribution does not fit, so here we want to see a value higher than 0.05.
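As a sketch of how such a check could be done outside R, the Python below computes the Anderson-Darling statistic A² against a fitted Weibull. Because the shape and scale are estimated from the same data, it gets the p-value by parametric bootstrap (simulate from the fitted model, refit, recompute A²); that is one standard approach and an assumption here, since the text does not say how R produced its p-values. The data are simulated, not the survey responses.

```python
import math
import random


def fit_weibull(data, tol=1e-10):
    """Maximum-likelihood Weibull fit by bisection; returns (shape, scale)."""
    top = max(data)
    y = [x / top for x in data]  # rescale to (0, 1] to avoid overflow
    logs = [math.log(v) for v in y]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Increasing in k; zero at the maximum-likelihood shape.
        w = [v ** k for v in y]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1 / k - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if score(mid) < 0 else (lo, mid)
    shape = (lo + hi) / 2
    return shape, top * (sum(v ** shape for v in y) / len(y)) ** (1 / shape)


def anderson_darling(data, shape, scale):
    """Anderson-Darling A^2 of the data against a Weibull(shape, scale) CDF."""
    xs = sorted(data)
    n = len(xs)
    t = [(x / scale) ** shape for x in xs]
    log_cdf = [math.log(-math.expm1(-ti)) for ti in t]  # log F(x), accurate for small t
    log_sf = [-ti for ti in t]                          # log(1 - F(x)) = -t exactly
    total = sum((2 * i + 1) * (log_cdf[i] + log_sf[n - 1 - i]) for i in range(n))
    return -n - total / n


def ad_bootstrap_pvalue(data, n_boot=200, seed=0):
    """Parametric-bootstrap p-value for the Weibull A^2 goodness-of-fit test."""
    rng = random.Random(seed)
    shape, scale = fit_weibull(data)
    observed = anderson_darling(data, shape, scale)
    exceed = 0
    for _ in range(n_boot):
        # Simulate a sample from the fitted model and refit it from scratch.
        sim = [rng.weibullvariate(scale, shape) for _ in data]
        if anderson_darling(sim, *fit_weibull(sim)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)


# Simulated stand-in data, NOT the survey responses.
rng = random.Random(42)
jumps = [rng.weibullvariate(60, 1.4) for _ in range(100)]
a2, p = ad_bootstrap_pvalue(jumps)
print(f"A^2 = {a2:.3f}, bootstrap p = {p:.3f} (p > 0.05: no evidence of misfit)")
```

Large values of A² mean the data stray from the fitted curve, especially in the tails; the bootstrap p-value is the share of simulated samples that stray at least as much.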
What this tells me is that the models are adequate both for the pooled data and for the data separated by gender. The next step is to calculate the log-likelihood of each model. I’m going to save some time and ask you to trust me on the values. These feed into a likelihood ratio test: twice the difference between the log-likelihood of the gender-separated models and that of the pooled model is compared with a chi-squared distribution. It’s a way to compare two models and see whether the data are more likely to have come from separate male and female models or from a single model for everyone. This is a different test from the contingency-table chi-squared test above, but the p-value reads the same way: a p-value greater than 0.05 means the differences we see are consistent with random variation, and there is no evidence of a real dependence on gender.
Jumping to the result, I got a p-value of 0.9002. So do males become confident under canopy sooner than females? No, not according to the data.
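Here is a minimal Python sketch of a likelihood ratio test of this kind, assuming the comparison is one pooled Weibull (2 parameters) against separate male and female Weibulls (4 parameters), which gives 2 degrees of freedom. The exact model setup used in R is not shown in the text, so treat the parameter count as an assumption; the groups below are simulated, not the survey data.

```python
import math
import random


def weibull_loglik(data, shape, scale):
    """Log-likelihood of a Weibull(shape, scale) model for positive data."""
    n = len(data)
    return (n * math.log(shape) - n * shape * math.log(scale)
            + (shape - 1) * sum(math.log(x) for x in data)
            - sum((x / scale) ** shape for x in data))


def fit_weibull(data, tol=1e-10):
    """Maximum-likelihood Weibull fit by bisection; returns (shape, scale)."""
    top = max(data)
    y = [x / top for x in data]  # rescale to (0, 1] to avoid overflow
    logs = [math.log(v) for v in y]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Increasing in k; zero at the maximum-likelihood shape.
        w = [v ** k for v in y]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1 / k - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if score(mid) < 0 else (lo, mid)
    shape = (lo + hi) / 2
    return shape, top * (sum(v ** shape for v in y) / len(y)) ** (1 / shape)


def likelihood_ratio_test(group_a, group_b):
    """Do two groups need separate Weibull fits, or does one pooled fit do?

    H0: both groups share one Weibull(shape, scale)  -> 2 parameters.
    H1: each group has its own shape and scale       -> 4 parameters.
    2 * (logL1 - logL0) is compared with chi-squared on 4 - 2 = 2 df.
    """
    pooled = group_a + group_b
    ll0 = weibull_loglik(pooled, *fit_weibull(pooled))
    ll1 = (weibull_loglik(group_a, *fit_weibull(group_a))
           + weibull_loglik(group_b, *fit_weibull(group_b)))
    stat = 2 * (ll1 - ll0)
    # Survival function of chi-squared with 2 df: P(X > x) = exp(-x / 2).
    p_value = math.exp(-max(stat, 0.0) / 2)
    return stat, p_value


# Simulated groups drawn from the same distribution; NOT the survey data.
rng = random.Random(5)
female = [rng.weibullvariate(60, 1.4) for _ in range(80)]
male = [rng.weibullvariate(60, 1.4) for _ in range(250)]
stat, p = likelihood_ratio_test(female, male)
print(f"LR statistic = {stat:.3f}, p = {p:.3f}")
```

The chi-squared reference distribution for this statistic is a large-sample approximation, so with small groups the p-value should be read as approximate.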
In conclusion, freefall confidence isn’t statistically different between genders, canopy flight confidence seems to be, and the number of jumps it takes females and males to get comfortable under that patchwork of life-saving nylon shows no statistical difference.
I recommend further investigation through follow-up surveys that build on this one. Future questions should cover as wide a range of factors as possible, including total jumps, age, type of instruction, years in the sport, number of jumps in the last 30 days, and whether the jumper has been involved in an incident that led to injury, for both freefall and canopy flight. Big-time acknowledgments to Blue Skies Mag for getting this stuff into the open for investigation.

About the author, Clifford Richardson (Data Wrangler): “I worked as a biologist in New Mexico for 4 years catching wild animals, testing cancer and antimicrobial drugs, and analyzing research data. Got my skydiving license (A-71041) in July of this year and haven’t gone back to biology since. Now I live in California, where I do whatever job pays to get money to spend on my next gear rental fee and lift ticket. Favorite color: red.”