Last time we discussed a data set, in many ways very similar to income/wealth distributions, but simple and small enough for an unskilled person to analyse with a standard spreadsheet package.
The data were distributed very unequally. A few unusually high values had a huge influence on the sample's statistics:
"1% of the sample determines 41% of the sample's mean.
There's more. … From 3,252 'likes', for example, the range would collapse to 480 with the removal of the "1%": an 85% fall!"
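For readers who would rather script this than use a spreadsheet, here is a minimal Python sketch of those two calculations: the share of the mean owed to the top 1%, and the fall in the range once the top 1% is removed. The function names and the toy sample are mine, invented purely for illustration. They are not the survey data, so the toy figures will not match the ones quoted above.

```python
def top_count(n, top_fraction=0.01):
    """Number of observations in the top group (at least one)."""
    return max(1, round(n * top_fraction))


def top_share_of_mean(values, top_fraction=0.01):
    """Share of the sample mean contributed by the top group.

    The mean is the total divided by n, so this share equals the
    top group's share of the sample total.
    """
    ordered = sorted(values, reverse=True)
    k = top_count(len(ordered), top_fraction)
    return sum(ordered[:k]) / sum(ordered)


def range_without_top(values, top_fraction=0.01):
    """Return (full range, range without the top group, relative fall)."""
    ordered = sorted(values)
    k = top_count(len(ordered), top_fraction)
    trimmed = ordered[:-k]
    full_range = ordered[-1] - ordered[0]
    trimmed_range = trimmed[-1] - trimmed[0]
    return full_range, trimmed_range, 1 - trimmed_range / full_range


# Hypothetical toy sample of 100 values (NOT the data from the post).
toy = [1] * 50 + [2] * 30 + [5] * 19 + [100]

share = top_share_of_mean(toy)
full_range, trimmed_range, fall = range_without_top(toy)
print(f"top 1% share of the mean: {share:.1%}")
print(f"range: {full_range} -> {trimmed_range} (fall of {fall:.1%})")
```

The same arithmetic applied to the figures quoted above gives (3,252 − 480) / 3,252 ≈ 0.85, which is where the 85% comes from.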
One would need to consider very seriously the presence of a number of outliers: from the top six (in a very conservative "guesstimate") to the top 89 (in a much more structured and stringent estimate).
That data set could offer an interesting exercise for advanced high-school/early undergraduate students of statistics/econometrics. In particular, it would allow for ties with topical issues, like the Piketty/associates literature on income/wealth size distribution:
"In words, 25% of the sample gets between 1 and 2 'likes'; the next quarter gets between 2 and 4; the third gets between 4 and 8; and the last between 8 and 3,253 … Welcome to inequality."
That data set, however, has another interesting characteristic: I never explained what those data were supposed to mean. What are those "likes" and data points? I mentioned a survey, but how was it conducted? Was it a random sample? What questions were included?
Today's the day!