standupmaths: How to estimate a population using statisticians

How could you estimate the size of a population? Catch a random sample of individuals and mark them (capture). Later, catch another random sample and count the proportion of marked individuals in it (recapture). If both samples are truly random, the proportion of marked individuals in the second sample should be approximately equal to the proportion of marked individuals in the whole population.

Suppose we marked \( S_1 \) individuals during the capture phase and later caught a second sample of \( S_2 \) individuals, of which \( M_2 \) were marked and \( S_2 - M_2 \) were unmarked. Equating the two proportions, \( M_2 / S_2 \approx S_1 / N \), the total size of the population \( N \) can be estimated as \begin{equation} N = \frac{S_1 S_2}{M_2} . \end{equation} This formula should give a rather good approximation of \( N \) if the sample sizes are large enough and the sampling is truly random.
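To make the formula concrete, here is a minimal Python sketch that applies it to a simulated population. The function names (`estimate_population`, `simulate_capture_recapture`) and all the numbers are made up for illustration; they do not come from the video.

```python
import random


def estimate_population(s1, s2, m2):
    """Estimate N = S1 * S2 / M2 from capture-recapture counts."""
    if m2 == 0:
        # No marked individuals were recaptured, so the ratio is undefined.
        raise ValueError("no marked individuals recaptured")
    return s1 * s2 / m2


def simulate_capture_recapture(true_n, s1, s2, seed=0):
    """Run one simulated experiment on a population of known size."""
    rng = random.Random(seed)
    population = range(true_n)
    # Capture phase: mark s1 individuals chosen uniformly at random.
    marked = set(rng.sample(population, s1))
    # Recapture phase: draw an independent random sample of s2 individuals.
    second_sample = rng.sample(population, s2)
    m2 = sum(1 for individual in second_sample if individual in marked)
    return estimate_population(s1, s2, m2)


if __name__ == "__main__":
    # Hypothetical numbers: the true population size is 10000.
    print(simulate_capture_recapture(true_n=10000, s1=1000, s2=1000))
```

With small samples \( M_2 \) fluctuates a lot from one experiment to another, and so does the estimate, which is why the formula works well only for sufficiently large samples.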

You can find two example experiments in a video by Matt Parker (one of the well-known stand-up mathematicians).

Berkson's paradox

Berkson's paradox is one of the click-baity results you can obtain while doing conditional comparisons in your mind. For example, there is a common belief that Hollywood ruins good books. It appears that the better the source material, the worse the movie. Sometimes this perception is attributed to higher expectations for movies based on better source material, but there is an alternative explanation: the mental analysis itself ignores a substantial part of the available data. This is referred to as Berkson's paradox.

In the case of movies, people tend to remember the instances in which either the source material or the movie was good. In other words, we tend to forget (ignore) the cases in which both the source and the movie were bad. Because of this selection, a spurious negative correlation between the two variables emerges.

Similarly, there is a related belief that good-looking people tend to be jerks. We experience this because, when selecting our dates, we tend to choose people who are either good-looking or nice. In other words, we ignore those who are neither good-looking nor nice.
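Both examples can be mimicked with a small simulation. The sketch below assumes that the two qualities (say, book quality and movie quality) are independent standard normal variables and that a case is "remembered" if at least one quality exceeds an arbitrary threshold. The distribution, the threshold and the function names are illustrative assumptions, not something taken from the videos.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def berkson_demo(n=20000, threshold=0.5, seed=0):
    """Return the correlation in the full data and in the 'remembered' subset."""
    rng = random.Random(seed)
    # Two independent qualities, so the true correlation is zero.
    pairs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Keep only the cases in which at least one quality is good,
    # i.e. drop the cases in which both are bad.
    remembered = [(a, b) for a, b in pairs if a > threshold or b > threshold]
    full = pearson(*zip(*pairs))
    selected = pearson(*zip(*remembered))
    return full, selected


if __name__ == "__main__":
    full, selected = berkson_demo()
    print("all cases:       ", round(full, 3))
    print("remembered cases:", round(selected, 3))
```

The correlation over all cases stays close to zero, while the correlation over the remembered cases comes out clearly negative, even though the two qualities were generated independently.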

More details can be found in videos by ASAPScience and Numberphile.

Facebook contest data set: Reproducing day-night cycle

During the summer hiatus I promised to look into a data set I had extracted from one particular Facebook contest. While many Facebook contests are not based on any logical reasoning, this contest appeals to me as it appears to require at least some thought or expertise.

Last time I briefly explored the data set. Now I will try to build models for these observations. This time I will consider the day-night cycle.

SciShow: Could Scientists Predict the Next Political Crisis?

Being able to forecast is a big thing for scientists, but a better understanding of a system does not always lead to better forecasts. In physics we have a good understanding of celestial dynamics, thus we can make accurate predictions about the movement of planets. Though we also have a decent understanding of how the weather works, weather forecasts are not that reliable. And most likely weather forecasts won't get much better over time, because we will never be able to precisely account for the large number of minor influences, which get amplified until the system deviates from the forecast (the so-called butterfly effect).
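The amplification of minor influences can be illustrated with a toy model. The sketch below uses the logistic map \( x_{n+1} = 4 x_n (1 - x_n) \), a standard textbook example of chaotic dynamics. It is not a weather model, and the initial values and function names are chosen here purely for illustration.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x) starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def divergence(x0, perturbation, steps):
    """Absolute gap between two trajectories with nearly equal starts."""
    a = logistic_trajectory(x0, steps)
    b = logistic_trajectory(x0 + perturbation, steps)
    return [abs(u - v) for u, v in zip(a, b)]


if __name__ == "__main__":
    # Two initial conditions that differ by one part in ten billion.
    gaps = divergence(x0=0.2, perturbation=1e-10, steps=60)
    for step in (0, 10, 20, 30, 40, 50, 60):
        print(step, gaps[step])
```

The gap stays tiny for the first few steps and then grows until it is comparable to the size of the values themselves, after which the two "forecasts" have nothing in common.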

The situation with forecasts in the social sciences is even worse, but we can do more to improve them. In this SciShow video Hank Green tells us about some of the interesting intricacies.

In this video I especially liked one aspect - open-minded non-experts outperformed close-minded experts. The non-experts did a better job, because they more often incorporated evidence contradicting their prior beliefs, while the experts more often rejected such evidence.

So, the takeaway lesson? I guess I should not trust an "expert" who shows off his firm beliefs. Nor should I myself be an "expert" who values his beliefs more than the evidence.