Describing high-frequency trader's behavior in the order book

We continue our series of posts on order book models by considering a model proposed by a group of scientists from Japan [1, 2, 3], which is based on high-resolution data from the foreign exchange market. Their work is extremely interesting as it starts from empirical observations at the lowest observable level and builds up to reproduce empirical observations at higher levels. The model is also analytically tractable using kinetic theory.

standupmaths: How to estimate a population using statisticians

How could you estimate population size? Catch a random sample of individuals and "mark" them (capture). Then catch another random sample and count the proportion of the "marked" in it (recapture). If both samples are random, the proportion of the "marked" in the second sample is expected to match the proportion of the "marked" in the whole population.

Given that we marked \( S_1 \) individuals during the capture phase and later recaptured \( M_2 \) of them together with \( S_2 - M_2 \) unmarked individuals, equating the two proportions, \( M_2 / S_2 = S_1 / N \), gives an estimate of the total population size \( N \): \begin{equation} N = \frac{S_1 S_2}{M_2} . \end{equation} This formula should give a rather good approximation of \( N \) if the sample sizes are large enough and the sampling is truly random.
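
To get a feel for how well the formula works, one can simulate the capture and recapture phases on a population of known size. The following is a minimal sketch in Python, not code from the original post: the function names, the population size, and the sample sizes are all assumptions picked for illustration, and both samples are drawn uniformly at random without replacement.

```python
import random


def lincoln_petersen(s1, s2, m2):
    """Estimate population size as N = S1 * S2 / M2."""
    if m2 == 0:
        # With no marked individuals recaptured the estimate is undefined.
        raise ValueError("no marked individuals in the second sample")
    return s1 * s2 / m2


def simulate_capture_recapture(true_n, s1, s2, seed=0):
    """Simulate one capture-recapture experiment (illustrative helper).

    true_n is the (normally unknown) population size,
    s1 and s2 are the sizes of the first and second samples.
    """
    rng = random.Random(seed)
    population = range(true_n)
    # Capture phase: mark s1 randomly chosen individuals.
    marked = set(rng.sample(population, s1))
    # Recapture phase: draw an independent random sample of size s2.
    second_sample = rng.sample(population, s2)
    # Count how many individuals in the second sample carry a mark.
    m2 = sum(1 for individual in second_sample if individual in marked)
    return lincoln_petersen(s1, s2, m2)


if __name__ == "__main__":
    estimate = simulate_capture_recapture(true_n=10000, s1=2000, s2=2000, seed=1)
    print("estimated population size:", round(estimate))
```

Rerunning the simulation with different seeds or smaller samples shows how the estimate scatters around the true value, and how quickly it degrades when only a handful of marked individuals are recaptured.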

You can find two example experiments in a video by Matt Parker (one of the most well-known standup mathematicians).

Berkson's paradox

Berkson's paradox is one of the click-baity results you can obtain while doing conditional comparisons in your mind. For example, there is a common belief that Hollywood ruins good books. It appears that the better the source material, the worse the movie is. Sometimes this perception is attributed to higher expectations for movies based on better source material, but there is an alternative explanation: the mental analysis itself ignores a substantial amount of the available data. This is referred to as Berkson's paradox.

In the case of movies, people often remember the instances when either the source material was good or the movie was good. Namely, we tend to forget (ignore) the cases where both the source and the movie were bad. Because of that, a spurious negative correlation between the two variables emerges.

Similarly, there is a related belief that good-looking people tend to be jerks. We experience this because when selecting our dates we tend to choose people who are either good-looking or nice. Namely, we ignore the ones who are neither good-looking nor nice.
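
The selection effect described above is easy to reproduce numerically. Below is a minimal sketch in Python, written for illustration only: the quality scores of books and movies are assumed to be independent standard normal variables, and the "memorable" cutoff of 0.5 is an arbitrary choice. Only the pairs where at least one score exceeds the cutoff are kept, mimicking how we forget the cases where both were bad.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def berkson_demo(n=20000, threshold=0.5, seed=0):
    """Return (correlation in all data, correlation in remembered data)."""
    rng = random.Random(seed)
    # Assumption: book and movie quality are independent by construction.
    book = [rng.gauss(0.0, 1.0) for _ in range(n)]
    movie = [rng.gauss(0.0, 1.0) for _ in range(n)]
    full_corr = pearson(book, movie)
    # Keep only the pairs where the book OR the movie was good.
    remembered = [(b, m) for b, m in zip(book, movie)
                  if b > threshold or m > threshold]
    selected_corr = pearson([b for b, _ in remembered],
                            [m for _, m in remembered])
    return full_corr, selected_corr


if __name__ == "__main__":
    full_corr, selected_corr = berkson_demo()
    print("correlation, all pairs:       ", round(full_corr, 3))
    print("correlation, remembered pairs:", round(selected_corr, 3))
```

In the full data the correlation should be close to zero, as the two scores are independent, while in the remembered subset it should come out clearly negative. The same sketch applies to the dating example if the two scores are read as looks and niceness.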

More details on that in videos by ASAPScience and Numberphile.

Facebook contest data set: Reproducing day-night cycle

During the summer hiatus I promised to look into a data set I extracted from one particular Facebook contest. While many Facebook contests are not based on any logical reasoning, this contest appeals to me as it appears to require at least some thought or expertise.

Last time I briefly explored the data set. Now I will try to build models for these observations. This time I will consider the day-night cycle.