Sports analytics & a three-century-old tradition (Part 2)

In the previous post I introduced you to a three-century-old tradition from Croatia, a knightly tournament that celebrates heroes from the past and their modern counterparts, the alkars from Sinj. It’s definitely not a usual analytics subject, but hey, there’s analytics in everything, right? So we did some sports analytics on the collected data and saved the best for last. Last time we introduced the average points per run metric, which we can use to gauge the quality of both a particular competition and individual alkars.

8_points_per_run

We calculated average points per run for each Alka and found out which competitions stood out, both positively and negatively, but we spotted something else as well: it looks like alkars are performing much better in the new millennium. As we already mentioned, the 50-year average is 1.3 points per run, and most Alka after 2000 were around that mark or better (except the 2017 Alka). Let’s try to verify this by calculating a 10-year moving average. Moving averages are used to smooth the data, filter out short-term fluctuations and highlight longer-term trends. Well, the trend is pretty evident (picture below)… collectively, alkars raised their game considerably: from the 1.2 to 1.3 range, they started hitting well north of 1.4 points per run. As it stands, we are currently witnessing the golden period of Alka, quite possibly the best years to date.

9_10MA
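For readers who want to try the smoothing themselves, here is a minimal sketch of a trailing 10-year moving average. The yearly values are made up for illustration (they are not the real Alka data), and the function name is my own. Note that it averages the yearly averages, so every year gets equal weight regardless of how many runs it had, which is an assumption.

```python
def moving_average(values, window=10):
    """Trailing moving average: one value per full window of `window` items."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[end - window:end]) / window
        for end in range(window, len(values) + 1)
    ]


# Hypothetical yearly averages (points per run), NOT the real Alka data.
yearly_avg = [1.20, 1.25, 1.18, 1.30, 1.22, 1.28,
              1.24, 1.31, 1.27, 1.35, 1.40, 1.45]

smoothed = moving_average(yearly_avg, window=10)
print([round(x, 3) for x in smoothed])
```

With 12 yearly values and a 10-year window you get 3 smoothed points, one for each year that has a full decade of history behind it.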

We also calculated average points per run for every alkar, which gave us a lot of interesting insights. As we already mentioned (several times, I know I’m repeating myself), the average across all alkars is around 1.3 points per run. However, as with all distributions, there are some outliers, in our case exceptional alkars, the best of the best.

10_dist

As we can see, 3 alkars differ significantly from the rest with 1.8 points per run (half a point better than average): Ognjen Preost, Alen Filipović Grčić and Stipe Šimundža. It’s also interesting to note that the most efficient alkar who never won Alka (Aljoša Dragaš) averages more than 1.5 points per run, which puts him in the top 10 of all alkars with a minimum of 9 runs. Being accurate was not enough this time, and as with life in general, it helps to have a bit of luck as well. Luck carries different weight in different sports (luck vs skill), so outliers like this one are more or less probable depending on the sport. It would be interesting to see where Alka stands on this so-called luck vs skill continuum compared to other sports. This is definitely worth investigating, but we’ll leave it for a future analysis. Another interesting peculiarity involves Stipe Šimundža: he was a really great alkar, considering his efficiency and accuracy possibly one of the best of all time, but he participated in only 6 Alka?! Bear in mind that the average alkar’s career is 12.5 years long, while some compete for 20+ years, so Stipe’s case is an odd one.

10_apr_alkars
(bars show average points per run, circled numbers show wins at Alka)

I investigated this further, not with quantitative methods this time, and found out he’s a veterinarian who later became the director of the newly founded Alka Stable. The purpose of the Alka Stable is to raise a specific type of horse, one that is well suited for Alka. That means it can cover 160m in under 13s (not that hard for a horse), but at the same time it has a very smooth gallop (which affects the alkar’s accuracy) and is not easily startled or spooked (there’s a big crowd, commotion, cheering, even cannon blasts). Basically, it’s the Anglo-Arabian breed, and the cross is usually made between an Arabian stallion and a Thoroughbred mare. (Btw, how do you rate my subject knowledge? LOL, I don’t know shit about horses, but hey, I had to investigate.) Anyway, where am I going with all of this? We suspect that the founding of the Alka Stable, along with better horse training and preparation “technology”, helped alkars considerably. This is not some wild hypothesis: after we circulated a draft of this analysis, it found its way to the alkars, and several pieces of feedback pointed in that direction: “this is great analysis, but horse is extremely important factor and you should try to include some horse related data if possible, e.g. horse experience…”, which I would be glad to do if the data exists. Anyway, it’s quite possible that with Stipe’s goodbye to Alka we lost an all-time great, but through his love for horses and better training and preparation he enabled other alkars to be even better… coincidentally, shortly after his retirement the golden period of Alka began, hmmm.

The picture above doesn’t cover the whole careers of Kelava and Dragaš, so it isn’t 100% accurate. Let’s therefore take into account only alkars who made their debut after 1969 (meaning we have data for their whole career), and let’s also drop the 9-run minimum we had previously. The new rankings are shown in the following graph. As we already mentioned, Alen Poljak had a fantastic start to his alkar career, winning Alka on his debut. He also has an excellent points per run average, but he has participated in only two Alka so far, which is a very limited sample. Still, this is very interesting: are we witnessing the birth of a new all-time great, one who will join Preost, Filipović Grčić and Šimundža on the throne, or will his accuracy decline with time? Only time will tell, but keep your eye on Alen next Sunday. Another fun fact is that there are four Filipović Grčićs in the top 20 (out of 56 alkars): Alen, Goran, Mladen and Ivica. All of them won Alka except Goran, who competed for only four years. Considering how good he was, he should have continued; with time he would probably have won Alka.

11_top20_apr_alkars

We mentioned the peculiar case of Stipe Šimundža, but what about the Ognjen Preost (OP) and Alen Filipović Grčić (AFG) “mystery”? Did you notice it on the second-to-last graph? The two have virtually identical average points per run, yet Preost has won 5 Alka (from 11 participations) while Filipović Grčić has won only 1 (from 10 participations). What about that? I had to investigate this further, with quantitative methods this time! We know the point probabilities for Preost and Filipović Grčić (following table), and we also know these distributions for all other alkars from both the OP and the AFG era. Basically, we can run Monte Carlo simulations and simulate their careers (in different setups) many times to uncover the statistical expectations.

12_AFG_OP

My first thought was that Preost (no longer active) and Filipović Grčić (still active) competed in different eras: Preost in the 90s and Filipović Grčić over the last 10 years, in the so-called golden period of Alka. Maybe AFG’s competition is much better than OP’s, maybe it’s much harder to win Alka nowadays? To find out, we ran a bunch of simulations. First, we simulated 100 thousand Alka in which OP competes against the competition from his own era (knowing their point distributions) and found that OP won in 23.4% of cases. Already we see that OP’s expected efficiency is much lower than the one he recorded in his career (45%), which tells us that, apart from being the most accurate alkar, he also had a lot of luck. In simulations where OP competes against alkars from the AFG era, his efficiency is a bit lower, at 22.5%, which tells us that the competition in the AFG era is indeed better. It is harder to win Alka nowadays, but not by a big margin. Vice versa, in simulations where AFG competes against the competition from OP’s era, his efficiency is 22.5%, a bit lower than OP’s 23.4% against the same field. That is understandable, because AFG’s average points per run is a bit lower, and even a small difference shows up over 100 thousand simulations. In simulations where AFG competes against his present competition, he wins Alka in 21.4% of cases, which, contrary to OP’s case, is much higher than what he recorded in real life (10%, 1 in 10 Alka). From these simulations we see that the difference in competition is not the reason for the big discrepancy between OP’s and AFG’s wins. Luck played the big role: OP was lucky and won far above expectations (45% vs 23%), while AFG was unlucky and won far below expectations (10% vs 21%).
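If you’d like to play with this kind of simulation, here is a rough sketch of the idea, not our actual code. Everything specific in it is an assumption for illustration: the per-run probabilities are invented (tuned only to give roughly 1.8 and 1.3 points per run), the field of 16 identical rivals is hypothetical, each alkar gets 3 runs scoring 0 to 3 points, and ties are split evenly instead of being settled by extra runs.

```python
import random

POINTS = (0, 1, 2, 3)  # possible points for a single run


def simulate_total(rng, probs, runs=3):
    """Total points over `runs` runs drawn from a per-run distribution."""
    return sum(rng.choices(POINTS, weights=probs, k=runs))


def win_rate(hero_probs, field_probs, n_sims=10_000, seed=1):
    """Estimate how often the hero wins an Alka against the given field."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        hero = simulate_total(rng, hero_probs)
        rivals = [simulate_total(rng, p) for p in field_probs]
        best_rival = max(rivals)
        if hero > best_rival:
            wins += 1
        elif hero == best_rival:
            # Simplification: split a tie evenly instead of simulating run-offs.
            tied = 1 + rivals.count(best_rival)
            if rng.random() < 1 / tied:
                wins += 1
    return wins / n_sims


# Hypothetical per-run probabilities of scoring 0, 1, 2 or 3 points.
STRONG = (0.15, 0.20, 0.35, 0.30)   # about 1.8 points per run
AVERAGE = (0.28, 0.30, 0.26, 0.16)  # about 1.3 points per run
FIELD = [AVERAGE] * 16              # hypothetical field of 16 rivals

print(win_rate(STRONG, FIELD))
```

Swapping `FIELD` for a different era’s distributions is how the “OP against AFG’s competition” style of experiment can be set up. To get closer to what is described above, raise `n_sims` to 100 thousand and give every rival their own distribution.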

We ran another simulation to show how far from expectations their results (wins) fall. By simulating the whole careers of OP (11 years) and AFG (10 years) many times (10 thousand is enough), we can find out how probable it was for each of them to win a certain number of times. As the charts below show, the probability of OP winning Alka 5 times in 11 participations was a little higher than 5%. As shocking as it might sound, it was more probable for him not to win a single Alka than to win it 5 times. The most probable outcome for him was 2 or 3 wins, which is in line with the 23.4% expected efficiency we got in the previous simulation. AFG has one participation fewer, so the numbers differ a bit, but for him the most probable outcome was 1 or 2 wins. The cumulative probability of AFG winning more than one Alka was around 65%, but unfortunately it did not happen, and his record still stands at one. The cumulative probability of OP winning 5 or more Alka was less than 10%, and yet it materialized… as we said, expectations are one thing, but in sports it takes a little bit of luck.

13_OP

14_AFG
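A quick way to sanity-check the cumulative numbers is a binomial approximation. This is a shortcut, not the career simulation itself: it assumes the same win probability every year (23.4% for OP, 21.4% for AFG, taken from the earlier simulations) and independent years, so the exact shape of the histograms above can differ. The function name is my own.

```python
from math import comb


def win_count_pmf(n_alka, p_win):
    """Binomial probability of k wins in n_alka tries, for k = 0..n_alka."""
    return [
        comb(n_alka, k) * p_win**k * (1 - p_win) ** (n_alka - k)
        for k in range(n_alka + 1)
    ]


# Assumption: a constant per-Alka win probability over the whole career.
op = win_count_pmf(11, 0.234)   # Preost, 11 participations
afg = win_count_pmf(10, 0.214)  # Filipović Grčić, 10 participations

p_op_5_plus = sum(op[5:])    # OP wins 5 or more Alka
p_afg_2_plus = sum(afg[2:])  # AFG wins more than one Alka

print(f"OP, 5+ wins:  {p_op_5_plus:.1%}")
print(f"AFG, 2+ wins: {p_afg_2_plus:.1%}")
```

Under these assumptions the two tail probabilities come out at roughly 9% and roughly 66%, close to the “less than 10%” and “around 65%” from the simulated careers.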

To illustrate this further, we checked how both of them did year by year (points total) and whether they won Alka. We see that Preost won both Alka in which he had 7 points (remember the previous post? the historical probability of winning Alka with 7 points is 38.9%) and one of the two Alka in which he had 6 points (historical probability 14.5%). Just like Preost, Filipović Grčić twice had 7 points and twice had 6 points, but he didn’t win a single Alka on those occasions.

15_godine_usporedba_OP_AFG
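To put a rough number on that contrast, we can combine the historical win probabilities quoted above (38.9% with 7 points, 14.5% with 6 points). This is a back-of-the-envelope sketch that assumes the four Alka are independent and that the historical probabilities apply equally to both alkars, which is a simplification.

```python
P_WIN_7 = 0.389  # historical probability of winning Alka with 7 points
P_WIN_6 = 0.145  # historical probability of winning Alka with 6 points

# Preost: won both 7-point Alka and exactly one of the two 6-point Alka.
p_preost = P_WIN_7**2 * (2 * P_WIN_6 * (1 - P_WIN_6))

# Filipović Grčić: won none of his two 7-point and two 6-point Alka.
p_filipovic_grcic = (1 - P_WIN_7) ** 2 * (1 - P_WIN_6) ** 2

print(f"Preost's outcome:           {p_preost:.1%}")
print(f"Filipović Grčić's outcome:  {p_filipovic_grcic:.1%}")
```

Under these assumptions, Preost’s three wins from those four Alka had a probability of about 4%, while Filipović Grčić coming away empty-handed had a probability of about 27%.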

As we saw numerous times, statistical techniques help us make sense of uncertainty, but real life is very often more peculiar and messier than pure statistical expectations (especially when bells grow tails)… and I’m not only talking about the Preost vs Filipović Grčić case. I’m also talking about 1715, when a few hundred Croats defeated the heavily favored Ottomans (historical sources say the odds were more than 20 to 1), the event out of which this whole beautiful tradition started. I hope you enjoyed these posts. We did some number crunching on a subject no one would consider a worthy candidate, but why not popularize lesser-known sports or traditions… I feel this one certainly deserved it.

 

Addendum: This peculiar analytics exercise, which started as a fun thing to do, got us on national TV (statistics in the bottom right corner)…

alka_na_TV

…and into the #1 regional / #3 national newspaper 🙂

clanak_zajedno