Abstract:
Random forest models are constructed to predict teams' medal counts in different Olympic events, based on a large data set covering the 1992-2021 Summer Olympic Games. Clear differences are found in the predictability of Olympic events. For medal forecasts, the three most predictable events are table tennis, badminton and swimming, while the three least predictable are water polo, modern pentathlon and volleyball. Interpretable machine learning methods are then used to investigate the socioeconomic features that strongly affect performance at the Olympic Games. The results show that: (1) for the same event, the prediction accuracy for women's events is usually higher than for men's; (2) factors such as population, GDP per capita, and hosting the Games influence medal counts; (3) for specific Olympic events, traditional strengths, such as China's in table tennis and the USA's in athletics, have a large impact on the medal forecast.