Abstract:
Objectives This study applies several machine learning algorithms to build risk prediction models for knee valgus and varus in school-aged children and adolescents. By comparing the models and selecting the best-performing one, it seeks to provide a scientific basis for developing intelligent models that assess and predict postural health in children and adolescents.
Methods A total of 514 primary school students from major cities in Zhejiang Province were enrolled. The data collected covered demographics, anthropometrics, body composition, posture, and static and dynamic plantar pressure distribution. The sample was divided by simple random sampling into a training set (n = 360) and a validation set (n = 154) at a 7:3 ratio. Six machine learning algorithms were used to build predictive models for knee valgus and varus: K-Nearest Neighbors (KNN), Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Multiple Logistic Regression (LM), and Support Vector Machine (SVM). The predictive performance of each model was evaluated using the receiver operating characteristic (ROC) curve, and the Shapley Additive Explanations (SHAP) algorithm was used to assess the influence of each data dimension on the model outputs.
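The split and the ROC-based evaluation described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `split_7_3` and `roc_auc`, the seed, and the toy labels and scores are assumptions made for illustration, and the AUC is computed directly from its pairwise-ranking definition rather than with any particular library.

```python
import random

def split_7_3(ids, seed=0):
    """Simple random 7:3 split; returns (training, validation) lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * 0.7)
    return shuffled[:n_train], shuffled[n_train:]

def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 514 participant indices, as in the study; the seed is an arbitrary assumption.
train_ids, valid_ids = split_7_3(range(514), seed=42)
print(len(train_ids), len(valid_ids))

# Hypothetical validation labels and model scores, for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.8, 0.2, 0.4]
print(roc_auc(toy_labels, toy_scores))
```

With 514 participants, this split yields 360 training and 154 validation cases, the set sizes reported above. In a model comparison of this kind, each fitted model would be scored on the same validation set and the one with the highest AUC retained.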
Results Among the subjects, 190 cases of knee valgus and 80 cases of knee varus were identified. For knee valgus, the XGBoost model achieved the highest area under the ROC curve (AUC), 0.738, making it the best-performing model. For knee varus, the RF model achieved the highest AUC, 0.824, making it the best-performing model. SHAP analysis showed that the key features influencing the XGBoost model's predictions of knee valgus were age, leg length difference, and ear-shoulder distance, while the most influential features for the RF model's predictions of knee varus were knee extension angle, leg length difference, ear-shoulder distance, dynamic plantar arch index, arch status deformation, and age.
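SHAP attributions are based on Shapley values, the average marginal contribution of each feature over all orders in which features can be added. The sketch below computes them exactly by brute force for a toy model, only to show what a per-feature attribution means. The scoring function `toy_risk`, its coefficients, and the input values are hypothetical and not taken from the study. Practical SHAP implementations use model-specific algorithms and a background dataset rather than this enumeration against a single reference input.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)            # start from the reference input
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to its actual value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_risk(x):
    """Hypothetical risk score with one interaction term; coefficients are invented."""
    return (0.05 * x["age"]
            + 0.2 * x["leg_length_difference"]
            + 0.01 * x["age"] * x["ear_shoulder_distance"])

# Hypothetical child and reference inputs, for illustration only.
child = {"age": 10, "leg_length_difference": 1.0, "ear_shoulder_distance": 4.0}
reference = {"age": 8, "leg_length_difference": 0.0, "ear_shoulder_distance": 2.0}

phi = shapley_values(toy_risk, child, reference)
for name, contribution in phi.items():
    print(name, round(contribution, 4))
```

The contributions sum to the difference between the score for `child` and the score for `reference`, which is the property that lets per-feature attributions be ranked by magnitude as in the feature lists above.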
Conclusion The XGBoost and RF models showed the best predictive performance for knee valgus and knee varus, respectively, and the findings can guide the construction of early intervention tools for managing postural health in children and adolescents.