Identification of key influencers for secondary distribution of HIV self-testing among Chinese MSM: a machine learning approach


BACKGROUND: HIV self-testing (HIVST) has been rapidly scaled up in several countries, but additional strategies are needed to enhance case finding. In secondary distribution, people apply for multiple kits and pass them on to others in their social networks, but identifying key influencers can be difficult. This study aimed to develop and validate a machine learning approach to identify key influencers among men who have sex with men (MSM) for HIVST secondary distribution in China.
METHODS: Indexes were participants who applied for HIVST kits to distribute; alters were those who received these kits. We defined three types of key influencers among indexes: (1) key distributors, who are likely to distribute more kits; (2) key promoters, who can contribute to finding first-time testing alters; and (3) key detectors, who can help to find positive alters. In our identification system, four machine learning models (logistic regression, support vector machine, decision tree, and random forest) were trained to predict key influencers for secondary distribution. An ensemble learning approach combined the predictions of these four models into the final prediction. A simulation experiment compared the ensemble machine learning identification results with human identification (i.e., a cut-off on self-reported leadership scales) to assess the intervention efficiency of our approach.
RESULTS: A total of 309 indexes distributed HIVST kits to 269 alters. Our ensemble model outperformed human identification, exceeding it in average accuracy by 11.0%. Additionally, when the same number of key influencers (e.g., key distributors) was identified, the ensemble machine learning approach could distribute 18.2% (95% CI: 9.9% to 26.5%) more kits, find 13.6% (95% CI: 1.9% to 25.3%) more first-time testing alters, and find 12.0% (95% CI: -14.7% to 38.7%) more positive-testing alters than the human identification approach. Simulation experiments also showed that the intervention efficiency of the ensemble machine learning model was 17.7% (95% CI: -3.5% to 38.8%) higher than that of the self-reported scale cut-off method.
CONCLUSIONS: We built machine learning models to identify key influencers among Chinese MSM who were more likely to engage in HIVST secondary distribution, and our approach outperformed the conventional human identification approach.
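
ILLUSTRATION: The ensemble step described in METHODS can be sketched as follows. The abstract does not state how the four models' predictions were combined, so this sketch assumes soft voting (averaging each model's predicted probability) and a 0.5 decision threshold. The probabilities, the function names `soft_vote` and `flag_key_influencers`, and the threshold are all hypothetical and are not study data or the authors' implementation.

```python
from statistics import mean

# Hypothetical predicted probabilities that each of four indexes is a key
# distributor, one list per base model. Values are illustrative only.
model_probs = {
    "logistic_regression": [0.82, 0.30, 0.55, 0.10],
    "support_vector_machine": [0.74, 0.42, 0.61, 0.20],
    "decision_tree": [1.00, 0.00, 0.00, 0.00],
    "random_forest": [0.90, 0.35, 0.48, 0.15],
}


def soft_vote(probs_by_model):
    """Average the per-model probabilities for each index (soft voting).

    Assumption: equal weight per model; the source does not specify weights.
    """
    per_index = zip(*probs_by_model.values())
    return [mean(p) for p in per_index]


def flag_key_influencers(scores, threshold=0.5):
    """Return positions of indexes whose ensemble score meets the threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


ensemble_scores = soft_vote(model_probs)
key_influencers = flag_key_influencers(ensemble_scores)
print(ensemble_scores)
print(key_influencers)
```

Averaging probabilities rather than counting votes lets a confident model outweigh several marginal ones; a majority vote over hard labels would be an equally plausible reading of the text.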