Identification of key influencers for secondary distribution of HIV self-testing among Chinese MSM: a machine learning approach

Presenter

Fengshi Jing

Authors

F. Jing * (1), Y. Ye (2), Y. Ni (1), X. Yan (1), Y. Lu (1), Y. Zhou (3), J.J Ong (4), J.D Tucker (5), D. Wu (4), C. Xu (1), Y. Xiong (1), X. He (6), X. Li (3), S. Huang (3), C. Wang (7), W. Dai (3), L. Huang (3), W. Cheng (8), Q. Zhang (2), W. Tang (1)

Institutions

(1) University of North Carolina at Chapel Hill Project-China, Guangzhou, China, (2) City University of Hong Kong, School of Data Science, Hong Kong SAR, China, (3) Zhuhai Center for Diseases Control and Prevention, Zhuhai, China, (4) London School of Hygiene and Tropical Medicine, Faculty of Infectious and Tropical Diseases, London, United Kingdom, (5) University of North Carolina at Chapel Hill, Institute for Global Health and Infectious Diseases, NC, United States, (6) Zhuhai Xutong Voluntary Services Center, Zhuhai, China, (7) Dermatology Hospital of South Medical University, Guangzhou, China, (8) Guangdong Second Provincial General Hospital, Institute for Healthcare Artificial Intelligence, Guangzhou, China

BACKGROUND: HIV self-testing (HIVST) has been rapidly scaled up in several countries, but additional strategies are needed to enhance case finding. In secondary distribution, people apply for multiple kits and pass them on to people in their social networks, but identifying the key influencers who drive this distribution can be difficult. This study aimed to develop and validate an innovative machine learning approach to identify key influencers among men who have sex with men (MSM) for HIVST secondary distribution in China.
METHODS: Indexes were participants who applied for HIVST kits for distribution; alters were those who received these kits. We defined three types of key influencers among indexes: (1) key distributors, who are likely to distribute more kits; (2) key promoters, who help reach first-time testing alters; and (3) key detectors, who help find alters who test positive. In our identification system, four machine learning models (logistic regression, support vector machine, decision tree, and random forest) were trained to predict key influencers for secondary distribution, and an ensemble learning approach combined the predictions of these four models into the final prediction. To validate the intervention efficiency of our approach, we ran a simulation experiment comparing ensemble machine learning identification with human identification (i.e., a cut-off on self-reported leadership scales).
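The ensemble step described in METHODS can be illustrated with a minimal sketch. The abstract does not say how the four models' predictions were combined, so vote counting with a `min_votes` threshold is an assumption here, as are the function name `ensemble_vote` and the example predictions, which are made up for illustration and are not study data.

```python
from typing import Dict, List


def ensemble_vote(model_predictions: Dict[str, List[int]], min_votes: int = 2) -> List[int]:
    """Combine binary predictions (1 = key influencer) from several models.

    Assumption: an index is flagged when at least `min_votes` models flag it.
    The abstract does not state the actual combination rule.
    """
    prediction_lists = list(model_predictions.values())
    if not prediction_lists:
        raise ValueError("at least one model is required")
    n_indexes = len(prediction_lists[0])
    if any(len(p) != n_indexes for p in prediction_lists):
        raise ValueError("all models must score the same indexes")
    # zip(*...) groups the per-model outputs for each index participant.
    return [1 if sum(votes) >= min_votes else 0 for votes in zip(*prediction_lists)]


# Hypothetical outputs of the four models for five indexes (not study data).
predictions = {
    "logistic_regression": [1, 0, 1, 0, 1],
    "support_vector_machine": [1, 0, 0, 0, 1],
    "decision_tree": [0, 0, 1, 1, 1],
    "random_forest": [1, 0, 1, 0, 0],
}

key_influencers = ensemble_vote(predictions)
print(key_influencers)
```

With four models a 2-2 split is possible, so the threshold decides whether ties count as key influencers; `min_votes=2` treats a tie as a positive, while `min_votes=3` would require a strict majority.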
RESULTS: A total of 309 indexes distributed HIVST kits to 269 alters. Our ensemble model outperformed human identification, with accuracy higher by an average of 11.0%. When identifying the same number of key influencers (e.g., key distributors), the ensemble machine learning approach could distribute 18.2% (95% CI: 9.9% to 26.5%) more kits, find 13.6% (95% CI: 1.9% to 25.3%) more first-time testing alters, and find 12.0% (95% CI: -14.7% to 38.7%) more positive-testing alters than the human identification approach. Simulation experiments also showed that the intervention efficiency of the ensemble machine learning model was 17.7% (95% CI: -3.5% to 38.8%) higher than that of the self-reported scales cut-off method.
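As a reading aid, the short sketch below tabulates the relative gains and 95% confidence intervals reported in RESULTS and checks which intervals exclude zero. It only restates the published figures; it does not re-derive them, and the variable and function names are illustrative.

```python
# Relative gains of ensemble over human identification, as reported in RESULTS:
# (point estimate %, 95% CI lower bound %, 95% CI upper bound %)
reported = {
    "kits distributed": (18.2, 9.9, 26.5),
    "first-time testing alters": (13.6, 1.9, 25.3),
    "positive-testing alters": (12.0, -14.7, 38.7),
    "simulated intervention efficiency": (17.7, -3.5, 38.8),
}


def excludes_zero(lower: float, upper: float) -> bool:
    """Return True when the interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


summary = {name: excludes_zero(low, high) for name, (_, low, high) in reported.items()}

for name, (estimate, low, high) in reported.items():
    half_width = (high - low) / 2  # half the width of the reported interval
    print(f"{name}: {estimate:.1f}% (half-width {half_width:.2f}), "
          f"excludes zero: {summary[name]}")
```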
CONCLUSIONS: We built machine learning models to identify key influencers in the Chinese MSM population who were more likely to engage in HIVST secondary distribution, and our novel approach outperformed the conventional human identification approach.
