I identified 20 discriminatory biomarkers in the breath of TB-positive pediatric patients that can be used to identify children with pulmonary M. tuberculosis (TB) infections with 82% accuracy. The analysis used the area under the chromatogram peak for each compound as a feature. After cleaning the data and removing compounds contributed by room air, I was left with 302 features from the breath of 11 TB-positive and 22 TB-negative pediatric patients. Using random forest (RF) and linear support vector machine (SVM) classifiers, I constructed a model that determined the contribution of each compound to predicting TB status. An optimized suite of 20 molecules performed as well as the full suite of 302 features.
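As a rough illustration of this kind of workflow, the sketch below ranks features with a random forest and then compares a linear SVM trained on all features against one trained on the 20 top-ranked features. It is not the code used in the thesis: the data are synthetic (only the sample and feature counts match the study), and the scikit-learn settings, the log transform, and the 5-fold cross-validation are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 33 patients x 302 peak areas.
n_pos, n_neg, n_features = 11, 22, 302
X = rng.lognormal(mean=0.0, sigma=1.0, size=(n_pos + n_neg, n_features))
y = np.array([1] * n_pos + [0] * n_neg)  # 1 = TB positive, 0 = TB negative

# Hypothetical signal: inflate the first 5 "compounds" in positive samples.
X[y == 1, :5] *= 4.0

# Peak areas are usually right-skewed, so work on a log scale (an assumption).
X_log = np.log1p(X)

# Rank compounds by random forest importance and keep the 20 highest.
rf = RandomForestClassifier(
    n_estimators=500, class_weight="balanced", random_state=0
)
rf.fit(X_log, y)
top20 = np.argsort(rf.feature_importances_)[::-1][:20]


def cv_accuracy(features: np.ndarray) -> float:
    """Mean 5-fold cross-validated accuracy of a linear SVM."""
    model = make_pipeline(
        StandardScaler(),
        LinearSVC(C=1.0, class_weight="balanced", max_iter=10000),
    )
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(model, features, y, cv=cv, scoring="accuracy")
    return float(scores.mean())


# Caveat: the ranking above used every sample, so the top-20 score is
# optimistic. A stricter version would redo the ranking inside each fold.
acc_all = cv_accuracy(X_log)
acc_top20 = cv_accuracy(X_log[:, top20])

print(f"Linear SVM, all {n_features} features: {acc_all:.2f}")
print(f"Linear SVM, top 20 features: {acc_top20:.2f}")
```

With only 33 samples, cross-validated accuracy varies noticeably with the fold assignment, so repeating the split with several seeds gives a more honest picture than a single run.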
Figure: Dendrogram of the top 20 compounds.
The full thesis is available here.