*Corresponding author:
Danielle Baghernejad, Intermedix, Nashville TN 37219, Tennessee, USA
Received: September 16, 2017; Published: October 12, 2017
DOI: 10.26717/BJSTR.2017.01.000431
In this paper we explore variable importance within tree-based modeling, discussing its strengths and weaknesses with regard to medical inference and actionability. While variable importance is useful in understanding how strongly a variable influences a tree, it does not convey how variables relate to the different classes of the target variable. Because both prediction and inference are important for successful machine learning in the medical setting, a new measure capturing variable importance with regard to classes is essential. We define such a measure, calculated from the paths of training instances through the tree, and explore its initial performance on benchmark datasets.
Keywords: Machine learning; Tree-based modeling; Decision trees; Variable importance; Class Variable Importance
Abbreviations: CART: Classification and Regression Trees; CVI: Class Variable Importance; ET: Extra Trees; RF: Random Forests; GBT: Gradient Boosted Trees; AUC: Area under the Curve; ROC: Receiver Operating Characteristic