Forests of Stumps (Cancelled)

Decision stumps are among the simplest classifiers. A decision stump is a tree whose root is directly connected to its leaves, and it is an example of a weak learner: a learner whose accuracy is only slightly better than chance. A stump uses a single feature variable to split the training data and can then be used for prediction on test data. For continuous feature variables, the most common approach is to select a feature variable and a corresponding threshold, or "split point", giving a stump with two leaves for values below and above that threshold. Because of their simplicity, decision stumps perform poorly on their own; however, many numerical studies indicate that they perform accurately within ensemble methods such as bagging, as in this work.

We investigate two approaches to building a forest of stumps for classification. The first, bagging with stumps, fits a stump on each bootstrap sample (we also investigate the effect of varying the bootstrap sample size) and combines the decisions of these stumps using one of two aggregation methods: majority vote or weighted vote. In the second, Gini-sampled stumps, instead of constructing trees in the ordinary way we sample split points with probability proportional to the Gini index; each sampled split point, or threshold, divides the training data into two subsets, with values below and above the threshold, and the resulting stumps are combined using the same aggregation methods as before. Joint work with Charles C Taylor and Jochen Voss.
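As a rough illustration of the two ensembles described above, the following is a minimal sketch, not the authors' implementation: the helper names and toy data are invented, the use of Gini *gain* as the sampling weight is an assumption (the talk says only "proportional to Gini index"), and the weighted-vote variant is omitted for brevity.

```python
import numpy as np

def gini_impurity(y):
    """Gini impurity of a label vector (labels are nonnegative ints)."""
    if y.size == 0:
        return 0.0
    p = np.bincount(y) / y.size
    return 1.0 - np.sum(p ** 2)

def split_impurity(x, y, t):
    """Size-weighted Gini impurity of the split x <= t vs x > t (lower is better)."""
    left, right = y[x <= t], y[x > t]
    return (left.size * gini_impurity(left) + right.size * gini_impurity(right)) / y.size

def fit_stump(X, y, feature, t):
    """A stump is (feature, threshold, label-below, label-above); each leaf predicts its majority class."""
    left, right = y[X[:, feature] <= t], y[X[:, feature] > t]
    maj = lambda v: int(np.bincount(v).argmax()) if v.size else int(np.bincount(y).argmax())
    return (feature, t, maj(left), maj(right))

def predict_stump(stump, X):
    f, t, lo, hi = stump
    return np.where(X[:, f] <= t, lo, hi)

def majority_vote(stumps, X):
    """Aggregate stump predictions by unweighted majority vote."""
    votes = np.stack([predict_stump(s, X) for s in stumps])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

def bagged_stumps(X, y, n_stumps=25, sample_size=None, seed=0):
    """Method 1 (sketch): fit the best-Gini stump on each bootstrap sample;
    sample_size is the bootstrap sample size investigated in the talk."""
    rng = np.random.default_rng(seed)
    n = y.size
    m = sample_size if sample_size is not None else n
    stumps = []
    for _ in range(n_stumps):
        idx = rng.integers(0, n, size=m)          # bootstrap sample (with replacement)
        Xb, yb = X[idx], y[idx]
        cands = [(f, t) for f in range(X.shape[1]) for t in np.unique(Xb[:, f])[:-1]]
        if not cands:                             # degenerate sample: all rows identical
            cands = [(0, Xb[0, 0])]
        best = min(cands, key=lambda ft: split_impurity(Xb[:, ft[0]], yb, ft[1]))
        stumps.append(fit_stump(Xb, yb, *best))
    return stumps

def gini_sampled_stumps(X, y, n_stumps=25, seed=0):
    """Method 2 (sketch): sample split points with probability proportional to
    Gini gain (an assumed weighting), fitting each stump on the full data."""
    rng = np.random.default_rng(seed)
    cands = [(f, t) for f in range(X.shape[1]) for t in np.unique(X[:, f])[:-1]]
    base = gini_impurity(y)
    gains = np.array([max(base - split_impurity(X[:, f], y, t), 0.0) for f, t in cands])
    probs = gains / gains.sum() if gains.sum() > 0 else np.full(len(cands), 1.0 / len(cands))
    picks = rng.choice(len(cands), size=n_stumps, p=probs)
    return [fit_stump(X, y, *cands[i]) for i in picks]

# Toy one-feature dataset (illustrative only)
X = np.array([[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]])
y = np.array([0, 0, 0, 1, 1, 1])
bag = bagged_stumps(X, y)
gss = gini_sampled_stumps(X, y)
```

Both forests reduce to collections of (feature, threshold, label, label) tuples, so only the way split points are chosen differs between the two methods; the aggregation step is shared, as in the abstract.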
