A decision tree is a non-parametric supervised machine learning algorithm that can be used for classification as well as regression problems. It uses a tree structure to represent a number of possible decision paths and an outcome for each path. At each node, a question is asked, and the answer determines which path is followed.
Let’s take an example:
Suppose you have to buy a mobile phone based on RAM. Since this is your first mobile, you go to your older brother and ask him for help, and he gives you the above decision tree.
This is what you identify on your own:
- Feature set = [RAM, Brand, Type]
- RAM = [4GB, 6GB, 8GB]
- Brand = [Apple, Blackberry, Samsung]
- Type = [Static, Dynamic]
- Class/Label = [buy(yes), do not buy(no)]
Here, the tree was already given, and you only had to classify a data point. That is not always the case: often you have to build your own decision tree first and then perform the classification. So, how do you build one?
ID3 algorithm to build a decision tree
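- Calculate the entropy of each attribute of the given data
- Split the set into subsets using the attribute for which the resulting entropy is minimum, that is, for which the information gain is maximum
- Make a decision tree node containing that attribute, then repeat the process on each subset using the remaining attributes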
Entropy is a measure of the uncertainty in the information being processed. The lower the entropy, the easier it is to draw conclusions from that information.
Information gain is the difference in entropy before and after the set is split on an attribute. In other words, it tells us how much the uncertainty in set S was reduced after splitting S on attribute A. ID3 uses it to decide which feature should be selected as the decision node.
Now, let’s try to build a decision tree on our own from the given dataset:
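The two definitions above can be written compactly as follows. The notation is an assumption made for illustration: p_i is the fraction of rows in S that belong to class i, and S_v is the subset of S in which attribute A takes the value v.

```latex
\begin{align}
  \mathrm{Entropy}(S) &= -\sum_{i} p_i \log_2 p_i \\
  \mathrm{Gain}(S, A) &= \mathrm{Entropy}(S)
      - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)
\end{align}
```

With base-2 logarithms the entropy is measured in bits: a set containing only one class has entropy 0, and a set split evenly between two classes has entropy 1.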
In this case, there are 9 yes rows and 5 no rows out of 14, which leads to an entropy of:
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940
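As a quick check of this arithmetic, here is a minimal Python sketch. The helper name entropy and its counts argument are illustrative choices, not part of any library.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as row counts."""
    total = sum(counts)
    # Skip empty classes: p * log2(p) tends to 0 as p tends to 0.
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


# 9 "yes" rows and 5 "no" rows, as in the dataset above.
h = entropy([9, 5])
print(round(h, 3))  # 0.94
```

An evenly split set, for example entropy([7, 7]), gives exactly 1 bit, the maximum for two classes.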
From the calculation, we can see that RAM has the highest gain, so it is placed at the root node of the decision tree. The tree is then expanded to cover RAM’s possible values: 4GB, 6GB, or 8GB. Brand has the second-highest gain, so it is placed next, as shown in the figure; and because Apple has a higher information gain than Samsung, it is placed first, and so on.
Thus, ID3 decides the best root attribute based on our entire data set (all 14 rows). Calculating and comparing the information gain values for the rest of the tree is left as an exercise for the reader.
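For readers who would rather check that exercise in code, the following is a minimal sketch of the recursive procedure, not a reference implementation. The function names, the nested-dict tree representation, and the six-row toy table are all assumptions made for illustration; the toy table is not the 14-row dataset used above.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the whole set minus the weighted entropy after splitting on attr."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Return a nested dict {attribute: {value: subtree}}; leaves are class labels."""
    if len(set(labels)) == 1:  # pure node: nothing left to decide
        return labels[0]
    if not attrs:  # no attribute left to split on: fall back to the majority class
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute with the highest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in sorted({row[best] for row in rows}):
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3(
            [rows[i] for i in keep],
            [labels[i] for i in keep],
            [a for a in attrs if a != best],
        )
    return tree


# Hypothetical toy data, made up for this sketch.
rows = [
    {"RAM": "4GB", "Brand": "Apple"},
    {"RAM": "4GB", "Brand": "Samsung"},
    {"RAM": "6GB", "Brand": "Apple"},
    {"RAM": "6GB", "Brand": "Samsung"},
    {"RAM": "8GB", "Brand": "Apple"},
    {"RAM": "8GB", "Brand": "Samsung"},
]
labels = ["no", "no", "yes", "no", "yes", "yes"]

tree = id3(rows, labels, ["RAM", "Brand"])
print(tree)
```

On this toy table RAM has the larger gain, so it becomes the root; only the 6GB branch is still mixed and is split again on Brand.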
Here we have considered a small dataset, so it was possible to calculate everything manually. What if the dataset contains 1 million rows and 300 columns?
The Scikit-Learn library for Python does all of this for you. You just have to import the classifier with from sklearn.tree import DecisionTreeClassifier and pass the following key parameters.
Key Parameters for Decision tree
- max_depth: controls the maximum depth of the tree (the largest number of splits along any path from the root to a leaf). Most common way to reduce tree complexity and over-fitting.
- min_samples_leaf: minimum number of data instances a leaf must contain; a split is only made if each resulting branch keeps at least this many instances.
- max_leaf_nodes: limits total number of leaves in the tree.
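A minimal sketch of how these parameters might be passed is shown below. The toy data and the integer encoding of the brands are made up for illustration, since scikit-learn trees expect numeric features. Note also that scikit-learn builds its trees with an optimized version of CART rather than ID3; setting criterion="entropy" only makes it use the same impurity measure discussed above.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: each row is [RAM in GB, brand code], where the brand
# codes (0 = Apple, 1 = Blackberry, 2 = Samsung) are an illustrative encoding.
X = [
    [4, 0], [4, 1], [4, 2],
    [6, 0], [6, 1], [6, 2],
    [8, 0], [8, 1], [8, 2],
]
# Made-up labels: 1 = buy, 0 = do not buy.
y = [0, 0, 0, 1, 0, 1, 1, 1, 1]

clf = DecisionTreeClassifier(
    criterion="entropy",   # choose splits by entropy-based information gain
    max_depth=3,           # at most 3 splits on any root-to-leaf path
    min_samples_leaf=1,    # every leaf must keep at least 1 training row
    max_leaf_nodes=4,      # at most 4 leaves in the whole tree
    random_state=0,        # make tie-breaking reproducible
)
clf.fit(X, y)

# Classify two unseen-style queries: an 8GB Samsung and a 4GB Blackberry.
predictions = clf.predict([[8, 2], [4, 1]])
print(predictions)
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
```

Tightening any of the three limits yields a smaller tree, which usually trades some training accuracy for better generalization.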