# Decision Tree for Advanced Learners

Decision trees

Important points to remember:

• Decision trees detect non-linear interactions but cannot model linear relationships well.
• Decision trees can be used for feature selection in addition to being effective classifiers.
• A binary tree structure in which each node splits the data so as to best classify the response variable.
• The tree starts with the root (the first node) and ends with terminal nodes (the leaves of the tree).
• The data set is repeatedly split on the attribute that maximizes the information gain of each split.
• Decision trees are a strong choice when the solution requires an interpretable representation.
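The information-gain criterion mentioned above can be sketched in plain Python. This is a minimal illustration, not any particular library's implementation; the labels and the split below are made up for the example.

```python
import math

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(parent, left, right):
    """Reduction in entropy achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# A perfectly separating split recovers the full parent entropy as gain.
parent = ["yes", "yes", "no", "no"]
gain = information_gain(parent, ["yes", "yes"], ["no", "no"])  # 1.0 bit
```

At each node, the learner evaluates candidate splits with this kind of measure and keeps the one with the highest gain.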

• Decision trees can handle both nominal and numerical attributes, as well as datasets containing errors or missing values.
• The decision-tree representation is rich enough to express any discrete-valued classifier.
• Decision trees are a nonparametric method: they make no assumptions about the space distribution or the structure of the classifier.

• Most of the algorithms (such as ID3 and C4.5) require the target attribute to have only discrete values.
• Because decision trees use the “divide and conquer” method, they perform well when a few highly relevant attributes exist, but less well when many complex interactions are present.
• Their greedy construction brings a further disadvantage worth pointing out: over-sensitivity to the training set, to irrelevant attributes, and to noise.
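As a sketch of how greedy splitting can double as the feature selection mentioned above, the following ranks numeric features by the best information gain a single threshold split achieves. The feature names and values are hypothetical, chosen only to illustrate the idea.

```python
import math

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def split_gain(values, labels):
    """Best information gain obtainable by thresholding one numeric feature."""
    best, n = 0.0, len(labels)
    for t in sorted(set(values))[:-1]:  # the largest value leaves the right side empty
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        best = max(best, entropy(labels)
                   - (len(left) / n) * entropy(left)
                   - (len(right) / n) * entropy(right))
    return best

# Hypothetical data: x1 separates the classes perfectly, x2 is mostly noise.
features = {
    "x1": [1, 2, 3, 10, 11, 12],
    "x2": [5, 1, 9, 2, 8, 3],
}
labels = ["a", "a", "a", "b", "b", "b"]
ranking = sorted(features, key=lambda f: split_gain(features[f], labels),
                 reverse=True)  # ["x1", "x2"]
```

Features that never produce a high-gain split contribute little to the tree, which is why tree construction implicitly selects the informative ones.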

Pruning

Pruning simplifies the tree after the learning algorithm terminates; it complements early stopping and helps avoid overfitting.

Pruning: Intuition

Train a complex tree first, then simplify it afterwards.

(Figure: the tree after pruning)

Pruning: Motivation

The more leaves splitting produces, the more complex the tree.

A simple measure of tree complexity:

L(T) = number of leaf nodes

Balance simplicity against predictive power:

• Too complex a tree risks overfitting.
• Too simple a tree gives high classification error.

To strike this balance, check both:

• How well the tree fits the data
• The complexity of the tree

Total cost = measure of fit + measure of complexity

= classification error + number of leaf nodes

Total cost C(T) = Error(T) + α·L(T), where α is a tuning parameter

If α = 0: standard decision tree learning (only the fit matters).

If α = ∞: a tree with no splits at all (the root alone).

If α is in between: the fit and the complexity of the tree are balanced.
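The trade-off that α controls can be illustrated with a toy comparison. The error rates and leaf counts below are hypothetical, chosen only to show how raising α shifts the preference toward the simpler tree.

```python
def total_cost(error, leaves, alpha):
    """C(T) = Error(T) + alpha * L(T): training error penalized by tree size."""
    return error + alpha * leaves

# Two hypothetical candidate trees considered during pruning:
big_tree = {"error": 0.05, "leaves": 20}    # fits the training data closely
small_tree = {"error": 0.15, "leaves": 4}   # simpler, higher training error

def better(alpha):
    """Return which candidate has the lower total cost at the given alpha."""
    cost_big = total_cost(big_tree["error"], big_tree["leaves"], alpha)
    cost_small = total_cost(small_tree["error"], small_tree["leaves"], alpha)
    return "big" if cost_big < cost_small else "small"

# With alpha = 0 only the fit counts, so the big tree wins;
# even a modest alpha (e.g. 0.01) makes the small tree cheaper overall.
```

In practice α is tuned on held-out data: sweep a range of values and keep the tree whose total cost generalizes best.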

When to use a decision tree

1. When you want your model to be simple and explainable.
2. When you don’t want to worry about feature selection, regularization, or multicollinearity.
3. When you can afford to overfit the tree, for example if you are sure the validation or test data set will be a subset of the training data set.