Feature Scaling in ML

Hamad S. AlAssafi
Mar 1, 2022

Each numerical feature has two parts:

1. Unit

2. Magnitude

Let's take the height and weight of an individual as an example: suppose an individual has a height of 180 cm and a weight of 90 kg. Here, the unit of the height is centimeters and its magnitude is 180, while the unit of the weight is kilograms and its magnitude is 90.

Some ML algorithms are sensitive to the scale of the data, and scaling can improve both the performance and the speed of such algorithms, because they rely on distances between features or on the rate of convergence. Examples of this type of algorithm are K-Nearest Neighbors (KNN) and Neural Networks.
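To see why distance-based methods care about scale, here is a minimal sketch (using made-up height/weight values) that computes the Euclidean distance between two individuals; the height term dominates simply because centimeters produce larger magnitudes than kilograms:

```python
import numpy as np

# Two hypothetical individuals: (height in cm, weight in kg)
a = np.array([180.0, 90.0])
b = np.array([170.0, 92.0])

# The 10 cm height gap contributes 100 to the squared distance,
# while the 2 kg weight gap contributes only 4, so height dominates
# purely because of its larger magnitude
distance = np.linalg.norm(a - b)
print(distance)  # ~10.2
```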

The main idea behind feature scaling in ML is to standardize the features so that they fall within a certain range; feature scaling is performed during the preprocessing stage.

Going back to the height and weight example, imagine we need to build a binary classifier that predicts whether an individual is a woman or a man based on height and weight, and we want to use a Support Vector Classifier (SVC), which is sensitive to feature scaling. If we do not apply a scaling technique to our features, the magnitudes will differ greatly, which negatively affects both the classification process and the performance, because the distances between the features' magnitudes become too large.
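A minimal sketch of that setup, assuming a small made-up height/weight dataset: wrapping the scaler and the SVC in a scikit-learn Pipeline ensures the scaler is fit on the training data only and applied consistently at prediction time:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: (height in cm, weight in kg); label 0 = woman, 1 = man
X = np.array([[160, 55], [165, 60], [170, 62], [175, 80], [180, 90], [185, 95]])
y = np.array([0, 0, 0, 1, 1, 1])

# Scaling happens inside the pipeline, so the SVC always sees standardized features
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X, y)

print(clf.predict([[172, 65]]))
```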

I will briefly discuss three scaling techniques implemented in the scikit-learn library, with a code sketch after the list to make them more understandable:

1. Standard Scaler: It ensures that the mean of each feature is 0 and the variance is 1, which brings all features to the same magnitude, but it does not guarantee any particular maximum or minimum values for the features.

2. Robust Scaler: It is similar to the Standard Scaler, but it uses the median and quartiles instead of the mean and variance, which makes the Robust Scaler largely unaffected by data points that differ greatly from the rest (outliers).

3. Min Max Scaler: It scales the data points of each feature into the range [0, 1] (0 as the minimum and 1 as the maximum).
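To see the three scalers side by side, here is a small sketch on made-up data: the Standard Scaler centers each feature at mean 0 with unit variance, the Robust Scaler centers on the median using the interquartile range, and the Min Max Scaler maps every feature into [0, 1]:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler, MinMaxScaler

# Made-up feature matrix (height in cm, weight in kg)
# with one obvious outlier in the last row
X = np.array([[160.0, 55.0], [170.0, 62.0], [180.0, 90.0], [250.0, 95.0]])

for scaler in (StandardScaler(), RobustScaler(), MinMaxScaler()):
    X_scaled = scaler.fit_transform(X)
    print(type(scaler).__name__)
    print(X_scaled.round(2))
```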

Note: Feature scaling applies only to numerical features; categorical features are encoded as zeros and ones before being used in an ML task, so they do not require any scaling. Also, as mentioned before, feature scaling is needed when we use distance-based or convergence-based algorithms, so tree-based algorithms (e.g., Decision Tree, Random Forest, etc.) do not require any feature scaling, and their performance will not be affected, because they are not sensitive to the variance of the data points or to outliers.
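A quick way to convince yourself of this last point: fit the same decision tree on raw and min-max-scaled versions of the made-up data below. Because tree splits depend only on the ordering of values within each feature, and scaling preserves that ordering, the predictions come out identical:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Made-up data: (height in cm, weight in kg)
X = np.array([[160, 55], [165, 60], [175, 80], [185, 95]])
y = np.array([0, 0, 1, 1])

X_scaled = MinMaxScaler().fit_transform(X)

# Monotonic scaling preserves the ordering of each feature,
# so the tree finds equivalent splits and makes identical predictions
tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

print(np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled)))  # True
```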

Many thanks for your careful reading; I hope this blog post is useful.

Reference:

Müller, A. C., & Guido, S. (2016). Introduction to Machine Learning with Python: A Guide for Data Scientists. O'Reilly Media.

Contact me through:

Twitter: @HamadAlassafi

E-mail: AlassafiHamad@gmail.com
