What is StandardScaler in Python

Can someone explain StandardScaler to me?


Reply:


The idea behind StandardScaler is that it transforms your data so that its distribution has a mean of 0 and a standard deviation of 1.
In the case of multivariate data, this is done feature-wise (i.e. independently for each column of the data).
Given the distribution of the data, each value in the data set has the mean subtracted and is then divided by the standard deviation of the whole data set (or of the feature in the multivariate case).




Intro: I'm assuming you have a matrix X in which each row/line is a sample/observation and each column is a variable/feature (this is the expected input for any sklearn ML function, by the way: X.shape should be [number_of_samples, number_of_features]).


Core of the method: The main idea is to normalize/standardize (i.e. mean = 0 and standard deviation = 1) your features/variables/columns of X, individually, before applying any machine learning model.

StandardScaler() normalizes the features, i.e. every column of X, INDIVIDUALLY, so that each column/feature/variable has a mean of 0 and a standard deviation of 1.


PS: I think the top rated answer on this page is wrong. I quote, "For every value in the data set, the sample mean is subtracted" - this is neither true nor correct.


See also: How and Why to Standardize Your Data: A Python Tutorial


Example:
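A minimal sketch of such an example, assuming a small hypothetical array with 4 samples and 2 features (the array and the variable names are illustrative assumptions):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 4 samples/observations, 2 variables/features
data = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)

scaler = StandardScaler()
# fit() learns each column's mean and standard deviation;
# transform() then applies (x - mean) / std column by column
scaled_data = scaler.fit_transform(data)

print(data)
print(scaled_data)
```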

Verify that each feature (column) has a mean of 0 and a standard deviation of 1:
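Both checks can be sketched together as follows, again assuming a small hypothetical array (the setup is repeated so the snippet runs on its own):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 4 samples/observations, 2 variables/features
data = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
scaled_data = StandardScaler().fit_transform(data)

# axis=0 aggregates down the rows, giving one value per column
col_means = scaled_data.mean(axis=0)
col_stds = scaled_data.std(axis=0)

print(col_means)  # expected to be (numerically) 0 for every column
print(col_stds)   # expected to be (numerically) 1 for every column
```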


Maths:
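For reference, the per-feature standardization can be written as below, where the x_i are the N values of one column. The symbols are notation assumed here, and the standard deviation is the population one (division by N), which is what StandardScaler uses:

```latex
z_i = \frac{x_i - \mu}{\sigma}, \qquad
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
```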


UPDATE 08/2019: Regarding setting the input parameters with_mean and with_std to False/True, I gave an answer here: StandardScaler difference between "with_std = False or True" and "with_mean = False or True"
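As a rough illustration of those two parameters (a sketch on a hypothetical array, not the content of the linked answer): with_std=False only centers each column, and with_mean=False only divides each column by its standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 4 samples, 2 features
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)

# with_std=False: only center each column (subtract its mean)
centered = StandardScaler(with_mean=True, with_std=False).fit_transform(X)

# with_mean=False: only divide each column by its standard deviation
rescaled = StandardScaler(with_mean=False, with_std=True).fit_transform(X)

print(centered)
print(rescaled)
```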







How to calculate it:
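A minimal sketch of the calculation done by hand with NumPy and compared against StandardScaler; the array is a hypothetical example:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 3 samples, 2 features
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

mu = X.mean(axis=0)      # per-column mean
sigma = X.std(axis=0)    # per-column population std (ddof=0)
manual = (X - mu) / sigma

from_sklearn = StandardScaler().fit_transform(X)

# Prints whether the manual result matches StandardScaler's output
print(np.allclose(manual, from_sklearn))
```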

You can read more here:


StandardScaler performs the task of standardization. Usually a data set contains variables with different scales. For example, an employee record contains an AGE column with values on a scale of 20-70 and a SALARY column with values on a scale of 10000-80000.
Because these two columns are different in scale, they are standardized to have a common scale when building a machine learning model.
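The AGE/SALARY case might look roughly like this; the employee values are made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical employee records: column 0 is AGE, column 1 is SALARY
employees = np.array([
    [25, 15000],
    [38, 42000],
    [52, 61000],
    [67, 80000],
], dtype=float)

scaler = StandardScaler()
standardized = scaler.fit_transform(employees)

print(scaler.mean_)   # per-column means learned from the data
print(scaler.scale_)  # per-column standard deviations used for scaling
print(standardized)   # both columns are now on a comparable scale
```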


This is useful when you want to compare data that are expressed in different units. In that case, you want to remove the units. To do this consistently for all the data, you transform it so that the variance is 1 (unit variance) and the mean of the series is 0.



The above answers are great, but I needed a simple example to address some concerns I had in the past. I wanted to make sure that each column really was treated separately. I am now reassured, and I can no longer find the example that had worried me. All columns ARE scaled separately, as described by those above.

CODE
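A minimal sketch of such a check, assuming a hypothetical array whose columns are on very different scales: each column is scaled once together with the others and once on its own, and the two results are compared.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: columns on very different scales
X = np.array([
    [1, 1, 1],
    [2, 10, 100],
    [3, 20, 200],
    [4, 40, 300],
], dtype=float)

scaled_together = StandardScaler().fit_transform(X)

# Scale every column on its own and compare with the joint result
for j in range(X.shape[1]):
    scaled_alone = StandardScaler().fit_transform(X[:, [j]]).ravel()
    print(j, np.allclose(scaled_together[:, j], scaled_alone))
```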

OUTPUT



In the following, a simple working example is used to explain how the standardization calculation works. The theoretical part is already well explained in other answers.

calculation
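A sketch of this step; the array below is an assumption, chosen so that it matches the mean, the standard deviation, and the two cell values quoted in this answer:

```python
import numpy as np

# Assumed data: 4 samples, 2 features
data = np.array([[6, 2], [4, 2], [6, 4], [8, 2]], dtype=float)

print(np.mean(data, axis=0))  # per-column mean
print(np.std(data, axis=0))   # per-column population standard deviation
```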

As you can see in the output, the mean is [6. , 2.5] and the standard deviation is [1.41421356, 0.8660254].

The value at position (0,1) is 2. Standardized: (2 - 2.5) / 0.8660254 = -0.57735027

The value at position (1,0) is 4. Standardized: (4 - 6) / 1.41421356 = -1.414

Result after standardization
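A sketch of how the standardized array could be obtained, assuming the same hypothetical 4x2 array that matches the figures quoted in this answer:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Assumed data: 4 samples, 2 features
data = np.array([[6, 2], [4, 2], [6, 4], [8, 2]], dtype=float)

scaled = StandardScaler().fit_transform(data)
print(scaled)

# The two cells worked out by hand in this answer
print(scaled[0, 1], scaled[1, 0])
```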

Check the mean and standard deviation after standardization
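A sketch of that check, under the same assumption about the input array. Because of floating-point rounding, a mean may print as a tiny non-zero number rather than exactly 0.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Assumed data: 4 samples, 2 features
data = np.array([[6, 2], [4, 2], [6, 4], [8, 2]], dtype=float)
scaled = StandardScaler().fit_transform(data)

scaled_mean = scaled.mean(axis=0)
scaled_std = scaled.std(axis=0)

print(scaled_mean)  # numerically 0 for each column
print(scaled_std)   # numerically 1 for each column
```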

Note: -2.77555756e-17 is very close to 0.



After applying StandardScaler(), every column in X has a mean of 0 and a standard deviation of 1.

Formulas are listed by others on this page.

Reason: Some algorithms require data that looks like this (see the sklearn docs).
