one hot encoding missing values
one hot encode python
Label encoding maps categories to integers, which can lead a model to treat the codes as ordered, comparable values. To avoid that, we use one hot encoding.
Brief about the video "How to implement One Hot Encoding on Categorical Data | Dummy Encoding":
A simple approach is to use integer (label) encoding, but when categorical variables are nominal, label encoding can be problematic because it implies an order that does not exist. One hot encoding is a technique that helps in this situation. In this tutorial, we use the pandas get_dummies method to create dummy variables, which lets us one hot encode a given dataset. Alternatively, we can use OneHotEncoder from sklearn.preprocessing to create the dummy variables.
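To make the difference concrete, here is a minimal sketch on a tiny made-up column (the toy values are an assumption for illustration and are not taken from the video's titanic.csv). It shows integer codes next to the indicator columns produced by get_dummies.

```python
import pandas as pd

# Toy data (hypothetical, not the video's dataset)
df = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})

# Label (integer) encoding: categories get codes in sorted order (C=0, Q=1, S=2),
# which suggests an ordering that nominal data does not have
label_codes = df["Embarked"].astype("category").cat.codes

# One hot encoding: one 0/1 indicator column per category, no implied order
one_hot = pd.get_dummies(df["Embarked"], prefix="Embarked", dtype=int)

print(label_codes.tolist())
print(one_hot)
```

With label codes, a model may read S (2) as "greater than" C (0); with indicator columns, each row simply has a 1 in the column for its category.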
In this video, we will discuss how to convert categorical variables to integers.
At the end, we will also see how to save the encoder object to a file with the joblib library in Python and reuse it.
Code for this video:
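import pandas as pd
from sklearn.preprocessing import OneHotEncoder
# Load the Titanic data and preview the first rows
data = pd.read_csv('titanic.csv')
data.head()
# Keep only the two categorical columns
data_cat = data[['Sex','Embarked']]
data_cat
# pandas approach: dummy_na=True adds an indicator column for missing values,
# drop_first=True drops the first level of each variable
pd.get_dummies(data_cat, dummy_na=True, drop_first=True)
# Work on a copy for the scikit-learn approach
df_2 = data_cat.copy()
df_2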
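# scikit-learn approach: fill missing values with the label 'Missing' so they become their own category,
# and rename 'Sex' to 'Gender' so the output columns carry that name
X = df_2.fillna('Missing').rename(columns={'Sex': 'Gender'})
ohe = OneHotEncoder(categories='auto', drop='first')
ohe.fit(X)
# get_feature_names was removed in scikit-learn 1.2; get_feature_names_out (1.0+) replaces it
# and takes the names from the DataFrame columns used in fit
ohe.get_feature_names_out()
df_3 = ohe.transform(X).toarray()
df_3 = pd.DataFrame(df_3, columns=ohe.get_feature_names_out())
df_3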
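import joblib
# Save the fitted encoder to disk, then load it back for reuse
joblib.dump(ohe, filename='ohe.pkl')
saved_ohe = joblib.load('ohe.pkl')
# The loaded encoder keeps its fitted categories (get_feature_names_out needs scikit-learn 1.0+)
saved_ohe.get_feature_names_out()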
tags:
label encoding in r,
one hot encoding in python,
one hot encoding python numpy,
one hot encoding python pandas,
one hot encoding vs dummy variables,
categorical encoding,
label encoding in python,
related tags:
feature engineering python
machine learning
python
data science
data analytics
data analysis
label encoding in python
How to handle categorical features
How can i convert my categorical features to integer
How to do one hot encoding in python
why do i need to encode my categorical variables
what if i do not encode my categorical variables
one-hot encoding python pandas
one hot encoding in r
one-hot encoding nlp
one hot encoding vs label encoding
one hot encoding definition
Thanks