In this blog we’ll walk through a few handy pandas techniques for data analysis.

# 1. Check for null values

Once we read the dataset, we should always check for null values. The following line of code does that:

`df.isnull().sum(axis=0)`

It gives the total number of null values for each column in the dataset.
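As a quick illustration, here is that check on a small made-up DataFrame (the column names and values are just for demonstration):

```python
import pandas as pd
import numpy as np

# Toy DataFrame with a couple of missing entries
df = pd.DataFrame({
    "age": [25, np.nan, 31],
    "city": ["Delhi", "Mumbai", None],
})

# Count of null values per column
null_counts = df.isnull().sum(axis=0)
print(null_counts)  # age: 1, city: 1
```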

# 2. Check for duplicate values

We can check whether there are duplicate rows in the dataset using the following command:

`df.duplicated().sum()`

It gives us the total number of duplicated rows. We can drop them using the following command (note that `keep=False` removes every copy of a duplicated row; the default `keep='first'` retains one copy):

`df.drop_duplicates(keep=False)`
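To make the `keep` behaviour concrete, here is a sketch on a tiny made-up DataFrame:

```python
import pandas as pd

# Toy DataFrame where the first two rows are duplicates
df = pd.DataFrame({"x": [1, 1, 2, 3]})

print(df.duplicated().sum())  # 1 duplicated row

# keep=False drops every copy of a duplicated row
deduped_all = df.drop_duplicates(keep=False)  # only x=2 and x=3 remain

# the default keep="first" retains one copy of each duplicate
deduped_first = df.drop_duplicates()  # x=1, 2, 3 remain
```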

# 3. Check for unique values

This is useful for categorical data. The distinct values in a particular column can be listed using the following code (use `df['column_name'].nunique()` if you only need the count):

`df['column_name'].unique()`
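For example, on a small made-up column of colours:

```python
import pandas as pd

s = pd.Series(["red", "blue", "red", "green"])

print(s.unique())   # distinct values: ['red' 'blue' 'green']
print(s.nunique())  # number of distinct values: 3
```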

# 4. Replace/Drop Null values

Dealing with null values is also very important for data analysis. If a column has too many null values, we can drop the column entirely:

`df.drop(columns=['column_name'], inplace=True)`

We can instead drop only the rows where that column is null:

`df.dropna(subset=['column_name'], inplace=True)`

or we can impute them with mean/median/mode.

`df['column'] = df['column'].fillna(df['column'].mean())`
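Here is mean imputation on a toy column (values invented for illustration):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"score": [10.0, np.nan, 20.0]})

# Replace the NaN with the mean of the non-null values (15.0)
df["score"] = df["score"].fillna(df["score"].mean())
print(df["score"].tolist())  # [10.0, 15.0, 20.0]
```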

# 5. Correlation Matrix

It gives us information about the correlation between different features. The following commands can be used for it:

```
import seaborn as sns
import matplotlib.pyplot as plt

# numeric_only=True skips non-numeric columns, which would
# otherwise raise an error in recent pandas versions
corrmat = df.corr(numeric_only=True)
f, ax = plt.subplots(figsize=(12, 9))
sns.heatmap(corrmat, square=True)
```

We are using the seaborn package to plot the matrix.
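As a sanity check of what `df.corr()` computes, here is a sketch on synthetic data where `y` is (by construction) almost a linear function of `x`, so their correlation should be close to 1:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=100)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.1, size=100),  # y ~ 2x plus small noise
})

corrmat = df.corr()
# The x-y entry should be strongly positive
print(corrmat.loc["x", "y"])
```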

# 6. Check distribution of a variable

We can also check the distribution of a particular column. From the plot we can see whether the data follows a normal distribution or some other shape. Here is the command:

```
import seaborn as sns

# distplot is deprecated in recent seaborn versions;
# histplot is the modern equivalent
sns.histplot(df['column'], kde=True)
```

# 7. Check datatypes of columns

We can check the datatype of each column in the data. It helps us confirm whether the columns have the correct data types.

`df.dtypes`
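For example, on a toy DataFrame with one numeric and one string column:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# dtypes returns one entry per column
print(df.dtypes)  # a: int64, b: object
```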

# 8. Deal with datetime columns

Many times we get dates in a column, but not in datetime format. We can convert them using `pd.to_datetime`, which handles a variety of input formats:

`df["Date.of.Birth"] = pd.to_datetime(df["Date.of.Birth"])`
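A small sketch with made-up dates, showing that the converted column supports the `.dt` accessor:

```python
import pandas as pd

df = pd.DataFrame({"Date.of.Birth": ["1990-05-01", "1985-12-31"]})

# Convert the string column to datetime
df["Date.of.Birth"] = pd.to_datetime(df["Date.of.Birth"])

# Datetime columns expose components via .dt
print(df["Date.of.Birth"].dt.year.tolist())  # [1990, 1985]
```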

# 9. Value count

We can check the frequency of different categories in a categorical column:

`df['column'].value_counts()`
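For instance, on a toy categorical column:

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "a"])

# Frequencies, sorted most common first
counts = s.value_counts()
print(counts)  # a: 3, b: 1
```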

# 10. Max/Min value of a column

We can find the maximum and minimum values in a particular column using the max() and min() methods:

`df['column'].max()` `df['column'].min()`
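On a toy numeric column:

```python
import pandas as pd

s = pd.Series([3, 7, 1])

print(s.max())  # 7
print(s.min())  # 1
```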

You can check out my sample notebook with all these commands in action on a dataset.


That’s it for now. Next time we’ll cover some more tips and tricks for data analysis. :)