30 things you can do with Pandas

Hello everyone! Today I want to write about the pandas library. Here are 30 things you can do with pandas to better understand your data.

First things first, let's import the pandas library:

import pandas as pd
df = pd.read_csv('test.csv')  # read a test file into a DataFrame

(1) Read in a CSV dataset

pd.read_csv("csv_file")  # the older pd.DataFrame.from_csv was removed in pandas 1.0

(2) Read in an Excel dataset

pd.read_excel("excel_file")

(3) Write your data frame directly to csv

df.to_csv("data.csv", sep=",", index=False)
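
To see reading and writing work together, here is a minimal sketch that writes a small DataFrame as CSV text and reads it back. The sample data and the in-memory buffer (io.StringIO, used here in place of a file on disk) are assumptions made for this example.

```python
import io

import pandas as pd

# Hypothetical sample data, made up for this example
df_out = pd.DataFrame({"name": ["Ann", "Bob"], "size": [5, 7]})

# Write to an in-memory text buffer instead of a real file
buffer = io.StringIO()
df_out.to_csv(buffer, sep=",", index=False)

# Rewind the buffer and read the same CSV text back
buffer.seek(0)
df_in = pd.read_csv(buffer)
print(df_in)
```

With index=False the row index is not written, so the file contains only the two data columns.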

(4) Create a dataframe from data with column names

pd.DataFrame(data, columns=column_names)  # column_names is a list with one name per column
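
As a small sketch of what this looks like in practice, assuming the data is a list of rows (the values and the names data, column_names and people are made up for the example):

```python
import pandas as pd

# Hypothetical rows: each inner list is one record
data = [["Ann", 5], ["Bob", 7], ["Cid", 5]]
column_names = ["name", "size"]

# One column name is needed for each position in a row
people = pd.DataFrame(data, columns=column_names)
print(people)
```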

(5) Get the data type of every column

df.dtypes

(6) Basic dataset feature info

df.info()

(7) Basic dataset statistics

print(df.describe())

(8) List the column names

df.columns

(9) Drop missing data

df.dropna(axis=0, how='any')

(10) Replace missing data

df.fillna(value)  # e.g. df.fillna(0); df.replace(np.nan, value) works too

(11) Check for NaNs

pd.isnull(df)  # or df.isnull()
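
The three missing-data steps can be tried on a tiny frame. This is only a sketch: the sample values are made up, and fillna(0) is just one common way to replace missing values, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column
scores = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

mask = pd.isnull(scores)                    # True wherever a value is missing
dropped = scores.dropna(axis=0, how="any")  # keep only rows with no missing values
filled = scores.fillna(0)                   # replace every NaN with 0

print(mask.sum())  # number of missing values per column
```

None of these calls changes scores itself; each one returns a new object.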

(12) Drop a feature

df.drop('feature_variable_name', axis=1)

(13) Convert object type to float

pd.to_numeric(df["feature_name"], errors='coerce')
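
A short sketch of what errors='coerce' does, using a made-up Series of strings: entries that cannot be parsed as numbers become NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical column of strings, one of which is not a number
raw = pd.Series(["1.5", "2", "oops"])

# Unparseable entries are turned into NaN rather than raising
clean = pd.to_numeric(raw, errors="coerce")
print(clean)
```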

(14) Convert data frame to numpy array

df.to_numpy()  # as_matrix() was removed in pandas 1.0; df.values also works

(15) Get first “n” rows of a data frame

df.head(n)

(16) Get last “n” rows of a data frame

df.tail(n)

(17) Get data by feature name

df[feature_name]  # or df.loc[:, feature_name]; df.loc[label] on its own selects a row by index label

(18) Apply a function to a data frame

df["height"].apply(lambda height: 2 * height)
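
Note that apply returns a new Series and leaves the original column untouched. A minimal sketch, with made-up heights and a hypothetical double_height column to hold the result:

```python
import pandas as pd

# Hypothetical data for the example
people = pd.DataFrame({"height": [150, 160, 170]})

# apply calls the function once per value and returns a new Series
doubled = people["height"].apply(lambda height: 2 * height)

# Assign the result to a column if you want to keep it
people["double_height"] = doubled
print(people)
```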

(19) Renaming a column

df.rename(columns={df.columns[2]: 'size'}, inplace=True)

(20) Count categories of categorical variable

df["job"].value_counts()

(21) Get the unique entries of a column

df["name"].unique()
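
Here is a quick sketch of how the two differ, using a made-up job column: value_counts tells you how often each category occurs, while unique only lists the distinct values.

```python
import pandas as pd

# Hypothetical categorical column
staff = pd.DataFrame({"job": ["dev", "ops", "dev", "qa", "dev"]})

counts = staff["job"].value_counts()  # sorted by frequency, most common first
distinct = staff["job"].unique()      # distinct values in order of first appearance

print(counts)
print(distinct)
```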

(22) Accessing sub-data frames

new_df = df[["name", "size"]]

(23) Summary information about your data

# Sum of values in a data frame
df.sum()
# Lowest value of a data frame
df.min()
# Highest value
df.max()
# Index of the lowest value
df.idxmin()
# Index of the highest value
df.idxmax()
# Statistical summary of the data frame, with quartiles, median, etc.
df.describe()
# Average values
df.mean()
# Median values
df.median()
# Correlation between columns
df.corr()
# To get these values for only one column, just select it like this
df["size"].median()
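
A few of these summaries on a tiny all-numeric frame, as a sketch. The numbers are made up, and weight is deliberately set to ten times size so the correlation is easy to check by eye.

```python
import pandas as pd

# Hypothetical numeric data; weight is exactly 10 * size
sizes = pd.DataFrame({"size": [5, 3, 9, 7], "weight": [50, 30, 90, 70]})

median_size = sizes["size"].median()  # middle value of a single column
largest_row = sizes["size"].idxmax()  # index label of the largest size
correlations = sizes.corr()           # pairwise correlation between columns

print(median_size, largest_row)
print(correlations)
```

If your frame also has text columns, recent pandas versions may need corr(numeric_only=True) so that only numeric columns are used.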

(24) Sorting your data

df.sort_values(by="size", ascending=False)  # a DataFrame needs `by` to know which column to sort on
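
When sorting a DataFrame, sort_values needs the by argument to say which column (or columns) to sort on. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical data for the example
items = pd.DataFrame({"name": ["a", "b", "c"], "size": [5, 9, 3]})

# Largest size first; the original index labels travel with their rows
ordered = items.sort_values(by="size", ascending=False)
print(ordered)
```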

(25) Boolean indexing

df[df["size"] == 5]

(26) Selecting values

df.loc[[0], ['size']]  # .loc takes square brackets: rows first, then columns
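
Boolean indexing and .loc can be compared side by side. This is a sketch on made-up data; note that .loc is indexed with square brackets, and that list selectors return a DataFrame while plain labels return a single value.

```python
import pandas as pd

# Hypothetical data for the example
items = pd.DataFrame({"name": ["a", "b", "c"], "size": [5, 9, 5]})

fives = items[items["size"] == 5]      # rows where the condition is True
cell_frame = items.loc[[0], ["size"]]  # list selectors give a 1x1 DataFrame
cell_value = items.loc[0, "size"]      # plain labels give the scalar itself

print(fives)
print(cell_frame)
print(cell_value)
```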

(27) Cross-frequency tables between two variables

pd.crosstab(df["y"], df["z"])
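
A crosstab counts how often each pair of values occurs together. A small sketch with made-up survey answers (the values in y and z are assumptions for the example):

```python
import pandas as pd

# Hypothetical categorical data
survey = pd.DataFrame({
    "y": ["m", "f", "m", "f", "m"],
    "z": ["yes", "no", "no", "no", "yes"],
})

# Rows are the values of y, columns are the values of z, cells are counts
table = pd.crosstab(survey["y"], survey["z"])
print(table)
```

Pairs that never occur together show up as 0 in the table.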

(28) Plot function for numeric columns

df["size"].plot()

(29) Get shape (row,columns) of the DataFrame

df.shape

(30) Get n randomly selected rows from a DataFrame

df.sample(n)
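
If you want the same random rows every time you run your script, sample accepts a random_state seed. A minimal sketch, with a made-up frame and an arbitrary seed of 0:

```python
import pandas as pd

# Hypothetical data: ten rows numbered 0 to 9
numbers = pd.DataFrame({"value": range(10)})

# The same seed gives the same rows on every call
picked = numbers.sample(n=3, random_state=0)
again = numbers.sample(n=3, random_state=0)
print(picked)
```

By default the rows are drawn without replacement, so no row appears twice.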

There are many more useful things in pandas. We’ll see more about them in upcoming posts.

“Happy Reading, Happy Learning”
