December 7, 2015

Python, together with its ecosystem of libraries, is a fantastic tool for working with data. In this post, I will cover some of the basic commands used for data analysis in Python. What follows is a quick reference guide to commonly used commands in Pandas, NumPy, and Matplotlib.


Pandas

Pandas is an open-source Python library for data manipulation and analysis. It provides labeled data structures, chiefly the data frame, along with tools for analyzing them.

Import

Import Pandas as 'pd', its common abbreviation
import pandas as pd

Reading Data

You can read data directly from a CSV or Excel file into a data frame
df = pd.read_csv("sample.csv")
df = pd.read_excel("sample.xlsx")
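A minimal, self-contained sketch of reading CSV data. The CSV text below is made up, and it is read from an in-memory buffer via io.StringIO so the example runs without a sample.csv on disk; pd.read_csv accepts either a path or any file-like object. pd.read_excel is not exercised here because reading .xlsx files also requires an Excel engine such as openpyxl to be installed.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for sample.csv
csv_text = """name,age,city
Alice,34,Boston
Bob,28,Denver
Carol,45,Austin
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (3, 3)
print(list(df.columns))  # ['name', 'age', 'city']
```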

Handling Data Frames

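Create a data frame from data (for example, a dict of columns or a list of rows), with optional row labels and column names
df = pd.DataFrame(data, index=index, columns=columns)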

See the first or last N rows of a data frame
df.head(N)
df.tail(N)
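A short sketch of building a data frame and peeking at it. The column names, index labels, and values are hypothetical; here data is a dict mapping column names to column values, and index supplies the row labels.

```python
import pandas as pd

# Hypothetical data: each dict key is a column name, each list holds that column's values
data = {"height": [1.62, 1.75, 1.80, 1.68, 1.90],
        "weight": [55, 72, 80, 60, 95]}
index = ["a", "b", "c", "d", "e"]

df = pd.DataFrame(data, index=index, columns=["height", "weight"])

first_two = df.head(2)  # rows "a" and "b"
last_two = df.tail(2)   # rows "d" and "e"
print(first_two)
print(last_two)
```

Without an argument, head() and tail() return five rows.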

Work with Columns

Rename columns
df.columns = ["Column1", "Column2", "Column3"]
df.rename(columns={"Old Column Name": "New Column Name"}, inplace=True)

View column names
df.columns

Select a column
df["column name"]

Create a new column as a function of old ones
df["new column"] = df["column1"]/df["column2"]

Sort rows by the values in a column
df.sort_values("column name")

Sort the columns by their names
df.sort_index(axis=1)
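The column operations above can be sketched on a tiny, made-up data frame. The column names "x", "a", and "ratio" are hypothetical. Note the difference between the two sorts: sort_values reorders rows by a column's values, while sort_index(axis=1) reorders the columns alphabetically by name.

```python
import pandas as pd

df = pd.DataFrame({"x": [30, 5, 40], "Old Column Name": [3, 1, 2]})

# Rename a single column by mapping old name -> new name (returns a new frame)
df = df.rename(columns={"Old Column Name": "a"})

# Derive a new column from existing ones (element-wise division)
df["ratio"] = df["a"] / df["x"]   # 0.1, 0.2, 0.05

# Sort rows by the values in column "a"
by_a = df.sort_values("a")

# Reorder the columns alphabetically by name
by_name = df.sort_index(axis=1)

print(by_a)
print(list(by_name.columns))  # ['a', 'ratio', 'x']
```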

Basic Operations

Summary statistics (count, mean, standard deviation, min, quartiles, max)
df.describe()

Find the sum of each column, or of a single column
df.sum()
df["column name"].sum()

Find the minimum or maximum value of each column
df.min()
df.max()

Find the mean or median of each column
df.mean()
df.median()

Find unique values
df["column name"].unique()
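A sketch of these basic operations on a small, hypothetical data frame. Selecting a column first, as in df["score"], yields a single value; calling the same method on the whole data frame yields one result per column. In recent pandas versions (2.x), calling df.mean() or df.median() on a frame that contains text columns raises an error unless numeric_only=True is passed, so the sketch works on the numeric column directly.

```python
import pandas as pd

df = pd.DataFrame({"score": [4, 8, 6, 2],
                   "team": ["red", "blue", "red", "green"]})

summary = df.describe()                                   # statistics for numeric columns only
total = df["score"].sum()                                 # 20
lowest, highest = df["score"].min(), df["score"].max()    # 2, 8
mean, median = df["score"].mean(), df["score"].median()   # 5.0, 5.0
teams = df["team"].unique()                               # red, blue, green (order of first appearance)

print(summary)
```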

Manipulating Data

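Pivot data from long to wide format (arguments are keyword-only in recent Pandas versions)
df_pivot = df.pivot(index="index name", columns="column name", values="values name")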

Fill missing values with a new value (returns a new data frame, so assign the result)
df = df.fillna(value=0)

Drop rows that contain missing values (returns a new data frame)
df = df.dropna()

Transpose a data frame
df.T
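A sketch of the manipulation commands on a made-up temperature table with one missing reading. pivot turns the long format (one row per date/city pair) into a wide one (one row per date, one column per city); it requires every index/column pair to be unique. fillna and dropna return new data frames rather than changing the original.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data; Oslo's Tuesday reading is missing
df = pd.DataFrame({
    "date": ["mon", "mon", "tue", "tue"],
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
    "temp": [3.0, 15.0, np.nan, 17.0],
})

# Wide format: rows are dates, columns are cities
df_pivot = df.pivot(index="date", columns="city", values="temp")

filled = df_pivot.fillna(value=0)  # the NaN becomes 0.0
dropped = df_pivot.dropna()        # keeps only rows with no missing values ("mon")
flipped = df_pivot.T               # cities become rows, dates become columns

print(df_pivot)
```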

NumPy

NumPy is a Python library for numerical computing and is great for working with arrays and matrices.

Import

Import NumPy as 'np', its common abbreviation
import numpy as np

Create an array

arr = np.array([1,2,3,4])

Compute the sum, mean, and standard deviation of an array

np.sum(arr)
np.mean(arr)
np.std(arr)
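A worked version of these array statistics. One detail worth knowing: np.std computes the population standard deviation (ddof=0) by default, whereas the .std() method in Pandas defaults to the sample version (ddof=1). For [1, 2, 3, 4] the squared deviations from the mean 2.5 sum to 5, giving a variance of 5/4 or 5/3 depending on ddof.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

total = np.sum(arr)               # 10
mean = np.mean(arr)               # 2.5
pop_std = np.std(arr)             # population std: sqrt(5/4) ≈ 1.118
sample_std = np.std(arr, ddof=1)  # sample std: sqrt(5/3) ≈ 1.291
```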

Create a matrix

m = np.array([[1,2,3,4],[5,6,7,8]])

Reshape an array into a matrix of a specific size
m = np.array([1,2,3,4,5,6]).reshape(2,3)

Create a matrix of a specific size filled with uniform random values in [0, 1)
m = np.random.rand(4,4)

Create a zero matrix
m = np.zeros((5,5))

Create an identity matrix of size n x n
m = np.eye(n)
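The matrix constructors above, gathered into one runnable sketch (n = 3 is an arbitrary choice for illustration). Two details that often trip people up: np.zeros takes the shape as a single tuple while np.random.rand takes each dimension as a separate argument, and reshape only works when the element count matches (6 = 2 × 3), otherwise it raises a ValueError.

```python
import numpy as np

m = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])      # 2 x 4
r = np.array([1, 2, 3, 4, 5, 6]).reshape(2, 3)   # filled row by row
rand = np.random.rand(4, 4)                      # uniform values in [0, 1)
z = np.zeros((5, 5))                             # shape passed as one tuple
n = 3
identity = np.eye(n)                             # 3 x 3 identity, floats

print(m.shape, r.shape)  # (2, 4) (2, 3)
print(r)
# [[1 2 3]
#  [4 5 6]]
```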

Matrix Operations

Scalar operations on a matrix m (applied to every element)
m * 2

Matrix multiplication of m1 and m2 (the @ operator is equivalent for 2-D arrays)
np.dot(m1, m2)
m1 @ m2

Raise each element of matrix m to the power n
np.power(m,n)

Raise square matrix m to the power n (repeated matrix multiplication)
np.linalg.matrix_power(m,n)
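A small worked example contrasting element-wise and matrix operations, using the 2 × 2 matrix [[1, 1], [1, 0]] chosen for illustration. np.power cubes each entry independently, so this 0/1 matrix is unchanged, while matrix_power multiplies the matrix by itself three times. Likewise, m * m would be element-wise, not a matrix product.

```python
import numpy as np

m = np.array([[1, 1], [1, 0]])

scaled = m * 2                               # [[2, 2], [2, 0]]
product = np.dot(m, m)                       # same as m @ m -> [[2, 1], [1, 1]]
elementwise = np.power(m, 3)                 # each entry cubed -> [[1, 1], [1, 0]]
matrix_cubed = np.linalg.matrix_power(m, 3)  # m @ m @ m -> [[3, 2], [2, 1]]
```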

Matplotlib

Matplotlib is a Python plotting library for generating a wide range of data visualizations. It works well with Pandas data frames.

Import

Import the pyplot module as 'plt', its common abbreviation
import matplotlib.pyplot as plt

Basic Plots

Generate a plot for a data frame
df.plot()

Label the x and y axes
plt.xlabel("x")
plt.ylabel("y")

Set the plot title
plt.title("title")
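Putting the plotting commands together, a sketch on a hypothetical data frame (the column names, years, and values are made up). df.plot() draws one line per column against the index and returns a Matplotlib Axes. The sketch selects the non-interactive "Agg" backend and saves the figure to an in-memory PNG instead of calling plt.show(), so it also runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one column per series, the index supplies the x values
df = pd.DataFrame({"sales": [3, 5, 4, 6], "costs": [2, 3, 3, 4]},
                  index=[2012, 2013, 2014, 2015])

ax = df.plot()  # one line per column on a new figure
plt.xlabel("year")
plt.ylabel("amount")
plt.title("Sales and costs")

# Write the figure to an in-memory PNG instead of showing it
buf = io.BytesIO()
plt.savefig(buf, format="png")
plt.close()
```

In an interactive session or notebook, plt.show() displays the figure instead.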