Finance Data Project¶

Tyler Chia¶

In this data project we will focus on exploratory data analysis of stock prices for the following banks:

Bank of America
CitiGroup
Goldman Sachs
JPMorgan Chase
Morgan Stanley
Wells Fargo

We'll focus on bank stocks and see how they progressed throughout the financial crisis all the way to early 2016.

Question 1: Getting the Data¶

# import libraries

from pandas_datareader import data, wb
import pandas as pd
import numpy as np
import datetime
%matplotlib inline

# create a start and end date

start = datetime.datetime(2006,1,1)
end = datetime.datetime(2016,1,1)

# read in the data using the pandas datareader library

# Bank of America
BAC = data.DataReader('BAC', 'yahoo', start, end)
# CityGroup
C = data.DataReader('C','yahoo',start,end)
# Goldman Sachs
GS = data.DataReader('GS', 'yahoo', start, end)
# JPMorgan Chase
JPM = data.DataReader('JPM', 'yahoo', start, end)
# Morgan Stanley
MS = data.DataReader('MS', 'yahoo', start, end)
# Wells Fargo
WFC = data.DataReader('WFC', 'yahoo', start, end)

MS

# create a list of the ticker symbols in alphabetical order

tickers = ['BAC','C','GS','JPM','MS','WFC']

# concatenate the bank dataframes together to a single data frame and set the keys argument equal to the tickers list

bank_stocks = pd.concat([BAC,C,GS,JPM,MS,WFC], axis = 1, keys = tickers)

# set the column name levels

bank_stocks.columns.names = ['Bank Ticker','Stock Info']

bank_stocks.head()

Question 2: Exploratory Data Analysis¶

# finding the max Close price for each bank's stock throughout the time period

for tick in tickers:
    print(tick,bank_stocks[tick]['Close'].max())

BAC 54.900001525878906
C 564.0999755859375
GS 247.9199981689453
JPM 70.08000183105469
MS 89.30000305175781
WFC 58.52000045776367

returns = pd.DataFrame()

# use pandas pct_change() method on the Close column to create a column representing this return value. 
# create a for loop that goes and for each Bank Stock Ticker creates this returns column and sets it as a 
# column in the returns DataFrame.

for tick in tickers:
    returns[tick+' Return'] = bank_stocks[tick]['Close'].pct_change()
returns.head()

# import visualization libraries

import seaborn as sns
import matplotlib.pyplot as plt

# create a pairplot using seaborn of the returns dataframe

sns.pairplot(returns[1:])

<seaborn.axisgrid.PairGrid at 0x11732e5e0>

# use idxmin and max to find out on what dates each bank stock had the best and worst single day returns

returns.idxmin()

BAC Return   2009-01-20
C Return     2009-02-27
GS Return    2009-01-20
JPM Return   2009-01-20
MS Return    2008-10-09
WFC Return   2009-01-20
dtype: datetime64[ns]

Note: January 20, 2009 was actually Obama's inauguration date.

returns.idxmax()

BAC Return   2009-04-09
C Return     2008-11-24
GS Return    2008-11-24
JPM Return   2009-01-21
MS Return    2008-10-13
WFC Return   2008-07-16
dtype: datetime64[ns]

# take the standard deviation of the returns to find the riskiest stock over this time period

returns.std()

BAC Return    0.036647
C Return      0.038672
GS Return     0.025390
JPM Return    0.027667
MS Return     0.037819
WFC Return    0.030238
dtype: float64

It looks like the riskiest would have to be CityGroup as their standard deviation was the highest.

# check to see the riskiest stock for the year 2015

returns.loc['2015-01-01':'2015-12-31'].std()

BAC Return    0.016163
C Return      0.015289
GS Return     0.014046
JPM Return    0.014017
MS Return     0.016249
WFC Return    0.012591
dtype: float64

# Morgan Stanley was the riskiest bank stock for 2015 as they had the highest standard deviation

# create a distplot using seaborn of the 2015 returns for Morgan Stanley

sns.distplot(returns.loc['2015-01-01':'2015-12-31']['MS Return'], color = 'green', bins = 50)

<matplotlib.axes._subplots.AxesSubplot at 0x11b49cd60>

# create a distplot using seaborn of the 2008 returns for CitiGroup

sns.distplot(returns.loc['2008-01-01':'2008-12-31']['C Return'], color = 'red', bins = 50)

<matplotlib.axes._subplots.AxesSubplot at 0x11ba2b1c0>

Question 3: More Visualization¶

# more imports

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline

# Optional Plotly Method Imports
import plotly
import cufflinks as cf
cf.go_offline()

# create a line plot showing Close price for each bank for the entire index of time

bank_stocks.xs(key='Close',axis=1,level='Stock Info').iplot()

# analyze the moving averages for these stocks in the year 2008. 

# plot the rolling 30 day average against the Close Price for Bank Of America's stock for the year 2008

plt.figure(figsize=(12,4))
BAC['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean().plot(label='30 Day Moving Avg')
BAC['Close'].loc['2008-01-01':'2009-01-01'].plot(label='BAC Close')
plt.legend()
plt.title('30 Day AVG & Close Price for BAC Stock in 2008')

Text(0.5, 1.0, '30 Day AVG & Close Price for BAC Stock in 2008')

# create a heatmap of the correlation between the stocks Close Price

sns.heatmap(bank_stocks.xs(key='Close', axis=1, level='Stock Info').corr(), annot=True)

<matplotlib.axes._subplots.AxesSubplot at 0x11cbe8ac0>

# create a clustermap to cluster the correlations together

sns.clustermap(bank_stocks.xs(key='Close', axis=1, level='Stock Info').corr(), annot=True)

<seaborn.matrix.ClusterGrid at 0x11cd7d730>

	High	Low	Open	Close	Volume	Adj Close
Date
2006-01-03	58.490002	56.740002	57.169998	58.310001	5377000.0	37.140598
2006-01-04	59.279999	58.349998	58.700001	58.349998	7977800.0	37.166069
2006-01-05	58.590000	58.020000	58.549999	58.509998	5778000.0	37.267979
2006-01-06	58.849998	58.049999	58.770000	58.570000	6889800.0	37.306194
2006-01-09	59.290001	58.619999	58.630001	59.189999	4144500.0	37.701099
...	...	...	...	...	...	...
2015-12-24	32.709999	32.439999	32.570000	32.480000	2798200.0	28.914894
2015-12-28	32.360001	31.950001	32.360001	32.169998	5420300.0	28.638926
2015-12-29	32.700001	32.330002	32.439999	32.549999	6388200.0	28.977213
2015-12-30	32.650002	32.200001	32.500000	32.230000	5057200.0	28.692335
2015-12-31	32.299999	31.770000	31.910000	31.809999	8154300.0	28.318438

Bank Ticker	BAC						C				...	MS				WFC
Stock Info	High	Low	Open	Close	Volume	Adj Close	High	Low	Open	Close	...	Open	Close	Volume	Adj Close	High	Low	Open	Close	Volume	Adj Close
Date
2006-01-03	47.180000	46.150002	46.919998	47.080002	16296700.0	34.811729	493.799988	481.100006	490.000000	492.899994	...	57.169998	58.310001	5377000.0	37.140598	31.975000	31.195000	31.600000	31.900000	11016400.0	20.818630
2006-01-04	47.240002	46.450001	47.000000	46.580002	17757900.0	34.442013	491.000000	483.500000	488.600006	483.799988	...	58.700001	58.349998	7977800.0	37.166069	31.820000	31.365000	31.799999	31.530001	10870000.0	20.577162
2006-01-05	46.830002	46.320000	46.580002	46.639999	14970700.0	34.486385	487.799988	484.000000	484.399994	486.200012	...	58.549999	58.509998	5778000.0	37.267979	31.555000	31.309999	31.500000	31.495001	10158000.0	20.554317
2006-01-06	46.910000	46.349998	46.799999	46.570000	12599800.0	34.434616	489.000000	482.000000	488.799988	486.200012	...	58.770000	58.570000	6889800.0	37.306194	31.775000	31.385000	31.580000	31.680000	8403800.0	20.675053
2006-01-09	46.970001	46.360001	46.720001	46.599998	15619400.0	34.456806	487.399994	483.000000	486.000000	483.899994	...	58.630001	59.189999	4144500.0	37.701099	31.825001	31.555000	31.674999	31.674999	5619600.0	20.671791

	BAC Return	C Return	GS Return	JPM Return	MS Return	WFC Return
Date
2006-01-03	NaN	NaN	NaN	NaN	NaN	NaN
2006-01-04	-0.010620	-0.018462	-0.013812	-0.014183	0.000686	-0.011599
2006-01-05	0.001288	0.004961	-0.000393	0.003029	0.002742	-0.001110
2006-01-06	-0.001501	0.000000	0.014169	0.007046	0.001025	0.005874
2006-01-09	0.000644	-0.004731	0.012030	0.016242	0.010586	-0.000158