Finance Data Project

Tyler Chia

In this data project we will focus on exploratory data analysis of stock prices for the following banks:

  • Bank of America
  • CitiGroup
  • Goldman Sachs
  • JPMorgan Chase
  • Morgan Stanley
  • Wells Fargo

We'll focus on bank stocks and see how they progressed throughout the financial crisis all the way to early 2016.

Question 1: Getting the Data

In [1]:
# import libraries
In [2]:
from pandas_datareader import data, wb
import pandas as pd
import numpy as np
import datetime
%matplotlib inline
In [3]:
# create a start and end date
In [4]:
start = datetime.datetime(2006,1,1)
end = datetime.datetime(2016,1,1)
In [5]:
# read in the data using the pandas datareader library
In [6]:
# Bank of America
BAC = data.DataReader('BAC', 'yahoo', start, end)
# CityGroup
C = data.DataReader('C','yahoo',start,end)
# Goldman Sachs
GS = data.DataReader('GS', 'yahoo', start, end)
# JPMorgan Chase
JPM = data.DataReader('JPM', 'yahoo', start, end)
# Morgan Stanley
MS = data.DataReader('MS', 'yahoo', start, end)
# Wells Fargo
WFC = data.DataReader('WFC', 'yahoo', start, end)
In [7]:
MS
Out[7]:
High Low Open Close Volume Adj Close
Date
2006-01-03 58.490002 56.740002 57.169998 58.310001 5377000.0 37.140598
2006-01-04 59.279999 58.349998 58.700001 58.349998 7977800.0 37.166069
2006-01-05 58.590000 58.020000 58.549999 58.509998 5778000.0 37.267979
2006-01-06 58.849998 58.049999 58.770000 58.570000 6889800.0 37.306194
2006-01-09 59.290001 58.619999 58.630001 59.189999 4144500.0 37.701099
... ... ... ... ... ... ...
2015-12-24 32.709999 32.439999 32.570000 32.480000 2798200.0 28.914894
2015-12-28 32.360001 31.950001 32.360001 32.169998 5420300.0 28.638926
2015-12-29 32.700001 32.330002 32.439999 32.549999 6388200.0 28.977213
2015-12-30 32.650002 32.200001 32.500000 32.230000 5057200.0 28.692335
2015-12-31 32.299999 31.770000 31.910000 31.809999 8154300.0 28.318438

2517 rows × 6 columns

In [8]:
# create a list of the ticker symbols in alphabetical order
In [9]:
tickers = ['BAC','C','GS','JPM','MS','WFC']
In [10]:
# concatenate the bank dataframes together to a single data frame and set the keys argument equal to the tickers list
In [11]:
bank_stocks = pd.concat([BAC,C,GS,JPM,MS,WFC], axis = 1, keys = tickers)
In [12]:
# set the column name levels 
In [13]:
bank_stocks.columns.names = ['Bank Ticker','Stock Info']
In [14]:
bank_stocks.head()
Out[14]:
Bank Ticker BAC C ... MS WFC
Stock Info High Low Open Close Volume Adj Close High Low Open Close ... Open Close Volume Adj Close High Low Open Close Volume Adj Close
Date
2006-01-03 47.180000 46.150002 46.919998 47.080002 16296700.0 34.811729 493.799988 481.100006 490.000000 492.899994 ... 57.169998 58.310001 5377000.0 37.140598 31.975000 31.195000 31.600000 31.900000 11016400.0 20.818630
2006-01-04 47.240002 46.450001 47.000000 46.580002 17757900.0 34.442013 491.000000 483.500000 488.600006 483.799988 ... 58.700001 58.349998 7977800.0 37.166069 31.820000 31.365000 31.799999 31.530001 10870000.0 20.577162
2006-01-05 46.830002 46.320000 46.580002 46.639999 14970700.0 34.486385 487.799988 484.000000 484.399994 486.200012 ... 58.549999 58.509998 5778000.0 37.267979 31.555000 31.309999 31.500000 31.495001 10158000.0 20.554317
2006-01-06 46.910000 46.349998 46.799999 46.570000 12599800.0 34.434616 489.000000 482.000000 488.799988 486.200012 ... 58.770000 58.570000 6889800.0 37.306194 31.775000 31.385000 31.580000 31.680000 8403800.0 20.675053
2006-01-09 46.970001 46.360001 46.720001 46.599998 15619400.0 34.456806 487.399994 483.000000 486.000000 483.899994 ... 58.630001 59.189999 4144500.0 37.701099 31.825001 31.555000 31.674999 31.674999 5619600.0 20.671791

5 rows × 36 columns

Question 2: Exploratory Data Analysis

In [15]:
# finding the max Close price for each bank's stock throughout the time period
In [16]:
for tick in tickers:
    print(tick,bank_stocks[tick]['Close'].max())
BAC 54.900001525878906
C 564.0999755859375
GS 247.9199981689453
JPM 70.08000183105469
MS 89.30000305175781
WFC 58.52000045776367
In [17]:
returns = pd.DataFrame()
In [18]:
# use pandas pct_change() method on the Close column to create a column representing this return value. 
# create a for loop that goes and for each Bank Stock Ticker creates this returns column and sets it as a 
# column in the returns DataFrame.
In [19]:
for tick in tickers:
    returns[tick+' Return'] = bank_stocks[tick]['Close'].pct_change()
returns.head()
Out[19]:
BAC Return C Return GS Return JPM Return MS Return WFC Return
Date
2006-01-03 NaN NaN NaN NaN NaN NaN
2006-01-04 -0.010620 -0.018462 -0.013812 -0.014183 0.000686 -0.011599
2006-01-05 0.001288 0.004961 -0.000393 0.003029 0.002742 -0.001110
2006-01-06 -0.001501 0.000000 0.014169 0.007046 0.001025 0.005874
2006-01-09 0.000644 -0.004731 0.012030 0.016242 0.010586 -0.000158
In [20]:
# import visualization libraries
In [21]:
import seaborn as sns
import matplotlib.pyplot as plt
In [22]:
# create a pairplot using seaborn of the returns dataframe
In [23]:
sns.pairplot(returns[1:])
Out[23]:
<seaborn.axisgrid.PairGrid at 0x11732e5e0>
In [24]:
# use idxmin and max to find out on what dates each bank stock had the best and worst single day returns
In [25]:
returns.idxmin()
Out[25]:
BAC Return   2009-01-20
C Return     2009-02-27
GS Return    2009-01-20
JPM Return   2009-01-20
MS Return    2008-10-09
WFC Return   2009-01-20
dtype: datetime64[ns]

Note: January 20, 2009 was actually Obama's inauguration date.

In [26]:
returns.idxmax()
Out[26]:
BAC Return   2009-04-09
C Return     2008-11-24
GS Return    2008-11-24
JPM Return   2009-01-21
MS Return    2008-10-13
WFC Return   2008-07-16
dtype: datetime64[ns]
In [27]:
# take the standard deviation of the returns to find the riskiest stock over this time period
In [28]:
returns.std()
Out[28]:
BAC Return    0.036647
C Return      0.038672
GS Return     0.025390
JPM Return    0.027667
MS Return     0.037819
WFC Return    0.030238
dtype: float64

It looks like the riskiest would have to be CityGroup as their standard deviation was the highest.

In [29]:
# check to see the riskiest stock for the year 2015
In [30]:
returns.loc['2015-01-01':'2015-12-31'].std()
Out[30]:
BAC Return    0.016163
C Return      0.015289
GS Return     0.014046
JPM Return    0.014017
MS Return     0.016249
WFC Return    0.012591
dtype: float64
In [31]:
# Morgan Stanley was the riskiest bank stock for 2015 as they had the highest standard deviation
In [32]:
# create a distplot using seaborn of the 2015 returns for Morgan Stanley
In [33]:
sns.distplot(returns.loc['2015-01-01':'2015-12-31']['MS Return'], color = 'green', bins = 50)
Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x11b49cd60>
In [34]:
# create a distplot using seaborn of the 2008 returns for CitiGroup 
In [35]:
sns.distplot(returns.loc['2008-01-01':'2008-12-31']['C Return'], color = 'red', bins = 50)
Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0x11ba2b1c0>

Question 3: More Visualization

In [36]:
# more imports
In [37]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline

# Optional Plotly Method Imports
import plotly
import cufflinks as cf
cf.go_offline()
In [38]:
# create a line plot showing Close price for each bank for the entire index of time
In [39]:
bank_stocks.xs(key='Close',axis=1,level='Stock Info').iplot()
In [40]:
# analyze the moving averages for these stocks in the year 2008. 

# plot the rolling 30 day average against the Close Price for Bank Of America's stock for the year 2008
In [41]:
plt.figure(figsize=(12,4))
BAC['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean().plot(label='30 Day Moving Avg')
BAC['Close'].loc['2008-01-01':'2009-01-01'].plot(label='BAC Close')
plt.legend()
plt.title('30 Day AVG & Close Price for BAC Stock in 2008')
Out[41]:
Text(0.5, 1.0, '30 Day AVG & Close Price for BAC Stock in 2008')
In [42]:
# create a heatmap of the correlation between the stocks Close Price
In [43]:
sns.heatmap(bank_stocks.xs(key='Close', axis=1, level='Stock Info').corr(), annot=True)
Out[43]:
<matplotlib.axes._subplots.AxesSubplot at 0x11cbe8ac0>
In [44]:
# create a clustermap to cluster the correlations together
In [45]:
sns.clustermap(bank_stocks.xs(key='Close', axis=1, level='Stock Info').corr(), annot=True)
Out[45]:
<seaborn.matrix.ClusterGrid at 0x11cd7d730>