911 Calls Project

Tyler Chia

For this project we will be analyzing some 911 call data from Kaggle. All the data is specific to Montgomery County, Pennsylvania.

Question 1: Data and Setup

# Import Pandas and Numpy

import pandas as pd
import numpy as np

# Render matplotlib plots inline in the notebook (the plotting libraries are imported in Question 4)

%matplotlib inline

# Read in csv file

df = pd.read_csv('911.csv')

# Check the info of the dataframe

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99492 entries, 0 to 99491
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   lat        99492 non-null  float64
 1   lng        99492 non-null  float64
 2   desc       99492 non-null  object 
 3   zip        86637 non-null  float64
 4   title      99492 non-null  object 
 5   timeStamp  99492 non-null  object 
 6   twp        99449 non-null  object 
 7   addr       98973 non-null  object 
 8   e          99492 non-null  int64  
dtypes: float64(3), int64(1), object(5)
memory usage: 6.8+ MB
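The non-null counts above imply missing values: out of 99492 rows, zip is missing 12855 values, twp 43, and addr 519. A minimal sketch of how to get those counts directly, using a tiny made-up frame (the name `sample` and its values are assumptions for illustration, not rows from 911.csv):

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the real data; the values are made up for illustration.
sample = pd.DataFrame({
    'zip': [19401.0, np.nan, 19464.0, np.nan],
    'twp': ['NORRISTOWN', 'LOWER MERION', None, 'ABINGTON'],
    'e': [1, 1, 1, 1],
})

# Count missing values per column, most incomplete column first.
missing = sample.isna().sum().sort_values(ascending=False)
print(missing)
```

On the real dataframe the same call would be `df.isna().sum()`.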

# Check the head of the dataframe

df.head()

Question 2: Basic Information From the Dataframe

# What are the top 5 zipcodes for 911 calls?

df['zip'].value_counts().head()

19401.0    6979
19464.0    6643
19403.0    4854
19446.0    4748
19406.0    3174
Name: zip, dtype: int64

The top five zip codes for 911 calls are 19401, 19464, 19403, 19446, and 19406. The output above lists the number of calls next to each zip code.
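The zip codes print as floats (19401.0) because the column contains missing values and pandas stores it as float64. One possible way to display them as whole numbers is the nullable integer dtype; this is a sketch on a made-up Series (`zips` is a hypothetical name), not a step from the original analysis:

```python
import numpy as np
import pandas as pd

# Made-up zip values, including one missing entry.
zips = pd.Series([19401.0, 19401.0, 19464.0, np.nan])

# The nullable 'Int64' dtype keeps missing values as <NA> and drops the ".0".
# value_counts() ignores missing values by default.
zip_counts = zips.astype('Int64').value_counts()
print(zip_counts)
```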

# What are the top 5 townships for 911 calls?

df['twp'].value_counts().head()

LOWER MERION    8443
ABINGTON        5977
NORRISTOWN      5890
UPPER MERION    5227
CHELTENHAM      4575
Name: twp, dtype: int64

The top five townships for 911 calls are Lower Merion, Abington, Norristown, Upper Merion, and Cheltenham. The output above lists the number of calls next to each township name.

# How many unique title codes are there?

df['title'].nunique()

110

There are 110 unique title codes in the dataframe for 911 calls.

Question 3: Creating New Features

# Adding a new column to the dataframe that shows the reason for the 911 call

# Creating a function that splits the title column and takes the string before the colon 
# in order to extract the reason for the call

def reason(title):
    return title.split(':')[0]

df['Reason'] = df['title'].apply(reason)

df['Reason']

0            EMS
1            EMS
2           Fire
3            EMS
4            EMS
          ...   
99487    Traffic
99488    Traffic
99489        EMS
99490        EMS
99491    Traffic
Name: Reason, Length: 99492, dtype: object
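The same extraction can also be written with pandas' vectorized string methods instead of a helper function. A small sketch using three title values from the dataset (the variable names are just for illustration):

```python
import pandas as pd

titles = pd.Series([
    'EMS: BACK PAINS/INJURY',
    'Fire: GAS-ODOR/LEAK',
    'EMS: DIZZINESS',
])

# Split each title on ':' and keep the first piece, i.e. the reason.
reasons = titles.str.split(':').str[0]
print(reasons.tolist())
```

On the full dataframe this would read `df['title'].str.split(':').str[0]`.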

# What is the most common Reason for a 911 call based on this new column?

df['Reason'].value_counts()

EMS        48877
Traffic    35695
Fire       14920
Name: Reason, dtype: int64

The most common reason for a 911 call in Montgomery County, Pennsylvania is EMS.

Question 4: Visualization of the Data

# Import both seaborn and matplotlib for data visualization

import seaborn as sns

import matplotlib.pyplot as plt

sns.countplot(x = 'Reason', data=df)

<matplotlib.axes._subplots.AxesSubplot at 0x118301eb0>

Above is a countplot that shows the total number of calls based on each of the three reasons: EMS, Fire, and Traffic.

# Convert timeStamp variable from a string to DateTime objects

type(df['timeStamp'].iloc[0])

str

The values in the timeStamp column are stored as strings.

df['timeStamp'] = pd.to_datetime(df['timeStamp'])

# Create three new columns that hold the hour, month, and day of the week respectively

df['Hour'] = df['timeStamp'].apply(lambda x: x.hour)
df['Month'] = df['timeStamp'].apply(lambda x: x.month)
df['Day'] = df['timeStamp'].apply(lambda x: x.dayofweek)
df.head()
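Once the column holds datetimes, the `.dt` accessor offers the same fields without a lambda. A short sketch on two timestamps taken from the first rows of the data (`stamps` and `parts` are illustrative names):

```python
import pandas as pd

stamps = pd.to_datetime(pd.Series(['2015-12-10 17:40:00', '2015-12-10 17:40:01']))

parts = pd.DataFrame({
    'Hour': stamps.dt.hour,
    'Month': stamps.dt.month,
    'Day': stamps.dt.dayofweek,  # Monday=0 ... Sunday=6
})
print(parts)
```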

The Day column stores the day of the week as an integer from 0 (Monday) to 6 (Sunday). The dictionary below maps each integer to a short day name.

dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

df['Day'] = df['Day'].apply(lambda x: dmap[x])

df.head()
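Two equivalent ways to get the short day names, sketched on two timestamps (the second one is made up for illustration): `Series.map` accepts the dictionary directly, and `dt.day_name()` returns full English names by default, which can be sliced to three letters.

```python
import pandas as pd

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}
stamps = pd.to_datetime(pd.Series(['2015-12-10 17:40:00', '2015-12-12 09:00:00']))

# Map the weekday integers through the dictionary, no lambda needed.
via_map = stamps.dt.dayofweek.map(dmap)

# Or ask pandas for the day name and keep the first three letters.
via_name = stamps.dt.day_name().str[:3]

print(via_map.tolist(), via_name.tolist())
```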

# Creating a countplot that counts the number of calls per day of the week
# and splits them based on the reason for the call

sns.countplot(x = 'Day', data = df, hue = 'Reason')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

<matplotlib.legend.Legend at 0x118301730>

As the graph above shows, EMS is the leading reason for 911 calls in Montgomery County, Pennsylvania on every day of the week, followed by traffic and then fire, which matches the overall counts from Question 3.

# Creating the same countplot for months

sns.countplot(x = 'Month', data = df, hue = 'Reason')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

<matplotlib.legend.Legend at 0x11824e460>

We can see that the number of 911 calls in Montgomery County, Pennsylvania starts to drop later in the year, with the lowest number of calls made in December.

The plot above also makes it clear that some months are missing from the dataset: there are no calls for September, October, or November.
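The gap can also be confirmed without reading it off the plot. A minimal sketch, assuming the Month values are the nine that appear in the data (1 through 8 and 12); `months` is a stand-in for `df['Month']`:

```python
import pandas as pd

# Month numbers present in the dataset.
months = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 12])

# Any calendar month that never occurs in the column.
missing_months = sorted(set(range(1, 13)) - set(months.unique()))
print(missing_months)
```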

# Create a groupby object that groups the data by month and aggregate the data

byMonth = df.groupby('Month').count()
byMonth.head()
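Note that `count()` reports the number of non-null values in each column, which is why the zip column in the byMonth output is smaller than desc. If only the number of rows per group is needed, `size()` is an alternative. A small sketch on made-up data (`calls` is a hypothetical frame):

```python
import numpy as np
import pandas as pd

calls = pd.DataFrame({
    'Month': [1, 1, 1, 2, 2],
    'zip': [19401.0, np.nan, 19464.0, 19401.0, np.nan],
    'desc': ['a', 'b', 'c', 'd', 'e'],
})

per_column = calls.groupby('Month').count()  # non-null values per column
per_group = calls.groupby('Month').size()    # rows per group, as one Series

print(per_column)
print(per_group)
```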

# Create a simple lineplot that shows the number of calls based on the month

byMonth['desc'].plot()

<matplotlib.axes._subplots.AxesSubplot at 0x118981160>

# Create a lmplot() to create a linear fit on the number of calls per month

byMonth = byMonth.reset_index()
sns.lmplot(x = 'Month', y = 'desc', data = byMonth)

<seaborn.axisgrid.FacetGrid at 0x1158db040>

Although September, October, and November are missing from the dataset, the linear fit drawn by lmplot gives a rough estimate of the call volume for those months, based on the months before and after the gap.
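To make the idea of a linear fit concrete, here is a rough sketch with `numpy.polyfit`. It uses only the January to May counts visible in the byMonth.head() output, so it is an illustration of the technique rather than the fit that lmplot draws over all nine months, and the extrapolated values should be treated as guesses.

```python
import numpy as np

# Monthly call counts for January-May, taken from the byMonth.head() output.
months = np.array([1, 2, 3, 4, 5])
calls = np.array([13205, 11467, 11101, 11326, 11423])

# Least-squares line: calls ≈ slope * month + intercept.
slope, intercept = np.polyfit(months, calls, deg=1)

# Extrapolate to the months with no data.
for m in (9, 10, 11):
    print(m, round(slope * m + intercept))
```

The slope of this five-month fit is negative, which agrees with the downward trend described above.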

# Create a new column that shows the date

df['Date'] = df['timeStamp'].apply(lambda x: x.date())
df.head()

# Perform a groupby on the data that groups together by date

byDate = df.groupby('Date').count()
byDate['desc'].plot()
plt.tight_layout()

# Create a plot that shows the number of traffic related calls based on date

byTraffic = df[df['Reason'] == 'Traffic']
byTraffic.groupby('Date').count()['desc'].plot()
plt.title('Traffic')
plt.tight_layout()

According to the graph, there was a sharp spike in the number of traffic-related 911 calls in Montgomery County, Pennsylvania in the middle of January 2016.
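Rather than reading the spike off the plot, the busiest dates can be listed directly. A minimal sketch on made-up data (the dates and counts below are placeholders, not values from 911.csv):

```python
from datetime import date

import pandas as pd

# Made-up traffic calls: one row per call.
traffic = pd.DataFrame({
    'Date': [date(2016, 1, 1), date(2016, 1, 2), date(2016, 1, 2),
             date(2016, 1, 2), date(2016, 1, 3), date(2016, 1, 3)],
    'desc': list('abcdef'),
})

# Calls per date, then the two dates with the most calls.
daily = traffic.groupby('Date').size()
top_days = daily.nlargest(2)
print(top_days)
```

On the real data the equivalent would be `byTraffic.groupby('Date').size().nlargest(5)`.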

# Create a plot that shows the number of fire related calls based on date

byFire = df[df['Reason'] == 'Fire']
byFire.groupby('Date').count()['desc'].plot()
plt.title('Fire')
plt.tight_layout()

# Create a plot that shows the number of EMS related calls based on date

byEMS = df[df['Reason'] == 'EMS']
byEMS.groupby('Date').count()['desc'].plot()
plt.title('EMS')
plt.tight_layout()

# Restructure the dataframe so that the columns become the Hours and the Index becomes the Day of the Week.

dayandhour = df.groupby(by=['Day','Hour']).count()['Reason'].unstack()
dayandhour.head()
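Because Day holds strings, the rows of the unstacked table come out in alphabetical order (Fri, Mon, Sat, ...). One way to put them in calendar order is `reindex`. The sketch below uses a few cells from the dayandhour output; `dayandhour_sample` is an illustrative name.

```python
import pandas as pd

# Hours 16 and 17 for three days, in the alphabetical order unstack() produces.
dayandhour_sample = pd.DataFrame(
    {16: [1039, 989, 848], 17: [980, 997, 757]},
    index=pd.Index(['Fri', 'Mon', 'Sat'], name='Day'),
)

weekday_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Keep only the days that are present, in calendar order.
present = [d for d in weekday_order if d in dayandhour_sample.index]
ordered = dayandhour_sample.reindex(present)
print(ordered)
```

With all seven days present, `dayandhour.reindex(weekday_order)` is enough.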

# Create a heatmap using seaborn

sns.heatmap(dayandhour, cmap='coolwarm')

<matplotlib.axes._subplots.AxesSubplot at 0x11a2f2cd0>

It seems that 911 calls in Montgomery County, Pennsylvania peak in the late afternoon, around 4 to 5 pm (hours 16 and 17).
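The peak hour can be checked numerically by summing each hour's column and taking the largest. The sketch below uses hours 15 to 18 for the five days visible in the dayandhour.head() output; Tuesday and Wednesday are not shown there, so this is only a partial check, not the result for the full table.

```python
import pandas as pd

# Hours 15-18 for the five days shown in dayandhour.head().
partial = pd.DataFrame(
    {15: [980, 913, 796, 691, 969],
     16: [1039, 989, 848, 663, 935],
     17: [980, 997, 757, 714, 1013],
     18: [820, 885, 778, 670, 810]},
    index=pd.Index(['Fri', 'Mon', 'Sat', 'Sun', 'Thu'], name='Day'),
)

# Total calls per hour across the available days, and the busiest hour.
hourly_totals = partial.sum()
peak_hour = hourly_totals.idxmax()
print(hourly_totals)
print(peak_hour)
```

On the full table, `dayandhour.sum().idxmax()` gives the busiest hour over all seven days.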

# Create a clustermap based on the data

sns.clustermap(dayandhour, cmap='coolwarm')

<seaborn.matrix.ClusterGrid at 0x11b0f3ca0>

# Create a heatmap using seaborn that is based on month as opposed to hour

dayandmonth = df.groupby(by=['Day','Month']).count()['Reason'].unstack()
dayandmonth.head()

sns.heatmap(dayandmonth, cmap='coolwarm')

<matplotlib.axes._subplots.AxesSubplot at 0x11b043640>

The heatmap above shows that Saturdays in January had noticeably more 911 calls in Montgomery County, Pennsylvania than other day and month combinations.
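The hottest cell can also be located in code by stacking the table into a Series and asking for the index of its maximum. This sketch uses three month columns for the five days visible in the dayandmonth.head() output (Tuesday and Wednesday are not shown there); `dayandmonth_sample` is an illustrative name.

```python
import pandas as pd

# January, July, and December counts for the five days shown in dayandmonth.head().
dayandmonth_sample = pd.DataFrame(
    {1: [1970, 1727, 2291, 1960, 1584],
     7: [2045, 1692, 1695, 1672, 1646],
     12: [1065, 1257, 978, 907, 1266]},
    index=pd.Index(['Fri', 'Mon', 'Sat', 'Sun', 'Thu'], name='Day'),
)
dayandmonth_sample.columns.name = 'Month'

# stack() turns the table into a Series indexed by (Day, Month).
busiest_day, busiest_month = dayandmonth_sample.stack().idxmax()
print(busiest_day, busiest_month)
```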

# Create a new clustermap

sns.clustermap(dayandmonth, cmap='coolwarm')

<seaborn.matrix.ClusterGrid at 0x11b868130>

Output of the first df.head() (Question 1):

	lat	lng	desc	zip	title	timeStamp	twp	addr	e
0	40.297876	-75.581294	REINDEER CT & DEAD END; NEW HANOVER; Station ...	19525.0	EMS: BACK PAINS/INJURY	2015-12-10 17:40:00	NEW HANOVER	REINDEER CT & DEAD END	1
1	40.258061	-75.264680	BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...	19446.0	EMS: DIABETIC EMERGENCY	2015-12-10 17:40:00	HATFIELD TOWNSHIP	BRIAR PATH & WHITEMARSH LN	1
2	40.121182	-75.351975	HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...	19401.0	Fire: GAS-ODOR/LEAK	2015-12-10 17:40:00	NORRISTOWN	HAWS AVE	1
3	40.116153	-75.343513	AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...	19401.0	EMS: CARDIAC EMERGENCY	2015-12-10 17:40:01	NORRISTOWN	AIRY ST & SWEDE ST	1
4	40.251492	-75.603350	CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...	NaN	EMS: DIZZINESS	2015-12-10 17:40:01	LOWER POTTSGROVE	CHERRYWOOD CT & DEAD END	1

Output of df.head() after adding the Hour, Month, and Day columns:

	lat	lng	desc	zip	title	timeStamp	twp	addr	e	Reason	Hour	Month	Day
0	40.297876	-75.581294	REINDEER CT & DEAD END; NEW HANOVER; Station ...	19525.0	EMS: BACK PAINS/INJURY	2015-12-10 17:40:00	NEW HANOVER	REINDEER CT & DEAD END	1	EMS	17	12	3
1	40.258061	-75.264680	BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...	19446.0	EMS: DIABETIC EMERGENCY	2015-12-10 17:40:00	HATFIELD TOWNSHIP	BRIAR PATH & WHITEMARSH LN	1	EMS	17	12	3
2	40.121182	-75.351975	HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...	19401.0	Fire: GAS-ODOR/LEAK	2015-12-10 17:40:00	NORRISTOWN	HAWS AVE	1	Fire	17	12	3
3	40.116153	-75.343513	AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...	19401.0	EMS: CARDIAC EMERGENCY	2015-12-10 17:40:01	NORRISTOWN	AIRY ST & SWEDE ST	1	EMS	17	12	3
4	40.251492	-75.603350	CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...	NaN	EMS: DIZZINESS	2015-12-10 17:40:01	LOWER POTTSGROVE	CHERRYWOOD CT & DEAD END	1	EMS	17	12	3

Output of df.head() after mapping Day to day names:

	lat	lng	desc	zip	title	timeStamp	twp	addr	e	Reason	Hour	Month	Day
0	40.297876	-75.581294	REINDEER CT & DEAD END; NEW HANOVER; Station ...	19525.0	EMS: BACK PAINS/INJURY	2015-12-10 17:40:00	NEW HANOVER	REINDEER CT & DEAD END	1	EMS	17	12	Thu
1	40.258061	-75.264680	BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...	19446.0	EMS: DIABETIC EMERGENCY	2015-12-10 17:40:00	HATFIELD TOWNSHIP	BRIAR PATH & WHITEMARSH LN	1	EMS	17	12	Thu
2	40.121182	-75.351975	HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...	19401.0	Fire: GAS-ODOR/LEAK	2015-12-10 17:40:00	NORRISTOWN	HAWS AVE	1	Fire	17	12	Thu
3	40.116153	-75.343513	AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...	19401.0	EMS: CARDIAC EMERGENCY	2015-12-10 17:40:01	NORRISTOWN	AIRY ST & SWEDE ST	1	EMS	17	12	Thu
4	40.251492	-75.603350	CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...	NaN	EMS: DIZZINESS	2015-12-10 17:40:01	LOWER POTTSGROVE	CHERRYWOOD CT & DEAD END	1	EMS	17	12	Thu

Output of byMonth.head():

	lat	lng	desc	zip	title	timeStamp	twp	addr	e	Reason	Hour	Day
Month
1	13205	13205	13205	11527	13205	13205	13203	13096	13205	13205	13205	13205
2	11467	11467	11467	9930	11467	11467	11465	11396	11467	11467	11467	11467
3	11101	11101	11101	9755	11101	11101	11092	11059	11101	11101	11101	11101
4	11326	11326	11326	9895	11326	11326	11323	11283	11326	11326	11326	11326
5	11423	11423	11423	9946	11423	11423	11420	11378	11423	11423	11423	11423

Output of df.head() after adding the Date column:

	lat	lng	desc	zip	title	timeStamp	twp	addr	e	Reason	Hour	Month	Day	Date
0	40.297876	-75.581294	REINDEER CT & DEAD END; NEW HANOVER; Station ...	19525.0	EMS: BACK PAINS/INJURY	2015-12-10 17:40:00	NEW HANOVER	REINDEER CT & DEAD END	1	EMS	17	12	Thu	2015-12-10
1	40.258061	-75.264680	BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...	19446.0	EMS: DIABETIC EMERGENCY	2015-12-10 17:40:00	HATFIELD TOWNSHIP	BRIAR PATH & WHITEMARSH LN	1	EMS	17	12	Thu	2015-12-10
2	40.121182	-75.351975	HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...	19401.0	Fire: GAS-ODOR/LEAK	2015-12-10 17:40:00	NORRISTOWN	HAWS AVE	1	Fire	17	12	Thu	2015-12-10
3	40.116153	-75.343513	AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...	19401.0	EMS: CARDIAC EMERGENCY	2015-12-10 17:40:01	NORRISTOWN	AIRY ST & SWEDE ST	1	EMS	17	12	Thu	2015-12-10
4	40.251492	-75.603350	CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...	NaN	EMS: DIZZINESS	2015-12-10 17:40:01	LOWER POTTSGROVE	CHERRYWOOD CT & DEAD END	1	EMS	17	12	Thu	2015-12-10

Output of dayandhour.head():

Hour	0	1	2	3	4	5	6	7	8	9	...	14	15	16	17	18	19	20	21	22	23
Day
Fri	275	235	191	175	201	194	372	598	742	752	...	932	980	1039	980	820	696	667	559	514	474
Mon	282	221	201	194	204	267	397	653	819	786	...	869	913	989	997	885	746	613	497	472	325
Sat	375	301	263	260	224	231	257	391	459	640	...	789	796	848	757	778	696	628	572	506	467
Sun	383	306	286	268	242	240	300	402	483	620	...	684	691	663	714	670	655	537	461	415	330
Thu	278	202	233	159	182	203	362	570	777	828	...	876	969	935	1013	810	698	617	553	424	354

Output of dayandmonth.head():

Month	1	2	3	4	5	6	7	8	12
Day
Fri	1970	1581	1525	1958	1730	1649	2045	1310	1065
Mon	1727	1964	1535	1598	1779	1617	1692	1511	1257
Sat	2291	1441	1266	1734	1444	1388	1695	1099	978
Sun	1960	1229	1102	1488	1424	1333	1672	1021	907
Thu	1584	1596	1900	1601	1590	2065	1646	1230	1266