I: Introduction

Starting in the Spring of 2021 all the way until the present, I have worked as an undergraduate research assistant in the Hegarty Spatial Thinking Lab headed by Professor Hegarty of UCSB’s psychology department. My main tasks were to assist various experiments with their data processing, cleaning, and visualizations. The majority of my work has been focused on assisting an honors student named Nikita Gupta who graduated in 2021 after completing her honor’s thesis for the psychology department. The main goal of her project was to test the spatial ability and working memory of STEM learners by having them perform various tasks such as the cube comparison and paper folding tasks. The entirety of her thesis can be found here.


II: Data

My first task involved cleaning the data collected by Nickita so that I would later be able to score the tasks performed by each of the participants in her experiments. Processing the data was a little rough as it was not collected in a very clean way. However, after some time, I was able to format the data in a way that I could analyze it easily in R. After separating the participants into “novices” and “naives” which represents STEM learners and non-STEM learners respectively, I was able to start scoring the tasks. The first image below shows the first ten participants’ scores for the cube comparison task which had a penalization of 1 point for every incorrect answer. The second image shows the first ten participants’ scores for the paper folding task which had a penalization of 0.25 points for every incorrect answer.



After obtaining the scores for both the tasks, I was asked to calculate an overall spatial score from the scores of the two tasks and compare the average spatial score between novies and naives. In order to do this, I first normalized the scores by calculating the z-scores for all the participants for both tasks. Then, a spatial score was created by averaging the z-scores for both the tasks. I then created a column that differentiates between novice and naive and looked at the average spatial score for each group. The first image below shows the first ten participants’ spatial scores along with their z-scores for each task. The second image shows that novices (STEM learners), had a higher average spatial score in comparison to naives (non-STEM learners).



After completing this, I passed the results on to Nickita who was now able to continue with her writeup for her thesis project.


III: Visualization

In the following weeks, I was asked to do some analysis of the response times for another aspect of Nickita’s experiment known as the rotation task which had participants look at molecular structures and how much it rotated. More information about this task can be read in her paper. I began by calculating the percentage of “no response” questions and the percentage of incorrect answers for each participant. I was told to remove all participants who had an incorrect rate of over 50% which could skew the data as they were probably just guessing their way through the task. I then created a histogram that displays the general response times for the rotation tasks. Histogram shown below.


I then performed further analysis by filtering to only correct answers to see the response times for questions that were answered correctly. As seen below, the distribution looks similar to that of the general response time histogram.



Next, I wanted to see the mean response times based on the type of rotation task that the participant was shown. First image shows the general mean response times, while the next two show the mean response times specific to whether or not they are a STEM learner or not.




As seen above, Single Swap rotations tend to have the highest average response time. In addition, novices seem to have a higher average response time in comparison to naives for most types of rotation. This could be explained by the fact that naives were truly guessing their way through the tasks while the STEM-learners who took organic chemistry were truly trying to figure out the correct answer.

I then ran a mixed linear regression in order to see all the variables, along with their interaction terms that were statistically significant in determining response time. As seen below, the four interaction terms that were significant in determining response time were detection:rotation, detection:expertise, rotation:expertise, and detectSwap:expertise.


One thing that I was pretty surprise about was the fact that expertise itself was not statistically significant in determining response time even though I graphically showed that novices have a higher average response time in comparison to naives.


IV: Update: March 22, 2022

During Winter quarter of my fourth year at UCSB, I was tasked with doing data analysis for another honors thesis student in the psychology department, Mable Zhau. The topic of her project surrounded way-finding and spatial thinking given a learned route. In short, her experiment included teaching the participant a “learned route” around family-student housing at UCSB while pointing out several landmarks that the participant should remember such as a statue or a sign. Next, she had the participant go through 10 different trials which required them to walk from one landmark to the next one stated by Mable. For example, she would tell them to walk from the statue to the fire hydrant and the gps attached to the participant would track their coordinates to see the route in which they took.

My first task was read in all 63 gpx files for each participant into a python notebook which was relatively difficult as the data was hard to extract from the gpx files. It was tedious to get the data into a clean pandas-like format. After reading in each individual file and cleaning it, I was able to concatenate them into a single data frame that contained a row for each coordinate that the gps recorded for each trial for each participant. In total, the dataset contained over 3000 coordinates/rows. An example of the data frame can be seen below for the first participant’s first trial.



As seen above, for this specific participant, 14 coordinates were recorded by the gps machine for the first trial of the experiment. Each row contains the exact time, latitude, longitude, altitude, participant number, and trial number. After this was completed, I was asked to calculate the total distance traveled along with the total time elapsed for each trial for each participant. An example of this data frame for the first participant can be seen below.



Each row contains the participant number, trial number, total time elapsed, and total distance traveled in meters. Distance was calculated using python’s geopy package which allows us to calculate distance between coordinates. Calculating total distance and time gave Mable a better understanding of how each participant traveled from landmark to landmark.

My next job was to plot each participant’s trajectory for each trial. I went about this by using matplotlib and plotting each trial in its own subplot using standard coordinate scales for each trial. An example for participant 3 is shown below.



I was then asked to do the same thing but plot all the trials for a single participant on one map, separating each trial by color, while also including points for the landmarks. In addition, I was asked to plot the learned route over the map. Example for participant 3:



The last task for plotting that I was presented with was to plot each trial for each participant over an image raster that corresponded to each trial which Mable provided to me. One image was created for each trial for each participant so in total there are 630 plots. An example of trial 1 for participant 3 can be seen below.



Each image specifies the trial number as well as the specific landmarks traveled to and from. The learned route and shortcut are labeled while the actual route taken is plotted over them.

Finally, I was tasked with calculating shortcut and learned route efficiency by taking the distance traveled the participant for each trial and dividing it by shortcut distance and then learned route distance. Obviously, this would allow us to see the efficiency that each participant traveled for each trial. For example, if participant 3’s efficiency for the learned route is 2.5, that means this participant traveled 2.5 times the distance of the learned route which is the route that they were originally taught to take. On the other hand if the shortcut efficiency is 2.5, that means the participant traveled 2.5 times the shortcut distance (route that they were not taught). An example of participant 3’s efficiency for all 10 trials shown below.



Overall this project has been interesting to me as I get to see how different individuals think spatially and how we all navigate differently depending on personality traits or spatial ability. For example, people with lower spatial ability may stick to the learned route that they were taught instead of attempting to take a shortcut. In terms of personality traits, someone who is more risk averse may decide not to risk taking a shortcut and follow the learned route instead. I am interested to see the final report of this thesis and I will update this page with a link once done.