Data Analysis and Machine Learning

"Listening to the data is important… but so is experience and intuition. After all, what is intuition at its best but large amounts of data of all kinds filtered through a human brain rather than a math model?" – Steve Lohr, New York Times

In the spring semester of my sophomore year at Boston College (before I began studying Computer Science) I took an Information Systems course with Prof. Ransbotham called Analytics and Business Intelligence. Slightly panicked because I still didn't know what I wanted to study, I found myself the odd one out among a number of business and MBA students. As I began to find my niche within the class as the "this is going to change everything" apocalyptist, it became increasingly clear to me that taking this course would wind up defining the rest of my studies.

Although in this course we focused on the managerial implications of data mining and the business-related outcomes of data-driven decision making, I was intrigued by the possibilities. We ended up doing some hands-on coursework, such as this project in which we analyzed a subset of the Yelp Academic dataset using R and Tableau, but I craved more technical work in this field. I latched onto Prof. Ransbotham and stayed close with him as I started down the Computer Science path, disregarding the future demise of my GPA.

The rest is history. Although I'd always had an affinity for understanding machines, ripping electronics apart and even jailbreaking others' iPhones in high school for extra cash, studying CS gave me a new appreciation for how impressive the devices and applications we use each day are, how incredible the technical infrastructure underpinning our society already is, and what the future has in store given how much data everyone and everything produces each day.

Piggybacking off my related interest in Virtual Reality (VR), I decided to build a project for Big Data Research Day at Boston College that would help explain how sensors in smartphones and VR headsets make this technology possible. While most projects at Big Data Research Day treated "big data" as a reference to the volume of data being analyzed, this project looked at the speed at which data needs to be processed to maintain a suitable frame rate and avoid inducing motion sickness.

More recently, I had the opportunity to take Machine Learning (CSCI 3345 Fall '16) with Prof. Alvarez, where I was able to take my understanding of the theory, methods and applications of data science to the next level. We covered topics ranging from generalization bounds and error measures to gradient descent, radial basis functions and deep learning. The culmination of my coursework in ML was a project that studied the effectiveness of Recurrent Neural Networks for text generation compared to a simpler Markov Model, using 600 episodes of The Simpsons as training data.
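For readers unfamiliar with the Markov baseline, a word-level, first-order Markov model can be sketched in a few lines. This is only a minimal illustration under my own assumptions, not the code from the project: the function names `build_chain` and `generate` and the tiny sample text are placeholders, and the actual project may have used a different order or tokenization.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each word to the list of words observed directly after it."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, rng):
    """Walk the chain from `start`, stopping early at a word with no successor."""
    out = [start]
    while len(out) < length and chain.get(out[-1]):
        # Successors are stored with repeats, so frequent pairs are sampled more often.
        out.append(rng.choice(chain[out[-1]]))
    return out


# Placeholder text, not taken from any real training data.
sample = "the cat sat on the mat and the cat ran".split()
chain = build_chain(sample)
print(" ".join(generate(chain, "the", 8, random.Random(0))))
```

The model only ever looks one word back, which is the limitation a recurrent network is meant to overcome by carrying state across a longer context.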

While studying the intricacies of different types of learning models and the trade-offs between them, I was working on a semester-long project for Algorithms (CSCI 3383). The project was simple: given an image, a dataset of images and an integer k, find the k images in the dataset most similar to the given image. We were allowed to assume that all of the images were binary matrices (just 1s and 0s), and we were to be graded on a trade-off between speed and accuracy, with speed being most important. Given that my mind was in ML mode, our group ended up taking an approach that synthesized techniques from Computer Vision, Machine Learning and Algorithmic Analysis; it can be found here.
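To make the problem statement concrete, here is a brute-force baseline, assuming similarity is measured by Hamming distance (the number of pixels that differ). This is not our group's approach, and the assignment did not necessarily define similarity this way; the names `hamming_distance` and `k_most_similar` and the 2x2 example images are made up for illustration.

```python
import heapq


def hamming_distance(a, b):
    """Count the pixels at which two equally sized binary images differ."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def k_most_similar(query, dataset, k):
    """Return the indices of the k dataset images closest to `query`."""
    return heapq.nsmallest(
        k,
        range(len(dataset)),
        key=lambda i: hamming_distance(query, dataset[i]),
    )


# Illustrative 2x2 binary images.
query = [[1, 0], [0, 1]]
dataset = [
    [[1, 0], [0, 1]],  # identical to the query
    [[0, 1], [1, 0]],  # every pixel flipped
    [[1, 0], [0, 0]],  # one pixel differs
    [[1, 1], [1, 1]],  # two pixels differ
]
print(k_most_similar(query, dataset, 2))  # → [0, 2]
```

This baseline compares the query against every pixel of every image, so it is exact but slow. When speed matters more than accuracy, that is the cost a faster, approximate method has to beat.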

My relationship with data science and analytics predated my study of Computer Science and even pushed me toward it. But even with years of academic and practical experience using and implementing different machine learning models, the gap between what I know and what I seek to understand in this field continues to grow.