Thoughts about my MS Data Science Statement of Purpose
In a rapidly developing world with an ever-growing volume of data and increasing computational efficiency, data science appears to be a key to solving the complex issues the world is facing and to improving our society further. I discovered my love for data science through academic training, when I saw how I could apply the knowledge I gained in class to improve modern technology. My goal is to deploy more advanced data-driven products and services that benefit as many people as possible. Developing models that can process varied and inconsistent data is a challenge I am determined to undertake. I believe advancements in data science will revolutionize how data is used across many industries and how it can be used to mitigate risk in each of them.
My passion for mathematics started in high school. I always asked myself, "Why do we need to learn about derivatives and integrals?" Driven by this curiosity, I decided to major in Applied Mathematics to find the answer. In abstract courses such as 'Numerical Analysis' and 'Linear Algebra', I found satisfying answers and applications by persistently asking professors and classmates about the points I had misunderstood until I fully comprehended them. In addition, I enthusiastically helped classmates who had the same questions by explaining, step by step, what they had misunderstood.
As a result, I built a more rigorous understanding of mathematics, including how Newton's method quickly approximates optimal solutions for different models, how matrix factorization estimates hidden or latent features, and how the Separating Hyperplane Theorem is applied in Support Vector Machines. As I found more applications and grasped the abstract concepts, I was able to translate my mathematical knowledge into machine learning theory, both to optimize models and to communicate about them effectively.
Majoring in Applied Mathematics at the University of California, Berkeley, gave me extensive exposure to statistics, and I immediately fell in love with the field because its applications were more tangible and I could combine the two disciplines to work hands-on with data. Through advanced, project-based statistics coursework, I became proficient in Python and R and produced statistically sound projects. In my first group programming project, I practiced collaborative programming and project management through GitHub. I was responsible for building a stratified random permutation test, which used hypothesis testing to verify computational answers to scientific questions, and for writing test functions in Python that reached 99% code coverage. Throughout the project, I improved my quantitative storytelling and came to understand how important it is to produce consistent code and reproducible results. I also learned to design a robust statistical analysis efficiently within a team, from data collection and data validation with statistical tests to Poisson regression modeling.
After graduating, I enrolled in a Data Science Certificate program to keep solving real-world, data-driven problems and to keep up with the latest trends in data science and machine learning. Through the program, I was able to consolidate what I had learned and build end-to-end machine learning projects on my own. While developing the Artists Recommendation System, I came to understand how strongly customer churn affects a company's revenue and how important it is to retain as many users as possible to boost profit. By gathering additional artist information through the Spotify API and studying how Netflix implemented its own recommendation system using matrix factorization, I was able to determine an effective way to retain users on digital streaming platforms, as well as the characteristics that would make a recommendation system more likely to stand out.
Moreover, to gain a basic understanding of how deep learning is applied in computer vision and to put more advanced machine learning into practice, I implemented a Pokédex that classifies images of Pokémon. First, I experimented with summarizing each image using the Histogram of Oriented Gradients and then applied a Support Vector Classifier to the summarized images. However, the results were unsatisfactory because of the limited data. I therefore researched Convolutional Neural Networks and learned to implement one with Keras. By applying image augmentation to the limited data with Keras and training the CNN model on a cloud computing system, I was able to boost the model's accuracy and reduce its high false-positive rate. These projects taught me not only how to answer business-related problems effectively but also how to keep researching until the results improve significantly.
Apart from my academic and research endeavors, I believe that conveying knowledge clearly to non-technical people is as important as the research behind it. Because data scientists work with various departments and agencies, I practice explaining highly technical concepts in detail, yet in a way that is approachable to non-technical peers, by publishing presentations with descriptive statistics and clear visualizations. During my graduate studies, I want to gain more publishing experience through collaboration with different departments and students.
I aim to deploy many different types of machine learning models, not only to solve business-related problems for companies but also to help as many people as possible live free from the fear of crime in a safer environment. Since security cameras tend to have poor resolution, it is very challenging to build a case against suspects based on the footage. I want to leverage machine learning models to help solve open criminal cases, improve scientific investigations, and offer comfort to bereaved families. To achieve this goal, my next long-term project is to use deep learning to enhance low-resolution videos so that their content is clearly visible. To sustain this long-term research, I need more research experience so that I can deeply understand image and video processing and efficiently use the practical tools required to work with data at scale.
With the skills described above, I am confident that I will continue to achieve academic success as a graduate student at the University of California, Berkeley. The program's intensive data science and machine learning courses will reinforce my knowledge of data science and broaden my perspective on data. Moreover, I will rigorously integrate the skills I learn in the Master of Information and Data Science program offered by the University of California, Berkeley, to realize my dream.