Hi all, it's great to find this place. I'm a computer science student with research experience mainly in machine learning, and I am applying to data science graduate programs at a few US universities. Below is my SOP for one program. I would appreciate it if you could point out any grammar errors or offer any comments. Thank you all in advance!
=====================================================================================
What, exactly, is a hacker? For me, the word "hacker" refers more to people who constantly improve themselves and find good solutions out of intellectual curiosity, and less to people who break into others' computers. In the age of data explosion, it's a thrill for hackers to dig into massive data, discovering secrets and solving problems. Therefore, a master's degree in data science is the logical culmination of my passion for tackling real-world challenges through data science.
The data science and management course I attended at XXX University sparked my strong interest in solving problems with big data. The program introduced me to fundamental data science knowledge, including database system design, social network analysis, and statistical tools. I used R to analyze pair-panel plots and built linear regression models to predict students' GPAs from their GRE scores. One project that I found particularly interesting looked at data from social media users to find the relationships between followers and the people they follow. By identifying an influencer in a health care forum with PageRank and centrality measures, companies could deliver health care information and related promotions to certain social circles more efficiently.
After returning to college from my visit to XXX, I kept thinking of ways to analyze social network data and make the most of it. Beyond health care, I believe that this technology could benefit many other fields, such as business and political elections. So I started another research project at the XXX lab, where I used CUDA parallel programming to accelerate network centrality calculations. I improved efficiency by parallelizing the centrality calculations: on a directed graph with more than 1,000 nodes and edges, the total computation time fell from 192 ms to 2 ms. The power of GPU computation makes massive data processing possible, which can be used to achieve real-time network centrality calculations. In this way, a politician could identify opinion leaders, as well as their attitudes toward politics, and adjust her election strategies accordingly.
If computational hardware and data are paper and pigments, then machine learning techniques are the fine brushes that create beautiful artworks. As a research intern in the XXX lab in 2016, I trained a classifier to identify obstacles on a road with the deep learning framework DIGITS, a technique useful for unmanned vehicles. My team labeled the pictures to be tested and then trained AlexNet to identify them. However, due to a number of configuration mistakes, the system's performance was not ideal. After reading papers on deep learning network structures, I enlarged the dataset and adjusted the learning rate and batch size, and with these adjustments I raised the accuracy of the model to over 80%. A good understanding of machine learning leads to better use of data.
Currently, I'm working on a project about professional basketball that takes advantage of newly available player tracking data. Traditional methods of tactical analysis rely largely on the knowledge and manual labor of domain experts. As a huge basketball fan, I have wondered why I cannot identify the frequently appearing patterns in professional basketball matches that describe teams' tactics and behaviors without manually labeling all offensive tactics. I plan to generate features that differentiate plays from one another, then use clustering techniques to extract frequently appearing player trajectories, and thereby help teams better understand how both they and their opponents play. Even though I'm only an amateur basketball fan, I believe that, by using the power of machine learning, I can discover the tactics of professional plays as domain experts do, but faster and more creatively.
The advent of big data presents both opportunities and challenges for hackers, and only those willing to educate themselves and to use cutting-edge technology will benefit from it. My experience in machine learning and high-performance computation will allow me to approach these developments with creative and rigorous thinking. Your program would be valuable to me in several ways. First, through the statistical modules, I can deepen my understanding of statistical models and hone my skills in selecting the right strategies for different problems. Further, the project-focused courses provide chances to use computational techniques to work through real-world challenges. Ultimately, ***'s core value of promoting the betterment of society and furthering our understanding of the nature of the universe will guide me throughout my lifelong career. I believe all these experiences will help me become a real hacker, skillfully solving problems with real-world implications, as I have always wanted to be.
=====================================================================================