Hi! I'm writing my statement of purpose for my application to the Data Science Master's program at EPFL. I am not a native English speaker, and I have tried my best to polish the statement, but I think I still need some help with the language and content. Any feedback or help would be greatly appreciated!
Here is the requirement from the University:
The statement of purpose should not exceed 1000 words. You are required to describe your academic background and your career strategy. Please be precise about the objectives you wish to reach through your studies at EPFL.
Some background that may help: I am an undergraduate majoring in Computer Science with a GPA of 91/100, and I have some research experience in medical data and image processing, as well as some publications.
Below is my statement of purpose:
My desire to study data science stems from my experience in medical image processing and analysis. Discovering patterns in medical data streams plays a vital role in healthcare systems. I would like to deepen my knowledge of data science in this program and work to transform medicine with data-enabled healthcare systems.
Inspired by a lecture I attended when I had just entered university, I joined the CAD Research Center in my sophomore year and have worked with Dr. XXX for the past two years. In the lecture, a professor contrasted a raw computed tomography (CT) image of a lung with a processed one. The clarity after processing impressed me: even a layman like me could easily tell where the lesion was. I realized the important role that medical data, such as medical images, plays in the diagnosis of diseases. This motivated me to study medical data using deep learning methods over the past two years.
Although the field was unfamiliar to me, my desire to explore medical imaging drove me to learn the basics of deep learning and data processing. I then took charge of a National Undergraduate Innovation Project on the automatic diagnosis of diabetic retinopathy, segmenting the blindness-causing hard exudates in fundus images. Because labeling the images requires expert knowledge, the dataset we could access was very small, with fewer than 100 images. In the first stage, we used a sliding-window method to generate patches from the original images and fed them to a CNN. Despite the high accuracy, the time and computational cost stymied real-world application. Hence, we switched to a U-Net backbone, which is more effective for small datasets. I optimized the simple linear iterative clustering (SLIC) algorithm to pre-segment the images and extracted patches based on the pre-segmentation to expand the training set. After careful experiments with these methods, I eventually obtained a satisfying result: an accuracy of 97.95% and a sensitivity of 96.38%. I was greatly encouraged when our work was published in IEEE Access with me as the first author, and my problem-solving earned praise from the professors.
Using the proposed network, I am now leading a team of three in a collaboration with a local hospital to develop an automated system in which doctors can input the original images collected from the fundus camera and receive a prediction of hard exudates and a reference for the diabetic level. However, we found that the model only works well within the same dataset. When the training set and the test input look visibly different, which is quite common and may be due to different cameras or lighting conditions, the prediction accuracy is low. This led me to think about how models generalize across different data distributions, since it is impractical to train a model on every kind of dataset. To strengthen the generalization ability of the deep model, we need an approach that addresses the domain discrepancy from a statistical point of view. We are now researching domain adaptation methods based on the data divergence between images, and I really enjoy this meaningful work because it serves as a bridge between research and application.
Applying machine learning and deep learning methods to multi-modal data for better clinical decisions is another direction of my research. I am currently working on predicting the progression of age-related macular degeneration (AMD) with multi-modal data. By combining OCT scans with genetic and demographic data, more accurate predictions can be made, so that appropriate intervention and timely treatment can be offered to patients to prevent permanent visual damage. I hope to continue studying this field and to unlock the potential of medical data in its different forms.
Building on my knowledge of deep learning and data processing, I worked remotely with the autonomous driving group at the University of California, Berkeley on the INTERPRET Challenge for trajectory prediction. Together with the group, I removed unnecessary features and rendered the data as images frame by frame. I then used ResNet50 to extract features and a Variational Auto-Encoder to predict the future trajectories of the targeted vehicles. In the end, I ranked in the top 5 in the first stage of the challenge. This was a valuable experience in which I learned how to analyze complicated raw data and extract useful information. I also learned how to collaborate with others more efficiently by working with our passionate team.
The difficulties I overcame and the knowledge I gained during my research have further strengthened my interest in data science and its application to healthcare. Yet I know that I need more systematic study to support my future research if I want to go deeper. This Data Science program dovetails perfectly with my ambitions. The curriculum is attractive not only because it provides core courses that lay a foundation in mathematics, but also because it offers the freedom to choose optional courses such as Computer Vision and Deep Learning. In addition, the master project and internship would give me a chance to apply what I have learned in industry. I wish to be a researcher who is capable of collaborating with industry and clinicians to develop practical machine learning algorithms that benefit patients, rather than leaving the methodologies in theory.
Moreover, I have long wanted to study at EPFL, because the SLIC superpixel algorithm I used in my research, the algorithm that first sparked my interest in computer graphics, was proposed by Dr. Radhakrishna Achanta et al. at EPFL. I have also been following the CVLAB at EPFL for a long time. Their research in deep domain adaptation and biomedical imaging is very inspiring to me, and I look forward to joining their group in the future.
My career goal is to remain in academia, first as a Ph.D. student, so that I can pursue my research interests while contributing to healthcare through data science technologies. I believe this program matches my passion and would contribute to my ultimate goal: transforming healthcare with data science. I sincerely hope to be admitted after your full consideration, and I believe that I am well-positioned for this pursuit.
Thanks in advance for any help and feedback!