Univ. Belgrade Prof. L.Č. Popović, Prof. A. Kovačević, Prof. D. Ilić | Big Data in Space Science (S3, elective, 6 ECTS) |
Learning Outcomes: | This course addresses the following learning priorities: Communication, Critical Thinking, Information Literacy, Self-Directed Learning, and Technology Use. Upon completion of this course, students will be able to apply tools and techniques for processing large data sets in their own research areas, as well as in eventual applications in the space industry. |
Knowledge and Understanding: | This course introduces students to methods for extracting the most information from massive data sets using modern machine learning techniques. The core of the course is based on astronomical data, which is particularly rich for exploration with machine learning. However, we will draw on as broad a range of data sets from physics (and perhaps industry) as possible, and the course should be useful to all students interested in data analysis in the physical sciences and industry. |
Applying Knowledge and Understanding: | Every day, large amounts of data are collected for cosmic research using ground-based and space-based telescopes, as well as by missions that observe Earth from space (e.g., the Copernicus programme of satellites). Earth observation data from satellites serve a wide range of human activities, from sociological (migration monitoring), biological, industrial, and telecommunication applications to the study of climate change. Students will know what types of data can be obtained from space research. They will know how to apply exploratory data analysis and generate statistics for unfamiliar data sets. They will demonstrate fluency in coding with Python in the Jupyter Notebook/Lab environment. They will be able to apply several machine learning techniques to data sets within their own research areas. |
Prerequisites | Experience with Anaconda Python, SciPy, NumPy, and GitHub is helpful but not required. |
Program | Introduction: Methods and techniques of collecting data in astronomy using telescopes and satellites. Methods of collecting satellite data for Earth observation. The aims of these observations and their use in research and practical applications. Introduction to large databases and their organization. Platforms for large databases and storage of large data. Big data in space science. Large Data Providers in Space Science: Large sky surveys (LSST, E-ELT). Database mining with SQL and Python, introduction to the Flexible Image Transport System (FITS), computing the mean and median of FITS images, efficient comparison of data from different databases (cross-matching), displaying large data from Earth-observing satellites: visualisation of large data on a map. Dimensionality Reduction: PCA, kernel PCA, PCA as a noise filter, introduction to scikit-learn, hyperparameters and model validation, best model selection, categorical and image features, imputation of missing data, Bayesian classification, regression, classification and clustering, application of Python in machine learning. Data mining algorithms, training models, Support Vector Machines applied to recognizing parts of complex images, decision trees and random forests with applications, kernel density estimation applied to recognizing parts of complex images, a completed machine learning project in space science. |
Description of how the course is conducted | Classes will be divided roughly evenly between lectures and hands-on practice. Students will work through code examples during the lectures. Students who contribute real-time solutions will improve their class participation score. Programming and reading assignments will be set as homework to help synthesize the topics covered in lectures. Students will research and carry out a project individually or in groups of three. Each group will write a paper of at least 6 pages (not counting figures, which are strongly suggested, and references, which are required) and give a 20-minute presentation to the class. Presenting in the form of a Jupyter notebook is strongly recommended. Groups should clearly delineate the role of each student in the project. |
Description of the didactic methods | Please refer to the point above. |
Description of the evaluation methods | The final exam consists of an oral and practical defense of the term project, which is defined at the beginning of the semester. |
Adopted Textbooks | Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2017, O’Reilly Media, pp. 1-574; Jake VanderPlas, Python Data Science Handbook, 2017, O’Reilly Media. |
Recommended readings | Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data by Ivezic, Connolly, VanderPlas, and Gray (ISBN: 9780691151687). Students will also be required to gain practical experience with broad types of data sets using the software learning environment at datacamp.com, which can also serve as a refresher for Python programming. Supplemental materials include a more advanced book: The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman. |