Syllabus database for doctoral courses


SYLLABI FOR DOCTORAL COURSES

Swedish title En introduktion till datavetenskap för registerbaserad forskning
English title An Introduction to Data Science for Register-based Research
Course number 5591
Credits 1.5
Responsible KI department Institutet för miljömedicin
Specific entry requirements Course 5401 “Key Concepts and Principles for Design and Critical Interpretation of Register-based Studies”
Grading Passed/Not passed
Established by The Committee for Doctoral Education
Established 2022-05-20
Purpose of the course The purpose of this course is to introduce, in the context of register-based research, good data practice (node 1) and visualization of data and results (node 2), and to give an overview of complex analytical tasks, focusing on simulation and imputation (node 3).
Intended learning outcomes After successfully completing this course, the student is expected to be able to:

Good data practice (node 1):
• Describe the major steps in a data management plan, based on the FAIR data principles.
• Explain the major steps on how to document and submit a data set for storage at a data repository.
• Outline how to find and access data stored at a data repository, including getting all necessary permissions.

Visualization of data and results (node 2):
• Describe the principles of data presentation and how to summarise complex information.
• Based on a specific research question and a specified set of variables, suggest how to visually describe data, overall results, and heterogeneity.

An overview of complex analytical tasks, simulation, and imputation (node 3):
• Suggest how simulations can be used, argue how to approach data problems and detect change in results, and suggest different applications.
• Describe basic concepts in relation to missing data. Outline advantages and limitations of multiple imputation compared with common alternatives, including complete case analysis, a missing data indicator, and single imputation.


Contents of the course This course focuses on applications in register-based research. It is divided into three aligned nodes: good data practice (node 1), visualization of data and results (node 2), and complex analytical tasks, focusing on simulation and imputation (node 3). The course will review and discuss how to apply different systems, processes, approaches, and methods to extract insights and information from register data. It will enable students to complement their current knowledge of statistical analysis and programming with tools commonly used in the field of big data and data science. Specifically, students will learn about processes that optimise the reuse of study data (node 1), ways to visualize multivariate data (node 2), and how to use simulation methods in applied research and as a technique for dealing with missing data (node 3). The lectures will introduce these topics and give students a foundation for further developing their understanding of the three nodes.

Node 1 will discuss the FAIR principles, and more specifically how data from register-based studies can be made findable, accessible, interoperable and reusable, and how these principles can be used when formulating a data management plan. An overview of national and international data repositories will be given, including storage and retrieval of research data from such repositories.

Node 2 will start with an overview of commonly used approaches for visualization, suitable for presentation of complex data, overall results and heterogeneity of results. Several empirical examples will be given. The link between the research questions and appropriate visualization techniques will be discussed.

Node 3 will provide an overview of more complex analytical tasks within register-based research, with a particular focus on simulation and imputation. Different types of simulation approaches will be discussed together with their basic principles and illustrated with empirical examples. Different types of missing data will be covered (missing completely at random, missing at random, and missing not at random) as a basis for discussing how the assumed missingness mechanism affects analytical decisions. The focus will then be on multiple imputation. Advantages and limitations of multiple imputation compared with common alternatives, including complete case analysis, a missing data indicator, and single imputation, will be outlined.
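
As an illustration of the node 3 topics, the following is a minimal simulation sketch, not course material. It assumes a hypothetical standardised covariate "age" that is always observed and an outcome that depends on it, and it deletes outcome values under each of the three missingness mechanisms. It then compares a complete case mean with a simple regression-based adjustment that uses the observed covariate. The adjustment is a stand-in chosen for brevity and is not multiple imputation. All variable names, the 30% missingness rate, and the logistic missingness models are assumptions made for this example.

```python
import math
import random
from statistics import fmean


def missing_probability(mechanism, age, outcome):
    """Probability that the outcome is missing (illustrative models only)."""
    if mechanism == "MCAR":
        return 0.3                               # unrelated to any variable
    if mechanism == "MAR":
        return 1 / (1 + math.exp(-age))          # depends only on the observed covariate
    if mechanism == "MNAR":
        return 1 / (1 + math.exp(-outcome))      # depends on the value that goes missing
    raise ValueError(f"unknown mechanism: {mechanism}")


def simulate_bias(mechanism, n=50_000, seed=2022):
    """Return (complete case bias, covariate-adjusted bias) for the outcome mean."""
    rng = random.Random(seed)
    ages, outcomes, observed = [], [], []
    for _ in range(n):
        age = rng.gauss(0, 1)                    # hypothetical covariate, always observed
        outcome = age + rng.gauss(0, 1)          # outcome with true mean 0
        p_missing = missing_probability(mechanism, age, outcome)
        ages.append(age)
        outcomes.append(outcome)
        observed.append(rng.random() >= p_missing)

    full_mean = fmean(outcomes)                  # benchmark, unavailable in real data

    # Complete case analysis: use only records with an observed outcome.
    cc_ages = [a for a, seen in zip(ages, observed) if seen]
    cc_outcomes = [y for y, seen in zip(outcomes, observed) if seen]
    cc_mean = fmean(cc_outcomes)

    # Least-squares line fitted on complete cases, then averaged over everyone's age.
    mean_age = fmean(cc_ages)
    sxy = sum((a - mean_age) * (y - cc_mean) for a, y in zip(cc_ages, cc_outcomes))
    sxx = sum((a - mean_age) ** 2 for a in cc_ages)
    slope = sxy / sxx
    intercept = cc_mean - slope * mean_age
    adjusted_mean = fmean(intercept + slope * a for a in ages)

    return cc_mean - full_mean, adjusted_mean - full_mean


results = {m: simulate_bias(m) for m in ("MCAR", "MAR", "MNAR")}
for mechanism, (cc_bias, adjusted_bias) in results.items():
    print(f"{mechanism}: complete case bias {cc_bias:+.3f}, adjusted bias {adjusted_bias:+.3f}")
```

In this setup the complete case mean should be close to the benchmark only under missing completely at random. Using the observed covariate should remove most of the bias under missing at random, but not under missing not at random, which is why the assumed mechanism matters for the choice of analysis.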
Teaching and learning activities Lectures, demonstrations, and various forms of group and individual assignments on different topics. The course focuses on active learning, i.e. putting knowledge into practice and critically reflecting upon the knowledge, rather than memorising facts.
Compulsory elements The individual, summative examination.
Examination To pass the course, the student must show that the learning outcomes have been achieved. The assessment methods used are group tasks (formative assessment) along with a written individual task (summative assessment). The examination is viewed as a contribution to the development of knowledge, rather than as a test of knowledge. Students who do not obtain a passing grade in the first examination will be offered a second examination within two months of the final day of the course. Students who do not obtain a passing grade in the first two examinations will be given top priority for admission the next time the course is offered.
Literature and other teaching material There is no mandatory literature. Handouts, links to relevant resources and suggested scientific articles will be distributed during the course.
Course responsible Anita Berglund
Institutet för miljömedicin
Anita.Berglund@ki.se

Contact person Johanna Bergman
Institutet för miljömedicin
johanna.bergman@ki.se
Nobels väg 13
17177 Stockholm