# Data analysis


Data analysis is the process of inspecting and summarizing data with the intent to extract useful information, make inferences, and draw conclusions. It is typically carried out with statistical or numerical software and can draw on a range of techniques, including statistics.

Note that "data analysis" takes on different aspects, and possibly different names, in different fields.

## Activities

The first activities relate to the diagram above and to the embedding of data analysis into decision-making processes.

• (Decision Making) Explain why data analysis is relevant for evidence-based decision making.
• (Use Case) Look at the diagram above and at your field of expertise. Populate the different steps with a workflow based on raw data that you have access to.
• Spatial Decision Support Systems (SDSS) bridge the domains of data analysis and decision support and deal with spatial data. Look at the domain of transportation. Create a workflow for the analysis of spatial data about transported goods together with the trucks, ships, trains, planes, ... and identify sustainable ways of transporting goods and services to a customer.
• Address the data analysis of capacity usage for trucks and trains, so that driving without cargo or with minimal use of the capacity is reduced. Identify indicators in the data analysis that address specific Sustainable Development Goals (SDGs). Identify which SDGs are addressed and how the definition of the SDGs determines the methodologies used for the data analysis. Describe the data analysis foundations that are required to measure the impact of interventions for a sustainable system of transport and delivery.
• (Machine Learning) Data can be processed with machine learning. Compare methodologies of classical statistical analysis and machine learning as one way of performing data analysis. What are similarities, differences, benefits and drawbacks between those approaches?
• (Digital Learning Environment) Consider digital learning environments and the diagram above.
• Consider a specific learning environment in your domain. How would a teacher select appropriate learning tasks tailored to the student, so that the exercise is challenging enough but not too complex? What are the indicators (required information) the teacher needs to match the exercise, or the support the teacher provides, to the specific requirements and constraints of the student/learner?
• Now we transfer that to data analysis (in this case, learner analytics). Identify data that can be collected in a digital learning environment and that could be used to support the teacher in providing tailored teaching and learning material to the student.
• Choose from your current knowledge about data analysis an appropriate methodology to analyze the collected data. Start from very basic methodologies such as
• means,
• standard deviation,
• worst case, best case,...
• ...
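
As a starting point for the basic methodologies listed above, the following minimal sketch computes the mean, standard deviation, worst case, and best case with Python's standard `statistics` module. The list `scores` is hypothetical (made-up quiz results for one learner) and only serves as an illustration; it is not data from any real learning environment.

```python
import statistics

# Hypothetical quiz scores (percent correct) for one learner, made up for illustration.
scores = [55, 70, 65, 80, 90]

mean_score = statistics.mean(scores)   # arithmetic mean
std_score = statistics.stdev(scores)   # sample standard deviation (divides by n - 1)
worst_case = min(scores)               # lowest observed score
best_case = max(scores)                # highest observed score

print(f"mean={mean_score}, std={std_score:.2f}, worst={worst_case}, best={best_case}")
```

Note that `statistics.stdev` treats the values as a sample; if the data covers the whole population of interest, `statistics.pstdev` would be the corresponding choice.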

## Wiki2Reveal Slides

The following Wiki2Reveal presentations can be used by lecturers as Open Educational Resources to support their course work in addition to standard statistical and numerical approaches to process and analyze data.

### Chapter 1 - Introduction

• (1) Identify an application scenario to which you want to apply your data analysis. Write a small summary of your project (e.g. a Bachelor's, Master's, or PhD thesis).
• (2) Describe the experimental design in which the data will be collected.
• (3) Provide one scenario,
• (3.1) in which you have a fixed time for data collection and after data collection the data analysis starts and
• (3.2) in which you get a constant input stream of data that has to be processed continuously with an appropriate methodology for dynamic reporting and dynamic data analysis in a real-time scenario.
In a Bachelor's, Master's, or PhD thesis you will mainly have scenario (3.1). In that case, extrapolating from (3.1) to a scenario (3.2) that handles a constant input stream of data for a dynamic analysis is just an exercise.
• (4) Consider swarm intelligence and compare it with the data analysis workflow in the diagram mentioned above. In a swarm, data does not arrive in a central swarm container and is not analyzed centrally. Individuals in a swarm perceive different information/data, and the swarm responds to the perceived data as a group. Identify analogies and differences in data analysis on a qualitative level.
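
The difference between scenarios (3.1) and (3.2) can be illustrated with a small sketch. It assumes a list of hypothetical sensor readings (`readings`) and a helper class `RunningStats`, both introduced here only for illustration. In the batch case all data is available before the analysis starts; in the streaming case the mean and variance are updated one value at a time using Welford's method, which is one possible choice among several.

```python
import statistics


class RunningStats:
    """Running mean and sample variance, updated one value at a time (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; defined as 0.0 here while fewer than two values were seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# Hypothetical sensor readings, made up for illustration.
readings = [12.0, 15.0, 11.0, 14.0, 13.0]

# Scenario (3.1): data collection is finished, then the analysis runs once.
batch_mean = statistics.mean(readings)
batch_var = statistics.variance(readings)

# Scenario (3.2): values arrive one by one; a report could be produced after each update.
stream = RunningStats()
for value in readings:
    stream.update(value)

print(batch_mean, batch_var)
print(stream.mean, stream.variance)
```

After the last value has arrived, both approaches agree up to floating-point rounding, but the streaming version never needs to hold the whole data set in memory.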

### Chapter 2 - Data Clean Up - Processing of Raw Data

This section addresses the preprocessing of data.
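
What preprocessing of raw data can involve is sketched below in plain Python. The rows in `raw_rows`, the field names `id` and `weight_kg`, and the function `clean` are all hypothetical and chosen only for illustration. The sketch strips whitespace, drops rows whose value is missing or not numeric, removes duplicate identifiers, and converts text to numbers. Which of these steps are appropriate depends on the data set and the experimental design.

```python
# Hypothetical raw rows as they might come from a CSV export, made up for illustration.
raw_rows = [
    {"id": "1", "weight_kg": " 12.5 "},  # valid, but with surrounding whitespace
    {"id": "2", "weight_kg": ""},        # missing value
    {"id": "1", "weight_kg": "12.5"},    # duplicate id
    {"id": "3", "weight_kg": "n/a"},     # not numeric
    {"id": "4", "weight_kg": "7"},       # valid
]


def clean(rows):
    """Return rows with numeric weights, skipping invalid values and repeated ids."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        text = row["weight_kg"].strip()
        try:
            weight = float(text)
        except ValueError:
            continue  # missing or non-numeric value: skip the row
        if row["id"] in seen_ids:
            continue  # keep only the first row for each id
        seen_ids.add(row["id"])
        cleaned.append({"id": int(row["id"]), "weight_kg": weight})
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```

Dropping invalid rows is only one option; depending on the analysis, missing values could instead be flagged or imputed.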