Project 2 – Data & Methods

Short Description

Like Project 6, Project 2 "Data and Methods" addresses the development of methods. It focuses on methods for data treatment and preparation, as well as on techniques that improve the outcome and validity of computer simulations.


Core objectives of the project can be summarised by the following research questions:

  • How can outliers be validly identified in highly complex health-care datasets?
    If a database is distorted by outliers, even the most innovative statistical methods are bound to fail. Outliers are markedly unusual values that may arise from data and measurement errors. Whatever their cause, they distort statistics and have to be removed before statistical methods are applied. Because health-care data is highly complex and inhomogeneous, identifying outliers is very difficult and requires new technology and methods.

  • How can we standardise the population and the population data?
    This question is closely tied to the reproducibility of data. Because the research findings calculated for the decision support of dexhelpp must always be consistent, the calculations need to be based on the same underlying virtual population, which therefore needs to be standardised.

  • How can we determine whether a computer simulation model depicts the real system?
    Consider a Matchbox car as a very simple model of a real car: here it is easy to determine the model's level of detail. One can readily list the functionalities that are present (e.g. tires, windows) and those that are missing (e.g. engine). Consequently, its legitimate field of application (the area where the model behaves like the real system) can be determined easily as well. For dynamic computer simulation models that predict diseases, the situation is far more complex: in contrast to the car example, the mechanics and causal relationships of the real system are not perfectly known.

  • How can we determine simulation parameters if they are not directly measurable in the real system?
    Imagine a model in which an epidemic spreads like a water wave: the speed of the wave is doubtless a very important parameter. To simulate the model, the value of this parameter has to be determined for a specific disease, but unfortunately it cannot be measured directly in the real system. Other elements of the system, however, such as the length of the wave of disease, can be measured. Based on these measurements, the unknown parameter can be estimated by so-called calibration.
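
The calibration idea in the last question can be illustrated with a minimal sketch. It assumes a deliberately simple, hypothetical toy model in which the measurable wave length equals the unknown wave speed multiplied by a fixed infectious period, and it uses a basic Great Deluge search (one of the algorithm families this project description names) to find the speed whose simulated wave length matches the observation. The function names, the toy model and all numeric settings are assumptions made for illustration only; they are not the project's actual models or implementation.

```python
import random


def simulate_wave_length(speed, infectious_days=7.0):
    """Hypothetical toy model: distance the front travels during one infectious period."""
    return speed * infectious_days


def calibration_error(speed, observed_length):
    """Squared difference between simulated and observed wave length."""
    return (simulate_wave_length(speed) - observed_length) ** 2


def great_deluge_calibrate(observed_length, start=1.0, step=0.5,
                           iterations=5000, seed=0):
    """Estimate the wave speed with a basic Great Deluge search (minimisation form).

    A candidate is accepted whenever its error lies below the current
    "water level"; the level is lowered a little after every iteration.
    """
    rng = random.Random(seed)          # fixed seed keeps the run reproducible
    current = start
    best = current
    best_err = calibration_error(current, observed_length)
    level = best_err                   # initial water level = initial error
    decay = level / iterations         # linear lowering of the level

    for _ in range(iterations):
        # Propose a nearby speed; speeds below zero are not meaningful here.
        candidate = max(0.0, current + rng.uniform(-step, step))
        cand_err = calibration_error(candidate, observed_length)
        if cand_err <= level:
            current = candidate
            if cand_err < best_err:
                best, best_err = candidate, cand_err
        level = max(0.0, level - decay)

    return best


if __name__ == "__main__":
    # Invented observation: a wave length of 21 distance units.
    print(great_deluge_calibrate(observed_length=21.0))
```

In this toy setting the speed could simply be solved for algebraically; the search only becomes useful when the simulation is a black box that can be run but not inverted.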


To address the statistical research questions, simulation-based methods such as bootstrapping and certain sampling algorithms are applied to the health-care datasets obtained from Project 1. To deal with the simulation-related problems, several methods are tested, including the Virtual Overlay Multi-Agent System (VOMAS) as well as genetic and Great Deluge algorithms.
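
As a rough illustration of what bootstrapping means in this context, the following sketch computes a percentile bootstrap interval for the mean of a small sample. The function name, the number of resamples and the example values are assumptions chosen for illustration; the description does not state which bootstrap variant or which data fields the project uses.

```python
import random
import statistics


def bootstrap_mean_interval(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`.

    The sample is resampled with replacement, the mean of every resample is
    recorded, and the interval is read from the sorted resample means.
    """
    rng = random.Random(seed)          # fixed seed keeps the result reproducible
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n))   # one resample with replacement
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Invented example values (e.g. lengths of stay in days), not project data.
    sample = [3, 5, 4, 7, 2, 6, 4, 5, 8, 4]
    print(bootstrap_mean_interval(sample))
```

The appeal of the approach is that it needs no distributional assumption about the data, only the ability to resample it many times.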

Expected Results

Scientific publications as well as bachelor, diploma and PhD theses are the expected key results of the project. The established methods should contribute to the preparation and analysis of routine health-care data and increase the quality of simulations. Ultimately, they should lead to more reproducible and stakeholder-oriented decision support.