The following discussion is intended for non-experts in data science and explains the meaning behind the crystal ball shown on the Home page.
The image of the crystal ball is meant to assist you in answering the
question: "What sets data science apart?" The following discussion
aims to provide an informal response to this question.
Throughout human history, forecasting the future has been a subject of
great fascination because it promises a way to prepare for various
misfortunes and disasters. About two thousand years ago, the use of
crystal balls was introduced, adding an element of showmanship to the
practice of fortune-telling, as it was believed to enable a connection
with higher powers. The idea was to "see" inside the crystal ball the answers to questions about the future.
Even if one lacks the power to alter the future, simply possessing
knowledge of future events carries great value. This is exemplified by
the anticipation of forthcoming lottery number draws.
To this day, no one has created a working crystal ball; the closest approximation we have is the prediction model. Building on mathematics, the fields of statistics, machine learning and artificial intelligence have introduced many different models that share one feature: the ability to make predictions.
While many early prediction models, especially in statistics, were
parametric, it became increasingly clear that "data" played a crucial
role in shaping their development. This shift led to the emergence of
non-parametric models, including deep learning models, and the
establishment of data science as a distinct research field.
On a note of caution, such prediction models will never be perfect
crystal balls that allow us to forecast the future without error.
Instead, a prediction model is bound by the "problem of induction" [1]
and should be viewed as systematic reasoning rather than a magical device.
The size of the errors it makes on unseen data can be quantified by means of the expected generalization error [2,3].
This means that data science and the predictions it generates may seem
magical, but they are simply the application of mathematics.
This seemingly miraculous aspect of data science is reflected in an article published by The Economist in May 2017, titled "The world’s most valuable resource is no longer oil, but data".
In summary, data science models can be metaphorically viewed as mathematical crystal balls, constrained by the laws of prediction.
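For readers who would like to see the expected generalization error mentioned above in action, the following is a minimal sketch and not a definitive recipe. It assumes a made-up data-generating process (a straight line plus noise), and the helper names `make_data`, `fit_line` and `mean_squared_error` are hypothetical and chosen only for illustration. A simple prediction model is fitted to a small training set, and its error is then measured on fresh data it has never seen, which serves as an estimate of the generalization error.

```python
import random


def make_data(n, rng):
    """Draw n points from an assumed process y = 2x + 1 + noise (illustrative only)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares fit of a straight line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mean_squared_error(a, b, xs, ys):
    """Average squared difference between predictions and observed values."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)                     # fixed seed so the run is repeatable
train_x, train_y = make_data(50, rng)      # data the model is allowed to learn from
test_x, test_y = make_data(5000, rng)      # fresh data standing in for "the future"

a, b = fit_line(train_x, train_y)
train_error = mean_squared_error(a, b, train_x, train_y)
test_error = mean_squared_error(a, b, test_x, test_y)  # estimate of the generalization error

print(f"fitted line: y = {a:.2f} * x + {b:.2f}")
print(f"error on training data: {train_error:.3f}")
print(f"error on unseen data:   {test_error:.3f}")
```

Even in this idealized setting the error on unseen data does not drop to zero, because the noise in the data cannot be predicted. This is one concrete sense in which a prediction model is not a perfect crystal ball.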
In recent years, a type of model has emerged that is even more similar to a crystal ball than a prediction model: the digital twin [4]. Briefly, a digital twin is a simulation model, i.e., a complex system, with an updating mechanism that allows it to emulate real-world objects. Hence, it is a simulation model that is able to learn. Early examples of digital twins were simulation models of jet engines or manufacturing processes.
More recently, similar ideas have been applied in medicine to develop digital twins of patients, e.g., for investigating treatment options.
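To make the idea of "a simulation model with an updating mechanism" more tangible, here is a deliberately tiny sketch. It is a toy under stated assumptions and far simpler than any real digital twin: the "real-world object" is an imagined cup of coffee cooling down according to Newton's law of cooling, and the class `CoolingTwin`, its methods and all numbers are hypothetical. The twin starts with a wrong guess for the cooling rate and corrects it each time a new temperature measurement arrives.

```python
class CoolingTwin:
    """Toy twin of a cooling object (Newton's law of cooling, simple time steps)."""

    def __init__(self, temperature, ambient, rate_guess, learning_rate=0.3):
        self.temperature = temperature      # twin's current view of the object
        self.ambient = ambient              # temperature of the surroundings
        self.rate = rate_guess              # cooling rate, initially only a guess
        self.learning_rate = learning_rate  # how strongly new data corrects the guess

    def simulate_step(self, temperature, dt=1.0):
        """Simulation model: advance the temperature by one time step."""
        return temperature - dt * self.rate * (temperature - self.ambient)

    def update(self, measured, dt=1.0):
        """Updating mechanism: move the rate toward what the measurement implies."""
        gap = self.temperature - self.ambient
        if abs(gap) > 1e-9:
            observed_rate = (self.temperature - measured) / (dt * gap)
            self.rate += self.learning_rate * (observed_rate - self.rate)
        self.temperature = measured         # synchronise the twin with the real object

    def forecast(self, steps, dt=1.0):
        """Use the simulation to look ahead a number of time steps."""
        temperature = self.temperature
        for _ in range(steps):
            temperature = self.simulate_step(temperature, dt)
        return temperature


# The "real" object, simulated here with a rate the twin does not know.
true_rate = 0.30
ambient = 20.0
real_temperature = 90.0

twin = CoolingTwin(temperature=90.0, ambient=ambient, rate_guess=0.10)

for _ in range(20):
    real_temperature -= true_rate * (real_temperature - ambient)  # reality evolves
    twin.update(real_temperature)                                 # twin learns from the sensor

# Compare the twin's five-step forecast with what actually happens.
predicted = twin.forecast(5)
future_real = real_temperature
for _ in range(5):
    future_real -= true_rate * (future_real - ambient)

print(f"learned cooling rate: {twin.rate:.4f}")
print(f"forecast: {predicted:.4f}, actual: {future_real:.4f}")
```

The point of the sketch is the division of labour: the simulation encodes how the object is believed to behave, while the updating step keeps that belief in line with incoming measurements. Because the learned quantity is a cooling rate with a physical meaning, the twin's forecasts can be explained in terms a person can inspect.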
It is clear that a digital twin can never be a perfect model of "reality", but the idea is to find cases where it can still have utility. In this sense, the famous phrase by George Box comes to mind:
“All models are wrong, but some are useful.”
A digital twin is more similar to a crystal ball because its aim is to
make predictions not based on any mathematical model but on a simulation
model that replicates important aspects of a real-world object. This also implies that a simulation model offers a form of explainability that sets it apart from black-box prediction models.
Overall, the combination of models from data science and complex systems
appears to be a promising fusion for obtaining high-quality prediction
models that are also explainable.
We would like to refer to the following publications that provide conceptual information about the expected generalization error and the problem of induction.