Magic

What does the crystal ball mean?

The following discussion is intended for non-experts in data science and explains the meaning behind the crystal ball shown on the Home page.

The image of the crystal ball is meant to help answer the question: "What sets data science apart?" The following discussion offers an informal answer to this question.

Throughout human history, forecasting the future has been a subject of great fascination because it promises a way to prepare for misfortunes and disasters. About two thousand years ago, crystal balls were introduced, adding an element of showmanship to fortune-telling, since they were believed to enable a connection with higher powers. The idea was to "see" inside the crystal ball the answers to questions about the future.

Even without the power to alter the future, merely knowing future events in advance carries great value. Consider, for example, knowing the numbers of the next lottery draw.

To this day, no one has created a working crystal ball, but the closest approximation we have is prediction models. Many such models, all grounded in mathematics, have been introduced in statistics, machine learning, and artificial intelligence; what unites them is their ability to make predictions.

While many early prediction models, especially in statistics, were parametric, it became increasingly clear that "data" played a crucial role in shaping their development. This shift led to the emergence of non-parametric models, including deep learning models, and the establishment of data science as a distinct research field.

On a note of caution, such prediction models will never be perfect crystal balls that forecast the future without error. Instead, a prediction model is bound by the "problem of induction" [1] and should be viewed as a form of systematic reasoning rather than a magical device.

This imperfection can be quantified by the expected generalization error [2,3], that is, the error a model is expected to make on new data that were not used to build it.
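To make the idea concrete, here is a minimal sketch, assuming a hypothetical data-generating process y = 2x + 1 plus Gaussian noise and a straight-line model fitted by least squares. It approximates the expected generalization error by averaging the error on fresh test data over many independently drawn training sets. The function names and parameter values are illustrative assumptions, not taken from [2,3].

```python
import random


def true_function(x):
    # Hypothetical "reality" behind the data (an assumption for illustration)
    return 2.0 * x + 1.0


def sample(n, noise_sd, rng):
    # Draw n noisy observations (x, y) from the hypothetical data-generating process
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [true_function(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    # Ordinary least squares fit of y = a * x + b
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mean_squared_error(model, xs, ys):
    # Average squared difference between observed and predicted values
    a, b = model
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def expected_generalization_error(n_train=50, n_test=2000, noise_sd=1.0,
                                  repeats=20, seed=0):
    # Average the test error over many independent training sets;
    # this Monte Carlo average approximates the expected generalization error
    rng = random.Random(seed)
    errors = []
    for _ in range(repeats):
        train_x, train_y = sample(n_train, noise_sd, rng)
        test_x, test_y = sample(n_test, noise_sd, rng)
        model = fit_line(train_x, train_y)
        errors.append(mean_squared_error(model, test_x, test_y))
    return sum(errors) / len(errors)


print(expected_generalization_error(noise_sd=1.0))
print(expected_generalization_error(noise_sd=0.0))
```

With noise_sd=1.0 the estimate lands close to the noise variance of 1.0 and does not shrink toward zero, even though the model has exactly the right form, because the noise in future observations cannot be predicted. Only in the unrealistic noise-free case (noise_sd=0.0) does it become essentially zero. Even a well-specified prediction model is therefore not a perfect crystal ball.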

Hence, although data science and the predictions it generates may seem magical, they are simply applications of mathematics.

The seemingly miraculous power of data science is reflected in an article published by The Economist in May 2017, titled "The world’s most valuable resource is no longer oil, but data".

In summary, data science models can be viewed metaphorically as mathematical crystal balls whose predictions are constrained by the problem of induction and by their generalization error.

Digital twin

In recent years, a type of model has emerged that resembles a crystal ball even more closely than a prediction model: the digital twin [4]. Briefly, a digital twin is a simulation model of a complex system with an updating mechanism that allows it to emulate a real-world object. Hence, it is a simulation model that is able to learn. Early examples of digital twins were simulation models of jet engines and manufacturing processes.
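The two ingredients of a digital twin, a simulation model and an updating mechanism, can be illustrated with a toy example. The sketch below is hypothetical and not taken from [4]: it assumes a heated object cooling according to Newton's law of cooling, simulated with a simple Euler step, and an updating mechanism that re-estimates the unknown cooling rate k from each new measurement by exponential smoothing. The class name CoolingTwin, the function real_object, and all parameter values are illustrative assumptions.

```python
class CoolingTwin:
    """Hypothetical digital twin of a cooling object (Newton's law of cooling)."""

    def __init__(self, k_guess, ambient, dt=1.0, alpha=0.3):
        self.k = k_guess        # current estimate of the cooling rate
        self.ambient = ambient  # ambient temperature (assumed known)
        self.dt = dt            # time step of the simulation
        self.alpha = alpha      # weight given to each new observation

    def simulate_step(self, temp):
        # Simulation model: dT/dt = -k * (T - ambient), one explicit Euler step
        return temp - self.k * (temp - self.ambient) * self.dt

    def update(self, temp_prev, temp_next):
        # Updating mechanism: infer the rate implied by the observed change
        # and blend it into the current estimate (exponential smoothing)
        diff = temp_prev - self.ambient
        if abs(diff) < 1e-12:
            return  # no information when the object is at ambient temperature
        k_observed = -(temp_next - temp_prev) / (diff * self.dt)
        self.k = (1 - self.alpha) * self.k + self.alpha * k_observed

    def forecast(self, temp, steps):
        # Use the updated simulation model to predict future states
        trajectory = []
        for _ in range(steps):
            temp = self.simulate_step(temp)
            trajectory.append(temp)
        return trajectory


def real_object(temp, k_true=0.1, ambient=20.0, dt=1.0):
    # Stand-in for the physical object; in practice these would be sensor data
    return temp - k_true * (temp - ambient) * dt


twin = CoolingTwin(k_guess=0.5, ambient=20.0)
temp = 90.0
for _ in range(50):
    observed = real_object(temp)  # measure the "real" object
    twin.update(temp, observed)   # let the twin learn from the measurement
    temp = observed

print(round(twin.k, 4))  # → 0.1
print(twin.forecast(temp, 3))
```

The twin starts with a wrong guess for k and, by repeatedly comparing its simulation with observations, converges to the rate of the real object, after which its forecasts can be used to explore future states. Real digital twins replace the synthetic real_object with sensor data and use far richer simulation models and updating mechanisms; the sketch only shows the loop of simulating, observing, and updating.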

More recently, similar ideas have been applied in medicine to develop digital twins of patients, e.g., for investigating treatment options.

A digital twin can never be a perfect model of "reality," but the idea is to find cases where it is still useful. In this sense, the famous remark by the statistician George Box comes to mind:

“All models are wrong, but some are useful.”

A digital twin is more similar to a crystal ball because it aims to make predictions not from an abstract prediction model but from a simulation model that replicates important aspects of a real-world object. This also implies that a simulation model offers a degree of explainability that sets it apart from black-box prediction models.

Overall, combining models from data science with models of complex systems appears to be a promising route to high-quality prediction models that are also explainable.

The following publications provide conceptual background on the problem of induction, the expected generalization error, and digital twins.

[1] Henderson, L. (2022). The Problem of Induction. Stanford Encyclopedia of Philosophy.
[2] Emmert-Streib, F., & Dehmer, M. (2019). Evaluation of regression models: Model assessment, model selection and generalization error. Machine Learning and Knowledge Extraction, 1(1), 521-551.
[3] Emmert-Streib, F., Moutari, S., & Dehmer, M. (2023). Elements of Data Science, Machine Learning, and Artificial Intelligence Using R (Chapter 18). Springer Nature.
[4] Emmert-Streib, F. (2023). Defining a Digital Twin: A Data Science-Based Unification. Machine Learning and Knowledge Extraction, 5(3), 1036-1054.