Data scientists who build or use machine learning models, and therefore AI systems, rely on those models to describe a system. They face various challenges, such as balancing interpretability and accuracy, that stem from the differences between the three main types of models. In this blog post, we'll look at all three types, focusing on the two opposites: white-box and black-box models.
Remember the blog post on rapid control prototyping and forklifts? I had help from Sweden for this blog post: a big thank you to the electrical engineers conducting the study at Linköping University for their explanations! Before we dive further into the world of input-output models, let's briefly examine what we'd want from them. In an ideal world, such models would be fully transparent and explainable, which (according to Chris Walker from the Dataiku Blog) is especially useful for:
Critical decisions (e.g., healthcare)
Seldom made or non-routine choices (e.g., M&A work)
Stakeholder justification-required decisions (e.g., strategic business choices)
High-touch human judgment decisions (e.g., portfolio manager-vetted investments)
Situations where interactions matter more than outcomes (e.g., root cause analysis)
How To Choose Which Model To Use?
Basically, when it's possible to describe the whole system using, for example, physical equations, engineers use a white-box model. Such models have understandable, reasonable, and observable behaviors, features, and connections between influencing variables and output predictions. In reality, however, this is rare and mostly limited to fairly simple systems. Furthermore, on complex problems such models often don't perform as well as black-box models.
The opposite is a black-box model, where the model is estimated mathematically from measured inputs and outputs. With these models, you can observe what goes in and what comes out, but how the model works "inside" remains more or less unknown. The third type is the grey-box model, a mixture in which part of the model is estimated from data and part is described with formulas. Picking the right model for an application depends on many different factors. Typically, however, black-box models are used for deep learning applications to depict incredibly complex situations.
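To make the three model types a bit more tangible, here is a minimal Python sketch. Everything in it is an assumption chosen for illustration and does not come from the Linköping study: the system is a simple spring obeying Hooke's law, the spring constant `K_TRUE`, the noise level, and the function names are all hypothetical, and the "measurements" are simulated. A cubic polynomial stands in for the black box here; a real black-box model would usually be far more complex.

```python
import numpy as np

rng = np.random.default_rng(0)

K_TRUE = 2.0  # hypothetical spring constant in N/m (assumption for this sketch)


def white_box(x, k=K_TRUE):
    """White box: Hooke's law F = k * x, fully described by a physical equation."""
    return k * x


# Simulated measurements of displacement (input) and force (output) with noise.
x_meas = np.linspace(0.0, 1.0, 50)
f_meas = K_TRUE * x_meas + rng.normal(0.0, 0.05, size=x_meas.size)

# Black box: a generic function estimated purely from measured input and output.
# The fitted coefficients carry no physical meaning.
black_coeffs = np.polyfit(x_meas, f_meas, deg=3)


def black_box(x):
    return np.polyval(black_coeffs, x)


# Grey box: the structure F = k * x comes from physics,
# but k is estimated from the data by least squares.
k_est = float(np.sum(x_meas * f_meas) / np.sum(x_meas ** 2))


def grey_box(x):
    return k_est * x


x_new = 0.5
print("white box:", white_box(x_new))
print("black box:", float(black_box(x_new)))
print("grey box: ", grey_box(x_new), "(estimated k =", k_est, ")")
```

All three models give similar predictions inside the measured range. The difference is in what you can read out of them: the white-box and grey-box models expose a spring constant you can check against reality, while the black-box coefficients only tell you that the curve fits.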
Downsides to Black Box Models
Though these models often outperform the others (grey-box and white-box models) on complex tasks, black-box models have downsides: a large number of concealed issues, such as overfitting or spurious correlations, could be affecting the output. Therefore, black-box models need to be reassessed more frequently than, for instance, grey-box models.
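Overfitting, one of the concealed issues mentioned above, can be sketched in a few lines of Python. This is an illustration under assumptions, not a result from any study: the "true" system is a hypothetical straight line with simulated noise, and a degree-7 polynomial plays the role of an overly flexible black box. The usual way to uncover the problem is to compare the error on the data used for fitting with the error on held-out data.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)


def measure(x):
    # Hypothetical "true" system: a straight line plus measurement noise.
    return 2.0 * x + rng.normal(0.0, 0.1, size=x.size)


# A small training set and a larger held-out set from the same input range.
x_train = np.linspace(0.0, 1.0, 12)
y_train = measure(x_train)
x_test = np.linspace(0.02, 0.98, 200)
y_test = measure(x_test)


def mse(model, x, y):
    """Mean squared error of a fitted model on the given data."""
    return float(np.mean((model(x) - y) ** 2))


# A simple model and a much more flexible one, both fitted to the same data.
simple = Polynomial.fit(x_train, y_train, deg=1)
flexible = Polynomial.fit(x_train, y_train, deg=7)

for name, model in [("degree 1", simple), ("degree 7", flexible)]:
    print(
        name,
        "| train MSE:", mse(model, x_train, y_train),
        "| held-out MSE:", mse(model, x_test, y_test),
    )
```

The flexible model always fits the training data at least as well as the simple one, because it has more freedom to chase the noise. A held-out error that is much larger than the training error is the warning sign, and it is one reason such models need to be reassessed regularly on fresh data.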
Now that we have a basic understanding of different models, I'd like to look at a vast topic in engineering in my upcoming blog posts: data. How do engineers clean up messy data? Is there such a thing as too much data?
Stay tuned, and as always, stay curious!
If you are further interested, here are the sources I used for this blog post:
White Box vs Black Box Models: Balancing Interpretability and Accuracy - by Chris Walker
Industrial control system simulation routines - by Peng Zhang