Bayesian statistics as an extension of machine learning methods


Thomas Bayes was born in London at the beginning of the 18th century as the son of a vicar and, after studying theology, became a vicar himself. His other interests were logic and statistics, which he pursued in his spare time. His major scientific contribution is the so-called Bayes' theorem, which was only published after his death:

$P(A_k|E)=\frac{P(A_k)\cdot P(E|A_k)}{\sum_{i=1}^{n} P(A_i)\cdot P(E|A_i)}$

where the events $A_1, \dots, A_n$ form a partition of the sample space.

As an example of its application, consider a medical rapid test that gives a positive result for 95% of sick people ($P(positive|sick)=0.95$). In 2% of healthy people, the test also falsely gives a positive result ($P(positive|healthy)=0.02$). The disease has infected 2% of all people ($P(sick)=0.02$ and correspondingly $P(healthy)=0.98$), and all people can be tested. Question: if a person tests positive, what is the probability that they actually have the disease?

$P(sick|positive)=\frac{P(sick)\cdot P(positive|sick)}{P(sick)\cdot P(positive|sick)+P(healthy)\cdot P(positive|healthy)}\approx 49\,\%$
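The calculation above can be reproduced in a few lines of Python, using the numbers from the example:

```python
# Bayes' theorem for the rapid-test example.
p_sick = 0.02             # prevalence P(sick)
p_healthy = 1 - p_sick    # P(healthy)
p_pos_sick = 0.95         # sensitivity P(positive|sick)
p_pos_healthy = 0.02      # false-positive rate P(positive|healthy)

# P(sick|positive) = P(sick) * P(positive|sick) / P(positive)
p_sick_pos = (p_sick * p_pos_sick) / (
    p_sick * p_pos_sick + p_healthy * p_pos_healthy
)
print(round(p_sick_pos, 4))  # ≈ 0.4922
```

Despite the seemingly reliable test, the low prevalence of the disease drags the posterior probability down to roughly 49%.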

Bayesian statistics, which builds on Bayes' theorem, is used in inductive statistics and machine learning to estimate parameters and test hypotheses. For this purpose, the parameters are first assigned assumed distributions (so-called a priori or prior distributions). These distributions are then iteratively adapted to the problem using statistics from samples or the results of experiments (the prior distributions become posterior distributions).

One example that is frequently used in the literature is the experimental determination of the probability of winning on one-armed bandits. For example, take three bandits with different (unknown) win probabilities (the result of a game is either a win or no win, with a constant win amount). Since we have no prior knowledge, we assume a beta distribution with the parameters $a=1$ and $b=1$ (which corresponds to a uniform distribution) for each win probability. To determine the posterior distributions, we iteratively select a bandit (depending on the experience already gained) and adjust its win probability distribution according to the outcome of the game. The procedure can be stopped once the probability distributions of the three bandits no longer change significantly.
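The text does not name a specific selection rule; one common choice for "selecting a bandit depending on the experience already gained" is Thompson sampling. A minimal sketch with NumPy, using the win probabilities from the example below (0.2, 0.5, 0.75) as assumed ground truth:

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = [0.2, 0.5, 0.75]       # unknown win probabilities (blue, green, red)
a = np.ones(3)                  # Beta prior parameters a=1, b=1 (uniform)
b = np.ones(3)

for _ in range(200):            # 200 games
    # Thompson sampling: draw one sample per bandit from its current Beta posterior
    samples = rng.beta(a, b)
    k = int(np.argmax(samples))           # play the bandit with the highest sample
    win = rng.random() < true_p[k]        # simulate the game
    # Bayesian update: a counts observed wins, b counts observed losses
    a[k] += win
    b[k] += 1 - win

posterior_means = a / (a + b)   # point estimates of the win probabilities
```

Because the beta distribution is the conjugate prior of the Bernoulli likelihood, the posterior update reduces to simple counting of wins and losses.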

The following figures show the results for the described test after 5, 10, 20, 50, 100 and 200 games. The actual win probability of the blue bandit is 0.2, that of the green bandit 0.5 and that of the red bandit 0.75. You can see the development from the prior distributions (all win probabilities equally likely) to the posterior distributions.

In addition to the estimated win probabilities, the figures show the spread in the results. This spread can be interpreted as the certainty (or uncertainty) of the assumed win probability. To make this additional information usable in different applications, machine learning and artificial intelligence algorithms are being extended with Bayesian statistical approaches.
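The spread of a beta posterior can also be quantified analytically rather than read off a plot. A small sketch (the game counts below are illustrative assumptions, not results from the figures):

```python
import math

def beta_mean_std(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

# Example: 40 wins and 20 losses, starting from the uniform Beta(1, 1) prior.
mean, std = beta_mean_std(1 + 40, 1 + 20)
print(round(mean, 3), round(std, 3))  # ≈ 0.661 0.06
```

As more games are observed, the standard deviation shrinks, which is exactly the narrowing of the curves visible in the figures.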

To illustrate: imagine you have a problem that needs to be solved on the basis of data. Experience shows that empirical data is subject to a certain degree of variability, is flawed, partially incomplete and, in short, not unambiguous. You use this data to train your model, and the result is a single value that apparently represents the correct answer to your problem. But how can that be, if the underlying data is not unambiguous? The solution algorithm must therefore be adapted so that these data problems are reflected in the result. To achieve this, adapted solution methods are currently being developed for relevant machine learning algorithms that take the scatter in the data into account in each calculation step and output a distribution as the result. One example is artificial neural networks, in which not only the outputs but also the network weights are replaced by posterior distributions. In addition to the solution methods, the interfaces for further processing of results in the form of distributions must also be adapted. For example, in a research project funded by the mFund, we used Bayesian statistics to determine bid prices on the basis of data that was weak in places.
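The idea of replacing network weights with distributions can be sketched in miniature: instead of a single weight vector, keep a (mean, standard deviation) pair per weight and obtain a predictive distribution by Monte Carlo sampling. All numbers below are hypothetical illustrations, not the model or data from the project mentioned above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior over the weights of a tiny one-layer "network":
# each weight is a Gaussian (mean, std) instead of a single point value.
w_mean, w_std = np.array([0.8, -0.3]), np.array([0.1, 0.05])
b_mean, b_std = 0.2, 0.05

def predict(x, n_samples=1000):
    """Monte Carlo predictive distribution: sample weights, collect outputs."""
    w = rng.normal(w_mean, w_std, size=(n_samples, 2))
    b = rng.normal(b_mean, b_std, size=n_samples)
    return w @ x + b            # one prediction per sampled weight set

ys = predict(np.array([1.0, 2.0]))
y_mean, y_spread = ys.mean(), ys.std()   # point estimate plus uncertainty
```

Downstream consumers then receive not just `y_mean` but also `y_spread`, which is precisely the interface change the paragraph above refers to.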


Björn Piepenburg
