A short introduction to belief theory : Part 3
Continuing the series on belief theory, in this post we’ll discuss how initial shortcomings of the theory were tackled, and explore extensions based on the state of the art in the field. See parts I and II for a concise introduction. Like the two before it, this post draws heavily on the excellent Classic works of the Dempster-Shafer theory of belief functions: An introduction by Yager and Liu.
Impingement #
Let’s resume with our frame of discernment θ, a set of possible answers to a question of interest. For the other terms I’ll use here, please refer to the earlier parts if you’re not familiar with them.
Let us now define the weight of evidence function w, in the context of separable support functions. Recall that a separable support function is one where the intersection of any 2 focal elements is also a focal element, allowing a top-down ‘separability’ of support for larger subsets of θ.
Let S_i be the separable parts, each carrying a weight w_i; we then have:
\(w(A) = \sum \lbrace w_i \vert S_i = A \rbrace \)
with the boundary conditions w(∅) = 0 and w(θ) = +∞, and A ⊂ θ.
We can now introduce the impingement function, but first a short sidebar on terminology. The name comes from the general verb to impinge, to press or strike against something; here, evidence impinges on a subset when it weighs against it. A similar-sounding term, pignistic, will return when we talk about the extensions by Smets et al. to DST.
We have separable support functions, so now we can also reason about internal conflict: is the support for subsets in conflict when combined?
\(v(A) = \sum(w(B) \vert A \cap \overline{B} \ne \emptyset) \)
The v(A) function quantifies the total weight of evidence that does not support A. You can invert this function using the Möbius transform, for every A other than θ itself:
\(w(A) = \sum((-1)^{\vert B - A \vert + 1}v(B) \vert A \subseteq B ) \)
Then, finally, we can express the weight of conflict:
\(W = -\log(\sum \lbrace (-1)^{\vert A \vert + 1} \exp(-v(A)) \vert A \ne \emptyset \rbrace) \)
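To make the two formulas concrete, here is a minimal Python sketch that computes v(A) and W on a toy frame. The frame {a, b, c} and the two weights are made up for illustration, and the helper names (`powerset`, `impingement`, `weight_of_conflict`) are mine, not from the book. The weights sit on two disjoint singletons, so some conflict is expected.

```python
from itertools import combinations
from math import exp, log


def powerset(frame):
    """All subsets of the frame, as frozensets."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def impingement(weights, frame):
    """v(A): sum of w(B) over every B that does not contain A."""
    return {A: sum(w for B, w in weights.items() if not A <= B)
            for A in powerset(frame)}


def weight_of_conflict(v):
    """W = -log of the alternating sum of exp(-v(A)) over non-empty A."""
    total = sum((-1) ** (len(A) + 1) * exp(-v_a)
                for A, v_a in v.items() if A)
    return -log(total)


frame = {"a", "b", "c"}
# Hypothetical weights. w(theta) = +inf is left out on purpose: theta
# contains every A, so it never contributes to v.
weights = {frozenset({"a"}): 1.0, frozenset({"b"}): 0.5}

v = impingement(weights, frame)
W = weight_of_conflict(v)
print(v[frozenset({"c"})])  # 1.5: both weights count against {c}
print(W)                    # positive, because {a} and {b} are disjoint
```

As a sanity check under the same assumptions, W should match -log(1 - s1·s2), where s_i = 1 - exp(-w_i) is the degree of support of each simple support function and s1·s2 is the mass Dempster's rule would discard as conflict.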
Conflicting conflict #
There are a number of scenarios where Dempster’s combination rule seems to conflict with intuition. Note the word intuition: it hints at what mathematics is, in some sense, trying to do, which is to model the underlying principles of the universe axiomatically without being counter-intuitive. There is a lot of room for philosophy in this aspect, but for now I’ll leave that for a subsequent post.
What I think you will agree with is that when you model a real-life use case with probabilistic reasoning, you do so to reason numerically or symbolically about scenarios, and to express things in a common language that is less ambiguous than our own.
Zadeh, of possibility theory fame, noted the following scenario: Let’s say you have symptoms of a disease, which has any of 3 causal factors (A, B, C). You go to a doctor, and they give a ‘highly likely’ rating to A, and ‘not at all likely’ to B. Numerically, let’s use convention and label these as m1(A)=0.95, m1(B)=0.05.
You want to be sure, so you go for a second opinion. Breaking for a moment with the abstract context, to the dismay of many these days you’re considered lucky to even get a timely first opinion, but let’s imagine things are better for you when you read this.
Your second doctor gives you a ‘highly likely’ rating for C (m2(C)=0.95) and ‘not at all likely’ for B (m2(B)=0.05).
The discerning reader will see that in both cases the frame of discernment is {A, B, C}, but the mass functions only assign mass to A, B in the first case and to B, C in the second.
Let’s go for the default assumption of setting the missing parts to 0 then.
Now you need to make a decision, and so you follow Dempster’s combination rule and end up with B as most likely in the combined scenario.
Hang on though, because neither doctor would agree with this.
So what is going on? First, Bayesian practitioners would say we’re doing it wrong, and need to use Cromwell’s rule where no prior probability (mass function) is ever set to 0 or 1 for exactly this reason.
In case you didn’t do the math yourself, the problem here is that any multiplication by 0 leads to 0, but is 0 really the best way to encode the evidence? Neither doctor even considered a third causal factor.
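If you would rather let a machine do the math, here is a small sketch of Dempster's rule applied to the two opinions, read as m1(A)=0.95, m1(B)=0.05 and m2(C)=0.95, m2(B)=0.05 with everything else at 0. The function name `dempster_combine` and the dict-of-frozensets encoding are my own choices for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule: multiply masses, intersect focal sets, renormalise."""
    joint = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + x * y
        else:
            # Disjoint focal sets: this product is conflicting mass.
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("total conflict: the rule is undefined")
    combined = {s: v / (1.0 - conflict) for s, v in joint.items()}
    return combined, conflict


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
m1 = {A: 0.95, B: 0.05}  # first doctor
m2 = {C: 0.95, B: 0.05}  # second doctor

combined, conflict = dempster_combine(m1, m2)
print(combined)  # all remaining mass ends up on B
print(conflict)  # roughly 0.9975 of the product mass was thrown away
```

The only pair of focal sets that intersects is B with B, carrying 0.05 × 0.05 = 0.0025 of the mass. Renormalising by that same 0.0025 is what turns a hypothesis both doctors rated as unlikely into a certainty.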
Smets’ transferable belief model #
Smets in The combination of evidence in the transferable belief model argues that we made an error by modelling m1(C)=0=m2(A).
There is no evidence, yet we assign evidence.
More generally, TBMs work when your frame of discernment is incomplete, that is, there are situations where your enumeration of answers is not complete and the true answer(s) are simply not yet known. Before you respond by blaming the modeller, think how often this is true in scientific discovery, let alone real life.
Arguably, adapting DST so it will work under these circumstances is very valuable.
In short, TBMs split DST into two parts: credal belief and pignistic probabilities. Again, this sounds weird, but think of it this way: you have limited information, yet you have to make a decision. That exact point is when you go from credal belief, the evidence you have so far, to pignistic probability (from the Latin pignus, a bet), where you do commit numerical values to your arguably incomplete but best-effort model.
In terms of axioms, the most striking difference is that 0 <= m(∅) <= 1, because the empty set represents ‘the things we missed’.
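A rough sketch of the two levels, under my reading of the TBM: at the credal level, masses are combined conjunctively without renormalising, so conflicting mass stays on ∅; at the pignistic level, each focal set's mass is split evenly over its elements and rescaled by 1 - m(∅). The function names and the two doctors' numbers (0.95 and 0.05, as in the scenario above) are assumptions for illustration.

```python
def conjunctive_combine(m1, m2):
    """Unnormalised conjunctive rule: conflicting mass stays on the empty set."""
    out = {}
    for a, x in m1.items():
        for b, y in m2.items():
            inter = a & b  # may be frozenset(), i.e. the empty set
            out[inter] = out.get(inter, 0.0) + x * y
    return out


def pignistic(m):
    """Pignistic transform: share each focal mass equally among its elements."""
    empty = m.get(frozenset(), 0.0)
    if empty >= 1.0:
        raise ValueError("all mass is on the empty set")
    betp = {}
    for focal, mass in m.items():
        if not focal:
            continue  # the empty set only enters through the rescaling
        share = mass / (len(focal) * (1.0 - empty))
        for element in focal:
            betp[element] = betp.get(element, 0.0) + share
    return betp


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
credal = conjunctive_combine({A: 0.95, B: 0.05}, {C: 0.95, B: 0.05})
print(credal[frozenset()])  # about 0.9975: the model is missing something
print(pignistic(credal))    # forced to bet, B is still the only candidate

# Hypothetical second example with a non-singleton focal set.
bets = pignistic({frozenset("AB"): 0.6, C: 0.4})
print(bets)  # A and B split the 0.6 evenly, C keeps 0.4
```

Note that the bet on the doctors' combined evidence still lands on B. What changes is that the credal level keeps m(∅) near 1 as an explicit warning that the two sources barely agree on anything.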
Coarsening #
The level at which you define or assign belief functions is a key freedom you have in DST. Shafer et al illustrate this with a simple example: imagine your question of interest is ‘what will the weather be?’, and you have a frame of discernment θ = {r, s, n}, with (r)ain, (s)now, and (n)ormal.
What if we only cared about normal versus non-normal?
\(\theta' = \lbrace n, \overline{n} \rbrace\)
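One way to picture this, as a sketch only: map every element of θ to its coarse label and move each focal set's mass to the image of that set. The masses below are invented, and the `coarsen` helper and the `not_n` label are hypothetical names of mine.

```python
def coarsen(m, label_of):
    """Push a mass function on the fine frame onto a coarser frame.

    Each focal set is replaced by the set of coarse labels it touches,
    and masses that land on the same coarse set are added up.
    """
    out = {}
    for focal, mass in m.items():
        image = frozenset(label_of[x] for x in focal)
        out[image] = out.get(image, 0.0) + mass
    return out


# (r)ain and (s)now both count as "not normal" weather.
label_of = {"r": "not_n", "s": "not_n", "n": "n"}

# Hypothetical masses on theta = {r, s, n}.
m = {
    frozenset("r"): 0.2,
    frozenset("s"): 0.1,
    frozenset("rs"): 0.3,
    frozenset("n"): 0.3,
    frozenset("rsn"): 0.1,
}

coarse = coarsen(m, label_of)
print(coarse)  # mass on {not_n}, on {n}, and on the whole coarse frame
```

Everything that only distinguished rain from snow collapses onto the single answer ‘not normal’, while the mass that was on all of θ stays on all of θ'.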
In part 4 we’ll further explore how this fits in DST.
Conclusion #
We deepened the usage of weights of evidence, and specifically saw how you can compute conflict of evidence for a set of propositions. We then introduced Smets’ transferable belief model as one potential solution for scenarios where DST gives counterintuitive answers. In part 4 we’ll cover more of the coarsening and inference, as well as discuss related work.
Sources and further reading #
Classic works of the Dempster-Shafer theory of belief functions: An introduction