
Paul Johnson
June 10, 2013

1 How likely is it that [...] In my view, the [...] thinks the chances are [...] People disagree and [...] 0.4 or 0.5 or whatnot. The Beta density [...] or probabilities. It is [...] which work together to [...] interval and whether it is symmetrical.

The Beta can be used to describe not only the variety observed across people, but it can also describe your subjective degree of belief (in a Bayesian sense). If you are not entirely sure that the probability is 0.22, but rather you think that is the most likely value but that there is some chance that the value is higher or lower, then maybe your personal beliefs can be described as a Beta distribution.

2 Mathematical Definition

The standard Beta distribution gives the probability density of a value x on the interval (0,1):

$$\mathrm{Beta}(\alpha,\beta):\quad \mathrm{prob}(x \mid \alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)} \tag{1}$$

where B is the beta function

$$B(\alpha,\beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt$$

2.1 Don’t let all of those betas confuse you.

It is disappointingly confusing, but the word “beta” is used for 3 completely different meanings.

1. Beta(α,β): “Beta” is the name of the probability distribution
2. B(α,β): “Beta” is the name of a function that appears in the denominator of the density function
3. β: “Beta” is the name of the second parameter in the density function

2.2 About the Beta function B

The Beta function B in the denominator plays the role of a “normalizing constant” which assures that the total area under the density curve equals 1.

The Beta function is equal to a ratio of Gamma functions:

$$B(\alpha,\beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$

Keeping in mind that for positive integers k, Γ(k)=(k−1)!, one can do some checking and get an idea of what the shape might be. A three-dimensional graph of the Beta function can be found in Figure 1.
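The normalizing role of B can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names (`beta_function`, `beta_density`, `area_under_density`) and the midpoint-rule integration are illustrative assumptions. It builds B from the Gamma ratio, evaluates the density in equation (1), and approximates the area under the curve.

```python
import math


def beta_function(a, b):
    """B(a, b) computed as the Gamma ratio Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_density(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), following equation (1)."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)


def area_under_density(a, b, n=10_000):
    """Midpoint-rule approximation of the area under the density on (0, 1)."""
    h = 1.0 / n
    return h * sum(beta_density((i + 0.5) * h, a, b) for i in range(n))


# For integer arguments the Gamma ratio reduces to factorials:
# B(2, 3) = 1! * 2! / 4! = 1/12.
b23 = beta_function(2, 3)
area = area_under_density(2, 3)
print(b23, area)
```

Dividing by B(α,β) is what makes the approximate area come out at 1; without it the curve has the same shape but the wrong total area.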
3 Moments of the Beta

The expected value of a variable that is Beta distributed is:

$$E(x) = \mu = \frac{\alpha}{\alpha+\beta} \tag{2}$$

and the variance is

$$\mathrm{Variance}(x) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \tag{3}$$

People who are familiar with the Generalized Linear Model will notice that

$$V(\mu) = \frac{\beta}{(\alpha+\beta)(\alpha+\beta+1)} \cdot \mu$$

is a variance function, V(µ), which indicates the dependence of the observed variance on the mean. For a fixed pair of parameters (α,β), the variance is proportional to µ. A graph illustrating the Variance function is presented in Figure 2.

The skewness and kurtosis, which are based on the third and fourth moments, are:

$$\mathrm{Skewness}(x) = \frac{2(\beta-\alpha)\sqrt{1+\alpha+\beta}}{\sqrt{\alpha\beta}\,(2+\alpha+\beta)} \tag{4}$$

$$\mathrm{Kurtosis}(x) = \frac{6\left[\alpha^3+\alpha^2(1-2\beta)+\beta^2(1+\beta)-2\alpha\beta(2+\beta)\right]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)} \tag{5}$$

Here Kurtosis(x) is the excess kurtosis (kurtosis minus 3); for Beta(1,1) it equals −1.2.

[Figures 1 and 2: plots not reproduced; surviving axis labels are “Beta Function” and “Multiplier”.]

[Figure 3: Beta(1,1) is the Uniform distribution. x-axis: x; y-axis: probability.]

3.1 The Mode

If α>1 and β>1, the peak of the density is in the interior of [0,1] and the mode of the Beta distribution is

$$\mathrm{mode} = \gamma = \frac{\alpha-1}{\alpha+\beta-2} \tag{6}$$

If α or β<1, the mode may be at an edge.

As we will illustrate below, if α=β=1, then the Beta is identical to a Uniform distribution.

4 Illustration

One advantage of the Beta distribution is that it can take on many different shapes. If one believed that all scores were equally likely, then one could set the parameters α=1 and β=1. As illustrated in Figure 3, this gives a “flat” probability density function.

In models of elections, one may need a distribution of ideal points to resemble a single-peaked distribution on the interval [0,1]. The Beta can be very useful in this kind of exercise. Consider Figure 4.

At one point, it fascinated me that the mode did not equal the mean and that the variance ends up characterizing the “slack” between those two things. Various densities in Figure 5 might be entertaining. In these examples, the Beta parameters are chosen to keep the mode constant at 0.3. Note how the mean and variance change across the illustrations.
[Figure 4. Panels: Beta(3, 5.67), Beta(3, 3), Beta(5.67, 3). x-axis: x; y-axis: probability density.]

[Figure 5: Beta Distributions with Mode=0.3. x-axis: ideal point; y-axis: density. Panels:
Beta(1.1, 1.23): mode=0.3, mean=0.47, var=0.075
Beta(1.76, 2.76): mode=0.3, mean=0.39, var=0.043
Beta(2.41, 4.29): mode=0.3, mean=0.36, var=0.03
Beta(3.07, 5.82): mode=0.3, mean=0.34, var=0.023
Beta(3.72, 7.35): mode=0.3, mean=0.34, var=0.018
Beta(4.38, 8.88): mode=0.3, mean=0.33, var=0.016
Beta(5.03, 10.41): mode=0.3, mean=0.33, var=0.013
Beta(5.69, 11.94): mode=0.3, mean=0.32, var=0.012
Beta(6.34, 13.47): mode=0.3, mean=0.32, var=0.01
Beta(7, 15): mode=0.3, mean=0.32, var=0.009]

5 About the connection between the mean, the mode, and the variance

In the pictures displaying the Beta density, one’s eye is drawn to the peak of the frequency distribution, which is the mode. We can set the Beta’s parameters in order to generate a distribution with a desired mode. Let the mode be represented by γ.

Here’s a simple starting point: Suppose the mode is .50. That is the same as the mean (it’s symmetric), and the mode formula (6) implies:

$$.50 = \frac{\alpha-1}{\alpha+\beta-2} \tag{7}$$

and

$$.50\alpha + .50\beta - 1 = \alpha - 1$$

$$.5\alpha = .5\beta$$

$$\alpha = \beta \tag{8}$$

If one wants the mode to be in the middle, one can choose any value for α, as long as one chooses the same value for β. (Whew! What a relief. This exactly matched my intuition.)
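The symmetric case can be confirmed with a few lines of Python. This is an illustrative sketch only; the helper names `beta_mean` and `beta_mode` are assumptions introduced here, and `beta_mode` applies equation (6), so it is restricted to α>1 and β>1.

```python
def beta_mean(a, b):
    """Mean of Beta(a, b), equation (2)."""
    return a / (a + b)


def beta_mode(a, b):
    """Interior mode of Beta(a, b), equation (6); requires a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("equation (6) applies only when both parameters exceed 1")
    return (a - 1) / (a + b - 2)


# Whatever common value is chosen for alpha = beta, mode and mean both sit at 0.5.
symmetric_cases = [(a, beta_mode(a, a), beta_mean(a, a)) for a in (1.5, 3.0, 10.0)]
for a, mode, mean in symmetric_cases:
    print(f"Beta({a}, {a}): mode={mode}, mean={mean}")
```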
If the mode is in the center, we know α and β are equal, but we don’t know their values. The selection, it turns out, depends on how much diversity there is. If one wants a distribution to have points “tightly bunched” around the mode, then one should choose a large value for α, say 10.0,

$$\text{variance of } \mathrm{Beta}(10,10) = 0.01190 \tag{9}$$

In contrast, if α=1.5, the variance is much greater:

$$\text{variance of } \mathrm{Beta}(1.5,1.5) = 0.0625 \tag{10}$$

Seen in this light, the parameter α is a “homogeneity indicator.” As α gets bigger, the distribution collapses around the mode.

Although this particular calculation works only for a mode in the center, it does outline the process that we can use to assign α and β for all other values of the mode.

Suppose the mode is .4. From equation 6

$$.40 = \frac{\alpha-1}{\alpha+\beta-2}$$

$$\beta = \frac{3}{2}\alpha - \frac{1}{2} \tag{11}$$

or

$$.60\alpha = .2 + .40\beta$$

$$\alpha = \frac{1}{3} + \frac{2}{3}\beta \tag{12}$$

It is quite possible to calculate one parameter as a function of another, after specifying the mode, even if the mode is off center.

[Figure 6: Some Unpleasant Betas. x-axis: x; y-axis: density. Panels:
Beta(0.7, 0.2): mean=0.78, var=0.09
Beta(0.7, 0.5): mean=0.58, var=0.11
Beta(0.7, 0.75): mean=0.48, var=0.1
Beta(0.7, 1.1): mean=0.39, var=0.08
Beta(1.2, 0.2): mean=0.86, var=0.05
Beta(1.2, 0.5): mean=0.71, var=0.08
Beta(1.2, 0.75): mean=0.62, var=0.08
Beta(1.2, 1.1): mean=0.52, var=0.08]

Generally speaking, for any value of the mode, γ∈(0,1) (keeping in mind the original stipulation that α,β>1):

$$\gamma = \frac{\alpha-1}{\alpha+\beta-2} \tag{13}$$

$$\gamma\alpha + \gamma\beta - 2\gamma = \alpha - 1 \tag{14}$$

$$(1-\gamma)\alpha = \gamma\beta - 2\gamma + 1 \tag{15}$$

$$\alpha = \frac{\gamma\beta - 2\gamma + 1}{1-\gamma} = \frac{\beta - 2 + \frac{1}{\gamma}}{\frac{1}{\gamma} - 1} = \frac{\gamma}{1-\gamma}\,\beta - \frac{2\gamma-1}{1-\gamma} \tag{16}$$

So α is a linear function of β. (Note: 2013-10-25; reader notified me of typographical error in equation 16. Sorry!)
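The variance figures in (9) and (10) and the linear relation in (16) can be checked numerically. The Python below is a hedged sketch rather than anything from the original notes: `beta_variance`, `beta_mode`, and `alpha_from_mode` are hypothetical helper names, and the choice β=4 is arbitrary.

```python
def beta_variance(a, b):
    """Variance of Beta(a, b), equation (3)."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


def beta_mode(a, b):
    """Interior mode of Beta(a, b), equation (6); assumes a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)


def alpha_from_mode(beta, mode):
    """Equation (16): the alpha that gives the requested mode for a given beta."""
    if not 0 < mode < 1:
        raise ValueError("mode must lie strictly between 0 and 1")
    return mode / (1 - mode) * beta - (2 * mode - 1) / (1 - mode)


# Equations (9) and (10): a larger alpha = beta means a tighter distribution.
var_tight = beta_variance(10, 10)
var_loose = beta_variance(1.5, 1.5)

# Equation (12) is the special case mode = 0.4: alpha = 1/3 + (2/3) * beta.
beta = 4.0  # arbitrary illustrative value
alpha = alpha_from_mode(beta, 0.4)
recovered_mode = beta_mode(alpha, beta)
print(var_tight, var_loose, alpha, recovered_mode)
```

Plugging the resulting α back into the mode formula returns the mode we started from, which is a quick guard against algebra slips of the kind mentioned in the note on equation (16).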
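And, solving for β instead,

$$\gamma\beta = \alpha - 1 - \gamma\alpha + 2\gamma \tag{17}$$

$$\beta = \frac{\alpha - \gamma\alpha + 2\gamma - 1}{\gamma} = \frac{\alpha - \gamma(\alpha-2) - 1}{\gamma} = \frac{1-\gamma}{\gamma}\,\alpha - \frac{1-2\gamma}{\gamma} \tag{18}$$

This indicates that if we begin with the mode, and then take as given either α or β, we can calculate the missing parameter (β or α, as the case may be). As a result, instead of thinking of the Beta’s shape as determined by parameters α and β, sometimes it is easier to think of it in terms of the mode (most likely value) and the homogeneity.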
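To make the “mode plus homogeneity” view concrete, the sketch below fixes the mode at 0.3, as in Figure 5, and lets α grow. It is an illustration under assumptions: the function names are invented here, and the α values are chosen only because α=1.1 and α=7 match the first and last panels of Figure 5.

```python
def beta_from_mode(alpha, mode):
    """Equation (18): the beta that gives the requested mode for a given alpha."""
    if not 0 < mode < 1:
        raise ValueError("mode must lie strictly between 0 and 1")
    return (1 - mode) / mode * alpha - (1 - 2 * mode) / mode


def beta_mean(a, b):
    """Mean of Beta(a, b), equation (2)."""
    return a / (a + b)


def beta_variance(a, b):
    """Variance of Beta(a, b), equation (3)."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


mode = 0.3
rows = []
for alpha in (1.1, 3.0, 7.0):
    beta = beta_from_mode(alpha, mode)
    rows.append((alpha, beta, beta_mean(alpha, beta), beta_variance(alpha, beta)))

for alpha, beta, mean, var in rows:
    print(f"Beta({alpha:.2f}, {beta:.2f}): mean={mean:.2f}, var={var:.3f}")
```

With the mode held fixed, raising α pulls the mean toward the mode and shrinks the variance, which is the pattern the Figure 5 panels display.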
