Mon 2021-Sep-13

On the ratio of Beta-distributed random variables

Tagged: CatBlogging / Math / Statistics / TheDivineMadness

[Warning: Post contains full frontal nerdity. Bug reports appreciated!] I finally got a copy of Pham-Gia’s paper on the distribution of the ratio of 2 independent Beta-distributed random variables. While I still have some childhood trauma around hypergeometric functions like 2F1() and its even scarier big brother 3F2()… it’s time to face my fears.

The three B’s: Bernoulli, Binomial, and Beta

Suppose you flip a loaded coin that has probability p of coming up heads. That’s a Bernoulli distribution, with just 2 discrete values:

{Pr(heads)=pPr(tails)=1p

Now suppose you do that N times, and observe that heads comes up k times. That’s a binomial distribution:

Pr(k|N,p)=(Nk)pk(1p)(Nk)

Next, suppose you don’t know how much the coin is loaded. Somebody lets you take N flips, and you observe k heads.

What should you believe about p, the probability the coin comes up heads? A naïve estimate would just give the single point value p=k/N. A better approach is to regard p as a random variable, whose distribution you’re inferring from experiment. The Bayesian way to do this is to start with a prior distribution that summarizes your knowledge before the experiment. If you don’t know anything, then it’s hard to beat a uniform distribution on [0,1]. This can conveniently be done with the Beta distribution of the first kind:

Pr(p|α,β)=pα1(1p)β1B(α,β)

where the normalization is B(α,β) is the complete Beta function.

It’s pretty clear that the uniform distribution is α=1,β=1.

At that point it’s a pretty straightforward application of Bayes Rule to show that the posterior distribution for p will be Beta-distributed with parameters α=k+1 (successes + 1), and β=Nk+1 (failures + 1). So all you have to do is count the number of trials and successes to get a posterior probability distribution that completely reflects your knowledge (and uncertainty!) of p.

Why we’re interested: vaccine efficacy confidence intervals

The reason we’re interested in this is vaccine efficacy confidence intervals. (Hey, COVID-19 pandemic, right?) Basically you have Nv people enrolled in the vaccine arm of the trial, and see Iv infections. At the same time, you have Nc people enrolled in the control arm, and observe Ic infections.

The coin we flipped above is here: heads you get the disease, tails you don’t. We’d like to know how much the vaccine lowers your risk of disease.

So point estimates of the probability of infection in each arm are:

pv=Iv/Nvpc=Ic/Nc

(We’d of course like to use a posterior Beta distribution instead of a point estimate here.)

Then the vaccine efficacy E is how much the risk is lowered, compared to the unvaccinated control arm:

E=100%×pcpvpc=100%×(1pvpc)

Now if we believe that pv and pc are Beta-distributed, then given the clinical trial as a bunch of disease-catching coin flips, the vaccine efficacy is distributed as (a trivial linear function of) the ratio of a couple of independent Beta variables.

Ok, so what’s the distribution of a ratio of independent Beta variables? There are a variety of ways to approach this, and we’ll compare several of them in an upcoming post. For now, we’re going to fight our way through a paper which gives the exact Bayesian result.

Exact solution: the probability distribution function (PDF) of a ratio of 2 Beta-distributed random variables

We’ll look specifically at the application of all this to the case of vaccine efficacies in a later post. For now, let’s just examine the mathematical question of what the distribution is for the ratio of two Beta-distributed variables. The exact solution was published in 2000 by Pham-Gia. [1] It lives behind an execrable paywall, and was thus devilishly difficult to acquire without paying ransom. Fortunately, I know people who know people who know people; somebody or other in that chain found it or had institutional access, and passed it back up the chain. Whoever you are, my thanks.

The problem

Consider 2 Beta-distributed variables p1 and p2:

Pr(p1)=pα111(1p1)β11B(α1,β1)Pr(p2)=pα212(1p2)β21B(α2,β2)

We then ask: if we compute their ratio R=p1/p2, then what are the PDF & CDF of Pr(R)? Ifwe knew the answer, particularly the CDF (or even better the quantile function, which is the functional inverse of the CDF!), we could calculate 95% confidence intervals on the ratio.

Ranges

It’s important to understand the ranges of each of the variables p1, p2, and R so that as we transform variables we don’t accidentaly step outside reality. This will help us keep the integration limits straight. Because p1 and p2 are from the standard Beta distribution, we have:

0p1,p21

(Usually these are proportions or probabilities, so we certainly want to stay in [0,1]!)

R, on the other hand, is a bit more gnarly. Since both p1 and p2 are non-negative, then surely R is also non-negative, i.e., 0 is a lower bound. But the denominator p2 can of course be arbitrarily near 0, so if at the same time the numerator p1 is finite (stepping carefully around the land mine at 0/0), then the upper bound must be infinite:

0R+

(Values of R>1 will, when we apply this to vaccine efficacies, result in negative efficacies. Is that meaningful? Yes: your vaccine could make things worse, making the vaccinees more susceptible to disease… which is surely negative efficacy, no?)

Changes of variables

Our strategy is to start with the joint density Pr(p1,p2) and then do various tortured changes of variables to eliminate one of p1 or p2 in favor of R, and express the residual integral in terms of a hypergeometric function 2F1().

How complicated can it be? Well, there’s lots of detail coming up, but really it comes down to the fact that with 3 variables p1,p2,R there are only 2 ways to eliminate one of them in favor of R. Either we substitute out p1 in favor of p2,R via:

p1=p2Rdp1=p2dR

This is appropriate for 0R1, since if R1 it could force a value of p1>1, which takes us out of its allowed range.

Or we substitute out p2 in favor of p1,R via:

p2=p1Rdp2=p1R2dR

This is correspondingly appropriate for 1<R, as it guarantees p21, as the range requires. (We’ll eventually lose the minus sign, either taking absolute value of Jacobians, or more reasonably, keeping careful track of the limits of integeration and swapping them when appropriate to cancel a minus sign.)

So we’ll need to do both transformations, piecewise over the range of R.

Double the workload. Le sigh… who coulda seen that coming?

Reading off the distribution of R

I like to do these changes of variable by looking at the normalization integral for the joint distribution. That way, as you change variables, the integration measure will force you to pick up the Jacobian properly. The joint distribution of p1 and p2 is, since they’re assumed independent, just the product of their individual distributions. So the normalization integral is pretty straightforward to write down:

1=1B(α1,β1)B(α2,β2)10dp110dp2pα111(1p2)β11pα212(1p2)β21

Next, we’ll use both the transformations above to get the integral in 2 pieces, one using (p2,R) and another using (p1,R). The first will involve an integral over R from 0 to 1, while the second will integrate from 1 to +. Then we’ll do a little razzle-dazzle high school algebra to pull the R integrations to the left, and thus be able to read off the distribution of R. It’ll be a piecewise function, with one piece for 0R1 and another for R>1:

1=1B(α1,β1)B(α2,β2)[10dp210dRp2(Rp2)α11(1Rp2)β11pα212(1p2)β21+10dp1+1dRp1R2pα111(1p1)β11(p1R)α21(1p1R)β21]=1B(α1,β1)B(α2,β2)[10dRRα1110dp2pα1+α212(1p2)β21(1Rp2)β11++1dR1Rα2+110dp1pα1+α211(1p1)β11(1p1R)β21]

From this we can directly read off the PDF for R, piecewise for 0R1 and similarly for R>1, respectively from each of the 2 terms:

{Pr(R|0R1)=1B(α1,β1)B(α2,β2)Rα1110dp2pα1+α212(1p2)β21(1Rp2)β11Pr(R|R>1)=1B(α1,β1)B(α2,β2)1Rα2+110dp1pα1+α211(1p1)β11(1p1R)β21

These still contain a residual p-integral each, but we’ll see next how to interpret those in terms of the hypergeometric function 2F1() with various tortured arguments.

Hypergeometric functions

Ok, so what’s this hypergeometric thingummy I’ve been mumbling about? I approach this subject with some caution, due to some childhood trauma around hypergeometric functions. [2]

Hypergeometric functions got their start in the 1600s, but hit it big in the 1800s with major players like Euler, Gauss, Riemann, and Kummer coming up for bat. They’re at once mind-numbingly complex (at least to me, since they’re kind of my kryptonite) with a bajillion special cases, and amazingly versatile. You can express almost all the special functions of theortical physics (Bessel functions and the like) in terms of hypergeometric functions.

So there’s a temptation: if you can just learn everything about hypergeometric functions, you can master nearly all of 19th century physics. The bug, of course, is that nobody can master all of the lore of hypergeometric functions. Least of all me!

Like most magical artifacts, they are best approached by wizards and avoided by mundanes. I am not a wizard in these matters, and thus approach with some fear and considerable respect for the virtue of keeping one’s fingers out of the gears.

As a matter of definition, in the unit disk |z|<1 in the complex plane, the hypergeometric function of interest at the moment is defined by a power series (and by analytic continuation outside the disk, stepping carefully around the branch points at 1 and infinity):

2F1(a,b;c;z)=+n=0(a)n(b)n(c)nznn!=1+abcz1!+a(a+1)b(b+1)c(c+1)z22!+

If c is a non-positive integer, it’s undefined or infinite. The funny parenthetical dingus is the rising Pochammer symbol:

(q)n={1n=0q(q+1)(q+n1)n>0=Γ(q+n)Γ(q)

An interesting special case for us will be when a or b are nonpositive integers (as with counts in a clinical trial), in which case the Pochammer symbols eventually include a 0 and the series thus terminates, resulting a polynomial. True, it will be a polynomial of very high order (say 15,000 participants in a trial arm), but a polynomial nonetheless!

That’s all… fine, if you like that sort of thing. But what does it have to do with the integrals we have to do above? Well, it turns out that 2F1() has an integral representation, as well, apparently due to Euler:

2F1(a,b;c;x)=1B(a,ca)10duua1(1u)ca1(1xu)b

This is the trick that Pham-Gia used to get the density distribution in closed form (at least, if you regard 2F1() as “closed”…), by recognizing the integrals above as special cases of this.

Expressing the residual p-integrals in hypergeometric terms

Basically we take the above equations for Pr(R) with residual integrals , and recognize that the annoying integral in them can, with a suitable change of variables, be made identical to the integral representation of 2F1().

Case 0R1:

The dictionary of variables to recognize the hypergeometric integral is:

u=p2a=α1+α2b=1β1c=α1+α2+β2

That gives the final result for small R of:

Pr(R|0R1)=B(α1+α2,β2)B(α1,β1)B(α2,β2)Rα112F1(α1+α2,1β1;α1+α2+β2;R)

And that’s Pham-Gia’s first result on p. 2698.

Case 1R:

Here the dictionary is slightly different:

u=p1a=α1+α2b=1β2c=α1+α2+β1

That gives the final result for large R of:

Pr(R|1<R)=B(α1+α2,β1)B(α1,β1)B(α2,β2)1Rα2+12F1(α1+α2,1β2;α1+α2+β1;1/R)

And that’s Pham-Gia’s second result on p. 2699.

To summarize the PDF result:

{Pr(R|0R1)=B(α1+α2,β2)B(α1,β1)B(α2,β2)Rα112F1(α1+α2,1β1;α1+α2+β2;R)Pr(R|1<R)=B(α1+α2,β1)B(α1,β1)B(α2,β2)1Rα2+12F1(α1+α2,1β2;α1+α2+β1;1/R)

Moments of the ratio, including the mean

We can also directly calculate the moments of R (where the 1st moment is of course the familiar mean). We do this not by heroic integration against the distribution above, but from the properties of the Beta-distributed p1,p2 that go into the ratio R.
Because p1 and p2 are statistically independent, the moment integral factors into 2 separate pieces:

E[Rk]=10dp110dp2(p1p2)kpα111(1p1)β11B(α1,β1)pα212(1p2)β21B(α2,β2)=10dp1pk1pα111(1p1)β11B(α1,β1)10dp21pk2pα212(1p2)β21B(α2,β2)=B(α1+k,β1)B(α1,β1)10dp1pα1+k11(1p1)β11B(α1+k,β1)1B(α2k,β2)B(α2,β2)10dp2pα2k12(1p2)β21B(α2k,β2)1=B(α1+k,β1)B(α1,β1)B(α2k,β2)B(α2,β2)

where the integrals have each been recognized as the normalization integral of a Beta distribution, and hence are 1. The remaining Beta functions can be simplified by expanding into Gamma functions, and using the Gamma recurrence relation:

B(α,β)=Γ(α)Γ(β)Γ(α+β)Γ(n+1)=nΓ(n)

So we then get:

E[Rk]=Γ(α1+k)Γ(β1)Γ(α1+β1+k)Γ(α1+β1)Γ(α1)Γ(β1)Γ(α2k)Γ(β2)Γ(α2+β2k)Γ(α2+β2)Γ(α2)Γ(β2)=Γ(α1+k)Γ(α1)Γ(α1+β1)Γ(α1+β1+k)Γ(α2k)Γ(α2)Γ(α2+β2)Γ(α2+β2k)=(α1)k(α1+β1)kΓ(α2k)Γ(α2)Γ(α2+β2)Γ(α2+β2k)

where we’ve recognized in the first 2 Gamma ratios the rising Pochammer symbols defined above. The remaining 2 Gamma ratios will require a bit of thought, but unsurprisingly they turn out to be expressible in terms of Pochammer symbols as well:

Γ(ak)Γ(a)=Γ(ak)(a1)Γ(a1)=Γ(ak)(a1)(a2)Γ(a2)=Γ(ak)(a1)(a2)(ak)Γ(ak)=1(ak)k

So our final expression for the kth moment of R is:

E[Rk]=(α1)k(α1+β1)k(α2+β2k)k(α2k)k

In particular, the case k=0 gives us the correct answer of 1 for the 0th moment, and the case k=1 gives us the mean of the ratio distribution:

E[R]=α1α1+β1α2+β21α21

(NB: The median is a bit more interesting than the mean when the distribution is highly skewed, but we couldn’t figure out a closed form result for the median. We’ll just have to be satisfied with using the CDF below and a bit of numerics to find the 50% quantile.)

Continuity at R=1

Pham-Gia did not address in his paper whether the 2 different expressions for Pr(R) matched up at R=1, i.e., that the probability distribution is continuous. We can show that the above expressions for Pr(R|0R1) and Pr(R|1<R) are equal in the left and right limits approaching R=1, establishing continuity at that point.

We need 2 identities:

2F1(a,b;c;1)=Γ(c)Γ(cab)Γ(ca)Γ(cb)
  • Also, we need to decompose complete Beta functions into Gamma functions, via the standard identity we all learned at our mother’s knees:
B(a,b)=Γ(a)Γ(b)Γ(a+b)

Case 0R1:

Pr(R|0R1)R1B(α1+α2,β2)B(α1,β1)B(α2,β2)Γ(α1+α2+β2)Γ(β1+β21)Γ(β2)Γ(α1+α2+β1+β21)=1B(α1,β1)B(α2,β2)Γ(α1+α2)Γ(β2)Γ(α1+α2+β2)Γ(α1+α2+β2)Γ(β1+β21)Γ(β2)Γ(α1+α2+β1+β21)=1B(α1,β1)B(α2,β2)Γ(α1+α2)Γ(β1+β21)Γ(α1+α2+β1+β21)=B(α1+α2,β1+β21)B(α1,β1)B(α2,β2)

Case 1R:

Pr(R|0R1)R1+B(α1+α2,β1)B(α1,β1)B(α2,β2)Γ(α1+α2+β1)Γ(β1+β21)Γ(β1)Γ(α1+α2+β1+β21)=1B(α1,β1)B(α2,β2)Γ(α1+α2)Γ(β1)Γ(α1+α2+β1)Γ(α1+α2+β1)Γ(β1+β21)Γ(β1)Γ(α1+α2+β1+β21)=1B(α1,β1)B(α2,β2)Γ(α1+α2)Γ(β1+β21)Γ(α1+α2+β1+β21)=B(α1+α2,β1+β21)B(α1,β1)B(α2,β2)

These 2 expressions being identical, we have established continuity at R=1.

Smoothness at R=1

We’d like to believe that in addition to being continuous at R=1, the PDF is also smooth, i.e., some rather large number of derivatives are also continuous. There is no particular reason to expect a kink in the PDF here, so it would be nice to know that our piecewise representation of the PDF has not introduced a kink.

This requires differentiating 2F1(a,b;c;z) with respect to z. One can stare at Wikipedia’s hypergeometric function differentiation formulas, or just differentiate the power series above to get the same answer: the derivative of a hypergeometric function is a constant times another hypergeometric function, with +1 added to the parameters:

ddz2F1(a,b;c;z)=abc×2F1(a+1,b+1;c+1;z)d2dz22F1(a,b;c;z)=a(a+1)b(b+1)c(c+1)×2F1(a+2,b+2;c+2;z)dndzn2F1(a,b;c;z)=(a)n(b)n(c)n×2F1(a+n,b+n;c+n;z)

… where the last expression for the derivative in the general case again uses the rising Pochammer symbols, just as above in the series definition of 2F1(a,b;c;z).

While it’s tempting to do the general case of the nth derivative to show it’s C smooth, we’ll content ourselves here with just the first derivative and the knowledge there’s no kink at R=1.

We’ll assemble the goods from 6 identities for the piecewise definition of our distribution, how to differentiate it, a formula for the value at unity of 2F1(;1), and some lore of B() and Γ functions, all assemblere here in one spot for quick reference:

Pr(R|0R1)=B(α1+α2,β2)B(α1,β1)B(α2,β2)Rα112F1(α1+α2,1β1;α1+α2+β2;R)Pr(R|1<R)=B(α1+α2,β1)B(α1,β1)B(α2,β2)1Rα2+12F1(α1+α2,1β2;α1+α2+β1;1/R)ddz2F1(a,b;c;z)=abc×2F1(a+1,b+1;c+1;z)2F1(a,b;c;1)=Γ(c)Γ(cab)Γ(ca)Γ(cb)B(α,β)=Γ(α)Γ(β)Γ(α+β)Γ(n+1)=nΓ(n)

Case 0R1:

ddRPr(R|0R1)=B(α1+α2,β2)B(α1,β1)B(α2,β2)×ddR[Rα112F1(α1+α2,1β1;α1+α2+β2;R)]=B(α1+α2,β2)B(α1,β1)B(α2,β2)×[(α11)Rα122F1(α1+α2,1β1;α1+α2+β2;R)+Rα11(α1+α2)(1β1)α1+α2+β22F1(α1+α2+1,2β1;α1+α2+β2+1;R)]R1B(α1+α2,β2)B(α1,β1)B(α2,β2)×[(α11)Γ(α1+α2+β2)Γ(β1+β21)Γ(β2)Γ(α1+α2+β1+β21)+(α1+α2)(1β1)α1+α2+β2Γ(α1+α2+β2+1)Γ(β1+β22)Γ(β2)Γ(α1+α2+β1+β21)]=B(α1+α2,β2)B(α1,β1)B(α2,β2)×Γ(α1+α2+β2)Γ(β1+β22)Γ(β2)Γ(α1+α2+β1+β21)×[(α11)(β1+β22)+(α1+α2)(1β1)]=1B(α1,β1)B(α2,β2)×Γ(α1+α2)Γ(β2)Γ(α1+α2+β2)×Γ(α1+α2+β2)Γ(β1+β22)Γ(β2)Γ(α1+α2+β1+β21)×[(α11)(β1+β22)+(α1+α2)(1β1)]=B(α1+α2,β1+β21)B(α1,β1)B(α2,β2)×(α11)(β1+β22)+(α1+α2)(1β1)β1+β22=B(α1+α2,β1+β21)B(α1,β1)B(α2,β2)×α1β2α2β1+α2α1β1β2+2β1+β22=B(α1+α2,β1+β21)B(α1,β1)B(α2,β2)×[α1β2α2β1+α2α1β1+β221]

Case 1R:

ddRPr(R|1<R)=B(α1+α2,β1)B(α1,β1)B(α2,β2)×ddR[1Rα2+12F1(α1+α2,1β2;α1+α2+β1;1/R)]=B(α1+α2,β1)B(α1,β1)B(α2,β2)×[α2+1Rα2+22F1(α1+α2,1β2;α1+α2+β1;1/R)+1Rα2+1(α1+α2)(1β2)α1+α2+β12F1(α1+α2+1,2β2;α1+α2+β1+1;1/R)(1R2)]R1+B(α1+α2,β1)B(α1,β1)B(α2,β2)×[(α2+1)Γ(α1+α2+β1)Γ(β1+β21)Γ(β1)Γ(α1+α2+β1+β21)(α1+α2)(1β2)α1+α2+β1Γ(α1+α2+β1+1)Γ(β1+β22)Γ(β1)Γ(α1+α2+β1+β21)]=B(α1+α2,β1)B(α1,β1)B(α2,β2)×Γ(α1+α2+β1)Γ(β1+β22)Γ(β1)Γ(α1+α2+β1+β21)×[(α2+1)(β1+β22)(α1+α2)(1β2)]=1B(α1,β1)B(α2,β2)×Γ(α1+α2)Γ(β1)Γ(α1+α2+β1)×Γ(α1+α2+β1)Γ(β1+β22)Γ(β1)Γ(α1+α2+β1+β21)[(α2+1)(β1+β22)(α1+α2)(1β2)]=B(α1+α2,β1+β21)B(α1,β1)B(α2,β2)×(α2+1)(β1+β22)(α1+α2)(1β2)β1+β22=B(α1+α2,β1+β21)B(α1,β1)B(α2,β2)×α1β2α2β1+α2α1β1β2+2β1+β22=B(α1+α2,β1+β21)B(α1,β1)B(α2,β2)×[α1β2α2β1+α2α1β1+β221]

Those two derivative expressions being identical, we have established equality of the first derivatives at R=1, so our distribution is first-order smooth.

Ok, but what about the cumulative distribution function?

That gives us the PDF (probability distribution function); if we want the CDF (cumulative distribution function) to calculate quantiles, we’ll have to go beyond Pham-Gia’s paper. That’s the accumulated probability from 0 to R0, Pr(<R0), which we get by integrating. Since the PDF is piecewise, so is the CDF. We get the piece for R<R0 by integrating from 0 to R0, and the piece for R0>1 by integrating back from + to R0, and subtracting from 1:

Pr(<R0|0R01)=R00dRPr(R|0R1)=B(α1+α2,β2)B(α1,β1)B(α2,β2)R00dRRα112F1(α1+α2,1β1;α1+α2+β2;R)Pr(<R0|1<R0)=1+R0dRPr(R|1<R)=1B(α1+α2,β1)B(α1,β1)B(α2,β2)+R0dR1Rα2+12F1(α1+α2,1β2;α1+α2+β1;1/R)

That leaves us with the riddle of how to integrate powers times 2F1()’s, and the inverse version of that (which is probably equivalent to the first integral, after a change of variables).

We address the first integral by hitting up the power series:

y<10dxxd2F1(a,b;c;x)=y<10dxxd+n=0(a)n(b)n(c)nxnn!=+n=0(a)n(b)n(c)n1n!y<10dxxn+d=+n=0(a)n(b)n(c)n1n!xn+d+1n+d+1|y<10=yd+1d+1+n=0(a)n(b)n(c)nd+1n+d+1ynn!=yd+1d+1+n=0(d+1)n(a)n(b)n(d+2)n(c)nynn!=yd+1d+13F2(d+1,a,b;d+2,c;y)

where the lower limit of the integral vanishes if d>1 (as assumed here), has a constant from 3F2(0) if d=1, and has a pole if d<1.

We’ve recognized in the series the generalized hypergeometric function 3F2(). The subscripts remind us that there are 3 Pochammer symbols in the numerator and 2 in the denominator, vs 2 in the numerator and 1 in the denominator for 2F1().

Ok, we have a little fear and trembling at the sight of 3F2() (having summoned up that which which we might not be able to put down). Nonetheless, we swallow our fears and proceed recklessly to the second integral via a change of variables:

x=1/ux=du/u2

which turns out to just transform this back into the first case:

+y>1dx1xd2F1(a,b;c;1/x)=(1/y)<10du1u2ud2F1(a,b;c;u)=(1/y)<10duud2ud2F1(a,b;c;u)=ud1d13F2(d1,a,b;d,c;u)|(1/y)<10=1(d1)yd13F2(d1,a,b;d,c;1/y)

where the lower limit vanishes at 0 if d>1 (as assumed here), is a constant from if 3F2(0) if d=1, and has a pole if d<1.

Armed with those 2 hypergeometric integrals, we can now read off the piecewise CDF from the definitions above:

Pr(<R0|0R01)=R00dRPr(R|0R1)=B(α1+α2,β2)B(α1,β1)B(α2,β2)R00dRRα112F1(α1+α2,1β1;α1+α2+β2;R)=B(α1+α2,β2)B(α1,β1)B(α2,β2)Rα10α13F2(α1,α1+α2,1β1;α1+1,α1+α2+β2;R0)R000Pr(<R0|1<R0)=1+R0dRPr(R|1<R)=1B(α1+α2,β1)B(α1,β1)B(α2,β2)+R0dR1Rα2+12F1(α1+α2,1β2;α1+α2+β1;1/R)=1B(α1+α2,β1)B(α1,β1)B(α2,β2)1α2Rα203F2(α2,α1+α2,1β2;α2+1,α1+α2+β1;1/R0)R0+1

The appropriate limits as R00 and as R0+ are manifest, due to the way we structured the integrals.

However, since we have a piecewise CDF, we have to show it’s continuous at the piece boundary at R=1. (It must be, since it’s the integral of the PDF which we showed above is continuous and first-order smooth. We just want to be sure!)

That amounts to proving the following equality, joining the values of the CDF from below and above 1:

B(α1+α2,β2)B(α1,β1)B(α2,β2)1α13F2(α1,α1+α2,1β1;α1+1,α1+α2+β2;1)+B(α1+α2,β1)B(α1,β1)B(α2,β2)1α23F2(α2,α1+α2,1β2;α2+1,α1+α2+β1;1)=1

So we need to hunt down some identities for 3F2(1) at various parameter values. We have not as yet figured out how to do that. …TBD…

To summarize the CDF result:

{Pr(<R0|0R01)=B(α1+α2,β2)B(α1,β1)B(α2,β2)Rα10α13F2(α1,α1+α2,1β1;α1+1,α1+α2+β2;R0)Pr(<R0|1<R0)=1B(α1+α2,β1)B(α1,β1)B(α2,β2)1α2Rα203F2(α2,α1+α2,1β2;α2+1,α1+α2+β1;1/R0)

A Second Opinion

Now, it turns out that Julian Saffer has already worked this out, and what’s more put a library in Python on Github. [3] Now, I’m not so much with the Python; I’m more of an R guy. But let’s have a look.

In Saffer’s notation, what we call the ratio R he calls w. His integrals agree with ours:

"Saffer: integral of 2F1() times power" "Saffer: integral of 2F1() times inverse power"

Also, his use of those integrals gets a piecewise CDF which is also identical to ours. For 0w1:

"Saffer: CDF for 0 < w < 1"

And for w>1:

"Saffer: CDF for w > 1"

I’m a bit suspicious of his Python code, since:

  • Hey, I’m an R guy. Of course I’m a bit uneasy with Python stuff.
  • He calculates things in log space and then exponentiates. This makes some sense, to avoid large factorials, but it also exponentiates roundoff errors in a numerically unstable way. Better to use a recursion relation on the series coefficients, and transformations to get the argument to be as small as possible on the real line. (We’re not so concerned about the complex plane here.)
  • He does not address what happens when the parameters a,b,c of the hypergeometric functions become large. To analyze the Pfizer or Moderna vaccine trials, these can be order of 10s of thousands! Clearly some sort of recurrence relation is needed to lower the order there.

Saffer: example of 2 beta distributions and their ratio distribution Us: same example of 2 beta distributions and their ratio distribution But we can test against his Python code on a low order example and see if we agree. Fortunately, Saffer provides exactly such an example. (This is how good science proceeds, by seeing if independent assessment agree.)

Saffer’s repository shows a graph with an example of a numerator Beta distribution with α1=3,β1=6 and a denominator Beta distribution with α1=12,β1=7. These values won’t trigger any of our concerns about large-parameter evaluation of hypergeometric or generalized hypergeometric functions. So let’s compare.

The top graph here is from Saffer’s repository. He shows:

  • the PDF of the numerator, its 90% confidence interval, and its mean (blue),
  • the PDF of the denominator, its 90% confidence interval, and its mean (orange),
  • the PDF of the ratio, its 90% confidence interval, and its mean (green),
  • the CDF of the ratio (red)

The second graph here uses our formulae above and a naive implementation using the R package hypergeo, to reproduce the graph as best we can. [4]

We note with some satisfaction that the graphs are more or less identical. Apparently we’re calculating the same thing: we may be wrong, but if so, we’re wrong together. So, at least for relatively smallish values of the hypergeometric parameters a,b,c we agree. It will be another matter to make this practical for numeric stability for large values of a,b,c.

Computational realization for practical use

At some point soon, I’d like to implement this in R. There are some gnarly issues with numerical roundoff. Even though the hypergeometric series terminates as a polynomial for integer β’s, one simply cannot naïvely compute a polynomial of order 15,000 for a large clinical trial and expect to get anything other than nonsense!

That’s work for another day.

The Weekend Conclusion

Weekend Publisher, mid-critique, providing purr review In a spirit of proper collegiality, I wish to acknowledge warmly the assistance of the Weekend Publisher at several points in this analysis. He provided encouragement when I wanted to give up. He is shown here in mid-critique, providing valuable purr review.

That acknowledgement having been made, we’ve worked our way through the relevant parts of Pham-Gia’s paper (though there’s a lot more there!), and confirmed the primary result that the PDF of the ratio of 2 independent Beta-distributed random variables is a variety of hypergeometric function 2F1().

Somewhat beyond Pham-Gia’s paper, we’ve also proven the continuity of the PDF at R=1, i.e., that the expression for 0R1 and the one for R>1 match up at R=1.

We’ve also derived the CDF in a similarly piecewise way, to be used for calculating quantiles. Our results match those of Saffer, so we’re probably on the right track here.

However, there are several things we haven’t done:

  • While we’ve shown continuity and first-order smoothness (non-kink) at R=1, we suspect a much stronger condition of C smoothness holds. To prove that, we’d have to match all derivatives, but we’ve only done orders 0 and 1 here.
  • While we have the CDF function, we haven’t demonstrated continuity at R=1. That requires some identities about 3F2(1) for various parameter values.
  • We also haven’t wrestled with all the numeric issues of implementing this, say in R. However, for relatively small values of the parametes, we’ve managed to reproduce convincingly the example Saffer reports in his repository.
  • Furthermore, we haven’t investigated the quantile function (functional inverse of the CDF) which would let us read off quantiles directly. Looking at Saffer’s code, he hasn’t either: he’s using Newton’s method to solve the relevant equation directly from the CDF, which is totally fair.

So we’ve got a bit more work to do to make this useable in a practical sense.


Notes & References

1: T Pham-Gia, “Distributions of the ratios of independent beta variables and applications”, Comm Stat: Theory & Methods, 29:12, 2693-2715. DOI: 10.1080/03610920008832632. Since it was so hard to get, I’ve archived a copy here.

NB: We believe there are several errata in this paper which make it much harder to read than need be. We’ve worked through the details, and with these corrections, obtain the same eventual result in terms of 2F1(). Specifically:

  • p. 2696, in the definition of the Pochammer coefficients: K should be
  • p. 2698, in the equation for the marginal density f(w):
    • The upper limit of integration should be +1, not +
    • The exponent of (1x2) should be β21, not β2
  • p. 2698, in the integral for f(w) for 0w1:
    • the integration measure is missing, and should be dx2
    • in the rightmost expression, the exponent of (1x2) should again be β21, not β2
  • p. 2703, in the variables for a Dirichlet distribution, K should again be

While there may or may not be similar typos (almost certainly due to journal typesetting, not Pham-Gia, who seems to be a pretty good guy!) in the rest of the paper, we haven’t checked carefully since it does not bear directly on our interests. But with the corrigenda above, we were able to reproduce Pham-Gia’s main result, the piecewise PDF on pp. 2698-2699.

2: OK, the truth is: I was actually a mere 23 years old and in my first year of physics grad school at MIT. I got wrapped around the axle pretty tight, because the notation between various texts was subtly and confusingly different. I thought I’d suddenly become stupid! It took years to get past that. Now, even 45 years later, it’s still a sensitive spot. But… time to face my fears.

3: J Saffer, “Beta Quotient Distribution”, GitHub Repository, last committed 2020-Jun-06, retrieved 2021-Sep-13.

4: Weekend Editor, R script for naive comparison with J Saffer’s example, Some Weekend Reading blog, 2021-Sep-13.

Published Mon 2021-Sep-13

Gestae Commentaria

Comments for this post are closed pending repair of the comment system, but the Email/Twitter/Mastodon icons at page-top always work.