Copyright (c) 2001 by Paul Godfrey Intersil Corp. Palm Bay, FL 32940 (321) 729-5359 pgodfrey@intersil.com A note on the computation of the convergent Lanczos complex Gamma approximation. ------------------------------------------- Abstract The convergent approximation of the Gamma function as developed by Lanczos is examined. It is found that the coefficients in the series expansion can be written as the product of 4 matrices. This allows greater accuracy in computing the function. The coefficients of various 2 parameter implementations were computed and the best approximation was found. A 15 term expansion was found that provides an accuracy of about 15 significant digits along the real axis and about 13 digits in the rest of the complex plane. I. Equation Reformulation Lanczos (1) has derived a convergent approximation for the Gamma function valid in the complex half plane Re(z)>0. That, coupled with the reflection formula pi*z (-z)! = ------------ z!*sin(pi*z) allows the Gamma function to be determined everywhere in the complex plane. In (1) Lanczos gives equations 32 through 35 to determine the coefficients in the expansion. This is a convergent approximation instead of the more widely known asymptotic approximation of Stirling (2) (6). Lanczos gives error bounds for g=1 through 5 with the greatest accuracy given as |e| < 2*10E-10 for g=5 using a 7 term expansion. The equations that Lanczos (1) gives; (32) z! = sqrt(2*pi)*(z+g+1/2)^(z+1/2)*e^-(z+g+1/2)*Ag(z) 1 z z*(z-1) (33) Ag(z) = p(0)*- + p(1)*----- + p(2)*----------- + ... 2 z + 1 (z+1)*(z+2) (34) p(k) = sum_from n=0 to k of ( C(2k,2n)*F(g,n) ) (35) F(g,n) = sqrt(2)/pi * (n-1/2)!*(n+g+1/2)^-(n+1/2)*e^(n+g+1/2) with g a small scalar and k a positive integer are subject to roundoff error if computed as written by Lanczos. The values of g and k allow an infinitely sized 2 parameter family of implementations of equation 32. Lanczos and Spouge (3) examined, essentially, the 1 parameter family of implementations along the line k = floor(g) + 1 Much greater accuracy can be achieved by examining the 2 parameter solution space 0< g <16 and 0< k <20 with g and k selected to provide the greatest accuracy when implemented in a double precision programming language. The error bounds derived by Spouge in (3) for his approximate Lanczos coefficients are somewhat pessimistic since we are calculating the exact Lanczos coefficients. For accuracy better than the best error (2*10E-10) that Lanczos gives, more thought must be given as to how to compute equations 32 through 35. The widespread availability of programs such as Matlab, Maple, and Mathematica allowing double precision (or better) accuracy as well as vector operations provides motivation for a highly accurate, vectorized computation. Worked examples will use k=6 and g=5, the largest used in (1) and in (4). Equation (35) can be written as (2*n-1)!! e^(n+g+1/2) F(g,n) = sqrt(2/pi) * ------------------------ n=0,1 ..., k 2^n (n+g+1/2)^(n+1/2) If we bring in the sqrt(2*pi) from equation 32 and cancel the common factor exp(g) then further manipulation yields, (n + 1/2) ( e ) F(g,n) = 2*(2n-1)!! * sqrt(2)*(----------) n=0,1, ..., k (2(n+g) + 1) we define Dc(n+1,n+1) = 2*(2n-1)!! n=0,1,...,k as a diagonal matrix. For g=5 and k=6 this is Dc = diag([2 2 6 30 210 1890 20790]) We then have F(g,n) = Dc(n)*f(g,n) with (n + 1/2) ( e ) (36) f(g,n) = sqrt(2)*(----------) n=0,1, ..., k (2(n+g) + 1) Equation 32 then becomes, (z + 1/2) (z + g + 1/2) (37) z! = (-----------) * Ag'(z) ( e ) with a modified Ag'(z) function. By canceling the common exp(g) factor and bringing the sqrt(2Pi) into the definition of f(g,n) we have scaled the coefficients c and p in (1) by W = exp(g)/sqrt(2Pi) Comparisons of the a and r terms in this paper with the c and p terms in the Lanczos (1) paper show that c = W*a and p = W*r It is worth noting that the computations with f(g,n) are the only floating point calculations required when computing the coefficients. For g=5, k=6 we have f(g,n) as a (k+1,1) sized vector: T f(g,n) ~= [703 135.2 19.77 2.311 0.2241 0.01849 0.001325]/1000 with W ~= 59.208284133962556 Equation (34) is recognized as an inner product multiply operation or row*vector multiply. The coefficients C(2k,2a) are the coefficients of the Chebyshev polynomials of the first kind as found in the even columns of Table 22.3 of (2) page 795. The coefficients come from the expansion of cos(2k*x) in terms of cos(x). Or, more concisely, C(1,1) = 1/2 and { (-1)^(row+col)*4^(col-1)*(row-1)*(row+col-3)! { ---------------------------------------------- C(row,col) = { (row-col)!*(2*col-2)! { for row=2 to (k+1) and col=1 to row { 0 otherwise By defining C(1,1) to be 1/2 we have transfered the 1/2 scaling factor of Lanczos' p(0) term, that first appears in his equation #23, into the C matrix. This allows us to refer to all of the p(k) elements in the same manner. For g=5 this becomes: [0.5 0 0 0 0 0 0] [-1 2 0 0 0 0 0] [ 1 -8 8 0 0 0 0] C(2k,2a) = [-1 18 -48 32 0 0 0] = C [ 1 -32 160 -256 128 0 0] [-1 50 -400 1120 -1280 512 0] [ 1 -72 840 -3584 6912 -6144 2048] Thus, r(k) = C*Dc*f(g,n) = p(k)/W Where Dc is seen to scale the columns of C. Equation (33) must first be expanded into its constituent partial fractions, that is: z(z-1) 2 -6 ----------- = 1 + ----- + ----- (z+1)*(z+2) z + 1 z + 2 etc. The partial fraction expansion is accomplished with the aid of the Binomial coefficients as found in Table 24.1 of (2) page 828. The resulting expansion actually uses scaled Binomial coefficients. Accordingly, we compute (33) as a diagonal scaling matrix multiplied by a Binomial matrix as: Ag(z) = Z*Dr*B*r(k) where 1 1 1 Z = [1 ----- ----- ----- ...] z + 1 z + 2 z + 3 and the diagonal entries of D are, essentially, -Sloane sequence A002457 (7), which gives: Dr(1,1) = 1 -(2n+2)! Dr(n+2,n+2) = ---------- n=0, ... ,k-1 2*n!(n+1)! which is Dr = diag([1 -1 -6 -30 -140 -630 -2772]) and finally, [1 1 1 1 1 1 1] [0 1 -2 3 -4 5 -6] [0 0 1 -4 10 -20 35] B = [0 0 0 1 -6 21 -56] [0 0 0 0 1 -8 36] [0 0 0 0 0 1 -10] [0 0 0 0 0 0 1] where the rows of B come from the odd columns of Table 24.1 of (2) page 828 with alternating signs. Or more concisely, { 1 for row=1 B(row,col) ={ (-1)^(col-row)*Binomial(col+row-3, row+row-3) for row=2,...,k+1 and col=row,...,k+1 { 0 otherwise Ag(z) is thus found to be: Ag(z) = Z*Dr*B*C*Dc*f(g,n) which has dimensions (for k=6) of (1,7)*(7,7)*(7,7)*(7,7)*(7,7)*(7,1) In terms of the coefficients of Z, a = Dr*B*C*Dc*f(g,n) = Dr*B*r(k) = c/W which need be pre-computed only once. We compute and define (38) M = (Dr*B)*(C*Dc) first and find that all of its elements are integers. The factoring of the Lanczos equations into a product of 4 precomputed integer matrices allows the elimination of most, but not all, of the roundoff errors that would have occurred in computing the coefficients if we had not done so. Even so, the final computation of the coefficients (39) a = (Dr*B*C*Dc)*f(g,n) = (Dr*B)*r(k) = M*f(g,n) needs to be done in at LEAST quadruple precision to maintain double precision accuracy in c. For example, compare the Gamma implementations in the 1st and 2nd editions of (4). For g=5 and k=6 the above will generate slightly more accurate coefficients, c, than found in (1) and (4). a = M*f(g,n) T a ~= [0.01689 1.2866 -1.461 0.4055 -0.02080 2.0413E-05 -9.1123E-08] The values in Dr, B, C, and Dc are not functions of g or z. Indeed, they are only a function of k in that the size of Dr, B, C, and Dc is k+1 by k+1. To compute (Dr*B*C*Dc) one merely adjusts the sizes of Dr, B, C, and Dc beforehand to be k+1 by k+1. Then, using formulas 36 through 39, the values of c are easily computed. Thus, the approximate formula of Spouge (3) is not needed. Dr serves as a scaling matrix for the rows of B and Dc serves as a scaling matrix for the columns of C. We could have initially defined the B and C matrices to include this row and column scaling to avoid actually multiplying the Dr*B and C*Dc matrices together. Defining S = Z*a The final expression for the convergent Lanczos complex Gamma approximation is (z + 0.5) (z + g + 0.5) Gamma(z+1) = (-----------) * S ( e ) with a(1) a(2) a(3) a(4) a(5) a(6) S ~= [a(0) + ----- + ----- + ----- + ----- + ----- + ----- + ... ] z + 1 z + 2 z + 3 z + 4 z + 5 z + 6 and T a(0:6) ~= [0.01689 1.2866 -1.461 0.4055 -0.02080 2.0413E-05 -9.1123E-08] for g=5, k=6 as in (1) and (3). With c = W*a = Dr*B*W*r = Dr*B*p p = W*r II. Error bounds Lanczos gives the error bound on equation 33, Ag(z) as err <= |pi/2*(e^g/sqrt(2) - (p(0)/2 + sum_from_n=1_to_k of (-1)^n*p(n)))| For this paper we can write it more concisely as Define s=[1 -1 1 -1 ...] then err <= |pi/2*( e^g/sqrt(2) - s*p(k) )| or err <= |pi/2*(e^g/sqrt(2) - s*inv(B)*inv(Dr)*W*a)| which is err <= |pi/2*W*( sqrt(pi) - u*a )| where u = s*inv(B)*inv(Dr) which simplifies into 2 u = -------------------- [2 1 3 5 7 9 11 ...] As the desired error goes to zero we have sqrt(pi) ~= u*a with a = M*f(g,n) Luke (5) also discusses the Lanczos error bound. Spouge further points out that this error translates into an error bound on equation 32, z! as E <= sqrt(pi/e)*err which is pi*exp(g-1/2) E = ------------- * |(sqrt(pi) - u*a)| 2*sqrt(2) This error bound serves to provide minimum values of g and k. For example, for |E|<10E-14 we find from figure 1 that g>4 and k>9. If one considers only the family of 1 parameter implementations along the line k = floor(g) + 1 as in (1) and (3), then the "best" implementation is found for g = 9 using an 11 term expansion. Mean relative error for this implementation was 10E-14 with a worst case error of about 10E-13. See figure 2 at http://winnie.fit.edu/~gabdo/fig2.jpg A plot of log(|err|) for the 2 parameters g and k is shown in figure 1. http://winnie.fit.edu/~gabdo/fig1.jpg The vertical lines indicate the location of the g=5 implementation of Lanczos and the 2 parameter implementation recommended in this paper. In theory, any combination of g and k that provide an error less than the desired accuracy would be acceptable. However, as g gets larger the ratio of the largest c(k) to the smallest c(k) becomes greater than the minimum double precision resolution. For example; with g=32 and k=33 c(7) ~= 9.1326E+16 and c(33) ~= 1.13526E-27 Clearly, any double precision series summation with coefficients spanning 43 orders of magnitude will be subject to severe roundoff error. For the implementations examined in (1) and (3) taking g<9 results in loss of accuracy due to an insufficient number of terms. Taking g>9 results in loss of accuracy due to coefficient span roundoff error. Although, to be fair, the minimum mean error from g=7 to g=10 is still better than 10E-13. It is interesting to observe that using the Real only Gamma function of Cody (10), as implemented in Matlab, resulted in about the same mean error performance, 10E-14 as the Complex Lanczos Gamma function with g=9. Although the worst error for both was still about 10E-13. To select the "best" values for g and k, an exhaustive search was carried out over the 2 parameter solution space of 0< g <16 in steps of 1/128, and 0