现充|junyu33

Rathee's trilogy — CrypTFlow2, SIRNN, and SecFloat

Background

Notation: let $\lambda$ be the security parameter, and let the secret-shared integers be $l$ bits long.

We assume basic familiarity with multi-party computation (MPC).

For floating-point storage, we use IEEE 754 Float32 (the single-precision format): 1 sign bit, 8 exponent bits, and 23 mantissa bits.

IEEE 754 Addition

Without loss of generality, assume $|x| \ge |y|$. To compute $z = x + y$, where:

$$x=(-1)^{S_x}(1.M_x)\cdot 2^{E_x},\qquad y=(-1)^{S_y}(1.M_y)\cdot 2^{E_y}$$

The steps are as follows:

  1. Exponent alignment. If $E_x > E_y$, right-shift the mantissa of $y$: $M_y \leftarrow 1.M_y \cdot 2^{E_y-E_x}$, $E \leftarrow E_x$, $M_x \leftarrow (1.M_x)$.
  2. Mantissa addition. Consider the signs $S_x$ and $S_y$. If they are the same, compute $M = M_x + M_y$. If they differ, subtract the smaller from the larger to get the absolute difference $M = |M_x - M_y|$, and attach the correct sign $S$.
  3. Normalization. Clearly $\max\{M_x+M_y,\,|M_x-M_y|\} < 4$, so $M$ needs to be right-shifted at most once (corresponding to adding 1 to the exponent $E$). If $M = 0$, skip directly to step 4. Otherwise, repeatedly left-shift $M$ and decrement $E$ until $M \in [1, 2)$.
  4. Write the result. Finally, remove the leading 1 of $M$, and use the final $S, E, M$ as the new 32-bit floating-point result.
  5. Rounding issues may arise in four places: $M_y \leftarrow 1.M_y \cdot 2^{E_y-E_x}$, $M = M_x + M_y$, $M = |M_x - M_y|$, and the one-bit right shift of $M$.
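The four steps above can be sketched with integer arithmetic in Python. This is a simplified sketch under two assumptions not in the text: both operands are positive, and bits shifted out during alignment are simply dropped instead of rounded (rounding is covered below).

```python
import struct

def decompose(x):
    # Re-encode a Python float as IEEE 754 single precision and split it into
    # sign bit, unbiased exponent, and 24-bit mantissa (implicit leading 1 restored).
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    s = bits >> 31
    e = ((bits >> 23) & 0xFF) - 127
    m = (bits & 0x7FFFFF) | (1 << 23)
    return s, e, m

def fp_add(x, y):
    # Steps 1-4 for two positive floats; the shifted-out bits are dropped,
    # so the result is exact only when no rounding would be needed.
    sx, ex, mx = decompose(x)
    sy, ey, my = decompose(y)
    assert sx == 0 and sy == 0, "sketch handles positive operands only"
    if ex < ey:                      # ensure x has the larger exponent
        ex, mx, ey, my = ey, my, ex, mx
    my >>= ex - ey                   # 1. exponent alignment
    m, e = mx + my, ex               # 2. mantissa addition (same sign)
    if m >= 1 << 24:                 # 3. normalization: at most one right shift
        m >>= 1
        e += 1
    return m * 2.0 ** (e - 23)       # 4. reassemble the value
```

For inputs whose sum is exactly representable, this matches native addition, e.g. `fp_add(1.5, 2.25)` gives `3.75`.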

IEEE 754 Multiplication

Multiplication is relatively simple because it doesn't involve comparing the magnitudes of numbers.

  1. XOR the sign bits. S=SxSy.
  2. Add the exponents. E=Ex+Eybias.
  3. Multiply the mantissas. M=(1.Mx)(1.My), the result is in the range [1,4).
  4. Normalize. If the result 2, increment E and right-shift M by one bit, then remove the most significant bit of M.
  5. Rounding operations are involved in the steps of M=(1.Mx)(1.My) and right-shifting M by one bit.

Rounding Method

Let $d, g, f$ denote the least significant kept bit (decision bit), the most significant discarded bit (guard bit), and the OR of the remaining discarded bits (sticky bit) of $M$, respectively. The "round half to even" rule can be formally expressed as:

$$c = g \wedge (d \vee f)$$

Simply put, a carry ($c=1$) is only possible when $g=1$. In that case, if $f=1$ (the discarded part is strictly greater than half a unit) or $d=1$ (exactly half, and the result must be rounded to even), then a carry occurs.

Note that normalization may be required after rounding.
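The rule can be checked directly on integers. A minimal sketch of a round-to-nearest-even right shift (the function name `rnte_shift` is mine, not the paper's):

```python
def rnte_shift(x, r):
    # Round-to-nearest-even right shift of a non-negative integer x by r bits,
    # implementing c = g AND (d OR f).
    d = (x >> r) & 1                        # decision bit: LSB of the kept part
    g = (x >> (r - 1)) & 1                  # guard bit: MSB of the discarded part
    f = int(x & ((1 << (r - 1)) - 1) != 0)  # sticky bit: OR of the remaining bits
    return (x >> r) + (g & (d | f))
```

For example, 11/4 = 2.75 rounds up to 3, 10/4 = 2.5 ties to the even value 2, and 14/4 = 3.5 ties to the even value 4.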

Building Blocks

Below, we will explain how the entire protocol is constructed using a bottom-up approach.

FMUX

The ideal functionality we need to implement is $\mathrm{MUX}(b, x) = (b == 1\ ?\ x : 0)$. That is, for a Boolean secret sharing $[c] = (c_0, c_1)$ with $c_i \in \{0,1\}$ and an arithmetic secret sharing $[a] = (a_0, a_1)$ with $a \in \mathbb{Z}_n$, the output should be $[a \cdot c]$. The steps are as follows:

SETUP: P0 holds a0,c0, and P1 holds a1,c1.

Each party masks its OT messages with a random value ($r_0$ for P0, $r_1$ for P1):

$$(s_0,s_1)=(r_0+c_0a_0,\; r_0+a_0(1-c_0)),\qquad (t_0,t_1)=(r_1+c_1a_1,\; r_1+a_1(1-c_1))$$

Via OT, each party then selects from the other's pair with its own choice bit:

$$x_0=t_{c_0},\qquad x_1=s_{c_1}$$

We list the four possible cases for $(c_0, c_1)$:

| $c_0$ | $c_1$ | $x_0$ | $x_1$ | $(x_0+x_1)-(r_0+r_1)$ |
|---|---|---|---|---|
| 0 | 0 | $r_1+c_1a_1$ | $r_0+c_0a_0$ | $0$ |
| 0 | 1 | $r_1+c_1a_1$ | $r_0+a_0(1-c_0)$ | $a$ |
| 1 | 0 | $r_1+a_1(1-c_1)$ | $r_0+c_0a_0$ | $a$ |
| 1 | 1 | $r_1+a_1(1-c_1)$ | $r_0+a_0(1-c_0)$ | $0$ |

In every case $(x_0+x_1)-(r_0+r_1) = a \cdot c$, so each $P_b$ outputs $x_b - r_b$ as its share of $[a \cdot c]$; the protocol is correct.
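As a sanity check, the four cases can be simulated in the clear in Python (the OT is modeled by simply letting each party index the other's message pair; `fmux_sim` is an illustrative name):

```python
import random

def fmux_sim(a, c, n=2**32):
    # Simulate the FMUX construction in the clear.
    a0 = random.randrange(n); a1 = (a - a0) % n   # arithmetic shares of a
    c0 = random.randrange(2); c1 = c ^ c0         # Boolean shares of c
    r0, r1 = random.randrange(n), random.randrange(n)
    s = ((r0 + c0 * a0) % n, (r0 + a0 * (1 - c0)) % n)  # P0's OT messages
    t = ((r1 + c1 * a1) % n, (r1 + a1 * (1 - c1)) % n)  # P1's OT messages
    x0, x1 = t[c0], s[c1]                         # each party's OT output
    out0, out1 = (x0 - r0) % n, (x1 - r1) % n     # output shares
    return (out0 + out1) % n                      # reconstructs a * c
```

Running it for both values of $c$ over fresh random shares always reconstructs $a \cdot c$.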

The communication cost is that of two IKNP-OTs, i.e. $2(\lambda+2l) = 2\lambda+4l$. However, the optimization in Section 3.1.1 of CrypTFlow2 reduces this to $2\lambda+2l$.

FAND

This is a direct implementation with Beaver triples; the communication overhead is $(\lambda+16)+4 = \lambda+20$. (See Appendix A.1 of CrypTFlow2.)

FOR

We have $[x \vee y] = [x \oplus y] \oplus [x \wedge y]$. Therefore, letting $[x \wedge y] = (z_0, z_1)$, each party can locally compute $x_i \oplus y_i \oplus z_i$. The communication cost is the same as for FAND, which is $\lambda+20$.

FEQ

We divide $x$ and $y$ into $m$-bit blocks. To compare a pair of blocks $x_j, y_j$, we use 1-out-of-$2^m$ Oblivious Transfer (OT).

We can then extend the $m$-bit equality to $2m$ bits using a tree structure of FAND gates, calculating $[eq_{1,j}] = [eq_{0,j}] \wedge [eq_{0,j+m}]$, and so on until the comparison covers the entire length. The communication complexity is $\frac{l}{m}(2\lambda+2^m)+\frac{l}{m}(\lambda+20)$, and the number of rounds is $\log l$.

FGT/LT

The approach still involves block processing. First, we compute $1\{x_j < y_j\}$ for blocks of length $m$, which can be done using 1-out-of-$2^m$ OT: P0 only needs to randomly generate $(lt_{0,j})_0$ and prepare the $2^m$ messages $t_{j,k} = (lt_{0,j})_0 \oplus 1\{x_j < k\}$.

Then, during the merging process, the higher-order bits take priority: $1\{x<y\} = 1\{x_H<y_H\} \oplus (1\{x_H=y_H\} \wedge 1\{x_L<y_L\})$.

The communication cost is less than $\lambda(4q)+2^m(2q)+22q$. With $m=4$ and $q=l/4$, this becomes $\lambda l+13.5l$, and the number of rounds is $\log l$.
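The blockwise merge rule can be checked in plain Python (a sketch of the plaintext logic only, with no secret sharing; names are mine):

```python
def lt_blocks(x, y, l=32, m=4):
    # Compare x < y by merging per-block results bottom-up:
    # 1{x<y} = 1{xH<yH} XOR (1{xH=yH} AND 1{xL<yL})
    mask = (1 << m) - 1
    lt = [int(((x >> i*m) & mask) < ((y >> i*m) & mask)) for i in range(l // m)]
    eq = [int(((x >> i*m) & mask) == ((y >> i*m) & mask)) for i in range(l // m)]
    while len(lt) > 1:  # blocks are ordered low to high; merge pairs (low, high)
        lt = [lt[i+1] ^ (eq[i+1] & lt[i]) for i in range(0, len(lt), 2)]
        eq = [eq[i] & eq[i+1] for i in range(0, len(eq), 2)]
    return lt[0]
```

The tree has $\log(l/m)$ merge levels, matching the round count of the protocol.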

FLUT

Assume the LUT has 2m entries, each with n bits.

SETUP: P0 randomly selects an index $r \in \{0,1\}^m$ and a share $T_0[i] \in \mathbb{Z}_{2^n}$ for each entry $i \in \{0,1\}^m$; $T_0$ is a masked version of the LUT $L$.

FWrap

If we want to compute $1\{a+b > 2^n-1\}$, this is equivalent to computing $1\{2^n-1-a < b\}$. Therefore, we can directly use FGT/LT.

FB2A

P0 and P1 each hold a Boolean share of a single bit $c = c_0 \oplus c_1$, where $c \in \{0,1\}$, and ultimately obtain $d = d_0 + d_1 \pmod{2^n}$ such that $d = c$.

Let's verify the result: $d_0 + d_1 = c_0 + c_1 - 2(y_0+y_1) = c_0 + c_1 - 2c_0c_1 = c_0 \oplus c_1 = c$, where $y_0 + y_1 = c_0 c_1$ is the cross term obtained via COT.

The communication cost is one 1-out-of-2 COTn, with a cost of λ+n.
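The arithmetic identity behind FB2A can be verified in the clear (the COT is modeled by randomly splitting the cross term; `b2a_sim` is an illustrative name):

```python
import random

def b2a_sim(c0, c1, n=2**16):
    # d_b = c_b - 2*y_b, where y0 + y1 = c0*c1 would come from one COT;
    # here the cross term is just split randomly in the clear.
    y0 = random.randrange(n)
    y1 = (c0 * c1 - y0) % n
    d0 = (c0 - 2 * y0) % n
    d1 = (c1 - 2 * y1) % n
    return (d0 + d1) % n   # equals c0 XOR c1
```

All four $(c_0, c_1)$ combinations reconstruct the XOR.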

FZExt

Having already defined FWrap and FB2A, constructing FZExt is a natural step. For $m$-bit additive sharing, we attempt to zero-extend it to $n$ bits ($n > m$). First, we check whether there is a carry between the two shares (using FWrap), and then convert the resulting Boolean carry $w$ into an arithmetic value in $\mathbb{Z}_{2^{n-m}}$ using FB2A.

However, since we are only performing a zero extension, the reconstructed value must not keep the carry at the $m$-th bit. Therefore, both parties subtract their share of $w \cdot 2^m$ in $\mathbb{Z}_{2^n}$. Since $m$ is public, multiplying the share of $w$ by $2^m$ can be done locally.

The cost is Comm(FWrap + FB2A) = $(\lambda m+14m)+(\lambda+(n-m)) = \lambda(m+1)+13m+n$.
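The carry-cancellation step can be checked numerically (a plaintext sketch; in the real protocol $w$ is secret-shared via FB2A before the subtraction):

```python
import random

def zext_sim(x0, x1, m=8, n=16):
    # Zero-extend m-bit shares to n bits: subtracting w * 2^m (w from FWrap)
    # removes the spurious carry between the two shares.
    w = int(x0 + x1 >= 2**m)        # FWrap(x0, x1, m)
    y0 = (x0 - w * 2**m) % 2**n     # w would be secret-shared via FB2A
    y1 = x1 % 2**n
    return (y0 + y1) % 2**n         # equals (x0 + x1) mod 2^m
```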

FTR

Since we have FZExt for extending from smaller to larger bit lengths, we naturally also have the reverse operation, FTR. We assume that we truncate the lower $s$ bits from an $l$-bit number and output the remaining high $l-s$ bits (essentially equivalent to x >> s):

SETUP: each $P_b$ splits its original share $x_b$ into $u_b \| v_b$, where the former is $l-s$ bits and the latter is $s$ bits. It can be proven that:

$$TR(x,s) = u_0 + u_1 + \mathrm{Wrap}(v_0, v_1, s) \pmod{2^{l-s}}$$

The cost is Comm(FWrap + FB2A) = $(\lambda s+14s)+(\lambda+(l-s)) = \lambda(s+1)+13s+l$.
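The truncation identity can be verified in the clear (a sketch with illustrative names):

```python
def tr_sim(x0, x1, l=16, s=5):
    # TR(x, s) = u0 + u1 + Wrap(v0, v1, s) over Z_{2^{l-s}}
    u0, v0 = x0 >> s, x0 & (2**s - 1)
    u1, v1 = x1 >> s, x1 & (2**s - 1)
    wrap = int(v0 + v1 >= 2**s)        # carry out of the low parts
    return (u0 + u1 + wrap) % 2**(l - s)
```

For any pair of shares this reproduces `((x0 + x1) % 2**l) >> s`.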

FCrossTerm

FCrossTerm is similar to secure multiplication using Beaver triples, but in the latter, both P0 and P1 hold a share of each of $x$ and $y$. The applicable condition for FCrossTerm is that P0 exclusively holds $x$ and P1 exclusively holds $y$, and each finally obtains a share of $x \cdot y$ of length $l = m+n$.

The total communication cost is $\sum_{i=0}^{m-1}(\lambda+(l-i)) = m\lambda + ml - \frac{m(m-1)}{2} = O(m\lambda + mn)$.

FUMult

The semantics are that the parties hold $x = x_0 + x_1 \pmod{2^m}$ and $y = y_0 + y_1 \pmod{2^n}$, and the goal is to compute $z = x \cdot y \in \mathbb{Z}_{2^{m+n}}$.

Here, we cannot directly use Beaver triples: the shares of $x$ and $y$ live in $\mathbb{Z}_{2^m}$ and $\mathbb{Z}_{2^n}$ while the product lives in $\mathbb{Z}_{2^{m+n}}$, so unknown wrap-arounds appear. One approach is to extend both $x$ and $y$ to a sufficiently large ring, perform secure multiplication, and then truncate the result. However, this approach has two problems: first, FTR truncates the low-order bits rather than the high-order ones, requiring modifications to the protocol; second, this method is computationally expensive, making it less efficient than the dedicated FUMult protocol described below.

We now need to compute (x0+x1)(y0+y1)=x0y0+x1y1+x0y1+x1y0. The first two terms can be computed offline by P0 and P1 respectively. The latter two terms involve the FCrossTerm mentioned earlier. In short, we have the complete x0y0 stored at P0, x1y1 stored at P1, and a portion of x0y1 and x1y0 held by both parties. The function of FUMult is to securely combine these four terms and handle the wrap-around issue.

Let's try adding all the product terms of the shares:

$$x_0y_0 + x_1y_1 + x_0y_1 + x_1y_0 = (x_0+x_1)(y_0+y_1)$$

And for the correction shares $g_b$ of $w_y\,x$ and $h_b$ of $w_x\,y$:

$$\sum_b(2^n g_b + 2^m h_b) = 2^n g + 2^m h = 2^n(w_y x) + 2^m(w_x y)$$

Combining them, $z = (x_0+x_1)(y_0+y_1) - 2^n(w_y x) - 2^m(w_x y)$

$= (x+w_x2^m)(y+w_y2^n) - 2^n(w_y x) - 2^m(w_x y) = xy \pmod{2^{m+n}}$, which cancels out perfectly (the $w_xw_y2^{m+n}$ term vanishes modulo $2^{m+n}$).
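The wrap-correction identity can be verified numerically in the clear (a sketch; the share values and ring sizes are arbitrary):

```python
def umult_identity(x0, x1, y0, y1, m=8, n=12):
    # Check z = (x0+x1)(y0+y1) - 2^n*(wy*x) - 2^m*(wx*y) = x*y (mod 2^{m+n})
    M, N, L = 2**m, 2**n, 2**(m + n)
    x, y = (x0 + x1) % M, (y0 + y1) % N
    wx, wy = int(x0 + x1 >= M), int(y0 + y1 >= N)   # wraps of the shares
    z = ((x0 + x1) * (y0 + y1) - N * wy * x - M * wx * y) % L
    return z == (x * y) % L
```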

Let $\nu = \max(m,n)$ and $l = m+n$. The paper provides the exact cost, which is of order $O(\lambda\nu+\nu^2)$. Compared to the Beaver-triple method (including the generation of multiplication triples), which has complexity $O(\lambda l+l^2)$, the order is the same, but the paper claims that FUMult requires about 1.5× less communication.

The principle of FSMult is similar to FUMult, and the communication volume is exactly the same, so we omit the details here.

FDigDec

The function is to split an $l$-bit share $x$ into $c = l/d$ blocks of $d$ bits each, resulting in $\{z_i\}_{i=0}^{c-1}$, where $x = z_{c-1}\|z_{c-2}\|\cdots\|z_0$. Clearly, performing this operation locally would lose the carries between blocks, i.e. a wrap-around problem.

Of course, the solution is straightforward. First, use FWrap to determine whether the two shares of $x[0,d-1]$ overflow, obtaining a share of the carry $c_0$. Let $z_0 = x[0,d-1] - 2^d c_0$ (where $x[0,d-1]$ denotes the sum of the low-bit shares), and add $c_0$ to the higher-order bits $x[d,l-1]$ of both parties. Finally, recurse.

The complexity is equivalent to $(c-1)$ calls to FWrap, i.e. $(c-1)(\lambda d+14d)$.
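The recursive carry propagation can be sketched in the clear (illustrative names; the real protocol secret-shares each carry):

```python
def digdec_sim(x0, x1, l=16, d=4):
    # Split shares of x into l/d digits of d bits each, folding the FWrap
    # carry of each low block into the next higher block.
    zs = []
    for _ in range(l // d):
        v0, v1 = x0 & (2**d - 1), x1 & (2**d - 1)
        carry = int(v0 + v1 >= 2**d)           # FWrap on the current block
        zs.append((v0 + v1) % 2**d)
        x0, x1 = (x0 >> d) + carry, x1 >> d    # add carry to the higher bits
    return zs                                  # digits, least significant first
```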

FMSNZBP

The detection of the global index of the highest non-zero bit in the $i$-th $d$-bit block can also be expressed as $\lfloor\log_2(z_i)\rfloor + i \cdot d$. The implementation solves this directly with FLUT. However, there is a problem: when $z_i = 0$, the expression on the right is undefined and no longer matches the semantics on the left. The authors choose to leave the result undefined when $z_i = 0$, leaving it to the higher-level protocol to handle.

I don't quite understand this point. The case where zi=0 can also be solved directly using FLUT without increasing communication costs.

In short, the communication is equivalent to one 1-out-of-$2^d$ $OT_{\log_2 l}$, so the cost is $2\lambda + 2^d\log_2 l$.

FZeros,FOneHot

The former determines whether a $d$-bit value is zero. The latter transforms a share of $k \in [0, l-1]$ into a vector share of length $l$ whose sum is $(0,0,\ldots,1,\ldots,0)$, with the 1 in position $k$.

These two can be implemented using 1-out-of-$2^d$ OT and 1-out-of-$l$ $OT_l$ respectively, with costs $2\lambda+2^d$ and $2\lambda+l^2$. Although the latter is expensive, it is only called once at the end of the parent protocol FMSNZB, so the total cost is still acceptable.

FMSNZB

The meaning is that given an input $x$ of length $l$, we compute the index $k$ of its highest non-zero bit, and output a vector share whose sum is $(0,0,\ldots,1,\ldots,0)$, with the 1 in position $k$. For simplicity, I assume here that FMSNZBP is well-defined (i.e., it handles the case $z_i = 0$). Let $\iota = \log_2 l$.

Summary

| Primitive | Dependent primitives | Function | Communication overhead |
|---|---|---|---|
| FMUX | 1-out-of-2 $OT_l$ | Ternary operator on $l$-bit values | $2\lambda+2l$ |
| FOR | FAND, i.e. Beaver triple | Logical OR | $\lambda+20$ |
| FEQ | 1-out-of-$2^m$ OT, FAND | $l$-bit arithmetic equality | $<\frac{3}{4}\lambda l+9l$ |
| FLT/GT | 1-out-of-$2^m$ OT, FAND | $l$-bit arithmetic comparison | $<\lambda l+14l$ |
| FLUT | 1-out-of-$2^m$ $OT_n$ | Lookup table with $m$-bit input, $n$-bit output | $2\lambda+2^mn$ |
| FZExt | FWrap, FB2A | Zero extension from $m$ to $n$ bits | $\lambda(m+1)+13m+n$ |
| FTR | FWrap, FB2A | Truncation of $l$ bits to the high $l-s$ bits | $\lambda(s+1)+l+13s$ |
| FUMult, FSMult | FCrossTerm, FWrap, FMUX | Unsigned/signed multiplication of $m$- and $n$-bit inputs | $O(\lambda l+l^2)$ |
| FMSNZB | FDigDec, FMSNZBP, FOneHot, etc. | Index of the highest non-zero bit of an $l$-bit value | $\lambda(5l-4)+l^2$ |

Primitives

We have now built all the basic protocols that the SecFloat paper relies on, and we move on to the important primitives constructed in this paper.

FFPcheck_{p,q}

This is used to check whether floating-point numbers overflow or underflow given floating-point parameters p and q (the IEEE 754 standard uses p=8,q=23). This is an innovative aspect of this paper: the ability to choose the floating-point parameters independently, without necessarily adhering to the IEEE 754 format.

α = (z, s, e, m)
if 1{e > 2^{p-1} - 1} then
   m = 2^q; e = 2^{p-1}
if 1{z = 1} ∨ 1{e < 2 - 2^{p-1}} then
   m = 0; e = 1 - 2^{p-1}; z = 1
Return (z, s, e, m)

This logic aligns with the paper's approach to handling overflow and underflow.

FTRS_{l,s}

FTRS builds upon FTR by adding the floating-point sticky bit $S$: the LSB of the truncated result is set to 1 if any of the discarded lower $s$ bits is 1 (when the LSB is already 1, this changes nothing). This is equivalent to (x >> s) | (x & (2**s - 1) != 0). The right shift is handled by FTR (a combination of FWrap and FB2A), but an additional FZeros operation on the discarded bits is required afterward, which is costly.

Since FZeros is also equivalent to checking whether the sum of the two shares is $2^s$ or whether both are 0, it is essentially a comparison (or an FWrap). Therefore, the authors chose to combine FWrap and FZeros into FWrap&All0s, with a cost equivalent to only one comparison plus one FAND (there is a typo in the paper; it is not the cost of two FANDs).
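The plaintext semantics of FTRS are one line of Python (a sketch of the target function only, not the protocol):

```python
def trs(x, s):
    # Truncate-and-reduce with sticky bit: OR any discarded 1 into the LSB.
    return (x >> s) | int(x & (2**s - 1) != 0)
```

For example, discarding `01` from `0b1001` yields `0b11`, while discarding `00` from `0b1000` yields `0b10`.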

FRNTE_l

FRNTE is equivalent to a right shift with rounding, $x \gg_R r$. As discussed previously, the IEEE rounding logic is as follows:

$$c = g \wedge (d \vee f)$$

We first use TRS(x, r-2) to collapse all bits below $f$'s position into the sticky bit, so that the last three bits of the intermediate result are $d, g, f$. Then, we use FLUT to hardcode this 3-bit rounding logic (a table of $2^3 = 8$ entries) into the carry $c$. Finally, we call TR(x, 2) and add the rounding result $c$.

FRound_{p,q,Q}

Given a floating-point number $\alpha = (z,s,e,m)$, round the normalized mantissa $m$ from $Q$-bit precision to $q$-bit precision, and handle any carry overflow resulting from the rounding. Note that $m$ is represented in fixed-point form: the mantissa $1.M$ corresponds to the fixed-point number $m = (1.M)\times 2^Q$, so the range of $m$ is $[2^Q, 2^{Q+1})$. The protocol logic is as follows:

if 1{m >= 2^{Q+1} − 2^{Q−q−1}} then
   Return (e+1, 2^q)
else
   Return (e, m >>R (Q−q))

The logic splits into two cases: if $m \ge 2^{Q+1}-2^{Q-q-1}$, rounding would carry past $2^{Q+1}$, so the exponent is incremented and the mantissa saturates to $2^q$ (i.e., 1.0); otherwise an ordinary round-to-nearest-even right shift by $Q-q$ bits suffices.
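The branch logic can be checked on small integers; a Python sketch with tiny $Q, q$ for readability (round-to-nearest-even inlined; names are mine):

```python
def fp_round(e, m, Q, q):
    # Round a normalized fixed-point mantissa m in [2^Q, 2^{Q+1}) from Q to q
    # bits of precision; the first branch absorbs a carry past 2^{Q+1}.
    if m >= 2**(Q + 1) - 2**(Q - q - 1):
        return e + 1, 2**q                      # mantissa saturates to 1.0
    r = Q - q                                   # round-to-nearest-even shift
    d = (m >> r) & 1
    g = (m >> (r - 1)) & 1
    f = int(m & (2**(r - 1) - 1) != 0)
    return e, (m >> r) + (g & (d | f))
```

With $Q=5, q=2$: mantissa 39 (4.875 units) rounds up to 5, 36 (4.5) ties to the even value 4, and 60 (7.5) would carry past the mantissa range, so the exponent branch fires.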

FFPAdd_{p,q}

In the Background section, we summarized the method for ordinary floating-point addition. Below, we show how that method corresponds to the pseudocode:

  1. Exponent alignment. If $E_x > E_y$, left-shift the mantissa of $x$: $M_x \leftarrow (1.M_x)\cdot 2^{E_x-E_y}$, $E \leftarrow E_y$.
  2. Mantissa addition. Consider the signs $S_x$ and $S_y$. If they are the same, calculate $M = M_x + M_y$. If they differ, subtract the smaller from the larger to get the absolute difference $M = |M_x - M_y|$, and attach the correct sign $S$.

Here we consider the same-sign and different-sign cases. When the signs are the same, the XOR of the sign bits of $\beta_1$ and $\beta_2$ is 0; otherwise, it is 1. Therefore, when $\beta_1.s \oplus \beta_2.s = 1$, we flip the sign of $m_2$ to perform the subtraction.

  3. Normalization. Here we find the most significant non-zero bit of the sum $m$. To ensure sufficient space for rounding, the protocol zero-extends $m_1$ and $m_2$, which have precision $q$, to $2q+2$ bits in the first step (but note that the precision becomes $2q+1$). Assuming the result of MSNZB is $k$, aligning the most significant bit with position $2q+1$ (as used in the subsequent Round* protocol) requires a left shift of $2q+1-k$ bits, corresponding to multiplication by $K = 2^{2q+1-k}$.

Regarding the exponent: since $m$ was shifted left by $2q+1-k$ bits, and the precision change added $q+1$, the new exponent becomes $e' = e-(2q+1-k)+(q+1) = e+k-q$.

  4. Writing the result. Finally, a round of rounding is performed using FRound_{p,q,2q+1}. Since $|m_1| > |m_2|$, the sign bit is determined by $\beta_1.s$. Last, check whether the normalized result is within the valid range of floating-point numbers.

FFPMul_{p,q}

  1. XOR the sign bits: $S = S_x \oplus S_y$. (Line 7)
  2. Add the exponents: $E = E_x + E_y - \mathrm{bias}$. (Line 1; the paper sets $\mathrm{bias}=0$ here.)
  3. Multiply the mantissas: $M = M_x \cdot M_y$; the result lies in $[2^{2q}, 2^{2q+2})$. (Line 2)
  4. Normalize. If the result is $\ge 2^{2q+1}$, increment $E$ and right-shift $M$ by one bit, finally removing the most significant bit of $M$.

If the result is $\ge 2^{2q+1}$, the else branch on line 6 is taken; otherwise the if branch on line 4.

  5. Rounding is involved in the step $M = M_x \cdot M_y$ and in the one-bit right shift of $M$ (i.e., FRNTE on lines 4 and 6).

Math Functions

In the book これでなっとく!数学入門――無限・関数・微分・積分・行列を理解するコツ by 瀬山士郎, the following excerpt explains how trigonometric functions are calculated on computers/calculators (I don't have the English version of this book, so I summarized it using ChatGPT):

The pages explain how trigonometric functions, especially the sine function, are calculated. Through a short classroom dialogue, the text raises the question of how a calculator can produce a value such as sin(0.7). It then explains that trigonometric functions cannot be represented by finite polynomials, but they can be expressed as infinite power series, known as Taylor (or Maclaurin) expansions. Using sinx as an example, the book shows its series expansion and explains that by taking only the first few terms, one can obtain an approximate value of sin(0.7). The more terms used, the more accurate the result becomes. Although the series is infinite and never truly ends, calculators compute enough terms to achieve practical accuracy, which is sufficient for most real-world applications.

FFPsinπ_{8,23}

Based on this idea, we can actually elaborate on how a computer calculates sinx.

  1. First, use trigonometric identities to reduce the range of x to [0,π/2]. (Range reduction)

This is the approach used in the paper, but in practice, reducing to $[-\pi/4, \pi/4]$ and then using the double-angle formula reduces the error by an order of magnitude.

  2. Use a Taylor-series approximation, for example, $\sin x \approx x-\frac{1}{6}x^3+\frac{1}{120}x^5$. (Polynomial evaluation)
  3. Use Horner's method: $x-\frac{1}{6}x^3+\frac{1}{120}x^5 = ((\frac{1}{120}x^2-\frac{1}{6})x^2+1)x$, to reduce the number of multiplications and lower the error.

These operations only require floating-point addition and multiplication. The coefficients of the polynomial are relatively fixed and can be obtained through a lookup table.
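The Taylor-plus-Horner scheme above can be sketched directly:

```python
import math

def sin_taylor5(x):
    # Degree-5 Taylor approximation of sin(x) in Horner form:
    # x - x^3/6 + x^5/120 = ((x^2/120 - 1/6) * x^2 + 1) * x
    x2 = x * x
    return ((x2 / 120 - 1 / 6) * x2 + 1) * x
```

On the reduced range the truncation error is on the order of $x^7/5040$, so `sin_taylor5(0.7)` already agrees with `math.sin(0.7)` to about four decimal places.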

We now map this onto the concrete primitive FFPsinπ_{8,23}, which computes $\sin \pi x$. The steps are as follows:

  1. First, handle special cases. When $|x| > 2^{23}$: since $q = 23$, $x$ exceeds the precision of the mantissa, so $x$ must be an integer and therefore $\sin \pi x = 0$. The other special case is $|x| < 2^{-14}$, in which case $\sin \pi x \approx \pi x$, with error within that of the floating-point representation of $x$ itself.
  2. Range reduction step: the goal is to compute the parity bit $a \in \{0,1\}$ and the reduced argument $\delta$ from the input $\alpha$. Write $|\alpha| = 2K+a+n$ with $n \in [0,1)$. If $n < 0.5$, then $\delta = n$; otherwise, $\delta = 1-n$. Therefore, $\sin \pi\alpha = (-1)^{a \oplus 1\{\alpha<0\}}\sin \pi\delta$.
m = α.m * 2^{α.e + 14}
a = TR(m, q+14); n = m mod 2^{q+14}

This transforms $\alpha$ into a fixed-point value. Since $|\alpha| = \alpha.m \cdot 2^{\alpha.e-q}$, we have $m = \alpha.m \cdot 2^{\alpha.e+14} = |\alpha| \cdot 2^{q+14}$. Then $a$ is the integer part of $|\alpha|$, and $n$ is the fractional part of $|\alpha|$.

f = (n > 2^{q+13} ? 2^{q+14} - n : n)
k,K = MSNZB(f); f = f * K
z = 1{f=0}; e = (z ? -2^{p-1}+1 : k - q - 14)

The first line ensures that $f = \delta \cdot 2^{q+14}$. The second line finds the most significant bit of $f$, thus normalizing $f$, and the third line sets the correct exponent $e$.

When $f = 0$, $\sin \pi\alpha = 0$. The zero bit $z$ is set to 1, and the exponent is set to $-2^{p-1}+1$ according to the convention in the paper.

δ = (z, 0, e, TR(f, q+14−Q))

Finally, the fixed-point number f is stored again with Q bits of precision and combined with other parameters to form the floating-point number δ.

  3. Polynomial evaluation step. Note that the paper uses the Remez method and other more accurate approximation methods, rather than Taylor expansion:
if 1{δ.e < −14} then
   µ = Float_{p,Q}(π) ⊗_{p,Q} δ

Previously, in the special case $|\delta| \le 2^{-14}$, we had $\sin \pi\delta \approx \pi\delta$.

if 1{δ.e < −5} then
   idx1 = δ.e + 14 mod 2^4
   (θ1, θ3, θ5) = GetC_{4,9,5,p,Q}(idx1, Θ1_sin, K1_sin)

Consider the case $2^{-14} < |\delta| < 2^{-5}$. Here, the GetC function performs a table lookup: by selecting the constant class K1_sin, it retrieves the corresponding coefficients (θ1, θ3, θ5) from the table Θ1_sin using the index idx1 (the lower 4 bits of δ.e + 14).

In GetC_{4,9,5,p,Q}, 4 is the bit-width of the index, 9 is the number of splines (coefficient sets), and 5 is the degree of the fitted polynomial.

idx2 = 32 · (δ.e + 5 mod 2^7)
idx2 = idx2 + ZXt(TR(δ.m, Q−5) mod 32, 7)
idx2 = 1{δ.e = −1} ? 127 : idx2
(θ1, θ3, θ5) = GetC_{7,34,5,p,Q}(idx2, Θ2_sin, K2_sin)

In the interval $2^{-5} \le |\delta| < 0.5$, the lookup index combines the lower 2 bits of δ.e + 5 and the upper 5 bits of δ.m (this is the second innovation of the paper: segmented indexing), selecting the (θ1, θ3, θ5) that minimize the error. The case $|\delta| = 0.5$ is handled separately.

  4. Use Horner's method to compute the corresponding floating-point value:
Δ = δ ⊗_{p,Q} δ
µ = ((θ5 ⊗ Δ) ⊞* θ3) ⊗ Δ
µ = (µ ⊞* θ1) ⊗ δ
Return (µ.z, a ⊕ α.s, Round*(µ.e,µ.m))

$$\Delta = \delta^2$$

$$\mu = (\theta_5 \otimes \Delta + \theta_3) \otimes \Delta = \theta_5\delta^4 + \theta_3\delta^2$$

$$\mu = (\mu + \theta_1) \otimes \delta = \theta_5\delta^5 + \theta_3\delta^3 + \theta_1\delta \approx \sin \pi\delta$$

As mentioned earlier, $\sin \pi\alpha = (-1)^{a \oplus 1\{\alpha<0\}}\sin \pi\delta$, so a ⊕ α.s determines the sign bit, and the final result is obtained by rounding.

To summarize, the FFPsinπ_{8,23} function first performs range reduction on the input, reducing the problem to a small interval from 0 to 0.5 and recording the sign compensation; a fast path is used for extremely large or small inputs. Then, the function determines the segment based on the exponent and mantissa bits, selects the corresponding polynomial coefficients using a lookup table, calculates the approximate value using Horner's method, and finally restores the sign and rounds to a single-precision output.

One potential drawback of this method is that the table used by the GetC function is ad-hoc and differs from the protocol used for floating-point arithmetic operations. If one wants to arbitrarily specify p,q, or migrate to SecDouble, this coefficient table needs to be recalculated to ensure the error is less than 1 ULP. The paper sacrifices algorithmic flexibility in pursuit of performance and accuracy.

FFPlog2_{8,23}

Let $\alpha = m \cdot 2^N$, where $m \in [1,2)$ and $N = \alpha.e$. Let $a = 1\{N = -1\}$. Then there are two cases:

$$\log_2(\alpha) = \begin{cases} N + \log_2(1+\delta), & a = 0,\\ \log_2(1-\delta), & a = 1.\end{cases}$$

Then, two sets of approximating polynomials $(\theta_0^a,\theta_1^a,\theta_2^a,\theta_3^a)$ and $(\theta_0^b,\theta_1^b,\theta_2^b,\theta_3^b)$ are constructed for $a=1$ and $a=0$ respectively. The values of δ.e and δ.m within the specified range are then looked up in a table using the segmented-indexing method.

Next, the value of $\log_2(1\pm\delta)$ is calculated using Horner's method. For the final addition of $N$, $N$ is first converted to a floating-point number using a lookup table, and then the FFPAdd_{p,q} protocol is called.
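The case split can be sanity-checked in plaintext Python (an illustrative sketch of the math only, not of the protocol):

```python
import math

def log2_via_cases(alpha):
    # Split alpha = m * 2^N with m in [1,2); when N == -1, write alpha = 1 - delta.
    N = math.floor(math.log2(alpha))
    m = alpha / 2**N
    if N == -1:                       # a = 1
        delta = 1 - m / 2             # alpha = 1 - delta, 0 < delta <= 1/2
        return math.log2(1 - delta)
    delta = m - 1                     # a = 0
    return N + math.log2(1 + delta)
```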

  1. Range Reduction Step:
a = 1{N = −1}
f = a ? (2^{q+1} − α.m) : (α.m − 2^q)
k,K = MSNZB(f); f = f *_{q+1} K
e = a ? (k − q − 1) : (k − q)

When $N = -1$ (in which case $a = 1$), $f = 1-m/2$; otherwise (in which case $a = 0$), $f = m-1$ (both in fixed-point form). Then $f$ is normalized.

z = 1{f = 0}; e = (z ? −2^{p−1}+1 : e); 
N = α.e; δ = (z,0,e, f *_{Q+1} 2^{Q−q})

A floating-point number δ is constructed using f, and the precision is increased from q=23 bits to Q=27 bits, in preparation for the subsequent evaluation.

  2. Polynomial evaluation step:
if 1{δ.z} then
   µ = Float_{p,Q}(0)

When $\delta = 0$, $\log_2(1\pm\delta) = \log_2 1 = 0$, so $\mu = 0$ is returned.

if 1{δ.e < −5} then
   idx1 = (δ.e + 24) mod 2^5
   (θa0,θa1,θa2,θa3) = GetC_{5,19,4,p,Q}(idx1, Θ1_log, K1_log)
   (θb0,θb1,θb2,θb3) = GetC_{5,18,4,p,Q}(idx1, Θ3_log, K3_log)

When $2^{-24} \le \delta < 2^{-5}$, the lower 5 bits of δ.e + 24 are used as the index to look up the table Θ1_log for the case $a=1$, and the table Θ3_log for the case $a=0$. Both lookups must be performed, since the access pattern must not depend on the secret value of $a$.

else
   idx2 = 16 · (δ.e + 5 mod 2^7)
   idx2 = idx2 + ZXt(TR(δ.m, Q−4) mod 16, 7)
   (θa0,θa1,θa2,θa3) = GetC_{7,20,4,p,Q}(idx2, Θ2_log, K2_log)
   (θb0,θb1,θb2,θb3) = GetC_{7,32,4,p,Q}(idx2, Θ4_log, K4_log)

When $2^{-5} \le \delta < 1$, the lower 3 bits of δ.e + 5 and the upper 4 bits of δ.m form the index to look up the table Θ2_log for $a=1$, and Θ4_log for $a=0$.

(θ0,θ1,θ2,θ3) = a ? (θa0,θa1,θa2,θa3) : (θb0,θb1,θb2,θb3)

Based on the classification of the value of a discussed above, select the final polynomial coefficients.

  3. Horner's method and the final addition of N.
µ = ((θ3 ⊗ δ) ⊞* θ2) ⊗ δ
µ = ((µ ⊞* θ1) ⊗ δ) ⊞* θ0

Calculate the value of the cubic polynomial $\mu = \theta_3\delta^3+\theta_2\delta^2+\theta_1\delta+\theta_0 \approx \log_2(1\pm\delta)$.

β = LInt2Float(N)
β′ = (β.z,β.s,β.e, β.m *_{Q+1} 2^{Q-6})

Using a small LUT (lookup table), $N$ is converted into a low-precision (6-bit mantissa) floating-point number $\beta$, which is then brought to the high precision $Q$, denoted $\beta'$.

γ = a ? µ : (µ ⊞* β′)
Return (γ.z, γ.s, Round*(γ.e, γ.m))

Finally, when $a = 0$, the floating-point addition $\mu \boxplus^* \beta'$ adds $N$ back in; when $a = 1$, $\mu$ already equals $\log_2(\alpha)$, since $N = -1$ is absorbed into $\log_2(1-\delta)$.

The final result of the protocol is obtained by rounding γ.