# Uniformly Random Scalars

Many cryptographic protocols, like Diffie-Hellman and SPAKE2, require a way to choose a uniformly random scalar from some prime-order range. Why? What is the best way to do this?

## What (is a scalar)?

Classic Diffie-Hellman Key Exchange starts with each side chosing a random scalar. This is kept secret, but is used to derive a "public ephemeral element" that is sent to the other side. It is also used upon the peer's ephemeral element to build the shared secret element, from which the final secret key is derived. SPAKE2, as a modified DH protocol, relies on this secret random scalar too.

Scalars are basically integers in a specific range, bounded by the order
of an Abelian group, and the order is generally a big prime number P. To
be precise, scalars are "equivalence classes of integers modulo P",
meaning that you're choosing a *class* of integers, all of which are
equal to each other if your idea of "equal" is modulo P. If P is 5, then
one such equivalence class is the integers 2, 7, 12, -3, -8, -13, etc.
Each of these classes can be *represented* by a single member, which is
an integer between 0 and P, so we usually pretend that scalars are just
integers with the constraint that `0 <= x < P`

. We say "2" instead of
"the class that includes 2, 7, 12, etc".

Also note: there is some confusion, at least in my mind, about the
precise range of scalars. Some references (including the original SPAKE2
paper) say `Zp`

, which means any positive integer less than P (```
0 <=
x < P
```

).
The Handbook of Applied Cryptography
section on Diffie-Hellman (protocol 12.47, page 516) says scalars should
be `1 <= x <= P-2`

(excluding both 0 and -1). I'm pretty sure that 0
is a bad choice: in DH it will cause the resulting shared element to
always be the same thing (the identity element), independent of the
other party's message. It's a bit like
a weak key in symmetric
ciphers. But P is huge, so the chance of accidentally getting a scalar
of 0 (or any other specific value) is effectively nil. As long as the
protocol only uses scalars from trusted sources (i.e. ourselves, not the
network), we don't need to worry about it.

So for simplicity, I'll define our task to be generating an integer
`x`

, where `0 <= x < P`

for some large (prime) P, such that the
value is uniformly randomly distributed in that range (all values are
equally likely).

## Why (do we need a random one)?

The DSA and ECDSA signature algorithms also use a unique secret random scalar
(known as a "nonce", or just `k`

), and
are vulnerable to
attack if this nonce is biased. If you know the first or last few bits of
each nonce, and you have multiple signatures to work with, then a brute-force
search for the signer's private key is *much* easier than it should be. In
some cases, the private key can be recovered in a couple of hours.

Of course, if the implementation
doesn't even try to be random, then you wind
up with things like the
Playstation 3
where they used the same hard-coded value of `k`

for every single message,
allowing the private key to be recovered trivially with just two signatures.

It isn't clear if other protocols (like DH) are quite this vulnerable. The Logjam paper, in section 3.5, mentions attacks on small-exponent DH in poorly-chosen integer groups, and this email about Curve25519 scalars points out the attack-resistance provided by their specific clamping decisions (which constrain the scalar to certain values). But in general, our security proofs are built around the assumption that the scalar is unique and uniformly random, so to be safe we must follow those rules.

## How (do we create one)?

We can assume that our operating system gives us a source of random
bytes. `/dev/urandom`

on a fully-initialized unix-like host will give
us as many as we need.

If our target range `0 <= x < 256`

, or `0 <= x < 65536`

, or some other
power of 256, that'd be trivial. It would also be easy to produce integers in
a range that's any integral power of two (you just mask off the extra bits,
and treat the result as an integer). But since P is a prime, we're never
going to have a nice round size for truncation.

So we need to use `/dev/urandom`

to get a **seed**, and then convert some
number of these seed bytes to an integer. This is pretty easy: just treat the
array of bytes as a base-256 number. In Python2, we can exploit the
`hexlify()`

and `int()`

functions to make this really fast (python3 adds
`int.from_bytes()`

, which is even better):

```
def bytes_to_integer(seed_bytes):
return int(binascii.hexlify(seed_bytes), 16)
```

What's the range of this number? It will be 0 to `2**len(seed_bytes)`

.
If we use too few bytes, then it will obviously not even cover the
entire target range, so our first step is to make the seed larger than
the total range. This introduces the possibility of getting a number
that's too big, so we'll have to modulo down:

```
def make_random_scalar_with_bytes(seed_length_bytes, P):
# check that our seed will produce sufficiently-large integers
# the right-hand side is roughly equal to ln2(P)
assert 8*seed_length_bytes > (4*len("{:x}".format(P)))
seed_bytes = os.urandom(seed_length_bytes)
hash_int = bytes_to_integer(seed_bytes)
scalar = hash_int % P
return scalar
```

What's a reasonable choice of seed length? For the Curve25519 group, P is
`2**252 + 27742317777372353535851937790883648493`

, which lies on the low
end of the range between `2**252`

and `2**253`

. If we use 253 random bits
(which you get from 32 random bytes by doing something like ```
seed_bytes[0]
&= 0x1F
```

to mask out the top three bits), then we'll get a suitable value
slightly more than half the time, and the modulo function will kick in (i.e.
"aliasing" occurs) slightly less than half the time.

But that's pretty badly biased. Each time aliasing happens (e.g.
`hash_int >= P`

) means that two values of `hash_int`

(which *is*
uniform) are mapping to the same value of `scalar`

(which therefore is
not uniform). Consider the simple case of `P = 2**8 - 1 == 255`

(so we
want outputs from 0 to 254, inclusive, and exclude only 255), and our
`seed_length`

is 1 byte. Seeds of 0 and 255 will both map to an output
of 0, so zeros will appear in the output twice as frequently as any
other value. The one case of aliasing will induce a bias in our output.

The amount of bias, in a statistical sense, depends upon how many extra
bits we start with, and how close our target `P`

is to a power of 2,
so it's something like `ln2(P) - floor(ln2(P))`

, using the base-2
logarithm of our target P.

## The Best Good-Enough Solution

The simplest solution that yields a minimal bias is to throw more bits
at the problem. Using a `seed_length`

that's 32 bytes (128 bits)
larger than we really need reduces the bias to a statistically
insignificant level. In this case, we're aliasing almost **all** the
time:

```
def make_random_scalar(P):
# conversion that reduces the bias to a fraction of a bit
minimal_length_bits = 4*len("%x" % P)
safe_length_bits = minimal_length_bits + 128
safe_length_bytes = safe_length_bytes // 8
# that gets us between 121 and 128 bits of safety margin
return make_random_scalar_with_bytes(safe_length_bytes, P)
```

This is the approach used by the Ed25519 codebase to compute unbiased
deterministic nonces from the private key and the message being signed.
These nonces have the same requirements as ECDSA: they must be unique
and unbiased. The Ed25519 signing function creates a 512-bit hash and
then reduces it down to the ~252-bit group order: see the bottom of page
6 of the Ed25519 paper,
where `r`

is the nonce, and computations end up being performed mod
`P`

(which they call `l`

). They use about 258 extra bits.

FIPS 186-4, which defines DSA and ECDSA, says that 64 extra bits are sufficient (in appendix B.2.1).

## The Exact Solution: Try-Try-Again

There is a way to remove **all** the bias, but you might not like it. To
achieve zero bias, you remove the modulo-P step (so there's no chance of
aliasing), and you add a loop that keeps trying new random seeds over
and over again until the integer just happens to be in the right range.

```
def try_try_again(P):
length_in_bits = 4*len("%x" % P)
seed_length_bytes = round_up_to_multiple_of_eight(length_in_bits)
while True:
seed_bytes = mask(os.urandom(seed_length_bytes), length_in_bits)
candidate = bytes_to_integer(seed_bytes)
if candidate < P:
return candidate
# else, try again
```

This takes an unpredictable amount of time, but provides a perfectly
uniform output. The number of trials that you'll need depends upon the
same bias that we're removing. If you mask the bytes down to the minimum
number of bits, then the worst case (where P is just slightly larger
than some power of 2) is an average of two passes. If you don't bother
masking individual bits, then the worst case is 255 average passes. If P
is just slightly **smaller** than a power of 2, the average is a single
pass.

But this is an exponential distribution: if you're really unlucky, it
could take thousands of iterations before you find a suitable integer,
or worse. The **mean** is small, but the **maximum** is infinite.

I used this "try-try-again" algorithm as an option in
python-ecdsa. But unbounded
runtime is a drag, so the recommended approach is to use the
extra-128-bits scheme described above (in `make_random_scalar()`

).

This technique is also used (since around 2003 for large ranges, and
since 2010 for all ranges) in
Python's `random.SystemRandom.randrange()`

function, and
`secrets.randbelow()`

in Python3.6.

Before that point, python2.4 had
a bug (reported by none other than
Ron Rivest, the R in RSA!) in which `random.SystemRandom`

used
`/dev/urandom`

as a seed correctly, but `randrange()`

used that seed
to create a floating point number, then multiplied it out to the desired
range (and rounded the result to an integer). As a result, no matter how
large a range you asked for, the number could never have more than about
53 bits of entropy (and in fact the low-order bits were always zero,
which is exactly where ECDSA is vulnerable).

That bug was fresh in my mind when I wrote the python-ecdsa code, which is why I avoided using the standard library functions. But at this point it's probably safe to just use the following (though be sure to check what the underlying functions are really doing, especially if you're porting this to some other language which might have made the same mistake as Python):

```
import secrets
def make_random_scalar(P):
return secrets.randbelow(P)
```

or, on python2.7:

```
from random import SystemRandom
def make_random_scalar(P):
return SystemRandom().randrange(P)
```

## Scalars From Seeds

For testing, it may be useful to break the function up into two pieces.
The private inner function is deterministic, and accepts the seed bytes
as an argument. The externally-visible outer function is where
`/dev/urandom`

is sampled. The inner function can be unit tested.

```
def _bytes_to_integer(seed_bytes):
return int(binascii.hexlify(seed_bytes), 16)
def _map_bytes_to_scalar(seed_bytes, P):
# check that our seed will produce sufficiently-large integers
# the right-hand side is roughly equal to ln2(P)
assert 8*len(seed_bytes) > (4*len("{:x}".format(P)))
hash_int = _bytes_to_integer(seed_bytes)
scalar = hash_int % P
return scalar
def make_random_scalar(P):
# conversion that reduces the bias to a fraction of a bit
minimal_length_bits = 4*len("%x" % P)
safe_length_bits = minimal_length_bits + 128
safe_length_bytes = safe_length_bytes // 8
# that gets us between 121 and 128 bits of safety margin
seed_bytes = os.urandom(seed_length_bytes)
return _map_bytes_to_scalar(seed_bytes, P)
```

This can also be used in a related function: mapping seeds to scalars.
This function is needed for protocols like SPAKE2, where the
`password`

input must be converted into a scalar for the blinding
step. In this case, uniformity is not strictly necessary (the SPAKE2
password isn't randomly distributed, so any deterministic function of it
will have the same non-random distribution). But if your library already
has `_map_bytes_to_scalar()`

, then it may be easiest to build on top
of that:

```
def password_to_scalar(pw, P):
seed = sha256(pw).digest()
return _map_bytes_to_scalar(seed, P)
```

In addition, you might want the seed-to-scalar function to behave differently for different protocols, so the same password used in two different places doesn't produce values which could be mixed/matched in an attack. The usual way to accomplish this is to feed some sort of algorithm identifier into the hash function. Some options are:

- a simple prefix string:
`sha256("my algorithm name" + pw)`

- a real key-derivation function:
`HKDF(context="my algorithm name", secret=pw)`

. This also gives you exact control over the number of bytes, not limited to the native output size of the hash function. - some modern hash functions like BLAKE2 have dedicated
"personalization" inputs:
`blake2(input=pw, personalize="my algorithm name")`

## Use in python-spake2

All of this is an attempt to explain why the password-to-scalar function
in my python-spake2 library
is so over-complicated. When I wrote that function, I was worried that
the blinding scalar needed to be uniformly random (like most other
scalars in cryptographic protocols). So I combined all the techniques
above: both algorithm-specific hash personalization *and* using an
oversized hash output.

In retrospect, it would probably have been ok to just truncate a plain SHA256 output to something less than the Curve25519 group order. In fact, just using 128 bits would have been enough, which removes the need for the modulo step.

```
def password_to_scalar(pw, P):
return _bytes_to_integer(sha256(pw).digest()[:16])
```

So if you're looking at the `password_to_scalar`

function
in python-spake2 and think
it's unnecessarily complicated, that's why.

## Conclusions

Thanks to Thomas Ptáček, Sean Devlin, Thomas Pornin, and Zaki Manian for their advice and feedback.