Like most parts of mbed TLS, the implementation of the elliptic curve operations can be tuned using various compile-time flags. This article explains the two parameters that control trade-offs between performance and footprint, in more detail than our general article about reducing footprint.
Performance and RAM figures
As we'll be discussing performance-footprint trade-offs, it's useful to have some performance figures. For convenience, the figures quoted in this article were collected with mbed TLS 2.2 on a machine with a 64-bit CPU - more detail about the machine would be distracting, what matters here is not the absolute numbers, but the comparison between various curves and settings. You can reproduce those results on your machine using scripts/ecc-heap.sh from the mbed TLS sources.
The RAM figures only include heap usage, not the stack (this is a limitation of our measurement script). However, these should be meaning full as most memory used by elliptic curve operations will be on the heap. Just remember that RAM figures may be slightly lower on a 32-bit machine.
First it should be noted that the rest of the article is about curves using what is known as the short Weierstrass form: this includes NIST curves, the Brainpool curves, the "Koblitz" curves such as the one used by Bitcoin... in short, everything except newer curves such as Curve25519 and Curve448.
Let's get Curve22519 out of the way by saying that on the reference machine, it performed 358 ECDHE/s (ephemeral Elliptic Curve Diffie-Hellman key exchanges per second) using around 1500 bytes on the heap.
The main operation on elliptic curve is multiplication of a point by an integer, which is performed using additions and doubling, similar to the fast exponentiation algorithms you're probably familiar with. It is possible to boost performance by pre-computing some well-chosen multiples of the input point just in time before we multiply it by the chosen integer. (If you're curious, we're using what's known as a comb method.)
Of course, the more multiples you compute and store before performing the actual multiplication, the more RAM you'll use. On the other hand, while precomputing a few multiples gives better performance than pre-computing none, you don't want to compute too many either. The optimal size of the pre-computation table depends on the size of the curve.
By default, our multiplication function will select the table size that gives the best performance for the curve used. However, you might want to set an upper bound for the table size: this is what MBEDTLS_ECP_WINDOW_SIZE does. More specifically the value of this macro is the log of the maximum number of points that will be precomputed.
For example, for NIST P-256, the performance and RAM figures for various values of this parameter (with fixed point optimization disabled, see next section) are as follows:
2 3 4 5 6 ECDHE/s 44 50 53 53 53 Heap bytes 2064 2448 3592 3624 3680
As you can see, when moving from 2 to 4, performance increases as well as RAM usage, while larger values aren't helpful for performance (the code will select 4 as the effective value anyway) and don't change RAM usage as much.
You might have noticed that the first paragraph of the previous sections said some multiples are precomputed just in time, suggesting that they are discarded once the operation is done, and computed again for the next operation. Well this is only half of the story, but before we mention the other half let's back up a little bit.
Elliptic curves each come with a standard "base point" (also know as generator, so usually denoted by the letter G), for when the protocol requires a point known by all parties. For example, in ECDHE, one party generates a secret exponent a, computes aG, send the result to its peer, receives the peer's public share Q then computes the shared secret aQ.
Notice that for the first part (but not the second), the point to be multiplied is known in advance: it's G. So when we first multiply a G by an integer, we pre-compute well-chosen multiples of G as usual - but when we're done, we may want to keep those multiples around in order to speed up subsequent multiplications of G by other integers. On the other hand, keeping that table around obviously uses RAM: again we have a performance-memory trade-off.
That's why we made the MBEDTLS_ECP_FIXED_POINT_OPTIM: set it to 1 in order to keep the table around, 0 to discard it.
For example, the performance and RAM figures for NIST P-256 without and with fixed point optimization (with max window size set to 4, see previous section) are as follows:
0 1 ECDHE/s 53 76 Heap bytes 3592 5360
The ECC implementation in mbed TLS uses some memory-performance trade-offs. The defaults values are chosen with performance in mind, but you can easily adjust them to reduce RAM usage in order to get the trade-off that best fits your particular needs.
It should be noted that newer curves such as Curve25519, with a simpler implementation that doesn't use any trade-offs, manage to deliver superior performance while using less RAM compared to the NIST curves.