PolarSSL is now part of ARM Official announcement and rebranded as mbed TLS.

MBEDTLS takes long connection time


Oct 24, 2017 14:53
Vikas

Hello,

I am running MBEDTLS as a client on an embedded platform and connecting to a remote cloud server. The main issue we are facing is the time it takes to connect to the server.

When I connect to the server using the configuration below, it takes about 2 seconds. This long connection time is causing problems for us. We are using TLS-ECDHE-RSA-WITH-AES-128-GCM-SHA256.

/* Save RAM at the expense of ROM */

#define MBEDTLS_AES_ROM_TABLES

/* Save RAM by adjusting to our exact needs */

#define MBEDTLS_ECP_MAX_BITS 512

#define MBEDTLS_MPI_MAX_SIZE 1024 // 384 bits is 48 bytes

/* Save RAM at the expense of speed, see ecp.h */

#define MBEDTLS_ECP_WINDOW_SIZE 2

#define MBEDTLS_ECP_FIXED_POINT_OPTIM 0

/* Significant speed benefit at the expense of some ROM */

#define MBEDTLS_ECP_NIST_OPTIM

Based on the link below, I tried changing a few parameters to improve the connection time, but it didn't help. https://tls.mbed.org/kb/how-to/reduce-mbedtls-memory-and-storage-footprint

/* Save RAM at the expense of speed, see ecp.h */

#define MBEDTLS_ECP_WINDOW_SIZE 6

#define MBEDTLS_ECP_FIXED_POINT_OPTIM 1

After the above changes, the connection still takes the same time. I would appreciate any guidance on how to reduce the handshake time and improve performance.

 
Oct 26, 2017 13:58
Yum

Unfortunately, the AES engine in mbed performs quite badly. I also made a separate post about exactly the same issue: the handshake as well as other operations were quite slow compared to OpenSSL.

On an i7 4600U, OpenSSL is 300% faster. On ARM, OpenSSL wins by an even bigger margin.

 
Oct 27, 2017 12:23
Gilles Peskine

Hi Vikas,

First, are you sure there is no difference? Changes to config.h only may not be picked up as a dependency change by the makefile, resulting in unchanged object files. Run make clean; make if in doubt.

The time to start a TLS connection is usually dominated by two things: network delays, and asymmetric cryptography. A TLS handshake involves two packet exchanges: client → server → client → server → client. The client needs to perform an expensive asymmetric cryptography operation after each of the two packets it receives from the server.

There is a way to reduce a lot of overhead the second time a client connects to the same server, called session resumption. On Mbed TLS, set the option MBEDTLS_SSL_SESSION_TICKETS in config.h, and call mbedtls_ssl_set_session and mbedtls_ssl_get_session to copy data from a previous session to the same server. This requires support on the server side as well, for resumption either with session tickets (RFC 5077) or with session id.

You could try different ciphersuites. However the parameters you've chosen look good at first glance, in particular the choice of ciphersuite (ECDHE-RSA is usually the best choice when client-side performance is the bottleneck) and setting MBEDTLS_ECP_WINDOW_SIZE to 6 (higher is faster at the expense of memory, 6 is the maximum).

Have you measured timings to see which part of the handshake is slow, e.g. with Wireshark? This could help to see where to look for improvements.
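A packet capture shows the time spent on the wire; to see the time spent inside the client itself, a small timing wrapper can help. The following is only a sketch under stated assumptions: `time_step`, `step_fn`, `elapsed_ms` and `dummy_step` are hypothetical names, not part of Mbed TLS, and it uses the C11 wall clock `timespec_get`, which an embedded target would probably replace with a hardware tick counter.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical signature for "one step of the connection" (not an Mbed TLS type). */
typedef int (*step_fn)(void *ctx);

/* Milliseconds elapsed between two timestamps. */
static double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1000.0
         + (double)(end.tv_nsec - start.tv_nsec) / 1.0e6;
}

/* Run one step, store how long it took in *ms_out, and return the step's status. */
int time_step(step_fn step, void *ctx, double *ms_out)
{
    struct timespec t0, t1;
    int ret;

    assert(step != NULL && ms_out != NULL);
    timespec_get(&t0, TIME_UTC);   /* assumption: a C11 wall clock is available */
    ret = step(ctx);
    timespec_get(&t1, TIME_UTC);
    *ms_out = elapsed_ms(t0, t1);
    return ret;
}

/* Stand-in workload so the sketch runs without a TLS stack. */
int dummy_step(void *ctx)
{
    volatile unsigned long acc = 0;
    unsigned long n = *(unsigned long *)ctx;

    for (unsigned long i = 0; i < n; i++)
        acc += i;
    return 0;
}
```

In a real client, the step would be a thin function around whatever call is being measured (for example the handshake call), and the results can be compared against the gaps between packets in the capture.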

You can also turn off the debugging features by commenting out MBEDTLS_DEBUG_C. This should result in a lower memory usage and slightly better performance.

Best regards,
Gilles Peskine — Mbed TLS team

 
Oct 27, 2017 16:00
Gilles Peskine

P.S. One thing that may speed up the connection time (specifically, the second message sent by the client) is to select a small key size for the ECDHE part. Try compiling your client with MBEDTLS_ECP_DP_SECP256R1_ENABLED defined, but none of the other MBEDTLS_ECP_DP_xxx macros. (If you need other curves for other reasons in your application, call mbedtls_ssl_conf_curves to select one or more of the MBEDTLS_ECP_DP_xxx constants.) This will select a smaller key size for ECDHE that's still acceptable for security. Do make sure that your server supports the selected curve (it's typically configurable). Within the ECP_DP_SECPxxx group, the smaller the xxx number the faster.

By the way, you can probably save some memory by reducing MBEDTLS_ECP_MAX_BITS and MBEDTLS_MPI_MAX_SIZE. MBEDTLS_ECP_MAX_BITS only needs to be as large as the EC curve size, e.g. 256 for SECP256R1, 384 for SECP384R1, etc. For MBEDTLS_MPI_MAX_SIZE, the value 512 is enough for RSA key sizes up to 4096 bits.
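As a rough illustration of the two suggestions above, a config.h fragment for a client limited to SECP256R1 might look like the sketch below. It assumes the server presents an RSA key of at most 4096 bits and that no other part of the application needs a different curve; the `#if` checks are an optional addition, not something Mbed TLS requires.

```c
/* config.h fragment (sketch): client restricted to one 256-bit curve. */

/* Enable only this curve; leave the other MBEDTLS_ECP_DP_xxx_ENABLED macros undefined. */
#define MBEDTLS_ECP_DP_SECP256R1_ENABLED

/* Only needs to cover the largest enabled curve: 256 bits for SECP256R1. */
#define MBEDTLS_ECP_MAX_BITS 256

/* In bytes: 512 * 8 = 4096, enough for RSA keys up to 4096 bits. */
#define MBEDTLS_MPI_MAX_SIZE 512

/* Optional compile-time sanity checks for the assumptions stated above. */
#if MBEDTLS_ECP_MAX_BITS < 256
#error "MBEDTLS_ECP_MAX_BITS is smaller than the enabled curve"
#endif
#if (MBEDTLS_MPI_MAX_SIZE * 8) < 4096
#error "MBEDTLS_MPI_MAX_SIZE is too small for a 4096-bit RSA key"
#endif
```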

Best regards,
Gilles Peskine — Mbed TLS team

 
Nov 8, 2017 12:27
Vikas

Hi Gilles

Thank you so much for your response.

Yes. If I change ECP_WINDOW_SIZE to 6 and ECP_FIXED_POINT_OPTIM to 1, the connection time improves, but memory use increases too. As I am running on an embedded platform, my goal is a good connection time without consuming too much memory.

I spent more time finding out which part of the connection takes so long. I am connecting to a local MBEDTLS server using the ECDHE-RSA-GCM-SHA256 cipher suite with the secp256r1 curve. Two exchanges in the handshake take most of the time and are responsible for the long connection time.

The client key exchange takes about 200ms to generate the public/private key pair and the shared secret, and the certificate verify message takes about 300ms. I know asymmetric key operations take time, but this is still huge and leads to a long connection time.

Client key exchange function calls and execution times:

- mbedtls_ecdh_make_public - 114ms
  - mbedtls_ecp_gen_keypair_base
    - mbedtls_ecp_mul
      - ecp_mul_comb - 96ms

Looking at the calls above, the multiplication is what takes most of the time. I am not sure how to make these operations faster on an embedded platform. However, I see that the micro-ecc library performs the multiplication faster: its function EccPoint_mult takes about 20ms.

I see assembly code for ARM platforms in micro-ecc, and likewise in MBEDTLS (file bn_mul.h), but I don't know how to enable it. Is there anything specific I need to configure to enable that assembly code? Currently MBEDTLS_HAVE_ASM is enabled in config.h.

 
Nov 8, 2017 13:08
Ron Eldor

Hi Vikas,
As I mentioned in the other post, please verify that the ASM code in bn_mul.h is in fact being compiled.
The ASM code is specific to the supported instruction sets, and it is quite possible that your platform is currently not supported in bn_mul.h. Whether ASM code is compiled, and which variant, is decided at compile time. The relevant definitions come either from the toolchain or from the config.h file (e.g. MBEDTLS_HAVE_SSE2 ).
Note that these operations, by nature, take time, as they include complex calculations on big numbers.
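One way to see which instruction-set macros the toolchain defines is a small probe like the sketch below. It is an illustration only: `detected_target` is a hypothetical helper, the macros are the usual GCC/Clang predefined ones (other compilers use different names), and it does not reproduce the exact conditions used in bn_mul.h.

```c
/* Report which target the compiler says it is building for,
 * based on common GCC/Clang predefined macros. */
const char *detected_target(void)
{
#if defined(__aarch64__)
    return "aarch64";
#elif defined(__thumb__)
    /* GCC defines __thumb__ together with __arm__, so test it first. */
    return "arm-thumb";
#elif defined(__arm__)
    return "arm";
#elif defined(__x86_64__) || defined(__amd64__)
    return "x86-64";
#elif defined(__i386__)
    return "x86-32";
#else
    return "unknown";
#endif
}
```

Printing this string once at start-up, built with the same compiler flags as the library, gives a quick hint about which assembly variant, if any, the build is likely to pick.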
Regards,
Mbed TLS Team member
Ron

 
Nov 9, 2017 19:53
Vikas

Thank you Ron.

Yes. I verified that the ASM code in bn_mul.h is being compiled, since the toolchain sets the instruction set. So the execution times of the functions I mentioned above are with ASM enabled.

I understand that these operations are complex and that multiplication takes time. But I am not sure how libraries like micro-ecc are able to finish these operations faster. Here is some more data on execution times for MBEDTLS and micro-ecc functions.

While generating the key pair:

- micro-ecc: EccPoint_mult takes ~16ms (with uECC_ASM set to uECC_asm_fast)
- micro-ecc: EccPoint_mult takes ~45ms (with uECC_ASM set to uECC_asm_none)
- MBEDTLS: mbedtls_ecp_mul takes ~95ms

As shown above, enabling the assembly code in micro-ecc improves performance a lot, whereas I don't see any assembly for ECC multiplication in MBEDTLS.

Can you suggest any configuration that can improve the execution time of mbedtls_ecp_mul, or any other pointers for achieving micro-ecc-like performance with MBEDTLS?

 
Nov 12, 2017 12:08
Ron Eldor

Hi Vikas,
Thank you for your confirmation. As the bn_mul.h file contains many varieties of ASM code for several platforms, it is possible that the optimization code for some platforms needs updating. I suggest you try compiling with different optimization flags, which may increase performance.
Unfortunately, I don't have any configuration definitions to suggest at the moment other than those already mentioned.
Regards,
Mbed TLS Team member

 
Nov 15, 2017 20:08
Vikas

Thanks Ron.

I just increased ECP_WINDOW_SIZE to 4 and I see that the overall ECC multiplication time is reduced by ~30ms. But with ECP_WINDOW_SIZE at 4, the execution time of the double_Jacobian function increased by 12ms. I am not sure why. Any idea why this function takes more time now?

If the double Jacobian function ran in the same time as with ECP_WINDOW_SIZE 2, we could improve further. For now I am not sure what other ways there are to optimize performance on an embedded platform, because the configuration we are using already looks optimized.

 
Nov 16, 2017 08:28
Ron Eldor

Hi Vikas,
Have you considered modifying MBEDTLS_MPI_WINDOW_SIZE as well?
Regards,
Mbed TLS Team member
Ron

 
Nov 16, 2017 08:30
Manuel Pégourié-Gonnard

Hi Vikas,

The constant MBEDTLS_ECP_WINDOW_SIZE controls a time-memory trade-off where some values are pre-computed to make the rest of the scalar multiplication algorithm faster (if you're curious, we use a comb method, which is somewhat similar to a window method for fast exponentiation). So it is normal and expected that when increasing the window size, more time is spent on some parts (more values are pre-computed) and less on others (the final part), in such a way that the overall time is reduced (and memory consumption increased).

I think it is clear from that description that it is not possible to reduce (or even keep constant) the time it takes to pre-compute values when we're pre-computing more of them. However, overall execution time is still reduced as you observe.

Regards,
Mbed TLS team member
Manuel

 
Nov 23, 2017 12:05
Jayasankar

Hi,

Following the discussion in this thread, I also tried changing ECP_WINDOW_SIZE to 4, and this reduced the TLS handshake time by ~30ms on my platform.

To reduce the handshake time further, I tried setting MBEDTLS_ECP_FIXED_POINT_OPTIM to 1. But with this flag, the handshake time did not come down any further.

According to the description of the MBEDTLS_ECP_FIXED_POINT_OPTIM flag, enabling it should speed up repeated multiplication of the generator.

Could you please let me know why the handshake time might not change even after setting MBEDTLS_ECP_FIXED_POINT_OPTIM to 1?

I have tried with MBEDTLS_TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 and TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 Cipher Suites.

 
Nov 30, 2017 16:22
Jayasankar

Hi Ron/Manuel,

Could you please provide some details about MBEDTLS_ECP_FIXED_POINT_OPTIM config?

I have tried enabling MBEDTLS_ECP_FIXED_POINT_OPTIM to reduce the TLS handshake time. But there is no difference in handshake time even after enabling it.

I have tried with MBEDTLS_TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 and TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 Cipher Suites.

Thanks, Jayasankar.

 
Dec 18, 2017 09:47
Antonio

Hi all,

Could you please explain how you got the micro-ecc code to work together with mbedtls? I need to use it as well, but I am struggling to integrate it.

I am using ARM-M3.

Thanks in advance.

 
Feb 7, 2018 13:45
Jayasankar

Hi,

I see that mbedtls 2.7 has been released. Were any changes made in 2.7 that can improve the TLS handshake time?

thanks, Jayasankar.

 
Feb 7, 2018 15:54
Ron Eldor

Hi Jayasankar,
Looking at the ChangeLog, you will see that this release mostly introduces security fixes, support for HW acceleration for additional modules, and some other bug fixes. There were no changes aimed directly at improving the performance of the ECP calculations. Note that there isn't much that can be done, as this is a limitation of the operation itself and depends heavily on the MCU you are using.
Regards,
Mbed TLS Team member
Ron

 
Feb 8, 2018 09:51
Jayasankar

Hi Ron, thanks a lot for clarifying.

Also, to reduce the TLS handshake time, I tried replacing the mbedtls "ecp_mod_p256" function with micro-ecc's "vli_mmod_fast", and this reduces the "client key exchange" time by ~25ms. Here is the micro-ecc modp implementation (I used the uECC_WORD_SIZE == 4 config): https://github.com/kmackay/micro-ecc/blob/static/uECC.c

Could you please advise whether it is safe to use micro-ecc's modp in mbedtls code? Will there be any security-related or other issues?

Thanks -Jayasankar.