Response Time is Crucial

We’ve been having technical issues recently, and they were all caused by one serious problem: our nodes were not responsive enough. In other words, they were not responding to HTTP requests as fast as the network needs in order to function correctly. Why is response time so important, and what happens when nodes get slow? Here is my post-mortem analysis. The problem is gone now, so it’s time to look at why the speed of data delivery is so crucial for Zold.

As you probably know from our White Paper, there are three basic operations in Zold: fetch, push, and merge. The first two (fetch and push) interact with the network of distributed nodes, while the last one (merge) works locally: it puts wallet files together and decides which set of copies is the most trustworthy.

The Zold network is decentralized, which means, first of all, that it consists of anonymous nodes: servers we can’t trust. We can’t trust their data and we can’t trust their responsiveness. If a node is too slow to respond, we can’t wait for it and slow down the entire fetch or push operation; we have to move on and ignore it. To fetch a wallet from the network, we make HTTP GET requests to a number of nodes. All or most of them return content that is later merged into the target wallet file.

Each HTTP request has a timeout associated with it. If a request takes longer than that, it is terminated and its result is ignored. This isolates good nodes from bad ones and keeps the entire operation fast enough.
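To make the fetch-with-timeout idea concrete, here is a minimal Python sketch. It is not Zold’s actual implementation: the node is modelled as a hypothetical async function (`simulated_node`) that sleeps for a given delay instead of answering a real HTTP GET, and the timeout value is an arbitrary assumption. All nodes are queried in parallel, and any answer that does not arrive within the timeout is dropped.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

# Hypothetical model: a node is an async function that returns wallet content.
Fetcher = Callable[[], Awaitable[str]]


async def fetch_one(fetch: Fetcher, timeout: float) -> Optional[str]:
    """Return the node's answer, or None if it takes longer than `timeout` seconds."""
    try:
        return await asyncio.wait_for(fetch(), timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the slow request; its result is simply dropped.
        return None


async def fetch_all(nodes: Dict[str, Fetcher], timeout: float) -> Dict[str, str]:
    """Query all nodes in parallel and keep only the answers that arrived in time."""
    names = list(nodes)
    answers = await asyncio.gather(*(fetch_one(nodes[n], timeout) for n in names))
    return {n: a for n, a in zip(names, answers) if a is not None}


def simulated_node(delay: float, content: str) -> Fetcher:
    """Hypothetical stand-in for an HTTP GET /wallet/<id> sent to one node."""
    async def fetch() -> str:
        await asyncio.sleep(delay)
        return content
    return fetch


if __name__ == "__main__":
    nodes = {
        "fast-1": simulated_node(0.05, "copy-A"),
        "fast-2": simulated_node(0.10, "copy-A"),
        "slow-1": simulated_node(2.00, "copy-B"),
    }
    # slow-1 exceeds the 0.5 s timeout and is ignored.
    print(asyncio.run(fetch_all(nodes, timeout=0.5)))
```

Because the requests run concurrently, the whole fetch takes at most about one timeout, no matter how slow the worst node is.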

If just a few nodes are slow while the rest are fast, it’s not a big deal. We still get the data we need quickly enough and can still use it for the merge operation, since we can compare the scores received and decide which set of copies dominates.
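The “which set of copies dominates” decision can be illustrated with a simplified Python sketch. It assumes, for illustration only, that each answer arrives as a pair of wallet content and an integer score of the responding node, and that identical copies pool their scores; Zold’s real merge logic is more involved than this.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def dominant_copy(answers: Iterable[Tuple[str, int]]) -> Optional[str]:
    """Pick the wallet content backed by the largest total score.

    Assumption: identical copies are grouped and their scores are summed.
    On a tie, the copy seen first wins (dicts keep insertion order).
    """
    totals: Dict[str, int] = defaultdict(int)
    for content, score in answers:
        totals[content] += score
    if not totals:
        return None  # nothing arrived in time, so there is nothing to decide
    return max(totals, key=totals.__getitem__)


if __name__ == "__main__":
    # Two nodes agree on copy "A" (3 + 4 = 7), one node sends "B" (5).
    print(dominant_copy([("A", 3), ("B", 5), ("A", 4)]))
```

A few slow nodes only remove a few pairs from the input, so the winner is usually unchanged.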

However, if most of the nodes are slow, the entire network collapses. We simply can’t make any reasonable decisions about the data in the wallets, because we reject the majority of copies coming from the network. We reject too often, and that affects the accuracy of the data: not just the speed, but the accuracy and trustworthiness of the information in the wallets!

When too many nodes are too slow, they look dead to most of their clients and to other nodes. If they look dead, the information they provide can’t be trusted. If the information can’t be trusted, we can no longer merge wallets with any reasonable guarantee.
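One way to express “too many nodes look dead” is a quorum check before merging. The sketch below is a hypothetical illustration, not something Zold is confirmed to do: it assumes the client counts how many nodes it queried and how many answered within the timeout, and it refuses to merge unless strictly more than half answered (the 0.5 threshold is an assumption).

```python
def can_merge(queried: int, answered: int, quorum: float = 0.5) -> bool:
    """Return True only if strictly more than `quorum` of the queried nodes answered in time.

    Hypothetical rule: with too few answers, the surviving copies are not a
    trustworthy sample of the network, so merging them would be a guess.
    """
    if queried <= 0:
        return False
    return answered / queried > quorum


if __name__ == "__main__":
    print(can_merge(queried=10, answered=8))  # most nodes are fast enough
    print(can_merge(queried=10, answered=3))  # most nodes timed out
```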

Thus, it’s absolutely critical for Zold to make sure its nodes are fast enough. Here is the data I’m currently getting from one of the nodes:

$ ab -n 1000 -c 10 http://b2.zold.io:4096/wallet/0000000000000000
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking b2.zold.io (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        thin
Server Hostname:        b2.zold.io
Server Port:            4096

Document Path:          /wallet/0000000000000000
Document Length:        4126 bytes

Concurrency Level:      10
Time taken for tests:   19.004 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      4640190 bytes
HTML transferred:       4126000 bytes
Requests per second:    52.62 [#/sec] (mean)
Time per request:       190.043 [ms] (mean)
Time per request:       19.004 [ms] (mean, across all concurrent requests)
Transfer rate:          238.44 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       45   53   4.9     52      99
Processing:    72  135  45.4    123     403
Waiting:       71  134  45.4    122     403
Total:        122  188  46.3    176     465

Percentage of the requests served within a certain time (ms)
  50%    176
  66%    193
  75%    205
  80%    215
  90%    250
  95%    279
  98%    321
  99%    361
 100%    465 (longest request)

As you can see, the majority of requests finish in under 200 milliseconds: the median is 176 ms, and 99% of requests complete within 361 ms. This is good enough for the Zold network to function correctly. We will definitely aim for higher speed in the future, but for now these numbers are satisfactory.
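The summary lines in the ApacheBench report follow from simple arithmetic, and the percentile table can be approximated from raw latencies. Here is a small Python sketch for checking such numbers. The percentile function uses the nearest-rank method, which is my choice for illustration and may round slightly differently from ab itself; the latency samples in the usage example are made up.

```python
import math
from typing import Dict, Sequence


def ab_summary(completed: int, total_seconds: float, concurrency: int) -> Dict[str, float]:
    """Reproduce ab's throughput and mean-latency lines from the raw totals."""
    return {
        "requests_per_second": completed / total_seconds,
        # Mean time per request as seen by each of the `concurrency` clients.
        "time_per_request_ms": concurrency * total_seconds * 1000 / completed,
        # Mean time per request across all concurrent requests.
        "time_per_request_all_ms": total_seconds * 1000 / completed,
    }


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def share_under(samples: Sequence[float], limit_ms: float) -> float:
    """Fraction of requests that finished in less than `limit_ms` milliseconds."""
    return sum(1 for s in samples if s < limit_ms) / len(samples)


if __name__ == "__main__":
    # Totals from the report above: 1000 requests, 19.004 s, concurrency 10.
    print(ab_summary(1000, 19.004, 10))
    latencies = [122, 150, 176, 193, 205, 250, 465]  # made-up samples, in ms
    print(percentile(latencies, 50), share_under(latencies, 200))
```

With the report’s totals this gives about 52.62 requests per second and 190.04 ms per request; ab prints 190.043 ms presumably because it uses the unrounded elapsed time rather than the 19.004 s shown.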

PS. You may check the speed of all visible nodes here.