EOS Mainnet Update: New Node Architecture Greatly Improves EOS Reliability

In our previous blog post and most recent EOS Hot Sauce, we took a look behind the scenes at the collaborative work that’s been happening in the last few weeks between Block.one and the key block producers of the EOS Mainnet. Today we want to take a more technical look at the details of the architectural changes recently deployed by block producers. 

Troubleshooting

In mid-January, the EOS Mainnet started experiencing a much larger volume of incoming transactions, which led to a subsequent surge in microforks. The first hypothesis was that the high number of transactions was overwhelming the signing nodes (aka producing nodes). 

Key block producers first attempted to increase the CPU speed on signing nodes in order to compensate for the large transaction volume, which helped, but didn’t fix the problem. The signing nodes were now producing bigger blocks but were still overloaded while trying to process incoming transactions. Furthermore, blocks were too slow to propagate to the next block producer, so the number of microforks remained too high.

After significant investigation, Block.one and key block producers identified the source of the problem: the high number of transactions hitting the signing node from multiple connections meant the node could not work efficiently. The solution required a reduction in the number of connections on the signing node. Ultimately, that meant having only 2 connections.

This is a significant change, as block producers might have had multiple connections to a public P2P cluster, multiple connections to API nodes, and various other connections. All of those needed to be replaced by only 2 connections.

First Connection: Blocks Only

Block producers organized to create a ‘blocks only’ P2P network to quickly deliver blocks between each other. Allowing block producers to ignore transactions means a reduced processing load – and when blocks arrive on time, microforks don’t happen.

So a newly produced block has to go from producer 1 -> block peer 1 -> block peer 2 -> producer 2.
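The hop sequence above can be sketched as a small graph walk. This is only an illustration: the node names and the `BLOCK_LINKS` table are hypothetical labels taken from the sentence above, not real endpoints or EOSIO data structures.

```python
from collections import deque

# Hypothetical blocks-only links, named after the path described in the post.
BLOCK_LINKS = {
    "producer 1": ["block peer 1"],
    "block peer 1": ["block peer 2"],
    "block peer 2": ["producer 2"],
    "producer 2": [],
}


def block_route(links, source, target):
    """Return the shortest hop-by-hop route from source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in links.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


route = block_route(BLOCK_LINKS, "producer 1", "producer 2")
print(" -> ".join(route))
print(len(route) - 1, "hops")
```

Every hop adds latency, so a short, dedicated path like this one gives a block the best chance of reaching the next producer in time.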

For this to work properly, a few changes were needed in EOSIO from Block.one:

  1. Ability to establish bi-directional blocks-only connections (added in EOSIO 2.0.2 only)
  2. Ability to limit the CPU used to create blocks in the signing node (added in EOSIO 1.8.11 and 2.0.2).

As these changes were made available, block producers quickly deployed them and assigned a CPU-effort value of 50%. This dramatically reduced the number of dropped blocks.
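To see why a lower CPU-effort value can reduce dropped blocks, here is a rough timing sketch. It assumes the CPU-effort value is a percentage of the 500 ms EOS block interval and that whatever remains is available for the block to travel to the next producer. The per-hop latencies are made-up numbers for illustration, and the model ignores everything else a node does.

```python
BLOCK_INTERVAL_MS = 500  # EOS produces one block every half second


def production_budget_ms(cpu_effort_percent):
    """Milliseconds of each block interval spent packing transactions."""
    if not 0 < cpu_effort_percent <= 100:
        raise ValueError("cpu_effort_percent must be in (0, 100]")
    return BLOCK_INTERVAL_MS * cpu_effort_percent / 100


def propagation_slack_ms(cpu_effort_percent, hop_latencies_ms):
    """Time left after production and delivery; negative means the block is late."""
    used = production_budget_ms(cpu_effort_percent) + sum(hop_latencies_ms)
    return BLOCK_INTERVAL_MS - used


# Hypothetical latencies for the three hops between two producers.
hops = [40, 120, 40]
for effort in (50, 80):
    budget = production_budget_ms(effort)
    slack = propagation_slack_ms(effort, hops)
    print(f"cpu effort {effort}%: {budget} ms producing, {slack} ms slack")
```

With these assumed latencies, a 50% setting leaves a small margin, while 80% would make the block arrive late. That trade-off between block size and delivery time is what the setting lets a block producer tune.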

This architecture is working very well at the time of publication, but a few more adjustments are needed to reduce the remaining dropped blocks further. To guide future changes, Block.one wrote a detailed explanation of how to configure settings to ensure blocks arrive on time.

Second Connection: Transactions

In this new node architecture, block producers also had to create a transactions network. Note: the transactions network also includes blocks.

All incoming transactions, no matter where they are coming from, must consolidate into a single “transaction peer” node that connects to the signing node. This node must handle as many transactions as possible, but also must not overwhelm the signing node with too many transactions so as to allow the signing node to continue producing blocks efficiently.
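The idea of a single funnel that accepts everything but releases transactions at a bounded pace can be sketched as a toy model. This is not how nodeos implements it: the `TransactionPeer` class, its limits, and its method names are hypothetical and exist only to illustrate the behaviour described above.

```python
from collections import deque


class TransactionPeer:
    """Toy model of a node that funnels all transactions to one signing node."""

    def __init__(self, max_forward_per_interval, max_queue):
        self.max_forward_per_interval = max_forward_per_interval
        self.max_queue = max_queue
        self.queue = deque()
        self.dropped = 0

    def receive(self, tx):
        """Accept a transaction from any source, or drop it if the queue is full."""
        if len(self.queue) >= self.max_queue:
            self.dropped += 1
            return False
        self.queue.append(tx)
        return True

    def forward_interval(self):
        """Release at most max_forward_per_interval transactions to the signing node."""
        count = min(self.max_forward_per_interval, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]


peer = TransactionPeer(max_forward_per_interval=3, max_queue=5)
for i in range(8):
    peer.receive(f"tx{i}")

first = peer.forward_interval()
second = peer.forward_interval()
print(first, second, "dropped:", peer.dropped)
```

In this model the signing node never sees more than three transactions per interval, no matter how many arrive at the peer. The cost is that transactions beyond the queue limit are lost.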

To handle the incoming transaction volume, block producers might wish to use EOSIO 2.0 with EOS-VM enabled. However, using EOS-VM would overwhelm a block signing node running version 1.8. In this case, an extra “barrier” 1.8 node is needed to slow down the transactions and allow the signing node to function efficiently.
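The rule for when a barrier node is needed can be written down as a small decision function. This is a sketch of the reasoning in the paragraph above, with a hypothetical function name and node labels. It is not a tool that block producers actually run.

```python
def transaction_chain(signing_version, peer_uses_eos_vm):
    """Return the chain of nodes a transaction passes through, in order."""
    chain = ["transactions peer"]
    # An EOS-VM peer processes transactions faster than a 1.8 signing node
    # can absorb them, so a slower 1.8 node is placed in between.
    if peer_uses_eos_vm and signing_version.startswith("1.8"):
        chain.append("transactions barrier (1.8)")
    chain.append("signing node")
    return chain


print(transaction_chain("1.8.11", peer_uses_eos_vm=True))
print(transaction_chain("2.0.2", peer_uses_eos_vm=True))
```

Either way, the signing node itself still sees only one transactions connection.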

While block producers were optimizing the block-only delivery mechanism, the ability to process transactions was temporarily reduced. This negatively impacted some dapps due to transactions getting lost. This should no longer be an issue.

Conclusion

Figuring out the problem and coming up with a solution was a very collaborative effort between the Top 21 and Block.one. This took a tremendous amount of work and coordination on the part of many teams located all around the world. For example, at one point several adjacent block producers sent their logs to Block.one so that issues could be traced from one block producer to the next. It’s normal that some teams were more involved than others, but all the teams contributed what was needed. We also want to give a special shout-out to WhaleEx, who provided great leadership by coming up with the solution and offered a lot of help implementing it.

As a final note, it’s important to know that not all block producers have the exact same configuration. Node topology, CPU speed, transaction load and EOSIO versions vary, as different block producers have different needs and requirements. EOS Nation continues to work with Block.one towards supporting more than 2 connections to the block signing node, so that it can operate with both efficiency and redundancy.

This summary should be helpful for other block producers on EOS, as well as block producers on other EOSIO networks who might want to adopt a similar design to prevent overloading.

Issues That Would Help With Troubleshooting:

GitHub EOSIO Pull Requests:

Diagram Details

Here are the technical configuration details:

  • Producer Node
    1.8.x or 2.0.x, wabt, speculative, cpu-effort-percent = 50 – 80
  • Blocks Peer Node
    2.0.x, eos-vm-jit, read-only, full validation
  • Transactions Barrier Node
    1.8.x, wabt, speculative, light validation
  • Transactions Peer Node
    2.0.x, eos-vm-jit, speculative, full validation
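For readers who prefer to see the list above as settings, here is a sketch that turns each role into configuration lines. The option names (`wasm-runtime`, `read-mode`, `validation-mode`, `cpu-effort-percent`) are assumed to match the nodeos options of the same names, so check the documentation for your EOSIO version before relying on them. The role keys and the `render_config` helper are hypothetical, and the EOSIO version itself is not a config option, so it is left out.

```python
# Settings per role, transcribed from the list above.
NODE_ROLES = {
    "producer": {
        "wasm-runtime": "wabt",
        "read-mode": "speculative",
        "cpu-effort-percent": 50,  # the post gives a range of 50 to 80
    },
    "blocks-peer": {
        "wasm-runtime": "eos-vm-jit",
        "read-mode": "read-only",
        "validation-mode": "full",
    },
    "transactions-barrier": {
        "wasm-runtime": "wabt",
        "read-mode": "speculative",
        "validation-mode": "light",
    },
    "transactions-peer": {
        "wasm-runtime": "eos-vm-jit",
        "read-mode": "speculative",
        "validation-mode": "full",
    },
}


def render_config(role):
    """Render one role's settings as 'key = value' lines."""
    settings = NODE_ROLES[role]
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


for role in NODE_ROLES:
    print(f"# {role}")
    print(render_config(role))
    print()
```

Peer addresses and the blocks-only connection settings are deliberately omitted, since they differ for every block producer.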

EOS Nation is a top 21 Block Producer on the EOS public network. We earn inflation rewards based on the percentage of tokens staked towards us. Those rewards are shared back with token holders through our Proxy4Nation Reward Proxy and also reinvested into the EOSIO community, tools, and infrastructure. Help grow the ecosystem by staking your vote to eosnationftw or proxying to proxy4nation.

2 thoughts on “EOS Mainnet Update: New Node Architecture Greatly Improves EOS Reliability”

  1. These updates for EOSNation are awesome and help me believe my investment in EOS is in good hands. It would be great if all BPs published a similar report from each of their perspectives on a regular basis. One of the greatest fuels for FUD in this space (other than plain old lies) is silence. Thanks for your great work and please encourage other BPs to follow your lead in great comms.

    • Thank you for reading and for the wonderful comment! As for other BPs publishing similar reports, we don’t feel there is that much of a need for it. We are happy to take on this role of “leading communicator” within the Top21.


Daniel Keyes

Chief Operating Officer (COO)
Responsibilities include: product management, operations, community
Location: Toronto, Canada

Prior to founding the first EOS community in Toronto and co-founding EOS Nation, Daniel spent a decade in the financial technology industry working in several diverse roles. His extensive experience in customer service, sales, sales coaching, agent training, digital marketing, digital process management (lean green belt), and product management (certified scrum master, certified product owner) eventually led him to consulting for a blockchain dev shop.

Daniel earned a Bachelor of Journalism from Ryerson University in 2009 and worked as a chase producer intern at Global TV.

Daniel lives by the principles of Truth, Love, and Freedom.