For a shard takeover, the malicious group needs ⅔+1 of the members of a single shard, out of the consensus buffer of 400. That equals 267 validators. If we assume 33% of the total network nodes are malicious, the probability of having 267 malicious nodes in one shard is 2.84*10^-47. This is under the new consensus mechanism, where all the nodes in the shard have to sign the block.
A very old post on ethresearch: "Building towards a '99%' Fault Tolerant Sharded System - currently provably ~67%" (Ethereum Research). There the consensus size was 63 out of 400, with a random 63 selected for every round.
Now, after 5 years of a working mainnet and a provably secure environment under the honest majority assumption, we can try fishermen once again in order to improve the security of services and of the network even more.
After some long debates with the founder of Solana: https://x.com/aeyakovenko/status/1799105839925764176
there is a way to improve, and I understand what his supposed problem is, although the math is not against sharding.
In one of the tweets he said that Circle (or Binance) on a sharded network needs to trust the honest majority assumption for state root hash changes in shards where the service does not run nodes. So, in some sense, the attack vector is this: the service runs an observer only for shard 0, because that is where its deposit address lives, so if something happens on shard 2, where the service does not run a node, it cannot detect a double mint/double spend there. The difference in the case of Solana is that, even if 100% of the nodes are malicious, Circle will never accept invalid blocks, as it validates the whole state.
The problem: we know that any service can run a full observing squad covering all the shards, but it is true that if one of those nodes discovers an error in a signed block, or does not receive a block for the shard it runs (the malicious group does not send the blocks to it), it does not send a message to the rest of the observing squad that a shard header/block is wrong or missing.
The solution: your own nodes are the fishermen for your own service.
Create a new internal p2p topic between the nodes of a service, not a public topic, but one shared only by your own set of nodes, used to send messages about headers and blocks from the different shards and from the metachain. The messages need to be simple: an attestation that the block header for a given shard, round, and nonce, with a given headerHash, was correct. So if the service runs 3 shards + metachain, on this internal p2p network every shard node and the metachain node will send the above message for all the blocks it verifies, each node for its own shard. Verifying means that the node has all the data, processed the block, and committed it to its storage. These messages are accumulated in an internal pool of "committedBlocks".
These nodes can also send messages on this p2p topic about blocks which are incorrect, or for which they did not receive all the data: they see a signed block but do not have all the data for it, or they tried to process a block and it produced an error, a roothash mismatch, or a processing mismatch. These messages are accumulated in an internal pool of "incorrectBlocks".
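To make the shape of these messages concrete, here is a minimal sketch in Go of the two message types and the two internal pools described above. All names are hypothetical and for illustration only, not taken from the MultiversX codebase.

```go
package fisherman

import "sync"

// BlockAttestation is broadcast by a node after it has fully processed a block
// for its own shard and committed it to its storage.
type BlockAttestation struct {
	ShardID    uint32
	Round      uint64
	Nonce      uint64
	HeaderHash []byte
}

// IncorrectBlockReport is broadcast when a node sees a signed block it cannot
// validate: missing data, processing error, or root-hash mismatch.
type IncorrectBlockReport struct {
	ShardID    uint32
	Round      uint64
	Nonce      uint64
	HeaderHash []byte
	Reason     string
}

// BlockInfoPools accumulates the messages received on the internal topic,
// keyed by header hash.
type BlockInfoPools struct {
	mut             sync.RWMutex
	committedBlocks map[string]BlockAttestation
	incorrectBlocks map[string]IncorrectBlockReport
}

func NewBlockInfoPools() *BlockInfoPools {
	return &BlockInfoPools{
		committedBlocks: make(map[string]BlockAttestation),
		incorrectBlocks: make(map[string]IncorrectBlockReport),
	}
}

func (p *BlockInfoPools) AddCommitted(a BlockAttestation) {
	p.mut.Lock()
	defer p.mut.Unlock()
	p.committedBlocks[string(a.HeaderHash)] = a
}

func (p *BlockInfoPools) AddIncorrect(r IncorrectBlockReport) {
	p.mut.Lock()
	defer p.mut.Unlock()
	p.incorrectBlocks[string(r.HeaderHash)] = r
}

// IsCommitted reports whether one of the service's own nodes attested the header.
func (p *BlockInfoPools) IsCommitted(headerHash []byte) bool {
	p.mut.RLock()
	defer p.mut.RUnlock()
	_, ok := p.committedBlocks[string(headerHash)]
	return ok
}

// IsIncorrect reports whether the header was flagged as invalid or incomplete.
func (p *BlockInfoPools) IsIncorrect(headerHash []byte) bool {
	p.mut.RLock()
	defer p.mut.RUnlock()
	_, ok := p.incorrectBlocks[string(headerHash)]
	return ok
}
```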
The information in these pools is used by all the other nodes when they process blocks. So when a metachain node is processing and finalizing shard headers, it will look into the internal committedBlocks pool and will not process a metachain block, proposed by someone else, that references a shard header missing from the committedBlocks pool. The same is done on a shard node: it will not process a shard block referencing the header hash of a metachain block missing from the committedBlocks pool, or containing a cross-shard header from a shard block missing from the committedBlocks pool.
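A sketch of that check, assuming the pools from the previous snippet are exposed through a small read-only interface (names are illustrative, not the actual processor API):

```go
package fisherman

// BlockInfoChecker is the minimal read-only view of the internal pools the
// block processor needs; *BlockInfoPools from the previous sketch satisfies it.
type BlockInfoChecker interface {
	IsCommitted(headerHash []byte) bool
	IsIncorrect(headerHash []byte) bool
}

// CanProcessReferencedHeaders returns false if any referenced cross-shard
// header has been reported as incorrect, or has not yet been attested as
// committed by the service's own node for that shard. The caller simply
// postpones processing the proposed block until the attestations arrive.
func CanProcessReferencedHeaders(checker BlockInfoChecker, referencedHeaderHashes [][]byte) bool {
	for _, hash := range referencedHeaderHashes {
		if checker.IsIncorrect(hash) {
			return false
		}
		if !checker.IsCommitted(hash) {
			return false
		}
	}
	return true
}
```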
This resolves the above-mentioned problem: the nodes will stop processing blocks and will stop at the last correct state for a given service, even if there is an incorrect cross-shard miniblock caused by a maliciously processed and committed block in one shard. So even if 100% of the network is malicious, the service will not process anything incorrect, as it validates everything. And even if a service chooses (it is not obliged) to run nodes for all the shards, this still scales far better than running one big node for a non-sharded ecosystem. It keeps horizontal scalability: you just deploy more nodes, which are inexpensive; you do not depend on super hardcore and expensive machines or new hardware. It just works on everything, even on a Raspberry Pi.
How this solution scales to validators:
Right now over 95% of the validators run at least 4 nodes, and statistically speaking, at any moment in time most validators will have 1 node in every shard and on the metachain. With the current multiKey node solution, most validators run a full observing squad with multiKeys, so they already validate all the blocks in all the shards, all the time. Now, when the above-mentioned solution is applied to validator nodes as well, validators will not sign a metablock which finalizes a shard block if that shard block is missing from the committedBlocks pool or if it is in the incorrectBlocks pool.
Improving the solution when a validator/service does not run a node for every shard:
When creating the setup for the nodes, a specific config can list the IP addresses of the nodes from which a node accepts correctness information about shards it does not validate itself. If a shard is missing from that list, or there are fewer nodes than shards, the check of whether a shard block header with hash X is in the committedBlocks pool returns true at all times. If a node changes shard because it is shuffled out, it sends a message that it moved from shard X to shard Y, and this change is recorded in the committedBlocks pool component. So at any time the internal nodes know what the other nodes are validating/processing, and for any shard that is not actively validated the check simply returns true.
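A possible shape for that config and the "default to true for uncovered shards" rule, again with hypothetical names; the real config format may differ:

```go
package fisherman

// TrustedPeer is one of the service's own nodes, identified by its address and
// the shard it is currently assigned to (updated when the node is shuffled).
type TrustedPeer struct {
	Address string // e.g. "10.0.0.12:37373" -- illustrative only
	ShardID uint32
}

// CoverageConfig holds the peers the node accepts cross-shard correctness
// information from.
type CoverageConfig struct {
	TrustedPeers []TrustedPeer
}

// covers tells whether the service runs at least one node in the given shard.
func (c *CoverageConfig) covers(shardID uint32) bool {
	for _, p := range c.TrustedPeers {
		if p.ShardID == shardID {
			return true
		}
	}
	return false
}

// IsShardHeaderAttested returns true unconditionally for shards the service
// does not cover, so uncovered shards never block processing; for covered
// shards it falls back to the committedBlocks pool lookup.
func (c *CoverageConfig) IsShardHeaderAttested(shardID uint32, headerHash []byte, committed func([]byte) bool) bool {
	if !c.covers(shardID) {
		return true
	}
	return committed(headerHash)
}

// HandleShardChange updates a peer's shard when it announces it was shuffled
// from shard X to shard Y.
func (c *CoverageConfig) HandleShardChange(address string, newShardID uint32) {
	for i := range c.TrustedPeers {
		if c.TrustedPeers[i].Address == address {
			c.TrustedPeers[i].ShardID = newShardID
			return
		}
	}
}
```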
This means that even when we have 200 shards and not everyone runs 200 nodes, there will be a constant, ongoing cross-shard validation process between the internal nodes, greatly enhancing the security of the network. Statistically, big groups of honest nodes will be spread all over the chain, providing an even bigger shared security.
So let’s look at some numbers:
Even now the network is safe under the honest majority assumption, as demonstrated countless times:
For a shard takeover, the malicious group needs ⅔+1 of the members of a single shard, out of the consensus buffer of 400. That equals 267 validators. Under the assumption that 25% of the total nodes are malicious, the probability that the malicious group has a supermajority in one shard is 2.88*10^-78. If we consider 33% of the total network nodes as malicious, the probability of having 267 malicious nodes in one shard is 2.84*10^-47.
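For readers who want to sanity-check the order of magnitude, here is a rough sketch that models shard assignment as 400 independent draws with a given malicious ratio. This is a binomial approximation; the exact figures quoted above may come from a finer model (e.g. sampling without replacement from the full validator set), so treat the output as indicative only.

```go
package main

import (
	"fmt"
	"math"
)

// logChoose returns ln(C(n, k)) using log-gamma to avoid overflow.
func logChoose(n, k int) float64 {
	lg := func(x int) float64 {
		v, _ := math.Lgamma(float64(x + 1))
		return v
	}
	return lg(n) - lg(k) - lg(n-k)
}

// takeoverProb returns P[X >= threshold] for X ~ Binomial(consensusSize, p),
// i.e. the chance that at least `threshold` of the shard's consensus members
// are malicious when each member is malicious with probability p.
func takeoverProb(consensusSize, threshold int, p float64) float64 {
	total := 0.0
	for k := threshold; k <= consensusSize; k++ {
		logTerm := logChoose(consensusSize, k) +
			float64(k)*math.Log(p) +
			float64(consensusSize-k)*math.Log(1-p)
		total += math.Exp(logTerm)
	}
	return total
}

func main() {
	// 2/3 of 400 + 1 = 267 validators needed to take over one shard.
	fmt.Printf("p=0.25: %.3g\n", takeoverProb(400, 267, 0.25))
	fmt.Printf("p=0.33: %.3g\n", takeoverProb(400, 267, 0.33))
}
```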
With the current setup of 3 shards + meta, we can show that security under any assumption is the same as for a non-sharded chain. One number someone put forward was: what if 65% of the nodes are malicious? In that case there is a 34% chance of having more than 67% malicious nodes in one shard, and that might introduce problems, like one shard being corrupted and the bad information from there being propagated to the others. With the proposed improvements, that shard may create an invalid block, but as the other shards do not have a dishonest majority, that cross-shard information will not be accepted. Also, the invalid block will not be accepted by honest nodes, so no harm is done and funds are safe.
The chain can be hardforked from the last correct state, the same as in the case of non-sharded chains. By making the nodes inside a service/staking provider the fishermen/other-shard verifiers, we create a shared security which does not depend only on economics, but on execution.
There is no reason not to go sharding. The MultiversX sharded chain is as secure as a single-state chain, but it is infinitely more scalable.
All the web and most of the software industry use partitioning, and there is no reason not to do the same in blockchain. Horizontal scaling is about deploying more nodes when you need more processing. It is far easier and cheaper to run multiple small nodes than to be forced to run insanely heavy machines. The whole web scales by running multiple services and partitioning processing/data/storage across thousands of small machines, not by building supercomputers.