Node stuck on block 408373

Hi
https://explorer.aeternity.io/ is stuck on block 408373.
My local node v5.10.1, synced from the latest backup archive from https://downloads.aeternity.io/ (dated 29.03.2021, md5 135cc531aa3716db1fbedcebc85a3d75), is stuck on the same block with this error:
[info] Failed to add synced block 408374: {root_hash_mismatch,<<183,198,226,192,128,29,14,105,206,55,86,64,185,221,154,66,135,234,71,122,235,125,199,223,246,166,77,186,15,115,102,173>>,<<208,6,152,198,139,111,114,56,162,190,138,208,75,50,247,139,240,24,243,64,229,46,145,56,18,58,198,135,216,251,85,99>>}

When will there be a fix for this issue? And should we expect a huge rollback from the current top to block 408373?
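
The two 32-byte binaries in the root_hash_mismatch log line are easier to compare against explorer or peer output once they are in hex. A minimal sketch of that conversion follows; the function name erl_bin_to_hex is made up for illustration and is not part of the node tooling.

```shell
# Convert an Erlang binary literal such as <<183,198,226>> to lowercase hex.
# erl_bin_to_hex is a hypothetical helper name, not an aeternity command.
# The body runs in a subshell, so the IFS change does not leak to the caller.
erl_bin_to_hex() (
  # Drop the << >> delimiters and any spaces, leaving "183,198,226".
  bytes=$(printf '%s' "$1" | tr -d '<> ')
  IFS=','
  out=""
  for b in $bytes; do
    # Print each decimal byte as two hex digits.
    out="$out$(printf '%02x' "$b")"
  done
  printf '%s\n' "$out"
)

# First hash from the log line above:
erl_bin_to_hex '<<183,198,226,192,128,29,14,105,206,55,86,64,185,221,154,66,135,234,71,122,235,125,199,223,246,166,77,186,15,115,102,173>>'
```

Running it on both binaries from the log gives two hex strings that can be pasted into a bug report side by side.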

I told @oleg to post this issue here. It seems some nodes are stuck on block 408373 (including the node that runs the new middleware).

cc @dimitar.chain

There is a bug that just got exposed by chance; upgrading your node to 5.10.1 should work. It is a bit more complicated with the MDW, but we will get it running soon.

The bug is that an external library was accidentally bound to the protocol. This library has a function whose result differs between library versions, which introduces non-determinism :slight_smile: That means a node using a different library version yields a different result, so transaction checks fail and valid blocks suddenly become invalid. This poses no risk to users’ tokens or to network security; a handful of nodes are simply stuck. Since most nodes are running the latest version, they face no issue whatsoever. So upgrading your node to the latest version should be enough to unblock it.
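
To make the failure mode concrete, here is a toy sketch. It is not the actual library or the node's code: lib_v1_fee and lib_v2_fee are invented stand-ins for two versions of one library function, and the "state root" is just a checksum over a single balance. It only shows how a version-dependent result turns a valid block into a root_hash_mismatch on a node running the other version.

```shell
# Hypothetical library, version 1: integer division that truncates.
lib_v1_fee() { echo $(( $1 / 3 )); }
# Hypothetical library, version 2: same call, but rounds to nearest.
lib_v2_fee() { echo $(( ($1 + 1) / 3 )); }

# Toy "state root": checksum of the balance left after paying the fee.
# $1 = fee function to use, $2 = starting balance
state_root() {
  fee=$("$1" "$2")
  printf '%s' "$(( $2 - fee ))" | cksum | cut -d' ' -f1
}

# Validate a block: recompute the root locally and compare with the claimed one.
# $1 = local fee function, $2 = starting balance, $3 = root claimed by the block
check_block() {
  local_root=$(state_root "$1" "$2")
  if [ "$local_root" = "$3" ]; then
    echo "ok"
  else
    echo "root_hash_mismatch"
  fi
}

claimed=$(state_root lib_v2_fee 101)   # block producer ran version 2
check_block lib_v2_fee 101 "$claimed"  # validator on the same version
check_block lib_v1_fee 101 "$claimed"  # validator on the other version
```

The first check accepts the block and the second rejects it, although both validators saw exactly the same transaction.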

Regarding the middleware, I was wondering how to easily build a production-ready docker image. I recently opened a thread, “Ae-image-builder / node and plugins”, but somehow it didn’t get any attention.

However, my image builder is working, though I think there is room for improvement.

Thanks for clarifying the nature of the bug.
I am running Ubuntu “18.04.4 LTS (Bionic Beaver)” and tried the latest node version, 5.10.1: both the prebuilt one and one I built myself from source. Neither can sync further than block 408373.

I hope a fresh blockchain DB backup archive will help. Looking forward to it.

You can try with the latest docker image:

  1. Stop your node
  2. Start the latest docker image with port 3015 forwarded, and mount the folder the DB lives in as a docker volume. There is an example here
  3. Once the node syncs beyond this poisonous transaction, stop the docker container and start your 5.10.1 node
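
Step 2 can be sketched as a single docker command. Everything specific here is an assumption rather than something stated in this thread: the image name aeternity/aeternity:latest, the in-container DB path /home/aeternity/node/data/mnesia, and the host folder in DB_DIR. Check them against your own setup. The function only prints the command, so you can review it before running it yourself.

```shell
# Assumed values, not taken from the thread: change them to match your setup.
DB_DIR="${DB_DIR:-$HOME/.aeternity/maindb}"      # hypothetical host folder holding the node DB
IMAGE="${IMAGE:-aeternity/aeternity:latest}"     # assumed image name and tag
CONTAINER_DB="/home/aeternity/node/data/mnesia"  # assumed DB location inside the container

# Print (do not run) the docker command for step 2:
# port 3015 forwarded, DB folder mounted as a volume.
print_docker_cmd() {
  printf 'docker run --rm -p 3015:3015 -v %s:%s %s\n' "$DB_DIR" "$CONTAINER_DB" "$IMAGE"
}

print_docker_cmd
```

Make sure your own node is stopped first (step 1), since two nodes should not open the same DB folder at once.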

This is how we unstuck the MDW’s node. Since the MDW depends on the node’s software, it had to be upgraded to 5.10.1 as well :slight_smile: Note that this workaround may or may not work; it really depends on your hardware.

Early next week we will provide a proper fix and a new DB snapshot.

Thanks. The docker approach worked well.

When will the fixed version be released?

Some users still report the issue.

We expect it tomorrow, but only if everything goes according to plan.

A proper fix still takes time, and once it is done we will do a full sync to make sure we don’t break backwards compatibility. Sadly, this will require some more time, so the release will not happen today. Anyone who still has a stuck node can use the latest DB snapshot. It dates from 5th April and is past the height where nodes got stuck.

Do you know which library was causing the problem? I updated my node and synced from scratch until I reached this block, but I will try syncing with the docker instance in order to fetch the DB and start from there.

Yes, we are well aware which library causes this; a fix is still being tested. You don’t need the docker image for the full sync, only for the generation at height 408373. You can sync your node up to the stuck point, stop it, and start the docker image for a while until it syncs beyond that generation (it should be fast). Then you can stop the docker image and resume the sync with your node :slight_smile:
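
One way to tell when it is safe to switch back is to compare the node's current height with 408373. The sketch below assumes the node's HTTP status endpoint returns JSON containing a top_block_height field; both that field name and the helper names top_height and next_step are assumptions to check against your node version. The example input is hand-written, not real node output.

```shell
STUCK_HEIGHT=408373

# Extract top_block_height from a status JSON document on stdin.
# Assumption: the field name matches what your node's status endpoint returns.
top_height() {
  sed -n 's/.*"top_block_height"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Say which node should be running, given the current height in $1.
next_step() {
  if [ "$1" -gt "$STUCK_HEIGHT" ]; then
    echo "past $STUCK_HEIGHT: stop the docker container and resume with your own node"
  else
    echo "at or below $STUCK_HEIGHT: keep the docker container running"
  fi
}

# Example with a hand-written status document (not real node output):
height=$(printf '%s\n' '{"syncing":true,"top_block_height":408380}' | top_height)
next_step "$height"
```

In practice the JSON would be piped in from a request to the node's status endpoint instead of the printf line.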

My Docker cannot start any version after 5.8.0. Is there an environment variable that needs to be added after that?
I was able to start it normally before 5.8.0. The only difference I see is:
ERTS_LIB_DIR --> SYSTEM_LIB_DIR

Here is the Docker startup log:

/home/aeternity/node
Root: /home/aeternity/node
Exec: /home/aeternity/node/erts-10.7.2.3/bin/erlexec -boot /home/aeternity/node/releases/5.11.0/aeternity -mode embedded -boot_var SYSTEM_LIB_DIR /home/aeternity/node/lib -config /home/aeternity/node/releases/5.11.0/sys.config -args_file /home/aeternity/node/releases/5.11.0/vm.args -- console -noinput

So last night we released 5.11.0, which should unblock any stuck nodes. You can get it from here:

I see @LiuShao.chain is facing a different issue: I summon @dincho.chain to help.

Could you please check if your processor supports AVX2 instructions?

cat /proc/cpuinfo | grep avx2
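
The same check can be wrapped so that it prints an explicit answer instead of a wall of flags (or nothing). This is a small sketch for Linux only; the function name has_avx2 is made up for illustration.

```shell
# Report whether the CPU flags in a cpuinfo-style file include avx2.
# $1 is optional and defaults to /proc/cpuinfo (Linux only).
# has_avx2 is a hypothetical helper name.
has_avx2() {
  # -w matches avx2 as a whole word; -q suppresses the matching lines.
  if grep -qw avx2 "${1:-/proc/cpuinfo}" 2>/dev/null; then
    echo "avx2: supported"
  else
    echo "avx2: not supported"
  fi
}

has_avx2
```

If the file cannot be read at all, the function also reports "not supported", so on non-Linux systems the answer is not meaningful.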

Oh my God. It may not support it, although I haven’t checked yet. It runs on my NAS, which has low-end hardware.
The CPU is an Intel Celeron J3455.

It is a custom system that may not support the regular instructions. I rarely go into its command-line interface, since it comes with its own desktop.

I checked: it does not support AVX2 (Advanced Vector Extensions 2).

OK, then that’s the reason

How should I solve it? I can’t replace the hardware.

You should build the node docker image locally on that specific hardware.
