Transaction broadcasted and then not found

So the rabbit hole led me to a certain race condition in node communication. There was a lot of digging and eventually I was able to expose this as an actual issue in a reproducible test. It is all tracked and already fixed in this PR. The fix will be deployed in the next release.

A caveat up front: I have no proof this is what we are facing on mainnet. It is extremely hard to debug in such a distributed environment, especially since we don’t have any access to the interesting logs. What I am sure of is that this is a legit bug scenario and it will be fixed on mainnet soon.


That’s great news, thanks for all the hard work :rocket:


Thanks for reporting it :slight_smile: Those race conditions are notoriously hard to detect and fix, so let’s see if this fixes it :slight_smile:


Hey, I have one more question but I won’t spam with new threads.
Some of my transactions take blocks and blocks before ending up in the chain. I’m tracking one of my transactions now and it took almost two hours before it finally got mined. I’ve checked all the preconditions and the transaction looks ok. This is something I’ve started noticing in the past few weeks. Is this related to the variable hash rate? Can I do anything about this? Set a bigger gas price, or something? This is slowly killing my UX, as the user sees a pending transaction on the frontend and this can last for hours…


hey @filip, that is actually a very good question you have there, it would actually be worth a new thread but let’s keep it here. did this phenomenon occur only when sending the TX through SuperheroWallet, only directly through the JS SDK, or in both cases? Actually, in both cases you should get a reasonably good gas price estimate automatically, but you can also always set a higher gas price yourself.

I broadcast transactions using js-sdk, the usual .sendTransaction() method, with the default configuration. I’ve done this on mainnet/testnet for some time now, but only recently have I started noticing issues… Yeah, I can try to double the gas price. I know the default value is 1e9, but I was always told that there is no need to change this value. Do you have any other suggestions? Thanks mate!


maybe @dimitar.chain knows whether the miners started having different gas prices which confuse the SDK’s gas price estimation or something ? did @philipp.chain run into this issue ?

I just experienced a similar issue with my NameUpdateTx:

  • th_2HBR5mqiH23j1zUZK6mMbFGJFvMBkKNnqPvhKuPSL6qH1e3UGE

initially it was included in block 403530, then it was dropped and finally it was included in block 403534.


Ok, so I see 2 things now:

  1. The initial reports were that a transaction is orphaned and then gone forever. This seems to be fixed with the introduction of release 5.10.1. Now orphaned transactions are included again, only later on. Although this hurts finality, it seems like one issue has indeed been fixed but there is yet another one to track.
  2. So far we only had “the transaction is gone forever”, which made tracking the issue really hard. Now we have some data: investigating generations 403530 - 403534 provides us with some clues about what might be going on. I have 2 ideas I’d like to test, will keep you all updated.

Awesome! If you need more data, here’s another transaction hash:
th_hpJx1bEr28bh6dcPrZStTuJqpNCdKREeXRDdPDXSiZHdoEdRu

I tracked this one, and it went through a series of transitions: being included at height N only to be excluded moments later (height: -1), then included again in N+1 or N+2 and so on, you get the point. After a few transitions, it eventually got mined and included in the chain at height 401597. This might be useful for your research. Let me know if I can help in any other way, cheers!
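
A minimal sketch of how such a series of polled heights could be condensed into include/exclude transitions. It assumes the heights were already collected somehow (for example by polling a node for the transaction); the function name `summarize_transitions` and the sample heights are hypothetical, only the `-1` for "not in any block" comes from the description above.

```python
from typing import List, Tuple

NOT_INCLUDED = -1  # height reported while the transaction is not in any block


def summarize_transitions(heights: List[int]) -> List[Tuple[str, int]]:
    """Collapse a series of polled heights into (event, height) transitions."""
    events: List[Tuple[str, int]] = []
    previous = NOT_INCLUDED
    for height in heights:
        if height == previous:
            continue  # nothing changed since the last poll
        if height == NOT_INCLUDED:
            events.append(("excluded", previous))
        elif previous == NOT_INCLUDED:
            events.append(("included", height))
        else:
            # moved from one height straight to another between two polls
            events.append(("excluded", previous))
            events.append(("included", height))
        previous = height
    return events


# Hypothetical poll results: included at 100, dropped, included again at 102.
polled = [-1, 100, 100, -1, -1, 102, 102]
for event, height in summarize_transitions(polled):
    print(event, height)
```

Logging only the transitions keeps the history short even when the transaction is polled every few seconds for hours.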


Yes, this was really valuable, thank you. This excluded a few possibilities…

kk. also happening now with regular spendTx,

th_2ocFpJmorLtyLAkAwM34DkC1XoPQj6JQnQqwWLs47G32nVy9Je

Just leaving this here, maybe you can make use of it


Just to let you know, my transactions are now failing regularly. Can’t even spendTx, tried over baseaepp, arkane wallet, everything. Can you confirm? Did you try sending Ae today/yesterday? Try spending and see what happens with the tx and balances


Yes, we are debugging this. We have a clue and it will be resolved soon.


hope to see this fixed soon, thanks! =)


We have some more information for you.

TL;DR: this should be solved now :slight_smile:

Longer version: we did a lot of digging. Yes, the issue we solved last week was indeed a bug but it was not causing the problem at hand. We dug deeper and deeper and found some optimisation points but no bug that could cause transactions to simply disappear. We were checking syncing, gossiping and even block candidate generation. Everything seemed solid.

Thanks to your input here, we were actually able to trace a transaction (@filip’s) that was accepted in the blockchain and then rolled back. Then accepted again and once more rolled back. @hanssv.chain wrote a small script and we got an a-ha moment:

TX found in micro block mh_GJtShFamkJKL74B8CkG7oDoMEJMAhKq1DRMkjNVcUTq8GgvPw at 401550
TX found in micro block mh_28DLeSxooxSV3ciwsZNY8uXtPCkHqBrwfabHuFrjyt4Qrmdn63 at 401554
TX found in micro block mh_2XwKWVGg4dtqmFY18DQX2q7WDxvuAtPkgDRtQudcytDVzDiYoN at 401556
TX found in micro block mh_2NZvRnFwpcUZ3ZRrv5Pk9bko8mHXQjEZxMyHiRH4yjaJxAAcrr at 401558
TX found in micro block mh_2FGqfQMRjYC5uy6iFMmuZE7JHzFvbjQ3WEJUgvF1NMExsyHUJ at 401559
TX found in micro block mh_EJrYGFSg88ht6ZjPDpRyu3gYQAmamg58q3aWPKww47Ma7rQGm at 401564
TX found in micro block mh_2ivuthR78Tc2VUDDy2NY963a8Km5zPf2EVqCxnt3W3kE1bYP3R at 401569
TX found in micro block mh_RvLDXCeKvP4JLVP54pjwzSJFwVVmDiuY3TULZrzn4ZgVQqNWH at 401571
TX found in micro block mh_2orCBh99J4w4QC2uyKYofxxgefvuh9HCA4AZX5iMR8MB5Eo98U at 401572
TX found in micro block mh_hCm9gvQBXMhVH22FNriQaq9k72CuZuTZEaombmNRNEBByTRqm at 401573
TX found in micro block mh_ZfUHGhexYoTTN9wEgQSoB9Qxef4UUgP4eQT32DwsRt5zX89ai at 401574
TX found in micro block mh_5xHaR8q2YGEZaaaMzwss5AFNymJR29sXerNdRftiD1yVfKUMZ at 401575
TX found in micro block mh_2ecaiJiUXcPrvXK8DbaSbtazV488fZfinD6YJUhHLfBTm4mTiY at 401579
TX found in micro block mh_2VWqNdeHcojpcZt6QMnTTCTJMhtToD57uuySCLmzwVzpxifBcD at 401587
TX found in micro block mh_qebWGA84TgpcNsKBGgGzAH9wXkhi5gpk8PheBzwFDCdHMamCU at 401590
TX found in micro block mh_Ma3vJ5BrQY1mKXmWQSNoeFLDrbH11Abjz2HjwxAbo2WCM6fNz at 401592
TX found in micro block mh_3iYrGtqmtXY37n9EnAA6oSDUAzyqpkWxC52zz79XezwzWh1Ny at 401593
TX found in micro block mh_3kwBJexZFLzWCmMFtFe6F1kZjhW8F1xr7BAECdabuAubsxZyG at 401594
TX found in micro block mh_mfuvMHTumMD1zMemLFsTsugsa1i3egPacmUMXuJJfhC2JTrj3 at 401596
TX found in micro block mh_2hmwwX4hoTP7wzaXUom9rApsrEL9S19jT3tC2ssK8rPnvPpEZs at 401597

The transaction was reaching the miner (so no issue with sync or gossip), it was included in a block but then it was rolled back. After that it was included again (so the block candidate was solid) only to be rolled back again.
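
The script itself is not shown in this thread, so the following is only a rough sketch of how output in the format above could be summarised. The helper names `parse_inclusions` and `rollback_count` are hypothetical, and the rollback count rests on the assumption that a transaction can sit in the main chain only once, so every inclusion except the last must have been rolled back.

```python
import re
from typing import List, Tuple

# Matches lines such as: "TX found in micro block mh_... at 401550"
LINE_RE = re.compile(r"TX found in micro block (mh_\w+) at (\d+)")


def parse_inclusions(log: str) -> List[Tuple[str, int]]:
    """Return (micro_block_hash, height) for every matching log line."""
    return [(m.group(1), int(m.group(2))) for m in LINE_RE.finditer(log)]


def rollback_count(inclusions: List[Tuple[str, int]]) -> int:
    """Assumption: only the last inclusion survived, the rest were rolled back."""
    return max(len(inclusions) - 1, 0)


# First two and last line of the listing above.
sample = """\
TX found in micro block mh_GJtShFamkJKL74B8CkG7oDoMEJMAhKq1DRMkjNVcUTq8GgvPw at 401550
TX found in micro block mh_28DLeSxooxSV3ciwsZNY8uXtPCkHqBrwfabHuFrjyt4Qrmdn63 at 401554
TX found in micro block mh_2hmwwX4hoTP7wzaXUom9rApsrEL9S19jT3tC2ssK8rPnvPpEZs at 401597
"""

inclusions = parse_inclusions(sample)
heights = [height for _, height in inclusions]
print("inclusions:", len(inclusions))
print("rolled back:", rollback_count(inclusions))
print("first/last height:", heights[0], heights[-1])
```

Fed the full listing above, the same parser would see 20 inclusions between heights 401550 and 401597.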

In the context of BitcoinNG, microforks are expected. Key blocks are produced roughly every 3 minutes, but micro blocks are PoW-free and can be issued every 3 seconds. This results in a really elastic and dynamic system, where you can expect a first confirmation in a matter of seconds. The downside is that those micro blocks still have to propagate through the network and restart the miner, and this usually means that the last few micro blocks in a generation are discarded in a microfork. There is nothing we can do there, the limit is pretty much the speed of light. A general rule of thumb: if your transaction was included at the end of the 3-minute interval, it is likely to get rolled back but included in the next generation.
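
To put rough numbers on that rule of thumb, here is a back-of-the-envelope sketch. The two intervals come from the paragraph above; the size of the discarded tail (3 micro blocks) is purely an assumed figure for illustration, and the model assumes a transaction lands in a uniformly random micro block of the generation.

```python
KEY_BLOCK_INTERVAL_S = 180   # "roughly every 3 minutes"
MICRO_BLOCK_INTERVAL_S = 3   # "can be issued every 3 seconds"


def micro_blocks_per_generation(key_interval_s: int = KEY_BLOCK_INTERVAL_S,
                                micro_interval_s: int = MICRO_BLOCK_INTERVAL_S) -> int:
    """Upper bound on micro blocks in one generation at the given intervals."""
    return key_interval_s // micro_interval_s


def rollback_risk(discarded_tail: int, per_generation: int) -> float:
    """Share of a generation's micro blocks that falls into the discarded tail."""
    if per_generation <= 0:
        raise ValueError("per_generation must be positive")
    return min(discarded_tail, per_generation) / per_generation


per_generation = micro_blocks_per_generation()
# ASSUMPTION: the last 3 micro blocks of a generation get microforked.
risk = rollback_risk(discarded_tail=3, per_generation=per_generation)
print(f"{per_generation} micro blocks per generation, rollback risk {risk:.0%}")
```

Under these assumptions a small share of transactions gets delayed by one generation, which is very different from whole generations being rolled back.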

After some further digging, we found out that not only a handful of micro blocks were rolled back but whole generations full of micro blocks. The cause was a configuration on the miner’s end, nothing to fix in the node, really. The miners applied the new setting around generation height 404840 and you can see the result below:

We do still have microforks but they are completely normal and expected. What is more, they are all short and occur at the end of the generation.

Bottom line is the blockchain is a distributed system with a lot of moving parts and different actors. It is really hard to debug but that is a small price to pay for its benefits.


Great effort, kudos to you and Hans!
I was just worried because it was happening so often with the orphaned microblocks, especially because of the recent attacks and so on. But this is clearer now, the node config obviously works, the attached image speaks for itself :smiley:
Out of curiosity, what part of the node config was changed to achieve this dramatic reduction in orphaned microblocks?


Oh, no - it was a team-wide effort that included all of us, including @uwigeroferlang.chain, @dincho.chain and especially @gorbak25. In the end it was Gregorz that reached out to miners.

It was not a node’s setting, actually. The node faithfully produces correct block candidates to be mined upon.

If the mining happens on the node - the mining process is interrupted in order to mine on top of the latest block. There is a slight race condition there: since mining is an async process, a key block could be mined while we interrupt the process itself, and in this case we still accept the valid key block, even if it means micro forking the last micro block.

Since it is not 2010 anymore and individual miners are almost non-existent, all mining is done via pools. The setting that had to be tweaked was how often the pool resets the hash the miners are mining on. That hash represents the latest block (either a micro or a key block) and it becomes the prev_hash of the newly minted key block. If that hash is an old one, then all newer blocks are micro forked. It is similar to on-premise mining: it is still an async process, but this time it is distributed between the pool and its miners, which introduces a lot of latency when restarting the hash being mined upon.
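
A toy model of that effect, as a sketch only: it assumes micro blocks appear every 3 seconds after the previous key block and that the pool hands its miners a fresh prev_hash at fixed intervals. The function name, the refresh intervals and the 170-second key block time are made up for illustration; the actual pool settings are not stated here.

```python
def orphaned_micro_blocks(key_block_found_at_s: float,
                          pool_refresh_interval_s: float,
                          micro_block_interval_s: float = 3.0) -> int:
    """Micro blocks forked away when the new key block points at a stale prev_hash.

    Toy model: t = 0 is the previous key block, micro blocks appear at
    t = 3, 6, 9, ... and the pool refreshes prev_hash at t = 0, R, 2R, ...
    """
    if pool_refresh_interval_s <= 0 or micro_block_interval_s <= 0:
        raise ValueError("intervals must be positive")
    # Time of the last refresh the miners received before finding the key block.
    last_refresh_s = (key_block_found_at_s // pool_refresh_interval_s) * pool_refresh_interval_s
    produced = int(key_block_found_at_s // micro_block_interval_s)
    known_to_miners = int(last_refresh_s // micro_block_interval_s)
    # Everything produced after the stale prev_hash ends up in a microfork.
    return produced - known_to_miners


# Hypothetical key block found 170 s into the generation.
for refresh_s in (3, 60, 200):
    lost = orphaned_micro_blocks(170, refresh_s)
    print(f"refresh every {refresh_s:>3} s -> {lost} micro blocks forked")
```

In this model a refresh interval longer than the generation itself forks every micro block of that generation, which matches the "whole generations" observation, while a short interval leaves at most a few blocks at the tail.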

Again, distributed programming is hard but distributed programming in a trustless environment is a whole new level of hard. Since you’ve read everything above, here is a popular joke about it:

There are only two hard problems in distributed systems:

2. Exactly-once delivery
1. Guaranteed order of messages
2. Exactly-once delivery


Does this problem exist in hyperchains?


Wow, good job team, really!

Yeah the way you put it I can understand the nature of this issue, but I can only imagine the actual complexity level of debugging/discovering this…

Good one :joy::joy::joy:
I did my fair share of distributed systems back in my college days, but Concurrency - The Works of Leslie Lamport is still sitting on the shelf waiting, I’m not brave enough :joy:
