Improved mining speed - Multi GPU

hanssv.chain · December 7, 2018, 2:39pm

DISCLAIMER: This is still a work in progress and has only gone through a limited amount of testing. The official release will be in v1.1.0, planned for next Thursday (December 13th).

NOTE: This update changes the interface between the miner and the node, so you need to build both the miner and the node to use it. The update is in a branch named multi_gpu_v2 in the aeternity/epoch repository on GitHub.

Following the discussion on the inefficiency of doing only one graph exploration per GPU context, which adds the overhead of creating the context to each attempt, we have redesigned how the miner interfaces with the ~~epoch~~ aeternity node. This lets us make multiple attempts in the same GPU context, improving the mining rate.

Instructions

To try this branch, follow the exact steps of the multi GPU mining guide prepared by Chris (Deploy AE Mainnet CUDA MultiGPU Miner, by Chris, on Medium), with two important exceptions:

When cloning the repository, replace multi_gpu with multi_gpu_v2, i.e. run
git clone -b multi_gpu_v2 https://github.com/aeternity/epoch.git multi_gpu && cd multi_gpu
(and no, the name of the directory does not matter, it can still be multi_gpu…)
When writing the epoch.yaml file, also add repeats: N as well as instances: X. Note that X and N are different numbers.

This will run N cuckoo puzzles in the same GPU context (for each instance), which should improve mining speed, i.e. one mining “round” will try N x X nonces. But N needs to be calibrated: look in the node's log/epoch_mining.log file and check the interval between successive Starting mining lines, as in this example:

2018-12-06 15:21:57.141 [info] <0.1267.0>@aec_conductor:start_mining:674 Starting mining
2018-12-06 15:22:02.413 [info] <0.1267.0>@aec_conductor:start_mining:674 Starting mining

Here there are 5.3 seconds between the mining attempts, which is a bit high; for best network performance it should be between 3 and 5 s. So adjusting N until the miner hits this sweet spot is crucial. Note: the node has to be restarted after each change to epoch.yaml. I suggest starting with N = 5 and adjusting accordingly.
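To avoid reading the interval off by eye, the check can be scripted. The following is only a minimal sketch, assuming every log line starts with a timestamp in the format shown in the example above; the function name mining_intervals is made up for illustration and is not part of the node.

```python
from datetime import datetime


def mining_intervals(log_lines):
    """Return the seconds between successive 'Starting mining' log lines."""
    stamps = []
    for line in log_lines:
        line = line.lstrip()
        if "Starting mining" in line:
            # Assumption: the first 23 characters are "YYYY-MM-DD HH:MM:SS.mmm".
            stamps.append(datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# The two example lines from log/epoch_mining.log quoted in the post.
sample = [
    "2018-12-06 15:21:57.141 [info] <0.1267.0>@aec_conductor:start_mining:674 Starting mining",
    "2018-12-06 15:22:02.413 [info] <0.1267.0>@aec_conductor:start_mining:674 Starting mining",
]

for seconds in mining_intervals(sample):
    print(f"{seconds:.3f} s")
```

On a real node you would pass the lines of log/epoch_mining.log instead of the sample list, for instance by opening the file and iterating over it.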

Update

Saturday, Dec 8th, 9:30 CET
The initial version reports an error when it fails to find a solution in a mining attempt. This is fixed and it should now report a debug message. No need to re-build really, just ignore the error.

Monday, Dec 10th, 9:30 CET
Rebased the branch multi_gpu_v2 on top of Tromp’s latest updates, i.e. it now supports the -c flag to reduce CPU load.

Btczuma · December 7, 2018, 3:01pm

I am trying it. But in my previous yaml file I didn’t use “repeats: N”. What is this parameter? What should it be set to?

hanssv.chain · December 7, 2018, 3:04pm

No, that is kind of why I wrote this as one of the two exceptions to also add repeats: N…

And it means that instead of solving one cuckoo puzzle in each GPU context we solve N puzzles. Please read the rest carefully so you pick a suitable N.

Btczuma · December 7, 2018, 3:39pm

Hello Hans! I have compiled the v2 miner, but I did not find the correct format for “repeats” in the documentation at https://github.com/aeternity/epoch/blob/master/docs/configuration.md. When I add repeats, I get an error. Please tell me the standard format of “repeats”. Thank you.

hanssv.chain · December 7, 2018, 3:48pm

Look at the documentation for the multi_gpu_v2 branch, not the documentation for the master branch.

But it goes under mining > cuckoo > miner just like instances: X
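To make the nesting concrete, here is a small sketch that prints where the two keys sit in epoch.yaml. The values 2 and 5 are placeholders rather than recommendations, the other miner settings from the guide are left out, and render_yaml is a throwaway helper written for this illustration. The branch's own configuration documentation remains the authority on the exact format.

```python
def render_yaml(mapping, indent=0):
    """Render a nested dict of scalars as indented YAML-style lines."""
    lines = []
    pad = " " * indent
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(render_yaml(value, indent + 4))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


# Placeholder values: instances is X, repeats is N. Pick your own for your rig.
config = {"mining": {"cuckoo": {"miner": {"instances": 2, "repeats": 5}}}}

print("\n".join(render_yaml(config)))
```

The printed text shows repeats as a sibling of instances under mining > cuckoo > miner, which is the placement described above.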

Btczuma · December 7, 2018, 4:17pm

I can feel that the GPU usage is getting higher. I will give you feedback on how it runs. Thanks.

Kryztoval · December 7, 2018, 6:35pm

Will this be ported to experimental for windows?

simply-vc-fran · December 7, 2018, 7:27pm

Thanks for this!

Does this still have an inefficiency for multiple GPUs? Would it be more efficient if I keep it to one node per GPU?

hanssv.chain · December 7, 2018, 7:41pm

I don’t have any numbers on that, I am afraid; I don’t have the necessary hardware to do the experiments… Please report your findings!

ROB · December 7, 2018, 10:11pm

I attempted this and got it running. I noticed a runtime error when looking at epoch_mining.log to set the timings.

2018-12-07 13:51:35.916 [info] <0.1266.0>@aec_conductor:start_mining:674 Starting mining
2018-12-07 13:51:39.882 [error] <0.1266.0>@aec_conductor:handle_mining_reply:718 Failed to mine block, runtime error; retrying with different nonce (was 17599323422117554270). Error: no_solution
2018-12-07 13:51:39.885 [info] <0.1266.0>@aec_conductor:start_mining:674 Starting mining
2018-12-07 13:51:43.841 [error] <0.1266.0>@aec_conductor:handle_mining_reply:718 Failed to mine block, runtime error; retrying with different nonce (was 17599323422117554278). Error: no_solution
2018-12-07 13:51:43.845 [info] <0.1266.0>@aec_conductor:start_mining:674 Starting mining
2018-12-07 13:51:47.804 [error] <0.1266.0>@aec_conductor:handle_mining_reply:718 Failed to mine block, runtime error; retrying with different nonce (was 17599323422117554286). Error: no_solution
2018-12-07 13:51:47.806 [info] <0.1266.0>@aec_conductor:start_mining:674 Starting mining
2018-12-07 13:51:51.751 [error] <0.1266.0>@aec_conductor:handle_mining_reply:718 Failed to mine block, runtime error; retrying with different nonce (was 17599323422117554294). Error: no_solution
2018-12-07 13:51:51.753 [info] <0.1266.0>@aec_conductor:start_mining:674 Starting mining
2018-12-07 13:51:55.666 [error] <0.1266.0>@aec_conductor:handle_mining_reply:718 Failed to mine block, runtime error; retrying with different nonce (was 17599323422117554302). Error: no_solution
2018-12-07 13:51:55.669 [info] <0.1266.0>@aec_conductor:start_mining:674 Starting mining
2018-12-07 13:51:59.575 [error] <0.1266.0>@aec_conductor:handle_mining_reply:718 Failed to mine block, runtime error; retrying with different nonce (was 17599323422117554310). Error: no_solution
2018-12-07 13:51:59.578 [info] <0.1266.0>@aec_conductor:start_mining:674 Starting mining
2018-12-07 13:52:03.525 [error] <0.1266.0>@aec_conductor:handle_mining_reply:718 Failed to mine block, runtime error; retrying with different nonce (was 17599323422117554318). Error: no_solution
2018-12-07 13:52:03.528 [info] <0.1266.0>@aec_conductor:start_mining:674 Starting mining
2018-12-07 13:52:07.462 [error] <0.1266.0>@aec_conductor:handle_mining_reply:718 Failed to mine block, runtime error; retrying with different nonce (was 17599323422117554326). Error: no_solution
2018-12-07 13:52:07.465 [info] <0.1266.0>@aec_conductor:start_mining:674 Starting mining
2018-12-07 13:52:11.367 [error] <0.1266.0>@aec_conductor:handle_mining_reply:718 Failed to mine block, runtime error; retrying with different nonce (was 17599323422117554334). Error: no_solution
2018-12-07 13:52:11.370 [info] <0.1266.0>@aec_conductor:start_mining:674 Starting mining

Btczuma · December 8, 2018, 1:36am

Yes, my epoch_mining.log shows the same as yours. But in epoch_pow_cuckoo.log it seems to work fine. Now only Hans can help.

epoch_mining.log:
2018-12-08 09:27:13.006 [info] <0.1263.0>@aec_conductor:start_mining:674 Starting mining
2018-12-08 09:27:16.389 [error] <0.1263.0>@aec_conductor:handle_mining_reply:718 Failed to mine block, runtime error; retrying with different nonce (was 13019665358618762053). Error: no_solution

epoch_pow_cuckoo.log:
2018-12-08 09:27:52.125 [debug] <0.677.7>@aec_pow_cuckoo:parse_generation_result:501 Seeding completed in 70 + 73 ms
2018-12-08 09:27:52.125 [debug] <0.677.7>@aec_pow_cuckoo:parse_generation_result:501 12-cycle found
2018-12-08 09:27:52.125 [debug] <0.677.7>@aec_pow_cuckoo:parse_generation_result:501 6-cycle found
2018-12-08 09:27:52.125 [debug] <0.677.7>@aec_pow_cuckoo:parse_generation_result:501 2-cycle found
2018-12-08 09:27:52.125 [debug] <0.677.7>@aec_pow_cuckoo:parse_generation_result:501 22-cycle found
2018-12-08 09:27:52.125 [debug] <0.677.7>@aec_pow_cuckoo:parse_generation_result:501 570-cycle found
2018-12-08 09:27:52.125 [debug] <0.677.7>@aec_pow_cuckoo:parse_generation_result:501 findcycles edges 57338 time 11 ms total 494 ms
2018-12-08 09:27:52.125 [debug] <0.677.7>@aec_pow_cuckoo:parse_generation_result:501 Time: 495 ms
2018-12-08 09:27:52.125 [debug] <0.677.7>@aec_pow_cuckoo:parse_generation_result:501 0 total solutions
2018-12-08 09:27:52.317 [debug] <0.681.7>@aec_pow_cuckoo:generate:81 Generating solution for data hash <<160,36,89,250,31,130,131,165,173,80,192,142,138,102,69,177,75,134,94,6,178,119,16,244,60,141,206,132,188,102,56,1>> and nonce 13019665358618762251 with target 520169504.
2018-12-08 09:27:52.317 [info] <0.682.7>@aec_pow_cuckoo:generate_int:223 Executing cmd: "./cuda29 -h 6F43525A2B682B4367365774554D434F696D5A467355754758676179647844305049334F684C786D4F41453D -n 13019665358618762251 -r 3 -E 1 -d 0"
2018-12-08 09:27:52.418 [info]

djkyf · December 8, 2018, 1:58am

me too

2018-12-08 00:49:24.392 [error] <0.1266.0>@aec_conductor:handle_mining_reply:718 Failed to mine block, runtime error; retrying with different nonce (was 16731439899052739363). Error: no_solution
2018-12-08 00:49:24.398 [info] <0.1266.0>@aec_conductor:start_mining:674 Starting mining

hanssv.chain · December 8, 2018, 8:23am

It looks like I’ve screwed up the return value from the miner, but it is a completely harmless error-report. The miner will work exactly as it should, just that it reports an error when it fails to find a solution.

I’ll try to fix it later since it looks ugly, but I have limited time today…

hanssv.chain · December 8, 2018, 8:32am

The error reported by the miner:
2018-12-08 00:49:24.392 [error] <0.1266.0>@aec_conductor:handle_mining_reply:718 Failed to mine block, runtime error; retrying with different nonce (was 16731439899052739363). Error: no_solution

is totally benign, just the wrong thing returned by the more efficient miner. But if you pull the branch multi_gpu_v2 again I have pushed a patch that fixes this and instead gives the expected debug message!
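If you are still on the unpatched build and want to be sure the log contains nothing else, a filter along these lines may help. It is a rough sketch that assumes the benign report always ends in "Error: no_solution", as in the lines quoted in this thread; split_mining_errors is a made-up name.

```python
def split_mining_errors(log_lines):
    """Separate benign no_solution reports from all other [error] lines."""
    benign, other = [], []
    for line in log_lines:
        if "[error]" not in line:
            continue  # info and debug lines are not of interest here
        if line.rstrip().endswith("Error: no_solution"):
            benign.append(line)
        else:
            other.append(line)
    return benign, other


# Two lines quoted earlier in this thread.
sample = [
    "2018-12-08 00:49:24.392 [error] <0.1266.0>@aec_conductor:handle_mining_reply:718 "
    "Failed to mine block, runtime error; retrying with different nonce "
    "(was 16731439899052739363). Error: no_solution",
    "2018-12-08 00:49:24.398 [info] <0.1266.0>@aec_conductor:start_mining:674 Starting mining",
]

benign, other = split_mining_errors(sample)
print(f"benign no_solution reports: {len(benign)}, other errors: {len(other)}")
```

Anything that ends up in the second list is not covered by the explanation above and is worth a closer look.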

Btczuma · December 8, 2018, 9:45am

This is great, I have been worried. Thank you for your work, efficiency has increased.

gunray · December 8, 2018, 11:45am

I think I’m at about 3 solutions / second per 1080 Ti with a 6-GPU setup, so it’s a 4-5x increase in efficiency : )

There is still probably about 20-30% room to go compared to a consistent Stratum work feed, but it is already a HUGE difference.

AppKoder · December 8, 2018, 12:06pm

Here there are 5.3 seconds between the mining attempts, which is a bit high; for best network performance it should be between 3 and 5 s. So adjusting N until the miner hits this sweet spot is crucial. Note: the node has to be restarted after each change to epoch.yaml. I suggest starting with N = 5 and adjusting accordingly.

Why do we need to target 3-5s? Isn’t it more efficient for the miner to just continue hashing until it finds a valid nonce? Is it because the miner does not return until it finishes checking all N nonces?

hanssv.chain · December 8, 2018, 12:13pm

No, it is because the miner gets the block hash to mine on as input, and the candidate block will change as soon as there is a new micro block; this happens every 3 seconds with the Bitcoin-NG parameters used in the aeternity network. Thus, with the current design, the miner can’t be (much) more long-running. We are looking into daemonizing the miner, but this is a much more involved project than the quick redesign made this week…
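As a rough aid for picking N, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate: it assumes the round time grows linearly with repeats, which ignores the fixed cost of creating the GPU context, and the default target of 4 s is simply the midpoint of the 3-5 s window mentioned earlier. The function names and the starting figures (N = 5 measured at 5.3 s, 6 instances) are hypothetical.

```python
def nonces_per_round(repeats, instances):
    """One mining round tries N x X nonces (N repeats on each of X instances)."""
    return repeats * instances


def suggest_repeats(current_repeats, measured_interval_s, target_interval_s=4.0):
    """Scale repeats so that one round takes roughly target_interval_s.

    Assumption: round time is proportional to repeats. The result is only
    a starting point; re-measure the interval in the log after changing it.
    """
    if current_repeats < 1 or measured_interval_s <= 0:
        raise ValueError("need repeats >= 1 and a positive measured interval")
    scaled = int(current_repeats * target_interval_s / measured_interval_s)
    return max(1, scaled)  # always run at least one puzzle per context


# Hypothetical rig: repeats = 5 gave 5.3 s between 'Starting mining' lines.
new_n = suggest_repeats(5, 5.3)
print(f"try repeats: {new_n}, nonces per round with 6 instances: {nonces_per_round(new_n, 6)}")
```

After changing repeats in epoch.yaml and restarting the node, measure the interval again rather than trusting the estimate.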

Btczuma · December 8, 2018, 1:35pm

Can you share your thoughts? What is the best setting for repeats?

ROB · December 8, 2018, 4:32pm

Thanks, I will pull it today and test. Thanks for all the hard work!