Expected 1080 Ti performance

tromp · November 29, 2018, 1:14pm

A 1080 Ti solves a graph on 2^29 nodes in about 225 milliseconds, giving a rate of 4.4 Graphs Per Second (GPS). Only about 1 in 42 graphs has a 42-cycle, so the solution rate is about 1 every 10 seconds. This assumes the GPU solver context is maintained between graphs, as with running the standalone solver on a range of nonces.
If you are building a new GPU context (i.e. allocating GBs of device memory) for every nonce then you’re doing it wrong…
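
The arithmetic behind those figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the post above.
SOLVE_TIME_S = 0.225   # time to solve one 2^29-node graph on a 1080 Ti
CYCLE_ODDS = 42        # roughly 1 graph in 42 contains a 42-cycle

# Graphs solved per second when the solver context is kept alive.
graphs_per_second = 1 / SOLVE_TIME_S

# Average time between graphs that contain a 42-cycle.
seconds_per_solution = CYCLE_ODDS / graphs_per_second

print(f"{graphs_per_second:.1f} GPS")
print(f"one solution every {seconds_per_solution:.2f} s")
```

Any per-nonce setup cost (such as rebuilding the context) adds directly to the 225 ms and lowers both rates.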

tromp · November 29, 2018, 2:54pm

I’ll change my cuckoo CUDA solver to make it easier to run it through a function interface, as currently done for the cuckatoo solver in Grin:

github.com

mimblewimble/grin-miner/blob/master/cuckoo-miner/src/miner/miner.rs

// Copyright 2020 The Grin Developers
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

//! Main interface for callers into cuckoo-miner. Provides functionality
//! to load a mining plugin, send it a Cuckoo Cycle POW problem, and
//! return any resulting solutions.

use std::ptr::NonNull;
use std::sync::{mpsc, Arc, RwLock};


Hopefully the AE devs can adopt a similar interface to avoid a solver restart at every nonce.

gunray · November 29, 2018, 9:21pm

Just to throw out some numbers for the 1080 Ti and AE’s way of mining.

This is with a G4540 and some slow RAM; the times may depend heavily on both.

With 1 GPU, a single iteration (the time between solver launches) takes about 550ms. This alone is a substantial inefficiency.

With 6 GPUs it’s 1250ms, which averages out to about 210ms per GPU, so just a bit faster than 2 GPUs running solo.

In both cases GPUs do their work in about 220ms.
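
To put those timings side by side, here is a rough calculation. It assumes each GPU solves exactly one graph per iteration, which the post implies but does not state; the function name is made up for illustration.

```python
SOLVE_MS = 220  # time each GPU actually spends solving, per the post


def summarize(gpus, iteration_ms):
    """Return (effective ms per graph across the rig, fraction of time a GPU is busy).

    Assumes every GPU solves one graph per iteration.
    """
    per_graph_ms = iteration_ms / gpus
    busy_fraction = SOLVE_MS / iteration_ms
    return per_graph_ms, busy_fraction


for gpus, iteration_ms in [(1, 550), (6, 1250)]:
    per_graph_ms, busy = summarize(gpus, iteration_ms)
    print(f"{gpus} GPU(s): {per_graph_ms:.0f} ms per graph, each GPU busy {busy:.0%}")
```

Under that assumption, most of each iteration is spent outside the solver in both setups.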

tromp · November 30, 2018, 11:29pm

Done now; see latest commit…

crypto_user · December 1, 2018, 12:45pm

are you john tromp the inventor of cuckoo cycle?

tromp · December 1, 2018, 1:44pm

Yep; I am john tromp the inventor of cuckoo cycle and typer of at least 20 characters…

Kryztoval · December 1, 2018, 2:59pm

This may be too much to ask for, but is there a call that returns a solution, so I can test whether it is working properly?
Something like “mean29 ‘tjnfbvjklnlkgbjfb’” has a solution.

Never mind, I saw it in your code repository.

doge · December 2, 2018, 1:03pm

I understand the issue better now. I assumed that loading and unloading huge graphs was what was happening in memory. So a different nonce on the same graph leads to a different distribution of cycles in the graph?

hanssv.chain · December 2, 2018, 1:06pm

Yes, a different nonce means a completely different graph and thus a different distribution of cycles.

doge · December 2, 2018, 1:12pm

So what is meant by “GPU context” is some sort of in-GPU-memory data structure that can very efficiently change the direction of the edges in a graph, instead of creating a new graph from scratch? I like it.

tromp · December 2, 2018, 1:17pm

That makes no sense. The nonce (along with rest of header) defines the graph.
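
A toy illustration of that point: every edge is computed from the header and nonce, so changing the nonce changes every edge. The sketch below uses keyed BLAKE2b from Python's standard library purely as a stand-in hash. It is not the real Cuckoo Cycle edge function, and the function names and tiny graph size are made up for the example.

```python
import hashlib


def edge(header: bytes, nonce: int, index: int, node_bits: int = 8):
    """Derive the two endpoints of edge `index` from (header, nonce).

    Stand-in hashing only: NOT the hash used by the real solver.
    """
    # Key derived from the header and nonce together.
    key = hashlib.blake2b(header + nonce.to_bytes(8, "little"), digest_size=16).digest()

    def endpoint(side: int) -> int:
        data = (2 * index + side).to_bytes(8, "little")
        digest = hashlib.blake2b(data, key=key, digest_size=8).digest()
        return int.from_bytes(digest, "little") % (1 << node_bits)

    return endpoint(0), endpoint(1)


def graph(header: bytes, nonce: int, n_edges: int = 16):
    """List all edges of the toy graph defined by (header, nonce)."""
    return [edge(header, nonce, i) for i in range(n_edges)]


# Same header and nonce: identical graph. Different nonce: unrelated graph.
print(graph(b"header", 1) == graph(b"header", 1))
print(graph(b"header", 1)[:3])
print(graph(b"header", 2)[:3])
```

Nothing is carried over from one nonce to the next except the memory the solver works in.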

aemin · December 2, 2018, 1:18pm

Can we add a nonce range, for example like
extra_args: “-r 1000000”?
It seems to work without restarts.

tromp · December 2, 2018, 1:20pm

The solver context is just a bunch of allocated GPU memory along with some solver configuration settings. It can be re-used for different graphs.
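
In outline, re-use looks like the sketch below. The `SolverContext` class and `mine` function are hypothetical stand-ins written for this example, not the actual solver API, and a small `bytearray` takes the place of the device memory.

```python
class SolverContext:
    """Hypothetical stand-in for a solver context: a big buffer plus settings."""

    allocations = 0  # counts how many times the expensive buffer was allocated

    def __init__(self, buffer_bytes, edge_bits=29):
        self.buffer = bytearray(buffer_bytes)  # stands in for GBs of device memory
        self.edge_bits = edge_bits             # example configuration setting
        SolverContext.allocations += 1

    def solve(self, header, nonce):
        # A real solver would build and trim the graph here, overwriting the
        # buffer contents. This placeholder performs no cycle search.
        self.buffer[0] = nonce % 256
        return None


def mine(header, nonces, buffer_bytes=1024):
    ctx = SolverContext(buffer_bytes)  # allocate once
    for nonce in nonces:
        ctx.solve(header, nonce)       # re-use the same memory for every graph
    return ctx
```

Calling `mine(b"header", range(1000))` allocates the buffer once, whereas launching a fresh solver per nonce would pay that allocation a thousand times.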

tromp · December 2, 2018, 1:25pm

But the AE process invoking the solver doesn’t know how to parse its output.
Furthermore, the solver needs to update the header roughly every 3 seconds to produce new micro blocks.

mrbeery · December 4, 2018, 3:31pm

Roughly, how many AE can you expect to mine per 24 h with 1080 Ti?

hanssv.chain · December 4, 2018, 5:29pm

As a worked example, assume you make 3 cuckoo attempts per second.
Then, with the current difficulty, you will on average get about 0.2 blocks per 24 hours.

mrbeery · December 5, 2018, 2:28pm

And each block is 473 AE? (According to http://aeknow.org). If that is correct it seems extremely high.

hanssv.chain · December 5, 2018, 2:31pm

The reward is adjusted according to an inflation curve - it is discussed in Block reward and block time

The highest reward is 473 for a while here in the beginning if I remember correctly.
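
Putting the numbers from this exchange together gives a rough daily estimate. It is only as good as its inputs: the 3 attempts per second and 0.2 blocks per day are the example figures from above, and the 473 AE reward is quoted from memory.

```python
ATTEMPTS_PER_SECOND = 3   # assumed solver rate from the example above
BLOCKS_PER_DAY = 0.2      # expected blocks per 24 h at the difficulty of the time
REWARD_AE = 473           # block reward as recalled in this thread

attempts_per_day = ATTEMPTS_PER_SECOND * 24 * 60 * 60
attempts_per_block = attempts_per_day / BLOCKS_PER_DAY
ae_per_day = BLOCKS_PER_DAY * REWARD_AE

print(f"{attempts_per_day} attempts per day")
print(f"about {attempts_per_block:,.0f} attempts per block on average")
print(f"about {ae_per_day:.1f} AE per day on average")
```

The result is an average over many days; with 0.2 blocks per day, most individual days yield nothing.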

mrbeery · December 5, 2018, 2:42pm

Thanks for the very fast reply! I know there’s high inflation in the first year and that the difficulty will be increasing (hopefully).

How does the 2080 Ti compare, has anyone managed to mine with that card as of now?

2nd_doge · December 5, 2018, 2:49pm

I am getting my RTX 2080 Ti on Monday and will test it out.

Generally speaking, I need to test the new 1.0.1 release. I will check whether it fixes the issue of all my hashpower going to waste: 8x GTX 1080 and no blocks mined “successfully” since launch + 2h.

I would hold off on larger investments at the moment.

If you estimate the current graphs per second of the network, there are either huge farms or a huge number of people mining. This is quite different from most mainnet launches, e.g. Ethereum, where miners made thousands in the first days of the network at an exchange rate that practically didn’t exist. Then again, orphaned blocks are rewarded on Ethereum, which is not the case with Aeternity, so it is winner-take-all.