[hotfix] Fix GPU reg bug. bad indent (#883)
* added cuda solver

* boost versions to fix pip error

* allow choosing device id

* fix solution check to use keccak

* adds params for cuda and dev_id to register

* list devices by name during selection

* add block number logging

* fix calculation of hashrate

* fix update interval default

* add --TPB arg to register

* add update_interval flag

* switch back to old looping/work structure

* change typing

* device count is a function

* stop early if wallet registered

* add update interval and num proc flag

* add better number output

* optimize multiproc cpu reg
keeping proc until solution

* fix test

* change import to cubit

* fix import and default

* up default
should have default in CLI call

* add comments about params

* fix config var access

* add cubit as extra

* handle stale pow differently
check registration after failure

* [feature] cpu register faster (#854)

* add update interval and num proc flag

* add better number output

* optimize multiproc cpu reg
keeping proc until solution

* fix test

* make sure to exit properly if registered during

* fix tests

* change import to use tests

* add optional type hints and None default

* change to count using allowed processes

* add documentation. Fix random start

* restrict number of processes for integration test

* fix stale check

* use wallet.is_registered instead

* attempt to fix test issue

* fix my test

* oops typo

* typo again ugh

* remove print out

* fix partly reg test

* fix if solution None

* fix test?

* fix patch

* [hotfix] fix flags for multiproc register limit (#876)

* add dot get

* add to subtensor args and defaults

* remove dot get because in subtensor args

* typo

* fix test

* add args for cuda to subtensor

* add cuda args to reregister call

* add to wallet register the cuda args

* fix refs and tests

* add for val test also

* fix tests with rereg

* Fix/diff unpack bit shift (#878)

* fix incorrect bit shift

* move inner function out and add test for diff pack

* fix test

* fix call arg check in test

* add assert

* fix test for py37

* refactor the diff pack into two functions
move the test to a unit test

* fix test

* fix patch for tests

* add mock_register to subtensor passed instead

* move register under the check for isregistered

* use patch obj instead

* fit patch object

* [Feature] [cubit] CUDA registration solver (#868)

* added cuda solver

* boost versions to fix pip error

* allow choosing device id

* fix solution check to use keccak

* adds params for cuda and dev_id to register

* list devices by name during selection

* add block number logging

* fix calculation of hashrate

* fix update interval default

* add --TPB arg to register

* add update_interval flag

* switch back to old looping/work structure

* change typing

* device count is a function

* stop early if wallet registered

* add update interval and num proc flag

* add better number output

* optimize multiproc cpu reg
keeping proc until solution

* fix test

* change import to cubit

* fix import and default

* up default
should have default in CLI call

* add comments about params

* fix config var access

* add cubit as extra

* handle stale pow differently
check registration after failure

* restrict number of processes for integration test

* fix stale check

* use wallet.is_registered instead

* attempt to fix test issue

* fix my test

* oops typo

* typo again ugh

* remove print out

* fix partly reg test

* fix if solution None

* fix test?

* fix patch

* add args for cuda to subtensor

* add cuda args to reregister call

* add to wallet register the cuda args

* fix refs and tests

* add for val test also

* fix tests with rereg

* fix patch for tests

* add mock_register to subtensor passed instead

* move register under the check for isregistered

* use patch obj instead

* fit patch object

* Fix/move overview args to cli (#867)

* move cli args to CLI and fix overview

* use dot get

* fix tests

* add hotkeys/all_hotkeys to (un)stake

* fix default

* fix default in unstake

* add unstake multiple

* add add stake multiple

* move all/hotkeys back to wallet args

* convert to balance first
add catch for unstake multi

* fix ref to wallet

* fix test patch for multi hotkeys

* try to fix tests

* fix tests patch

* fix mock wallet length

* don't use new?

* fix call args get

* typo

* fix typo

* fix prompt

* remove unneeded if

* modify POW submit to use rolling submit again

* add backoff to block get from network

* add test for backoff get block

* suppress the dev id flag if not set

* remove dest so it uses first arg

* fix pow submit loop

* move registration status with

* fix max attempts check

* remove status in subtensor.register

* add submit status

* change to neuron get instead

* fix count

* try to patch live display

* fix patch

* .

* separate test cases

* add POWNotStale and tests

* add more test cases for block get with retry

* fix return to None

* fix arg order

* fix indent

* add test to verify solution is submitted

* fix mock call

* patch hex bytes instead

* typo :/

* fix print out for unstake

* fix indexing into mock call

* call indexing

* access dict not with dot

* fix other indent

Co-authored-by: Eugene <etesting007@gmail.com>
camfairchild and Eugene-hu committed Aug 17, 2022
1 parent b54523e commit b171dd9
Showing 3 changed files with 119 additions and 66 deletions.
2 changes: 1 addition & 1 deletion bittensor/_cli/cli_impl.py
@@ -309,7 +309,7 @@ def unstake( self ):
         if not self.config.no_prompt:
             if not Confirm.ask("Do you want to unstake from the following keys:\n" + \
                     "".join([
-                        f" [bold white]- {wallet.hotkey_str}: {amount.tao}𝜏[/bold white]\n" for wallet, amount in zip(final_wallets, final_amounts)
+                        f" [bold white]- {wallet.hotkey_str}: {amount}𝜏[/bold white]\n" for wallet, amount in zip(final_wallets, final_amounts)
                     ])
                 ):
                 return None
131 changes: 66 additions & 65 deletions bittensor/_subtensor/subtensor_impl.py
@@ -501,73 +501,74 @@ def register (
             else:
                 pow_result = bittensor.utils.create_pow( self, wallet, num_processes=num_processes, update_interval=update_interval)

-                # pow failed
-                if not pow_result:
-                    # might be registered already
-                    if (wallet.is_registered( self )):
-                        bittensor.__console__.print(":white_heavy_check_mark: [green]Registered[/green]")
-                        return True
-
-                # pow successful, proceed to submit pow to chain for registration
-                else:
-                    with bittensor.__console__.status(":satellite: Submitting POW..."):
-                        # check if pow result is still valid
-                        while bittensor.utils.POWNotStale(self, pow_result):
-                            with self.substrate as substrate:
-                                # create extrinsic call
-                                call = substrate.compose_call(
-                                    call_module='SubtensorModule',
-                                    call_function='register',
-                                    call_params={
-                                        'block_number': pow_result['block_number'],
-                                        'nonce': pow_result['nonce'],
-                                        'work': bittensor.utils.hex_bytes_to_u8_list( pow_result['work'] ),
-                                        'hotkey': wallet.hotkey.ss58_address,
-                                        'coldkey': wallet.coldkeypub.ss58_address
-                                    }
-                                )
-                                extrinsic = substrate.create_signed_extrinsic( call = call, keypair = wallet.hotkey )
-                                response = substrate.submit_extrinsic( extrinsic, wait_for_inclusion=wait_for_inclusion, wait_for_finalization=wait_for_finalization )
-
-                                # We only wait here if we expect finalization.
-                                if not wait_for_finalization and not wait_for_inclusion:
-                                    bittensor.__console__.print(":white_heavy_check_mark: [green]Sent[/green]")
+            # pow failed
+            if not pow_result:
+                # might be registered already
+                if (wallet.is_registered( self )):
+                    bittensor.__console__.print(":white_heavy_check_mark: [green]Registered[/green]")
+                    return True
+
+            # pow successful, proceed to submit pow to chain for registration
+            else:
+                with bittensor.__console__.status(":satellite: Submitting POW..."):
+                    # check if pow result is still valid
+                    while bittensor.utils.POWNotStale(self, pow_result):
+                        with self.substrate as substrate:
+                            # create extrinsic call
+                            call = substrate.compose_call(
+                                call_module='SubtensorModule',
+                                call_function='register',
+                                call_params={
+                                    'block_number': pow_result['block_number'],
+                                    'nonce': pow_result['nonce'],
+                                    'work': bittensor.utils.hex_bytes_to_u8_list( pow_result['work'] ),
+                                    'hotkey': wallet.hotkey.ss58_address,
+                                    'coldkey': wallet.coldkeypub.ss58_address
+                                }
+                            )
+                            extrinsic = substrate.create_signed_extrinsic( call = call, keypair = wallet.hotkey )
+                            response = substrate.submit_extrinsic( extrinsic, wait_for_inclusion=wait_for_inclusion, wait_for_finalization=wait_for_finalization )
+
+                            # We only wait here if we expect finalization.
+                            if not wait_for_finalization and not wait_for_inclusion:
+                                bittensor.__console__.print(":white_heavy_check_mark: [green]Sent[/green]")
+                                return True
+
+                            # process if registration successful, try again if pow is still valid
+                            response.process_events()
+                            if not response.is_success:
+                                if 'key is already registered' in response.error_message:
+                                    # Error meant that the key is already registered.
+                                    bittensor.__console__.print(":white_heavy_check_mark: [green]Already Registered[/green]")
+                                    return True
+
+                                bittensor.__console__.print(":cross_mark: [red]Failed[/red]: error:{}".format(response.error_message))
+                                time.sleep(0.5)
+
+                            # Successful registration, final check for neuron and pubkey
+                            else:
+                                bittensor.__console__.print(":satellite: Checking Balance...")
+                                neuron = self.neuron_for_pubkey( wallet.hotkey.ss58_address )
+                                if not neuron.is_null:
+                                    bittensor.__console__.print(":white_heavy_check_mark: [green]Registered[/green]")
                                     return True

-                                # process if registration successful, try again if pow is still valid
-                                response.process_events()
-                                if not response.is_success:
-                                    if 'key is already registered' in response.error_message:
-                                        # Error meant that the key is already registered.
-                                        bittensor.__console__.print(":white_heavy_check_mark: [green]Already Registered[/green]")
-                                        return True
-
-                                    bittensor.__console__.print(":cross_mark: [red]Failed[/red]: error:{}".format(response.error_message))
-                                    time.sleep(0.5)
-
-                                # Successful registration, final check for neuron and pubkey
-                                else:
-                                    bittensor.__console__.print(":satellite: Checking Balance...")
-                                    neuron = self.neuron_for_pubkey( wallet.hotkey.ss58_address )
-                                    if not neuron.is_null:
-                                        bittensor.__console__.print(":white_heavy_check_mark: [green]Registered[/green]")
-                                        return True
-                                    else:
-                                        # neuron not found, try again
-                                        bittensor.__console__.print(":cross_mark: [red]Unknown error. Neuron not found.[/red]")
-                                        continue
-                        else:
-                            # Exited loop because pow is no longer valid.
-                            bittensor.__console__.print( "[red]POW is stale.[/red]" )
-                            return False
-                if attempts < max_allowed_attempts:
-                    #Failed registration, retry pow
-                    attempts += 1
-                    bittensor.__console__.print( ":satellite: Failed registration, retrying pow ...({}/{})".format(attempts, max_allowed_attempts))
-                else:
-                    # Failed to register after max attempts.
-                    bittensor.__console__.print( "[red]No more attempts.[/red]" )
-                    return False
+                                # neuron not found, try again
+                                bittensor.__console__.print(":cross_mark: [red]Unknown error. Neuron not found.[/red]")
+                                continue
+                    else:
+                        # Exited loop because pow is no longer valid.
+                        bittensor.__console__.print( "[red]POW is stale.[/red]" )
+                        return False
+
+            if attempts < max_allowed_attempts:
+                #Failed registration, retry pow
+                attempts += 1
+                bittensor.__console__.print( ":satellite: Failed registration, retrying pow ...({}/{})".format(attempts, max_allowed_attempts))
+            else:
+                # Failed to register after max attempts.
+                bittensor.__console__.print( "[red]No more attempts.[/red]" )
+                return False

     def serve (
             self,
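
Note on the hunk above: the change appears to move the "pow failed / submit" block out of the CPU `else:` branch so that it also runs after the CUDA solver. The following is a minimal, self-contained sketch of that control-flow difference, not the project's code; `register_buggy`, `register_fixed`, and the step strings are hypothetical names used only for illustration.

```python
from typing import List


def register_buggy(use_cuda: bool) -> List[str]:
    """Submission is nested under the CPU branch, so the CUDA path never submits."""
    steps = []
    if use_cuda:
        steps.append("solve:cuda")
    else:
        steps.append("solve:cpu")
        # Bad indent: this line is only reachable on the CPU path.
        steps.append("submit")
    return steps


def register_fixed(use_cuda: bool) -> List[str]:
    """Submission is dedented one level, so it runs after either solver."""
    steps = []
    if use_cuda:
        steps.append("solve:cuda")
    else:
        steps.append("solve:cpu")
    # Dedented: runs for both the CUDA and the CPU path.
    steps.append("submit")
    return steps


print(register_buggy(use_cuda=True))
print(register_fixed(use_cuda=True))
```

Under these assumptions, only the fixed variant reaches the submit step when `use_cuda` is true; the CPU path behaves the same in both variants.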
52 changes: 52 additions & 0 deletions tests/unit_tests/bittensor_tests/utils/test_utils.py
@@ -10,6 +10,7 @@
 import random
 import torch
 import multiprocessing
+from types import SimpleNamespace

 from sys import platform
 from substrateinterface.base import Keypair
@@ -346,6 +347,57 @@ def test_pow_not_stale_diff_block_number_too_old(self):

         assert not bittensor.utils.POWNotStale(mock_subtensor, mock_solution)

+def test_pow_called_for_cuda():
+    class MockException(Exception):
+        pass
+    mock_compose_call = MagicMock(side_effect=MockException)
+
+    mock_subtensor = bittensor.subtensor(_mock=True)
+    mock_subtensor.neuron_for_pubkey=MagicMock(is_null=True)
+    mock_subtensor.substrate = MagicMock(
+        __enter__= MagicMock(return_value=MagicMock(
+            compose_call=mock_compose_call
+        )),
+        __exit__ = MagicMock(return_value=None),
+    )
+
+    mock_wallet = SimpleNamespace(
+        hotkey=SimpleNamespace(
+            ss58_address=''
+        ),
+        coldkeypub=SimpleNamespace(
+            ss58_address=''
+        )
+    )
+
+    mock_result = {
+        "block_number": 1,
+        'nonce': random.randint(0, pow(2, 32)),
+        'work': b'\x00' * 64,
+    }
+
+    with patch('bittensor.utils.POWNotStale', return_value=True) as mock_pow_not_stale:
+        with patch('torch.cuda.is_available', return_value=True) as mock_cuda_available:
+            with patch('bittensor.utils.create_pow', return_value=mock_result) as mock_create_pow:
+                with patch('bittensor.utils.hex_bytes_to_u8_list', return_value=b''):
+
+                    # Should exit early
+                    with pytest.raises(MockException):
+                        mock_subtensor.register(mock_wallet, cuda=True, prompt=False)
+
+                    mock_pow_not_stale.assert_called_once()
+                    mock_create_pow.assert_called_once()
+                    mock_cuda_available.assert_called_once()
+
+                    call0 = mock_pow_not_stale.call_args
+                    assert call0[0][0] == mock_subtensor
+                    assert call0[0][1] == mock_result
+
+                    mock_compose_call.assert_called_once()
+                    call1 = mock_compose_call.call_args
+                    assert call1[1]['call_function'] == 'register'
+                    call_params = call1[1]['call_params']
+                    assert call_params['nonce'] == mock_result['nonce']
+
 if __name__ == "__main__":
     test_solve_for_difficulty_fast_registered_already()
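
Note on the test above: it relies on two `unittest.mock` techniques, a `MagicMock` used as a context manager and a `side_effect` exception that aborts the call under test as soon as the interesting function is reached. Below is a small standalone sketch of the same pattern using only the standard library; `submit`, `StopEarly`, and the parameter values are hypothetical stand-ins rather than bittensor APIs.

```python
from unittest.mock import MagicMock


class StopEarly(Exception):
    """Raised by the mock to abort the code under test at a known point."""


def submit(substrate, nonce):
    # Hypothetical stand-in for the submission step: enter the connection
    # context, compose a call, then submit it.
    with substrate as s:
        call = s.compose_call(call_function="register", call_params={"nonce": nonce})
        return s.submit_extrinsic(call)


# compose_call raises immediately, so nothing after it runs.
compose_call = MagicMock(side_effect=StopEarly)
inner = MagicMock(compose_call=compose_call)

# MagicMock supports the context-manager protocol; configure what `with` yields.
substrate = MagicMock()
substrate.__enter__.return_value = inner

try:
    submit(substrate, nonce=7)
    raised = False
except StopEarly:
    raised = True

# Keyword arguments of the single recorded call.
kwargs = compose_call.call_args[1]
```

Raising from the mock keeps the test from needing a working chain connection: the assertions only inspect what was passed to `compose_call`, and everything after that point is skipped.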
