Skip to content

A mistake occurs When I try to train this nework? #3

@datar001

Description

@datar001

I try to train this network according the command "python ppo_single_large_hiar.py train"。
But It makes a mistake as followings:

Failure # 1 (occurred at 2022-12-29_17-59-27)
Traceback (most recent call last):
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/tune/execution/ray_trial_executor.py", line 989, in get_next_executor_event
future_result = ray.get(ready_future)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/_private/worker.py", line 2277, in get
raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, �[36mray::PPO.init()�[39m (pid=1210644, ip=192.168.124.36, repr=PPO)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 308, in init
super().init(config=config, logger_creator=logger_creator, **kwargs)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/tune/trainable/trainable.py", line 157, in init
self.setup(copy.deepcopy(self.config))
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 418, in setup
self.workers = WorkerSet(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 171, in init
self._local_worker = self._make_worker(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 661, in _make_worker
worker = cls(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 613, in init
self._build_policy_map(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1784, in _build_policy_map
self.policy_map.create_policy(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/policy/policy_map.py", line 123, in create_policy
self[policy_id] = create_policy_for_framework(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/utils/policy.py", line 80, in create_policy_for_framework
return policy_class(observation_space, action_space, merged_config)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/algorithms/ppo/ppo_torch_policy.py", line 66, in init
self._initialize_loss_from_dummy_batch()
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/policy/policy.py", line 1050, in _initialize_loss_from_dummy_batch
actions, state_outs, extra_outs = self.compute_actions_from_input_dict(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/policy/torch_policy_v2.py", line 483, in compute_actions_from_input_dict
return self._compute_action_helper(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/utils/threading.py", line 24, in wrapper
return func(self, *a, **k)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/policy/torch_policy_v2.py", line 1016, in _compute_action_helper
dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/modelv2.py", line 259, in call
res = self.forward(restored, state or [], seq_lens)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/torch/complex_input_net.py", line 207, in forward
nn_out, _ = self.flatten[i](
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/modelv2.py", line 259, in call
res = self.forward(restored, state or [], seq_lens)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/torch/fcnet.py", line 146, in forward
self._features = self._hidden_layers(self._last_flat_in)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/torch/misc.py", line 169, in forward
return self._model(x)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/functional.py", line 1610, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

It seems to be a cuda mistake. But I test the cuda environment in the python console and it's ok

image

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions