
[Bug Report] Multiple issues within ray/tuner.py #2328

Closed
@ozhanozen

Description

Hello,

I have noticed multiple issues within the step() function of ray/tuner.py, some of which prevent me from having an uninterrupted hyperparameter tuning session with Ray. Here are the issues, together with possible workarounds:

  1. There is the following loop to idle until the incoming data is updated:

    while util._dicts_equal(data, self.data):
        data = util.load_tensorboard_logs(self.tensorboard_logdir)
        sleep(2)  # Lazy report metrics to avoid performance overhead

    However, because of the "done" key that we insert into self.data on each iteration, data and self.data can never be equal, even when the underlying metrics are identical ("done" is absent from data). A minimal sketch illustrating this is shown after this list.

    I suggest we change this part to something like:

    data_ = {k: v for k, v in data.items() if k != "done"}
    self_data_ = {k: v for k, v in self.data.items() if k != "done"}
    while util._dicts_equal(data_, self_data_):
        data = util.load_tensorboard_logs(self.tensorboard_logdir)
        data_ = {k: v for k, v in data.items() if k != "done"}
        sleep(2)  # Lazy report metrics to avoid performance overhead
    
  2. The update to self.data["done"] that marks the run as finished currently happens here:

    if proc_status is not None:  # process finished, signal finish
        self.data["done"] = True

    However, from time to time I notice that the process executing the training takes a while to return after the training ends, and we end up inside the following loop (after the fix from bullet 1):

    while util._dicts_equal(data, self.data):
        data = util.load_tensorboard_logs(self.tensorboard_logdir)
        sleep(2)  # Lazy report metrics to avoid performance overhead

    from which we can never exit (the data is no longer updated, and we do not check whether the process has returned). Consequently, Ray gets stuck there.

    I suggest we change both of the while loops as follows:

    while data is None:
        data = util.load_tensorboard_logs(self.tensorboard_logdir)
        sleep(2)  # Lazy report metrics to avoid performance overhead
        proc_status = self.proc.poll()
        if proc_status is not None:
            break

    if self.data is not None:
        data_ = {k: v for k, v in data.items() if k != "done"}
        self_data_ = {k: v for k, v in self.data.items() if k != "done"}
        while util._dicts_equal(data_, self_data_):
            data = util.load_tensorboard_logs(self.tensorboard_logdir)
            data_ = {k: v for k, v in data.items() if k != "done"}
            sleep(2)  # Lazy report metrics to avoid performance overhead
            proc_status = self.proc.poll()
            if proc_status is not None:
                break
    
  3. Finally, while this might not necessarily be an issue directly related to IsaacLab, I noticed that the process executing the training sometimes hangs forever right after the training ends (maybe at simulation_app.close()?), which halts the whole Ray run since we can never mark the run as finished.

    While it might not be the best solution, I applied the following patch as a workaround, and it seems to work for me (a standalone sketch of the same terminate-with-timeout idea is shown after this list):

    if self.data is not None:
        data_ = {k: v for k, v in data.items() if k != "done"}
        self_data_ = {k: v for k, v in self.data.items() if k != "done"}
        time_start = time.time()  # note: uses time.time(), so `import time` is needed
        while util._dicts_equal(data_, self_data_):
            self.data_freeze_duration = time.time() - time_start
            data = util.load_tensorboard_logs(self.tensorboard_logdir)
            data_ = {k: v for k, v in data.items() if k != "done"}
            sleep(2)  # Lazy report metrics to avoid performance overhead
            proc_status = self.proc.poll()
            if proc_status is not None:
                break
            if self.data_freeze_duration > SOME_THRESHOLD:
                # Logs unchanged for too long: assume the training process is
                # frozen, terminate it, and report the run as done.
                self.data_freeze_duration = 0.0
                self.proc.terminate()
                try:
                    retcode = self.proc.wait(timeout=20)
                except Exception as e:
                    raise ValueError("The frozen process did not terminate within timeout duration.") from e
                # Only reached if the process terminated within the timeout.
                self.data = data
                self.data["done"] = True
                return self.data
    
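To illustrate the comparison problem from bullet 1, here is a minimal, self-contained sketch. It assumes util._dicts_equal behaves like an ordinary key-by-key equality check (the dicts_equal helper below is a hypothetical stand-in, and the metric values are made up); the point is only that the extra "done" key makes the two dicts compare unequal forever, so the waiting loop can never exit.

    # Hypothetical stand-in for util._dicts_equal (assumed to be a plain equality check).
    def dicts_equal(a: dict, b: dict) -> bool:
        return a == b

    # Two consecutive polls while training has produced no new metrics yet:
    data = {"Episode_Reward/total": 1.23}                       # freshly loaded logs
    self_data = {"Episode_Reward/total": 1.23, "done": False}   # step() already added "done"

    print(dicts_equal(data, self_data))  # False: the extra "done" key breaks equality

    # Stripping "done" before comparing restores the intended behaviour:
    strip = lambda d: {k: v for k, v in d.items() if k != "done"}
    print(dicts_equal(strip(data), strip(self_data)))  # True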
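
For bullet 3, here is a standalone sketch of the terminate-with-timeout idea using only the standard library; it is not part of ray/tuner.py, and the names (stop_frozen_process, TERMINATE_TIMEOUT_S) are hypothetical. Compared to the patch above, it escalates to kill() instead of raising, so a single stuck trial does not take down the whole tuning session.

    import subprocess

    TERMINATE_TIMEOUT_S = 20.0  # how long to wait after terminate() before escalating

    def stop_frozen_process(proc: subprocess.Popen) -> int:
        """Stop a training process that no longer produces logs and return its exit code."""
        proc.terminate()  # ask politely first (SIGTERM on POSIX)
        try:
            return proc.wait(timeout=TERMINATE_TIMEOUT_S)
        except subprocess.TimeoutExpired:
            proc.kill()   # force it if SIGTERM is ignored
            return proc.wait()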

Additional context

I have tested these only on a single GPU (4090 RTX) and with the rsl_rl library.

System Info

Commit: bc7c9f5
Isaac Sim Version: 4.5
OS: Ubuntu 22.04
GPU: 4090 RTX
CUDA: 12.2
GPU Driver: 535.129.03

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have checked that the issue is not in running Isaac Sim itself and is related to the repo

Acceptance Criteria

  • Ray runs without interruptions, or recovers from interruptions
