DeepRec hangs in distributed mode.

# Current behavior
In distributed mode, DeepRec trains normally on one hour of data but hangs when trained on one day of data or more. Log:
![6ca9fe77321c27383b3b3de9bb8fc5d5](https://user-images.githubusercontent.com/35439432/229059537-ea1626df-2411-46bf-acb8-fb61fada092d.png)
`nvidia-smi`:
![a3ee237e24abfd35d1c087126b6331f8](https://user-images.githubusercontent.com/35439432/229059658-5b425fcf-f027-4f71-9d2c-908ffca14bf5.png)
CPU:
![071c9938c994a484295fdc3ef25b483d](https://user-images.githubusercontent.com/35439432/229063284-2d8b5341-20b7-407a-9f5d-de15b5922efc.png)


# Expected behavior
DeepRec completes training in distributed mode without hanging. Log:
![315532d0f8197d279e990d49332c85b3](https://user-images.githubusercontent.com/35439432/229060024-c09fd4b4-987a-49cb-8153-dd4e0a1d16c6.png)

# System information
- GPU model and memory: two Tesla T4 GPUs, 15109 MiB each
- OS platform: x86_64 x86_64 x86_64 GNU/Linux
- Docker version: Docker version 20.10.8, build 3967b7d
- GCC/CUDA/cuDNN version: CUDA 11.4 / cuDNN 8
- Python/conda version: Python 3.6
- TensorFlow/PyTorch version: DeepRec deeprec2302; HybridBackend commit a832b4e1da3f60ddaf3d7f358cf09b370568ff34

# Code to reproduce

```python
import tensorflow as tf

sess_config = tf.ConfigProto(
    # If the specified device doesn't exist, let TF place the op automatically
    allow_soft_placement=True,
    log_device_placement=False,  # Whether to log device placement
)
sess_config.gpu_options.force_gpu_compatible = True
sess_config.gpu_options.allow_growth = True

# Excerpt from a trainer method: self.__ckpt_dir is the checkpoint directory
with tf.train.MonitoredTrainingSession(
        master="", checkpoint_dir=self.__ckpt_dir, config=sess_config) as sess:
    ...  # training loop (omitted)
```
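To narrow down where each worker blocks once it hangs, periodic Python stack dumps can be enabled on every worker. A minimal sketch, assuming only the standard-library `faulthandler` module; `enable_hang_dump`, `disable_hang_dump`, and the 600-second interval are hypothetical choices, not part of DeepRec or HybridBackend.

```python
import faulthandler
import sys
from typing import Optional, TextIO


def enable_hang_dump(interval_s: float = 600.0, file: Optional[TextIO] = None) -> None:
    # Hypothetical helper: every `interval_s` seconds, write the Python stack of
    # every thread to `file` (stderr by default). repeat=True re-arms the timer,
    # so a worker that stays stuck keeps producing fresh dumps.
    faulthandler.dump_traceback_later(
        interval_s, repeat=True, file=file if file is not None else sys.stderr
    )


def disable_hang_dump() -> None:
    # Cancel the pending periodic dump, e.g. once training has finished normally.
    # Safe to call even if no dump is scheduled.
    faulthandler.cancel_dump_traceback_later()
```

Calling `enable_hang_dump()` once before entering `MonitoredTrainingSession` and `disable_hang_dump()` after it exits would print every thread's Python stack to stderr every 10 minutes while a worker is stuck. Threads blocked inside native code show only the Python call that entered it, so this complements rather than replaces a native debugger such as `gdb` or `py-spy dump --native`.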
# Willing to contribute

Yes