I had the same problem. It is caused by the chunk below in path-to-library/cloudml/cloudml/cloudml/deploy.py:
# Stream output from subprocess to console.
for line in iter(process.stdout.readline, ""):
    sys.stdout.write(line.decode('utf-8'))
Once the execution is completed, this loop does not halt, so it keeps running indefinitely.
Resolution: comment out the above chunk in deploy.py and the job will complete successfully. Downside: you won't see the step-by-step installation progress, so the logs won't give you a hint if there is an error in the script. The chunk below still checks whether the execution succeeded. If there is an error in the script, it will keep on running endlessly.
# Finalize the process.
stdout, stderr = process.communicate()
# Detect a non-zero exit code.
if process.returncode != 0:
    fmt = "Command %s failed: exit code %s"
    print(fmt % (commands, process.returncode))
else:
    print("Command %s ran successfully." % (commands, ))
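A possible alternative to commenting out the streaming loop, offered as a sketch rather than a confirmed fix: the `.decode('utf-8')` call in the loop suggests the pipe yields bytes, and under Python 3 a bytes stream signals end-of-output with `b""`, which never compares equal to the `""` sentinel passed to `iter`. If that assumption holds for deploy.py, switching the sentinel to `b""` should let the loop end while keeping the live log output. The function name `stream_output` and the way the process is opened here are hypothetical and are not taken from deploy.py.

```python
import subprocess
import sys


def stream_output(command):
    """Run `command`, echo its output line by line, and return its exit code.

    Hypothetical helper for illustration; not part of cloudml's deploy.py.
    """
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    # A pipe opened without text mode yields bytes in Python 3, so the
    # end-of-stream value returned by readline() is b"", not "".
    for line in iter(process.stdout.readline, b""):
        sys.stdout.write(line.decode("utf-8"))
    process.stdout.close()
    # Wait for the child to exit and report its exit code to the caller.
    return process.wait()


if __name__ == "__main__":
    exit_code = stream_output([sys.executable, "-c", "print('hello')"])
    print("exit code:", exit_code)
```

With the `""` sentinel, the same loop would keep receiving `b""` after the child process exits and never stop, which would match the job running forever.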
Note: I am a novice in Python and cloud environments. Take my comments with a pinch of salt. :-)
I have a problem running mnist_mlp.R (https://github.com/rstudio/keras/blob/master/vignettes/examples/mnist_mlp.R) with cloudml_train on Google Cloud Platform.
Even though the job on Google AI Platform runs properly, it does not finish automatically. Also, or because of that, the job_collect functionality does not copy any files into the local directory (runs). When I cancel the job manually on Google AI Platform, I see the new job folder of the corresponding job.
So, why the heck does the job run forever on Google AI Platform?!
I think the download functionality does not work properly. I also do not get a local runs directory, which the mnist_mlp.R script normally creates. I think job_collect is the problem:
cloudml::job_collect('Project Name', destination = '../runs', view = 'save')
does not copy anything into the destination folder.
Any idea what we can do?
R commands:
library(cloudml)
cloudml_train("mnist_mlp.R", config = "config.yml")
config.yml:
trainingInput:
  scaleTier: BASIC
  runtimeVersion: "2.1"
  pythonVersion: "3.7"