[Jobs] Add huggingface-cli jobs commands #3211

Merged
merged 42 commits into from Jul 23, 2025
Changes from 7 commits
4836c04
jobs
lhoestq Jul 10, 2025
682a789
style
lhoestq Jul 10, 2025
af05c27
docs
lhoestq Jul 10, 2025
3895c8e
mypy
lhoestq Jul 10, 2025
3661cb7
style
lhoestq Jul 10, 2025
13f17c8
minor
lhoestq Jul 10, 2025
5e99d64
remove hfjobs mentions
lhoestq Jul 10, 2025
7efe998
add huggingface-cli jobs uv commands
lhoestq Jul 11, 2025
ab8511e
add some uv options
lhoestq Jul 11, 2025
3c00292
add test
lhoestq Jul 11, 2025
3136ef4
fix for 3.8
lhoestq Jul 11, 2025
9fc3c78
Update src/huggingface_hub/commands/jobs/uv.py
davanstrien Jul 14, 2025
fd926b5
move to HfApi
lhoestq Jul 16, 2025
1bf5f66
minor
lhoestq Jul 16, 2025
aefb493
more comments
lhoestq Jul 16, 2025
31a3d97
uv run local_script.py
lhoestq Jul 17, 2025
f7c8be9
lucain's comments
lhoestq Jul 17, 2025
541aa6a
more lucain's comments
lhoestq Jul 17, 2025
251e719
Apply suggestions from code review
lhoestq Jul 21, 2025
97a856b
style
lhoestq Jul 21, 2025
1102968
minor
lhoestq Jul 21, 2025
99b538a
Remove JobUrl and add url in JobInfo directly
Wauplin Jul 22, 2025
53fb0aa
Apply suggestions from code review
lhoestq Jul 22, 2025
4e3523d
add namespace arg
lhoestq Jul 22, 2025
5db3b42
fix wrong job url
lhoestq Jul 22, 2025
76588ef
add missing methods at top level
lhoestq Jul 22, 2025
63dd90f
add docs
lhoestq Jul 22, 2025
bfd326a
uv script url as env, not secret
lhoestq Jul 22, 2025
c9ab2f1
rename docs
lhoestq Jul 22, 2025
cf59dca
update test
lhoestq Jul 22, 2025
da1d40d
again
lhoestq Jul 22, 2025
334d831
improve docs
lhoestq Jul 22, 2025
028d32a
Merge branch 'main' into jobs
Wauplin Jul 23, 2025
fed7195
add image arg to run_uv_job
lhoestq Jul 23, 2025
eaaa6a1
List flavors from SpaceHardware
Wauplin Jul 23, 2025
c7660d7
Merge branch 'jobs' of github.com:huggingface/huggingface_hub into jobs
Wauplin Jul 23, 2025
af0e9fb
add to overview
lhoestq Jul 23, 2025
e6043ae
remove zero GPU from flavors
Wauplin Jul 23, 2025
c444391
add JobInfo etc. from _jobs_api in top level __init__
lhoestq Jul 23, 2025
ea6579a
add package_reference doc page
lhoestq Jul 23, 2025
3f6a2f7
minor - link JobInfo in docs
lhoestq Jul 23, 2025
3e049db
JobInfo docstring
lhoestq Jul 23, 2025
121 changes: 121 additions & 0 deletions docs/source/en/guides/cli.md
@@ -604,3 +604,124 @@ Copy-and-paste the text below in your GitHub issue.
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
```

## huggingface-cli jobs

Run compute jobs on Hugging Face infrastructure with a familiar Docker-like interface.

`huggingface-cli jobs` is a command-line tool that lets you run anything on Hugging Face's infrastructure (including GPUs and TPUs!) with simple commands. Think `docker run`, but for running code on A100s.

```bash
# Directly run Python code
>>> huggingface-cli jobs run python:3.12 python -c "print('Hello from the cloud!')"

# Use GPUs without any setup
>>> huggingface-cli jobs run --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
... python -c "import torch; print(torch.cuda.get_device_name())"

# Run from Hugging Face Spaces
>>> huggingface-cli jobs run hf.co/spaces/lhoestq/duckdb duckdb -c "select 'hello world'"
```

### ✨ Key Features

- 🐳 **Docker-like CLI**: Familiar commands (`run`, `ps`, `logs`, `inspect`) to run and manage jobs
- 🔥 **Any Hardware**: From CPUs to A100 GPUs and TPU pods - switch with a simple flag
- 📦 **Run Anything**: Use Docker images, HF Spaces, or your custom containers
- 🔐 **Simple Auth**: Just use your HF token
- 📊 **Live Monitoring**: Stream logs in real-time, just like running locally
- 💰 **Pay-as-you-go**: Only pay for the seconds you use

### Prerequisites

- A Hugging Face account (currently in testing for HF staff)
- Authenticate with the Hugging Face Hub (e.g. `huggingface-cli login`)


### Quick Start

#### 1. Run your first job

```bash
# Run a simple Python script
>>> huggingface-cli jobs run python:3.12 python -c "print('Hello from HF compute!')"
```

This command runs the job and shows the logs. You can pass `--detach` to run the Job in the background and only print the Job ID.

#### 2. Check job status

```bash
# List your running jobs
>>> huggingface-cli jobs ps

# Inspect the status of a job
>>> huggingface-cli jobs inspect <job_id>

# View logs from a job
>>> huggingface-cli jobs logs <job_id>

# Cancel a job
>>> huggingface-cli jobs cancel <job_id>
```
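Under the hood, these commands call the Hub's jobs HTTP API (the endpoints appear in the command implementations further down in this PR). A minimal sketch of the URL scheme — `job_url` is an illustrative helper for this guide, not a function the library exposes:

```python
from typing import Optional

# Base endpoint used by the inspect/logs/cancel commands in this PR
API_BASE = "https://huggingface.co/api/jobs"

def job_url(username: str, job_id: Optional[str] = None, action: Optional[str] = None) -> str:
    # Illustrative helper: the CLI builds these URLs inline.
    parts = [API_BASE, username]
    if job_id is not None:
        parts.append(job_id)
    if action is not None:
        parts.append(action)
    return "/".join(parts)

# inspect -> GET  job_url(username, job_id)
# logs    -> GET  job_url(username, job_id, "logs")    # server-sent events stream
# cancel  -> POST job_url(username, job_id, "cancel")
print(job_url("alice", "abc123", "cancel"))
```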

#### 3. Run on GPU

You can also run jobs on GPUs or TPUs with the `--flavor` option. For example, to run a PyTorch job on an A10G GPU:

```bash
# Use an A10G GPU to check PyTorch CUDA
>>> huggingface-cli jobs run --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
... python -c "import torch; print(f"This code ran with the following GPU: {torch.cuda.get_device_name()}")"
```

Running this will show the following output:

```bash
This code ran with the following GPU: NVIDIA A10G
```

That's it! You're now running code on Hugging Face's infrastructure. For more detailed information, check out the [Quickstart Guide](docs/quickstart.md).

### Common Use Cases

- **Model Training**: Fine-tune or train models on GPUs (T4, A10G, A100) without managing infrastructure
- **Synthetic Data Generation**: Generate large-scale datasets using LLMs on powerful hardware
- **Data Processing**: Process massive datasets with high-CPU configurations for parallel workloads
- **Batch Inference**: Run offline inference on thousands of samples using optimized GPU setups
- **Experiments & Benchmarks**: Run ML experiments on consistent hardware for reproducible results
- **Development & Debugging**: Test GPU code without local CUDA setup

### Pass Environment variables and Secrets

You can pass environment variables and secrets to your job with the following options:

```bash
# Pass environment variables
>>> huggingface-cli jobs run -e FOO=foo -e BAR=bar python:3.12 python -c "import os; print(os.environ['FOO'], os.environ['BAR'])"
```

```bash
# Pass an environment from a local .env file
>>> huggingface-cli jobs run --env-file .env python:3.12 python -c "import os; print(os.environ['FOO'], os.environ['BAR'])"
```

```bash
# Pass secrets - they will be encrypted server side
>>> huggingface-cli jobs run -s MY_SECRET=psswrd python:3.12 python -c "import os; print(os.environ['MY_SECRET'])"
```

```bash
# Pass secrets from a local .secrets.env file - they will be encrypted server side
>>> huggingface-cli jobs run --secret-env-file .secrets.env python:3.12 python -c "import os; print(os.environ['MY_SECRET'])"
```
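Both `--env-file` and `--secret-env-file` expect plain `KEY=VALUE` files (this PR adds the `dotenv` dependency to read them). A simplified, hand-rolled sketch of that parsing — `parse_env_file` is illustrative, not the CLI's actual code:

```python
def parse_env_file(text: str) -> dict:
    """Parse KEY=VALUE lines; blank lines and # comments are skipped,
    surrounding quotes on values are stripped."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # only lines that actually contain '='
            env[key.strip()] = value.strip().strip('"').strip("'")
    return env

print(parse_env_file("FOO=foo\n# a comment\nBAR='bar'\n"))
```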

### Hardware

Available `--flavor` options:

- CPU: `cpu-basic`, `cpu-upgrade`
- GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
- TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`

(updated in 03/25 from Hugging Face [suggested_hardware docs](https://huggingface.co/docs/hub/en/spaces-config-reference))
1 change: 1 addition & 0 deletions setup.py
@@ -20,6 +20,7 @@ def get_version() -> str:
"requests",
"tqdm>=4.42.1",
"typing-extensions>=3.7.4.3", # to be able to import TypeAlias
"dotenv",
]

extras = {}
2 changes: 2 additions & 0 deletions src/huggingface_hub/commands/huggingface_cli.py
@@ -17,6 +17,7 @@
from huggingface_hub.commands.delete_cache import DeleteCacheCommand
from huggingface_hub.commands.download import DownloadCommand
from huggingface_hub.commands.env import EnvironmentCommand
from huggingface_hub.commands.jobs import JobsCommands
from huggingface_hub.commands.lfs import LfsCommands
from huggingface_hub.commands.repo import RepoCommands
from huggingface_hub.commands.repo_files import RepoFilesCommand
@@ -44,6 +45,7 @@ def main():
DeleteCacheCommand.register_subcommand(commands_parser)
TagCommands.register_subcommand(commands_parser)
VersionCommand.register_subcommand(commands_parser)
JobsCommands.register_subcommand(commands_parser)

# Experimental
UploadLargeFolderCommand.register_subcommand(commands_parser)
46 changes: 46 additions & 0 deletions src/huggingface_hub/commands/jobs/__init__.py
@@ -0,0 +1,46 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Contains commands to interact with jobs on the Hugging Face Hub.

Usage:
# run a job
huggingface-cli jobs run image command
"""

from argparse import _SubParsersAction

from huggingface_hub.commands import BaseHuggingfaceCLICommand
from huggingface_hub.commands.jobs.cancel import CancelCommand
from huggingface_hub.commands.jobs.inspect import InspectCommand
from huggingface_hub.commands.jobs.logs import LogsCommand
from huggingface_hub.commands.jobs.ps import PsCommand
from huggingface_hub.commands.jobs.run import RunCommand
from huggingface_hub.utils import logging


logger = logging.get_logger(__name__)


class JobsCommands(BaseHuggingfaceCLICommand):
@staticmethod
def register_subcommand(parser: _SubParsersAction):
jobs_parser = parser.add_parser("jobs", help="Commands to interact with your huggingface.co jobs.")
jobs_subparsers = jobs_parser.add_subparsers(help="huggingface.co jobs related commands")

# Register commands
InspectCommand.register_subcommand(jobs_subparsers)
LogsCommand.register_subcommand(jobs_subparsers)
PsCommand.register_subcommand(jobs_subparsers)
RunCommand.register_subcommand(jobs_subparsers)
CancelCommand.register_subcommand(jobs_subparsers)
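The registration above builds a nested argparse tree (`huggingface-cli jobs <subcommand>`). A standalone sketch of the same wiring, trimmed to a single `run` subcommand with an `image` argument and a remainder command:

```python
from argparse import ArgumentParser

# Sketch of the nested-subparser pattern used by JobsCommands above;
# the class-based registration and full option set are omitted for brevity.
parser = ArgumentParser("huggingface-cli")
commands = parser.add_subparsers()
jobs = commands.add_parser("jobs", help="Commands to interact with your huggingface.co jobs.")
jobs_sub = jobs.add_subparsers(help="huggingface.co jobs related commands")
run = jobs_sub.add_parser("run", help="Run a Job")
run.add_argument("image")
run.add_argument("command", nargs="...")  # "..." == argparse.REMAINDER: everything after image
run.set_defaults(func="run")

args = parser.parse_args(["jobs", "run", "python:3.12", "python", "-c", "print('hi')"])
print(args.image, args.command)
```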
29 changes: 29 additions & 0 deletions src/huggingface_hub/commands/jobs/_cli_utils.py
@@ -0,0 +1,29 @@
import os
from typing import Union


def tabulate(rows: list[list[Union[str, int]]], headers: list[str]) -> str:
"""
Inspired by:

- stackoverflow.com/a/8356620/593036
- stackoverflow.com/questions/9535954/printing-lists-as-tabular-data
"""
col_widths = [max(len(str(x)) for x in col) for col in zip(*rows, headers)]
terminal_width = max(os.get_terminal_size().columns, len(headers) * 12)
while len(headers) + sum(col_widths) > terminal_width:
col_to_minimize = col_widths.index(max(col_widths))
col_widths[col_to_minimize] //= 2
if len(headers) + sum(col_widths) <= terminal_width:
col_widths[col_to_minimize] = terminal_width - sum(col_widths) - len(headers) + col_widths[col_to_minimize]
row_format = ("{{:{}}} " * len(headers)).format(*col_widths)
lines = []
lines.append(row_format.format(*headers))
lines.append(row_format.format(*["-" * w for w in col_widths]))
for row in rows:
row_format_args = [
str(x)[: col_width - 3] + "..." if len(str(x)) > col_width else str(x)
for x, col_width in zip(row, col_widths)
]
lines.append(row_format.format(*row_format_args))
return "\n".join(lines)
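The `zip(*rows, headers)` expression at the top of `tabulate` transposes the rows and pairs each column's cells with its header, so the printable width per column falls out in one pass:

```python
# Exercise the column-width computation from tabulate() in isolation.
rows = [["job-1", "RUNNING"], ["job-2-with-long-id", "COMPLETED"]]
headers = ["ID", "STATUS"]
# zip(*rows, headers) yields one tuple per column: (cell_row1, cell_row2, header)
col_widths = [max(len(str(x)) for x in col) for col in zip(*rows, headers)]
print(col_widths)  # → [18, 9]
```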
32 changes: 32 additions & 0 deletions src/huggingface_hub/commands/jobs/cancel.py
@@ -0,0 +1,32 @@
from argparse import Namespace, _SubParsersAction
from typing import Optional

import requests

from huggingface_hub import whoami
from huggingface_hub.utils import build_hf_headers

from .. import BaseHuggingfaceCLICommand


class CancelCommand(BaseHuggingfaceCLICommand):
@staticmethod
def register_subcommand(parser: _SubParsersAction) -> None:
run_parser = parser.add_parser("cancel", help="Cancel a Job")
run_parser.add_argument("job_id", type=str, help="Job ID")
run_parser.add_argument(
"--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens"
)
run_parser.set_defaults(func=CancelCommand)

def __init__(self, args: Namespace) -> None:
self.job_id: str = args.job_id
self.token: Optional[str] = args.token or None

def run(self) -> None:
username = whoami(self.token)["name"]
headers = build_hf_headers(token=self.token)
requests.post(
f"https://huggingface.co/api/jobs/{username}/{self.job_id}/cancel",
headers=headers,
).raise_for_status()
37 changes: 37 additions & 0 deletions src/huggingface_hub/commands/jobs/inspect.py
@@ -0,0 +1,37 @@
import json
from argparse import Namespace, _SubParsersAction
from typing import Optional

import requests

from huggingface_hub import whoami
from huggingface_hub.utils import build_hf_headers

from .. import BaseHuggingfaceCLICommand


class InspectCommand(BaseHuggingfaceCLICommand):
@staticmethod
def register_subcommand(parser: _SubParsersAction) -> None:
run_parser = parser.add_parser("inspect", help="Display detailed information on one or more Jobs")
run_parser.add_argument(
"--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens"
)
run_parser.add_argument("jobs", nargs="...", help="The jobs to inspect")
run_parser.set_defaults(func=InspectCommand)

def __init__(self, args: Namespace) -> None:
self.token: Optional[str] = args.token or None
self.jobs: list[str] = args.jobs

def run(self) -> None:
username = whoami(self.token)["name"]
headers = build_hf_headers(token=self.token)
inspections = [
requests.get(
f"https://huggingface.co/api/jobs/{username}/{job}",
headers=headers,
).json()
for job in self.jobs
]
print(json.dumps(inspections, indent=4))
84 changes: 84 additions & 0 deletions src/huggingface_hub/commands/jobs/logs.py
@@ -0,0 +1,84 @@
import json
import time
from argparse import Namespace, _SubParsersAction
from typing import Optional

import requests

from huggingface_hub import whoami
from huggingface_hub.utils import build_hf_headers

from .. import BaseHuggingfaceCLICommand


class LogsCommand(BaseHuggingfaceCLICommand):
@staticmethod
def register_subcommand(parser: _SubParsersAction) -> None:
run_parser = parser.add_parser("logs", help="Fetch the logs of a Job")
run_parser.add_argument("job_id", type=str, help="Job ID")
run_parser.add_argument("-t", "--timestamps", action="store_true", help="Show timestamps")
run_parser.add_argument(
"--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens"
)
run_parser.set_defaults(func=LogsCommand)

def __init__(self, args: Namespace) -> None:
self.job_id: str = args.job_id
self.timestamps: bool = args.timestamps
self.token: Optional[str] = args.token or None

def run(self) -> None:
username = whoami(self.token)["name"]
headers = build_hf_headers(token=self.token)
requests.get(
f"https://huggingface.co/api/jobs/{username}/{self.job_id}",
headers=headers,
).raise_for_status()

logging_started = False
logging_finished = False
job_finished = False
# - We need to retry because sometimes the /logs doesn't return logs when the job just started.
# (for example it can return only two lines: one for "Job started" and one empty line)
# - Timeouts can happen in case of build errors
# - ChunkedEncodingError can happen in case of stopped logging in the middle of streaming
# - Infinite empty log stream can happen in case of build error
# (the logs stream is infinite and empty except for the Job started message)
# - there is a ": keep-alive" every 30 seconds
while True:
try:
resp = requests.get(
f"https://huggingface.co/api/jobs/{username}/{self.job_id}/logs",
headers=headers,
stream=True,
timeout=120,
)
log = None
for line in resp.iter_lines(chunk_size=1):
line = line.decode("utf-8")
if line and line.startswith("data: {"):
data = json.loads(line[len("data: ") :])
# timestamp = data["timestamp"]
if not data["data"].startswith("===== Job started"):
logging_started = True
log = data["data"]
print(log)
logging_finished = logging_started
except requests.exceptions.ChunkedEncodingError:
# Response ended prematurely
break
except KeyboardInterrupt:
break
except requests.exceptions.ConnectionError as err:
is_timeout = err.__context__ and isinstance(err.__context__.__cause__, TimeoutError)
if logging_started or not is_timeout:
raise
if logging_finished or job_finished:
break
job_status = requests.get(
f"https://huggingface.co/api/jobs/{username}/{self.job_id}",
headers=headers,
).json()
if "status" in job_status and job_status["status"]["stage"] not in ("RUNNING", "UPDATING"):
job_finished = True
time.sleep(1)
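The streaming loop in `LogsCommand.run` above filters server-sent events of the form `data: {...}`, skipping keep-alives and the "Job started" banner. That parsing step can be sketched in isolation (the sample payloads below are made up):

```python
import json

def parse_log_lines(lines):
    # Sketch of the SSE filtering in LogsCommand.run: keep only `data: {...}`
    # events, and skip the "===== Job started" banner emitted by the server.
    for line in lines:
        if line and line.startswith("data: {"):
            data = json.loads(line[len("data: "):])
            if not data["data"].startswith("===== Job started"):
                yield data["data"]

stream = [
    ": keep-alive",
    'data: {"data": "===== Job started at ...", "timestamp": "t0"}',
    'data: {"data": "Hello from HF compute!", "timestamp": "t1"}',
]
print(list(parse_log_lines(stream)))  # → ['Hello from HF compute!']
```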