[Jobs] Add huggingface-cli jobs commands #3211

Open: wants to merge 32 commits into `main`
146 changes: 146 additions & 0 deletions docs/source/en/guides/cli.md
@@ -604,3 +604,149 @@ Copy-and-paste the text below in your GitHub issue.
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
```

## huggingface-cli jobs

Run compute jobs on Hugging Face infrastructure with a familiar Docker-like interface.

`huggingface-cli jobs` is a command-line tool that lets you run anything on Hugging Face's infrastructure (including GPUs and TPUs!) with simple commands. Think `docker run`, but for running code on A100s.

```bash
# Directly run Python code
>>> huggingface-cli jobs run python:3.12 python -c "print('Hello from the cloud!')"

# Use GPUs without any setup
>>> huggingface-cli jobs run --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
... python -c "import torch; print(torch.cuda.get_device_name())"

# Run from Hugging Face Spaces
>>> huggingface-cli jobs run hf.co/spaces/lhoestq/duckdb duckdb -c "select 'hello world'"

# Run a Python script with `uv` (experimental)
>>> huggingface-cli jobs uv run my_script.py
```

### ✨ Key Features

- 🐳 **Docker-like CLI**: Familiar commands (`run`, `ps`, `logs`, `inspect`) to run and manage jobs
- 🔥 **Any Hardware**: From CPUs to A100 GPUs and TPU pods - switch with a simple flag
- 📦 **Run Anything**: Use Docker images, HF Spaces, or your custom containers
- 🔐 **Simple Auth**: Just use your HF token
- 📊 **Live Monitoring**: Stream logs in real-time, just like running locally
- 💰 **Pay-as-you-go**: Only pay for the seconds you use

### Prerequisites

- A Hugging Face account (currently in testing for HF staff)
- Authenticate with the Hugging Face Hub (e.g. `huggingface-cli login`)

### Quick Start

#### 1. Run your first job

```bash
# Run a simple Python script
>>> huggingface-cli jobs run python:3.12 python -c "print('Hello from HF compute!')"
```

This command runs the job and streams its logs. Pass `--detach` to run the job in the background and print only the job ID.

#### 2. Check job status

```bash
# List your running jobs
>>> huggingface-cli jobs ps

# Inspect the status of a job
>>> huggingface-cli jobs inspect <job_id>

# View logs from a job
>>> huggingface-cli jobs logs <job_id>

# Cancel a job
>>> huggingface-cli jobs cancel <job_id>
```
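Under the hood, these subcommands are thin wrappers around a jobs HTTP API on the Hub. Based on the request paths visible in this PR's code (`inspect` issues a GET on the job URL, `cancel` a POST on the same URL with `/cancel` appended), the URL scheme can be sketched as follows — the helper name here is our own, not part of the library:

```python
API_ROOT = "https://huggingface.co/api/jobs"


def job_endpoint(username: str, job_id: str, action: str = "") -> str:
    """Build a jobs API URL as used by the `inspect` and `cancel` subcommands.

    `action` is an optional trailing path segment, e.g. "cancel".
    """
    url = f"{API_ROOT}/{username}/{job_id}"
    return f"{url}/{action}" if action else url


# Hypothetical username and job ID, for illustration only.
inspect_url = job_endpoint("alice", "abc123")
cancel_url = job_endpoint("alice", "abc123", action="cancel")
```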

#### 3. Run on GPU

You can also run jobs on GPUs or TPUs with the `--flavor` option. For example, to run a PyTorch job on an A10G GPU:

```bash
# Use an A10G GPU to check PyTorch CUDA
>>> huggingface-cli jobs run --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
... python -c "import torch; print(f'This code ran with the following GPU: {torch.cuda.get_device_name()}')"
```

Running this will show the following output:

```bash
This code ran with the following GPU: NVIDIA A10G
```

That's it! You're now running code on Hugging Face's infrastructure. For more details, check out the [Quickstart Guide](docs/quickstart.md).

### Common Use Cases

- **Model Training**: Fine-tune or train models on GPUs (T4, A10G, A100) without managing infrastructure
- **Synthetic Data Generation**: Generate large-scale datasets using LLMs on powerful hardware
- **Data Processing**: Process massive datasets with high-CPU configurations for parallel workloads
- **Batch Inference**: Run offline inference on thousands of samples using optimized GPU setups
- **Experiments & Benchmarks**: Run ML experiments on consistent hardware for reproducible results
- **Development & Debugging**: Test GPU code without local CUDA setup

### Pass Environment variables and Secrets

You can pass environment variables to your job with `-e`/`--env-file`, and secrets (encrypted server-side) with `-s`/`--secret-env-file`:

```bash
# Pass environment variables
>>> huggingface-cli jobs run -e FOO=foo -e BAR=bar python:3.12 python -c "import os; print(os.environ['FOO'], os.environ['BAR'])"
```

```bash
# Pass environment variables from a local .env file
>>> huggingface-cli jobs run --env-file .env python:3.12 python -c "import os; print(os.environ['FOO'], os.environ['BAR'])"
```

```bash
# Pass secrets - they will be encrypted server side
>>> huggingface-cli jobs run -s MY_SECRET=psswrd python:3.12 python -c "import os; print(os.environ['MY_SECRET'])"
```

```bash
# Pass secrets from a local .secrets.env file - they will be encrypted server side
>>> huggingface-cli jobs run --secret-env-file .secrets.env python:3.12 python -c "import os; print(os.environ['MY_SECRET'])"
```
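For reference, the `.env` and `.secrets.env` files used above are plain `KEY=value` files; a minimal example (values are placeholders):

```bash
# .env — loaded with --env-file
FOO=foo
BAR=bar
```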

### Hardware

Available `--flavor` options:

- CPU: `cpu-basic`, `cpu-upgrade`
- GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
- TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`

(updated 03/2025 from the Hugging Face [suggested_hardware docs](https://huggingface.co/docs/hub/en/spaces-config-reference))

### UV Scripts (Experimental)

Run UV scripts (Python scripts with inline dependencies) on HF infrastructure:

```bash
# Run a UV script (creates temporary repo)
>>> huggingface-cli jobs uv run my_script.py

# Run with persistent repo
>>> huggingface-cli jobs uv run my_script.py --repo my-uv-scripts

# Run with GPU
>>> huggingface-cli jobs uv run ml_training.py --flavor t4-small

# Pass arguments to script
>>> huggingface-cli jobs uv run process.py input.csv output.parquet --repo data-scripts

# Run a script directly from a URL
>>> huggingface-cli jobs uv run https://huggingface.co/datasets/username/scripts/resolve/main/example.py
```

UV scripts are Python scripts that include their dependencies directly in the file using a special comment syntax. This makes them perfect for self-contained tasks that don't require complex project setups. Learn more about UV scripts in the [UV documentation](https://docs.astral.sh/uv/guides/scripts/).
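The inline-dependency comment syntax mentioned above is the PEP 723 metadata block that `uv` reads. A minimal self-contained sketch — the dependency list is deliberately empty so it runs with the standard library alone; real scripts would list packages such as `["datasets"]`:

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
# A dependency-free UV script: uv reads the comment block above to set up
# the environment before running the script.
import json
import platform

result = {"python": platform.python_version(), "status": "ok"}
print(json.dumps(result))
```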
1 change: 1 addition & 0 deletions setup.py
@@ -20,6 +20,7 @@ def get_version() -> str:
"requests",
"tqdm>=4.42.1",
"typing-extensions>=3.7.4.3", # to be able to import TypeAlias
"python-dotenv",
]

extras = {}
2 changes: 2 additions & 0 deletions src/huggingface_hub/commands/huggingface_cli.py
@@ -17,6 +17,7 @@
from huggingface_hub.commands.delete_cache import DeleteCacheCommand
from huggingface_hub.commands.download import DownloadCommand
from huggingface_hub.commands.env import EnvironmentCommand
from huggingface_hub.commands.jobs import JobsCommands
from huggingface_hub.commands.lfs import LfsCommands
from huggingface_hub.commands.repo import RepoCommands
from huggingface_hub.commands.repo_files import RepoFilesCommand
@@ -44,6 +45,7 @@ def main():
DeleteCacheCommand.register_subcommand(commands_parser)
TagCommands.register_subcommand(commands_parser)
VersionCommand.register_subcommand(commands_parser)
JobsCommands.register_subcommand(commands_parser)

# Experimental
UploadLargeFolderCommand.register_subcommand(commands_parser)
48 changes: 48 additions & 0 deletions src/huggingface_hub/commands/jobs/__init__.py
@@ -0,0 +1,48 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Contains commands to interact with jobs on the Hugging Face Hub.

Usage:
# run a job
huggingface-cli jobs run image command
"""

from argparse import _SubParsersAction

from huggingface_hub.commands import BaseHuggingfaceCLICommand
from huggingface_hub.commands.jobs.cancel import CancelCommand
from huggingface_hub.commands.jobs.inspect import InspectCommand
from huggingface_hub.commands.jobs.logs import LogsCommand
from huggingface_hub.commands.jobs.ps import PsCommand
from huggingface_hub.commands.jobs.run import RunCommand
from huggingface_hub.commands.jobs.uv import UvCommand
from huggingface_hub.utils import logging


logger = logging.get_logger(__name__)


class JobsCommands(BaseHuggingfaceCLICommand):
@staticmethod
def register_subcommand(parser: _SubParsersAction):
jobs_parser = parser.add_parser("jobs", help="Commands to interact with your huggingface.co jobs.")
jobs_subparsers = jobs_parser.add_subparsers(help="huggingface.co jobs related commands")

# Register commands
InspectCommand.register_subcommand(jobs_subparsers)
LogsCommand.register_subcommand(jobs_subparsers)
PsCommand.register_subcommand(jobs_subparsers)
RunCommand.register_subcommand(jobs_subparsers)
CancelCommand.register_subcommand(jobs_subparsers)
UvCommand.register_subcommand(jobs_subparsers)
29 changes: 29 additions & 0 deletions src/huggingface_hub/commands/jobs/_cli_utils.py
@@ -0,0 +1,29 @@
import os
from typing import List, Union


def tabulate(rows: List[List[Union[str, int]]], headers: List[str]) -> str:
"""
Inspired by:

- stackoverflow.com/a/8356620/593036
- stackoverflow.com/questions/9535954/printing-lists-as-tabular-data
"""
col_widths = [max(len(str(x)) for x in col) for col in zip(*rows, headers)]
terminal_width = max(os.get_terminal_size().columns, len(headers) * 12)
while len(headers) + sum(col_widths) > terminal_width:
col_to_minimize = col_widths.index(max(col_widths))
col_widths[col_to_minimize] //= 2
if len(headers) + sum(col_widths) <= terminal_width:
col_widths[col_to_minimize] = terminal_width - sum(col_widths) - len(headers) + col_widths[col_to_minimize]
row_format = ("{{:{}}} " * len(headers)).format(*col_widths)
lines = []
lines.append(row_format.format(*headers))
lines.append(row_format.format(*["-" * w for w in col_widths]))
for row in rows:
row_format_args = [
str(x)[: col_width - 3] + "..." if len(str(x)) > col_width else str(x)
for x, col_width in zip(row, col_widths)
]
lines.append(row_format.format(*row_format_args))
return "\n".join(lines)
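The column-sizing logic in `tabulate` can be checked in isolation. The sketch below reuses the same width computation and row format, but with a fixed layout instead of querying `os.get_terminal_size()` (which raises `OSError` when stdout is not attached to a terminal); the job rows are hypothetical:

```python
# Hypothetical job listing, as `ps` might display it.
rows = [["job-123", "RUNNING", "a10g-small"],
        ["job-456", "COMPLETED", "cpu-basic"]]
headers = ["ID", "STATUS", "FLAVOR"]

# Each column is as wide as its widest cell, headers included.
col_widths = [max(len(str(x)) for x in col) for col in zip(*rows, headers)]
row_format = ("{{:{}}} " * len(headers)).format(*col_widths)

lines = [row_format.format(*headers),
         row_format.format(*["-" * w for w in col_widths])]
lines += [row_format.format(*map(str, row)) for row in rows]
print("\n".join(lines))
```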
32 changes: 32 additions & 0 deletions src/huggingface_hub/commands/jobs/cancel.py
@@ -0,0 +1,32 @@
from argparse import Namespace, _SubParsersAction
from typing import Optional

import requests

from huggingface_hub import whoami
from huggingface_hub.utils import build_hf_headers

from .. import BaseHuggingfaceCLICommand


class CancelCommand(BaseHuggingfaceCLICommand):
@staticmethod
def register_subcommand(parser: _SubParsersAction) -> None:
run_parser = parser.add_parser("cancel", help="Cancel a Job")
run_parser.add_argument("job_id", type=str, help="Job ID")
run_parser.add_argument(
"--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens"
)
run_parser.set_defaults(func=CancelCommand)

def __init__(self, args: Namespace) -> None:
self.job_id: str = args.job_id
self.token: Optional[str] = args.token or None

def run(self) -> None:
username = whoami(self.token)["name"]
headers = build_hf_headers(token=self.token)
requests.post(
f"https://huggingface.co/api/jobs/{username}/{self.job_id}/cancel",
headers=headers,
).raise_for_status()
37 changes: 37 additions & 0 deletions src/huggingface_hub/commands/jobs/inspect.py
@@ -0,0 +1,37 @@
import json
from argparse import Namespace, _SubParsersAction
from typing import Optional

import requests

from huggingface_hub import whoami
from huggingface_hub.utils import build_hf_headers

from .. import BaseHuggingfaceCLICommand


class InspectCommand(BaseHuggingfaceCLICommand):
@staticmethod
def register_subcommand(parser: _SubParsersAction) -> None:
run_parser = parser.add_parser("inspect", help="Display detailed information on one or more Jobs")
run_parser.add_argument(
"--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens"
)
run_parser.add_argument("jobs", nargs="...", help="The jobs to inspect")
run_parser.set_defaults(func=InspectCommand)

def __init__(self, args: Namespace) -> None:
self.token: Optional[str] = args.token or None
self.jobs: list[str] = args.jobs

def run(self) -> None:
username = whoami(self.token)["name"]
headers = build_hf_headers(token=self.token)
inspections = [
requests.get(
f"https://huggingface.co/api/jobs/{username}/{job}",
headers=headers,
).json()
for job in self.jobs
]
print(json.dumps(inspections, indent=4))