
[Jobs] Add huggingface-cli jobs commands #3211


Merged: 42 commits, merged Jul 23, 2025
Changes from 18 commits
4836c04
jobs
lhoestq Jul 10, 2025
682a789
style
lhoestq Jul 10, 2025
af05c27
docs
lhoestq Jul 10, 2025
3895c8e
mypy
lhoestq Jul 10, 2025
3661cb7
style
lhoestq Jul 10, 2025
13f17c8
minor
lhoestq Jul 10, 2025
5e99d64
remove hfjobs mentions
lhoestq Jul 10, 2025
7efe998
add huggingface-cli jobs uv commands
lhoestq Jul 11, 2025
ab8511e
add some uv options
lhoestq Jul 11, 2025
3c00292
add test
lhoestq Jul 11, 2025
3136ef4
fix for 3.8
lhoestq Jul 11, 2025
9fc3c78
Update src/huggingface_hub/commands/jobs/uv.py
davanstrien Jul 14, 2025
fd926b5
move to HfApi
lhoestq Jul 16, 2025
1bf5f66
minor
lhoestq Jul 16, 2025
aefb493
more comments
lhoestq Jul 16, 2025
31a3d97
uv run local_script.py
lhoestq Jul 17, 2025
f7c8be9
lucain's comments
lhoestq Jul 17, 2025
541aa6a
more lucain's comments
lhoestq Jul 17, 2025
251e719
Apply suggestions from code review
lhoestq Jul 21, 2025
97a856b
style
lhoestq Jul 21, 2025
1102968
minor
lhoestq Jul 21, 2025
99b538a
Remove JobUrl and add url in JobInfo directly
Wauplin Jul 22, 2025
53fb0aa
Apply suggestions from code review
lhoestq Jul 22, 2025
4e3523d
add namespace arg
lhoestq Jul 22, 2025
5db3b42
fix wrong job url
lhoestq Jul 22, 2025
76588ef
add missing methods at top level
lhoestq Jul 22, 2025
63dd90f
add docs
lhoestq Jul 22, 2025
bfd326a
uv script url as env, not secret
lhoestq Jul 22, 2025
c9ab2f1
rename docs
lhoestq Jul 22, 2025
cf59dca
update test
lhoestq Jul 22, 2025
da1d40d
again
lhoestq Jul 22, 2025
334d831
improve docs
lhoestq Jul 22, 2025
028d32a
Merge branch 'main' into jobs
Wauplin Jul 23, 2025
fed7195
add image arg to run_uv_job
lhoestq Jul 23, 2025
eaaa6a1
List flavors from SpaceHardware
Wauplin Jul 23, 2025
c7660d7
Merge branch 'jobs' of github.com:huggingface/huggingface_hub into jobs
Wauplin Jul 23, 2025
af0e9fb
add to overview
lhoestq Jul 23, 2025
e6043ae
remove zero GPU from flavors
Wauplin Jul 23, 2025
c444391
add JobInfo etc. from _jobs_api in top level __init__
lhoestq Jul 23, 2025
ea6579a
add package_reference doc page
lhoestq Jul 23, 2025
3f6a2f7
minor - link JobInfo in docs
lhoestq Jul 23, 2025
3e049db
JobInfo docstring
lhoestq Jul 23, 2025
141 changes: 141 additions & 0 deletions docs/source/en/guides/cli.md
@@ -604,3 +604,144 @@ Copy-and-paste the text below in your GitHub issue.
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
```

## huggingface-cli jobs

Run compute jobs on Hugging Face infrastructure with a familiar Docker-like interface.

`huggingface-cli jobs` is a command-line tool that lets you run anything on Hugging Face's infrastructure (including GPUs and TPUs!) with simple commands. Think `docker run`, but for running code on A100s.

```bash
# Directly run Python code
>>> huggingface-cli jobs run python:3.12 python -c "print('Hello from the cloud!')"

# Use GPUs without any setup
>>> huggingface-cli jobs run --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
... python -c "import torch; print(torch.cuda.get_device_name())"

# Run from Hugging Face Spaces
>>> huggingface-cli jobs run hf.co/spaces/lhoestq/duckdb duckdb -c "select 'hello world'"

# Run a Python script with `uv` (experimental)
>>> huggingface-cli jobs uv run my_script.py
```

### ✨ Key Features

- 🐳 **Docker-like CLI**: Familiar commands (`run`, `ps`, `logs`, `inspect`) to run and manage jobs
- 🔥 **Any Hardware**: From CPUs to A100 GPUs and TPU pods - switch with a simple flag
- 📦 **Run Anything**: Use Docker images, HF Spaces, or your custom containers
- 🔐 **Simple Auth**: Just use your HF token
- 📊 **Live Monitoring**: Stream logs in real-time, just like running locally
- 💰 **Pay-as-you-go**: Only pay for the seconds you use

### Quick Start

#### 1. Run your first job

```bash
# Run a simple Python script
>>> huggingface-cli jobs run python:3.12 python -c "print('Hello from HF compute!')"
```

This command runs the job and shows the logs. You can pass `--detach` to run the Job in the background and only print the Job ID.

#### 2. Check job status

```bash
# List your running jobs
>>> huggingface-cli jobs ps

# Inspect the status of a job
>>> huggingface-cli jobs inspect <job_id>

# View logs from a job
>>> huggingface-cli jobs logs <job_id>

# Cancel a job
>>> huggingface-cli jobs cancel <job_id>
```

#### 3. Run on GPU

You can also run jobs on GPUs or TPUs with the `--flavor` option. For example, to run a PyTorch job on an A10G GPU:

```bash
# Use an A10G GPU to check PyTorch CUDA
>>> huggingface-cli jobs run --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
... python -c "import torch; print(f'This code ran with the following GPU: {torch.cuda.get_device_name()}')"
```

Running this will show the following output:

```bash
This code ran with the following GPU: NVIDIA A10G
```

That's it! You're now running code on Hugging Face's infrastructure. For more detailed information, check out the [Quickstart Guide](docs/quickstart.md).

### Common Use Cases

- **Model Training**: Fine-tune or train models on GPUs (T4, A10G, A100) without managing infrastructure
- **Synthetic Data Generation**: Generate large-scale datasets using LLMs on powerful hardware
- **Data Processing**: Process massive datasets with high-CPU configurations for parallel workloads
- **Batch Inference**: Run offline inference on thousands of samples using optimized GPU setups
- **Experiments & Benchmarks**: Run ML experiments on consistent hardware for reproducible results
- **Development & Debugging**: Test GPU code without local CUDA setup

### Pass Environment Variables and Secrets

You can pass environment variables and secrets to your job using the following options:

```bash
# Pass environment variables
>>> huggingface-cli jobs run -e FOO=foo -e BAR=bar python:3.12 python -c "import os; print(os.environ['FOO'], os.environ['BAR'])"
```

```bash
# Pass an environment from a local .env file
>>> huggingface-cli jobs run --env-file .env python:3.12 python -c "import os; print(os.environ['FOO'], os.environ['BAR'])"
```

```bash
# Pass secrets - they will be encrypted server side
>>> huggingface-cli jobs run -s MY_SECRET=psswrd python:3.12 python -c "import os; print(os.environ['MY_SECRET'])"
```

```bash
# Pass secrets from a local .secrets.env file - they will be encrypted server side
>>> huggingface-cli jobs run --secret-env-file .secrets.env python:3.12 python -c "import os; print(os.environ['MY_SECRET'])"
```

### Hardware

Available `--flavor` options:

- CPU: `cpu-basic`, `cpu-upgrade`
- GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
- TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`

(updated in 03/25 from Hugging Face [suggested_hardware docs](https://huggingface.co/docs/hub/en/spaces-config-reference))

### UV Scripts (Experimental)

Run UV scripts (Python scripts with inline dependencies) on HF infrastructure:

```bash
# Run a UV script (creates temporary repo)
>>> huggingface-cli jobs uv run my_script.py

# Run with persistent repo
>>> huggingface-cli jobs uv run my_script.py --repo my-uv-scripts

# Run with GPU
>>> huggingface-cli jobs uv run ml_training.py --flavor t4-small

# Pass arguments to script
>>> huggingface-cli jobs uv run process.py input.csv output.parquet --repo data-scripts

# Run a script directly from a URL
>>> huggingface-cli jobs uv run https://huggingface.co/datasets/username/scripts/resolve/main/example.py
```

UV scripts are Python scripts that include their dependencies directly in the file using a special comment syntax. This makes them perfect for self-contained tasks that don't require complex project setups. Learn more about UV scripts in the [UV documentation](https://docs.astral.sh/uv/guides/scripts/).
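For illustration, a minimal UV script might look like this. The `# /// script` block is the inline metadata UV reads (PEP 723); the empty dependency list and the script body are placeholders, not taken from this PR:

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
# Real scripts would list packages above, e.g. dependencies = ["datasets"];
# this placeholder only uses the standard library so it runs anywhere.
import json
import platform

print(json.dumps({"python": platform.python_version(), "message": "hello from a uv script"}))
```

Saved as `example.py`, a script like this could be launched with `huggingface-cli jobs uv run example.py`.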
15 changes: 15 additions & 0 deletions src/huggingface_hub/__init__.py
@@ -165,6 +165,7 @@
"add_space_variable",
"auth_check",
"cancel_access_request",
"cancel_job",
"change_discussion_status",
"comment_discussion",
"create_branch",
@@ -194,6 +195,7 @@
"duplicate_space",
"edit_discussion_comment",
"enable_webhook",
"fetch_job_logs",
"file_exists",
"get_collection",
"get_dataset_tags",
@@ -210,11 +212,13 @@
"get_user_overview",
"get_webhook",
"grant_access",
"inspect_job",
"list_accepted_access_requests",
"list_collections",
"list_datasets",
"list_inference_catalog",
"list_inference_endpoints",
"list_jobs",
"list_lfs_files",
"list_liked_repos",
"list_models",
@@ -251,6 +255,7 @@
"resume_inference_endpoint",
"revision_exists",
"run_as_future",
"run_job",
"scale_to_zero_inference_endpoint",
"set_space_sleep_time",
"space_info",
@@ -792,6 +797,7 @@
"auth_switch",
"cached_assets_path",
"cancel_access_request",
"cancel_job",
"change_discussion_status",
"comment_discussion",
"configure_http_backend",
@@ -825,6 +831,7 @@
"enable_webhook",
"export_entries_as_dduf",
"export_folder_as_dduf",
"fetch_job_logs",
"file_exists",
"from_pretrained_fastai",
"from_pretrained_keras",
@@ -851,12 +858,14 @@
"grant_access",
"hf_hub_download",
"hf_hub_url",
"inspect_job",
"interpreter_login",
"list_accepted_access_requests",
"list_collections",
"list_datasets",
"list_inference_catalog",
"list_inference_endpoints",
"list_jobs",
"list_lfs_files",
"list_liked_repos",
"list_models",
@@ -907,6 +916,7 @@
"resume_inference_endpoint",
"revision_exists",
"run_as_future",
"run_job",
"save_pretrained_keras",
"save_torch_model",
"save_torch_state_dict",
@@ -1143,6 +1153,7 @@ def __dir__():
add_space_variable, # noqa: F401
auth_check, # noqa: F401
cancel_access_request, # noqa: F401
cancel_job, # noqa: F401
change_discussion_status, # noqa: F401
comment_discussion, # noqa: F401
create_branch, # noqa: F401
@@ -1172,6 +1183,7 @@ def __dir__():
duplicate_space, # noqa: F401
edit_discussion_comment, # noqa: F401
enable_webhook, # noqa: F401
fetch_job_logs, # noqa: F401
file_exists, # noqa: F401
get_collection, # noqa: F401
get_dataset_tags, # noqa: F401
@@ -1188,11 +1200,13 @@ def __dir__():
get_user_overview, # noqa: F401
get_webhook, # noqa: F401
grant_access, # noqa: F401
inspect_job, # noqa: F401
list_accepted_access_requests, # noqa: F401
list_collections, # noqa: F401
list_datasets, # noqa: F401
list_inference_catalog, # noqa: F401
list_inference_endpoints, # noqa: F401
list_jobs, # noqa: F401
list_lfs_files, # noqa: F401
list_liked_repos, # noqa: F401
list_models, # noqa: F401
@@ -1229,6 +1243,7 @@ def __dir__():
resume_inference_endpoint, # noqa: F401
revision_exists, # noqa: F401
run_as_future, # noqa: F401
run_job, # noqa: F401
scale_to_zero_inference_endpoint, # noqa: F401
set_space_sleep_time, # noqa: F401
space_info, # noqa: F401
126 changes: 126 additions & 0 deletions src/huggingface_hub/_jobs_api.py
@@ -0,0 +1,126 @@
# coding=utf-8
# Copyright 2019-present, the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Optional

from huggingface_hub import constants
from huggingface_hub._space_api import SpaceHardware
from huggingface_hub.utils._datetime import parse_datetime
from huggingface_hub.utils._http import fix_hf_endpoint_in_url


class JobStage(str, Enum):
    """
    Enumeration of the possible stages of a Job on the Hub.

    Values can be compared to strings:
    ```py
    assert JobStage.COMPLETED == "COMPLETED"
    ```

    Taken from https://github.com/huggingface/moon-landing/blob/main/server/job_types/JobInfo.ts#L61 (private url).
    """

    # Copied from moon-landing > server > lib > Job.ts
    COMPLETED = "COMPLETED"
    CANCELED = "CANCELED"
    ERROR = "ERROR"
    DELETED = "DELETED"
    RUNNING = "RUNNING"

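The `str, Enum` mixin is what makes the string comparison in the docstring above work; a minimal standalone sketch of the pattern (the `Stage` name and its two members are illustrative, not imported from `huggingface_hub`):

```python
from enum import Enum

# Minimal standalone sketch of the str/Enum mixin pattern used by JobStage.
class Stage(str, Enum):
    COMPLETED = "COMPLETED"
    RUNNING = "RUNNING"

# Because Stage inherits from str, members compare equal to plain strings,
# so raw API payload values can be checked without converting them first.
assert Stage.COMPLETED == "COMPLETED"
print(Stage("RUNNING") is Stage.RUNNING)
```

This is why callers can write `job.status.stage == "RUNNING"` without importing `JobStage`.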

class JobUrl(str):
    """Subclass of `str` describing a job URL on the Hub.

    `JobUrl` is returned by `HfApi.create_job`. It inherits from `str` for backward
    compatibility. At initialization, the URL is parsed to populate properties:
    - endpoint (`str`)
    - namespace (`Optional[str]`)
    - job_id (`str`)
    - url (`str`)

    Args:
        url (`Any`):
            String value of the job url.
        endpoint (`str`, *optional*):
            Endpoint of the Hub. Defaults to <https://huggingface.co>.

    Example:
    ```py
    >>> HfApi.run_job("ubuntu", ["echo", "hello"])
    JobUrl('https://huggingface.co/jobs/lhoestq/6877b757344d8f02f6001012', endpoint='https://huggingface.co', job_id='6877b757344d8f02f6001012')
    ```

    Raises:
        [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError)
            If URL cannot be parsed.
    """

    def __new__(cls, url: Any, endpoint: Optional[str] = None):
        url = fix_hf_endpoint_in_url(url, endpoint=endpoint)
        return super(JobUrl, cls).__new__(cls, url)

    def __init__(self, url: Any, endpoint: Optional[str] = None) -> None:
        super().__init__()
        # Parse URL
        self.endpoint = endpoint or constants.ENDPOINT
        namespace, job_id = url.split("/")[-2:]

        # Populate fields
        self.namespace = namespace
        self.job_id = job_id
        self.url = str(self)  # just in case it's needed

    def __repr__(self) -> str:
        return f"JobUrl('{self}', endpoint='{self.endpoint}', job_id='{self.job_id}')"

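The parsing in `JobUrl.__init__` relies only on the last two path segments of the job URL; a standalone sketch of that logic, using the sample URL from the docstring:

```python
# Standalone sketch of JobUrl's parsing: the namespace and job id are
# simply the last two path segments of the job URL on the Hub.
url = "https://huggingface.co/jobs/lhoestq/6877b757344d8f02f6001012"
namespace, job_id = url.split("/")[-2:]
print(namespace, job_id)  # lhoestq 6877b757344d8f02f6001012
```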

@dataclass
class JobStatus:
    stage: JobStage
    message: Optional[str]

    def __init__(self, **kwargs) -> None:
        self.stage = kwargs["stage"]
        self.message = kwargs.get("message")


@dataclass
class JobInfo:
    id: str
    created_at: Optional[datetime]
    docker_image: Optional[str]
    space_id: Optional[str]
    command: Optional[List[str]]
    arguments: Optional[List[str]]
    environment: Optional[Dict[str, Any]]
    secrets: Optional[Dict[str, Any]]
    flavor: Optional[SpaceHardware]
    status: Optional[JobStatus]

    def __init__(self, **kwargs) -> None:
        self.id = kwargs["id"]
        created_at = kwargs.get("createdAt") or kwargs.get("created_at")
        self.created_at = parse_datetime(created_at) if created_at else None
        self.docker_image = kwargs.get("dockerImage") or kwargs.get("docker_image")
        self.space_id = kwargs.get("spaceId") or kwargs.get("space_id")
        self.command = kwargs.get("command")
        self.arguments = kwargs.get("arguments")
        self.environment = kwargs.get("environment")
        self.secrets = kwargs.get("secrets")
        self.flavor = kwargs.get("flavor")
        # Only build a JobStatus when the payload actually contains one;
        # JobStatus requires a "stage" key, so an empty dict would raise.
        status = kwargs.get("status")
        self.status = JobStatus(**status) if isinstance(status, dict) else None
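`JobInfo.__init__` accepts both the Hub API's camelCase keys and snake_case equivalents. A standalone sketch of that key-fallback pattern (the `JobInfoSketch` name and the payload values are invented for illustration; note that `@dataclass` does not overwrite a user-defined `__init__`, which is why this works here too):

```python
from dataclasses import dataclass
from typing import Any, Optional

# Standalone sketch of JobInfo's key-fallback pattern: accept the Hub's
# camelCase payload keys, falling back to snake_case for local callers.
@dataclass
class JobInfoSketch:
    id: str
    docker_image: Optional[str]
    space_id: Optional[str]

    def __init__(self, **kwargs: Any) -> None:
        self.id = kwargs["id"]
        self.docker_image = kwargs.get("dockerImage") or kwargs.get("docker_image")
        self.space_id = kwargs.get("spaceId") or kwargs.get("space_id")

# Hypothetical API payload using camelCase keys:
info = JobInfoSketch(id="6877b757344d8f02f6001012", dockerImage="python:3.12")
print(info.docker_image)  # python:3.12
```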