[Jobs] Add huggingface-cli jobs commands #3211

Open · wants to merge 32 commits into main

141 changes: 141 additions & 0 deletions docs/source/en/guides/cli.md
@@ -604,3 +604,144 @@ Copy-and-paste the text below in your GitHub issue.
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
```

## huggingface-cli jobs

Run compute jobs on Hugging Face infrastructure with a familiar Docker-like interface.

`huggingface-cli jobs` is a command-line tool that lets you run anything on Hugging Face's infrastructure (including GPUs and TPUs!) with simple commands. Think `docker run`, but for running code on A100s.

```bash
# Directly run Python code
>>> huggingface-cli jobs run python:3.12 python -c "print('Hello from the cloud!')"

# Use GPUs without any setup
>>> huggingface-cli jobs run --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
... python -c "import torch; print(torch.cuda.get_device_name())"

# Run from Hugging Face Spaces
>>> huggingface-cli jobs run hf.co/spaces/lhoestq/duckdb duckdb -c "select 'hello world'"

# Run a Python script with `uv` (experimental)
>>> huggingface-cli jobs uv run my_script.py
```

### ✨ Key Features

- 🐳 **Docker-like CLI**: Familiar commands (`run`, `ps`, `logs`, `inspect`) to run and manage jobs
- 🔥 **Any Hardware**: From CPUs to A100 GPUs and TPU pods - switch with a simple flag
- 📦 **Run Anything**: Use Docker images, HF Spaces, or your custom containers
- 🔐 **Simple Auth**: Just use your HF token
- 📊 **Live Monitoring**: Stream logs in real-time, just like running locally
- 💰 **Pay-as-you-go**: Only pay for the seconds you use

### Quick Start

#### 1. Run your first job

```bash
# Run a simple Python script
>>> huggingface-cli jobs run python:3.12 python -c "print('Hello from HF compute!')"
```

This command runs the job and shows the logs. You can pass `--detach` to run the Job in the background and only print the Job ID.
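For example (the printed ID below is illustrative):

```bash
# Run in the background and print only the Job ID
>>> huggingface-cli jobs run --detach python:3.12 python -c "print('Hello!')"
6877b757344d8f02f6001012
```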

#### 2. Check job status

```bash
# List your running jobs
>>> huggingface-cli jobs ps

# Inspect the status of a job
>>> huggingface-cli jobs inspect <job_id>

# View logs from a job
>>> huggingface-cli jobs logs <job_id>

# Cancel a job
>>> huggingface-cli jobs cancel <job_id>
```
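These CLI commands map onto Python helpers that this PR also exports (`run_job`, `list_jobs`, `inspect_job`, `fetch_job_logs`, `cancel_job`). A minimal sketch, assuming the top-level helpers mirror the `HfApi.run_job(image, command)` call from the `JobUrl` docstring; the exact signatures may differ:

```python
# Sketch only: signatures are assumptions based on the docstring example
# HfApi.run_job("ubuntu", ["echo", "hello"]).
from huggingface_hub import cancel_job, fetch_job_logs, inspect_job, list_jobs, run_job

job_url = run_job("python:3.12", ["python", "-c", "print('hello')"])
print(job_url.job_id)  # JobUrl parses the job id out of the returned URL

print(list_jobs())                  # assumed: lists your jobs as JobInfo objects
print(inspect_job(job_url.job_id))  # assumed: returns a JobInfo for one job
fetch_job_logs(job_url.job_id)      # assumed: streams the job's logs
cancel_job(job_url.job_id)          # assumed: cancels the running job
```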

#### 3. Run on GPU

You can also run jobs on GPUs or TPUs with the `--flavor` option. For example, to run a PyTorch job on an A10G GPU:

```bash
# Use an A10G GPU to check PyTorch CUDA
>>> huggingface-cli jobs run --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
... python -c "import torch; print(f'This code ran with the following GPU: {torch.cuda.get_device_name()}')"
```

Running this will show the following output!

```bash
This code ran with the following GPU: NVIDIA A10G
```

That's it! You're now running code on Hugging Face's infrastructure. For more detailed information, check out the [Quickstart Guide](docs/quickstart.md).

### Common Use Cases

- **Model Training**: Fine-tune or train models on GPUs (T4, A10G, A100) without managing infrastructure
- **Synthetic Data Generation**: Generate large-scale datasets using LLMs on powerful hardware
- **Data Processing**: Process massive datasets with high-CPU configurations for parallel workloads
- **Batch Inference**: Run offline inference on thousands of samples using optimized GPU setups
- **Experiments & Benchmarks**: Run ML experiments on consistent hardware for reproducible results
- **Development & Debugging**: Test GPU code without local CUDA setup

### Pass Environment Variables and Secrets

You can pass environment variables to your job with the `-e` option:

```bash
# Pass environment variables
>>> huggingface-cli jobs run -e FOO=foo -e BAR=bar python:3.12 python -c "import os; print(os.environ['FOO'], os.environ['BAR'])"
```

```bash
# Pass environment variables from a local .env file
>>> huggingface-cli jobs run --env-file .env python:3.12 python -c "import os; print(os.environ['FOO'], os.environ['BAR'])"
```
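Here, `.env` is a standard dotenv file with one `KEY=value` pair per line:

```bash
# .env
FOO=foo
BAR=bar
```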

```bash
# Pass secrets - they will be encrypted server side
>>> huggingface-cli jobs run -s MY_SECRET=psswrd python:3.12 python -c "import os; print(os.environ['MY_SECRET'])"
```

```bash
# Pass secrets from a local .env.secrets file - they will be encrypted server side
>>> huggingface-cli jobs run --secrets-file .env.secrets python:3.12 python -c "import os; print(os.environ['MY_SECRET'])"
```

### Hardware

Available `--flavor` options:

- CPU: `cpu-basic`, `cpu-upgrade`
- GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
- TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`

(updated March 2025 from the Hugging Face [suggested_hardware docs](https://huggingface.co/docs/hub/en/spaces-config-reference))
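For example, to run a CPU-heavy job on the upgraded CPU flavor:

```bash
# Pick a flavor from the list above
>>> huggingface-cli jobs run --flavor cpu-upgrade python:3.12 python -c "import os; print(os.cpu_count())"
```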

### UV Scripts (Experimental)

Run UV scripts (Python scripts with inline dependencies) on HF infrastructure:

```bash
# Run a UV script (creates temporary repo)
>>> huggingface-cli jobs uv run my_script.py

# Run with persistent repo
>>> huggingface-cli jobs uv run my_script.py --repo my-uv-scripts

# Run with GPU
>>> huggingface-cli jobs uv run ml_training.py --flavor t4-small

# Pass arguments to script
>>> huggingface-cli jobs uv run process.py input.csv output.parquet --repo data-scripts

# Run a script directly from a URL
>>> huggingface-cli jobs uv run https://huggingface.co/datasets/username/scripts/resolve/main/example.py
```

UV scripts are Python scripts that include their dependencies directly in the file using a special comment syntax. This makes them perfect for self-contained tasks that don't require complex project setups. Learn more about UV scripts in the [UV documentation](https://docs.astral.sh/uv/guides/scripts/).
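For reference, a minimal UV script declares its dependencies in a PEP 723 comment block at the top of the file:

```python
# /// script
# dependencies = [
#     "pandas",
# ]
# ///
# uv reads the block above and installs the dependencies before running
import pandas as pd

print(pd.DataFrame({"a": [1, 2, 3]}).describe())
```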
15 changes: 15 additions & 0 deletions src/huggingface_hub/__init__.py
@@ -165,6 +165,7 @@
"add_space_variable",
"auth_check",
"cancel_access_request",
"cancel_job",
"change_discussion_status",
"comment_discussion",
"create_branch",
@@ -194,6 +195,7 @@
"duplicate_space",
"edit_discussion_comment",
"enable_webhook",
"fetch_job_logs",
"file_exists",
"get_collection",
"get_dataset_tags",
@@ -210,11 +212,13 @@
"get_user_overview",
"get_webhook",
"grant_access",
"inspect_job",
"list_accepted_access_requests",
"list_collections",
"list_datasets",
"list_inference_catalog",
"list_inference_endpoints",
"list_jobs",
"list_lfs_files",
"list_liked_repos",
"list_models",
@@ -251,6 +255,7 @@
"resume_inference_endpoint",
"revision_exists",
"run_as_future",
"run_job",
"scale_to_zero_inference_endpoint",
"set_space_sleep_time",
"space_info",
@@ -792,6 +797,7 @@
"auth_switch",
"cached_assets_path",
"cancel_access_request",
"cancel_job",
"change_discussion_status",
"comment_discussion",
"configure_http_backend",
@@ -825,6 +831,7 @@
"enable_webhook",
"export_entries_as_dduf",
"export_folder_as_dduf",
"fetch_job_logs",
"file_exists",
"from_pretrained_fastai",
"from_pretrained_keras",
@@ -851,12 +858,14 @@
"grant_access",
"hf_hub_download",
"hf_hub_url",
"inspect_job",
"interpreter_login",
"list_accepted_access_requests",
"list_collections",
"list_datasets",
"list_inference_catalog",
"list_inference_endpoints",
"list_jobs",
"list_lfs_files",
"list_liked_repos",
"list_models",
@@ -907,6 +916,7 @@
"resume_inference_endpoint",
"revision_exists",
"run_as_future",
"run_job",
"save_pretrained_keras",
"save_torch_model",
"save_torch_state_dict",
@@ -1143,6 +1153,7 @@ def __dir__():
add_space_variable, # noqa: F401
auth_check, # noqa: F401
cancel_access_request, # noqa: F401
cancel_job, # noqa: F401
change_discussion_status, # noqa: F401
comment_discussion, # noqa: F401
create_branch, # noqa: F401
@@ -1172,6 +1183,7 @@ def __dir__():
duplicate_space, # noqa: F401
edit_discussion_comment, # noqa: F401
enable_webhook, # noqa: F401
fetch_job_logs, # noqa: F401
file_exists, # noqa: F401
get_collection, # noqa: F401
get_dataset_tags, # noqa: F401
@@ -1188,11 +1200,13 @@ def __dir__():
get_user_overview, # noqa: F401
get_webhook, # noqa: F401
grant_access, # noqa: F401
inspect_job, # noqa: F401
list_accepted_access_requests, # noqa: F401
list_collections, # noqa: F401
list_datasets, # noqa: F401
list_inference_catalog, # noqa: F401
list_inference_endpoints, # noqa: F401
list_jobs, # noqa: F401
list_lfs_files, # noqa: F401
list_liked_repos, # noqa: F401
list_models, # noqa: F401
@@ -1229,6 +1243,7 @@ def __dir__():
resume_inference_endpoint, # noqa: F401
revision_exists, # noqa: F401
run_as_future, # noqa: F401
run_job, # noqa: F401
scale_to_zero_inference_endpoint, # noqa: F401
set_space_sleep_time, # noqa: F401
space_info, # noqa: F401
126 changes: 126 additions & 0 deletions src/huggingface_hub/_jobs_api.py
@@ -0,0 +1,126 @@
# coding=utf-8
# Copyright 2019-present, the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Optional

from huggingface_hub import constants
from huggingface_hub._space_api import SpaceHardware
from huggingface_hub.utils._datetime import parse_datetime
from huggingface_hub.utils._http import fix_hf_endpoint_in_url


class JobStage(str, Enum):
"""
Enumeration of the possible stages of a Job on the Hub.

Value can be compared to a string:
```py
assert JobStage.COMPLETED == "COMPLETED"
```

Taken from https://github.com/huggingface/moon-landing/blob/main/server/job_types/JobInfo.ts#L61 (private url).
"""

# Copied from moon-landing > server > lib > Job.ts
COMPLETED = "COMPLETED"
CANCELED = "CANCELED"
ERROR = "ERROR"
DELETED = "DELETED"
RUNNING = "RUNNING"


class JobUrl(str):
"""Subclass of `str` describing a job URL on the Hub.

`JobUrl` is returned by `HfApi.run_job`. It inherits from `str` for backward
compatibility. At initialization, the URL is parsed to populate properties:
- endpoint (`str`)
- namespace (`Optional[str]`)
- job_id (`str`)
- url (`str`)

Args:
url (`Any`):
String value of the job url.
endpoint (`str`, *optional*):
Endpoint of the Hub. Defaults to <https://huggingface.co>.

Example:
```py
>>> HfApi.run_job("ubuntu", ["echo", "hello"])
JobUrl('https://huggingface.co/jobs/lhoestq/6877b757344d8f02f6001012', endpoint='https://huggingface.co', job_id='6877b757344d8f02f6001012')
```

Raises:
[`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError)
If URL cannot be parsed.
"""

def __new__(cls, url: Any, endpoint: Optional[str] = None):
url = fix_hf_endpoint_in_url(url, endpoint=endpoint)
return super(JobUrl, cls).__new__(cls, url)

def __init__(self, url: Any, endpoint: Optional[str] = None) -> None:
super().__init__()
# Parse URL
self.endpoint = endpoint or constants.ENDPOINT
namespace, job_id = url.split("/")[-2:]

# Populate fields
self.namespace = namespace
self.job_id = job_id
self.url = str(self) # just in case it's needed

def __repr__(self) -> str:
return f"JobUrl('{self}', endpoint='{self.endpoint}', job_id='{self.job_id}')"


@dataclass
class JobStatus:
stage: JobStage
message: Optional[str]

def __init__(self, **kwargs) -> None:
self.stage = kwargs["stage"]
self.message = kwargs.get("message")


@dataclass
class JobInfo:
id: str
created_at: Optional[datetime]
docker_image: Optional[str]
space_id: Optional[str]
command: Optional[List[str]]
arguments: Optional[List[str]]
environment: Optional[Dict[str, Any]]
secrets: Optional[Dict[str, Any]]
flavor: Optional[SpaceHardware]
status: Optional[JobStatus]

def __init__(self, **kwargs) -> None:
self.id = kwargs["id"]
created_at = kwargs.get("createdAt") or kwargs.get("created_at")
self.created_at = parse_datetime(created_at) if created_at else None
self.docker_image = kwargs.get("dockerImage") or kwargs.get("docker_image")
self.space_id = kwargs.get("spaceId") or kwargs.get("space_id")
self.command = kwargs.get("command")
self.arguments = kwargs.get("arguments")
self.environment = kwargs.get("environment")
self.secrets = kwargs.get("secrets")
self.flavor = kwargs.get("flavor")
        status = kwargs.get("status")
        self.status = JobStatus(**status) if isinstance(status, dict) else None
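
# --- Illustrative usage (not part of this module): how `JobInfo` parses a raw
# API payload. The camelCase keys mirror those handled in `JobInfo.__init__`;
# all values below are made up for demonstration.
if __name__ == "__main__":
    payload = {
        "id": "6877b757344d8f02f6001012",
        "createdAt": "2025-07-16T12:00:00.000Z",
        "dockerImage": "python:3.12",
        "spaceId": None,
        "command": ["python", "-c", "print('hi')"],
        "arguments": [],
        "environment": {},
        "secrets": {},
        "flavor": "cpu-basic",
        "status": {"stage": "RUNNING", "message": None},
    }
    job = JobInfo(**payload)
    assert job.status.stage == JobStage.RUNNING
    assert job.created_at is not None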