
[Jobs] Add huggingface-cli jobs commands #3211


Merged
merged 42 commits on Jul 23, 2025

42 commits
4836c04
jobs
lhoestq Jul 10, 2025
682a789
style
lhoestq Jul 10, 2025
af05c27
docs
lhoestq Jul 10, 2025
3895c8e
mypy
lhoestq Jul 10, 2025
3661cb7
style
lhoestq Jul 10, 2025
13f17c8
minor
lhoestq Jul 10, 2025
5e99d64
remove hfjobs mentions
lhoestq Jul 10, 2025
7efe998
add huggingface-cli jobs uv commands
lhoestq Jul 11, 2025
ab8511e
add some uv options
lhoestq Jul 11, 2025
3c00292
add test
lhoestq Jul 11, 2025
3136ef4
fix for 3.8
lhoestq Jul 11, 2025
9fc3c78
Update src/huggingface_hub/commands/jobs/uv.py
davanstrien Jul 14, 2025
fd926b5
move to HfApi
lhoestq Jul 16, 2025
1bf5f66
minor
lhoestq Jul 16, 2025
aefb493
more comments
lhoestq Jul 16, 2025
31a3d97
uv run local_script.py
lhoestq Jul 17, 2025
f7c8be9
lucain's comments
lhoestq Jul 17, 2025
541aa6a
more lucain's comments
lhoestq Jul 17, 2025
251e719
Apply suggestions from code review
lhoestq Jul 21, 2025
97a856b
style
lhoestq Jul 21, 2025
1102968
minor
lhoestq Jul 21, 2025
99b538a
Remove JobUrl and add url in JobInfo directly
Wauplin Jul 22, 2025
53fb0aa
Apply suggestions from code review
lhoestq Jul 22, 2025
4e3523d
add namespace arg
lhoestq Jul 22, 2025
5db3b42
fix wrong job url
lhoestq Jul 22, 2025
76588ef
add missing methods at top level
lhoestq Jul 22, 2025
63dd90f
add docs
lhoestq Jul 22, 2025
bfd326a
uv script url as env, not secret
lhoestq Jul 22, 2025
c9ab2f1
rename docs
lhoestq Jul 22, 2025
cf59dca
update test
lhoestq Jul 22, 2025
da1d40d
again
lhoestq Jul 22, 2025
334d831
improve docs
lhoestq Jul 22, 2025
028d32a
Merge branch 'main' into jobs
Wauplin Jul 23, 2025
fed7195
add image arg to run_uv_job
lhoestq Jul 23, 2025
eaaa6a1
List flavors from SpaceHardware
Wauplin Jul 23, 2025
c7660d7
Merge branch 'jobs' of github.com:huggingface/huggingface_hub into jobs
Wauplin Jul 23, 2025
af0e9fb
add to overview
lhoestq Jul 23, 2025
e6043ae
remove zero GPU from flavors
Wauplin Jul 23, 2025
c444391
add JobInfo etc. from _jobs_api in top level __init__
lhoestq Jul 23, 2025
ea6579a
add package_reference doc page
lhoestq Jul 23, 2025
3f6a2f7
minor - link JobInfo in docs
lhoestq Jul 23, 2025
3e049db
JobInfo docstring
lhoestq Jul 23, 2025
4 changes: 4 additions & 0 deletions docs/source/en/_toctree.yml
@@ -40,6 +40,8 @@
title: Integrate a library
- local: guides/webhooks
title: Webhooks
- local: guides/jobs
title: Jobs
- title: 'Conceptual guides'
sections:
- local: concepts/git_vs_http
@@ -92,3 +94,5 @@
title: Strict dataclasses
- local: package_reference/oauth
title: OAuth
- local: package_reference/jobs
title: Jobs
144 changes: 144 additions & 0 deletions docs/source/en/guides/cli.md
@@ -604,3 +604,147 @@ Copy-and-paste the text below in your GitHub issue.
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
```

## huggingface-cli jobs

Run compute jobs on Hugging Face infrastructure with a familiar Docker-like interface.

`huggingface-cli jobs` is a command-line tool that lets you run anything on Hugging Face's infrastructure (including GPUs and TPUs!) with simple commands. Think `docker run`, but for running code on A100s.

```bash
# Directly run Python code
>>> huggingface-cli jobs run python:3.12 python -c "print('Hello from the cloud!')"

# Use GPUs without any setup
>>> huggingface-cli jobs run --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
... python -c "import torch; print(torch.cuda.get_device_name())"

# Run in an organization account
>>> huggingface-cli jobs run --namespace my-org-name python:3.12 python -c "print('Running in an org account')"

# Run from Hugging Face Spaces
>>> huggingface-cli jobs run hf.co/spaces/lhoestq/duckdb duckdb -c "select 'hello world'"

# Run a Python script with `uv` (experimental)
>>> huggingface-cli jobs uv run my_script.py
```

### ✨ Key Features

- 🐳 **Docker-like CLI**: Familiar commands (`run`, `ps`, `logs`, `inspect`) to run and manage jobs
- 🔥 **Any Hardware**: From CPUs to A100 GPUs and TPU pods - switch with a simple flag
- 📦 **Run Anything**: Use Docker images, HF Spaces, or your custom containers
- 🔐 **Simple Auth**: Just use your HF token
- 📊 **Live Monitoring**: Stream logs in real-time, just like running locally
- 💰 **Pay-as-you-go**: Only pay for the seconds you use

### Quick Start

#### 1. Run your first job

```bash
# Run a simple Python script
>>> huggingface-cli jobs run python:3.12 python -c "print('Hello from HF compute!')"
```

This command runs the job and shows the logs. You can pass `--detach` to run the Job in the background and only print the Job ID.

#### 2. Check job status

```bash
# List your running jobs
>>> huggingface-cli jobs ps

# Inspect the status of a job
>>> huggingface-cli jobs inspect <job_id>

# View logs from a job
>>> huggingface-cli jobs logs <job_id>

# Cancel a job
>>> huggingface-cli jobs cancel <job_id>
```

#### 3. Run on GPU

You can also run jobs on GPUs or TPUs with the `--flavor` option. For example, to run a PyTorch job on an A10G GPU:

```bash
# Use an A10G GPU to check PyTorch CUDA
>>> huggingface-cli jobs run --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
... python -c "import torch; print(f'This code ran with the following GPU: {torch.cuda.get_device_name()}')"
```

Running this will show the following output:

```bash
This code ran with the following GPU: NVIDIA A10G
```

That's it! You're now running code on Hugging Face's infrastructure.

### Common Use Cases

- **Model Training**: Fine-tune or train models on GPUs (T4, A10G, A100) without managing infrastructure
- **Synthetic Data Generation**: Generate large-scale datasets using LLMs on powerful hardware
- **Data Processing**: Process massive datasets with high-CPU configurations for parallel workloads
- **Batch Inference**: Run offline inference on thousands of samples using optimized GPU setups
- **Experiments & Benchmarks**: Run ML experiments on consistent hardware for reproducible results
- **Development & Debugging**: Test GPU code without local CUDA setup

### Pass Environment variables and Secrets

You can pass environment variables to your job with `-e`/`--env-file`, and secrets with `-s`/`--secrets-file`:

```bash
# Pass environment variables
>>> huggingface-cli jobs run -e FOO=foo -e BAR=bar python:3.12 python -c "import os; print(os.environ['FOO'], os.environ['BAR'])"
```

```bash
# Pass an environment from a local .env file
>>> huggingface-cli jobs run --env-file .env python:3.12 python -c "import os; print(os.environ['FOO'], os.environ['BAR'])"
```

```bash
# Pass secrets - they will be encrypted server side
>>> huggingface-cli jobs run -s MY_SECRET=psswrd python:3.12 python -c "import os; print(os.environ['MY_SECRET'])"
```

```bash
# Pass secrets from a local .env.secrets file - they will be encrypted server side
>>> huggingface-cli jobs run --secrets-file .env.secrets python:3.12 python -c "import os; print(os.environ['MY_SECRET'])"
```

### Hardware

Available `--flavor` options:

- CPU: `cpu-basic`, `cpu-upgrade`
- GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
- TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`

(updated in 07/2025 from Hugging Face [suggested_hardware docs](https://huggingface.co/docs/hub/en/spaces-config-reference))

### UV Scripts (Experimental)

Run UV scripts (Python scripts with inline dependencies) on HF infrastructure:

```bash
# Run a UV script (creates temporary repo)
>>> huggingface-cli jobs uv run my_script.py

# Run with persistent repo
>>> huggingface-cli jobs uv run my_script.py --repo my-uv-scripts

# Run with GPU
>>> huggingface-cli jobs uv run ml_training.py --flavor t4-small

# Pass arguments to script
>>> huggingface-cli jobs uv run process.py input.csv output.parquet --repo data-scripts

# Run a script directly from a URL
>>> huggingface-cli jobs uv run https://huggingface.co/datasets/username/scripts/resolve/main/example.py
```

UV scripts are Python scripts that include their dependencies directly in the file using a special comment syntax. This makes them perfect for self-contained tasks that don't require complex project setups. Learn more about UV scripts in the [UV documentation](https://docs.astral.sh/uv/guides/scripts/).
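
For example, a minimal self-contained UV script could look like this (an illustrative sketch; the `# /// script` header is the inline-metadata syntax described in the UV docs, and the queried endpoint is just an example):

```python
# /// script
# dependencies = [
#     "requests",
# ]
# ///
import requests

# List a few models from the Hugging Face Hub API
response = requests.get("https://huggingface.co/api/models?limit=5")
for model in response.json():
    print(model["id"])
```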
220 changes: 220 additions & 0 deletions docs/source/en/guides/jobs.md
@@ -0,0 +1,220 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Run and manage Jobs

The Hugging Face Hub provides compute for AI and data workflows via Jobs.
A job runs on Hugging Face infrastructure and is defined by a command to run (e.g. a Python command), a Docker image from Hugging Face Spaces or Docker Hub, and a hardware flavor (CPU, GPU, TPU). This guide will show you how to interact with Jobs on the Hub, in particular:

- Run a job.
- Check job status.
- Select the hardware.
- Configure environment variables and secrets.
- Run UV scripts.

If you want to run and manage a job on the Hub, your machine must be logged in. If you are not, please refer to
[this section](../quick-start#authentication). In the rest of this guide, we will assume that your machine is logged in.
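
If needed, you can also log in directly from Python (a quick sketch using [`login`]):

```python
>>> from huggingface_hub import login
>>> login()  # prompts for a token; you can also pass token="hf_..." explicitly
```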

## Run a Job

Run compute Jobs defined with a command and a Docker Image on Hugging Face infrastructure (including GPUs and TPUs).

You can only manage Jobs that you own (under your username namespace) or from organizations in which you have write permissions.
This feature is pay-as-you-go: you only pay for the seconds you use.

[`run_job`] lets you run any command on Hugging Face's infrastructure:

```python
# Directly run Python code
>>> from huggingface_hub import run_job
>>> run_job(
... image="python:3.12",
... command=["python", "-c", "print('Hello from the cloud!')"],
... )

# Use GPUs without any setup
>>> run_job(
... image="pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
... command=["python", "-c", "import torch; print(torch.cuda.get_device_name())"],
... flavor="a10g-small",
... )

# Run in an organization account
>>> run_job(
... image="python:3.12",
... command=["python", "-c", "print('Running in an org account')"],
... namespace="my-org-name",
... )

# Run from Hugging Face Spaces
>>> run_job(
... image="hf.co/spaces/lhoestq/duckdb",
... command=["duckdb", "-c", "select 'hello world'"],
... )

# Run a Python script with `uv` (experimental)
>>> from huggingface_hub import run_uv_job
>>> run_uv_job("my_script.py")
```

<Tip>

Use [huggingface-cli jobs](./cli#huggingface-cli-jobs) to run jobs in the command line.

</Tip>

[`run_job`] returns a [`JobInfo`] object that includes the URL of the Job on Hugging Face, where you can check the Job status and the logs.
Save the Job ID from [`JobInfo`] to manage the job:

```python
>>> from huggingface_hub import run_job
>>> job = run_job(
... image="python:3.12",
... command=["python", "-c", "print('Hello from the cloud!')"]
... )
>>> job.url
'https://huggingface.co/jobs/lhoestq/687f911eaea852de79c4a50a'
>>> job.id
'687f911eaea852de79c4a50a'
```

Jobs run in the background. The next section shows how to use [`inspect_job`] to check a job's status and [`fetch_job_logs`] to view its logs.

## Check Job status

```python
# List your jobs
>>> from huggingface_hub import list_jobs
>>> jobs = list_jobs()
>>> jobs[0]
JobInfo(id='687f911eaea852de79c4a50a', created_at=datetime.datetime(2025, 7, 22, 13, 24, 46, 909000, tzinfo=datetime.timezone.utc), docker_image='python:3.12', space_id=None, command=['python', '-c', "print('Hello from the cloud!')"], arguments=[], environment={}, secrets={}, flavor='cpu-basic', status=JobStatus(stage='COMPLETED', message=None), owner=JobOwner(id='5e9ecfc04957053f60648a3e', name='lhoestq'), endpoint='https://huggingface.co', url='https://huggingface.co/jobs/lhoestq/687f911eaea852de79c4a50a')

# List your running jobs
>>> running_jobs = [job for job in list_jobs() if job.status.stage == "RUNNING"]

# Inspect the status of a job
>>> from huggingface_hub import inspect_job
>>> job_id = jobs[0].id
>>> inspect_job(job_id=job_id)
JobInfo(id='687f911eaea852de79c4a50a', created_at=datetime.datetime(2025, 7, 22, 13, 24, 46, 909000, tzinfo=datetime.timezone.utc), docker_image='python:3.12', space_id=None, command=['python', '-c', "print('Hello from the cloud!')"], arguments=[], environment={}, secrets={}, flavor='cpu-basic', status=JobStatus(stage='COMPLETED', message=None), owner=JobOwner(id='5e9ecfc04957053f60648a3e', name='lhoestq'), endpoint='https://huggingface.co', url='https://huggingface.co/jobs/lhoestq/687f911eaea852de79c4a50a')

# View logs from a job
>>> from huggingface_hub import fetch_job_logs
>>> for log in fetch_job_logs(job_id=job_id):
... print(log)
Hello from the cloud!

# Cancel a job
>>> from huggingface_hub import cancel_job
>>> cancel_job(job_id=job_id)
```
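
These functions compose well for bulk operations. For example, a sketch that cancels every one of your jobs still running:

```python
>>> from huggingface_hub import cancel_job, list_jobs
>>> for job in list_jobs():
...     if job.status.stage == "RUNNING":
...         cancel_job(job_id=job.id)
```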

To know when multiple jobs have all finished, check their status in a loop using [`inspect_job`]:

```python
# Run multiple jobs in parallel and wait for their completions
>>> import time
>>> from huggingface_hub import inspect_job, run_job
>>> image = "python:3.12"
>>> commands = [["python", "-c", f"print('job {i}')"] for i in range(3)]  # example commands
>>> jobs = [run_job(image=image, command=command) for command in commands]
>>> for job in jobs:
...     while inspect_job(job_id=job.id).status.stage not in ("COMPLETED", "ERROR"):
...         time.sleep(10)
```
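
The polling loop can also be factored into a small helper. A minimal sketch (`wait_for_job` is illustrative, not part of `huggingface_hub`):

```python
import time

from huggingface_hub import inspect_job


def wait_for_job(job_id: str, poll_interval: float = 10.0) -> str:
    """Block until the job reaches a terminal stage and return that stage."""
    while True:
        stage = inspect_job(job_id=job_id).status.stage
        if stage in ("COMPLETED", "ERROR"):
            return stage
        time.sleep(poll_interval)
```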

## Select the hardware

There are numerous cases where running Jobs on GPUs is useful:

- **Model Training**: Fine-tune or train models on GPUs (T4, A10G, A100) without managing infrastructure
- **Synthetic Data Generation**: Generate large-scale datasets using LLMs on powerful hardware
- **Data Processing**: Process massive datasets with high-CPU configurations for parallel workloads
- **Batch Inference**: Run offline inference on thousands of samples using optimized GPU setups
- **Experiments & Benchmarks**: Run ML experiments on consistent hardware for reproducible results
- **Development & Debugging**: Test GPU code without local CUDA setup

Run jobs on GPUs or TPUs with the `flavor` argument. For example, to run a PyTorch job on an A10G GPU:

```python
# Use an A10G GPU to check PyTorch CUDA
>>> from huggingface_hub import run_job
>>> run_job(
... image="pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
... command=["python", "-c", "import torch; print(f'This code ran with the following GPU: {torch.cuda.get_device_name()}')"],
... flavor="a10g-small",
... )
```

Running this job prints the following output in its logs:

```bash
This code ran with the following GPU: NVIDIA A10G
```
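
To retrieve that output programmatically, fetch the job logs. A sketch, assuming the [`JobInfo`] returned by the [`run_job`] call above was saved as `job`:

```python
>>> from huggingface_hub import fetch_job_logs
>>> for log in fetch_job_logs(job_id=job.id):
...     print(log)
This code ran with the following GPU: NVIDIA A10G
```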

Use this to run a fine-tuning script like [trl/scripts/sft.py](https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py) with UV:

```python
>>> from huggingface_hub import run_uv_job
>>> run_uv_job(
... "sft.py",
... script_args=["--model_name_or_path", "Qwen/Qwen2-0.5B", ...],
... dependencies=["trl"],
... env={"HF_TOKEN": ...},
... flavor="a10g-small",
... )
```

Available `flavor` options:

- CPU: `cpu-basic`, `cpu-upgrade`
- GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
- TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`

(updated in 07/2025 from Hugging Face [suggested_hardware docs](https://huggingface.co/docs/hub/en/spaces-config-reference))

That's it! You're now running code on Hugging Face's infrastructure.

## Pass Environment variables and Secrets

You can pass environment variables and secrets to your job using the `env` and `secrets` arguments:

```python
# Pass environment variables
>>> from huggingface_hub import run_job
>>> run_job(
... image="python:3.12",
... command=["python", "-c", "import os; print(os.environ['FOO'], os.environ['BAR'])"],
... env={"FOO": "foo", "BAR": "bar"},
... )
```


```python
# Pass secrets - they will be encrypted server side
>>> from huggingface_hub import run_job
>>> run_job(
... image="python:3.12",
... command=["python", "-c", "import os; print(os.environ['MY_SECRET'])"],
... secrets={"MY_SECRET": "psswrd"},
... )
```
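
Both arguments can be combined in a single call: `env` for non-sensitive configuration, `secrets` for credentials (the variable names below are illustrative):

```python
>>> from huggingface_hub import run_job
>>> run_job(
...     image="python:3.12",
...     command=["python", "-c", "import os; print(os.environ['MODE'], 'API_KEY' in os.environ)"],
...     env={"MODE": "production"},
...     secrets={"API_KEY": "psswrd"},
... )
```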


### UV Scripts (Experimental)

Run UV scripts (Python scripts with inline dependencies) on HF infrastructure:

```python
# Run a UV script (creates temporary repo)
>>> from huggingface_hub import run_uv_job
>>> run_uv_job("my_script.py")

# Run with GPU
>>> run_uv_job("ml_training.py", flavor="gpu-t4-small")

# Run with dependencies
>>> run_uv_job("inference.py", dependencies=["transformers", "torch"])

# Run a script directly from a URL
>>> run_uv_job("https://huggingface.co/datasets/username/scripts/resolve/main/example.py")
```

UV scripts are Python scripts that include their dependencies directly in the file using a special comment syntax. This makes them perfect for self-contained tasks that don't require complex project setups. Learn more about UV scripts in the [UV documentation](https://docs.astral.sh/uv/guides/scripts/).
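
As an illustration, a script like the following can be passed directly to [`run_uv_job`] (a sketch; the `# /// script` header declares the inline dependencies):

```python
# /// script
# dependencies = [
#     "numpy",
# ]
# ///
import numpy as np

# The job installs numpy from the inline metadata before running the script
print(np.random.rand(2, 2))
```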
9 changes: 9 additions & 0 deletions docs/source/en/guides/overview.md
@@ -127,5 +127,14 @@ Take a look at these guides to learn how to use huggingface_hub to solve real-world problems:
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./jobs">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Jobs
</div><p class="text-gray-700">
How to run and manage compute Jobs on Hugging Face infrastructure and select the hardware?
</p>
</a>

</div>
</div>