
Commit 9fb8835

Authored by julien-c, Wauplin, and Vaibhavs10
Many cleanups/fixes/improvements (#1837)
* missing word
* billing: this is not the case anymore
* only 2 are listed?
* not true anymore either 😱
* mention the current focus of hf-inference
* mention org settings here too
  TODO: add a thumbnail of team usage
* minor tweaks
* better more precise name
* link to @SBrandeis's table
* Update docs/inference-providers/guides/function-calling.md
  Co-authored-by: Lucain <lucain@huggingface.co>
* Update docs/inference-providers/guides/function-calling.md
  Co-authored-by: vb <vaibhavs10@gmail.com>

---------

Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: vb <vaibhavs10@gmail.com>
1 parent 0cb9a5c commit 9fb8835

5 files changed (+18, -9 lines changed)

docs/inference-providers/guides/function-calling.md

Lines changed: 4 additions & 4 deletions
@@ -37,7 +37,7 @@ client = OpenAI(
 </hfoption>
 <hfoption id="huggingface_hub">
 
-In the Hugging Face Hub client, we'll use the `provider` parameter to specify the provider we want to use for the request.
+In the Hugging Face Hub client, we'll use the `provider` parameter to specify the provider we want to use for the request. By default, it is `"auto"`.
 
 ```python
 import json
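For context on the updated line, a minimal sketch of how the `provider` parameter is passed to `huggingface_hub.InferenceClient`; the model id and token handling below are illustrative, not part of the diff:

```python
import os

from huggingface_hub import InferenceClient

# provider="auto" (the default) lets Hugging Face route the request to an available provider
client = InferenceClient(
    provider="auto",
    api_key=os.environ["HF_TOKEN"],  # a Hugging Face access token
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model id
    messages=[{"role": "user", "content": "What's the weather like in Paris?"}],
)
print(response.choices[0].message.content)
```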
@@ -400,7 +400,7 @@ Here, we're forcing the model to call the `get_current_weather` function, and no
 
 <Tip warning={true}>
 
-Currently, Hugging Face Hub does not support the `tool_choice` parameters that specify which function to call.
+Currently, `huggingface_hub.InferenceClient` does not support the `tool_choice` parameters that specify which function to call.
 
 </Tip>
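Since the tip above only concerns `huggingface_hub.InferenceClient`, here is a rough sketch of forcing a specific function via `tool_choice` with the OpenAI-compatible client instead; the base URL, model id, and abbreviated tool schema are assumptions for illustration, not part of this commit:

```python
import os

from openai import OpenAI

# Assumed OpenAI-compatible router endpoint; adapt to the setup used earlier in the guide
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model id
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    # Force a call to get_current_weather rather than a free-form answer
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)
print(response.choices[0].message.tool_calls)
```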

@@ -471,13 +471,13 @@ Streaming allows you to process responses as they arrive, show real-time progres
 
 <Tip warning={true}>
 
-Streaming is not supported by all providers. You can check the provider's documentation to see if it supports streaming.
+Streaming is not supported by all providers. You can check the provider's documentation to see if it supports streaming, or you can refer to this [dynamic model compatibility table](https://huggingface.co/inference-providers/models).
 
 </Tip>
 
 ## Next Steps
 
-Now that you've seen how to use function calling with Inference Providers, you can start building your own assistants! Why not try out some of these ideas:
+Now that you've seen how to use function calling with Inference Providers, you can start building your own agents and assistants! Why not try out some of these ideas:
 
 - Try smaller models for faster responses and lower costs
 - Build an agent that can fetch real-time data
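Relating to the streaming tip changed above, a short sketch of streaming a chat completion with `huggingface_hub.InferenceClient`; the model id is illustrative, and whether streaming works depends on the chosen provider, as the tip notes:

```python
import os

from huggingface_hub import InferenceClient

client = InferenceClient(provider="auto", api_key=os.environ["HF_TOKEN"])

# stream=True yields chunks as the provider generates them (if it supports streaming)
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model id
    messages=[{"role": "user", "content": "Summarize function calling in one sentence."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
print()
```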

docs/inference-providers/hub-integration.md

Lines changed: 9 additions & 0 deletions
@@ -65,3 +65,12 @@ In your user account settings, you are able to:
 <img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/provider-list-light.png"/>
 <img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/provider-list-dark.png"/>
 </div>
+
+## Organization Settings
+
+Similar settings can be found in your Organization settings. Additionally, you can see a graph of your team members' usage over time, which is helpful for centralizing usage billing at the team level.
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/enterprise-org-settings-light.png"/>
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/enterprise-org-settings-dark.png"/>
+</div>

docs/inference-providers/index.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 <img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/Inference-providers-banner-dark.png"/>
 </div>
 
-Hugging Face’s Inference Providers give developers access to hundreds of machine learning models, powered by world-class inference providers. They are also integrated into our client SDKs (for JS and Python), making it easy to explore serverless inference of models your favorite providers.
+Hugging Face’s Inference Providers give developers access to hundreds of machine learning models, powered by world-class inference providers. They are also integrated into our client SDKs (for JS and Python), making it easy to explore serverless inference of models on your favorite providers.
 
 ## Partners

docs/inference-providers/pricing.md

Lines changed: 2 additions & 4 deletions
@@ -44,8 +44,6 @@ See the [Organization Billing section](#organization-billing) below for more det
 **PRO users and Enterprise Hub organizations** can continue using the API after exhausting their monthly credits. This ensures uninterrupted access to models for production workloads.
 
 
-If you have remaining credits, we estimate costs for providers that aren’t fully integrated with our billing system. These estimates are usually higher than the actual cost to prevent abuse, which is why PAYG is currently disabled for those providers.
-
 <Tip>
 
 Hugging Face charges you the same rates as the provider, with no additional fees. We just pass through the provider costs directly.
@@ -56,7 +54,7 @@ You can track your spending anytime on your [billing page](https://huggingface.c
 
 ## Hugging Face Billing vs Custom Provider Key (Detailed Comparison)
 
-The documentation above assumes you are making routed requests to external providers. In practice, there are 3 different ways to run inference, each with unique billing implications:
+The documentation above assumes you are making routed requests to external providers. In practice, there are 2 different ways to run inference, each with unique billing implications:
 
 - **Hugging Face Routed Requests**: This is the default method for using Inference Providers. Simply use the JavaScript or Python `InferenceClient`, or make raw HTTP requests with your Hugging Face User Access Token. Your request is automatically routed through Hugging Face to the provider's platform. No separate provider account is required, and billing is managed directly by Hugging Face. This approach lets you seamlessly switch between providers without additional setup.
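To illustrate the "routed requests" bullet above, a rough sketch of a raw HTTP request authenticated only with a Hugging Face token; the router URL and model id are assumptions for illustration:

```python
import os

import requests

# Routed request: Hugging Face forwards the call to the provider and handles billing
API_URL = "https://router.huggingface.co/v1/chat/completions"  # assumed OpenAI-compatible endpoint
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # illustrative model id
    "messages": [{"role": "user", "content": "Hello!"}],
}

response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```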

@@ -81,7 +79,7 @@ As you may have noticed, you can select to work with `"hf-inference"` provider.
 
 For instance, a request to [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) that takes 10 seconds to complete on a GPU machine that costs $0.00012 per second to run, will be billed $0.0012.
 
-The `"hf-inference"` provider is currently the default provider when working with the JavaScript and Python SDKs. Note that this default might change in the future.
+As of July 2025, hf-inference focuses mostly on CPU inference (e.g. embedding, text-ranking, text-classification, or smaller LLMs that have historical importance like BERT or GPT-2).
 
 ## Billing for Team and Enterprise organizations
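To make the compute-duration billing example above concrete, a tiny sketch of the arithmetic (the per-second GPU rate is the one quoted in the changed docs):

```python
# hf-inference bills by compute time: duration (seconds) x hardware cost ($ per second)
duration_seconds = 10
price_per_second = 0.00012  # example GPU rate from the paragraph above

cost = duration_seconds * price_per_second
print(f"Billed amount: ${cost:.4f}")  # Billed amount: $0.0012
```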

scripts/inference-providers/templates/providers/hf-inference.handlebars

Lines changed: 2 additions & 0 deletions
@@ -13,4 +13,6 @@ All supported HF Inference models can be found [here](https://huggingface.co/mod
 HF Inference is the serverless Inference API powered by Hugging Face. This service used to be called "Inference API (serverless)" prior to Inference Providers.
 If you are interested in deploying models to a dedicated and autoscaling infrastructure managed by Hugging Face, check out [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index) instead.
 
+As of July 2025, hf-inference focuses mostly on CPU inference (e.g. embedding, text-ranking, text-classification, or smaller LLMs that have historical importance like BERT or GPT-2).
+
 {{{tasksSection}}}
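As a companion to the added note about hf-inference's CPU-centric focus, a minimal sketch of a typical embedding call through that provider; the embedding model id is illustrative:

```python
import os

from huggingface_hub import InferenceClient

# hf-inference is well suited to lightweight tasks such as sentence embeddings
client = InferenceClient(provider="hf-inference", api_key=os.environ["HF_TOKEN"])

embedding = client.feature_extraction(
    "Inference Providers makes serverless inference easy.",
    model="sentence-transformers/all-MiniLM-L6-v2",  # illustrative embedding model
)
print(embedding.shape)  # dimensions of the returned embedding array
```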
