* missing word
* billing: this is not the case anymore
* only 2 are listed?
* not true anymore either 😱
* mention the current focus of hf-inference
* mention org settings here too
TODO: add a thumbnail of team usage
* minor tweaks
* better more precise name
* link to @SBrandeis's table
* Update docs/inference-providers/guides/function-calling.md
Co-authored-by: Lucain <lucain@huggingface.co>
* Update docs/inference-providers/guides/function-calling.md
Co-authored-by: vb <vaibhavs10@gmail.com>
---------
Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: vb <vaibhavs10@gmail.com>
**docs/inference-providers/guides/function-calling.md** (4 additions, 4 deletions)
@@ -37,7 +37,7 @@ client = OpenAI(
 </hfoption>
 <hfoption id="huggingface_hub">

-In the Hugging Face Hub client, we'll use the `provider` parameter to specify the provider we want to use for the request.
+In the Hugging Face Hub client, we'll use the `provider` parameter to specify the provider we want to use for the request. By default, it is `"auto"`.

 ```python
 import json
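
For context, a minimal sketch of what the updated sentence describes, i.e. selecting a provider via the `provider` parameter; the model name and tool schema here are illustrative, not taken from the guide:

```python
# Sketch: provider selection with huggingface_hub.
# "auto" (the default) lets Hugging Face pick an available provider for the model.
from huggingface_hub import InferenceClient

client = InferenceClient(provider="auto")

tools = [{"type": "function", "function": {
    "name": "get_current_weather",
    "description": "Get the current weather for a location",
    "parameters": {"type": "object",
                   "properties": {"location": {"type": "string"}},
                   "required": ["location"]},
}}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```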
@@ -400,7 +400,7 @@ Here, we're forcing the model to call the `get_current_weather` function, and no
 <Tip warning={true}>

-Currently, Hugging Face Hub does not support the `tool_choice` parameters that specify which function to call.
+Currently, `huggingface_hub.InferenceClient` does not support the `tool_choice` parameter that specifies which function to call.

 </Tip>
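
Worth noting alongside this change: the OpenAI client shown earlier in the guide does accept `tool_choice`. A sketch, where the base URL and model are illustrative assumptions rather than content from the diff:

```python
# Sketch: forcing a specific function with the OpenAI client, which accepts
# `tool_choice` (the Tip above only concerns huggingface_hub's InferenceClient).
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",  # assumed OpenAI-compatible router URL
    api_key="hf_xxx",  # your Hugging Face User Access Token
)

tools = [{"type": "function", "function": {
    "name": "get_current_weather",
    "description": "Get the current weather for a location",
    "parameters": {"type": "object",
                   "properties": {"location": {"type": "string"}},
                   "required": ["location"]},
}}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)
print(response.choices[0].message.tool_calls)
```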
@@ -471,13 +471,13 @@ Streaming allows you to process responses as they arrive, show real-time progres
 <Tip warning={true}>

-Streaming is not supported by all providers. You can check the provider's documentation to see if it supports streaming.
+Streaming is not supported by all providers. You can check the provider's documentation to see if it supports streaming, or you can refer to this [dynamic model compatibility table](https://huggingface.co/inference-providers/models).

 </Tip>

 ## Next Steps

-Now that you've seen how to use function calling with Inference Providers, you can start building your own assistants! Why not try out some of these ideas:
+Now that you've seen how to use function calling with Inference Providers, you can start building your own agents and assistants! Why not try out some of these ideas:

 - Try smaller models for faster responses and lower costs
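
Since the Tip above now points readers at the compatibility table, here is a minimal streaming sketch for providers that do support it (model name is illustrative):

```python
# Sketch: streaming tokens as they arrive with huggingface_hub.
from huggingface_hub import InferenceClient

client = InferenceClient(provider="auto")
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```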
+Similar settings can be found in your Organization settings. Additionally, you can see a graph of your team members' usage over time, which is helpful for centralizing usage billing at the team level.

-Hugging Face’s Inference Providers give developers access to hundreds of machine learning models, powered by world-class inference providers. They are also integrated into our client SDKs (for JS and Python), making it easy to explore serverless inference of models your favorite providers.
+Hugging Face’s Inference Providers give developers access to hundreds of machine learning models, powered by world-class inference providers. They are also integrated into our client SDKs (for JS and Python), making it easy to explore serverless inference of models on your favorite providers.
**docs/inference-providers/pricing.md** (2 additions, 4 deletions)
@@ -44,8 +44,6 @@ See the [Organization Billing section](#organization-billing) below for more det
 **PRO users and Enterprise Hub organizations** can continue using the API after exhausting their monthly credits. This ensures uninterrupted access to models for production workloads.

-If you have remaining credits, we estimate costs for providers that aren’t fully integrated with our billing system. These estimates are usually higher than the actual cost to prevent abuse, which is why PAYG is currently disabled for those providers.
-
 <Tip>

 Hugging Face charges you the same rates as the provider, with no additional fees. We just pass through the provider costs directly.
@@ -56,7 +54,7 @@ You can track your spending anytime on your [billing page](https://huggingface.c
 ## Hugging Face Billing vs Custom Provider Key (Detailed Comparison)

-The documentation above assumes you are making routed requests to external providers. In practice, there are 3 different ways to run inference, each with unique billing implications:
+The documentation above assumes you are making routed requests to external providers. In practice, there are 2 different ways to run inference, each with unique billing implications:

 - **Hugging Face Routed Requests**: This is the default method for using Inference Providers. Simply use the JavaScript or Python `InferenceClient`, or make raw HTTP requests with your Hugging Face User Access Token. Your request is automatically routed through Hugging Face to the provider's platform. No separate provider account is required, and billing is managed directly by Hugging Face. This approach lets you seamlessly switch between providers without additional setup.
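
To make the routed-requests bullet concrete, a sketch of a raw HTTP request routed through Hugging Face; the endpoint path and model are assumptions for illustration, and auth is a regular HF User Access Token:

```python
# Sketch: a routed request without any SDK, billed through Hugging Face.
import os
import requests

resp = requests.post(
    "https://router.huggingface.co/v1/chat/completions",  # assumed router endpoint
    headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"},
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```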
@@ -81,7 +79,7 @@ As you may have noticed, you can select to work with `"hf-inference"` provider.
 For instance, a request to [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) that takes 10 seconds to complete on a GPU machine that costs $0.00012 per second to run, will be billed $0.0012.

-The `"hf-inference"` provider is currently the default provider when working with the JavaScript and Python SDKs. Note that this default might change in the future.
+As of July 2025, hf-inference focuses mostly on CPU inference (e.g. embedding, text-ranking, text-classification, or smaller LLMs that have historical importance like BERT or GPT-2).
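
The FLUX.1-dev example above is simple duration-based billing; spelled out:

```python
# Sketch of the billing arithmetic from the example above.
duration_s = 10            # time the request occupies the GPU, in seconds
rate_usd_per_s = 0.00012   # GPU rate from the example
print(f"${duration_s * rate_usd_per_s:.4f}")  # -> $0.0012
```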
**scripts/inference-providers/templates/providers/hf-inference.handlebars** (2 additions, 0 deletions)
@@ -13,4 +13,6 @@ All supported HF Inference models can be found [here](https://huggingface.co/mod
 HF Inference is the serverless Inference API powered by Hugging Face. This service used to be called "Inference API (serverless)" prior to Inference Providers.

 If you are interested in deploying models to a dedicated and autoscaling infrastructure managed by Hugging Face, check out [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index) instead.
+
+As of July 2025, hf-inference focuses mostly on CPU inference (e.g. embedding, text-ranking, text-classification, or smaller LLMs that have historical importance like BERT or GPT-2).
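
To illustrate the CPU-focused workloads mentioned in the added line, a sketch of an embedding request against hf-inference (the model name is an illustrative assumption):

```python
# Sketch: a typical hf-inference workload, sentence embeddings.
from huggingface_hub import InferenceClient

client = InferenceClient(provider="hf-inference")
embedding = client.feature_extraction(
    "Inference Providers give access to hundreds of models.",
    model="sentence-transformers/all-MiniLM-L6-v2",  # illustrative embedding model
)
print(embedding.shape)  # e.g. (384,) for this model
```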