Commit 8baed54

Update Readme and versions for release 24.09
1 parent ca8ae28 · commit 8baed54

File tree

- README.md
- TRITON_VERSION
- build.py

Showing 3 changed files with 5 additions and 236 deletions.

3 files changed

+5
-236
lines changed

README.md

Lines changed: 1 addition & 232 deletions
@@ -42,235 +42,4 @@ ___
[![License](https://img.shields.io/badge/License-BSD3-lightgrey.svg)](https://opensource.org/licenses/BSD-3-Clause)

[!WARNING]
-
-##### LATEST RELEASE
-You are currently on the `main` branch which tracks under-development progress towards the next release.
-The current release is version [2.49.0](https://github.com/triton-inference-server/server/releases/latest) and corresponds to the 24.08 container release on NVIDIA GPU Cloud (NGC).
-
-Triton Inference Server is an open source inference serving software that
-streamlines AI inferencing. Triton enables teams to deploy any AI model from
-multiple deep learning and machine learning frameworks, including TensorRT,
-TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton
-Inference Server supports inference across cloud, data center, edge and embedded
-devices on NVIDIA GPUs, x86 and ARM CPU, or AWS Inferentia. Triton Inference
-Server delivers optimized performance for many query types, including real time,
-batched, ensembles and audio/video streaming. Triton inference Server is part of
-[NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/),
-a software platform that accelerates the data science pipeline and streamlines
-the development and deployment of production AI.
-
-Major features include:
-
-- [Supports multiple deep learning
-frameworks](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton)
-- [Supports multiple machine learning
-frameworks](https://github.com/triton-inference-server/fil_backend)
-- [Concurrent model
-execution](docs/user_guide/architecture.md#concurrent-model-execution)
-- [Dynamic batching](docs/user_guide/model_configuration.md#dynamic-batcher)
-- [Sequence batching](docs/user_guide/model_configuration.md#sequence-batcher) and
-[implicit state management](docs/user_guide/architecture.md#implicit-state-management)
-for stateful models
-- Provides [Backend API](https://github.com/triton-inference-server/backend) that
-allows adding custom backends and pre/post processing operations
-- Supports writing custom backends in python, a.k.a.
-[Python-based backends.](https://github.com/triton-inference-server/backend/blob/main/docs/python_based_backends.md#python-based-backends)
-- Model pipelines using
-[Ensembling](docs/user_guide/architecture.md#ensemble-models) or [Business
-Logic Scripting
-(BLS)](https://github.com/triton-inference-server/python_backend#business-logic-scripting)
-- [HTTP/REST and GRPC inference
-protocols](docs/customization_guide/inference_protocols.md) based on the community
-developed [KServe
-protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)
-- A [C API](docs/customization_guide/inference_protocols.md#in-process-triton-server-api) and
-[Java API](docs/customization_guide/inference_protocols.md#java-bindings-for-in-process-triton-server-api)
-allow Triton to link directly into your application for edge and other in-process use cases
-- [Metrics](docs/user_guide/metrics.md) indicating GPU utilization, server
-throughput, server latency, and more
-
-**New to Triton Inference Server?** Make use of
-[these tutorials](https://github.com/triton-inference-server/tutorials)
-to begin your Triton journey!
-
-Join the [Triton and TensorRT community](https://www.nvidia.com/en-us/deep-learning-ai/triton-tensorrt-newsletter/) and
-stay current on the latest product updates, bug fixes, content, best practices,
-and more. Need enterprise support? NVIDIA global support is available for Triton
-Inference Server with the
-[NVIDIA AI Enterprise software suite](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/).
-
-## Serve a Model in 3 Easy Steps
-
-```bash
-# Step 1: Create the example model repository
-git clone -b r24.08 https://github.com/triton-inference-server/server.git
-cd server/docs/examples
-./fetch_models.sh
-
-# Step 2: Launch triton from the NGC Triton container
-docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.08-py3 tritonserver --model-repository=/models
-
-# Step 3: Sending an Inference Request
-# In a separate console, launch the image_client example from the NGC Triton SDK container
-docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:24.08-py3-sdk
-/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
-
-# Inference should return the following
-Image '/workspace/images/mug.jpg':
-15.346230 (504) = COFFEE MUG
-13.224326 (968) = CUP
-10.422965 (505) = COFFEEPOT
-```
-Please read the [QuickStart](docs/getting_started/quickstart.md) guide for additional information
-regarding this example. The quickstart guide also contains an example of how to launch Triton on [CPU-only systems](docs/getting_started/quickstart.md#run-on-cpu-only-system). New to Triton and wondering where to get started? Watch the [Getting Started video](https://youtu.be/NQDtfSi5QF4).
-
-## Examples and Tutorials
-
-Check out [NVIDIA LaunchPad](https://www.nvidia.com/en-us/data-center/products/ai-enterprise-suite/trial/)
-for free access to a set of hands-on labs with Triton Inference Server hosted on
-NVIDIA infrastructure.
-
-Specific end-to-end examples for popular models, such as ResNet, BERT, and DLRM
-are located in the
-[NVIDIA Deep Learning Examples](https://github.com/NVIDIA/DeepLearningExamples)
-page on GitHub. The
-[NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-triton-inference-server)
-contains additional documentation, presentations, and examples.
-
-## Documentation
-
-### Build and Deploy
-
-The recommended way to build and use Triton Inference Server is with Docker
-images.
-
-- [Install Triton Inference Server with Docker containers](docs/customization_guide/build.md#building-with-docker) (*Recommended*)
-- [Install Triton Inference Server without Docker containers](docs/customization_guide/build.md#building-without-docker)
-- [Build a custom Triton Inference Server Docker container](docs/customization_guide/compose.md)
-- [Build Triton Inference Server from source](docs/customization_guide/build.md#building-on-unsupported-platforms)
-- [Build Triton Inference Server for Windows 10](docs/customization_guide/build.md#building-for-windows-10)
-- Examples for deploying Triton Inference Server with Kubernetes and Helm on [GCP](deploy/gcp/README.md),
-[AWS](deploy/aws/README.md), and [NVIDIA FleetCommand](deploy/fleetcommand/README.md)
-- [Secure Deployment Considerations](docs/customization_guide/deploy.md)
-
-### Using Triton
-
-#### Preparing Models for Triton Inference Server
-
-The first step in using Triton to serve your models is to place one or
-more models into a [model repository](docs/user_guide/model_repository.md). Depending on
-the type of the model and on what Triton capabilities you want to enable for
-the model, you may need to create a [model
-configuration](docs/user_guide/model_configuration.md) for the model.
-
-- [Add custom operations to Triton if needed by your model](docs/user_guide/custom_operations.md)
-- Enable model pipelining with [Model Ensemble](docs/user_guide/architecture.md#ensemble-models)
-and [Business Logic Scripting (BLS)](https://github.com/triton-inference-server/python_backend#business-logic-scripting)
-- Optimize your models setting [scheduling and batching](docs/user_guide/architecture.md#models-and-schedulers)
-parameters and [model instances](docs/user_guide/model_configuration.md#instance-groups).
-- Use the [Model Analyzer tool](https://github.com/triton-inference-server/model_analyzer)
-to help optimize your model configuration with profiling
-- Learn how to [explicitly manage what models are available by loading and
-unloading models](docs/user_guide/model_management.md)
-
-#### Configure and Use Triton Inference Server
-
-- Read the [Quick Start Guide](docs/getting_started/quickstart.md) to run Triton Inference
-Server on both GPU and CPU
-- Triton supports multiple execution engines, called
-[backends](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton), including
-[TensorRT](https://github.com/triton-inference-server/tensorrt_backend),
-[TensorFlow](https://github.com/triton-inference-server/tensorflow_backend),
-[PyTorch](https://github.com/triton-inference-server/pytorch_backend),
-[ONNX](https://github.com/triton-inference-server/onnxruntime_backend),
-[OpenVINO](https://github.com/triton-inference-server/openvino_backend),
-[Python](https://github.com/triton-inference-server/python_backend), and more
-- Not all the above backends are supported on every platform supported by Triton.
-Look at the
-[Backend-Platform Support Matrix](https://github.com/triton-inference-server/backend/blob/main/docs/backend_platform_support_matrix.md)
-to learn which backends are supported on your target platform.
-- Learn how to [optimize performance](docs/user_guide/optimization.md) using the
-[Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
-and
-[Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
-- Learn how to [manage loading and unloading models](docs/user_guide/model_management.md) in
-Triton
-- Send requests directly to Triton with the [HTTP/REST JSON-based
-or gRPC protocols](docs/customization_guide/inference_protocols.md#httprest-and-grpc-protocols)
-
-#### Client Support and Examples
-
-A Triton *client* application sends inference and other requests to Triton. The
-[Python and C++ client libraries](https://github.com/triton-inference-server/client)
-provide APIs to simplify this communication.
-
-- Review client examples for [C++](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/examples),
-[Python](https://github.com/triton-inference-server/client/blob/main/src/python/examples),
-and [Java](https://github.com/triton-inference-server/client/blob/main/src/java/src/main/java/triton/client/examples)
-- Configure [HTTP](https://github.com/triton-inference-server/client#http-options)
-and [gRPC](https://github.com/triton-inference-server/client#grpc-options)
-client options
-- Send input data (e.g. a jpeg image) directly to Triton in the [body of an HTTP
-request without any additional metadata](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_binary_data.md#raw-binary-request)
-
-### Extend Triton
-
-[Triton Inference Server's architecture](docs/user_guide/architecture.md) is specifically
-designed for modularity and flexibility
-
-- [Customize Triton Inference Server container](docs/customization_guide/compose.md) for your use case
-- [Create custom backends](https://github.com/triton-inference-server/backend)
-in either [C/C++](https://github.com/triton-inference-server/backend/blob/main/README.md#triton-backend-api)
-or [Python](https://github.com/triton-inference-server/python_backend)
-- Create [decoupled backends and models](docs/user_guide/decoupled_models.md) that can send
-multiple responses for a request or not send any responses for a request
-- Use a [Triton repository agent](docs/customization_guide/repository_agents.md) to add functionality
-that operates when a model is loaded and unloaded, such as authentication,
-decryption, or conversion
-- Deploy Triton on [Jetson and JetPack](docs/user_guide/jetson.md)
-- [Use Triton on AWS
-Inferentia](https://github.com/triton-inference-server/python_backend/tree/main/inferentia)
-
-### Additional Documentation
-
-- [FAQ](docs/user_guide/faq.md)
-- [User Guide](docs/README.md#user-guide)
-- [Customization Guide](docs/README.md#customization-guide)
-- [Release Notes](https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/index.html)
-- [GPU, Driver, and CUDA Support
-Matrix](https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html)
-
-## Contributing
-
-Contributions to Triton Inference Server are more than welcome. To
-contribute please review the [contribution
-guidelines](CONTRIBUTING.md). If you have a backend, client,
-example or similar contribution that is not modifying the core of
-Triton, then you should file a PR in the [contrib
-repo](https://github.com/triton-inference-server/contrib).
-
-## Reporting problems, asking questions
-
-We appreciate any feedback, questions or bug reporting regarding this project.
-When posting [issues in GitHub](https://github.com/triton-inference-server/server/issues),
-follow the process outlined in the [Stack Overflow document](https://stackoverflow.com/help/mcve).
-Ensure posted examples are:
-- minimal – use as little code as possible that still produces the
-same problem
-- complete – provide all parts needed to reproduce the problem. Check
-if you can strip external dependencies and still show the problem. The
-less time we spend on reproducing problems the more time we have to
-fix it
-- verifiable – test the code you're about to provide to make sure it
-reproduces the problem. Remove all other problems that are not
-related to your request/question.
-
-For issues, please use the provided bug report and feature request templates.
-
-For questions, we recommend posting in our community
-[GitHub Discussions.](https://github.com/triton-inference-server/server/discussions)
-
-## For more information
-
-Please refer to the [NVIDIA Developer Triton page](https://developer.nvidia.com/nvidia-triton-inference-server)
-for more information.
+> You are currently on the `r24.09` branch which tracks under-development progress towards the next release. <br>
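The quick-start removed above drives inference through the prebuilt `image_client` binary in the SDK container. As a companion, here is a minimal sketch of sending the same kind of request with the Python client library (`tritonclient`), which the removed "Client Support and Examples" section points to. The model name `my_model` and the tensor names `INPUT0`/`OUTPUT0` are placeholders, not taken from this repository; substitute the names, shapes, and datatypes reported by your model's configuration, and note that a Triton server must already be listening on `localhost:8000`.

```python
# Minimal sketch (not part of this commit): one HTTP inference request via the
# Python client library. Install with: pip install "tritonclient[http]" numpy
# Assumptions (hypothetical): a running server on localhost:8000 serving a model
# "my_model" with an FP32 input "INPUT0" of shape [1, 3, 224, 224] and an
# output "OUTPUT0".
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy input tensor; a real client would load and preprocess an image here.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inputs = [httpclient.InferInput("INPUT0", list(data.shape), "FP32")]
inputs[0].set_data_from_numpy(data)
outputs = [httpclient.InferRequestedOutput("OUTPUT0")]

result = client.infer(model_name="my_model", inputs=inputs, outputs=outputs)
print(result.as_numpy("OUTPUT0"))
```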

TRITON_VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.50.0dev
+2.50.0

build.py

Lines changed: 3 additions & 3 deletions
@@ -71,9 +71,9 @@
#
TRITON_VERSION_MAP = {
    "2.50.0dev": (
-        "24.09dev", # triton container
-        "24.08", # upstream container
-        "1.18.1", # ORT
+        "24.09", # triton container
+        "24.09", # upstream container
+        "1.19.2", # ORT
        "2024.0.0", # ORT OpenVINO
        "2024.0.0", # Standalone OpenVINO
        "3.2.6", # DCGM version

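As a reading aid for the version bump above, the following is a small sketch (not the actual `build.py` logic) of how a tuple-valued map like `TRITON_VERSION_MAP` can be unpacked into named component versions. It only covers the fields visible in this hunk, in the order given by the inline comments; the real map may carry additional entries beyond the DCGM version.

```python
# Illustrative only: mirrors the fields visible in this hunk; the real
# TRITON_VERSION_MAP in build.py may contain more entries than shown here.
TRITON_VERSION_MAP = {
    "2.50.0dev": (
        "24.09",     # Triton container
        "24.09",     # upstream (framework) container
        "1.19.2",    # ONNX Runtime (ORT)
        "2024.0.0",  # ORT OpenVINO
        "2024.0.0",  # standalone OpenVINO
        "3.2.6",     # DCGM
    ),
}

def component_versions(triton_version: str) -> dict:
    """Map a Triton version string to its pinned component versions."""
    container, upstream, ort, ort_openvino, openvino, dcgm = TRITON_VERSION_MAP[triton_version]
    return {
        "triton_container": container,
        "upstream_container": upstream,
        "onnxruntime": ort,
        "ort_openvino": ort_openvino,
        "openvino": openvino,
        "dcgm": dcgm,
    }

print(component_versions("2.50.0dev"))
```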