Data Flywheel Foundational Blueprint with Weights & Biases Weave

Deploy this blueprint to create a production-grade autonomous Data Flywheel service that uses the NeMo microservices to continuously discover and promote more efficient models.

W&B Weave Integration for the Data Flywheel

Weights & Biases has augmented the Data Flywheel Blueprint to give you instant tracing and visibility into your production AI systems, along with a platform for iterating on them on the way to production. The following services have been enabled with W&B Weave:

  • Production Traffic Monitoring
  • Model Evaluation Tracking
  • Fine-tuning Experiment Management
  • Model Artifact Versioning
  • Dataset Serialization and Versioning

This gives you the data you need to take your implementation of this Flywheel blueprint all the way to production. The integration is activated by setting WANDB_API_KEY when deploying the blueprint.

NOTE:

  • By default, tracing is logged to your default team in W&B, under project data-flywheel. You can override this by setting WANDB_PROJECT as well.
  • Make sure WANDB_API_KEY and WANDB_PROJECT are set in your environment configuration.
  • Model artifacts (including fine-tuned models, LoRA adapters, and evaluation results) are automatically logged to W&B for versioning and tracking.
  • Datasets created from Weave traces are serialized and versioned in W&B, enabling reproducible experiments and easy dataset sharing across teams.
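
For example, a minimal environment setup might look like the sketch below. The values are placeholders, and in practice you would usually set these in your .env file or deployment environment rather than in code:

import os

# Placeholder values – substitute your own key, project, and team.
os.environ["WANDB_API_KEY"] = "<your-wandb-api-key>"
os.environ["WANDB_PROJECT"] = "data-flywheel"  # optional; data-flywheel is the default project
os.environ["WANDB_ENTITY"] = "my-team"         # optional; defaults to your default team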

Data Flywheels are a fledgling concept in Generative AI, but real-world tests within NVIDIA have already identified instances where using a flywheel can reduce inference costs by up to 98.6%. There are caveats to this, which we discuss below, but we believe these early data points warrant attention.

The purpose of this Blueprint is two-fold:

  1. To provide a production-grade reference implementation of a Data Flywheel on top of the NeMo microservices.
  2. To educate the community on what Data Flywheels are: what they can do, what they can't do, and what to expect when building them.

You can get started quickly and achieve similar results using your own infrastructure by following the Quickstart guide.

What is a Data Flywheel?

A data flywheel is a process that uses data exhaust from production applications (for example, LLM prompt/response logs, end-user feedback, and expert labeling) to increase the overall accuracy and reduce the latency and cost of Generative AI systems. A very high-level flow looks like this:

flowchart TD
  app[Your App] --prompts/responses/feedback--> logs[Weights & Biases Weave]
  logs --Create Datasets--> orch["Orchestrator"]
  orch --> exp1["Exp #1"]
  orch --> exp2["Exp #2"]
  orch --> expN["Exp #N"]
  exp1 --> results[Weights & Biases Models]
  exp2 --> results
  expN --> results

Production traffic from your application is routed to Weights & Biases Weave for comprehensive logging and monitoring. From there, evaluation and fine-tuning datasets are created and used in a series of offline experiments, with results tracked in Weights & Biases Models. The system automatically:

  • Logs model artifacts (including fine-tuned models and LoRA adapters) to W&B
  • Serializes and versions datasets created from Weave traces
  • Tracks experiment metadata and evaluation results
  • Links evaluation runs to model artifacts for complete lineage tracking
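
As an illustration of the dataset versioning step, here is a minimal sketch (not the blueprint's internal code; the rows are placeholders) of how a dataset assembled from logged prompt/response pairs can be published as a versioned object in Weave:

import weave

weave.init(project_name="data-flywheel")

# Rows would normally be built from your Weave traces by the blueprint's
# create_datasets task; these literals are placeholders for illustration.
rows = [
    {
        "workload_id": "tool_call_routing",
        "prompt": "What approvals do I need for paternity leave?",
        "completion": "According to the HR policy …",
    },
]

dataset = weave.Dataset(name="tool_call_routing-eval", rows=rows)
weave.publish(dataset)  # each publish creates a new, versioned dataset object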

Anyone who has done this manually knows there are a lot of options to consider when designing the experiments, and many of them depend on the kinds of data you have:

  • Which models do you pick?
  • In what ways do you curate your datasets?
  • Which fine-tuning techniques do you try?
  • Which metrics do you use in evals?

It's a lot to decide on. Enter: the NeMo microservices.

Where the NeMo Microservices Come In

The NeMo microservices allow for programmatic control of datasets, fine-tuning, evaluation, and inference. This means that rather than having ML engineers manage each experiment, you can automate the exploration of various configurations using sensible defaults, and then present the most promising candidates to a research engineer or machine learning engineer for further analysis.

flowchart TD

app["Your application"] --Prompt/completion logs--> log_store["Weights & Biases Weave"]
log_store --Datasets--> datasets["NeMo Datastore"]
datasets --"Fine-tuning datasets"--> customizer["NeMo Customizer"]
datasets --"Eval datasets"--> evaluator["NeMo Evaluator"]

subgraph NIMs["Loop across ALL NIMs"]
  customizer --"Customized model"--> NIM
  evaluator --> NIM
  NIM --> evaluator
end

evaluator --> results["Weights & Biases Models"]

In just a few hours, this automated process built on top of NeMo microservices can:

  1. Pull data from your Weights & Biases Weave logs.
  2. Group it by task (for example, if you have an agent doing multiple things, each node is a different task).
  3. De-dup it.
  4. Create eval and fine-tuning datasets from your production traffic and store them in NeMo Datastore.
  5. Kick off fine-tuning jobs with NeMo Customizer.
  6. Run evaluations with LLM-as-judge comparisons on NeMo Evaluator and save the results to Weights & Biases for visualization and analysis.

With reasonable defaults, the system automatically narrows a vast number of possible options down to a manageable set of promising candidates for further analysis, with no manual experiment design required.

👆 This is the key insight of the Data Flywheel Blueprint built on top of the NeMo microservices.

You can scale this process across any number of NIMs by using NeMo Deployment Manager to dynamically start and stop NIMs as needed, so you don't have to keep every NIM running at all times. This cycle can be repeated as frequently as desired: daily, weekly, or on your own schedule.

How to Use This Blueprint

The strategies in this reference implementation have proven effective in some use cases, but our work is ongoing. Some aspects of this implementation may go against common wisdom:

  • Routing production traffic directly into fine-tuning without PII removal
  • Building evaluation datasets with no ground truth other than the production model's own responses
  • Not doing any hand-labeling of data

Nonetheless, we've shown it can work. And more importantly, we believe this idea of collecting production traffic prompt/response logs, running automated experiments to explore a huge solution space, and then flagging interesting candidates for further analysis will become an indispensable part of the future GenerativeAI stack.

Therefore, to effectively use this blueprint:

  1. Learn from the reference implementation

    • Play with the Launchable: Walk through setting up NeMo microservices, deploying the reference services, and exercising the flywheel with the provided sample dataset.
    • Read the code & docs: Finish this README and skim the source to understand how the API layer, background tasks, and NeMo microservices integrations fit together.
  2. Prepare your traffic

    • Instrument production applications: Every distinct LLM call (agent node, route, tool, etc.) must emit a stable workload_id. Optionally include a free-form description string—ignored by inference but useful for future workload classification.
    • Export or connect your logs: If you're already logging prompt/response pairs, write a small connector (or use an existing one) to push them into Weights & Biases Weave or directly into the Flywheel API.
  3. Choose how to run the Flywheel

    • Always-on service: Keep the stack running in a shared k8s/VM environment so new traffic is evaluated continuously.
    • Ad-hoc run: Spin it up locally on a workstation with a few GPUs, load a slice of traffic, kick off a job, and shut everything down when the results are in.
  4. Kick off a run

    • Load or stream the tagged traffic into Weights & Biases Weave to launch a job. The system will spin up the necessary NIMs, schedule evaluations and fine-tunes, and track everything automatically.
  5. Interpret the results

    • The response is grouped by NIM. For each NIM the Flywheel currently runs three experiment types:
      • base – raw production prompts replayed.
      • icl – few-shot prompts built from random traffic examples.
      • customized – a LoRA fine-tune evaluated with the base prompts.
    • Scores are an LLM-as-judge similarity metric in the range [0, 1]. Look for high-scoring small models, then:
      • Review the linked evaluation runs in W&B to understand model performance.
      • Use W&B's visualization tools to compare different model versions and their metrics.
  6. Keep the human in the loop

    • Think of the Flywheel as a flashlight, not an autopilot. Promotion to production—as well as any deeper evaluation or dataset curation—remains a human decision.
  7. Stay up to date

    • Follow the repo for UI improvements, new experiment strategies, and future work on accuracy-oriented flywheels.

Preparing your data

The Flywheel treats your production prompt/completion logs as the single source of truth. At runtime it only needs to know where to find the logs (e.g., Weights & Biases Weave) and how the individual documents are shaped. Since this is a reference implementation, you can modify the code to suit your needs, but the current schemas are defined below should you decide to conform to them.

1 – Log schema

Each log entry document must contain the following top-level keys:

  • timestamp (int, epoch secs): Time the request was issued
  • workload_id (str): Stable identifier for the logical task / route / agent node
  • client_id (str): Identifier of the application or deployment that generated the traffic
  • request (dict): Exact openai.ChatCompletion.create payload received by the model
  • response (dict): Exact ChatCompletion response returned by the model

A minimal example document therefore looks like:

{
  "timestamp": 1715854074,
  "workload_id": "tool_call_routing",
  "client_id": "acme-prod",
  "request": {
    "model": "llama-3-70b-instruct",
    "messages": [
      {"role": "user", "content": "What approvals do I need for paternity leave?"}
    ],
    "temperature": 0.2,
    "max_tokens": 1024
  },
  "response": {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1715854074,
    "model": "llama-3-70b-instruct",
    "choices": [
      {
        "index": 0,
        "message": {
          "role": "assistant",
          "content": "According to the HR policy …"
        },
        "finish_reason": "stop"
      }
    ],
    "usage": {"prompt_tokens": 15, "completion_tokens": 72, "total_tokens": 87}
  }
}

Why this shape? Keeping the full request/response allows the Flywheel to replay prompts, build few-shot demonstrations, and fine-tune without lossy conversions.

Tagging matters: client_id is meant to identify who produced the traffic (for example, a specific microservice, environment, or customer). Multiple workloads can share the same client_id. workload_id is much stricter: it represents a single type of request. If your application is an agent with several nodes, you must assign a different workload_id to every node so the Flywheel can evaluate them independently. Treat it as the primary key for slicing, deduplication, and dataset creation.

2 – Instrumenting an application

If you already write request/response logs, you can either route that traffic to Weights & Biases Weave using the Weave client or bulk-import it using the provided helper scripts. For new projects, the snippet below shows how a synchronous OpenAI call can be wrapped so every interaction is logged in the expected format:

# examples/log_to_weave.py
import time

import weave
from openai import OpenAI

# Initialize Weights & Biases Weave client
weave.init(project_name="your-project-name")

openai_client = OpenAI()

CLIENT_ID = "my_demo_app"

# Example agent nodes (each with its own workload_id)
WORKLOADS = {
    "simple_chat": "agent.chat",
    "tool_router": "agent.tool_router",
}

# This op traces every chat interaction
@weave.op()
def log_chat(workload_id: str, messages: list[dict], model: str = "gpt-3.5-turbo", temperature: float = 0.3, max_tokens: int = 1024):
    # 1) call the LLM
    response = openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )

    # 2) build the document
    doc = {
        "timestamp": int(time.time()),
        "workload_id": workload_id,
        "client_id": CLIENT_ID,
        "request": {
            "model": response.model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
        "response": response.model_dump(),  # OpenAI python-sdk v1 returns a pydantic model
    }

    # 3) The weave.op decorator automatically logs the document to Weave
    return doc

# --- Example usage -----------------------------------------------------------
messages_chat = [{"role": "user", "content": "Hello!"}]
log_chat(WORKLOADS["simple_chat"], messages_chat)

messages_tool = [
    {"role": "user", "content": "Who won the 2024 Super Bowl?"},
    {
        "role": "system",
        "content": "You are a router that decides whether to call the Wikipedia tool or answer directly.",
    },
]
log_chat(WORKLOADS["tool_router"], messages_tool)

💡 Streaming responses: in streaming mode the OpenAI SDK delivers tokens incrementally. If your clients use streaming, you will need to account for that and either buffer the stream and reconstruct a full response object before logging, or modify the Flywheel importer to reconstruct the full response.
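
As a sketch of the first option, the snippet below buffers a streamed completion into a ChatCompletion-shaped dict that matches the log schema above. It assumes the OpenAI Python SDK v1 streaming interface; note that token usage is generally not included in streamed chunks, so the usage field is omitted here:

from openai import OpenAI

client = OpenAI()

def collect_streamed_response(model: str, messages: list[dict]) -> dict:
    """Buffer a streaming chat completion and rebuild a full response document."""
    stream = client.chat.completions.create(model=model, messages=messages, stream=True)

    content_parts: list[str] = []
    finish_reason = None
    response_id = None
    created = None

    for chunk in stream:
        if not chunk.choices:
            continue
        response_id = response_id or chunk.id
        created = created or chunk.created
        choice = chunk.choices[0]
        if choice.delta.content:
            content_parts.append(choice.delta.content)
        if choice.finish_reason:
            finish_reason = choice.finish_reason

    # Reconstruct a ChatCompletion-shaped document for the "response" field of
    # the Flywheel log schema. Token usage is not available from the stream by
    # default, so it is left out here.
    return {
        "id": response_id,
        "object": "chat.completion",
        "created": created,
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": "".join(content_parts)},
                "finish_reason": finish_reason,
            }
        ],
    }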

3 – Import helpers and customization

The reference implementation already bundles helpers you can reuse:

  • src/scripts/load_test_data_weave.py – CLI to bulk-load a JSONL file into Weights & Biases Weave.
  • src/tasks/tasks.py::create_datasets – Celery task that reads logs, deduplicates them, and turns them into evaluation & training datasets.

Feel free to adjust the logging integration to suit your needs – just make sure create_datasets can still retrieve documents in the schema above.

Real-World Results and What to Expect

The Data Flywheel Foundational Blueprint is a reference implementation of some of the techniques we have explored within NVIDIA. While far from a silver bullet, real-world tests within NVIDIA have found instances where using a Data Flywheel can reduce inference costs by up to 98.6% while maintaining comparable accuracy. These cases tend to center on simpler tool-calling scenarios where an agent uses a tool call to route between a small set of tools. In the 98.6% example, we took traffic from an internal HR chatbot. This chatbot used an LLM -- llama-3.1-70b-instruct -- for several purposes:

  1. Chat
  2. Query rewriting for RAG
  3. Summarization
  4. Tool calling

The Flywheel identified that the Tool Calling use case ended up being simple enough that a fine-tuned llama-3.2-1b-instruct was able to achieve ~98% accuracy relative to the 70b model being used in production.

We have also found instances where Qwen-2.5-32b-coder did as well as Llama-3.1-70b-instruct without any fine-tuning, and so we were able to quickly reduce inference costs and time to first token by >50% simply by swapping out a NIM.

Additional Reading

You can also learn more about Flywheels here:

Technical Details

Key Features

  • Data Collection and Storage:
    • Weights & Biases Weave for logging prompt/completion data and production monitoring
    • MongoDB for API and metadata storage
    • Redis for task queue management
  • Model Integration:
    • Support for Meta Llama 3.2 1B Instruct model
    • Configurable context length up to 32768 tokens
  • Training and Evaluation:
    • In-context learning (ICL) with configurable parameters
    • LoRA-based fine-tuning support
    • Automated data splitting for evaluation
  • Deployment Infrastructure:
    • Docker Compose setup for development
    • Celery workers for background processing
    • Health monitoring for core services
  • Experiment Tracking:
    • Weights & Biases integration for experiment tracking and evaluation results visualization
    • Model registry and dataset versioning in Weights & Biases Models
    • Rich visualization of results and metrics from evaluations

Design Philosophy

The Data Flywheel Foundational Blueprint empowers organizations to accelerate the optimization of AI models for cost and performance. Rather than offering a one-size-fits-all solution, this blueprint provides a practical framework and proven tools to guide your journey toward more efficient, production-ready models.

  1. Reference Implementation for Real-World Impact: This blueprint demonstrates the capabilities of NeMo microservices, providing a foundation you can adapt and extend to meet your specific production requirements.

  2. Streamlined Human Oversight: The Flywheel process is designed to minimize manual intervention. Human review is reserved for evaluating candidate models after automated assessments, eliminating the need for ongoing user feedback or manual labeling.

  3. Cost and Latency Optimization: The primary focus is on reducing inference costs and latency by distilling larger models into smaller, high-quality alternatives. Future updates will further enhance accuracy and introduce advanced prompt and agent optimization features.

  4. Iterative, Data-Driven Improvement: Each iteration provides valuable insights, even when a smaller model is not immediately identified. This iterative approach ensures continuous learning and improvement.

  5. Seamless Integration with Existing Workflows

    • Designed for teams with existing generative AI applications in production.
    • Easily integrates with your current logging and workload tagging practices.
    • Supports enhanced workload descriptions for improved future classification.
    • Leverages robust infrastructure—including Weights & Biases Weave for production monitoring, Weights & Biases Models for experiment tracking, MongoDB, Redis, and NeMo microservices—to store data, build datasets, run evaluations, fine-tune models, and re-evaluate results.

To get the most value from the Data Flywheel Foundational Blueprint, ensure you have:

  • An existing generative AI application in production.
  • Logging of prompt/completion traffic, with workload tagging (such as routes, nodes, or agent steps).
  • (Optional, but recommended) Descriptive metadata for each workload to support future classification.
  • The ability to deploy and operate supporting infrastructure (Weights & Biases Weave, MongoDB, Redis, and NeMo microservices) for data storage, dataset creation, evaluation, and fine-tuning.

By following this blueprint, you can confidently advance your AI model optimization initiatives, leveraging a process that is transparent, adaptable, and focused on measurable outcomes.

Future Roadmap

The blueprint purposely keeps the first release simple. Areas we are actively exploring for future versions include:

  • Automated Data Collection: Integrated collection of model inputs/outputs, latency, and metadata
  • Visualization Dashboards: Enhanced W&B dashboards for cost, latency, drift, and accuracy trends
  • Agentic Observability & Prompt Insights: Detect regression, drift, or improvement trends based on performance telemetry
  • Dynamic Configuration Overrides: Runtime overrides for config.yaml settings via API or environment variables
  • Data Governance & Privacy: PII redaction pipeline support for logs and datasets; fine-grained RBAC on dataset access and usage
  • Data Governance & Privacy: Enable experiment tracking tooling for granular tracking of fine-tune runs, metrics, artifacts, and config diffs
  • Hyper-parameter Sweeps: Support launching NeMo microservices hyper-parameter sweeps from external tools (e.g., the Flywheel) and pulling results back for analysis and visualization
  • Smarter Dataset Construction: Heuristics or LLM-based parsing to derive eval vs. fine-tune splits from logs; support for DPO/KTO pipelines or filtering by thumbs-up/down signal
  • Model & Backend Extensibility: Add support for additional NIMs such as Qwen, LLaMA-Nemotron, and Mistral; include testing and evaluation support for quantized models

Software Components

The blueprint consists of the following implemented components:

  • API Layer:
    • FastAPI-based REST endpoints (src/api/endpoints.py)
    • Data models and schemas (src/api/models.py, src/api/schemas.py)
    • Job service for task management (src/api/job_service.py)
  • Data Storage:
    • Weights & Biases Weave for log storage and production monitoring
    • Weights & Biases Models for experiment tracking
    • MongoDB for API data persistence (src/api/db.py)
    • Redis for task queue
  • Task Processing:
    • Celery workers for background jobs (src/tasks/tasks.py)
    • Configurable concurrency and monitoring
  • NeMo Microservices Integration:
    • Datastore client for dataset management
    • Model evaluation and customization interfaces
    • Configurable NeMo microservices endpoints
  • Experiment Tracking:
    • Weights & Biases integration for tracking experiments and visualizing evaluation results (src/lib/integration/wandb_callback.py)
    • Model registry for versioning and tracking (src/lib/nemo/model_manager.py)

Technical Diagrams

For details on the architecture of a Flywheel and the components of this Blueprint, view the Architecture Overview.

Minimum System Requirements

  • Minimum GPU: Self-hosted LLM judge: 6× NVIDIA H100 or A100 GPUs; remote LLM judge: 2× NVIDIA H100 or A100 GPUs
  • Cluster: Single-node NVIDIA GPU cluster on Linux with cluster-admin permissions
  • Disk Space: At least 200 GB free
  • Software: Python 3.11, Docker Engine, Docker Compose v2
  • Services: Weights & Biases Weave, MongoDB 7.0, Redis 7.2, FastAPI (API server), Celery (task processing)
  • Resources: Minimum memory 1 GB; storage varies by log volume/model size; network ports 8000 (API), 27017 (MongoDB), 6379 (Redis)
  • Development: Docker Compose for local dev with hot reloading; supports macOS (Darwin) and Linux; optional GPU support for model inference
  • Production: Kubernetes cluster (recommended); resources scale with workload; persistent volume support for data storage

Task Serialization Safeguard 🌐

Why only one Flywheel run at a time? When the Flywheel kicks off a run it may need to spin up multiple NIMs and customization jobs, each of which can claim one or more GPUs. The reference implementation does not yet discover the number of free GPUs in the cluster, so it uses a simple guardrail: all invocations of run_nim_workflow_dag are serialized.

  • The task is bound to a dedicated Celery queue (parent_queue). In the docker-compose.yaml there is a worker dedicated to this queue whose concurrency is set to 1. There is a second worker bound to the default celery queue which can handle running other tasks (e.g. evals) in parallel.
  • Inside the task we wait for the full DAG to complete via async_result.get(...) before returning.
  • The call to create a job (i.e. POST /jobs) will not block, however; it returns immediately with a Job ID.

This ensures that only one Flywheel experiment can allocate GPUs at any given time, preventing accidental overallocation that would lead to failed NIM deployments or customizations.
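
For example, a run can be kicked off and polled roughly as follows. The request body and response fields shown here are assumptions for illustration; check src/api/endpoints.py and src/api/schemas.py for the exact contract:

import time
import requests

API_BASE = "http://localhost:8000"  # default API port from the requirements above

# Kick off a Flywheel run; per the note above, this returns immediately with a job id.
resp = requests.post(
    f"{API_BASE}/jobs",
    json={"workload_id": "tool_call_routing", "client_id": "acme-prod"},  # assumed body
)
resp.raise_for_status()
job_id = resp.json()["id"]  # assumed field name

# Poll until the serialized DAG behind the job has finished.
while True:
    job = requests.get(f"{API_BASE}/jobs/{job_id}").json()  # assumed endpoint
    if job.get("finished_at"):
        break
    time.sleep(60)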

Roadmap – Automatic GPU introspection and smarter scheduling are planned for a future version of the Blueprint so multiple Flywheel runs can execute in parallel when resources permit.

W&B Integration Details

The blueprint provides comprehensive integration with Weights & Biases:

  • Model Artifact Management
    • Automatic logging of fine-tuned models and LoRA adapters
    • Version tracking for all model artifacts
    • Metadata storage including model specs, configurations, and performance metrics
    • Linking of evaluation runs to model artifacts for complete lineage

  • Dataset Management
    • Serialization of datasets from Weave traces
    • Version control for training, validation, and evaluation datasets
    • Automatic dataset splitting and formatting
    • Support for both raw and processed dataset versions

  • Experiment Tracking
    • Detailed logging of evaluation metrics and scores
    • Customizable tags and metadata for experiment organization
    • Real-time progress monitoring and visualization
    • Integration with W&B's experiment comparison tools

  • Production Monitoring
    • Comprehensive logging of production traffic
    • Performance metrics tracking
    • Automatic dataset creation from production logs
    • Support for custom metrics and visualizations

Next Steps

Available Customizations

The following are some of the customizations that you can make after you complete the steps in the Quickstart Guide.

  • Environment Variables – Configure the system using environment variables.
    • Required variables: NGC_API_KEY, HF_TOKEN, WANDB_API_KEY
    • Optional variables: WANDB_PROJECT, WANDB_ENTITY, MONGODB_URL, REDIS_URL
    • Configuration: via .env file or system environment
  • Model Integration – Configure and deploy LLM models.
    • Currently supported: Meta Llama 3.2 1B Instruct
    • Context length: up to 32768 tokens
    • Hardware config: GPU support (configurable), PVC size (configurable)
    • Version control: model tags supported
  • Evaluation Settings – Configure data splitting and evaluation parameters.
    • Data split: eval size (default: 20), validation ratio (0.1)
    • Minimum records: 50 records required
    • Reproducibility: optional random seed
    • ICL settings: context length (max 32768), reserved tokens (4096), examples (min 1, max 3)
  • Fine-tuning Options – Customize model training.
    • Training type: SFT (Supervised Fine-Tuning)
    • Method: LoRA with configurable parameters
    • Parameters: epochs (2), batch size (16), learning rate (0.0001)
    • LoRA config: adapter dimension (32), dropout (0.1)
  • Data Infrastructure – Configure data storage and processing.
    • Storage: Weights & Biases Weave for logs
    • Queue: Redis for task processing
    • Database: MongoDB for API data
    • Processing: Celery workers with configurable concurrency
  • Experiment Tracking – Configure experiment tracking.
    • Weights & Biases: project name, entity, API key
    • Model registry: tracking model artifacts
    • Metrics: customizable metrics logging for evaluation results
  • Deployment Options – Infrastructure configuration.
    • Development: Docker Compose with hot reloading
    • Services: API, Celery worker, Redis, MongoDB
    • Resource config: network mode, volume mounts, health checks
    • Environment: configurable URLs and API keys
  • W&B Configuration – Configure the W&B integration.
    • Project settings: project name, entity, API key
    • Artifact management: model and dataset versioning
    • Experiment tracking: custom metrics and visualizations
    • Production monitoring: traffic logging and analysis

Refer to the Configuration Guide for more information.

Contributing

  1. Install development dependencies:

    uv sync --dev

    This command installs all dependencies needed to build the container and run the tests.

  2. Start required services:

    ./scripts/run.sh

    This starts the services required for testing via Docker Compose.

  3. Run the tests:

    • For unit tests (requires MongoDB from docker compose):

      uv run pytest
    • For integration tests (with mocked NeMo microservices components):

      uv run pytest -m integration
  4. Clean up after development:

    • Stop all services:

      ./scripts/stop.sh
    • (Optional) Clear all database volumes:

      ./scripts/clear_all_volumes.sh

If you modify the API, regenerate the openapi.json with the following command:

uv run python scripts/generate_openapi.py

License

This NVIDIA AI BLUEPRINT is licensed under the Apache License, Version 2.0. This project will download and install additional third-party open source software projects and containers. Review the license terms of these open source projects before use.

The software and materials are governed by the NVIDIA Software License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/) and the Product-Specific Terms for NVIDIA AI Products (found at https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/), except that models are governed by the AI Foundation Models Community License Agreement (found at NVIDIA Agreements | Enterprise Software | NVIDIA Community Model License) and the NVIDIA dataset is governed by the NVIDIA Asset License Agreement found here.

The Meta/llama-3.1-70b-instruct model is governed by the Llama 3.1 Community License Agreement, and the nvidia/llama-3.2-nv-embedqa-1b-v2 and nvidia/llama-3.2-nv-rerankqa-1b-v2 models are governed by the Llama 3.2 Community License Agreement. Built with Llama.

Disclaimer:

The Data Flywheel Blueprint is shared as a reference and is provided "as is". Security in the production environment is the responsibility of the end users deploying it. When deploying in a production environment, have security experts review any potential risks and threats; define the trust boundaries; implement logging and monitoring capabilities; secure the communication channels; and integrate AuthN & AuthZ with appropriate controls.
