
Vertex AI pipeline #17

@jmandivarapu1

Description


Expected Behavior

I am using a Vertex AI pipeline that has only two modules:

  1. M1: creates the datasets (plain PyTorch DataLoaders) and sends the output to
  2. M2: training

The problem is that when I print the dataset in M2, it is not the PyTorch DataLoader; the previous module sends an artifact_types.Dataset instead. It would be great if anyone can explain how to send a PyTorch DataLoader directly. The examples I have seen mostly pass dataset URLs and paths. An example of how to pass a PyTorch loader from M1 to M2 would be much appreciated.
You can follow the sample from https://github.com/pytorch/examples/blob/main/mnist/main.py

print(dataloader)
Train Loader <kfp.v2.components.types.artifact_types.Dataset object at 0x7f2dc3dfe6d0>
print(dir(dataloader))

 ['TYPE_NAME', 'VERSION', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_get_path', '_set_path', 'metadata', 'name', 'path', 'uri']
ERROR 'Dataset' object is not iterable
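For context on why that error appears: a KFP v2 `Dataset` artifact is a metadata record (`name`, `uri`, `path`, `metadata`, as the `dir()` output above shows), not a container of samples, so iterating it raises `TypeError`. The class below is a minimal stand-in for illustration, not the real `kfp` class:

```python
class Dataset:
    """Minimal stand-in for kfp's artifact_types.Dataset: a metadata handle."""
    def __init__(self, uri):
        self.uri = uri
        self.path = "/tmp/train"  # in KFP this is a local mount of the URI
        self.metadata = {}

ds = Dataset("gs://bucket/train")
try:
    for batch in ds:  # same mistake as iterating the M2 input artifact
        pass
except TypeError as e:
    print(e)  # 'Dataset' object is not iterable
```

The fix is therefore not to iterate the handle itself but to read serialized data from `ds.path`.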

Actual Behavior

  1. I sent the output of M1 to M2, but M2 received an artifact_types.Dataset instead of the DataLoader.

Steps to Reproduce the Problem

M1

@component(
    output_component_file="pipeline/create_dataset.yaml",
    base_image=BASE_IMAGE,
)
def create_dataset(
    # An input parameter of type string.
    cfg_url: str,
    # Use Output to get a metadata-rich handle to the output artifact
    # of type `Dataset`.
    train_loader: Output[Dataset],
    test_loader: Output[Dataset],
    # A locally accessible filepath for another output artifact of type
    # `Dataset`.
    # output_dataset_two_path: OutputPath("Dataset"),
    # A locally accessible filepath for an output parameter of type string.
    # output_parameter_path: OutputPath(str),
):
    train_kwargs = {'batch_size': 12}
    test_kwargs = {'batch_size': 25}
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ])
    dataset1 = datasets.MNIST('../data', train=True, download=True,
                              transform=transform)
    dataset2 = datasets.MNIST('../data', train=False,
                              transform=transform)
    # Note: these assignments only rebind the local names; nothing is
    # ever written to the `Output[Dataset]` artifacts.
    train_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
    test_loader = torch.utils.data.DataLoader(dataset2, **test_kwargs)
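One way to make M1 actually persist its output (a sketch under the assumption that the producer writes to the artifact's documented `.path` attribute; in the real component this would be something like `torch.save(dataset1, train_loader.path)`): write the serialized data to `train_loader.path` instead of reassigning the parameter. The snippet simulates the pattern with stdlib `pickle` and a hypothetical `FakeOutputDataset` stand-in:

```python
import os
import pickle
import tempfile

class FakeOutputDataset:
    """Stand-in for a KFP `Output[Dataset]` handle: only `.path` matters here."""
    def __init__(self, path):
        self.path = path

def create_dataset_sketch(train_loader):
    # Rebinding `train_loader = DataLoader(...)` (as in M1 above) discards
    # the artifact handle; writing to `train_loader.path` is what persists
    # the output for the downstream component.
    data = list(range(10))  # stand-in for an MNIST dataset
    with open(train_loader.path, "wb") as f:
        pickle.dump(data, f)

out = FakeOutputDataset(os.path.join(tempfile.mkdtemp(), "train.pkl"))
create_dataset_sketch(out)
with open(out.path, "rb") as f:
    print(pickle.load(f))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

A live DataLoader holds worker processes and generators, so it cannot cross the container boundary; serializing the underlying dataset and rebuilding the loader downstream is the usual workaround.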

M2

@component(
    output_component_file="pipeline/train.yaml",
    base_image=TRAIN_IMAGE,
)
def train_model(
    dataloader_train: Input[Dataset],
    dataloader_test: Input[Dataset],
    cfg_url: str,
    ds_urls: str,
    models_path: Input[Artifact],
    tb_logs: Input[Artifact],
):
    print(dataloader_train, dataloader_test)
    # Fails: `dataloader_train` is a Dataset artifact handle, not a DataLoader.
    for batch_idx, (data, target) in enumerate(dataloader_train):
        data, target = data.to(device), target.to(device)

Specifications

  • Version:
  • Platform:
