Expected Behavior
I am using a Vertex AI pipeline that has only two components:
- M1: create the datasets (just PyTorch DataLoaders) and send the output to M2
- M2: training
The problem is that when I print the dataset in M2, it is not the PyTorch DataLoader: the previous component sends an artifact_types.Dataset instead. It would be great if anyone could help with how to send a PyTorch DataLoader directly. The examples I have seen mostly pass dataset URLs and paths, so an example of how to send a PyTorch DataLoader from M1 to M2 would be great.
You can follow the sample from https://github.com/pytorch/examples/blob/main/mnist/main.py
print(dataloader)
Train Loader <kfp.v2.components.types.artifact_types.Dataset object at 0x7f2dc3dfe6d0>
print(dir(dataloader))
['TYPE_NAME', 'VERSION', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_get_path', '_set_path', 'metadata', 'name', 'path', 'uri']
ERROR 'Dataset' object is not iterable
Actual Behavior
- I sent the output of M1 to M2, but M2 receives a Dataset artifact object instead of a DataLoader, and iterating over it raises the error above.
Steps to Reproduce the Problem
M1
@component(
    output_component_file="pipeline/create_dataset.yaml",
    base_image=BASE_IMAGE,
)
def create_dataset(
    # An input parameter of type string.
    cfg_url: str,
    # Use Output to get a metadata-rich handle to the output artifact
    # of type `Dataset`.
    train_loader: Output[Dataset],
    test_loader: Output[Dataset],
    # A locally accessible filepath for another output artifact of type
    # `Dataset`.
    # output_dataset_two_path: OutputPath("Dataset"),
    # A locally accessible filepath for an output parameter of type string.
    # output_parameter_path: OutputPath(str),
):
    import torch
    from torchvision import datasets, transforms

    train_kwargs = {'batch_size': 12}
    test_kwargs = {'batch_size': 25}
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    dataset1 = datasets.MNIST('../data', train=True, download=True,
                              transform=transform)
    dataset2 = datasets.MNIST('../data', train=False,
                              transform=transform)
    # Note: these assignments rebind the local names and shadow the
    # Output[Dataset] handles, so nothing is written to the artifacts.
    train_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
    test_loader = torch.utils.data.DataLoader(dataset2, **test_kwargs)
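For context, the workaround I would expect to need (just my assumption of how artifact passing works, not something I found in the KFP docs) is to serialize the underlying datasets to the artifacts' file paths instead of rebinding the parameter names, e.g. at the end of M1:

    # Hypothetical sketch: keep the Output[Dataset] handles intact and
    # materialize the datasets as files at the paths KFP provides, so that
    # M2 can rebuild the DataLoaders itself.
    torch.save(dataset1, train_loader.path)
    torch.save(dataset2, test_loader.path)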
M2
@component(
    output_component_file="pipeline/train.yaml",
    base_image=TRAIN_IMAGE,
)
def train_model(
    dataloader_train: Input[Dataset],
    dataloader_test: Input[Dataset],
    cfg_url: str,
    ds_urls: str,
    models_path: Input[Artifact],
    tb_logs: Input[Artifact],
):
    print(dataloader_train, dataloader_test)
    # This raises "'Dataset' object is not iterable": dataloader_train is a
    # KFP artifact, not a torch DataLoader.
    for batch_idx, (data, target) in enumerate(dataloader_train):
        data, target = data.to(device), target.to(device)
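And the counterpart in M2 (again an assumption on my part, not verified): load the serialized datasets from the Input[Dataset] paths and build the DataLoaders inside the training component:

    # Hypothetical counterpart for M2: reconstruct the DataLoaders from the
    # artifact files written by M1, then iterate as usual.
    import torch

    dataset1 = torch.load(dataloader_train.path)
    dataset2 = torch.load(dataloader_test.path)
    train_loader = torch.utils.data.DataLoader(dataset1, batch_size=12)
    test_loader = torch.utils.data.DataLoader(dataset2, batch_size=25)
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)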
Specifications
- Version:
- Platform: