Failure to create the BPE dataset for custom dataset #22

@guanqun-yang

Hi

I am trying to train a style transfer model for a style pair (profane vs. civil) that is not covered in the paper. However, when I run the first step as instructed in the repository

python datasets/dataset2bpe.py --dataset datasets/golbeck

where datasets/golbeck is a dataset of toxic comments with the required directory structure

golbeck/
├── dev.label
├── dev.txt
├── test.label
├── test.txt
├── train.label
└── train.txt

a series of dependency-related compilation errors is reported.

[...]lib/python3.6/site-packages/torch/include/ATen/core/TensorMethods.h:1417:15: note:   no known conversion for argument 1 from ‘<brace-enclosed initializer list>’ to ‘at::TensorList {aka c10::ArrayRef<at::Tensor>}’
error: command 'gcc' failed with exit status 1
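
To rule out the dataset layout as a factor, the six files listed above can be checked with a short script. This is only a minimal sketch: the helper name `missing_files` and the constant `REQUIRED_FILES` are hypothetical and not part of the repository, and the check says nothing about the gcc failure itself.

```python
from pathlib import Path

# File names taken from the golbeck/ directory listing in this issue.
REQUIRED_FILES = (
    "train.txt", "train.label",
    "dev.txt", "dev.label",
    "test.txt", "test.label",
)


def missing_files(dataset_dir):
    """Return the required file names that are absent from dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Path as passed to dataset2bpe.py via --dataset in this issue.
    missing = missing_files("datasets/golbeck")
    print("missing files:", missing if missing else "none")
```

An empty list means all six files are present, which would point back to the installation rather than the data.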

This appears to be related to the installation. Here are the full installation commands I used.

conda create --name strap python==3.6.3
conda activate strap

conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch

pip install -r requirements.txt
pip install -e .

cd fairseq
pip install -e .

# required; otherwise running dataset2bpe.py reports missing dependencies
conda install -c conda-forge hydra-core omegaconf
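
Since the failure comes from gcc compiling against the PyTorch headers, the versions actually present in the `strap` environment may help with diagnosis. The following is a minimal sketch that collects them; the function name `environment_report` is hypothetical, and the script only reports versions without fixing anything.

```python
import shutil
import subprocess
import sys


def environment_report():
    """Collect the Python, PyTorch, and gcc versions visible to this interpreter."""
    report = {"python": sys.version.split()[0]}

    # PyTorch may be missing; record None instead of failing.
    try:
        import torch
        report["torch"] = torch.__version__
        report["torch_cuda"] = torch.version.cuda
    except ImportError:
        report["torch"] = None

    # The first line of `gcc --version` names the compiler release.
    gcc = shutil.which("gcc")
    if gcc is None:
        report["gcc"] = None
    else:
        result = subprocess.run(
            [gcc, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        lines = result.stdout.splitlines()
        report["gcc"] = lines[0] if lines else None

    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it inside the activated environment and attaching the output to the issue shows which PyTorch build and compiler were used for the failing build.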

How can I resolve this issue? To reproduce the error, the data is provided here.
