Trying to train on an existing model #47

@ghost

Description

Hi!
This is a really great tool and it's been fun using it.
I am trying to train the model 'bert-base-multilingual-cased' on a tokenized dataset in the correct format. But every time I run the script, it loads the file and the weights and then promptly stops, reporting that some weights of the pretrained model weren't initialised.
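For reference, my training file follows the src ||| tgt format from the README, i.e. one whitespace-tokenized sentence pair per line (the pair below is just a made-up illustration):

    Das ist ein Test . ||| this is a test .
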
This is the message I get:

07/21/2022 13:48:50 - WARNING - __main__ - Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
07/21/2022 13:48:50 - INFO - awesome_align.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-config.json from cache at /Users/devi/.cache/torch/awesome-align/45629519f3117b89d89fd9c740073d8e4c1f0a70f9842476185100a8afe715d1.65df3cef028a0c91a7b059e4c404a975ebe6843c71267b67019c0e9cfa8a88f0
07/21/2022 13:48:50 - INFO - awesome_align.configuration_utils - Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": null,
"directionality": "bidi",
"do_sample": false,
"eos_token_ids": null,
"finetuning_task": null,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"initializer_range": 0.02,
"intermediate_size": 3072,
"is_decoder": false,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"layer_norm_eps": 1e-12,
"length_penalty": 1.0,
"max_length": 20,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_beams": 1,
"num_hidden_layers": 12,
"num_labels": 2,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_past": true,
"pad_token_id": 0,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"repetition_penalty": 1.0,
"temperature": 1.0,
"top_k": 50,
"top_p": 1.0,
"torchscript": false,
"type_vocab_size": 2,
"use_bfloat16": false,
"vocab_size": 119547
}

07/21/2022 13:48:51 - INFO - awesome_align.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt from cache at /Users/devi/.cache/torch/awesome-align/96435fa287fbf7e469185f1062386e05a075cadbf6838b74da22bf64b080bc32.99bcd55fc66f4f3360bc49ba472b940b8dcf223ea6a345deb969d607ca900729
07/21/2022 13:48:52 - INFO - awesome_align.modeling_utils - loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-pytorch_model.bin from cache at /Users/devi/.cache/torch/awesome-align/5b5b80054cd2c95a946a8e0ce0b93f56326dff9fbda6a6c3e02de3c91c918342.7131dcb754361639a7d5526985f880879c9bfd144b65a0bf50590bddb7de9059
07/21/2022 13:48:56 - INFO - awesome_align.modeling_utils - Weights of BertForMaskedLM not initialized from pretrained model: ['cls.predictions.decoder.bias', 'psi_cls.bias', 'psi_cls.transform.weight', 'psi_cls.transform.bias', 'psi_cls.decoder.weight', 'psi_cls.decoder.bias']
07/21/2022 13:48:56 - INFO - awesome_align.modeling_utils - Weights from pretrained model not used in BertForMaskedLM: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
07/21/2022 13:48:56 - INFO - __main__ - Training/evaluation parameters Namespace(train_data_file='de-en_tmx_align.txt', output_dir='align/train_model', train_mlm=False, train_tlm=False, train_tlm_full=False, train_so=False, train_psi=False, train_co=False, train_gold_file=None, eval_gold_file=None, ignore_possible_alignments=False, gold_one_index=False, cache_data=False, align_layer=8, extraction='softmax', softmax_threshold=0.001, eval_data_file='examples/deen_param_test', should_continue=False, model_name_or_path='bert-base-multilingual-cased', mlm_probability=0.15, config_name=None, tokenizer_name=None, cache_dir=None, block_size=512, do_train=False, do_eval=False, per_gpu_train_batch_size=2, per_gpu_eval_batch_size=2, gradient_accumulation_steps=4, learning_rate=2e-05, weight_decay=0.0, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=1.0, max_steps=-1, warmup_steps=0, logging_steps=500, save_steps=500, save_total_limit=None, no_cuda=False, overwrite_output_dir=False, overwrite_cache=False, seed=42, fp16=False, fp16_opt_level='O1', local_rank=-1, n_gpu=0, device=device(type='cpu'))
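
Two things I notice in the output above: (1) the "Weights of BertForMaskedLM not initialized" lines are INFO-level and concern the psi_cls head that awesome-align adds on top of BERT, so I assume fresh initialization there is expected; (2) the Namespace shows do_train=False and every train_* objective (train_mlm, train_tlm, train_so, train_psi, ...) is also False, so maybe the script has nothing to do after loading the weights? Is an invocation along these lines what is intended? (run_train.py is my guess at the training entry point, and the flags are read off the Namespace above, so treat this as a sketch rather than a verified command.)

    python run_train.py \
        --output_dir=align/train_model \
        --model_name_or_path=bert-base-multilingual-cased \
        --train_data_file=de-en_tmx_align.txt \
        --do_train \
        --train_mlm --train_tlm --train_so --train_psi \
        --per_gpu_train_batch_size 2 \
        --gradient_accumulation_steps 4 \
        --num_train_epochs 1 \
        --learning_rate 2e-5 \
        --save_steps 500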

Please help with a solution, or let me know if I'm doing something wrong!
Thanks
