The main goal of the CLEVER Project is to gather evidence on Italian argument structure from three different sources of knowledge that are typically used in distinct subfields of linguistics: human acceptability judgments, behavioral data, and distributional representations.
The main outcome of the project will consist of two types of resources, both of which will be made publicly available:
- a wide-coverage dataset of sentences covering a series of linguistic phenomena concerning the argument structure of Italian verbs, annotated with human judgments and behavioral data;
- neural language models for Italian trained on a cognitively plausible corpus, which will be used to develop and test novel applications for the Italian language and as a source of evidence for investigating the properties of the argument structure of Italian verbs.
CLEVER includes two research units (RUs) with a long-standing collaboration: Università di Pisa (UPI) and Università Ca’ Foscari Venezia (UVE).
The project consists of five work packages (WPs), each organized into several activities.
WP1. Corpus, linguistic analyses and dataset specification
WP2. Training of the Neural Language Model
WP3. Creation of the Linguistic Dataset
WP4. Computational modeling and linguistic analysis
WP5. Project Management
Trained models can be found at: https://huggingface.co/colinglab
Capone L, Suozzi A, Lebani GE, Lenci A (2024). BaBIEs: A Benchmark for the Linguistic Evaluation of Italian Baby Language Models. CLiC-it 2024, CEUR Workshop Proceedings.
Capone L, Bondielli A, Lenci A (2024). ConcreteGPT: A Baby GPT-2 Based on Lexical Concreteness and Curriculum Learning. The 2nd BabyLM Challenge at the 28th CoNLL.