Commit d175b87

Joan Giner authored and committed

Merging v1.0

2 parents 17468e9 + db94f8a

1 file changed (+152, -165 lines)

README.md

<div align="center">

# DescribeML ![GitHub tag (latest by date)](https://img.shields.io/github/v/tag/SOM-Research/DescribeML?label=Version&style=for-the-badge)

DescribeML is a VSCode language plugin to describe machine-learning datasets. <br>

Precisely describe your data's provenance, composition, and social concerns in a structured format.

Make it easy for others to **reproduce your experiments** when you cannot share your data. <br>
<br>
Check out the quick [presentation video](https://www.youtube.com/watch?v=Bf3bhWB-UJY) of the tool and the [tutorial](https://www.youtube.com/watch?v=1Of1qfuJKvY) presented at the MODELS '22 Conference.

</div>

## Installation

### Via marketplace

The easiest way to install the plugin is through the **Visual Studio Code Marketplace**. Just search for "DescribeML" in the Extensions view, and that's it!

### Manually

Alternatively, you can install the plugin manually using the packaged release, **DescribeML-1.0.0vsix**, which can be found at the root of this [repository](https://github.com/SOM-Research/DescribeML).

Open your terminal (or the integrated terminal in VSCode) and run:

```
git clone https://github.com/SOM-Research/DescribeML.git datasets
cd datasets
code --install-extension DescribeML-1.0.0vsix
```

<span style="font-size:0.7em;">*Troubleshooting: if you cannot see syntax highlighting in the example files (e.g. Melanoma.descml), reload the VSCode editor and run the `code --install-extension` command again.*</span>

Great! That's it.
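
To check from the terminal that the extension was actually installed, you can list the installed extensions with the VS Code CLI (`code --list-extensions`). The snippet below is a minimal sketch: it assumes the extension's identifier contains the string "describeml" (the exact publisher and extension ID are not stated here), and the function name `describeml_installed` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify that DescribeML shows up among the installed VS Code extensions.
# Assumption: the extension identifier contains "describeml" (case-insensitive).
# describeml_installed is a hypothetical helper name, not part of the repository.
describeml_installed() {
  # `code --list-extensions` prints one extension ID per line.
  code --list-extensions 2>/dev/null | grep -qi 'describeml'
}

if describeml_installed; then
  echo "DescribeML is installed"
else
  echo "DescribeML not found: reload VSCode and run the install command again"
fi
```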

## Getting Started

1) First, create a *.descml* file.

2) The easiest way to start using our tool is the *preloader data service*, located at the top left of your editor. Click on: <img
src="https://github.com/SOM-Research/DescribeML/blob/main/fileicons/cloud-computing.png?raw=true"
alt="preloader service"
title="Optional title"
style="display: inline-block; margin: 0 auto; width: 40px">

3) Select your dataset file (*.csv*), and the tool will generate a draft of your description file.

4) For guidance, refer to the [Language Reference Guide](https://github.com/SOM-Research/DescribeML/blob/main/documentation/language-reference-guide.md) and follow the examples in the **examples/evaluation** [folder](https://github.com/SOM-Research/DescribeML/tree/main/examples/evaluation) to get a sense of the tool's possibilities. For instance, take a look at the *Melanoma.descml* file.
5) While documenting, press CTRL + Space (or the equivalent shortcut on your OS) to get auto-completion help. In addition, the parts marked with dots (see the animation below) give you hints for completing the documentation, and the outline on the right shows the document structure.

<div align="center">

![Autocompletion feature](https://github.com/SOM-Research/DescribeML/blob/main/fileicons/Autcomplete.gif?raw=true)

</div>

6) Once you are happy with your documentation, you can generate HTML documentation by clicking the generator button next to the preloader service: <img
src="https://github.com/SOM-Research/DescribeML/blob/main/fileicons/html.png?raw=true"
alt="HTML generator"
title="Optional title"
style="display: inline-block; margin: 0 auto; width: 40px">

For more information, check out the **quick [presentation](https://www.youtube.com/watch?v=Bf3bhWB-UJY) video** and the [**tutorial**](https://www.youtube.com/watch?v=1Of1qfuJKvY) presented at the MODELS '22 Conference.

## Contributing

This project is being developed as part of a research line of the [SOM Research Lab](https://som-research.github.io/), but we are open to contributions from the community. If you are interested in contributing to this project, please first read the [CONTRIBUTING.md](CONTRIBUTING.md) guidelines.

### Repository structure

The following list and tree describe the repository's relevant sections:

- The *documentation* and *examples* folders contain the language reference guide and the examples mentioned above.
- The *out* folder contains the executable plugin in JS. You probably do not need to dive into it, as it is generated by the TypeScript compiler.
- The *src* folder contains the project's source code.
- The *cli* folder contains Langium framework utilities, while the *generated* folder (inside *language-server*) holds the grammar and AST generated by Langium. You probably do not need to dive into the latter, as it is a generated asset.
- The *generator-service* folder contains the code of the tool's HTML generation service. It is a good place to start if you want to improve the tool's generation capabilities.
- The *uploader-service* folder contains the code of the uploader service. It is a good place to contribute new statistical metrics or ML techniques for dataset reverse engineering.
- The *language-server* folder contains all the language features and the grammar declaration. If you want to improve the grammar or any of the features the plugin offers, this is the place to start.
- The *dataset-description.langium* file contains the main grammar declaration. This grammar is developed using the [Langium Grammar Language](https://langium.org/docs/grammar-language/). Please refer to the linked documentation for more insights on how to develop the grammar.

```
├── documentation
│   └── language-reference-guide.md           // The language reference guide
├── examples
│   └── evaluation
│       ├── Gender.descml                     // Gender dataset example
│       ├── Melanoma.descml                   // Melanoma dataset example
│       └── Polarity.descml                   // Polarity dataset example
├── out                                       // The generated JS from the src folder
└── src                                       // The source code of the project
    ├── cli                                   // Langium framework utils
    ├── generator-service                     // The tool's HTML generator service
    ├── uploader-service                      // The tool's dataset uploader service
    └── language-server                       // The tool's language features
        ├── generated                         // Generated grammar and AST from Langium
        ├── dataset-description-index.ts      // Custom index feature
        ├── dataset-description-module.ts     // Declaration of the custom language features
        ├── dataset-description-validator.ts  // Custom language features
        └── dataset-description.langium       // The main grammar file of the tool
```

#### Tips to contribute

To contribute to or dive into the plugin or the language, you may need some extra steps so that your setup matches the exact version of Langium (the base framework we used):

1 - Run `npm install` to install the dependencies.

2 - Go to the /node_modules folder and delete the "langium" and "langium-cli" folders.

3 - Copy the "langium" and "langium-cli" folders from /packages to /node_modules.

4 - Copy the /packages/langium-vscode folder into your VSCode extensions folder (typically `<user home>/.vscode/extensions`).

5 - Install the Langium plugin through the VSCode UI.
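
Steps 1 to 4 above can also be scripted. The following is a minimal Bash sketch, assuming it is pointed at a clone of this repository whose /packages folder contains `langium`, `langium-cli` and `langium-vscode`, and that your VSCode extensions folder is `~/.vscode/extensions` unless you pass another (absolute) path. The function name `setup_langium_dev` is hypothetical, and step 5 still has to be done through the VSCode UI.

```shell
#!/usr/bin/env bash
# Hypothetical helper sketching steps 1-4 of "Tips to contribute".
# Usage: setup_langium_dev [repo_dir] [absolute_vscode_extensions_dir]
# The body runs in a subshell, so `set` options and `cd` do not leak to the caller.
setup_langium_dev() (
  set -eu
  repo_dir="${1:-.}"
  ext_dir="${2:-$HOME/.vscode/extensions}"

  cd "$repo_dir"

  # 1 - Install the project dependencies.
  npm install

  # 2 - Delete the Langium packages that npm placed in node_modules.
  rm -rf node_modules/langium node_modules/langium-cli

  # 3 - Copy the bundled Langium packages from /packages into node_modules.
  cp -R packages/langium packages/langium-cli node_modules/

  # 4 - Copy the bundled Langium VSCode extension into the extensions folder.
  mkdir -p "$ext_dir"
  cp -R packages/langium-vscode "$ext_dir/"
)
```

For example, running `setup_langium_dev . "$HOME/.vscode/extensions"` from the repository root would apply steps 1 to 4 in order and stop at the first failing command.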

#### Debugging the extension

This repository comes with a built-in debug configuration. Open the Run and Debug view in VSCode and launch the *Extension* configuration. Before launching, make sure port 6009 is free.
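
If you are unsure whether port 6009 is available, a quick check from a Bash terminal could look like the sketch below. It assumes Bash (it relies on Bash's `/dev/tcp` redirection, so no extra tools are needed) and only probes 127.0.0.1, so treat the result as an approximation; the function name `port_is_free` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether anything is listening on a local TCP port (default 6009).
# Relies on Bash's /dev/tcp redirection; only 127.0.0.1 is probed.
# port_is_free is a hypothetical helper name, not part of the repository.
port_is_free() {
  local port="${1:-6009}"
  # Opening the connection succeeds only if some process accepts it on that port.
  if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
    return 1  # a process is listening: the port is in use
  fi
  return 0    # connection refused: nothing is listening
}

if port_is_free 6009; then
  echo "Port 6009 is free"
else
  echo "Port 6009 is in use; stop the process using it before debugging"
fi
```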

For more information about how the framework works and how the language can be extended, please refer to the [Langium repository](https://github.com/langium/langium) or the [VSCode Extension API documentation](https://code.visualstudio.com/api).

## Research background

DescribeML is part of an ongoing research project to improve dataset documentation for machine learning. The core of our proposal is a domain-specific language ([preprint here](https://www.researchgate.net/publication/361836238_A_domain-specific_language_for_describing_machine_learning_datasets)) that allows data creators to describe relevant aspects of their data for the machine learning field and beyond. The [Critical Dataset Studies](https://knowingmachines.org/reading-list#dataset_documentation_practices) reading list of the [Knowing Machines](https://knowingmachines.org) project includes an excellent compilation of current documentation practices.

The tool was presented at the ACM/IEEE 25th International Conference on [Model Driven Engineering Languages and Systems](https://conf.researchr.org/home/models-2022) (MODELS '22), and a preprint of the tool paper is available [here](https://www.researchgate.net/publication/363256430_DescribeML_A_Tool_for_Describing_Machine_Learning_Datasets).

## Code of Conduct

At SOM Research Lab we are dedicated to creating and maintaining welcoming, inclusive, safe, and harassment-free development spaces. Anyone participating is subject to, and agrees to abide by, our [Code of Conduct](CODE_OF_CONDUCT.md).

## License

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

The source code of this project is licensed under the MIT license, which you can find in the MIT-LICENSE file.

All graphical assets are licensed under the [Creative Commons Attribution 3.0 Unported License](https://creativecommons.org/licenses/by/3.0/).