Commit e7e0e44

Merge pull request #45 from IntelLabs/nhasabni/php_training_dataset
Adding training dataset for PHP
2 parents 2c92ea5 + b79b702 commit e7e0e44

3 files changed, +9341 -11 lines changed

README.md

Lines changed: 9 additions & 8 deletions
@@ -78,13 +78,14 @@ Verilog support is WIP.
 
 #### Using patterns obtained from 6000 GitHub repos to scan repository of your choice
 
-Download the training data for C language depending on the memory constraints of your device. Note, however, that using smaller datasets may lead to reduced accuracy in the results ControlFlag produces and possibly an increase in the number of false positives it generates.
+Download the training data for the language of interest depending on the memory constraints of your device. Note, however, that using smaller datasets may lead to reduced accuracy in the results ControlFlag produces and possibly an increase in the number of false positives it generates.
 
-Dataset name | Size on disk | Memory requirements | Direct link | gdown ID | MD5 checksum
--------------|--------------|---------------------|-------------|----------|-------------
-Small | ~100MB | ~400MB | [link](https://drive.google.com/file/d/1gvUyRXq1SeZD9g3i__RaamYAMo_QaQIb/view?usp=sharing) | 1gvUyRXq1SeZD9g3i__RaamYAMo_QaQIb | 2825f209aba0430993f7a21e74d99889
-Medium | ~450MB | ~1.3GB | [link](https://drive.google.com/file/d/1zsCFJAKlZlSAWKPfBcVGcQNlFB5Gtwo3/view?usp=sharing) | 1zsCFJAKlZlSAWKPfBcVGcQNlFB5Gtwo3 | aab2427edebe9ed4acab75c3c6227f24
-Large | ~9GB | ~13GB | [link](https://drive.google.com/file/d/1-jzs3zrKU541hwChaciXSk8zrnMN1mYc/view?usp=sharing) | 1-jzs3zrKU541hwChaciXSk8zrnMN1mYc | 1ba954d9716765d44917445d3abf8e85
+Language | Dataset name | Size on disk | Memory requirements | Direct link | gdown ID | MD5 checksum
+---------|--------------|--------------|---------------------|-------------|----------|-------------
+C | Small | ~100MB | ~400MB | [link](https://drive.google.com/file/d/1gvUyRXq1SeZD9g3i__RaamYAMo_QaQIb/view?usp=sharing) | 1gvUyRXq1SeZD9g3i__RaamYAMo_QaQIb | 2825f209aba0430993f7a21e74d99889
+C | Medium | ~450MB | ~1.3GB | [link](https://drive.google.com/file/d/1zsCFJAKlZlSAWKPfBcVGcQNlFB5Gtwo3/view?usp=sharing) | 1zsCFJAKlZlSAWKPfBcVGcQNlFB5Gtwo3 | aab2427edebe9ed4acab75c3c6227f24
+C | Large | ~9GB | ~13GB | [link](https://drive.google.com/file/d/1-jzs3zrKU541hwChaciXSk8zrnMN1mYc/view?usp=sharing) | 1-jzs3zrKU541hwChaciXSk8zrnMN1mYc | 1ba954d9716765d44917445d3abf8e85
+PHP | Small | ~120MB | ~1GB | [Link](https://drive.google.com/file/d/1zUnBHMXPIXmlrCfWze8nNoMEQnc0W2K5/view?usp=sharing) | 1zUnBHMXPIXmlrCfWze8nNoMEQnc0W2K5 | 5a1cc4c24a20de7dad1b9f40661d517a
 
 ```
 $ python -m pip install gdown && gdown https://drive.google.com/uc?id=<id_from_table>
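Each row of the table pairs a gdown ID with an MD5 checksum, so a download can be checked before it is used for scanning. The following is a minimal shell sketch of that check, not part of ControlFlag: the helper names `verify_md5` and `fetch_dataset` and the local output filename are hypothetical, and it assumes GNU coreutils' `md5sum` (on macOS, `md5 -q FILE` prints the digest instead).

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): download a ControlFlag training dataset with
# gdown and compare its MD5 digest against the checksum from the README table.
# Assumes GNU coreutils' md5sum is available.

# verify_md5 FILE EXPECTED: print "OK: FILE" and succeed only if FILE's MD5
# digest equals EXPECTED; otherwise report the mismatch on stderr and fail.
verify_md5() {
  local file="$1" expected="$2" actual
  # Read from stdin so md5sum's output never contains (or escapes) the filename.
  actual=$(md5sum < "$file" | cut -d ' ' -f 1)
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got ${actual:-nothing}, expected $expected)" >&2
    return 1
  fi
}

# fetch_dataset ID EXPECTED_MD5 OUTPUT: download by gdown ID into OUTPUT
# (a local filename of your choosing), then verify the checksum.
fetch_dataset() {
  local id="$1" expected="$2" output="$3"
  # Install gdown only if it is not already on PATH.
  command -v gdown > /dev/null || python -m pip install gdown || return 1
  gdown "https://drive.google.com/uc?id=${id}" -O "$output" || return 1
  verify_md5 "$output" "$expected"
}
```

For example, `fetch_dataset 1zUnBHMXPIXmlrCfWze8nNoMEQnc0W2K5 5a1cc4c24a20de7dad1b9f40661d517a php_small_dataset` fetches the PHP Small dataset and returns a non-zero status if the digest does not match the table.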
@@ -164,7 +165,7 @@ place of <training_repo_dir>.
 Usage: ./mine_patterns.sh -d <directory_to_mine_patterns_from> -o <output_file_to_store_training_data>
 Optional:
 [-n number_of_processes_to_use_for_mining] (default: num_cpus_on_system)
-[-l source_language_number] (default: 1 (C), supported: 1 (C), 2 (Verilog)
+[-l source_language_number] (default: 1 (C), supported: 1 (C), 2 (Verilog), 3 (PHP)
 ```
 
 We use it as:
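Per the updated usage text, `-l` takes a number rather than a language name: 1 for C, 2 for Verilog, 3 for PHP. A small helper can keep that mapping in one place; `lang_number` below is hypothetical and not part of ControlFlag, which only accepts the number itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a language name to the number accepted by -l.
# Mapping taken from the usage text: 1 (C), 2 (Verilog), 3 (PHP).
lang_number() {
  # Lower-case the argument so "PHP", "Php" and "php" all match.
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    c)       echo 1 ;;
    verilog) echo 2 ;;
    php)     echo 3 ;;
    *)       echo "unsupported language: $1" >&2; return 1 ;;
  esac
}
```

Mining PHP patterns would then look like `./mine_patterns.sh -d ~/php_repos -o php_training_data -l "$(lang_number PHP)"`, where `~/php_repos` and `php_training_data` are placeholder paths.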
@@ -188,7 +189,7 @@ Optional:
 [-n max_number_of_results_for_autocorrect] (default: 5)
 [-j number_of_scanning_threads] (default: num_cpus_on_systems)
 [-o output_log_dir] (default: /tmp)
-[-l source_language_number] (default: 1 (C), supported: 1 (C), 2 (Verilog))
+[-l source_language_number] (default: 1 (C), supported: 1 (C), 2 (Verilog), 3 (PHP))
 [-a anomaly_threshold] (default: 3.0)
 ```
 
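Scanning PHP code requires `-l 3`, while the other optional flags can keep the defaults listed above. The sketch below collects those flags into a bash array; `scan_flags` and the `CF_*` environment variables are hypothetical overrides rather than ControlFlag settings, and `nproc` (GNU coreutils) is assumed to stand in for the "num_cpus_on_systems" default.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the optional scanning flags for a PHP
# scan, one token per line. Defaults mirror the usage text; only -l differs
# (3 = PHP instead of the default 1 = C).
scan_flags() {
  local -a flags=()
  flags+=(-n "${CF_MAX_RESULTS:-5}")      # max number of results for autocorrect
  flags+=(-j "${CF_THREADS:-$(nproc)}")   # scanning threads (default: CPU count)
  flags+=(-o "${CF_LOG_DIR:-/tmp}")       # output log directory
  flags+=(-l "${CF_LANG:-3}")             # source language number: 3 = PHP
  flags+=(-a "${CF_THRESHOLD:-3.0}")      # anomaly threshold
  printf '%s\n' "${flags[@]}"
}
```

Reading the output with `mapfile -t flags < <(scan_flags)` and appending `"${flags[@]}"` after the scanning script's required arguments (not listed in this hunk) keeps each value intact even if a log directory contains spaces.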