Project Ideas Improve File Classification
ScanCode currently detects the programming language, file type, and MIME type of files, but this detection is not as accurate as it could be. We also need a better way to classify files for further automation, particularly for identifying the likely "purpose" of a file: for example, distinguishing source and binary files that represent code from files that are documentation, scripts, etc. This is similar to the concept of "facets" from the ClearlyDefined project.
The first goal of this project is to improve the quality of detection of file characteristics, including programming language (which currently uses only Pygments) and Linux "magic" file type. The second goal is to create and implement a flexible framework of rules to automate assigning a "purpose" to files, possibly with machine learning.
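To make the second goal more concrete, here is a minimal sketch of what a rule-based "purpose" classifier might look like. It is not ScanCode code: the `PurposeRule` class, the `classify_purpose` function, the glob patterns, and the purpose labels (loosely modeled on the idea of facets) are all assumptions made for illustration. Rules are ordered, and the first rule with a matching pattern wins.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


@dataclass(frozen=True)
class PurposeRule:
    """Hypothetical rule: a purpose label plus glob patterns that select it."""
    purpose: str
    patterns: tuple


# Illustrative rules only. Order matters: more specific purposes come first,
# so that "tests/data/sample.json" is labeled "tests" rather than "data".
RULES = (
    PurposeRule("tests", ("tests/*", "*/tests/*", "test_*.py", "*_test.go")),
    PurposeRule("docs", ("docs/*", "*/docs/*", "*.md", "*.rst", "README*")),
    PurposeRule("data", ("*.json", "*.csv", "*.xml")),
    PurposeRule("dev", ("Makefile", "*.sh", ".github/*", "setup.py")),
    PurposeRule("core", ("*.py", "*.c", "*.h", "*.java", "*.go", "*.so", "*.jar")),
)


def classify_purpose(path, rules=RULES, default="unknown"):
    """Return the purpose of the first rule matching the path or its file name."""
    posix_path = PurePosixPath(path)
    full, name = posix_path.as_posix(), posix_path.name
    for rule in rules:
        # Each pattern is tried against the whole path and the bare file name.
        if any(fnmatchcase(full, p) or fnmatchcase(name, p) for p in rule.patterns):
            return rule.purpose
    return default


if __name__ == "__main__":
    for sample in ("src/scancode/api.py", "tests/data/sample.json", "README.rst"):
        print(sample, "->", classify_purpose(sample))
```

Keeping the rules as plain data means they could later be loaded from a configuration file, or complemented by a machine-learning model for files that no rule matches.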
- Level: Intermediate to Advanced
- Tech: Python
- Mentors: @pombredanne https://github.com/pombredanne