Description
Smells are an indication, not concrete evidence; they are subjective and somewhat romanticised. Tie smells to productivity.
Do code smells impact review time?
Definition of code smells, and catalogue.
Assumptions:
Developers know what they are
Developers care at review time
Developers disagree on severity/importance
Evolution of code smells: Which ones can be ignored?
Identify code smells first and then evaluate them.
Tie code smells to:
Reproducibility
Performance
*ilities (maintainability, reliability, etc.) - defined by stakeholders, hard to make concrete
Commented-out code is indicative of versioning happening in addition to version control (not just in DS). Solved through education; it reflects an experimental workstyle.
Reading code that contains commented-out code.
Does commented-out code impact the readability of the source code?
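The commented-out-code discussion above can be probed with a small heuristic: treat a comment line that parses as valid Python as a candidate smell. A minimal sketch, assuming Python sources; the function name and the "code-likeness" tokens are my own choices, not an established linter:

```python
# Heuristic sketch: flag comment lines that look like commented-out code.
# Assumption: Python sources; this mirrors the idea behind "eradicate"-style
# checks but is illustrative, not a real tool.
import ast

def flag_commented_out_code(source: str) -> list[int]:
    """Return line numbers whose comments look like commented-out code."""
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith("#"):
            continue
        candidate = stripped.lstrip("#").strip()
        if not candidate:
            continue
        try:
            ast.parse(candidate)          # does the comment parse as code?
        except SyntaxError:
            continue
        # Bare words ("# TODO", "# note") also parse as expressions, so
        # additionally require something code-like in the comment.
        if any(tok in candidate for tok in ("=", "(", "return ", "import ")):
            flagged.append(lineno)
    return flagged
```

A check like this could feed the readability question: count flagged lines per file and correlate with review time.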
Code smells reflect the personal preferences of the people who coined these phrases; they are mere guidelines.
Small companies don’t care about code smells (experiential)
Code smells should be avoided (low impact and unreliable)
Data versioning tool -> large datasets, experiments, storage perspective.
Motivation: data storage is a problem for cloud service providers. There is redundancy between versions of the data, and you shouldn't be storing all the features.
Store code transformations rather than datasets
Efficient caching
Think about the scale of data storage.
Is the transformed data the challenge with storing versions of the data?
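The "store code transformations rather than datasets" idea above can be sketched as a cache keyed on the raw data plus the transformation itself, so a derived version is only ever recomputed when either changes. Everything here (names, the in-memory cache) is illustrative, not an existing versioning tool:

```python
# Hedged sketch: version derived data as (base-data hash, transformation),
# and cache the materialised result instead of storing every version.
import hashlib
import pickle

_CACHE: dict[str, list] = {}  # stand-in for persistent storage

def version_key(base_bytes: bytes, transform) -> str:
    """Key = hash of the raw data plus the transformation's compiled code."""
    h = hashlib.sha256(base_bytes)
    code = transform.__code__
    h.update(code.co_code)                     # changes when the logic changes
    h.update(repr(code.co_consts).encode())    # ...or its constants change
    return h.hexdigest()[:16]

def materialise(base_rows, transform):
    """Recompute the derived data, or reuse the cached copy if unchanged."""
    key = version_key(pickle.dumps(base_rows), transform)
    if key in _CACHE:                          # cache hit: no recomputation
        return _CACHE[key]
    derived = [transform(row) for row in base_rows]
    _CACHE[key] = derived
    return derived
```

Under this scheme the storage cost per version is one transformation plus a cache entry, which speaks directly to the redundancy point above.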
Linting data, with diffs applied at the smell level -> in the context of ML, data smells and code smells are both impactful.
Identify smells that change the behaviour of ML. There is a need to define what constitutes an ML smell -> this is different from a code smell. Thus, ML smells are a) technical debt, b) actual defects, and c) concrete.
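As one example of a concrete, behaviour-changing ML smell: an unseeded train/test split silently breaks reproducibility. A minimal static check, assuming Python sources and scikit-learn's `train_test_split` name; the check itself is an illustrative candidate, not part of any established catalogue:

```python
# Hedged sketch of one candidate ML-smell check: a call to train_test_split
# without random_state, which makes the split (and results) irreproducible.
import ast

def find_unseeded_splits(source: str) -> list[int]:
    """Return line numbers of train_test_split calls without random_state."""
    smells = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both bare names and attribute access (module.train_test_split).
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
        if name != "train_test_split":
            continue
        if not any(kw.arg == "random_state" for kw in node.keywords):
            smells.append(node.lineno)
    return smells
```

Checks of this shape are static, cheap, and tied to a defined behavioural effect, which is what distinguishes an ML smell from a stylistic code smell here.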
Guidelines for dealing with technical debt ignore the commercial reality and focus on the ideal. This ties in with the context idea.
-> Case study of ML in small clients/companies (related work may exist for this?). Efficiency is vital. Tool support is key to realising the solutions in organisations.
Is static analysis sufficient for ML smells? Interactive environment: development phase, deployment phase.
CI/CD for ML (Thoughtworks)
Start at the upstream process at the data level rather than modelling.
Conclusions:
Code smells should be avoided (low impact and unreliable)
Focus on the messy upstream process at data collection rather than modelling
There is a need to define what constitutes an ML smell
Does commented-out code impact the readability of the source code?
Can diff algorithms be used for data versioning?
Data quality issues should be easy to validate/verify -> portable as data in their own right
The barrier between data and code is still valid
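On the data-versioning question in the conclusions: a standard sequence-diff algorithm over stably serialised rows yields a row-level data diff, so only changed rows need to be stored between versions. A minimal sketch using `difflib`; the JSON-per-row serialisation is an assumption, not a prescribed format:

```python
# Hedged sketch: apply a text diff algorithm (difflib's SequenceMatcher)
# to dataset rows serialised as stable JSON lines.
import difflib
import json

def data_diff(old_rows: list[dict], new_rows: list[dict]) -> list[str]:
    """Row-level diff between two dataset versions, as +/- lines."""
    old = [json.dumps(r, sort_keys=True) for r in old_rows]  # stable ordering
    new = [json.dumps(r, sort_keys=True) for r in new_rows]
    diff = []
    matcher = difflib.SequenceMatcher(a=old, b=new)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("replace", "delete"):
            diff.extend("- " + line for line in old[i1:i2])
        if op in ("replace", "insert"):
            diff.extend("+ " + line for line in new[j1:j2])
    return diff
```

Whether this scales to the large datasets mentioned above is exactly the open question: row-level diffs help with redundancy, but wide feature changes would still touch every line.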