
Commit 199dc16

Merge pull request #199 from LabeliaLabs/release-2023H1
Umbrella branch for release 2023-H1
2 parents 65ab38b + bb4d34e commit 199dc16

3 files changed: +91 lines, -23 lines

assessment_framework_eng.md

Lines changed: 45 additions & 11 deletions
@@ -2,7 +2,7 @@
 
 The [evaluation framework](#evaluation-framework-to-assess-the-maturity-of-an-organisation) below is the result of the participatory work initiated in the spring of 2019 by Labelia Labs (ex-Substra Foundation) and ongoing since then. It is based on the identification of the risks that we are trying to prevent by aiming for a responsible and trustworthy practice of data science, and of the best practices to mitigate them. It also brings together, for each topic, technical resources that can be good entry points for interested organisations.
 
-Last update: 2nd semester 2022.
+Last update: 1st semester 2023.
 
 ## Evaluation framework to assess the maturity of an organisation

@@ -17,14 +17,14 @@ The evaluation is composed of the following 6 sections:
 
 ---
 
-### Section 1 - Protecting personal or confidential data
+### Section 1 - Protecting personal or confidential data and complying with regulatory requirements
 
-**[Data privacy]**
+**[Data privacy and regulatory compliance]**
 
-The use of personal or confidential data carries the risk of exposure of such data, which can have very detrimental consequences for the producers, controllers or subjects of such data. Particularly in data science projects, they must therefore be protected and the risks of their leakage or exposure must be minimised.
+The use of personal or confidential data carries the risk of exposure of such data, which can have very detrimental consequences for the producers, controllers or subjects of such data. Particularly in data science projects, they must therefore be protected and the risks of their leakage or exposure must be minimised. Additionally, AI models themselves can be attacked and must be protected. Finally, the regulatory requirements specific to AI systems must be identified and known, and the data science activities of the organisation must be compliant with them.
 
 [_[⇧ back to the list of sections](#evaluation-framework-to-assess-the-maturity-of-an-organisation)_]
-[_[⇩ next section](#section-2---preventing-bias-developing-non-discriminatory-models)
+[_[⇩ next section](#section-2---preventing-bias-developing-non-discriminatory-models)_]
 
 ---

@@ -100,6 +100,13 @@ In addition to identifying regulations and compliance approaches, it is importan
 
 </details>
 
+<details>
+<summary>Resources1.3 :</summary>
+
+- (Academic paper) *[Do Foundation Model Providers Comply with the Draft EU AI Act?](https://crfm.stanford.edu/2023/06/15/eu-ai-act.html)*, Rishi Bommasani, Kevin Klyman, Daniel Zhang and Percy Liang (Stanford University, Center for Research on Foundation Models), June 2023
+
+</details>
+
 ---
 
 Q1.4 : **Applicable legislation and contractual requirements - Auditing and certification**
@@ -1016,6 +1023,27 @@ Using automatic systems based on models whose rules have been "learned" (and not
 
 </details>
 
+---
+
+Q5.6 : **Logging predictions from AI models**
+If your organisation provides or operates AI model-based applications for customers or third parties, implementing prediction logging is key to enabling the auditability of such applications and to facilitating their continuous improvement. On that topic:
+
+R5.6 :
+_(Type: single answer)_
+_(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)_
+_(Specific risk domain: use of AI models, provision or operation of AI model-based applications for customers or third parties)_
+
+- [ ] 5.6.a Our organisation does not use AI models on its own behalf or on behalf of its clients, and does not provide its clients with applications based on AI models | _(Concerned / Not concerned)_
+- [ ] 5.6.b Logging predictions from AI models used in production is not yet systematically implemented
+- [ ] 5.6.c We systematically log all predictions from AI models used in production (coupled with the input data and the associated model references)
+
+<details>
+<summary>Expl5.6 :</summary>
+
+Using automatic systems based on AI models whose rules have been learned raises questions about the way organisations design and operate their products and services. It is important to preserve the responsiveness and resilience of organisations using these AI models, particularly when dealing with situations where an AI model has led to an undesirable outcome for the organisation or its stakeholders. To that end, logging the predictions of AI models used in production (coupled with the input data and the associated model references) is key to enabling ex-post auditability of concrete use cases (see the illustrative sketch below). It should be noted that predictions might involve personal data and be regulated by the GDPR. Anonymisation of the processed data, when it is logged and made available to customers or internal operators, can be part of the solution to avoid leaking sensitive information.
+
+</details>
+
 ---
 ---
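To make Expl5.6 concrete, here is a minimal illustrative sketch of prediction logging in Python, using only the standard library. It is not part of the assessment framework or of the commit above; the file name, the `log_prediction` function and the record fields are hypothetical choices, and a real deployment would typically use a database or an append-only store with restricted access.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical destination for the prediction log, in JSON Lines format.
LOG_PATH = Path("prediction_log.jsonl")


def log_prediction(model_ref: str, input_payload: dict, prediction: dict) -> None:
    """Append one record: timestamp, model reference, input data and prediction.

    The input payload is also hashed so that a record can still be cross-checked
    if the raw input is later anonymised or deleted (e.g. for GDPR compliance).
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_ref": model_ref,      # e.g. a model registry ID or git tag + version
        "input": input_payload,      # consider anonymising personal data first
        "input_sha256": hashlib.sha256(
            json.dumps(input_payload, sort_keys=True).encode("utf-8")
        ).hexdigest(),
        "prediction": prediction,
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    # Example call with dummy values.
    log_prediction(
        model_ref="churn-classifier:v1.3.0",
        input_payload={"age_band": "30-39", "tenure_months": 14},
        prediction={"label": "churn", "score": 0.82},
    )
```

Logging the model reference alongside each record is what makes ex-post audits possible: any past prediction can be traced back to the exact model version and input that produced it.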

@@ -1029,24 +1057,26 @@ The implementation of an automatic system based on an AI model can generate nega
 
 ---
 
-Q6.1 : **CO2 impact**
-About the CO2 impact of the data science activity in your organisation:
+Q6.1 : **Environmental impact (energy consumption and carbon footprint)**
+About the environmental impact of the data science activity in your organisation:
 
 R6.1 :
 _(Type: multiple responses possible)_
 _(Select all the answer items that correspond to practices in your organisation)_
 
-- [ ] 6.1.a At this stage we have not looked at the CO2 impact of our data science activity or our AI models
-- [ ] 6.1.b We have developed indicators that define what we want to measure regarding the CO2 impact of our data science activity or our models
+- [ ] 6.1.a At this stage we have not specifically studied the environmental impact of our data science activity or our AI models
+- [ ] 6.1.b We have developed indicators that define what we want to measure regarding the energy consumption and the carbon footprint of our data science activity or our models
 - [ ] 6.1.c We measure our indicators regularly
 - [ ] 6.1.d We include their measurements in the model identity cards
 - [ ] 6.1.e Monitoring our indicators on a regular basis is a formalised and controlled process, from which we define and drive improvement objectives
-- [ ] 6.1.f The CO2 impact of our data science activity or our models is made transparent to our counterparts and the general public
+- [ ] 6.1.f We consolidate an aggregated view of the energy consumption and carbon footprint of our data science activities
+- [ ] 6.1.g This aggregated view is taken into account in the global environmental impact evaluation of our organisation (e.g. carbon footprint, regulatory GHG evaluation, Paris Agreement compatibility score...)
+- [ ] 6.1.h The energy consumption and carbon footprint of our data science activity or our models are made transparent to our counterparts and the general public
 
 <details>
 <summary>Expl6.1 :</summary>
 
-It is important to question and raise awareness of environmental costs. In particular one can: (i) measure the environmental cost of data science projects, (ii) publish their environmental impact transparently, making explicit the split between the training and production phases, (iii) improve on these indicators by working on different levers (e.g. infrastructure, model architecture, transfer learning, etc.).
+It is important to question and raise awareness of environmental costs. In particular one can: (i) measure the environmental cost of data science projects, (ii) publish their environmental impact transparently, making explicit the split between the training and production phases, (iii) improve on these indicators by working on different levers (e.g. infrastructure, model architecture, transfer learning, etc.). It has been shown that such choices can change the carbon footprint of model training by a factor of up to ~100-1000 (see the estimation sketch and the resources below).
 
 </details>
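As a rough illustration of what "measuring" can mean here, and of why the levers mentioned in Expl6.1 matter so much, here is a back-of-the-envelope estimation sketch in Python. It mirrors the simple energy × carbon-intensity approach used by tools such as the ML Impact Calculator, and every figure below is an assumption to be replaced by measured values; it is not part of the commit above.

```python
# Back-of-the-envelope estimate of training emissions (illustrative figures only).
num_gpus = 8            # number of accelerators used for training
gpu_power_kw = 0.3      # assumed average power draw per GPU, in kW
training_hours = 120    # wall-clock training time, in hours
pue = 1.5               # assumed datacentre Power Usage Effectiveness
grid_intensity = 0.4    # assumed grid carbon intensity, in kg CO2-eq per kWh

energy_kwh = num_gpus * gpu_power_kw * training_hours * pue
co2_kg = energy_kwh * grid_intensity

print(f"Estimated energy consumption: {energy_kwh:.0f} kWh")    # -> 432 kWh
print(f"Estimated carbon footprint: {co2_kg:.0f} kg CO2-eq")    # -> 173 kg CO2-eq
```

Changing the region (grid carbon intensity), the hardware or the model architecture directly rescales this product, which is how the ~100-1000x variations cited in the resources below arise.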

@@ -1055,7 +1085,11 @@ It is important to question and raise awareness of environmental costs. In parti
 
 - (Software & Tools) *[ML Impact Calculator](https://mlco2.github.io/impact/)*
 - (Software & Tools) *[Code Carbon](https://codecarbon.io/)*: Python library for evaluating the carbon cost of executing a script (a minimal usage sketch is given after this list)
+- (Web article) (In French) *[La frugalité, ou comment empêcher l’IA de franchir les limites](https://www.quantmetry.com/blog/ia-confiance-frugalite/)*, Geoffray Brelurut (Quantmetry), June 2023
+- (Academic paper) *[Carbon Emissions and Large Neural Network Training](https://arxiv.org/ftp/arxiv/papers/2104/2104.10350.pdf)*, David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, Jeff Dean, 2021. Extract: *Remarkably, the choice of DNN, datacenter, and processor can reduce the carbon footprint up to ~100-1000X*
+- (Academic paper) *[Estimating the carbon footprint of Bloom, a 176B parameter language model](https://arxiv.org/pdf/2211.02001.pdf)*, Alexandra Sasha Luccioni, Sylvain Viguier, Anne-Laure Ligozat, 2022. Extract: *While we will predominantly focus on model training, we will also take into account the emissions produced by manufacturing the computing equipment used for running the training, the energy-based operational emissions, as well as the carbon footprint of model deployment and inference*
 - (Web article) (In French) *[IA durable : ce que les professionnels de la donnée peuvent faire](https://medium.com/quantmetry/ia-durable-et-sobri%C3%A9t%C3%A9-num%C3%A9rique-ce-que-les-professionnels-de-la-donn%C3%A9e-peuvent-faire-5782289b73cc)*, Geoffray Brelurut and Grégoire Martinon, May 2021
+- (Academic paper) *[Sustainable AI: Environmental Implications, Challenges and Opportunities](https://arxiv.org/abs/2111.00364)*, Facebook AI, 2021
 - (Web article) *[The carbon impact of artificial intelligence](https://www.nature.com/articles/s42256-020-0219-9)*, Payal Dhar, 2020
 - (Web article) *[AI and Compute](https://openai.com/blog/ai-and-compute/)*, OpenAI, 2018
 - (Academic paper) *[Green AI](https://cacm.acm.org/magazines/2020/12/248800-green-ai/fulltext)*, R. Schwartz et al., 2020
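To complement the Code Carbon entry above, here is a minimal usage sketch based on my understanding of the library's `EmissionsTracker` API; parameter names and defaults should be checked against the current codecarbon documentation, and `train_model` is a hypothetical placeholder.

```python
# pip install codecarbon
from codecarbon import EmissionsTracker


def train_model() -> None:
    """Placeholder for the actual training loop."""
    for _ in range(1_000_000):
        pass


# The tracker samples hardware power draw while running; stop() returns the
# estimated emissions in kg CO2-eq (and, by default, also writes a CSV report).
tracker = EmissionsTracker(project_name="demo_training")
tracker.start()
try:
    train_model()
finally:
    emissions_kg = tracker.stop()

print(f"Estimated training emissions: {emissions_kg:.6f} kg CO2-eq")
```

Measurements like this are exactly the kind of indicators referred to in R6.1.b-e, and they can be recorded in the model identity cards mentioned in R6.1.d.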
