assessment_framework_eng.md
The [evaluation framework](#evaluation-framework-to-assess-the-maturity-of-an-organisation) below is the result of the participatory work initiated in the spring of 2019 by Labelia Labs (formerly Substra Foundation) and ongoing since then. It is based on the identification of the risks that a responsible and trustworthy practice of data science aims to prevent, and of the best practices that mitigate them. For each topic it also brings together technical resources that can serve as good entry points for interested organisations.
Last update: 1st semester 2023.
## Evaluation framework to assess the maturity of an organisation
---

### Section 1 - Protecting personal or confidential data and complying with regulatory requirements
**[Data privacy and regulatory compliance]**
The use of personal or confidential data carries the risk of exposure of such data, which can have very detrimental consequences for the producers, controllers or subjects of such data. Particularly in data science projects, they must therefore be protected and the risks of their leakage or exposure must be minimised. Additionally, AI models themselves can be attacked and must be protected. Finally, the regulatory requirements specific to AI systems must be identified and known, and the organisation's data science activities must comply with them.
[_[⇧ back to the list of sections](#evaluation-framework-to-assess-the-maturity-of-an-organisation)_]
[_[⇩ next section](#section-2---preventing-bias-developing-non-discriminatory-models)_]
---
</details>
<details>
<summary>Resources1.3 :</summary>

- (Academic paper) *[Do Foundation Model Providers Comply with the Draft EU AI Act?](https://crfm.stanford.edu/2023/06/15/eu-ai-act.html)*, Rishi Bommasani, Kevin Klyman, Daniel Zhang and Percy Liang (Stanford University, Center for Research on Foundation Models), June 2023
</details>
---
Q1.4 : **Applicable legislation and contractual requirements - Auditing and certification**
</details>
---
Q5.6 : **Logging predictions from AI models**
If your organisation provides or operates AI model-based applications for customers or third parties, it is key to implement prediction logging in order to enable the auditability of such applications and facilitate their continuous improvement. On that topic:
R5.6 :
_(Type: single answer)_

_(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)_

_(Specific risk domain: use of AI models, provision or operation of AI model-based applications for customers or third parties)_

- [ ] 5.6.a Our organisation does not use AI models on its own behalf or on behalf of its clients, and does not provide its clients with applications based on AI models | _(Concerned / Not concerned)_
- [ ] 5.6.b Logging predictions from AI models used in production is not yet systematically implemented
- [ ] 5.6.c We systematically log all predictions from AI models used in production (coupled with the input data and the associated model references)

<details>
<summary>Expl5.6 :</summary>

Using automatic systems based on AI models whose rules have been learned calls into question the way organisations design and operate their products and services. It is important to preserve the responsiveness and resilience of organisations using those AI models, particularly in dealing with situations where AI models have led to an undesirable outcome for the organisation or its stakeholders. To that end, logging the predictions of AI models used in production (coupled with the input data and the associated model references) is key to enabling ex-post auditability of concrete use cases. It should be noted that predictions might involve personal data and be regulated by the GDPR. Anonymisation of the processed data, when it is logged and made available to customers or internal operators, can be part of a solution to avoid leaking sensitive information.
</details>
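As an illustration of the practice described in Expl5.6, here is a minimal sketch (not part of the framework itself; the helper and field names are hypothetical) of what logging a prediction together with its model reference could look like. Hashing the serialised input is one simple way to keep raw, possibly personal, data out of the log while still allowing a logged prediction to be matched ex-post with an archived input.

```python
import hashlib
import json
import time

def log_prediction(log, model_ref, features, prediction):
    """Append one prediction record to a log (hypothetical helper).

    The input features are stored only as a SHA-256 digest, so the
    log itself does not leak raw, possibly personal, data."""
    record = {
        "timestamp": time.time(),
        "model_ref": model_ref,  # e.g. model name + version
        "input_sha256": hashlib.sha256(
            json.dumps(features, sort_keys=True).encode("utf-8")
        ).hexdigest(),
        "prediction": prediction,
    }
    log.append(record)
    return record

# Illustrative use, with a made-up model reference and input:
log = []
log_prediction(log, "churn-model:1.4.2", {"age": 42, "plan": "pro"}, 0.87)
```

In production one would typically write such records to an append-only store rather than an in-memory list, and keep the referenced model versions archived so that any logged prediction can be replayed.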
---
---
---
Q6.1 : **Environmental impact (energy consumption and carbon footprint)**

About the environmental impact of the data science activity in your organisation:
R6.1 :
_(Type: multiple responses possible)_
_(Select all the answer items that correspond to practices in your organisation)_
- [ ] 6.1.a At this stage we have not specifically studied the environmental impact of our data science activity or our AI models
- [ ] 6.1.b We have developed indicators that define what we want to measure regarding the energy consumption and the carbon footprint of our data science activity or our models
- [ ] 6.1.c We measure our indicators regularly
- [ ] 6.1.d We include their measurements in the model identity cards
- [ ] 6.1.e Monitoring our indicators on a regular basis is a formalised and controlled process, from which we define and drive improvement objectives
- [ ] 6.1.f We consolidate an aggregated view of the energy consumption and carbon footprint of our data science activities
- [ ] 6.1.g This aggregated view is taken into account in the global environmental impact evaluation of our organisation (e.g. carbon footprint, regulatory GHG evaluation, Paris Agreement compatibility score...)
- [ ] 6.1.h The energy consumption and carbon footprint of our data science activity or our models are made transparent to our counterparts and the general public
<details>
<summary>Expl6.1 :</summary>
It is important to question and raise awareness of environmental costs. In particular one can: (i) measure the environmental cost of data science projects, (ii) publish their environmental impact transparently, making explicit the split between training and production phases, (iii) improve on these indicators by working on different levers (e.g. infrastructure, model architecture, transfer learning, etc.). It has been demonstrated that such choices can affect the carbon footprint of model training by a factor of up to 100-1000 (see the resources below).
</details>
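To make point (i) of the explanation above concrete, here is a minimal sketch of the kind of indicator one can compute: converting a measured energy consumption into an estimated carbon footprint using a grid carbon-intensity factor and a datacenter overhead factor. The function name and all numbers are illustrative assumptions, not reference values; dedicated tools such as Code Carbon automate the measurement itself.

```python
def carbon_footprint_kg(energy_kwh, grid_intensity_kgco2_per_kwh, pue=1.0):
    """Estimate CO2-equivalent emissions of a compute job (sketch).

    energy_kwh: energy drawn by the hardware during the job
    grid_intensity_kgco2_per_kwh: carbon intensity of the local grid
    pue: Power Usage Effectiveness of the datacenter (overhead, >= 1.0)
    """
    return energy_kwh * pue * grid_intensity_kgco2_per_kwh

# Illustrative, made-up values: a 120 kWh training run, a grid at
# 0.4 kgCO2e/kWh, a PUE of 1.2.
estimate = carbon_footprint_kg(120.0, 0.4, pue=1.2)  # 57.6 kg CO2e
```

Per-run estimates of this kind can then be aggregated across projects to feed the organisation-wide view mentioned in answer item 6.1.f.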
- (Software & Tools) *[Code Carbon](https://codecarbon.io/)*: Python library for evaluating the carbon cost of executing a script
- (Web article) (In French) *[La frugalité, ou comment empêcher l’IA de franchir les limites](https://www.quantmetry.com/blog/ia-confiance-frugalite/)*, Geoffray Brelurut (Quantmetry), June 2023
- (Academic paper) *[Carbon Emissions and Large Neural Network Training](https://arxiv.org/ftp/arxiv/papers/2104/2104.10350.pdf)*, David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, Jeff Dean, 2021. Extract: *Remarkably, the choice of DNN, datacenter, and processor can reduce the carbon footprint up to ~100-1000X*
- (Academic paper) *[Estimating the carbon footprint of Bloom, a 176B parameter language model](https://arxiv.org/pdf/2211.02001.pdf)*, Alexandra Sasha Luccioni, Sylvain Viguier, Anne-Laure Ligozat, 2022. Extract: *While we will predominantly focus on model training, we will also take into account the emissions produced by manufacturing the computing equipment used for running the training, the energy-based operational emissions, as well as the carbon footprint of model deployment and inference*
- (Web article) (In French) *[IA durable : ce que les professionnels de la donnée peuvent faire](https://medium.com/quantmetry/ia-durable-et-sobri%C3%A9t%C3%A9-num%C3%A9rique-ce-que-les-professionnels-de-la-donn%C3%A9e-peuvent-faire-5782289b73cc)*, Geoffray Brelurut and Grégoire Martinon, May 2021