
  ![college](docs/college.png)

- This is a Julia v.0.6 implementation for the Invariant Causal Prediction algorithm of [Peters, Bühlmann and Meinshausen](https://doi.org/10.1111/rssb.12167). The method uncovers direct causes of a target variable from datasets under different environments (e.g., interventions or experimental settings).
+ This is a **Julia 1.x** implementation for the **Invariant Causal Prediction** algorithm of [Peters, Bühlmann and Meinshausen](https://doi.org/10.1111/rssb.12167). The method uncovers direct causes of a target variable from datasets under different environments (e.g., interventions or experimental settings).

  See also this [R package](https://cran.r-project.org/package=InvariantCausalPrediction) and [this report](docs/InvariantCausal.pdf).

  #### Changelog

- - 2018/06/20: version 0.1.1
+ - 2020/12/03: version 1.0.0 (Julia 1.x)
+ - 2018/06/20: version 0.1.1 (Julia 0.6)

  #### Dependencies

@@ -44,122 +45,146 @@ Generate a simple [Gaussian structure equation model](https://en.wikipedia.org/w
  ```julia
  julia> using InvariantCausal
  julia> using Random
- julia> Random.seed!(1926)
+ julia> Random.seed!(77)
  julia> sem_obs = random_gaussian_SEM(21, 3)
+
  Gaussian SEM with 21 variables:
  B =
       Sparsity Pattern
       ┌───────────┐
-    1 │⠀⠀⡄⠀⠐⡠⠀⡀⢢⡄⠀│ > 0
-      │⠠⠀⠄⡀⡸⡠⠠⡀⠀⢠⠀│ < 0
-      │⠀⠈⠠⠀⠈⠉⠀⠄⠀⠀⠀│
-      │⠀⢂⠢⠀⢨⢀⠀⡀⢀⠂⡒│
-      │⠀⠀⠀⠀⠠⢲⠀⠄⠀⠀⠐│
-   21 │⠀⠀⠹⠀⠀⠐⠐⠆⠐⠥⠀│
+    1 │⠀⠠⠀⠀⢐⠀⠀⠄⠀⢔⠀│ > 0
+      │⠠⠀⠠⠨⠁⠀⠄⠀⠀⠸⠀│ < 0
+      │⠠⠈⠈⠀⠌⠠⠀⠅⠀⠩⠉│
+      │⠠⣨⠴⠰⠪⠠⠄⠀⠸⠉⣐│
+      │⢀⠲⠈⢠⠠⠀⠀⠂⠀⠲⠁│
+   21 │⠀⠐⠀⠀⠠⠠⠀⠀⠀⠔⠀│
       └───────────┘
        1        21
- nz = 63 σ² = [1.3995969539576336, 1.3797542626927117, 1.8725924411035275, 1.1558670231511754, 0.6313157118985134, 1.3861564933413408, 1.4515091017758692, 1.7392330458711087, 1.55834175481778, 1.1102263218265493, 1.2459898446608833, 0.9582172366364653, 0.8341414371776826, 1.9452530507977812, 1.48880401416046, 1.5359339337413704, 1.691737599591161, 0.6496166911064964, 1.1210005303098285, 1.1459738623697713, 0.6920288559801938]
+ nz = 70 σ² = [1.9727697778060356, 1.1224733663047743, 1.1798805640594814, 1.2625825149076064, 0.8503782631176267, 0.5262963446298372, 1.3835334059064883, 1.788996301274282, 1.759286517329432, 0.842571682652995, 1.713382150423666, 1.4524484793202235, 1.9464648511794784, 1.7729995603828317, 0.7110857327642559, 1.6837378902964577, 1.085405687408806, 1.3069888003095986, 1.3933773717634643, 1.0571823834646068, 1.9187793877731028]
  ```

- Suppose we want to infer the direct causes for the last variables, which are
+ Suppose we want to infer the direct causes of the last variable (variable 21), i.e., variables 9, 11 and 18.

  ```julia
  julia> causes(sem_obs, 21)
- 2-element Array{Int64,1}:
-  2
-  5
+ 3-element Array{Int64,1}:
+  9
+  11
+  18
  ```

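As a rough illustration of what "direct causes" means here, the sketch below reads the parents of a variable off a coefficient matrix. It assumes the convention `X = B * X + ε`, where `B[i, j] != 0` means variable `j` is a parent of variable `i`. The matrix `B` and the helper `parents` are hypothetical and are not part of InvariantCausal, whose internal layout may differ.

```julia
# Hypothetical 4-variable linear SEM (assumed convention: X = B * X + ε,
# so a nonzero B[i, j] means "j is a direct cause of i").
B = [0.0  0.0  0.0  0.0;
     0.8  0.0  0.0  0.0;
     0.0 -0.5  0.0  0.0;
     1.2  0.0  0.7  0.0]

# Direct causes (parents) of variable i: column indices of nonzero entries in row i.
parents(B, i) = findall(!iszero, B[i, :])

parents(B, 4)  # variables 1 and 3 in this toy matrix
```

Under this assumed convention, the call `causes(sem_obs, 21)` in the README plays the same role for the simulated 21-variable SEM.
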
- Firstly, let us generate some observational data and call it environment 1.
+ Firstly, let us generate some observational data and call it **environment 1**.

  ```julia
  julia> X1 = simulate(sem_obs, 1000)
  ```

- Then, we simulate from environment 2 by performing do-intervention on variables 3, 4, 5, 6. Here we set them to fixed random values.
+ Then, we simulate from **environment 2** by performing **do-intervention** on variables 3, 4, 5, 6. Here we set them to fixed random values.

  ```julia
  julia> X2 = simulate(sem_obs, [3,4,5,6], randn(4), 1000)
  ```

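One way to picture a do-intervention in a linear Gaussian SEM is sketched below. It assumes the model `X = B * X + ε` with independent Gaussian noise. The helpers `simulate_linear` and `simulate_do` are hypothetical stand-ins written for this illustration; they are not the package's `simulate` and may differ from how it works internally.

```julia
using LinearAlgebra, Random

# Draw n samples (rows) from X = B*X + ε + shift, with ε_i ~ N(0, noise_var[i]).
# Assumption: B encodes a DAG, so I - B is invertible.
function simulate_linear(B, noise_var, n; shift=zeros(size(B, 1)), rng=Random.default_rng())
    p = size(B, 1)
    E = randn(rng, n, p) .* sqrt.(noise_var)' .+ shift'
    return E / (I - B)'          # row-wise solution of (I - B) x = ε + shift
end

# do-intervention: cut incoming edges of the targets and pin them to fixed values.
function simulate_do(B, noise_var, targets, values, n; rng=Random.default_rng())
    B_do = copy(B)
    B_do[targets, :] .= 0        # intervened variables no longer depend on their parents
    var_do = copy(noise_var)
    var_do[targets] .= 0         # and carry no noise
    shift = zeros(size(B, 1))
    shift[targets] .= values     # they are set to the given constants
    return simulate_linear(B_do, var_do, n; shift=shift, rng=rng)
end

B = [0.0 0.0 0.0; 0.5 0.0 0.0; 0.3 0.4 0.0]
X_do = simulate_do(B, ones(3), [2], [1.5], 100)   # column 2 is held at 1.5
```

Variables downstream of an intervened variable still respond to it through the unchanged rows of `B`, which is what makes the new environment informative.
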
- We run the algorithm on environments 1 and 2.
+ We run the algorithm on **environments 1 and 2**.

  ```julia
  julia> causalSearch(vcat(X1, X2)[:,1:20], vcat(X1, X2)[:,21], repeat([1,2], inner=1000))

- 8 variables are screened out from 20 variables with lasso: [2, 5, 6, 8, 13, 15, 16, 20]
- Causal invariance search across 2 environments with at α=0.01 (|S| = 8, method = chow)
-
- S = [] : p-value = 0.0000 [ ] ⋂ = [2, 5, 6, 8, 13, 15, 16, 20]
- S = [2] : p-value = 0.1376 [*] ⋂ = [2]
- S = [20] : p-value = 0.0000 [ ] ⋂ = [2]
- S = [16] : p-value = 0.0000 [ ] ⋂ = [2]
- S = [15] : p-value = 0.0000 [ ] ⋂ = [2]
- ...
- S = [2, 5, 6] : p-value = 0.3557 [*] ⋂ = [2]
- S = [5, 6, 20] : p-value = 0.1879 [*] ⋂ = Int64[]
+ 8 variables are screened out from 20 variables with lasso: [5, 7, 8, 9, 11, 12, 15, 17]
+ Causal invariance search across 2 environments with at α=0.01 (|S| = 8, method = chow, model = linear)
+
+ S = [] : p-value = 0.0000 [ ] ⋂ = [5, 7, 8, 9, 11, 12, 15, 17]
+ S = [5] : p-value = 0.0000 [ ] ⋂ = [5, 7, 8, 9, 11, 12, 15, 17]
+ S = [17] : p-value = 0.0000 [ ] ⋂ = [5, 7, 8, 9, 11, 12, 15, 17]
+ S = [15] : p-value = 0.0000 [ ] ⋂ = [5, 7, 8, 9, 11, 12, 15, 17]
+ S = [12] : p-value = 0.0000 [ ] ⋂ = [5, 7, 8, 9, 11, 12, 15, 17]
+ S = [11] : p-value = 0.0144 [*] ⋂ = [11]
+ S = [9] : p-value = 0.0000 [ ] ⋂ = [11]
+ S = [8] : p-value = 0.0000 [ ] ⋂ = [11]
+ S = [7] : p-value = 0.0000 [ ] ⋂ = [11]
+ S = [11, 5] : p-value = 0.0000 [ ] ⋂ = [11]
+ S = [11, 12] : p-value = 0.0000 [ ] ⋂ = [11]
+ S = [11, 15] : p-value = 0.0007 [ ] ⋂ = [11]
+ S = [7, 11] : p-value = 0.0082 [ ] ⋂ = [11]
+ S = [11, 8] : p-value = 0.0000 [ ] ⋂ = [11]
+ S = [9, 11] : p-value = 0.0512 [*] ⋂ = [11]
+ S = [17, 11] : p-value = 0.0000 [ ] ⋂ = [11]
+ S = [9, 12] : p-value = 0.0000 [ ] ⋂ = [11]
+ S = [9, 15] : p-value = 0.0064 [ ] ⋂ = [11]
+ S = [7, 9] : p-value = 0.0000 [ ] ⋂ = [11]
+ S = [9, 8] : p-value = 0.0000 [ ] ⋂ = [11]
+ S = [9, 5] : p-value = 0.7475 [*] ⋂ = Int64[]
+
+ Tested 21 sets: 3 sets are accepted.

  * Found no causal variable (empty intersection).

- ⋅ Variables considered include [2, 5, 6, 8, 13, 15, 16, 20]
+ ⋅ Variables considered include [5, 7, 8, 9, 11, 12, 15, 17]
  ```

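The `⋂` column in the log above is the running intersection of all accepted sets. Below is a minimal sketch of that bookkeeping, using a few (set, p-value) rows copied from the `+` lines of the log. It assumes a set is accepted when its p-value exceeds α, which is consistent with the `[*]` marks but has not been checked against the package source.

```julia
# (set, p-value) rows copied from the two-environment log above.
results = [([11],     0.0144),
           ([11, 15], 0.0007),
           ([7, 11],  0.0082),
           ([9, 11],  0.0512),
           ([9, 5],   0.7475)]

α = 0.01

# Assumed acceptance rule: invariance is not rejected when p > α.
accepted = [S for (S, p) in results if p > α]

# The estimate is the intersection of all accepted sets.
estimate = reduce(intersect, accepted)

isempty(estimate)  # true: [11], [9, 11] and [9, 5] share no common variable
```

This is why the run ends with an empty intersection even though variable 11 appears in two of the three accepted sets.
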
- The algorithm cannot find any direct causal variables (parents) of variable 21 due to insufficient power of two environments. The algorithm tends to discover more with more environments. Let us define a new environment where we perform a noise (soft) intervention that changes the equations for 5 variables other than the target. Note it is important that the target is left untouched.
+ The algorithm **cannot find any** direct causal variables (parents) of variable 21 because two environments provide **insufficient power**. The algorithm tends to **discover more** with **more environments**. Let us define a new environment where we perform a **noise (soft) intervention** that changes the equations for 5 variables other than the target. Note that it is important that the **target** is left **untouched**.

  ```julia
  julia> sem_noise, variables_intervened = random_noise_intervened_SEM(sem_obs, p_intervened=5, avoid=[21])

  (Gaussian SEM with 21 variables:
  B =
       Sparsity Pattern
-      ┌─────────────┐
-    1 │⠀⠀⠂⠄⠀⠔⠀⠀⠂⠂⡆│ > 0
-      │⢀⢠⠈⡀⠠⠠⣀⠀⠀⠅⠀│ < 0
-      │⠀⠐⠉⠀⠈⠠⠘⠀⠀⠆⠉│
-      │⠀⠐⢠⠀⠀⡀⠐⠀⢂⠀⡂│
-      │⠀⠠⢐⠀⠉⠵⠠⠁⠄⠈⠂│
-   21 │⠈⠄⠸⠀⠀⠈⠀⠀⠉⠀⠁│
-      └─────────────┘
+      ┌───────────┐
+    1 │⠀⠠⠀⠀⢐⠀⠀⠄⠀⢔⠀│ > 0
+      │⠠⠀⠠⠨⠁⠀⠄⠀⠀⠸⠀│ < 0
+      │⠠⠈⠈⠀⠌⠠⠀⠅⠀⠩⠉│
+      │⠠⣨⠴⠰⠪⠠⠄⠀⠸⠉⣐│
+      │⢀⠲⠈⢠⠠⠀⠀⠂⠀⠲⠁│
+   21 │⠀⠐⠀⠀⠠⠠⠀⠀⠀⠔⠀│
+      └───────────┘
        1        21
- nz = 63
- σ² = [1.3996, 1.20882, 1.87259, 1.15587, 0.631316, 1.38616, 1.45151, 1.73923, 2.55396, 1.11023, 1.24599, 0.958217, 0.506628, 1.94525, 2.16212, 1.53593, 1.69174, 0.649617, 1.121, 2.19366, 0.692029], [9, 15, 13, 2, 20])
+ nz = 70 σ² = [1.9727697778060356, 1.1224733663047743, 1.1798805640594814, 1.2625825149076064, 0.8503782631176267, 0.5262963446298372, 1.3835334059064883, 1.788996301274282, 1.759286517329432, 0.5837984015051159, 3.01957479564807, 0.9492838187140921, 1.9398913901673531, 1.7729995603828317, 0.7110857327642559, 1.6837378902964577, 1.2089053651343495, 1.3069888003095986, 1.3933773717634643, 1.0571823834646068, 1.9187793877731028], [17, 13, 10, 11, 12])
  ```

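In the new output above, the printed `σ²` entries differ from those of `sem_obs` exactly at the intervened indices. The sketch below mimics that behaviour under a simplifying assumption: only the noise variances of the chosen variables are rescaled. The helper `noise_intervention` and its rescaling rule are hypothetical; the package's `random_noise_intervened_SEM` may change the equations in other ways as well.

```julia
using Random

# Pick k variables (never those in `avoid`) and rescale their noise variances.
# The factor in [0.5, 1.5) is an arbitrary choice made for this illustration.
function noise_intervention(noise_var, k; avoid=Int[], rng=Random.default_rng())
    candidates = setdiff(1:length(noise_var), avoid)
    chosen = shuffle(rng, candidates)[1:k]
    new_var = copy(noise_var)
    new_var[chosen] .= noise_var[chosen] .* (0.5 .+ rand(rng, k))
    return new_var, chosen
end

noise_var = collect(1.0:6.0)
new_var, chosen = noise_intervention(noise_var, 3; avoid=[6])  # variable 6 plays the target
```

Keeping the target in `avoid` mirrors the requirement that the target's own equation stays the same across environments.
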
- Here the equations for variables 9, 15, 13, 2, 20 have been changed. Now we simulate from this modified SEM and call it environment 3. We run the algorithm on all 3 environments.
+ Here the equations for variables 17, 13, 10, 11, 12 have been changed. Now we simulate from this modified SEM and call it **environment 3**. We run the algorithm on all **3 environments**.

  ```julia
  julia> X3 = simulate(sem_noise, 1000)
  julia> causalSearch(vcat(X1, X2, X3)[:,1:20], vcat(X1, X2, X3)[:,21], repeat([1,2,3], inner=1000))
  ```

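The `causalSearch` call above relies on the stacked rows and the environment labels lining up. A small sketch with stand-in matrices (two rows per environment instead of 1000, constant entries instead of simulated data) shows how `vcat` and `repeat(..., inner=...)` fit together; only Base Julia functions are used.

```julia
# Stand-in data: every entry of environment k equals k, so rows are easy to trace.
X1 = fill(1.0, 2, 3)
X2 = fill(2.0, 2, 3)
X3 = fill(3.0, 2, 3)

X   = vcat(X1, X2, X3)            # 6×3 matrix, environments stacked row-wise
env = repeat([1, 2, 3], inner=2)  # [1, 1, 2, 2, 3, 3]: one label per stacked row

predictors = X[:, 1:2]            # all columns except the last
target     = X[:, 3]              # last column as the response

X[env .== 2, :]                   # recovers exactly the rows of X2
```

If the environments had different sample sizes, `inner` would no longer fit, and the label vector would need to be built per environment instead.
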
- The algorithm searches over subsets for a while and successfully discovers variables 2.
+ The algorithm searches over subsets for a while and successfully **discovers** variable 11. The other two causes, 9 and 18, can hopefully be discovered given even more environments.

  ```
- 8 variables are screened out from 20 variables with lasso: [1, 2, 5, 6, 8, 13, 15, 20]
- Causal invariance search across 3 environments with at α=0.01 (|S| = 8, method = chow)
-
- S = [] : p-value = 0.0000 [ ] ⋂ = [1, 2, 5, 6, 8, 13, 15, 20]
- S = [1] : p-value = 0.0000 [ ] ⋂ = [1, 2, 5, 6, 8, 13, 15, 20]
- S = [20] : p-value = 0.0000 [ ] ⋂ = [1, 2, 5, 6, 8, 13, 15, 20]
- S = [15] : p-value = 0.0000 [ ] ⋂ = [1, 2, 5, 6, 8, 13, 15, 20]
- S = [13] : p-value = 0.0000 [ ] ⋂ = [1, 2, 5, 6, 8, 13, 15, 20]
- S = [8] : p-value = 0.0000 [ ] ⋂ = [1, 2, 5, 6, 8, 13, 15, 20]
- S = [6] : p-value = 0.0000 [ ] ⋂ = [1, 2, 5, 6, 8, 13, 15, 20]
- S = [5] : p-value = 0.0001 [ ] ⋂ = [1, 2, 5, 6, 8, 13, 15, 20]
- S = [2] : p-value = 0.1714 [*] ⋂ = [2]
- S = [5, 1] : p-value = 0.0000 [ ] ⋂ = [2]
- S = [2, 5] : p-value = 0.2211 [*] ⋂ = [2]
- S = [5, 20] : p-value = 0.0000 [ ] ⋂ = [2]
- ...
- S = [1, 13, 2, 5, 8, 15, 6] : p-value = 0.4380 [*] ⋂ = [2]
- S = [20, 6, 13, 2, 5, 8, 15, 1] : p-value = 0.6916 [*] ⋂ = [2]
-
- * Causal variables include: [2]
-
- variable 1.0 % 99.0 %
- 2 0.5831 0.7054
-
- ⋅ Variables considered include [1, 2, 5, 6, 8, 13, 15, 20]
+ causalSearch(vcat(X1, X2, X3)[:,1:20], vcat(X1, X2, X3)[:,21], repeat([1,2,3], inner=1000))
+ 8 variables are screened out from 20 variables with lasso: [4, 5, 7, 8, 9, 11, 12, 16]
+ Causal invariance search across 3 environments with at α=0.01 (|S| = 8, method = chow, model = linear)
+
+ S = [] : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]
+ S = [4] : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]
+ S = [16] : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]
+ S = [12] : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]
+ S = [11] : p-value = 0.0084 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]
+ S = [9] : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]
+ S = [8] : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]
+ S = [7] : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]
+ S = [5] : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]
+ S = [4, 11] : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]
+ S = [11, 5] : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]
+ S = [11, 8] : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]
+ S = [7, 11] : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]
+ S = [9, 11] : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]
+ S = [16, 11] : p-value = 0.0709 [*] ⋂ = [11, 16]
+ S = [11, 12] : p-value = 0.0000 [ ] ⋂ = [11, 16]
+ ...
+ S = [7, 9, 4, 16, 11, 5, 12] : p-value = 0.0000 [ ] ⋂ = [11]
+ S = [7, 9, 4, 16, 11, 8, 12] : p-value = 0.0001 [ ] ⋂ = [11]
+ S = [7, 4, 9, 16, 11, 5, 8, 12] : p-value = 0.0002 [ ] ⋂ = [11]
+
+ Tested 256 sets: 6 sets are accepted.
+
+ * Causal variables include: [11]
+
+ variable 1.0 % 99.0 %
+ 11 0.1123 1.1017
+
+ ⋅ Variables considered include [4, 5, 7, 8, 9, 11, 12, 16]
  ```

  ### Functionalities
@@ -181,24 +206,8 @@ variable 1.0 % 99.0 %

  ### Features

- - High performance implementation in Julia v.0.6
+ - High performance implementation in Julia v1.x
  - Faster search:
    - skipping testing supersets of A if A is accepted (under `selection_only` mode)
    - Priority queue to prioritize testing sets likely to be invariant

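The superset-skipping idea in the feature list can be sketched as follows. If a set A is accepted, any superset of A leaves the intersection of accepted sets unchanged once A is already in it, so testing that superset cannot alter the selected variables. The function `search_with_skipping` and the toy test `is_invariant` are hypothetical and only illustrate the pruning; the package's own search, including its priority queue, is not reproduced here.

```julia
# Enumerate subsets of `candidates` by increasing size and skip supersets of accepted sets.
# `is_invariant(S)` stands in for the statistical invariance test.
function search_with_skipping(candidates, is_invariant)
    n = length(candidates)
    accepted = Vector{Vector{Int}}()
    tested = 0
    masks = sort(collect(0:(2^n - 1)), by=count_ones)   # smaller subsets first
    for m in masks
        S = [candidates[i] for i in 1:n if (m >> (i - 1)) & 1 == 1]
        any(A -> issubset(A, S), accepted) && continue  # superset of an accepted set
        tested += 1
        is_invariant(S) && push!(accepted, S)
    end
    estimate = isempty(accepted) ? Int[] : reduce(intersect, accepted)
    return estimate, tested
end

# Toy rule: a set passes exactly when it contains variable 11.
estimate, tested = search_with_skipping([9, 11, 12], S -> 11 in S)
# estimate == [11]; only 5 of the 8 subsets are actually tested
```

The saving grows quickly with the number of candidates, since every accepted small set removes all of its supersets from the queue.
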
- ### Todo
-
- - ~~Confidence intervals~~
- - ~~Logistic regression~~
- - ~~Variable screening~~
- - ~~glmnet~~
- - ~~HOLP~~
- - ~~Subsampling for large n in Chow's test~~
- - Nonparametric two-sample tests
- - Hidden variable case
- - ~~Inference of graph and plotting~~
-
- ### Issues
-
- - ~~Better reporting~~
-