@@ -14,36 +14,111 @@ this data after any operation that ClimaCore performs.
14
14
15
15
## Example
16
16
17
- ```@example
17
+ ### Print `NaNs` when they are found
18
+
19
+ In this example, we add a callback that prints `NaNs found!` each time NaNs
20
+ are detected in the result of a `ClimaCore` operation.
21
+
22
+ To do this, we need two ingredients:
23
+
24
+ First, we need to enable the callback system:
25
+ ```@example clima_debug
18
26
import ClimaCore
19
- using ClimaCore: DataLayouts
20
27
ClimaCore.DebugOnly.call_post_op_callback() = true
28
+ ```
29
+
30
+ The line `ClimaCore.DebugOnly.call_post_op_callback() = true` means that at the
31
+ end of every `ClimaCore` operation, the function
32
+ `ClimaCore.DebugOnly.post_op_callback` is called. By default, this function does
33
+ nothing, so the second ingredient is to define a method for it:
34
+ ```@example clima_debug
21
35
function ClimaCore.DebugOnly.post_op_callback(result, args...; kwargs...)
22
- if any(isnan, parent(data ))
23
- println("NaNs found!")
24
- end
36
+     if any(isnan, parent(result))
37
+         println("NaNs found!")
38
+     end
25
39
end
26
-
27
- FT = Float64;
28
- data = DataLayouts.VIJFH{FT}(Array{FT}, zeros; Nv=5, Nij=2, Nh=2)
29
- @. data = NaN
30
40
```
41
+ If needed, `post_op_callback` can be specialized to behave differently in
42
+ different cases, but here it only checks whether the given result contains `NaN`s.
31
43
32
44
Note that, due to dispatch, `post_op_callback` will likely need a very general
33
45
method signature. Using
34
46
`post_op_callback(result::DataLayouts.VIJFH, args...; kwargs...)` above fails (on the CPU),
35
47
because `post_op_callback` ends up getting called multiple times with different
36
48
data layouts.
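
The dispatch point above can be illustrated without ClimaCore. The following is a minimal sketch, assuming a stand-in module `MockDebugOnly` and a made-up operation `mock_fill!` (both hypothetical, written only for this illustration; they do not reproduce ClimaCore's internals). It shows how a method restricted to one result type misses hook calls made with other types, while a general signature sees them.

```julia
# Mock of a hook mechanism; names mirror the text, but this module is an
# illustration only and is NOT ClimaCore's implementation.
module MockDebugOnly
call_post_op_callback() = false
post_op_callback(result, args...; kwargs...) = nothing

# A made-up operation that calls the hook twice, with two different result types.
function mock_fill!(dest::Matrix{Float64}, value)
    column = view(dest, :, 1)          # a SubArray, not a Matrix
    column .= value
    call_post_op_callback() && post_op_callback(column, value)
    dest .= value
    call_post_op_callback() && post_op_callback(dest, value)
    return dest
end
end # module

MockDebugOnly.call_post_op_callback() = true

seen = String[]

# Narrow method: only matches results that are a Matrix.
MockDebugOnly.post_op_callback(result::Matrix, args...; kwargs...) =
    push!(seen, "narrow: " * string(nameof(typeof(result))))

MockDebugOnly.mock_fill!(zeros(2, 2), NaN)
narrow_only = copy(seen)               # the SubArray call was not observed

# General method: replaces the do-nothing fallback and catches everything else.
MockDebugOnly.post_op_callback(result, args...; kwargs...) =
    push!(seen, "general: " * string(nameof(typeof(result))))

empty!(seen)
MockDebugOnly.mock_fill!(zeros(2, 2), NaN)
println(narrow_only)
println(seen)
```

With only the narrow method defined, the hook call made with the view falls through to the do-nothing fallback, which is why a very general signature is the safer default.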
37
49
50
+ Now, let us put everything together and demonstrate a complete example:
51
+
52
+ ```@example clima_debug
53
+ import ClimaCore
54
+ ClimaCore.DebugOnly.call_post_op_callback() = true
55
+ function ClimaCore.DebugOnly.post_op_callback(result, args...; kwargs...)
56
+     if any(isnan, parent(result))
57
+         println("NaNs found!")
58
+     end
59
+ end
60
+
61
+ FT = Float64
62
+ data = ClimaCore.DataLayouts.VIJFH{FT}(Array{FT}, zeros; Nv=5, Nij=2, Nh=2)
63
+ @. data = NaN
64
+ ClimaCore.DebugOnly.call_post_op_callback() = false # hide
65
+ ```
66
+ This example should print `NaNs found!` to standard output.
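
A callback can report more than a fixed message. As a rough sketch, the helper below (the name `nonfinite_summary` is hypothetical and not part of ClimaCore) counts `NaN` and `Inf` entries in a plain array. Inside `post_op_callback` it could presumably be called as `nonfinite_summary(parent(result))`, assuming `parent(result)` returns an array of real numbers, as the `any(isnan, parent(result))` check suggests.

```julia
# Hypothetical helper (not part of ClimaCore): summarize non-finite entries.
function nonfinite_summary(A::AbstractArray{<:Real})
    return (; n_nan = count(isnan, A), n_inf = count(isinf, A), n = length(A))
end

# Demonstrate on a plain array standing in for `parent(result)`.
A = zeros(5, 2, 2)
A[1, 1, 1] = NaN
A[2, 1, 1] = Inf
report = nonfinite_summary(A)
println(report)
```

Note that `count` visits every element, whereas `any(isnan, ...)` can stop at the first hit, so a summary like this is best computed only after a cheap check has already found a problem.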
67
+
68
+ ### Infiltrating
69
+
70
+ [Infiltrator.jl](https://github.com/JuliaDebug/Infiltrator.jl) is a simple
71
+ debugging tool for Julia packages.
72
+
73
+ Here is an example where we use Infiltrator.jl to find, interactively, where
74
+ the NaNs are coming from.
75
+
76
+ ```julia
77
+ import ClimaCore
78
+ import Infiltrator # must be in your default environment
79
+ ClimaCore.DebugOnly.call_post_op_callback() = true
80
+ function ClimaCore.DebugOnly.post_op_callback(result, args...; kwargs...)
81
+     if any(isnan, parent(result))
82
+         println("NaNs found!")
83
+         # Capture the stack trace so that we know where this came from
84
+         st = stacktrace()
85
+
86
+         # Use Infiltrator.jl to exfiltrate the local variables.
87
+         # Afterwards, `Infiltrator.safehouse` will contain
88
+         # `result`, `args`, `kwargs` and `st`.
89
+         Infiltrator.@exfiltrate
90
+     end
91
+ end
92
+
93
+ FT = Float64
94
+ data = ClimaCore.DataLayouts.VIJFH{FT}(Array{FT}, zeros; Nv=5, Nij=2, Nh=2)
95
+ @. data = NaN
96
+ # Let's see what happened
97
+ (; result, args, kwargs, st) = Infiltrator.safehouse;
98
+
99
+ # You can print the stack trace, to see where the NaNs were found:
100
+ ClimaCore.DebugOnly.print_depth_limited_stack_trace(st; maxtypedepth=1)
101
+
102
+ # Once there, you can see that the call led you to `copyto!`.
103
+ # Inspecting `args` shows that the `Broadcasted` object used to populate the
104
+ # result was:
105
+ julia> args[2]
106
+ Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}}(identity, (NaN,))
107
+
108
+ # And there's your problem: NaN is on the right-hand side of that assignment.
109
+ ```
110
+
111
+ ### Caveats
112
+
38
113
!!! warn
39
114
40
- While this debugging tool may be helpful, it's not bullet proof. NaNs can
115
+ While `post_op_callback` may be helpful, it's not bulletproof. NaNs can
41
116
infiltrate user data any time internals are used. For example,
42
117
`parent(data) .= NaN` will not be caught by ClimaCore.DebugOnly, and errors can be
43
118
observed later than expected.
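
The reason such writes go unnoticed can be sketched with a toy wrapper. Here `MockLayout`, `mock_fill!`, and the `hook_calls` counter are all hypothetical stand-ins (not ClimaCore types or functions): the hook only runs inside the wrapped operation, so writing through `parent` never reaches it.

```julia
# Toy stand-in for a data layout that wraps a plain array (not a ClimaCore type).
struct MockLayout
    array::Matrix{Float64}
end
Base.parent(d::MockLayout) = d.array

hook_calls = Ref(0)

# A "hooked" operation: the counter stands in for a call to post_op_callback.
function mock_fill!(d::MockLayout, value)
    parent(d) .= value
    hook_calls[] += 1
    return d
end

d = MockLayout(zeros(2, 2))
mock_fill!(d, 1.0)     # goes through the hooked operation
parent(d) .= NaN       # writes to the underlying array directly; no hook runs

println("hook calls: ", hook_calls[])
println("contains NaN: ", any(isnan, parent(d)))
```

In this sketch the NaNs would only be noticed by the next hooked operation that touches `d`, which matches the warning that errors can be observed later than expected.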
44
119
45
120
!!! note
46
121
47
- This method is called in many places, so this is a performance-critical code
48
- path and expensive operations performed in `post_op_callback` may
49
- significantly slow down your code.
122
+ `post_op_callback` is called in many places, so this is a
123
+ performance-critical code path and expensive operations performed in
124
+ `post_op_callback` may significantly slow down your code.
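
If a check is too expensive to run after every operation, one possible mitigation is to run it only on every n-th call. The wrapper below is a sketch under that assumption; `every_nth` is a hypothetical helper, not a ClimaCore function, and sampling trades speed for precision, since the first operation that produced a NaN may be skipped.

```julia
# Hypothetical wrapper: run `check` only on every `every`-th call,
# returning `nothing` for the calls that are skipped.
function every_nth(check, every::Integer)
    calls = Ref(0)
    return function (args...)
        calls[] += 1
        return calls[] % every == 0 ? check(args...) : nothing
    end
end

# Usage: a NaN check that only runs on every third call.
sampled = every_nth(x -> any(isnan, x), 3)
results = [sampled([NaN]) for _ in 1:6]
println(results)
```

The returned closure could be invoked from inside a `post_op_callback` method; whether that is appropriate depends on how precisely the offending operation needs to be located.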