Commit a9aa47e

author
Thomas Koenig
committed
Add a short chapter about STAT and image states.
1 parent c13ddec commit a9aa47e

1 file changed

+61
-1
lines changed

tutorial.md

Lines changed: 61 additions & 1 deletion
@@ -413,7 +413,11 @@ can adjust the bounds. This, for example, would be legal:
413413
allocate (a(from:to)[*])
414414
```
415415
and give you an index running from `1` to `num_images * n`, but
416-
you would still have to specify the correct coarray.
416+
you would still have to specify the correct coindices.
417+
418+
`ALLOCATE` and `DEALLOCATE` also do implicit synchronization,
419+
so you can use the allocated coarrays directly, no need to
420+
specify any `SYNC` variant.
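
As a minimal sketch of what this implicit synchronization does and does not cover, the module below writes to a neighbouring image right after `ALLOCATE`. The names `ring_demo` and `ring_exchange` are made up for this illustration. The remote write needs no preceding `SYNC`, because the coarray exists on every image once `ALLOCATE` completes. The data written by the neighbour still has to be ordered before the local read, so a `SYNC ALL` remains necessary there.

```fortran
module ring_demo
  implicit none
contains
  ! Hypothetical helper: each image stores its index on its right-hand
  ! neighbour and returns the value it received from its left-hand one.
  function ring_exchange() result(received)
    integer :: received
    integer, allocatable :: a(:)[:]
    integer :: right

    ! ALLOCATE of a coarray synchronizes all images, so a exists
    ! everywhere once this statement completes.
    allocate (a(1)[*])

    ! Wrap around at the last image.
    right = merge(1, this_image() + 1, this_image() == num_images())
    a(1)[right] = this_image()   ! remote write, no SYNC needed beforehand

    ! The neighbour's write must be ordered before the local read.
    sync all
    received = a(1)

    ! DEALLOCATE synchronizes too, so no image frees a early.
    deallocate (a)
  end function ring_exchange
end module ring_demo
```

Calling `ring_exchange()` from a main program that uses this module should return, on each image, the index of its left-hand neighbour.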
417421

418422
# More advanced synchronization - `SYNC IMAGES`
419423

@@ -631,6 +635,62 @@ And here is its output:
631635
1 T T T
632636
All: T F F
633637
```
638+
# Errors, error discovery and program termination
639+
640+
What happens when errors occur and images terminate needs to be
641+
defined carefully. Fortran has facilities to detect failure on
642+
individual compute nodes and offers ways to deal with it.
643+
644+
## Image states
645+
646+
There are three states that an image can be in: It can be an
647+
- *active image* if it is running normally
648+
- *stopped image* if it has been terminated normally by reaching
649+
the end of the main program or by executing a `STOP` statement.
650+
- *failed image* if it has stopped working for some reason
651+
(for example a hardware failure) or has executed a `FAIL IMAGE`
652+
statement.
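
The three states can also be queried from within a program. The following is a sketch only: it assumes that your compiler implements the Fortran 2018 intrinsic `IMAGE_STATUS`, and the names `image_state_demo`, `state_name` and `report_states` are invented for this example. `IMAGE_STATUS` returns `STAT_FAILED_IMAGE` for a failed image, `STAT_STOPPED_IMAGE` for a stopped one, and zero otherwise.

```fortran
module image_state_demo
  use iso_fortran_env, only : STAT_FAILED_IMAGE, STAT_STOPPED_IMAGE
  implicit none
contains
  ! Hypothetical helper: translate a status value into a readable name.
  function state_name(status) result(name)
    integer, intent(in) :: status
    character(len=:), allocatable :: name

    if (status == STAT_FAILED_IMAGE) then
      name = "failed"
    else if (status == STAT_STOPPED_IMAGE) then
      name = "stopped"
    else if (status == 0) then
      name = "active"
    else
      name = "unknown"
    end if
  end function state_name

  ! Hypothetical helper: print the state of every image.
  subroutine report_states()
    integer :: i

    do i = 1, num_images()
      print *, "image", i, "is ", state_name(image_status(i))
    end do
  end subroutine report_states
end module image_state_demo
```

Calling `report_states()` on one image gives a snapshot only; another image may stop or fail right after it has been queried.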
653+
654+
Once an image is in a stopped or failed state, there is no coming
655+
back - it will always remain in that state. An image can also be
656+
terminated by an *error condition*; all other images should then also
657+
be terminated by the system as soon as possible. This is what
658+
usually happens when you try to allocate an already allocated
659+
variable, or open a non-existent file for reading, without specifying
660+
a `STAT` variable (`IOSTAT` in the case of `OPEN`).
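
To illustrate the difference a `STAT` variable makes, here is a small sketch using an ordinary (non-coarray) array. The names `safe_alloc_demo` and `try_allocate` are hypothetical. Allocating an already allocated variable then yields a nonzero status instead of terminating the program.

```fortran
module safe_alloc_demo
  implicit none
contains
  ! Hypothetical helper: try to allocate x with n elements and
  ! return the status instead of letting the program terminate.
  function try_allocate(x, n) result(stat)
    real, allocatable, intent(inout) :: x(:)
    integer, intent(in) :: n
    integer :: stat
    character(len=200) :: msg

    allocate (x(n), stat=stat, errmsg=msg)
    ! msg is only defined when an error occurred.
    if (stat /= 0) print *, "allocate failed: ", trim(msg)
  end function try_allocate
end module safe_alloc_demo
```

A first call on an unallocated array should return zero. A second call on the same array should return a nonzero value and leave the array as it was.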
661+
662+
## Look at the state you are in
663+
664+
If you synchronize with a failed or stopped image, try to
665+
allocate or deallocate a variable there or other similar things,
666+
what is the system to do? Without direction from the programmer,
667+
it will simply terminate the program (an error condition, as above).
668+
This is not very useful as a fail-safe tactic.
669+
670+
However, the programmer can specify a `STAT` and optionally an
671+
`ERRMSG` argument to catch the error and act accordingly. It
672+
is then possible to compare the value returned for the `STAT`
673+
argument against predefined values from `iso_fortran_env` and
674+
then use the intrinsic functions `FAILED_IMAGES()` and
675+
`STOPPED_IMAGES()` to look up which images failed or stopped.
676+
677+
```
678+
program main
679+
  use iso_fortran_env, only : STAT_FAILED_IMAGE, STAT_STOPPED_IMAGE
680+
  integer :: sync_stat
681+
  sync all (stat=sync_stat)   ! nonzero sync_stat signals a problem
682+
  if (sync_stat /= 0) then
683+
    if (sync_stat == STAT_FAILED_IMAGE) then
684+
      print *,"Failed images: ", failed_images()
685+
    else if (sync_stat == STAT_STOPPED_IMAGE) then
686+
      print *,"Stopped images: ", stopped_images()
687+
    else
688+
      error stop "Unforeseen error, aborting"
689+
    end if
690+
  end if
691+
end program main
692+
```
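
The optional `ERRMSG` argument can be used in the same way. The following sketch wraps `SYNC ALL` in a function; the names `sync_check_demo` and `sync_ok` are made up for this illustration, and the text placed in the message is processor dependent.

```fortran
module sync_check_demo
  implicit none
contains
  ! Hypothetical helper: synchronize all images and report whether
  ! the synchronization completed without an error condition.
  function sync_ok() result(ok)
    logical :: ok
    integer :: stat
    character(len=200) :: msg

    msg = ""   ! errmsg is left untouched when no error occurs
    sync all (stat=stat, errmsg=msg)
    ok = (stat == 0)
    if (.not. ok) then
      print *, "image", this_image(), ": sync all returned", stat
      print *, trim(msg)
    end if
  end function sync_ok
end module sync_check_demo
```

A caller could use `if (.not. sync_ok())` to branch into recovery code rather than letting the whole program terminate.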
693+
634694
# Getting it to work
635695

636696
## Using gfortran
