Commit f3d4208

Merge branch 'master' of https://github.com/abulmo/edax-reversi
2 parents d4090b9 + ea50a0c commit f3d4208

2 files changed: +23 -113 lines changed

.github/workflows/build.yaml

Lines changed: 3 additions & 3 deletions
@@ -17,11 +17,11 @@ jobs:
1717
os: [ubuntu-latest, windows-latest, macos-latest]
1818
include:
1919
- os: ubuntu-latest
20-
build_command: make build ARCH=x64-modern COMP=gcc OS=linux
20+
build_command: make build ARCH=x86-64-v3 COMP=gcc OS=linux
2121
- os: windows-latest
22-
build_command: make build ARCH=x64 COMP=gcc OS=windows
22+
build_command: make build ARCH=x86-64-v3 COMP=gcc OS=windows
2323
- os: macos-latest
24-
build_command: make build ARCH=arm COMP=gcc OS=osx
24+
build_command: make build ARCH=armv8.5-a COMP=gcc OS=osx
2525

2626
steps:
2727
- uses: actions/checkout@v2

README.md

Lines changed: 20 additions & 110 deletions
@@ -21,7 +21,7 @@ mkdir -p bin
2121
cd src
2222

2323
# e.g. OS X sample
24-
make build ARCH=x64 COMP=gcc OS=osx
24+
make pgo-build ARCH=armv8-5a COMP=clang OS=osx
2525
cd ..
2626
./bin/mEdax
2727
```
@@ -35,7 +35,7 @@ docker run --name "edax" -v "$(pwd)/:/home/edax/" -it edax
3535
cd /home/edax/
3636
mkdir -p bin
3737
cd src
38-
make build ARCH=x64 COMP=gcc OS=linux
38+
make build ARCH=x86-64-v3 COMP=clang OS=linux
3939

4040
cd ..
4141
curl -OL https://github.com/abulmo/edax-reversi/releases/download/v4.4/eval.7z # e.g. use v4.4 eval.dat
@@ -51,111 +51,21 @@ cd src
5151
doxygen
5252
open ../doc/html/index.html
5353
```
54-
=======
55-
# edax-reversi-AVX
56-
Automatically exported from code.google.com/p/okuharaandroid-edax-reversi
57-
58-
=======
59-
# edax-reversi-AVX
60-
Automatically exported from code.google.com/p/okuharaandroid-edax-reversi
61-
62-
Edax is a strong othello program. Its main features are:
63-
64-
fast bitboard based & multithreaded engine.
65-
accurate midgame-evaluation function.
66-
opening book learning capability.
67-
text based rich interface.
68-
multi-protocol support to connect to graphical interfaces or play on Internet (GGS).
69-
multi-OS support to run under MS-Windows, Linux and Mac OS X.
70-
71-
>>>>>>> 81dec96 (Kindergarten last flip for arm32; MSVC arm Windows build (not tested))
72-
This is SSE/AVX optimized version of Edax 4.4.0. Functionally equivalent to the parent project, provided no bugs are introduced.
73-
74-
Thanks to AVX2, x64-modern build solves fforum-40-59.obf 60% faster than official edax-4.4 on Haswell, and runs level 30 autoplay 80% faster.
75-
76-
See http://www.amy.hi-ho.ne.jp/okuhara/bitboard.htm and http://www.amy.hi-ho.ne.jp/okuhara/edaxopt.htm for optimization details in Japanese.
77-
78-
## 1. Mobility (board_sse.c, board_mmx.c)
79-
80-
### 1.1 new SSE2 version of get_moves
81-
Diagonals are SIMD'd using vertical mirroring by bswap.
82-
83-
Athlon -get_moves_sse
84-
problem\fforum-20-39.obf: 111349635 nodes in 0:07.998 (13922185 nodes/s).
85-
mobility: 81.10 < 81.28 +/- 0.17 < 82.03
86-
Athlon +get_moves_sse
87-
problem\fforum-20-39.obf: 111349635 nodes in 0:07.889 (14114544 nodes/s).
88-
mobility: 71.08 < 71.72 +/- 0.34 < 73.53
89-
Core2 -get_moves_sse
90-
problem/fforum-20-39.obf: 111349635 nodes in 0:10.180 (10938078 nodes/s).
91-
mobility: 78.06 < 78.18 +/- 0.08 < 78.41
92-
Core2 +get_moves_sse
93-
problem/fforum-20-39.obf: 111349635 nodes in 0:09.978 (11159514 nodes/s).
94-
mobility: 60.84 < 61.19 +/- 0.13 < 61.47
95-
96-
### 1.2 can_move
97-
Now calls SIMD'd get_moves for x86/x64 build.
98-
99-
## 2. Stability (board.c, board_sse.c, board_mmx.c)
100-
101-
### 2.1 get_full_lines_h, get_full_lines_v
102-
get_full_lines for horizontal and vertical are simplified. The latter is compiled into rotation instrunction.
103-
104-
### 2.2 rearranged loop
105-
The last while loop is rearranged not to call bit_count in case stable == 0.
106-
107-
### 2.3 new SSE2 version with bswap and pcmpeqb
108-
Athlon -get_stability_sse
109-
stability: 90.10 < 90.28 +/- 0.24 < 91.20
110-
Athlon +get_stability_sse
111-
stability: 81.59 < 81.93 +/- 0.73 < 86.25
112-
Core2 -get_stability_sse
113-
stability: 79.24 < 79.39 +/- 0.15 < 79.93
114-
Core2 +get_stability_sse
115-
stability: 71.80 < 71.85 +/- 0.06 < 72.07
116-
117-
### 2.4 get_corner_stability
118-
Kindergarten version eliminates bit_count call.
119-
120-
### 2.5 find_edge_stable
121-
Loop optimization and flip using carry propagation. One time execution but affect total solving time.
122-
123-
## 3. eval.c (4.4.5)
124-
Eval feature calculation using SSE2 / AVX2 (now in eval_sse.c) improves midgame by 15-30% and endgame by 8-12%.
125-
Restoring eval from backup instead of rewinding.
126-
eval_open (one time execution) is also optimized.
127-
128-
## 4. hash.c
129-
I think hash->data.move[0] on line 677 should be hash->data.move[1].
130-
131-
## 5. board_symetry, board_unique (board.c, board_sse.c)
132-
SSE optimization and mirroring reduction. (Not used in solving game)
133-
134-
## 6. endgame_sse.c (4.4.7)
135-
Keep more variables in SSE registers. SSE optimized count_last_flip. Parity sort by shuffle.
136-
137-
## 7. board_get_hash_code (4.5.0)
138-
Changed to use CRC32c. This enables hardware acceleration on modern build.
139-
140-
## 8. AVX2 versions (x64-modern build only)
141-
In many cases AVX2 version is simplest, thanks to variable shift instructions (although they are 3 micro-op instructions).
142-
143-
Benchmarks are on Core i5-4260U (Haswell) 1.4GHz (TB 2.7GHz) single thread.
144-
145-
4.4.0 original x64-modern clang
146-
problem/fforum-20-39.obf: 111349635 nodes in 0:05.726 (19446321 nodes/s).
147-
+optimizations 1-5 above, no-avx2
148-
problem/fforum-20-39.obf: 111349635 nodes in 0:05.342 (20844185 nodes/s).
149-
+get_moves (board_sse.c)
150-
problem/fforum-20-39.obf: 111349635 nodes in 0:05.142 (21654927 nodes/s).
151-
+flip_avx.c
152-
problem/fforum-20-39.obf: 111349635 nodes in 0:04.946 (22513068 nodes/s).
153-
+count_last_flip_sse.c
154-
problem/fforum-20-39.obf: 111349635 nodes in 0:04.906 (22696624 nodes/s).
155-
156-
## 9. makefile
157-
gcc-old, x86 build should be -m32, not -m64. Some flags and defines added for optimization.
158-
<<<<<<< HEAD
159-
>>>>>>> b9d48c1 (Create README.md)
160-
=======
161-
>>>>>>> 81dec96 (Kindergarten last flip for arm32; MSVC arm Windows build (not tested))
54+
## version 4.6
55+
Version 4.6 is an evolution of version 4.4 that tries to incorporate the changes made by Toshihiko Okuhara in version 4.5.3 and to:
56+
- keep the code encapsulated: I reverted many pieces of version 4.5.3 code that had been manually inlined.
57+
- remove assembly code (intrinsics are good enough)
58+
- make some changes easily reversible with a macro switch (USE_SIMD, USE_SOLID, etc.)
59+
- remove buggy code and/or buggy file paths.
60+
- disable code (#if 0) that I found too slow on my cpu.
61+
- make soft CRC32c behave the same as the hardware CRC32c (version 4.5.3 is buggy here).
62+
- switch the code from C99 to C17 and use stdatomic.h, threads.h (if available) and stdalign.h.
63+
- remove bench.c: most of its functions get optimized out and cannot be measured.
64+
- support only 64 bit OSes.
65+
- this version is still in development and may change before I release it.
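
As an illustration of the CRC32c item above, here is a minimal sketch of a software CRC32c update that follows the semantics of the hardware instruction (raw update, no implicit inversion, low byte first). The function names are hypothetical and this is not the code from Edax; it only shows what "behaving the same as the hardware" can mean under those assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Edax's actual code): one-byte CRC32c update with
 * the semantics of the SSE4.2 crc32 instruction, i.e. a raw update of the
 * running value without any initial or final inversion. */
static uint32_t soft_crc32c_u8(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int i = 0; i < 8; ++i) {
        /* 0x82F63B78 is the bit-reflected Castagnoli polynomial. */
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* 64-bit update: feeds the eight bytes of value, least significant first,
 * which is the order the hardware instruction uses for a 64-bit operand. */
static uint32_t soft_crc32c_u64(uint32_t crc, uint64_t value)
{
    for (int i = 0; i < 8; ++i) {
        crc = soft_crc32c_u8(crc, (uint8_t)(value >> (8 * i)));
    }
    return crc;
}

/* Standard CRC-32C of a buffer, built on the raw update above. */
static uint32_t crc32c_standard(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc = soft_crc32c_u8(crc, *p++);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Self-check against the published CRC-32C check value for "123456789". */
static void crc32c_self_test(void)
{
    assert(crc32c_standard("123456789", 9) == 0xE3069283u);

    /* Same input through the 64-bit path: "12345678" packed little-endian. */
    uint32_t crc = soft_crc32c_u64(0xFFFFFFFFu, 0x3837363534333231ULL);
    crc = soft_crc32c_u8(crc, (uint8_t)'9');
    assert((crc ^ 0xFFFFFFFFu) == 0xE3069283u);
}
```

Calling `crc32c_self_test()` once at startup is a cheap way to catch a mismatch between a software fallback and the hardware path; a table-driven variant would be faster but is omitted here for brevity.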
66+
67+
## makefile
68+
The major change is that the ARCH options are no longer the same, as there are too many possible options to enable avx2, avx512, CRC32c, etc.
69+
Use make -help for a list of options.
70+
71+
