content/blog/northpole-ibm-neuromorphic-ai-hardware/index.md
Lines changed: 5 additions & 4 deletions
@@ -1,10 +1,7 @@
 ---
 title: "NorthPole, IBM's latest Neuromorphic AI Hardware"
-description: "Translating the NorthPole paper from IBM to human language."
+description: "A deep dive into IBM's NorthPole, a brain-inspired AI accelerator. Understand its architecture, 10 core axioms, and how it achieves groundbreaking energy efficiency for neural inference."
@@ -59,6 +56,7 @@ When you want to skip zero-computations, you need to introduce a structured appr
 fig
 src="dally-sparsity.png"
 caption="Sparse neural networks support in hardware [[William J. Dally]](https://www.computer.org/csdl/proceedings-article/hcs/2023/10254716/1QKTnGyUPbG)."
+alt="Sparse neural networks support in hardware"
 >}}
 
 ## Axiom 2 - Getting inspired by biological neurons
@@ -116,6 +114,7 @@ fig
 src="simd-mac.png"
 width=760px
 caption="The single-instruction-multiple-data MAC unit of NorthPole."
+alt="The single-instruction-multiple-data MAC unit of NorthPole."
 >}}
 
 Above it is shown a visual description of how this parallelism is exploited. The total word width is always 8 bit, but more values can be glued together to be processed in parallel in the MAC, which produces more outputs at once for the INT4 and INT2 precisions. This is why in the "Silicon implementation" section of the paper it is written:
@@ -139,6 +138,7 @@ fig
 src="temporal-vs-spatial.png"
 width=760px
 caption="Spatial (left) and temporal (right) architectures."
+alt="Spatial (left) and temporal (right) architectures."
 >}}
 
 Eyeriss [[Chen et al.](https://dspace.mit.edu/bitstream/handle/1721.1/101151/eyeriss_isscc_2016.pdf)] proposed this approach and taxonomy in 2016. Field programmable gate arrays (FPGAs) have been doing this since the beginning, with distributed SRAM near the logic or the special purpose macros available on the silicon. I do not know if it is brain-inspired but it makes sense from a silicon perspective if you want to maximize efficiency.
@@ -154,6 +154,7 @@ Take-home message: PEs communicate using dedicated busses, in what is called a n
 fig
 src="northpole-arch.png"
 caption="A snippet of NorthPole architecture [[Modha et al.](https://www.science.org/doi/10.1126/science.adh1174)]"