As I redid this solution in Verilog, I decided to change my Rust implementation. Sadly, I had already written up my Rust solution, so my implementation in Rust will not quite match the source code here.
Instead of trying to rotate v = (1, 0) to (cos(t), sin(t)) in a single exact step, the reworked version leans on the sequence of fixed micro-rotations described below.
CORDIC stands for COordinate Rotation DIgital Computer. It is, broadly speaking, an algorithm, but also an idea for computing transcendental functions like sine, cosine, and tangent. These functions can already be computed using Taylor series, as it's commonly known that

$$\cos(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n}, \qquad \sin(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1}$$
These work great, but they are rather computationally expensive. Getting the precision you need is challenging because intermediate values and successive multiplications cost accuracy. You can address this with floating point, but that comes with its own challenges, and your system might not have floating-point hardware at all. And there's a bigger issue: if you are using floating point, you are missing out on the fun that is a fixed-point integer solution!
Instead, CORDIC offers a hardware-friendly method that you can implement even when all you have are shifters and adders. The method is also surprisingly accurate!
What is CORDIC doing, then? First, let's set the stage: say we want to calculate sin(t) or cos(t). If so, we get the following diagram
pi/2
| .
| /|
| / |
| C / |
| / |
| / | B
| / |
|/ t |
------------->-----
| A 1
|
|
|
|
|
-pi/2
Let C be a line of length 1 that terminates on the unit circle. Then we can say that

(A, B) = (cos(t), sin(t))
You might be seeing a problem: if we don't know cos(t) and sin(t) in the first place, we can't construct the exact rotation that lands on angle t. The trick is to apply a sequence of fixed rotations instead, each time deciding whether to rotate toward or away from the target based on how far off we still are. This way, even if we can't predict the exact rotation we need, we can pick a set of rotations that approach t and, in a sense, binary search for it.
This splits our problem into two pieces, a rotation and then a scalar multiply, which we can calculate separately. So for our binary search we need to select a series of angles, but which set? We could choose round values in degrees (90, 45, 22.5, ...), but CORDIC instead picks the angles $\theta_i = \arctan(2^{-i})$, because a rotation by $\arctan(2^{-i})$ can be carried out with nothing but shifts and adds.
Now, while we're at it, we can't forget that we also have those scalar factors: each shift-and-add rotation stretches the vector by $\sqrt{1 + 2^{-2i}}$, and the product of the inverse stretch factors converges to the constant $K \approx 0.6072529$. I also want to save the square of this value, which for these iterations was 0.3687561272. The reason why will become clear in a bit.
Now let's list the angles (in radians) we are going to use to converge to our target: $\arctan(2^{0}) \approx 0.785398$, $\arctan(2^{-1}) \approx 0.463648$, $\arctan(2^{-2}) \approx 0.244979$, $\arctan(2^{-3}) \approx 0.124355$, and so on, each roughly half the one before it.
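As a sketch of how these constants can be produced (the names `ITERATIONS` and `make_tables` are mine, not the post's code), here is a plain floating-point generator for the angle table and the gain K:

```rust
// Generate the CORDIC angle table arctan(2^-i) and the gain constant K.
// ITERATIONS and make_tables are hypothetical names for illustration.
const ITERATIONS: usize = 16;

fn make_tables() -> (Vec<f64>, f64) {
    let mut angles = Vec::with_capacity(ITERATIONS);
    let mut k = 1.0_f64;
    for i in 0..ITERATIONS {
        let p = (2.0_f64).powi(-(i as i32)); // 2^-i
        angles.push(p.atan()); // arctan(2^-i)
        // Each shift-rotation stretches the vector by sqrt(1 + 2^-2i);
        // K is the running product of the inverse stretch factors.
        k /= (1.0 + p * p).sqrt();
    }
    (angles, k)
}

fn main() {
    let (angles, k) = make_tables();
    println!("angles[0] = {} (pi/4)", angles[0]);
    println!("K   = {k}");     // ~0.6072529
    println!("K^2 = {}", k * k); // ~0.3687561
}
```

Running this reproduces the squared constant quoted above, 0.3687561272.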
So we can then calculate $(x_{i+1}, y_{i+1}, z_{i+1})$ as

$$x_{i+1} = x_i - d_i\,y_i\,2^{-i}, \qquad y_{i+1} = y_i + d_i\,x_i\,2^{-i}, \qquad z_{i+1} = z_i - d_i \arctan(2^{-i})$$

where $d_i = +1$ if $z_i \ge 0$ and $-1$ otherwise.
After each iteration we should have the parts for the next iteration, and after the final iteration we should be able to return

($Kx_n$ = cos($\theta$), $Ky_n$ = sin($\theta$))
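Before dropping to fixed point, the iteration can be sanity-checked in plain f64. This is a minimal sketch under my own naming (`cordic_f64`), not the post's implementation:

```rust
// Rotation-mode CORDIC in f64: drive the residual angle z toward zero,
// then undo the accumulated stretch with K. cordic_f64 is a name I chose.
fn cordic_f64(theta: f64, n: usize) -> (f64, f64) {
    let mut x = 1.0;
    let mut y = 0.0;
    let mut z = theta; // residual angle still to rotate through
    let mut k = 1.0;   // inverse of the accumulated stretch
    for i in 0..n {
        let p = (2.0_f64).powi(-(i as i32)); // 2^-i
        let d = if z >= 0.0 { 1.0 } else { -1.0 }; // rotate toward z = 0
        let (nx, ny) = (x - d * y * p, y + d * x * p);
        z -= d * p.atan();
        x = nx;
        y = ny;
        k /= (1.0 + p * p).sqrt();
    }
    (k * x, k * y) // (cos(theta), sin(theta))
}

fn main() {
    let (c, s) = cordic_f64(0.5, 32);
    println!("cos(0.5) ~ {c}, sin(0.5) ~ {s}");
}
```

Note the convergence domain: the angles only sum to about 1.74 rad, so inputs outside roughly ±π/2 need to be range-reduced first.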
Fixed point is just normal integer math on an ALU; what changes is how we interpret intermediate values. The normal way of looking at a binary number is to say that if it is n bits long it can represent either 0 to $2^n - 1$ (unsigned) or $-2^{n-1}$ to $2^{n-1} - 1$ (two's complement signed). In fixed point we instead declare some of the low bits to be fractional: in Q16.16, the top 16 bits are the integer part and the bottom 16 bits are the fraction, so every bit pattern is implicitly scaled by $2^{-16}$.
Thus, to represent 1 in Q16 our encoding would be 0x00010000. You could also represent 1.5, with the fraction bits at the halfway point, as 0x00018000. The smallest positive value you can represent is 0x00000001 = $\frac{1}{2^{16}}$ = 0.0000152587890625.
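As a tiny illustration (the function names are mine, though the post describes writing similar helpers), the Q16.16 encode/decode round trip looks like:

```rust
// Q16.16 encode/decode sketch; float2fixed/fixed2float are my own names
// for the kind of helpers described in the post.
fn float2fixed(x: f64) -> i32 {
    (x * 65536.0).round() as i32 // scale by 2^16 and round to nearest
}

fn fixed2float(x: i32) -> f64 {
    x as f64 / 65536.0 // undo the 2^16 scaling
}

fn main() {
    assert_eq!(float2fixed(1.0), 0x0001_0000);
    assert_eq!(float2fixed(1.5), 0x0001_8000);
    println!("{}", fixed2float(1)); // smallest positive step, 1/2^16
}
```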
I think a record of how I came to implement the algorithm might be useful.
When I was implementing this algorithm, I first started with an Fxp type, a signed A(15,16) type, and wrote fixed2float and float2fixed functions to get easy decimal printing. Note that when you convert between the two you might get significant noise error! Noise is something I haven't mentioned much, but when working with decimals, if an expression requires more bits of accuracy above or below than are available, you get an ever-so-slightly incorrect value. This matters when you implement algorithms with more and more additions, multiplications, and so on. Fixed point gets complex; I might write a follow-up when I feel confident explaining it fully.
After I had these types, I implemented the *, /, +, and - operators in every fashion in Rust. I am implementing these on an x86 machine, which means my 64-bit registers are essentially free, so rather than implementing clever 32-bit multiplications and additions I simply upcast, did the standard operation, and shifted back into range. For multiplication that is (A * B) >> 16 to scale back down. Why scale back down? If you multiply two numbers a * b and each can be on the order of $2^{16}$, the result will be on the order of $2^{32}$, which is not the correct range for the representation, so we scale back down. Note this does not prevent overflow; it just selects the correct range for your representation. If you overflow, you have a big problem. Division works similarly, but as (A << 16) / B. This is because B could be on the order of $2^{16}$, and if we divided directly our fixed-point scale would show the wrong thing.
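A minimal sketch of those two operations, with my own names `qmul`/`qdiv` and the 64-bit upcast described above:

```rust
// Q16.16 multiply/divide via 64-bit intermediates. Widening before the
// shift keeps the 32-bit-by-32-bit product from overflowing an i32.
fn qmul(a: i32, b: i32) -> i32 {
    ((a as i64 * b as i64) >> 16) as i32 // (A * B) >> 16
}

fn qdiv(a: i32, b: i32) -> i32 {
    (((a as i64) << 16) / b as i64) as i32 // (A << 16) / B
}

fn main() {
    // 1.5 * 2.0 = 3.0 and 3.0 / 2.0 = 1.5, all in Q16.16
    assert_eq!(qmul(0x18000, 0x20000), 0x30000);
    assert_eq!(qdiv(0x30000, 0x20000), 0x18000);
}
```

The upcast does not make overflow impossible, it only protects the intermediate; if the true result exceeds the Q16.16 range, the final cast still truncates.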
Finally I implemented CORDIC, but I started with some constant tables. You can use the ones I made, but you can also use your float2fixed function to generate the tables for you, like I did. After that I made the Cordic function (my original implementation didn't have the Cordic step broken out). Fold works really well in Rust for this function, since you use each Cordic angle exactly once. The iter step works well as a fold over the angle table, threading the intermediate values through as the accumulator. A full implementation shouldn't take more than 30 lines if you pass the intermediate values correctly.
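To make that shape concrete, here is a sketch of such a fold-based Q16.16 CORDIC. The table values were generated from arctan(2^-i) with a float2fixed-style conversion, and all the names are mine rather than the post's actual code:

```rust
// arctan(2^-i) in Q16.16, rounded to nearest, for 16 iterations.
const ANGLES: [i32; 16] = [
    51472, 30386, 16055, 8150, 4091, 2047, 1024, 512,
    256, 128, 64, 32, 16, 8, 4, 2,
];
const K: i32 = 39797; // gain constant 0.6072529 in Q16.16

fn qmul(a: i32, b: i32) -> i32 {
    ((a as i64 * b as i64) >> 16) as i32
}

/// Returns (cos(theta), sin(theta)) in Q16.16; theta must be inside the
/// CORDIC convergence range (roughly +-1.74 rad).
fn cordic(theta: i32) -> (i32, i32) {
    let (x, y, _z) = ANGLES.iter().enumerate().fold(
        (1_i32 << 16, 0_i32, theta), // (x0, y0, z0) = (1.0, 0.0, theta)
        |(x, y, z), (i, &a)| {
            if z >= 0 {
                // rotate counterclockwise: multiply by 2^-i via shifts
                (x - (y >> i), y + (x >> i), z - a)
            } else {
                // rotate clockwise
                (x + (y >> i), y - (x >> i), z + a)
            }
        },
    );
    (qmul(K, x), qmul(K, y)) // undo the accumulated stretch
}

fn main() {
    let (c, s) = cordic(32768); // 0.5 rad in Q16.16
    println!("cos(0.5) ~ {}", c as f64 / 65536.0);
    println!("sin(0.5) ~ {}", s as f64 / 65536.0);
}
```

The fold's accumulator is exactly the $(x_i, y_i, z_i)$ triple from the iteration equations, and each table entry is consumed once, which is what makes the fold such a natural fit.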
After that, I implemented a Taylor expansion with my fixed-point type, which quickly ran into overflow issues: it could estimate with at most 8 terms before overflowing, though 7 was more stable. The difference between the two approaches was marginal; against actual cosine, with 16 iterations I had 4 bits of accuracy. I read online that hardware will usually do 40 iterations after scaling inputs, gaining around 1 bit of accuracy per iteration. I would need to expand my fixed-point representation to support such accuracy, but I felt content after getting 5 bits.
I didn't do synthesis because I lack an FPGA and Verilog experience, but I wanted to try out Verilog's genvar keyword because it looked so elegant and Lisp-like. For the code gen (hardware gen?) I used Yosys and wrote a neat Makefile for generating CXXRTL backends plus a testbench for checking. Ideally I would have driven the CXXRTL side from Rust, but that seemed like something I didn't want to do at this time. The Verilog solution also uses Q8.16, which uses fewer flops than Q16.16.