Your task is to apply loop unrolling and SIMD operations to the FAXPY operation (floating-point y = a*x + y). Some notes:
Do this lab at the C-level. The C-level solution is sufficient for full credit. Once you finish, if you want to challenge yourself, then do
the lab at the x86-level.
Note the execution units (EUs) in the Broadwell processor's execution engine, which should limit the number of times you can effectively unroll the loop.
This lab should use SSE/SSE2 operations, which further restrict the pipelines used by our FAXPY operation.
Use only SSE/SSE2 SIMD operations; do not use AVX. No need to do run-time checking for SSE/SSE2. SSE/SSE2 are considered monolithic.
In the run-time/compile-time checking lab we needed something to fall back on if AVX wasn't supported by the processor. Again, no need
for that here.
Do not forget the -O0 flag for gcc. Optimization levels -O1 through -O3 may unroll the loop for you, or insert other optimizations, and we need to
prevent unintentional optimizations.
An older version of this lab manual had you do IAXPY, but doing this with SSE/SSE2 isn't straightforward. Please do FAXPY instead.
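
To make the notes above concrete, here is a minimal C-level sketch of FAXPY with SSE intrinsics, unrolled by two. It is one possible shape, not the reference solution: the function names faxpy_scalar and faxpy_sse_unroll2, the unroll factor of 2, and the use of unaligned loads are all assumptions made for illustration. The right unroll factor depends on the execution units discussed above.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_mul_ps, _mm_add_ps, ... */

/* Scalar baseline: y[i] = a * x[i] + y[i]. (Hypothetical name.) */
static void faxpy_scalar(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* SSE version unrolled by 2: each loop iteration processes 8 floats
 * (two 128-bit vectors of 4 floats). The unroll factor is an assumption. */
static void faxpy_sse_unroll2(size_t n, float a, const float *x, float *y)
{
    __m128 va = _mm_set1_ps(a);  /* broadcast a into all 4 lanes */
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        /* Unaligned loads, so x and y need no special alignment. */
        __m128 x0 = _mm_loadu_ps(x + i);
        __m128 x1 = _mm_loadu_ps(x + i + 4);
        __m128 y0 = _mm_loadu_ps(y + i);
        __m128 y1 = _mm_loadu_ps(y + i + 4);

        /* SSE has no fused multiply-add: one multiply, then one add. */
        y0 = _mm_add_ps(_mm_mul_ps(va, x0), y0);
        y1 = _mm_add_ps(_mm_mul_ps(va, x1), y1);

        _mm_storeu_ps(y + i, y0);
        _mm_storeu_ps(y + i + 4, y1);
    }

    /* Scalar cleanup for the remaining n % 8 elements. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The two independent multiply/add chains per iteration are what give the processor more than one vector operation to issue at a time. Comparing the output against faxpy_scalar on the same input is a simple correctness check before timing anything.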