Friday, 14th May, 2010
x86_64 Assembler, Part 2
So, once again I set forth into the wilderness of asm programming, with very little to help me, bar a 64-bit Intel Core 2 Duo (I think… it was a uni box), a few friends busying themselves with work beside me, and the help of those on IRC. It was going to be a long night.
Having already overcome the main hurdles of working out how to print an answer to the screen, how to get it to assemble and link, and the basics of GCC, I just had to get two loops working nicely together. No biggy. Problem 6, possibly one of the easiest out of the first 10.
The first part was simply about getting the sum 1^2 + 2^2 + … + 100^2.
sum_squares: ; Begin the calculation, rax = 1^2 + 2^2 + ... rbx^2 ; First, save the registers to the stack. push rbx push rcx push rdx push rsi ; Next, set the initial values mov rcx, 0 mov rsi, 1 mov rsi, rax loop_sum_squares: mul rax ; rax = rax^2 add rcx, rax ; Add it to the accumulator inc rsi ; Increment the loop counter mov rax, rsi ; Prepare for the next iteration cmp rsi, rbx ; See if we have to do the next iteration je sum_sq_ret ; We reached the limit - go to the exit sequence. jl loop_sum_squares ; Not reached it yet, keep looping. sum_sq_ret: ; Exit sequence. mov rax,rcx ; Put the result in rcx. ; Restore the registers to their original state before leaving. pop rsi pop rdx pop rcx pop rbx ret
The sum_squares is simply a driver function, ensuring that the accumulator is 0, saving the registers and so forth. loop_sum_squares is where it all happens, rsi is the loop counter, keeping track of where we are, and rax is where the squaring is done. Once it’s squared, it’s added to the accumulator in rcx. It then checks rsi and rbx to see if we’ve reached our limit. If we have, jump to the exit procedure. If not, continue looping.
The code for (1 + 2 + … + 100)^2 is vastly similar, relying on a counter and an accumulator.
sum_to: ; Save the registers push rcx push rdx ; Initial accumulator value mov rcx, 0 loop_sum_to: add rcx, rax ; acc = acc + current inc rax ; current++ cmp rax, rbx jl loop_sum_to ; if rax < rbx, continue looping je sum_to_ret ; else go to the return sequence sum_to_ret: mov rax, rcx ; put the result in the right place ; restore the registers pop rcx pop rdx mul rax ; Square the sum. ret ; return
Now all that I needed was to call these and find the difference between them. Not hard at all.
section .text global _start section .data msg db '%lld', 10, 0 extern printf extern _exit _start: mov rax, 1 ; 1^2 + 2^2 + ... +101^2 mov rbx, 101 ; limit call sum_squares push rax ; save the result for later mov rax, 1; call sum_to ; rax = (1 + 2 + 3 + ... + 100)^2 pop rbx ; rbx = 1^2 + 2^2 + ... + 100^2 sub rax, rbx jmp write_result write_result: ; write the result to the screen mov rsi, rax mov rdi, msg mov rax, 0 call printf jmp exit exit: ; Call the exit system call. push 42 call _exit
Simple, really. Mathematically simpler than problem 5, actually. And a brief comparison between this implementation and the Python one:
dst502@milan:~/Code/asm$ time ./euler6 XXXXXXXX real 0m0.002s user 0m0.000s sys 0m0.002s dst502@milan:~/Code/asm$ time python ./euler6.py XXXXXXXX real 0m0.064s user 0m0.047s sys 0m0.004s
Clearly, the asm is faster. But bear in mind, it took me around 3 days to write the asm versions of problem 5 and problem 6, whereas the Python versions took barely 10 minutes between them.
Done! :-D