Saturday, 8th May, 2010
x86_64 Assembler, Part 1
Oh god. Oh god. WHY IS rcx BEING SET TO 0?!
This was effectively an exercise in getting to grips with x86_64 and gdb. Both powerful, both requiring years of use to really “know”.
Either way, I had a dabble, tackling both problem 5 and problem 6 in assembler. This blog post will cover problem 5.
A real challenge was that I’d never had used an assembler, or a linker, so I had to get those working along side getting my code to work. I chose to use the Linux system calls for getting output from my program, since I don’t actually know how those work deep down, but they’re there.
For my assembler, I settled for nasm, and for linking, I left that to GCC. I used nasm because it was what I first came across (with regards to tutorials) and it seemed to be the most likely to “just work” without additional complexity.
So in total, I had to work out:
- x86_64 Assembler
- nasm
- GCC linking stuff
- GDB
- Use of Linux system calls at a low level
To get both problem 5 and 6 working correctly took well over 3 days of on/off hacking.
I first set about writing a Greatest Common Divisor algorithm, using subtraction as opposed to division, because I’ve always been told that division is slow. Also, it seemed easier to implement a subtraction based algorithm in asm than a division one.
gcd: ; calculate the GCD of RAX and RBX. Leave the result in RAX push rbx gcd_st: ; Start of the GCD loop cmp rax, rbx je gcd_ret ; If a = b, return (a) jl gcd_lt ; elif a < b, return gcd(a-b, b), jg gcd_gt ; elif a > b, return gcd(b-a, a). gcd_ret: pop rbx ; Put rbx back where it was :-p ret gcd_lt: sub rbx, rax ; b = b - a jmp gcd_st gcd_gt: sub rax, rbx ; a = a - b jmp gcd_st
Oddly enough, to actually get that written down on paper was surprisingly easy! After that, I just needed to use it for a Lowest Common Divisor function:
lcm: ; Calculate the lcm of rax and rbx, leaving the result in rax. push rbx push rcx push rdx push rax mul rbx ; a = a * b mov rcx, rax ; c = a * b pop rax ; a = a call gcd ; a = gcd ( a, b ) xchg rax, rcx ; swap a*b and gcd(a, b) div rcx ; a = ( a * b ) / gcd ( a, b ) pop rdx pop rcx pop rbx ret
Quite simple really… note the pushes an pop’s at the beginning and end respectively. I wanted to save the state of the registers before entering the procedure. After that, it is a simple implementation of (a*b) / gcd(a,b).
That’s the bulk of the program right there, the rest of it is simply looping trough the numbers 1 … 20 and finding their gcd with an accumulator.
; Entry point section .text global _start ; The printf format string, "%lld\n". section .data msg db '%lld', 10, 0 ; Get access to the two system calls we need. extern printf extern _exit _start: mov rax, 1 mov rbx, 2 call loop_start loop_start: ; Begin the calculation, note that rax is an accumulator. call lcm ; rax = lcm ( rax, rbx ) inc rbx ; rbx++ cmp rbx, 21 ; if rbx >= 21... je write_result ; then exit jmp loop_start ; else continue looping write_result: ; write the result to the screen mov rsi, rax mov rdi, msg mov rax, 0 call printf jmp exit exit: ; Call the exit system call. push 42 call _exit
Ow! That took me the better part of a day and a half. Calling printf especially. Most of the tutorials out there are aimed at x86, not x86_64. This causes quite a lot of confusion. Especially when you use GCC & C to write some assembler that uses printf and it just uses 32-bit methods.
And for good measure, I put the lines to assemble and link it in a shell script:
nasm -f elf64 euler5.s gcc -nostartfiles -o euler5 euler5.o -lc
And of course, let’s see how long it takes to run (With the python version I wrote much earlier for comparison):
dst502@milan:~/Code/asm$ time ./euler5 XXXXXXXXX real 0m0.022s user 0m0.020s sys 0m0.002s dst502@milan:~/Code/asm$ time python euler5.py XXXXXXXXX real 0m0.047s user 0m0.039s sys 0m0.007s
A massive thanks to those on ##proggit who helped me suss out nasm, gcc and x86_64 in general, GeDaMo, ryan[WIN], I’m looking at you :-p
Done! :-D