Introduction to programming in x86-64 assembly
May 17, 2016

After messing around with high-level languages like C, Go and Python, I wanted to go lower, so I took to assembly. I’m jotting down everything I learn, hoping that this will help the present you or the future me in some way. I’m running a 64-bit operating system (Arch Linux) on a 3rd-generation Core i7 processor.

Installation

We’ll be using NASM as our assembler. On Arch Linux NASM can be easily installed using pacman:

sudo pacman -S nasm

You can find the general installation guide here, but I suggest looking into your operating system’s package manager first.

x86 Instruction Set

x86 is both an instruction set and an architecture. It is named after the line of Intel processors whose names ended in ‘86’.

The x86 architecture has 8 general-purpose registers (GPRs), 6 segment registers, 1 flags register and an instruction pointer. The 64-bit version adds 8 more GPRs, R8 through R15.

The 8 original GPRs, under their 64-bit names, are RAX, RCX, RDX, RBX, RSP, RBP, RSI and RDI. The 6 segment registers are SS, CS, DS, ES, FS and GS.

NASM Syntax

The NASM Syntax was designed to be easy to use and understand. The general syntax for NASM code is:

[label:] Instruction Operands

The first operand is the destination and, where the instruction needs it, also a source (as in the ADD instruction below), whereas the second operand is the source. For example:

ADD RDX, RAX

This instruction performs RDX = RDX + RAX.
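
To see the destination-and-source behaviour in action, here is a minimal sketch of a complete program that does the addition and hands the result back as its exit status. It relies on the _start entry point and the exit system call (number 60), both of which are explained later in this post. The values 2 and 3 are just ones I picked for illustration.

```nasm
section .text
global _start
_start:
  MOV RDX, 2      ; RDX = 2 (arbitrary example value)
  MOV RAX, 3      ; RAX = 3 (arbitrary example value)
  ADD RDX, RAX    ; RDX = RDX + RAX, so RDX is now 5

  MOV RDI, RDX    ; use the sum as the exit status
  MOV RAX, 60     ; system call number for sys_exit
  syscall
```

Once it is assembled and linked (the commands are near the end of this post), running it and then typing echo $? in the shell should print 5.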

Hello, World!

There is one thing left to understand before we get our hands dirty with assembly code. From here on I shall be using asm and assembly interchangeably. Asm programs can be divided into 3 sections.

  1. Data Section: This is used for declaring initialized data, such as constants.

  2. BSS Section: This is used for reserving space for uninitialized variables.

  3. Text Section: This is used for keeping the actual code.
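
As a rough sketch of how the three sections fit together, the program below copies one byte from the data section into the BSS section and then exits with that byte as its status. The labels answer and scratch are names I made up, resb is NASM’s way of reserving bytes without initializing them, and the rel keyword (my choice here, not something this post covers) asks for addresses relative to the instruction pointer. The exit system call is explained further down.

```nasm
section .data
  answer: db 42                   ; one initialized byte

section .bss
  scratch: resb 1                 ; one reserved, uninitialized byte

section .text
global _start
_start:
  MOV AL, [rel answer]            ; load the byte from the data section
  MOV [rel scratch], AL           ; store it into the BSS section
  MOVZX EDI, byte [rel scratch]   ; read it back as the exit status

  MOV RAX, 60                     ; system call number for sys_exit
  syscall
```

After assembling and linking it, echo $? should print 42.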

To print “hello, world!” we need to store the string in the data section. This is done as follows:

section .data
	str: db "hello, world!", 10, 0

Here the name of the constant is str, and db (define byte) stores the values that follow as a sequence of bytes. The 10 after the string is the ASCII code for newline, and the 0 is the null byte that terminates the string.

To actually print to the screen we need to invoke one of the system calls made available by the kernel. This is similar to using printf() in C, which internally invokes a system call. We make a system call either by raising a software interrupt with int 0x80 or by executing the syscall instruction. int 0x80 is the legacy 32-bit interface and is not the recommended way on x86-64, hence we’ll be using syscall.

When making a system call we pass the arguments in the GPRs. Each system call has a corresponding number, and we select the system call by placing that number in the RAX register. You can find the mapping between the system calls and their numbers over here.

The mapping between the GPRs and the parameters is:

Syscall No   Param 1   Param 2   Param 3   Param 4   Param 5   Param 6
RAX          RDI       RSI       RDX       R10       R8        R9

We can see that the system call we need for printing is sys_write(), whose signature is:

asmlinkage long sys_write(unsigned int fd, const char __user *buf, size_t count);

From this we can see that we need to pass 1 (stdout) in RDI, the address of str in RSI and the length of the string in RDX. The length is 15 bytes: 13 characters, the newline and the trailing 0.

The asm code for it is:

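MOV RAX, 1      ; system call number for sys_write
MOV RDI, 1      ; fd: 1 is stdout
MOV RSI, str    ; buf: address of the string
MOV RDX, 15     ; count: number of bytes to write
syscall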

Putting all of this together and adding code to exit gracefully, we have:

section .data
	str: db "hello, world!", 10, 0

section .text
global _start
_start:
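  MOV RAX, 1      ; system call number for sys_write
  MOV RDI, 1      ; fd: 1 is stdout
  MOV RSI, str    ; buf: address of the string
  MOV RDX, 15     ; count: number of bytes to write
  syscall

  MOV RAX, 60     ; system call number for sys_exit
  MOV RDI, 0      ; exit status 0
  syscall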

Here 60 is the system call number for sys_exit(). The global directive is how NASM exports a symbol: it makes the symbol visible in the generated object file so that the linker can see it. We mark the _start symbol global because ld looks for _start as the default entry point of the program, similar to main in C.
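
Counting the string length by hand is easy to get wrong, so here is a sketch of a variant that lets the assembler do the counting. In NASM, $ stands for the current position and equ defines an assemble-time constant, so $ - msg is the number of bytes emitted since the msg label. The names msg and msg_len are my own, and I have dropped the trailing 0 because sys_write takes an explicit length. Treat both as my assumptions rather than part of the program above.

```nasm
section .data
  msg:     db "hello, world!", 10   ; 13 characters plus a newline
  msg_len: equ $ - msg              ; bytes since msg, computed by NASM

section .text
global _start
_start:
  MOV RAX, 1          ; system call number for sys_write
  MOV RDI, 1          ; fd: 1 is stdout
  MOV RSI, msg        ; buf: address of the string
  MOV RDX, msg_len    ; count: filled in at assembly time
  syscall

  MOV RAX, 60         ; system call number for sys_exit
  MOV RDI, 0          ; exit status 0
  syscall
```

If the text of the string changes, the length follows along without any further edits.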

We assemble and link to create the executable as follows (assuming the file is saved as hello.asm):

nasm -f elf64 hello.asm -o hello.o
ld -o hello hello.o -m elf_x86_64

Now you can run the executable with ./hello to print “hello, world!”.

Extras

  1. You can also use int 0x80 instead of syscall, but then the system call numbers and the registers used for the arguments change. You can find the numbers for int 0x80 here.

  2. You can use a different symbol instead of _start (which is default), but then you’ll need to tell ld using the -e parameter.
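
For the first point, here is a rough sketch of the same program written against the int 0x80 interface. In the 32-bit table sys_write is number 4 and sys_exit is number 1, and the arguments go in EBX, ECX and EDX instead of RDI, RSI and RDX. I am assuming a kernel built with 32-bit emulation and an executable linked with the ld command shown earlier, so that the address of the string fits in 32 bits. The labels msg and msg_len are names I made up.

```nasm
section .data
  msg:     db "hello, world!", 10   ; 13 characters plus a newline
  msg_len: equ $ - msg              ; length computed by NASM

section .text
global _start
_start:
  MOV EAX, 4          ; sys_write in the 32-bit table
  MOV EBX, 1          ; fd: 1 is stdout
  MOV ECX, msg        ; buf: address must fit in 32 bits
  MOV EDX, msg_len    ; count: number of bytes to write
  int 0x80

  MOV EAX, 1          ; sys_exit in the 32-bit table
  MOV EBX, 0          ; exit status 0
  int 0x80
```

This is only meant to show how the numbers and registers differ. For new 64-bit code I would stick with syscall.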

