After messing around with high-level languages like C, Go and Python, I wanted to go lower, so I took to assembly. I’m jotting down everything I learn, hoping that it will help the present you or the future me in some way. I’m running a 64-bit operating system (Arch Linux) on a 3rd-generation Core i7 processor.
We’ll be using NASM as our assembler. On Arch Linux NASM can be easily installed using pacman:
sudo pacman -S nasm
You can find the general installation guide here, but I suggest looking into your operating system’s package manager first.
x86 Instruction Set
x86 is both an instruction set and an architecture. It is named after the line of Intel processors whose names ended in ‘86’, such as the 8086, 80286 and 80386.
The x86 architecture has 8 General-Purpose Registers (GPRs), 6 Segment Registers, 1 Flag Register and an Instruction Pointer. The 64-bit version (x86-64) adds eight more general-purpose registers, R8 to R15.
The 8 GPRs are RAX, RBX, RCX, RDX, RBP, RSP, RSI and RDI. The 6 Segment Registers are CS, DS, ES, FS, GS and SS.
The NASM Syntax was designed to be easy to use and understand. The general syntax for NASM code is:
[label:] Instruction Operands
The first operand is the destination and, where the instruction needs it, also a source (see the ADD instruction below), whereas the second operand is the source. For example:
ADD RDX, RAX
This instruction performs RDX = RDX + RAX.
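To see the destination-first operand order in action, here is a minimal sketch of a complete program. It assumes Linux on x86-64 and borrows the exit system call (number 60, with the exit status in RDI) that is covered later in this post. The file name operands.asm is only a placeholder.

```nasm
; operands.asm (hypothetical file name)
section .text
global _start

_start:
    MOV RAX, 5          ; RAX = 5
    MOV RDX, 10         ; RDX = 10
    ADD RDX, RAX        ; RDX = RDX + RAX = 15, RAX still holds 5
    SUB RDX, 3          ; RDX = RDX - 3 = 12

    MOV RDI, RDX        ; use the result as the exit status
    MOV RAX, 60         ; system call number for exit
    syscall
```

After assembling and linking it the same way as the hello world program below, running it and then typing `echo $?` in the shell should print 12, the value left in RDX.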
There is one thing left to understand before we get our hands dirty with assembly code. From here on I shall be using asm and assembly interchangeably. Asm programs can be divided into 3 sections.
Data Section: This is used for declaring initialized data, such as constants.
BSS Section: This is used for reserving space for uninitialized variables.
Text Section: This is used for keeping the actual code.
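As a rough illustration of how the three sections fit together, the sketch below copies a value from the data section into the BSS section and returns it as the exit status. The labels answer and scratch and the file name are made up for this example. It assumes Linux on x86-64, linking with plain ld as shown later in this post, and the exit system call (number 60) that is also explained later.

```nasm
; sections.asm (hypothetical file name)
section .data
    answer: dq 42           ; initialized 8-byte value, stored in the executable

section .bss
    scratch: resq 1         ; 8 bytes reserved at load time, initially zero

section .text
global _start

_start:
    MOV RAX, [answer]       ; read the value from .data
    MOV [scratch], RAX      ; write it into .bss
    MOV RDI, [scratch]      ; use it as the exit status (42)
    MOV RAX, 60             ; system call number for exit
    syscall
```

The square brackets mean "the memory at this address", so `[answer]` reads the stored value, while a bare `answer` would be the address itself.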
To print “hello, world!” we need to store the string in the data section. This is done as follows:
section .data
    str: db "hello, world!", 10, 0
Here the name of the constant is str, and db (define byte) stores the values that follow as a sequence of bytes. The 10 after the string is the ASCII code for newline, and the 0 is the NUL byte that terminates the string, as in C.
To actually print to the screen, we need to invoke one of the system calls made available by the kernel. This is similar to using printf in C, which internally invokes a system call. We make a system call either by raising a software interrupt with int 0x80 or by using the syscall instruction. int 0x80 is the legacy 32-bit interface and is deprecated for x86-64, hence we’ll be using syscall.
When making a system call, we pass the arguments in the GPRs. Each system call has a corresponding number, and we select the system call by placing that number in the RAX register. You can find the mapping between the system calls and their numbers over here. The mapping between the GPRs and the parameters is:
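| Syscall No | Param 1 | Param 2 | Param 3 | Param 4 | Param 5 | Param 6 |
|------------|---------|---------|---------|---------|---------|---------|
| RAX        | RDI     | RSI     | RDX     | R10     | R8      | R9      |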
The system call we need for printing is sys_write(), whose signature is:
asmlinkage long sys_write(unsigned int fd, const char __user *buf, size_t count);
From this signature, we need to pass 1 (stdout) in RDI, the address of str in RSI, and the length of the string, i.e. 15, in RDX. The asm code for this is:
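MOV RAX, 1      ; system call number for sys_write
MOV RDI, 1      ; fd: 1 is stdout
MOV RSI, str    ; buf: address of the string
MOV RDX, 15     ; count: number of bytes to write
syscall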
Putting all of this together and adding code to exit gracefully, we have:
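section .data
    str: db "hello, world!", 10, 0

section .text
    global _start

_start:
    MOV RAX, 1      ; sys_write
    MOV RDI, 1      ; fd: stdout
    MOV RSI, str    ; buf: address of the string
    MOV RDX, 15     ; count: bytes to write
    syscall

    MOV RAX, 60     ; sys_exit
    MOV RDI, 0      ; exit status 0
    syscall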
Here 60 is the system call number for sys_exit(), and the 0 passed in RDI is the exit status. The global directive is NASM-specific. It exports a symbol from our code so that it appears in the generated object code. Here we make the _start symbol global so that its name is added to the object code and the linker can find it. _start (which is the default) acts as the entry point for our code, similar to main in C.
We assemble and link to create the executable as follows (assuming the file is saved as hello.asm):
nasm -f elf64 hello.asm -o hello.o
ld -o hello hello.o -m elf_x86_64
Now you can run the executable with ./hello to print “hello, world!”.
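Counting the bytes of the string by hand is easy to get wrong, so here is a variation that lets the assembler work out the length. This is a sketch rather than the version used above: the labels msg and msg_len and the file name are my own choices. It also leaves out the trailing 0, on the assumption that it is not needed because sys_write is given an explicit byte count.

```nasm
; hello_len.asm (hypothetical file name)
section .data
    msg: db "hello, world!", 10     ; 13 characters plus a newline
    msg_len: equ $ - msg            ; $ is the current address, so this is 14

section .text
global _start

_start:
    MOV RAX, 1          ; sys_write
    MOV RDI, 1          ; fd: stdout
    MOV RSI, msg        ; buf: address of the bytes
    MOV RDX, msg_len    ; count: computed by the assembler
    syscall

    MOV RAX, 60         ; sys_exit
    MOV RDI, 0          ; exit status 0
    syscall
```

Because equ defines an assemble-time constant, msg_len takes up no space in the executable, and it stays correct if the text of the message changes.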
You can also use int 0x80 instead of syscall, but then the system call numbers change, because int 0x80 uses the 32-bit system call table.
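For comparison, here is a sketch of the same program written with int 0x80. It relies on several assumptions: the kernel has 32-bit emulation enabled, the executable is linked with plain ld so that the address of the string fits in 32 bits, and the 32-bit convention applies, where the call number goes in EAX and the first three arguments in EBX, ECX and EDX. In the 32-bit table, write is number 4 and exit is number 1. The labels and file name are placeholders.

```nasm
; hello_int80.asm (hypothetical file name)
section .data
    msg: db "hello, world!", 10
    msg_len: equ $ - msg    ; length computed by the assembler

section .text
global _start

_start:
    MOV EAX, 4          ; write in the 32-bit system call table
    MOV EBX, 1          ; fd: stdout
    MOV ECX, msg        ; buf: address, truncated to 32 bits
    MOV EDX, msg_len    ; count
    int 0x80

    MOV EAX, 1          ; exit in the 32-bit system call table
    MOV EBX, 0          ; exit status 0
    int 0x80
```

Note that both the numbers and the registers differ from the syscall version, which is one more reason to stick with syscall in 64-bit code.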
You can use a different symbol instead of _start (which is the default), but then you’ll need to tell ld which one using the -e parameter.
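As a small sketch of that last point, the program below uses an entry symbol called begin, a name picked only for this example, and does nothing except exit with status 0. The build commands in the comments assume the file is saved as entry.asm.

```nasm
; entry.asm (hypothetical file name)
; Build:
;   nasm -f elf64 entry.asm -o entry.o
;   ld -e begin -o entry entry.o
section .text
global begin            ; export the custom entry symbol

begin:
    MOV RAX, 60         ; sys_exit
    MOV RDI, 0          ; exit status 0
    syscall
```

The symbol still has to be declared global, otherwise ld cannot see it and falls back to a default start address with a warning.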