Assembly format and structure of the code
Understanding Assembly: Format, Structure, and Control Flow

Dive into the fundamental structure of assembly language, exploring its syntax, common directives, and how control flow mechanisms like 'goto' are implemented at a low level.
Assembly language serves as a bridge between high-level programming languages and the machine's native instruction set. Understanding its format and structure is crucial for anyone looking to optimize code, reverse engineer software, or delve into operating system development. This article will break down the typical components of an assembly program, illustrate how instructions are organized, and explain the mechanics of control flow, including the infamous goto equivalent.
Basic Assembly Instruction Format
At its core, assembly language consists of a series of instructions that directly correspond to operations the CPU can perform. Each instruction typically follows a specific format, though this can vary slightly between different architectures (e.g., x86, ARM, MIPS). Generally, an instruction includes an opcode (operation code) and zero or more operands. Operands specify the data or memory locations that the operation will act upon.
label: opcode operand1, operand2 ; comment
_start:
mov eax, 1 ; Move the value 1 into register EAX
mov ebx, 0 ; Move the value 0 into register EBX
int 0x80 ; Call interrupt 0x80 (system call)
Example of basic assembly instruction format (x86 syntax).
In the example above:
- label: is an optional identifier for a line of code, allowing other instructions to jump to it.
- opcode (e.g., mov, add, jmp) specifies the operation.
- operand1, operand2 are the arguments for the operation. These can be registers, memory addresses, or immediate values.
- ; comment is an optional explanation for the instruction, ignored by the assembler.
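To make the three operand kinds concrete, here is a minimal sketch in the same 32-bit Linux NASM style as the other examples in this article. The variable name counter is hypothetical and chosen only for illustration; square brackets mark a memory operand.

```nasm
section .data
counter dd 5              ; hypothetical 32-bit variable, initial value 5

section .text
global _start
_start:
    mov eax, 42           ; immediate value -> register
    mov ebx, eax          ; register -> register
    mov ecx, [counter]    ; memory -> register (loads the value 5)
    add ecx, ebx          ; register + register: ECX = 5 + 42 = 47
    mov [counter], ecx    ; register -> memory (counter now holds 47)

    mov eax, 1            ; sys_exit system call number
    mov ebx, 0            ; exit code 0
    int 0x80              ; call kernel
```

Note that without the brackets, mov ecx, counter would load the address of counter rather than its contents.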
Program Structure and Directives
An assembly program is more than just a list of instructions; it also includes directives that guide the assembler. These directives don't translate into machine code directly but provide metadata, define data segments, reserve memory, or declare entry points. Common sections in an assembly program include:
- .data: For initialized data (e.g., strings, constants).
- .bss: For uninitialized data (e.g., buffers).
- .text: For the actual executable code.
Assemblers like NASM or MASM use specific directives to define these sections and manage memory.
section .data
message db 'Hello, World!', 0xA ; Define a string with a newline
msg_len equ $ - message ; Calculate string length
section .bss
buffer resb 256 ; Reserve 256 bytes for a buffer
section .text
global _start ; Export _start so the linker can use it as the entry point
_start:
; Program execution begins here
mov eax, 4 ; sys_write system call number
mov ebx, 1 ; stdout file descriptor
mov ecx, message ; Address of string to write
mov edx, msg_len ; Length of string
int 0x80 ; Call kernel
mov eax, 1 ; sys_exit system call number
mov ebx, 0 ; Exit code 0
int 0x80 ; Call kernel
Typical structure of an assembly program using NASM syntax.
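The program above reserves buffer in .bss but never uses it. As a hedged sketch of how such a buffer might be put to work, the following reads up to 256 bytes from standard input and echoes them back. It assumes the same 32-bit Linux int 0x80 interface as the example above, where system call number 3 is sys_read; the label done is introduced here purely for illustration.

```nasm
section .bss
buffer resb 256           ; reserve 256 uninitialized bytes

section .text
global _start
_start:
    mov eax, 3            ; sys_read system call number
    mov ebx, 0            ; stdin file descriptor
    mov ecx, buffer       ; address of the buffer to fill
    mov edx, 256          ; maximum number of bytes to read
    int 0x80              ; on return, EAX = bytes read (negative on error)

    cmp eax, 0
    jle done              ; nothing was read, or an error occurred

    mov edx, eax          ; length to write = number of bytes read
    mov eax, 4            ; sys_write system call number
    mov ebx, 1            ; stdout file descriptor
    mov ecx, buffer       ; address of the data to write
    int 0x80              ; call kernel

done:
    mov eax, 1            ; sys_exit system call number
    mov ebx, 0            ; exit code 0
    int 0x80              ; call kernel
```

Because .bss only records how much space is needed, the 256 bytes add nothing to the size of the executable file; the memory is provided when the program is loaded.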
Control Flow: The Assembly 'GOTO'
Unlike high-level languages with structured constructs like if/else, for, and while loops, assembly language implements all control flow using conditional and unconditional jump instructions. These jumps are essentially the low-level equivalent of a goto statement, directing the CPU's instruction pointer to a different part of the code identified by a label. This fundamental mechanism allows for branching, looping, and function calls.
flowchart TD
    A[Start Program] --> B{Check Condition?}
    B -- Yes --> C[Execute Block 1]
    C --> E[Continue]
    B -- No --> D[Execute Block 2]
    D --> E[Continue]
    E --> F[End Program]
Basic conditional jump (if/else) logic in assembly.
mov eax, 10
cmp eax, 5 ; Compare EAX with 5
jg is_greater ; Jump if greater to 'is_greater' label
; This code executes if EAX <= 5
mov ebx, 0
jmp end_check ; Unconditional jump to 'end_check'
is_greater:
; This code executes if EAX > 5
mov ebx, 1
end_check:
; Program continues here
; EBX will be 1 if EAX > 5, else 0
Assembly code demonstrating conditional and unconditional jumps.
Common jump instructions include:
- jmp: Unconditional jump.
- je/jz: Jump if equal/zero.
- jne/jnz: Jump if not equal/not zero.
- jg/jnle: Jump if greater/not less or equal (signed).
- jl/jnge: Jump if less/not greater or equal (signed).
- ja/jnbe: Jump if above/not below or equal (unsigned).
- jb/jnae: Jump if below/not above or equal (unsigned).
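The signed/unsigned distinction in the list above is easy to miss, so here is a small sketch of it, assuming the same 32-bit Linux environment as the earlier examples. The same cmp result is tested twice: jg interprets the operands as signed numbers, ja as unsigned. The label names are hypothetical.

```nasm
section .text
global _start
_start:
    mov eax, -1           ; bit pattern 0xFFFFFFFF
    mov ebx, 0            ; exit status, adjusted below (mov leaves flags alone)
    cmp eax, 1            ; sets the flags once; both jumps below read them
    jg  signed_greater    ; signed view: -1 > 1 is false, so not taken
    ja  unsigned_above    ; unsigned view: 4294967295 > 1 is true, so taken
    jmp finish

signed_greater:
    mov ebx, 1            ; would mark the signed branch
    jmp finish

unsigned_above:
    mov ebx, 2            ; marks the unsigned branch

finish:
    mov eax, 1            ; sys_exit system call number
    int 0x80              ; exit status is the value in EBX
```

Here the program exits with status 2, because only the unsigned comparison treats 0xFFFFFFFF as larger than 1. Choosing the wrong family of jumps for the data type is a common source of bugs in hand-written assembly.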
These instructions manipulate the instruction pointer (EIP or RIP on x86/x64) to control the flow of execution. Loops are constructed by jumping back to an earlier label, and functions are implemented using call (which pushes the return address onto the stack before jumping) and ret (which pops the return address and jumps back).
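As an illustration of both ideas, the sketch below builds a loop from a backward jump and wraps it in a routine reached through call and left through ret. The routine name sum_to_n and its register convention (input in ECX, result in EAX) are assumptions made for this example, not a standard calling convention.

```nasm
section .text
global _start

; sum_to_n: hypothetical helper
;   input:  ECX = n
;   output: EAX = 1 + 2 + ... + n
sum_to_n:
    xor eax, eax          ; running total = 0
sum_loop:
    test ecx, ecx         ; is the counter zero?
    jz  sum_done          ; yes: leave the loop
    add eax, ecx          ; total += counter
    dec ecx               ; counter -= 1
    jmp sum_loop          ; backward jump forms the loop
sum_done:
    ret                   ; pop the return address and jump back

_start:
    mov ecx, 5            ; n = 5
    call sum_to_n         ; pushes the return address, jumps to sum_to_n
    mov ebx, eax          ; use the result (15) as the exit status
    mov eax, 1            ; sys_exit system call number
    int 0x80              ; call kernel
```

The loop is the equivalent of a while (counter != 0) construct: a test at the top, a conditional jump out, and an unconditional jump back.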
While goto is often discouraged in high-level programming for leading to 'spaghetti code', it is the fundamental building block of control flow in assembly. Understanding how these low-level jumps form higher-level constructs is key to grasping compiler optimizations and program execution.