Assembly format and structure of the code


Understanding Assembly: Format, Structure, and Control Flow


Dive into the fundamental structure of assembly language, exploring its syntax, common directives, and how control flow mechanisms like 'goto' are implemented at a low level.

Assembly language is a human-readable notation for a machine's native instruction set, sitting one step above raw machine code and well below high-level languages. Understanding its format and structure matters for anyone who wants to optimize code, reverse engineer software, or work on operating systems. This article breaks down the typical components of an assembly program, shows how instructions are organized, and explains the mechanics of control flow, including assembly's equivalent of goto.

Basic Assembly Instruction Format

At its core, an assembly program is a sequence of instructions, each corresponding to an operation the CPU can perform. The exact format depends on the architecture (e.g., x86, ARM, MIPS) and on the assembler's syntax: for x86, Intel syntax writes the destination operand first, while AT&T syntax writes it last. This article uses Intel syntax. Generally, an instruction consists of an opcode (operation code), written as a mnemonic such as mov, and zero or more operands, which specify the registers, memory locations, or constants the operation acts on.

label:  opcode  operand1, operand2  ; comment

_start:
    mov eax, 1      ; Move the value 1 into register EAX
    mov ebx, 0      ; Move the value 0 into register EBX
    int 0x80        ; Call interrupt 0x80 (system call)

Example of the basic instruction format (32-bit x86, Intel syntax). On Linux, this sequence invokes the exit system call with status 0.

In the format line above:

  • label: is an optional name for the address of the instruction that follows, allowing other instructions to jump to it.
  • opcode (e.g., mov, add, jmp) is the mnemonic that specifies the operation.
  • operand1, operand2 are the arguments for the operation. These can be registers, memory addresses, or immediate values.
  • ; comment is an optional explanation for the instruction, ignored by the assembler.
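
The three operand kinds listed above can be seen side by side in a minimal sketch. It assumes 32-bit Linux and NASM syntax, like the other examples in this article; the variable name counter is hypothetical and chosen only for illustration.

```nasm
section .data
    counter dd 7            ; hypothetical 32-bit variable, initialized to 7

section .text
    global _start

_start:
    mov eax, 42             ; immediate operand: the constant 42
    mov ebx, eax            ; register operand: copy EAX into EBX
    mov ecx, [counter]      ; memory operand: load the dword stored at 'counter'
    add ecx, ebx            ; ECX = 7 + 42
    mov [counter], ecx      ; memory as destination: store the sum back

    mov eax, 1              ; sys_exit
    mov ebx, 0              ; exit status 0
    int 0x80
```

In NASM, square brackets mean "the contents of memory at this address", while a bare symbol such as counter means the address itself. Assuming the file is saved as operands.asm (a hypothetical name), it should assemble and link with nasm -f elf32 operands.asm -o operands.o followed by ld -m elf_i386 operands.o -o operands.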

Program Structure and Directives

An assembly program is more than just a list of instructions; it also includes directives that guide the assembler. These directives don't translate into machine code directly but provide metadata, define data segments, reserve memory, or declare entry points. Common sections in an assembly program include:

  • .data: For initialized data (e.g., strings, constants).
  • .bss: For uninitialized data (e.g., buffers).
  • .text: For the actual executable code.

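Each assembler has its own directives for defining these sections and reserving memory. NASM, used in the example below, writes section .data, section .bss, and section .text; MASM uses .data, .data?, and .code instead.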

section .data
    message db 'Hello, World!', 0xA ; Define a string with a newline
    msg_len equ $ - message         ; Calculate string length

section .bss
    buffer resb 256                 ; Reserve 256 bytes for a buffer

section .text
    global _start                   ; Export _start so the linker can find the entry point

_start:
    ; Program execution begins here
    mov eax, 4                      ; sys_write system call number
    mov ebx, 1                      ; stdout file descriptor
    mov ecx, message                ; Address of string to write
    mov edx, msg_len                ; Length of string
    int 0x80                        ; Call kernel

    mov eax, 1                      ; sys_exit system call number
    mov ebx, 0                      ; Exit code 0
    int 0x80                        ; Call kernel

Typical structure of an assembly program using NASM syntax.
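
The .bss section is reserved in the program above but never used. As a rough sketch of how such a buffer might be put to work, the following reads one chunk of input into it and echoes it back. It assumes the 32-bit Linux system call numbers (3 for sys_read, 4 for sys_write, 1 for sys_exit); the label finish is hypothetical.

```nasm
section .bss
    buffer resb 256         ; 256 uninitialized bytes, zero-filled when loaded

section .text
    global _start

_start:
    mov eax, 3              ; sys_read
    mov ebx, 0              ; stdin file descriptor
    mov ecx, buffer         ; destination address
    mov edx, 256            ; maximum number of bytes to read
    int 0x80                ; EAX = bytes read, 0 at end of input, negative on error

    cmp eax, 0
    jle finish              ; nothing was read, so skip the write

    mov edx, eax            ; length = number of bytes actually read
    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout file descriptor
    mov ecx, buffer         ; address of the data to write
    int 0x80

finish:
    mov eax, 1              ; sys_exit
    mov ebx, 0              ; exit status 0
    int 0x80
```

Because resb only reserves space, the buffer adds nothing to the size of the executable file; the loader allocates it when the program starts.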

Control Flow: The Assembly 'GOTO'

Unlike high-level languages with structured constructs like if/else, for, and while loops, assembly language implements all control flow using conditional and unconditional jump instructions. These jumps are essentially the low-level equivalent of a goto statement, directing the CPU's instruction pointer to a different part of the code identified by a label. This fundamental mechanism allows for branching, looping, and function calls.

flowchart TD
    A[Start Program] --> B{Check Condition?}
    B -- Yes --> C[Execute Block 1]
    C --> E[Continue]
    B -- No --> D[Execute Block 2]
    D --> E[Continue]
    E --> F[End Program]

Basic conditional jump (if/else) logic in assembly.

    mov eax, 10
    cmp eax, 5      ; Compare EAX with 5
    jg  is_greater  ; Jump if greater to 'is_greater' label

    ; This code executes if EAX <= 5
    mov ebx, 0
    jmp end_check   ; Unconditional jump to 'end_check'

is_greater:
    ; This code executes if EAX > 5
    mov ebx, 1

end_check:
    ; Program continues here
    ; EBX will be 1 if EAX > 5, else 0

Assembly code demonstrating conditional and unconditional jumps.

Conditional jumps test the status flags set by a preceding instruction such as cmp or test. Common jump instructions include:

  • jmp: Unconditional jump.
  • je/jz: Jump if equal/zero.
  • jne/jnz: Jump if not equal/not zero.
  • jg/jnle: Jump if greater/not less or equal (signed).
  • jl/jnge: Jump if less/not greater or equal (signed).
  • ja/jnbe: Jump if above/not below or equal (unsigned).
  • jb/jnae: Jump if below/not above or equal (unsigned).
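
The signed/unsigned distinction in the list above matters because the same bit pattern can compare differently under each interpretation. The following sketch, assuming 32-bit Linux and NASM syntax, compares 0xFFFFFFFF with 1 and reports which jump was taken through the exit status; the label names are hypothetical.

```nasm
section .text
    global _start

_start:
    mov eax, -1             ; bit pattern 0xFFFFFFFF
    cmp eax, 1              ; sets the flags once; both jumps below read them
    jg  signed_greater      ; signed view: -1 > 1 is false, so not taken
    ja  unsigned_above      ; unsigned view: 0xFFFFFFFF > 1 is true, so taken

    mov ebx, 0              ; reached only if neither jump is taken
    jmp done

signed_greater:
    mov ebx, 1
    jmp done

unsigned_above:
    mov ebx, 2

done:
    mov eax, 1              ; sys_exit, with the exit status in EBX
    int 0x80
```

Running the program and then checking the shell's exit status (echo $? in a POSIX shell) should show 2, meaning the unsigned jump fired. Conditional jumps do not modify the flags, which is why a single cmp can feed several of them in a row.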

Each of these instructions changes the instruction pointer (EIP on 32-bit x86, RIP on x86-64) to redirect execution; a conditional jump does so only when its condition holds. Loops are built by jumping back to an earlier label, and functions are implemented with call (which pushes the return address onto the stack before jumping) and ret (which pops the return address and jumps back to it).
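
A loop and a function call can be combined in one short sketch. It assumes 32-bit Linux and NASM syntax; sum_to_n is a hypothetical helper that passes its argument and result in registers rather than following any particular calling convention, and it expects ECX to be at least 1.

```nasm
section .text
    global _start

; sum_to_n (hypothetical): adds ECX + (ECX - 1) + ... + 1, returns the total in EAX
sum_to_n:
    xor eax, eax            ; running total = 0
.next:
    add eax, ecx            ; total += counter
    dec ecx                 ; counter -= 1; sets ZF when it reaches 0
    jnz .next               ; jump back to .next while the counter is non-zero
    ret                     ; pop the return address and resume after the call

_start:
    mov ecx, 5              ; argument: sum the integers 1 through 5
    call sum_to_n           ; push the return address, then jump to sum_to_n
    mov ebx, eax            ; 5 + 4 + 3 + 2 + 1 = 15 becomes the exit status
    mov eax, 1              ; sys_exit
    int 0x80
```

The label .next starts with a dot, which makes it a NASM local label scoped to sum_to_n, so other routines can reuse the same name. The backward jnz is all a while loop amounts to at this level: a compare-like instruction followed by a conditional jump to an earlier address.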