x86 Assembly Programming: HLASM Developer's Guide to Intel Architecture

Emily Ross
x86 Assembly Programming: An HLASM Developer's Guide

If you already know HLASM, learning x86-64 assembly is significantly easier than starting from scratch. The fundamental concepts carry over: registers hold working data, instructions manipulate those registers and memory, a linkage convention manages function calls (a hardware-supported stack on x86-64, save areas on z/OS), and conditional branches test status bits set by earlier instructions (RFLAGS on x86-64, the condition code on z/Architecture). What changes is the architecture, the syntax, the toolchain, and the OS interface.

This guide bridges the gap. It introduces x86-64 assembly with explicit comparisons to z/Architecture and HLASM throughout, so you can build on what you already know.


x86-64 Architecture Overview

x86-64 (also called AMD64 or Intel 64) is the 64-bit extension of the original Intel 8086 architecture from 1978. Like z/Architecture, it has accumulated decades of backward-compatible extensions. Unlike z/Architecture, it is little-endian — the least significant byte of a multi-byte integer is stored at the lowest memory address.
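
To see the byte order concretely, here is a minimal sketch, assuming a Linux x86-64 system and the NASM and ld toolchain used in this guide. The file name endian.asm and the label value are illustrative only. The program stores one 64-bit constant and returns the byte found at its lowest address as the process exit status.

```nasm
; endian.asm (hypothetical file name)
; Build: nasm -f elf64 endian.asm -o endian.o && ld -o endian endian.o
section .data
    value   dq  0x1122334455667788   ; one 64-bit integer

section .text
    global _start

_start:
    movzx   edi, byte [rel value]    ; load the byte at the lowest address of 'value'
    mov     eax, 60                  ; syscall number: exit
    syscall                          ; exit status = the byte just loaded
```

Running ./endian and then echo $? should print 136 (0x88), the least significant byte of the constant. On big-endian z/Architecture the same constant would have 0x11 at its lowest address.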

Key architectural differences from z/Architecture

Feature | z/Architecture (HLASM) | x86-64
Byte order | Big-endian | Little-endian
General registers | 16 × 64-bit (GR0–GR15) | 16 × 64-bit (RAX–R15)
Addressing | Base + displacement (12-bit disp) | Many modes, up to 32-bit displacement
Condition codes | 2-bit CC in PSW | RFLAGS register (multiple flag bits)
System call | SVC instruction | SYSCALL instruction
Memory model | Virtual, flat per address space | Virtual, flat
Character encoding | EBCDIC (z/OS) | ASCII / UTF-8 (Linux/Windows)

x86-64 Registers

x86-64 has 16 general-purpose 64-bit registers. Their names carry historical baggage from 16-bit and 32-bit predecessors:

64-bit | 32-bit | 16-bit | 8-bit (low) | Common Use
RAX | EAX | AX | AL | Accumulator, return value
RBX | EBX | BX | BL | Callee-saved, base pointer (legacy)
RCX | ECX | CX | CL | Counter, 4th argument
RDX | EDX | DX | DL | Data, 3rd argument
RSI | ESI | SI | SIL | Source index, 2nd argument
RDI | EDI | DI | DIL | Dest index, 1st argument
RSP | ESP | SP | SPL | Stack pointer
RBP | EBP | BP | BPL | Base pointer (stack frame)
R8–R15 | R8D–R15D | R8W–R15W | R8B–R15B | 5th and 6th arguments (R8, R9), scratch (R10, R11), callee-saved (R12–R15)

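Compared to HLASM: z/Architecture's GR0–GR15 are largely symmetrical: any register can hold any value, with the notable exception that register 0 cannot serve as a base or index register (plus the convention exceptions covered in the architecture fundamentals article). x86-64 registers are more asymmetric: RSP is always the stack pointer and cannot be repurposed freely, and some instructions use fixed registers implicitly (for example, variable shift counts go in CL, and MUL and DIV work on RDX:RAX).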

Writing to a 32-bit sub-register (e.g., MOV EAX, 1) zero-extends the result into the full 64-bit register on x86-64. This is different from writing to a 16-bit sub-register, which leaves the upper 48 bits unchanged.
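
The difference between 32-bit and 16-bit writes can be checked with a short program. This is a sketch under the same assumptions as the rest of this guide (Linux, NASM, ld); the file name subreg.asm and the choice to report the result through the exit status are illustrative.

```nasm
; subreg.asm (hypothetical file name)
; Build: nasm -f elf64 subreg.asm -o subreg.o && ld -o subreg subreg.o
section .text
    global _start

_start:
    mov     rax, -1                     ; RAX = 0xFFFFFFFFFFFFFFFF
    mov     eax, 1                      ; 32-bit write zero-extends: RAX = 1
    mov     rbx, -1                     ; RBX = 0xFFFFFFFFFFFFFFFF
    mov     bx, 1                       ; 16-bit write: upper 48 bits are kept

    xor     edi, edi                    ; exit status starts at 0
    cmp     rax, 1
    jne     .exit
    or      edi, 1                      ; bit 0: the 32-bit write cleared the upper half
    mov     rdx, 0xFFFFFFFFFFFF0001     ; expected RBX after the 16-bit write
    cmp     rbx, rdx
    jne     .exit
    or      edi, 2                      ; bit 1: the 16-bit write preserved the upper bits
.exit:
    mov     eax, 60                     ; syscall number: exit
    syscall
```

An exit status of 3 means both observations held. HLASM developers will recognise the 16-bit behaviour: it resembles 32-bit instructions such as LR leaving the high half of a 64-bit register untouched.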

The RFLAGS Register

x86-64's equivalent of HLASM's condition code is the RFLAGS register, a 64-bit register with individual flag bits:

Flag | Meaning
ZF (Zero Flag) | Set if the result of an operation is zero
SF (Sign Flag) | Set if the result is negative (its most significant bit is 1)
CF (Carry Flag) | Set on unsigned overflow / borrow
OF (Overflow Flag) | Set on signed overflow
PF (Parity Flag) | Set if the low byte of the result has an even number of 1 bits

Conditional jump instructions test specific combinations of these flags — analogous to BC in HLASM testing specific condition code values.
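
The following sketch shows how one ADD can set several flags at once, and how carry (unsigned) and overflow (signed) differ. It assumes the same Linux, NASM and ld setup as the rest of this guide; the file name flags.asm and the bit packing into the exit status are illustrative choices. SETcc instructions copy a single flag into a byte register without changing the flags.

```nasm
; flags.asm (hypothetical file name)
; Build: nasm -f elf64 flags.asm -o flags.o && ld -o flags flags.o
section .text
    global _start

_start:
    mov     al, 0x7F
    add     al, 1               ; 0x7F + 1 = 0x80: OF=1, SF=1, CF=0, ZF=0
    seto    cl                  ; CL = OF (1)
    setc    dl                  ; DL = CF (0)

    mov     al, 0xFF
    add     al, 1               ; 0xFF + 1 = 0x00: CF=1, ZF=1, OF=0
    setc    bl                  ; BL = CF (1)
    setz    sil                 ; SIL = ZF (1)

    ; pack the four captured flags into the exit status
    movzx   edi, cl             ; bit 0 = OF of first add
    movzx   eax, dl
    shl     eax, 1
    or      edi, eax            ; bit 1 = CF of first add
    movzx   eax, bl
    shl     eax, 2
    or      edi, eax            ; bit 2 = CF of second add
    movzx   eax, sil
    shl     eax, 3
    or      edi, eax            ; bit 3 = ZF of second add

    mov     eax, 60             ; syscall number: exit
    syscall
```

The exit status should be 13 (binary 1101). Where z/Architecture folds the outcome into one of four condition code values, x86-64 reports each property in its own bit.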


The NASM Assembler and Toolchain

NASM (Netwide Assembler) is a widely used assembler for x86-64 on Linux and the one used throughout this guide. It fills the role that the High Level Assembler (program name ASMA90) fills on z/OS.

Installing and using NASM on Linux

bash
sudo apt install nasm          # Debian/Ubuntu
sudo dnf install nasm          # RHEL/Fedora

nasm -f elf64 program.asm -o program.o   # assemble to ELF object
ld -o program program.o                   # link to executable
./program                                 # run

Compared to HLASM: On z/OS you typically submit JCL that runs ASMA90 and the binder as batch job steps. On Linux, NASM and ld are command-line tools you run interactively from a terminal.

A first x86-64 program

nasm
; hello.asm: write "Hello, World!\n" to stdout using Linux system calls
; Assembled with: nasm -f elf64 hello.asm -o hello.o && ld -o hello hello.o

section .data
    msg     db  "Hello, World!", 10    ; string + newline (LF = 0x0A)
    msglen  equ $ - msg                ; calculate length at assemble time

section .text
    global _start

_start:
    mov     rax, 1          ; syscall number: write (sys_write)
    mov     rdi, 1          ; file descriptor: 1 = stdout
    mov     rsi, msg        ; pointer to the message
    mov     rdx, msglen     ; number of bytes to write
    syscall                 ; invoke the Linux kernel

    mov     rax, 60         ; syscall number: exit (sys_exit)
    xor     rdi, rdi        ; exit code: 0
    syscall

Compared to HLASM: The HLASM equivalent uses WTO for console output and RETURN for exit. Both are macro calls: WTO expands to an SVC 35, while RETURN restores the caller's registers and branches back through R14. The x86-64 program issues SYSCALL directly for both the write and the exit, passing arguments in registers. The idea of trapping to the operating system is the same; the registers and system call numbers differ.


x86-64 Memory Addressing

x86-64 uses a more flexible addressing model than z/Architecture's strict base-displacement scheme.

Addressing mode syntax (NASM Intel syntax)

nasm
mov rax, [rbx]              ; load from address in RBX
mov rax, [rbx + 8]          ; base + displacement
mov rax, [rbx + rcx]        ; base + index
mov rax, [rbx + rcx*8]      ; base + index * scale (scale: 1,2,4,8)
mov rax, [rbx + rcx*8 + 16] ; base + index * scale + displacement
mov rax, [rel symbol]       ; RIP-relative: address relative to instruction pointer

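The last form, RIP-relative addressing, is crucial in x86-64. Because executables can be loaded at arbitrary addresses (position-independent code, PIC, and position-independent executables, PIE), data and code references use addresses relative to the current instruction pointer rather than absolute addresses. HLASM achieves relocatability with USING and base registers; its closest analogue to RIP-relative addressing is the family of relative instructions such as LARL and the relative branches (J, BRAS).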

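Compared to HLASM: HLASM's classic 12-bit unsigned displacement means a single base register covers 4096 bytes (the long-displacement instruction forms extend this to a signed 20-bit displacement, about ±512 KB). x86-64 allows a 32-bit signed displacement, so a single reference can reach ±2 GB from the base register, and the assembler and linker resolve the offset with no USING bookkeeping.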
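
As an illustration of scaled-index and RIP-relative addressing working together, the sketch below sums a small table of 64-bit integers. It assumes Linux, NASM and ld; the names table and count and the file name sumtab.asm are hypothetical.

```nasm
; sumtab.asm (hypothetical file name)
; Build: nasm -f elf64 sumtab.asm -o sumtab.o && ld -o sumtab sumtab.o
section .data
    table   dq  10, 20, 30, 40          ; four 64-bit integers
    count   equ ($ - table) / 8         ; number of entries, computed at assemble time

section .text
    global _start

_start:
    lea     rbx, [rel table]            ; RIP-relative: RBX = address of table
    xor     ecx, ecx                    ; index = 0
    xor     eax, eax                    ; sum = 0
.next:
    add     rax, [rbx + rcx*8]          ; base + index*scale: add table[index]
    inc     rcx                         ; next entry
    cmp     rcx, count
    jb      .next                       ; loop while index < count (unsigned)

    mov     edi, eax                    ; exit status = sum
    mov     eax, 60                     ; syscall number: exit
    syscall
```

The exit status should be 100. z/Architecture indexed addressing has no scale factor, so the HLASM version would multiply the index by 8 itself (for example with SLL or SLLG) before an indexed load.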


Arithmetic and Logical Instructions

x86-64 arithmetic instructions are similar in concept to HLASM but use two-operand syntax where the first operand is both source and destination:

nasm
; NASM (Intel syntax)                  ; HLASM equivalent (32-bit forms; 64-bit forms are AGR, SGR, MSGR, ...)
add     rax, rbx        ; RAX += RBX   -- AR  R1,R2
sub     rax, 10         ; RAX -= 10    -- S   R1,=F'10'
imul    rax, rbx        ; RAX *= RBX   -- MSR R1,R2 (signed multiply, result in one reg; MR needs an even-odd pair)
inc     rcx             ; RCX++        -- A   R1,=F'1'
dec     rsi             ; RSI--        -- S   R1,=F'1'
neg     rax             ; RAX = -RAX   -- LCR R1,R1
and     rax, 0xFF       ; mask bits    -- N   R1,=X'000000FF'
or      rax, rbx        ; set bits     -- OR  R1,R2
xor     rax, rax        ; zero RAX     -- XR  R1,R1 (fastest zero idiom)
shl     rax, 3          ; shift left   -- SLL R1,3
shr     rax, 1          ; logical shr  -- SRL R1,1
sar     rax, 1          ; arith shr    -- SRA R1,1

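The xor rax, rax idiom to zero a register is universal in x86-64: it encodes in fewer bytes than mov rax, 0 and is recognised by modern CPUs as a register-zeroing operation. Assemblers and compilers usually emit the 32-bit form xor eax, eax, which is shorter still and clears all 64 bits because 32-bit writes zero-extend. Unlike MOV, XOR also modifies the flags.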


Control Flow and Branching

Control flow in x86-64 uses CMP to set flags and conditional jump instructions to branch:

nasm
    cmp     rax, 10         ; sets flags based on RAX - 10 (result discarded)
    je      equal           ; jump if equal (ZF = 1)
    jne     not_equal       ; jump if not equal (ZF = 0)
    jl      less_than       ; jump if less (signed, SF ≠ OF)
    jg      greater_than    ; jump if greater (signed, ZF=0 and SF=OF)
    jb      below           ; jump if below (unsigned, CF = 1)
    ja      above           ; jump if above (unsigned, CF=0 and ZF=0)

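Compared to HLASM: CMP in x86-64 plays the role of both CR (signed) and CLR (unsigned, logical) in HLASM. A single CMP sets the flags for both interpretations, and the jump mnemonic chooses between them: JL/JG for signed, JB/JA for unsigned. The conditional jump mnemonics (JE, JNE, JL, JG) correspond to HLASM's extended branch mnemonics (BE, BNE, BL, BH). The underlying mechanism (comparing, setting status bits, and branching on the result) is the same in both architectures. Storage-to-storage comparisons such as CLC have no CMP equivalent; the closest x86-64 counterpart is the string instruction REPE CMPSB.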
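
The signed and unsigned jump families can give opposite answers for the same operands. The sketch below compares -1 with 1 both ways; it assumes Linux, NASM and ld, and the file name cmpsign.asm and the exit-status encoding are illustrative.

```nasm
; cmpsign.asm (hypothetical file name)
; Build: nasm -f elf64 cmpsign.asm -o cmpsign.o && ld -o cmpsign cmpsign.o
section .text
    global _start

_start:
    xor     edi, edi            ; exit status starts at 0
    mov     rax, -1             ; bit pattern 0xFFFFFFFFFFFFFFFF

    cmp     rax, 1
    jge     .not_less           ; signed view: -1 >= 1 is false, so no jump
    or      edi, 1              ; bit 0: signed compare says RAX < 1
.not_less:
    cmp     rax, 1              ; compare again: OR changed the flags
    jbe     .not_above          ; unsigned view: 2^64-1 <= 1 is false, so no jump
    or      edi, 2              ; bit 1: unsigned compare says RAX > 1
.not_above:
    mov     eax, 60             ; syscall number: exit
    syscall
```

The exit status should be 3: the same bit pattern is less than 1 when read as signed and greater than 1 when read as unsigned. In HLASM that choice is made at the compare (CGR versus CLGR); on x86-64 it is made at the jump.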

Implementing a loop

nasm
    mov     rcx, 10         ; loop counter
.loop:
    ; ... loop body ...
    dec     rcx             ; decrement
    jnz     .loop           ; jump back if not zero

Equivalent HLASM:

hlasm
         LA    R5,10         LOAD COUNTER
LOOP     DS    0H
*        ... LOOP BODY ...
         BCT   R5,LOOP       DECREMENT R5, BRANCH IF NOT ZERO
*        (BCTR R5,0 ONLY DECREMENTS; IT DOES NOT SET THE CONDITION CODE)
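
For a loop with a visible result, here is a small complete program that sums the integers 10 down to 1 using the dec/jnz pattern. It is a sketch assuming Linux, NASM and ld; the file name loop.asm is hypothetical.

```nasm
; loop.asm (hypothetical file name)
; Build: nasm -f elf64 loop.asm -o loop.o && ld -o loop loop.o
section .text
    global _start

_start:
    xor     eax, eax        ; sum = 0
    mov     ecx, 10         ; loop counter (32-bit write zero-extends into RCX)
.loop:
    add     rax, rcx        ; sum += counter
    dec     rcx             ; decrement; sets ZF when the counter reaches zero
    jnz     .loop           ; jump back while the counter is not zero

    mov     edi, eax        ; exit status = sum
    mov     eax, 60         ; syscall number: exit
    syscall
```

The exit status should be 55. DEC sets the zero flag itself, so no separate CMP is needed before JNZ.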

The Stack and Calling Conventions

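The x86-64 stack grows downward (toward lower addresses), with RSP pointing at the most recently pushed item. z/Architecture has no hardware stack pointer: z/OS standard linkage chains save areas through R13 instead, while the Linux on Z ABI uses R15 as a software stack pointer that also grows downward.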

System V AMD64 ABI (Linux/macOS)

This is the calling convention for Linux and macOS x86-64. It defines:

  • Arguments: passed in RDI, RSI, RDX, RCX, R8, R9 (left to right). Additional arguments go on the stack.
  • Return value: in RAX (and RDX for 128-bit values).
  • Callee-saved registers: RBX, RBP, R12–R15. The called function must preserve these.
  • Caller-saved registers: RAX, RCX, RDX, RSI, RDI, R8–R11. May be clobbered by a called function.
  • Stack alignment: RSP must be 16-byte aligned when the CALL instruction executes.

nasm
; a function that receives two arguments and returns their sum
; long add(long a, long b)  -- a in RDI, b in RSI, result in RAX
add_func:
    push    rbp             ; save caller's base pointer
    mov     rbp, rsp        ; establish stack frame
    ; function body
    mov     rax, rdi        ; return value = first argument
    add     rax, rsi        ; + second argument
    pop     rbp             ; restore base pointer
    ret                     ; return (pop return address, jump to it)

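Compared to HLASM: In HLASM's standard linkage, the called routine stores registers 14 through 12 (15 registers) at offset 12 of the caller's 72-byte save area, which R13 addresses. The System V ABI only requires the callee to preserve RBX, RBP, R12–R15 and RSP. The mechanisms differ, but both conventions define which registers a caller can rely on after a call returns; under System V, the caller must itself save any caller-saved register it still needs.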
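
To show the convention in a complete program, the sketch below calls a two-argument function from a second function that must keep a value alive across the call. It assumes Linux, NASM and ld; sum3 and the file name call.asm are hypothetical, and add_func is a frame-less variant of the function shown above.

```nasm
; call.asm (hypothetical file name)
; Build: nasm -f elf64 call.asm -o call.o && ld -o call call.o
section .text
    global _start

; long add_func(long a, long b): a in RDI, b in RSI, result in RAX
add_func:
    mov     rax, rdi
    add     rax, rsi
    ret

; long sum3(long a, long b, long c): a in RDI, b in RSI, c in RDX
sum3:
    push    rbx                 ; callee-saved: preserve before use (also realigns RSP to 16)
    mov     rbx, rdx            ; keep c in RBX; RDX is caller-saved and may be clobbered
    call    add_func            ; RAX = a + b
    mov     rdi, rax            ; first argument: a + b
    mov     rsi, rbx            ; second argument: c
    call    add_func            ; RAX = (a + b) + c
    pop     rbx                 ; restore the caller's RBX
    ret

_start:
    mov     edi, 10             ; a
    mov     esi, 20             ; b
    mov     edx, 12             ; c
    call    sum3                ; RAX = 42
    mov     edi, eax            ; exit status = result
    mov     eax, 60             ; syscall number: exit
    syscall
```

The exit status should be 42. The single PUSH in sum3 does two jobs: it saves RBX, and it brings RSP back to a 16-byte boundary, since the CALL that entered sum3 pushed an 8-byte return address.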


System Calls on x86-64 Linux

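Linux x86-64 system calls use the SYSCALL instruction. The system call number goes in RAX and the arguments in RDI, RSI, RDX, R10, R8, R9. Note R10 rather than RCX: SYSCALL itself overwrites RCX (with the return address) and R11 (with the saved RFLAGS), so neither survives the call.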

nasm
; Common Linux x86-64 system call numbers
; sys_read   = 0
; sys_write  = 1
; sys_open   = 2
; sys_close  = 3
; sys_mmap   = 9
; sys_exit   = 60

; Read from stdin (fd=0) into buffer
; (assumes a 256-byte buffer defined elsewhere, e.g. "buffer resb 256" in section .bss)
    mov     rax, 0          ; sys_read
    mov     rdi, 0          ; fd = stdin
    mov     rsi, buffer     ; buffer address
    mov     rdx, 256        ; max bytes to read
    syscall
    ; RAX now holds number of bytes actually read (or negative error code)

Compared to HLASM: z/OS uses SVC instructions with macro-generated parameter lists. The concept — trap to the kernel with a function number and arguments — is the same. The details of which registers carry what, and the system call table, are OS-specific.
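
Putting read, write and exit together, the sketch below echoes one chunk of standard input back to standard output and checks the return value for an error. It assumes Linux, NASM and ld; the file name echo.asm and the buffer size are illustrative, and a production program would also loop on short writes.

```nasm
; echo.asm (hypothetical file name)
; Build: nasm -f elf64 echo.asm -o echo.o && ld -o echo echo.o
section .bss
    buffer  resb 256            ; uninitialised 256-byte input buffer

section .text
    global _start

_start:
    xor     eax, eax            ; sys_read = 0
    xor     edi, edi            ; fd 0 = stdin
    lea     rsi, [rel buffer]   ; buffer address
    mov     edx, 256            ; max bytes to read
    syscall                     ; RAX = bytes read or negative error; RCX, R11 clobbered
    test    rax, rax
    js      .fail               ; sign bit set: the kernel returned an error code

    mov     rdx, rax            ; number of bytes to write = bytes read
    mov     eax, 1              ; sys_write = 1
    mov     edi, 1              ; fd 1 = stdout
    syscall                     ; RSI still points at buffer (preserved by the kernel)
    xor     edi, edi            ; exit status 0
    jmp     .exit
.fail:
    mov     edi, 1              ; exit status 1 on read error
.exit:
    mov     eax, 60             ; sys_exit = 60
    syscall
```

For example, echo hi | ./echo should print hi. Only RAX, RCX and R11 change across SYSCALL, which is why RSI can be reused for the write without reloading it.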


Debugging x86-64 Assembly with GDB

GDB (GNU Debugger) is the standard Linux debugger for x86-64 assembly — the equivalent of IBM's Debug Tool / IPCS on z/OS.

bash
nasm -f elf64 -g -F dwarf program.asm -o program.o   # assemble with debug info
ld -o program program.o
gdb ./program

Essential GDB commands for assembly:

text
(gdb) layout asm          # show disassembly pane
(gdb) layout regs         # show registers alongside disassembly
(gdb) si                  # step one instruction (step into calls)
(gdb) ni                  # step one instruction (step over calls)
(gdb) info registers      # display all register values
(gdb) x/8xg $rsp          # examine 8 quadwords at stack pointer
(gdb) break _start        # set breakpoint at label
(gdb) run                 # start execution

Compared to HLASM debugging: HLASM debugging under IPCS involves reading a formatted storage dump — a text representation of memory and register contents at the time of an ABEND. GDB is interactive: you step through instructions while the program is live, watching register and memory state change in real time.


Conclusion

x86-64 assembly is a natural extension for any HLASM developer. The core concepts — registers, memory addressing, condition flags, the call stack, and system calls — are universal. The differences lie in syntax, calling conventions, byte order, and the specific instruction mnemonics.

The most valuable skill is learning to read compiler output. When you compile a C function with gcc -O2 -S on an x86-64 system, the generated assembly is x86-64, in AT&T syntax by default; adding -masm=intel produces Intel syntax closer to the NASM examples in this guide. Reading that output fluently is a hallmark of experienced systems engineers, and your HLASM background gives you a substantial head start.

Continue the track by exploring ARM Assembly Programming: A Comparative Guide for Assembler Developers.