How ARM64 Instructions Are Really Encoded

Lobsters Hottest 06/25/26, 12:02 PM News

arm64 instruction-encoding aarch64 assembly cpu-architecture apple-silicon machine-code

Summary

An educational article that explains how ARM64 (AArch64) instructions are encoded in 32-bit fixed-length words, debunking common misconceptions and providing hands-on decoding examples using ADD immediate on Apple Silicon.



Cached at: 06/25/26, 01:13 PM

# How ARM64 Instructions Are Really Encoded

Source: [https://medium.com/@tomas.pitner/how-arm64-instructions-are-really-encoded-c7ace7da442d](https://medium.com/@tomas.pitner/how-arm64-instructions-are-really-encoded-c7ace7da442d)

[Tomáš Pitner](https://medium.com/@tomas.pitner?source=post_page---byline--c7ace7da442d---------------------------------------)

## Introduction

If you’ve ever looked at ARM64 assembly on Apple Silicon (M1 and newer), you’ve probably seen instructions such as:

```
add x0, x1, #42
```

Most programmers stop there. The assembler converts the instruction into machine code, the CPU executes it, and everything works.

*But what does the instruction actually look like inside the processor?*

Unlike x86, where instruction lengths vary between one and fifteen bytes, every **AArch64 instruction occupies exactly 32 bits**. This seemingly simple design decision makes ARM64 surprisingly approachable if you want to understand what machine code really looks like.

In this article we will take several real ARM64 instructions used on Apple Silicon and gradually decode them until the encoding starts making sense.

The goal is not to memorize instruction formats. The goal is to understand that every ARM64 instruction is simultaneously three things:

- assembly language visible to the programmer,
- machine code executed by the processor,
- a structured collection of bit fields.

Once you see all three views at the same time, instruction encoding becomes far less mysterious.

## A Common Misconception

A common source of confusion is that ARM64 processors execute 32-bit instructions. Doesn’t that contradict the name ARM64? **Not at all.**

The term *64-bit architecture* refers primarily to the width of general-purpose registers and the size of values the processor can manipulate efficiently.
ARM64 provides thirty-one 64-bit general-purpose registers (`X0` through `X30`) and supports 64-bit arithmetic operations. Instruction size is a completely separate design decision.

**ARM64 is therefore a 64-bit architecture that uses fixed-size 32-bit instruction words.**

## Why ARM64 Is Easier To Decode Than x86

One of the reasons ARM64 is popular in education is its regular structure. With x86, the decoder must first determine where an instruction begins and ends before it can even decide what the instruction means.

**ARM64 is different. Every instruction occupies exactly four bytes.**

Here is what a piece of *executable memory* holding three instructions looks like:

[Figure: three consecutive four-byte instructions in executable memory]

This means instruction boundaries are always known in advance. Both hardware decoders and software disassemblers benefit from this property.

Instructions therefore sit at addresses that are multiples of four (ending in `…0`, `…4`, `…8`, `…C`). AArch64 in fact requires this alignment for instruction fetch, and we can guarantee it in hand-written assembly by putting a `.p2align 2` directive before the code. Compilers for C, C++ or Swift do the same.

## Example 1: ADD Immediate

Consider this instruction, which adds the integer `42` to the contents of register `x1` and puts the result into `x0`:

```
add x0, x1, #42
```

The assembler encodes it as the hexadecimal word:

```
9100A820
```

The processor actually sees only the 32-bit instruction word:

```
10010001000000001010100000100000
```

This may look like random noise, but ARM64 instructions are highly structured. For an **ADD (immediate)** instruction the encoding can be simplified as:

```
 31                  22 21        10 9     5 4     0
+----------------------+------------+-------+-------+
|        opcode        |   imm12    |  Rn   |  Rd   |
+----------------------+------------+-------+-------+
```

Now let’s decode it by hand.
The *destination register* **Rd** field occupies the lowest five bits:

```
100100010000000010101000001 | Rd = 00000 -> X0
```

Five bits can represent 32 values; here the value is 0, so Rd = X0.

The next five bits contain the source register:

```
1001000100000000101010 | Rn = 00001 -> X1
```

The twelve-bit immediate field contains:

```
1001000100 | imm12 = 000000101010 = 42
```

which is decimal 42.

Finally, the remaining ten high-order bits identify the **instruction type** itself.

```
opcode(ADD)=1001000100 imm12(42)=000000101010 Rn(X1)=00001 Rd(X0)=00000
```

The CPU performs essentially the same decoding process, although in hardware and at extraordinary speed.

A disassembler does the same thing in software. Starting with a 32-bit value:

```
9100A820
```

it extracts individual fields:

```
opcode = ADD
Rn     = X1
Rd     = X0
imm12  = 42
```

and reconstructs:

```
add x0, x1, #42
```

Once you realize this, tools such as `otool`, `llvm-objdump`, Hopper, IDA, and Ghidra become much less mysterious. They are automating the same decoding process thousands or millions of times.

## Why Registers Need Only Five Bits

One detail that appears repeatedly throughout the ARM64 encoding is the use of 5-bit register fields.

The reason is simple. A register field must distinguish 32 register numbers: the 31 general-purpose registers `X0` through `X30`, plus number 31, which selects either the zero register (`XZR`) or the stack pointer (`SP`), depending on the instruction.

```
2^5 = 32
```

Five bits are therefore sufficient to uniquely identify any register.

```
00000 -> X0
00001 -> X1
...
11110 -> X30
11111 -> XZR or SP (depending on the instruction)
```

Once you recognize this pattern, many instruction formats suddenly become much easier to understand.

## Example 2: Branch Instructions

Control flow instructions use a different layout. Consider:

```
b somewhere
```

A **branch instruction** does not store a complete destination address. Suppose the current instruction is located at:

```
0x100001004 b somewhere // somewhere => 0x100002004
```

and the destination is:

```
0x100002004
```

The instruction does not need to store the full destination address. It only needs to store the *distance*, a.k.a. **offset**.
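As a rough sketch of the distance idea, the Python snippet below takes the two addresses from the example above and packs their distance into a branch word. It assumes the standard A64 encoding of the unconditional `B` instruction (top six bits `000101`, followed by a 26-bit signed offset counted in four-byte instruction units rather than bytes). The helper name `encode_b` is hypothetical and exists only for this illustration.

```python
def encode_b(pc: int, target: int) -> int:
    """Encode an unconditional A64 `B` from `pc` to `target` (hypothetical helper)."""
    distance = target - pc                 # signed distance in bytes
    if distance % 4 != 0:
        raise ValueError("branch distance must be a multiple of 4 bytes")
    imm26 = distance // 4                  # distance in instructions
    if not -(1 << 25) <= imm26 < (1 << 25):
        raise ValueError("target is outside the 26-bit signed range")
    # 000101 in bits 31..26, two's-complement offset in bits 25..0
    return (0b000101 << 26) | (imm26 & 0x3FFFFFF)


word = encode_b(0x100001004, 0x100002004)
print(f"{word:08X}")  # prints 14000400
```

The distance here is `0x1000` bytes, i.e. `0x400` instructions, which is the value that ends up in the low 26 bits. A backward branch simply yields a negative offset stored in two's complement.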
A simplified visualization looks like this:

[Figure: the branch instruction word with its 26-bit offset field]

Because every ARM64 instruction occupies four bytes and is four-byte aligned, the lower two bits of any instruction address are always zero and do not need to be stored in the instruction.

The complete target address calculation can look like:

[Figure: target address = address of the branch instruction + (signed 26-bit offset × 4)]

This allows ARM64 to represent surprisingly large branch distances using only 26 bits. A 26-bit signed offset counted in four-byte units means we can jump approximately **±128 MiB**. That is sufficient for the overwhelming majority of branches found in real-world programs.

As a historical comparison, the reachable range in each direction is 128 times the famous 1-MiB physical address limit of early IBM PCs running MS-DOS!

## Example 4: The Famous ADRP

If you disassemble almost any macOS executable, sooner or later you will encounter:

```
adrp x0, symbol@PAGE
add  x0, x0, symbol@PAGEOFF
```

To programmers encountering Apple Silicon for the first time, this sequence seems to appear everywhere.

The reason is position-independent code. Modern macOS executables cannot assume fixed virtual addresses because the operating system may load them at different locations every time they run.

ARM64 therefore splits the address problem into two pieces: a page-aligned base address and an offset within that page.

Although macOS on Apple Silicon uses 16-KiB virtual-memory pages, the `ADRP` instruction itself operates on 4-KiB-aligned addresses as defined by the ARM64 architecture.

```
// current instruction
0x100001004  ADRP x0, symbol@PAGE
0x100001008  ADD  x0, x0, symbol@PAGEOFF
0x10000100C  LDR  x1, [x0]            // will load 0x1234ABCD
...
0x10000C100  symbol: .quad 0x1234ABCD
```

A toolchain such as `clang` (the assembler together with the linker) would encode the first instruction, `ADRP x0, symbol@PAGE`, as follows:

1. take the 4-KiB-aligned base address of the current instruction: `0x100001000`;
2. take the 4-KiB-aligned base address of `symbol`: `0x10000C000`;
3. compute their difference: `0xB` pages (= 11 × 4 KiB);
4. encode `+11` into the `ADRP` instruction.

Thus, `ADRP` computes the base address of a nearby 4-KiB-aligned region relative to the current instruction. The lower twelve bits (the offset within the page) are intentionally discarded.

The subsequent `ADD` instruction restores the offset within that page, i.e. `0x100`.

This technique is so common that you can find similar instruction pairs in almost every Apple Silicon executable. Together they reconstruct the final address and, eventually, the `LDR x1, [x0]` will load `0x1234ABCD`.

Guess what the following code would do:

```
adrp x0, greeting@PAGE
add  x0, x0, greeting@PAGEOFF
bl   _puts
```

The first instruction, `adrp x0, greeting@PAGE`, contains the relative page position of the `greeting` symbol (presumably a string). When executed, it adds the encoded page difference to the page of the current instruction, giving the base address of the page that holds the string. The `add` then supplies the offset within that page, producing the absolute address of the string. Now `x0` is ready to be passed to `_puts`, which is nothing other than the well-known standard C function `puts`, directly callable from your assembly code; it prints the string to the console.

Once you recognize this pattern, macOS disassemblies become much easier to read.

## Looking At Real Machine Code

If you have an Apple Silicon Mac, you can generate machine code yourself.

Create a small assembly file, `example.s`:

```
add x0, x1, #42
ret
```

Assemble it:

```
clang -c example.s -o example.o
```

And inspect the result:

```
otool -tvV example.o
# or
llvm-objdump -d example.o
```

Seeing the assembly instruction and the generated machine code side by side is one of the fastest ways to develop an intuition for ARM64 encodings.

## The Hidden Design Philosophy

After decoding a few instructions, a pattern begins to emerge. ARM64 instruction encodings are not random.
Several design goals are visible throughout the architecture:

- fixed-size instructions
- simple and predictable decoding
- a large register file
- efficient branch encoding
- room for future architectural extensions

These goals explain many of the recurring structures found throughout the instruction set.

Perhaps the most impressive aspect of ARM64 is how consistently these goals are applied. Once you understand a handful of instruction formats, many others begin to feel familiar because the same encoding ideas are reused repeatedly.

## The CPU Sees Bit Fields, Not Assembly

Humans prefer symbolic names such as:

```
add x0, x1, #42
```

The processor never sees this text. Instead it receives:

```
9100A820
```

which expands into:

```
opcode = ADD
Rn     = X1
Rd     = X0
imm12  = 42
```

and finally becomes an internal operation executed by the hardware.

This perspective is one of the key insights that separates assembly programming from machine-code programming. Assembly language is a human-friendly representation. The processor ultimately works with fields, masks, decoders, and control signals.

## Conclusion

Once you start looking at ARM64 instructions as collections of bit fields rather than assembly mnemonics, many seemingly mysterious concepts become obvious.

Register fields occupy five bits because they must distinguish 32 register numbers. Branch instructions can store compact relative offsets, counted in instructions rather than bytes, because every instruction is exactly four bytes long. Instructions such as `CBZ` (compare and branch on zero) combine multiple operations into a single encoding to reduce code size and improve efficiency.

Most importantly, ARM64 stops looking like magic.

Once you understand instruction encoding, reverse-engineering tools stop looking magical as well. Whether you use `otool`, `llvm-objdump`, Hopper, IDA, or Ghidra, all of them begin with exactly the same task: decode a stream of 32-bit instruction words into something humans can understand.
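That decoding task can be sketched in a few lines of Python. The snippet below is only an illustration, not how any of the tools above is implemented. It applies the simplified ADD (immediate) layout from Example 1 to the word `9100A820`. The function name `decode_add_imm` is hypothetical, and the sketch deliberately treats bits 31..22 as a single opaque opcode, whereas the architecture subdivides them further. It also ignores the case where register number 31 denotes `SP`.

```python
def decode_add_imm(word: int) -> dict:
    """Split a 32-bit word using the simplified ADD (immediate) layout."""
    return {
        "opcode": (word >> 22) & 0x3FF,  # bits 31..22 (10 bits)
        "imm12": (word >> 10) & 0xFFF,   # bits 21..10 (12 bits)
        "rn": (word >> 5) & 0x1F,        # bits 9..5   (5 bits)
        "rd": word & 0x1F,               # bits 4..0   (5 bits)
    }


fields = decode_add_imm(0x9100A820)
print(f"opcode = {fields['opcode']:010b}")  # prints opcode = 1001000100
print(f"add x{fields['rd']}, x{fields['rn']}, #{fields['imm12']}")
# prints add x0, x1, #42
```

Each field is recovered with one shift and one mask, which mirrors the hand decoding done earlier: shift the field down to bit 0, then keep only as many bits as the field is wide.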
The next time you see a machine word such as:

```
9100A820
```

you will know that it is not just hexadecimal noise. It is a carefully structured 32-bit packet of information telling the processor exactly what to do.

## Further Reading

This article is adapted from my upcoming FREE book:

***ARM64 Assembly on macOS: A Practical Guide for Apple Silicon***

The complete open-access edition, including instruction encoding, Apple Silicon ABI conventions, Mach-O internals, optimization techniques, SIMD, security features, and practical macOS ARM64 programming, is freely available as PDF and HTML on Zenodo.org: [https://doi.org/10.5281/zenodo.20802832](https://doi.org/10.5281/zenodo.20802832)

Project repository and companion code: [https://codeberg.org/tpitner/arm-macos](https://codeberg.org/tpitner/arm-macos)

Similar Articles

Writing Portable ARM64 Assembly

Instruction decoding in the Intel 8087 floating-point chip

@Underfox3: In this paper is presented a comprehensive guide to the Apple Neural Engine, based on direct measurement on Apple silic…

Windows stack limit checking retrospective, follow-up

Dissecting Apple's Sparse Image Format (ASIF)
