=========================================
Lightweight Fault Isolation (LFI) in LLVM
=========================================

.. contents::
   :local:

Introduction
++++++++++++

Lightweight Fault Isolation (LFI) is a compiler-based sandboxing technology for
native code. Like WebAssembly and Native Client, LFI isolates sandboxed code in-process
(i.e., in the same address space as a host application).

LFI is designed from the ground up to sandbox existing code, such as C/C++
libraries (including assembly code) and device drivers.

LFI aims for the following goals:

* Compatibility: LFI can be used to sandbox nearly all existing C/C++/assembly
  libraries unmodified (they just need to be recompiled). Sandboxed libraries
  work with existing system call interfaces, and are compatible with existing
  development tools such as profilers, debuggers, and sanitizers.
* Performance: LFI aims for minimal overhead compared to unsandboxed code.
* Security: The LFI runtime and compiler elements aim to be simple and
  verifiable when possible.
* Usability: LFI aims to make it as easy as possible to retrofit sandboxing,
  i.e., to migrate from unsandboxed to sandboxed libraries with minimal effort.

When building a program for the LFI target, the compiler is designed to ensure
that the program can only access memory within a limited region of the virtual
address space, starting from where the program is loaded (the current design
sets this region to a size of 4GiB of virtual memory). Programs built for the
LFI target are restricted to a subset of the instruction set, designed so that
the programs can be soundly confined to their sandbox region. LFI programs must
run inside an "emulator" (usually called the LFI runtime), which is responsible
for initializing the sandbox region, loading the program, servicing system call
requests, and handling other forms of runtime calls.

LFI uses an architecture-specific sandboxing scheme based on the general
technique of Software-Based Fault Isolation (SFI). Initial support for LFI in
LLVM is focused on the AArch64 platform, with x86-64 support planned for the
future. The initial version of LFI for AArch64 is designed to support the
Armv8.1 AArch64 architecture.

See `https://github.com/lfi-project <https://github.com/lfi-project/>`__ for
details about the LFI project and additional software needed to run LFI
programs.

Compiler Requirements
+++++++++++++++++++++

When building for the ``aarch64_lfi`` target, the compiler must restrict use of
the instruction set to a subset of instructions that are known to be safe from
a sandboxing perspective. To do this, we apply a set of simple rewrites at the
assembly language level to transform standard native AArch64 assembly into
LFI-compatible AArch64 assembly.

These rewrites (also called "expansions") are applied at the very end of the
LLVM compilation pipeline (during the assembler step). This allows the rewrites
to be applied to hand-written assembly, including inline assembly.

Context Register
++++++++++++++++

Both architectures designate a context register that points to a block of
thread-local memory managed by the LFI runtime. The context register is ``x25``
on AArch64 and ``r15`` on x86-64. The layout of the block is as follows (offsets
and sizes are in bytes):

+--------+--------+----------------------------------------------+
| Offset | Size   | Description                                  |
+--------+--------+----------------------------------------------+
| 0      | 8      | Reserved for future use.                     |
+--------+--------+----------------------------------------------+
| 8      | 8      | Reserved for use by the LFI runtime.         |
+--------+--------+----------------------------------------------+
| 16     | 8      | Virtual thread pointer (used for TP access). |
+--------+--------+----------------------------------------------+
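
As a rough illustration of this layout, the following Python sketch models the
context block as three 64-bit fields. The constant names, the helper functions,
and the little-endian byte order are assumptions made for this example; only
the offsets and sizes come from the table above.

.. code-block:: python

  import struct

  # Three 8-byte fields; little-endian byte order is an assumption here.
  CTX_FORMAT = "<QQQ"
  CTX_SIZE = struct.calcsize(CTX_FORMAT)  # 24 bytes in total

  # Hypothetical names for the offsets listed in the table.
  OFFSET_RESERVED = 0
  OFFSET_RUNTIME = 8
  OFFSET_THREAD_POINTER = 16


  def read_thread_pointer(ctx: bytes) -> int:
      """Return the virtual thread pointer stored at offset 16."""
      (tp,) = struct.unpack_from("<Q", ctx, OFFSET_THREAD_POINTER)
      return tp


  def write_thread_pointer(ctx: bytearray, tp: int) -> None:
      """Store a new virtual thread pointer at offset 16."""
      struct.pack_into("<Q", ctx, OFFSET_THREAD_POINTER, tp)


  ctx = bytearray(CTX_SIZE)
  write_thread_pointer(ctx, 0x1_0000_2000)

Only the field at offset 16 is touched by sandboxed code in this model; the
first two fields stay under the control of the runtime.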

Linker Support
++++++++++++++

In the initial version, LFI only supports static linking, and only supports
creating ``static-pie`` binaries. There is nothing that fundamentally precludes
support for dynamic linking on the LFI target, but such support would require
that the code generated by the linker for PLT entries be slightly modified in
order to conform to the LFI architecture subset.

Assembler Directives
++++++++++++++++++++

The following directives are supported for controlling the rewriter.

``.lfi_rewrite_disable``
========================

Disables LFI assembly rewrites for all subsequent instructions, until
``.lfi_rewrite_enable`` is used. This can be useful for hand-written assembly
that is already safe and should not be modified by the rewriter.

``.lfi_rewrite_enable``
=======================

Re-enables LFI assembly rewrites after a previous ``.lfi_rewrite_disable``.

Example:

.. code-block:: gas

  .lfi_rewrite_disable
  // No rewrites applied here.
  ldr x0, [x27, w1, uxtw]
  .lfi_rewrite_enable

Compiler Options
++++++++++++++++

**Note**: these options are not yet implemented.

The LFI target has several configuration options, specified via ``-mattr=``:

* ``+no-lfi-loads``: Disable sandboxing for load instructions (stores-only mode).
* ``+no-lfi-stores``: Disable sandboxing for store instructions.

Use ``+no-lfi-loads`` to create a "stores-only" sandbox that may read, but not
write, outside the sandbox region.

Use ``+no-lfi-loads,+no-lfi-stores`` to create a "jumps-only" sandbox that may
read/write outside the sandbox region but may not transfer control outside
(e.g., may not execute system calls directly). This is primarily useful in
combination with some other form of memory sandboxing, such as Intel MPK.

AArch64
+++++++

The AArch64 LFI target is ``aarch64_lfi``. This is the architecture component
of the target triple, and can be used with
``--triple=aarch64_lfi-<rest of triple>``.

Reserved Registers
==================

The AArch64 LFI target uses a custom ABI that reserves additional registers for
the platform. The registers are listed below, along with the security invariant
that must be maintained.

* ``x27``: always holds the sandbox base address (must be aligned to the size
  of the sandbox).
* ``x28``: always holds an address within the sandbox.
* ``sp``: always holds an address within the sandbox.
* ``x30``: always holds an address within the sandbox.
* ``x26``: scratch register.
* ``x25``: context register (see `Context Register`_).

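The current design only supports 4GiB sandboxes, and the sandbox base address
must be 4GiB-aligned. The alignment is needed because LFI's ABI stores pointers
as their full 64-bit values, rather than just 32-bit offsets from the base: a
guard such as ``add x28, x27, wN, uxtw`` adds only the low 32 bits of the
pointer to the base, which yields the original pointer only if the low 32 bits
of the base are zero. Storing full pointers enables stores-only mode, where
loads are not sandboxed but stores are, and allows the host to directly pass
pointers to the sandbox.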
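
The arithmetic behind the guard instruction ``add xD, x27, wN, uxtw`` can be
modelled in a few lines of Python. This is only a sketch: the function name and
the example addresses are hypothetical, and the model ignores everything except
the address computation.

.. code-block:: python

  MASK32 = 0xFFFF_FFFF
  MASK64 = 0xFFFF_FFFF_FFFF_FFFF
  SANDBOX_SIZE = 1 << 32  # 4GiB


  def guard(base: int, ptr: int) -> int:
      """Model of ``add xD, x27, wN, uxtw``: base + zero-extended low 32 bits."""
      return (base + (ptr & MASK32)) & MASK64


  base = 0x7_0000_0000  # hypothetical 4GiB-aligned sandbox base
  inside = base + 0x1234  # a full 64-bit pointer into the sandbox
  outside = 0xDEAD_0000_BEEF  # an arbitrary pointer outside the sandbox

  guarded_inside = guard(base, inside)  # unchanged: already in the sandbox
  guarded_outside = guard(base, outside)  # forced into [base, base + 4GiB)

  # With a misaligned base, guarding an in-sandbox pointer no longer
  # returns the same pointer, which is why the alignment is required.
  misaligned = 0x7_0000_1000
  shifted = guard(misaligned, misaligned + 0x10)

In this model a guarded value always lies within 4GiB of the base, whatever the
input pointer was, and pointers that are already inside an aligned sandbox pass
through unchanged.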

Assembly Rewrites
=================

Terminology
~~~~~~~~~~~

In the following assembly rewrites, some shorthand is used.

* ``xN`` or ``wN``: refers to any general-purpose non-reserved register.
* ``{a,b,c}``: matches any of ``a``, ``b``, or ``c``.
* ``LDSTr``: a load/store instruction that supports register-register addressing modes, with one source/destination register.
* ``LDSTx``: a load/store instruction not matched by ``LDSTr``.

Control flow
~~~~~~~~~~~~

Indirect branches get rewritten to branch through register ``x28``, which must
always contain an address within the sandbox. An ``add`` is used to safely
update ``x28`` with the destination address. Since ``ret`` uses ``x30`` by
default, which already must contain an address within the sandbox, it does not
require any rewrite.

+--------------------+---------------------------+
|      Original      |         Rewritten         |
+--------------------+---------------------------+
| .. code-block::    | .. code-block::           |
|                    |                           |
|    {br,blr,ret} xN |    add x28, x27, wN, uxtw |
|                    |    {br,blr,ret} x28       |
|                    |                           |
+--------------------+---------------------------+
| .. code-block::    | .. code-block::           |
|                    |                           |
|    ret             |    ret                    |
|                    |                           |
+--------------------+---------------------------+
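
The table above can be read as a simple textual transformation. The following
Python sketch applies it to one line of assembly at a time. It is not the LLVM
implementation: the function name, the regular expression, and the decision to
leave branches through reserved registers untouched are assumptions made for
this example.

.. code-block:: python

  import re

  # Registers reserved by the LFI ABI (see "Reserved Registers").
  RESERVED = {"x25", "x26", "x27", "x28", "x30"}
  BRANCH = re.compile(r"^\s*(br|blr|ret)\s+x(\d+)\s*$")


  def rewrite_branch(line: str) -> list[str]:
      """Rewrite an indirect branch to go through x28; pass others through."""
      m = BRANCH.match(line)
      if m is None or f"x{m.group(2)}" in RESERVED:
          # Not an indirect branch through a non-reserved register
          # (this includes a bare "ret", which needs no rewrite).
          return [line.strip()]
      op, n = m.group(1), m.group(2)
      return [f"add x28, x27, w{n}, uxtw", f"{op} x28"]


  rewritten = rewrite_branch("blr x8")

A real rewriter operates on parsed instructions rather than on text, and would
also need a policy for branches that name a reserved register explicitly.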

Memory accesses
~~~~~~~~~~~~~~~

**Note**: not yet implemented.

Memory accesses are rewritten to use the ``[x27, wM, uxtw]`` addressing mode
when the instruction supports it. This mode is safe by construction, since the
address is the sandbox base plus a zero-extended 32-bit offset. Otherwise,
rewrites fall back to using ``x28``, preceded by a guard instruction that
safely loads it with the target address.

+---------------------------------+-------------------------------+
|            Original             |           Rewritten           |
+---------------------------------+-------------------------------+
| .. code-block::                 | .. code-block::               |
|                                 |                               |
|    LDSTr xN, [xM]               |    LDSTr xN, [x27, wM, uxtw]  |
|                                 |                               |
+---------------------------------+-------------------------------+
| .. code-block::                 | .. code-block::               |
|                                 |                               |
|    LDSTr xN, [xM, #I]           |    add x28, x27, wM, uxtw     |
|                                 |    LDSTr xN, [x28, #I]        |
|                                 |                               |
+---------------------------------+-------------------------------+
| .. code-block::                 | .. code-block::               |
|                                 |                               |
|    LDSTr xN, [xM, #I]!          |    add xM, xM, #I             |
|                                 |    LDSTr xN, [x27, wM, uxtw]  |
|                                 |                               |
+---------------------------------+-------------------------------+
| .. code-block::                 | .. code-block::               |
|                                 |                               |
|    LDSTr xN, [xM], #I           |    LDSTr xN, [x27, wM, uxtw]  |
|                                 |    add xM, xM, #I             |
|                                 |                               |
+---------------------------------+-------------------------------+
| .. code-block::                 | .. code-block::               |
|                                 |                               |
|    LDSTr xN, [xM1, xM2]         |    add x26, xM1, xM2          |
|                                 |    LDSTr xN, [x27, w26, uxtw] |
|                                 |                               |
+---------------------------------+-------------------------------+
| .. code-block::                 | .. code-block::               |
|                                 |                               |
|    LDSTr xN, [xM1, xM2, MOD #I] |    add x26, xM1, xM2, MOD #I  |
|                                 |    LDSTr xN, [x27, w26, uxtw] |
|                                 |                               |
+---------------------------------+-------------------------------+
| .. code-block::                 | .. code-block::               |
|                                 |                               |
|    LDSTx ..., [xM]              |    add x28, x27, wM, uxtw     |
|                                 |    LDSTx ..., [x28]           |
|                                 |                               |
+---------------------------------+-------------------------------+
| .. code-block::                 | .. code-block::               |
|                                 |                               |
|    LDSTx ..., [xM, #I]          |    add x28, x27, wM, uxtw     |
|                                 |    LDSTx ..., [x28, #I]       |
|                                 |                               |
+---------------------------------+-------------------------------+
| .. code-block::                 | .. code-block::               |
|                                 |                               |
|    LDSTx ..., [xM, #I]!         |    add x28, x27, wM, uxtw     |
|                                 |    LDSTx ..., [x28, #I]       |
|                                 |    add xM, xM, #I             |
|                                 |                               |
+---------------------------------+-------------------------------+
| .. code-block::                 | .. code-block::               |
|                                 |                               |
|    LDSTx ..., [xM], #I          |    add x28, x27, wM, uxtw     |
|                                 |    LDSTx ..., [x28]           |
|                                 |    add xM, xM, #I             |
|                                 |                               |
+---------------------------------+-------------------------------+
| .. code-block::                 | .. code-block::               |
|                                 |                               |
|    LDSTx ..., [xM1], xM2        |    add x28, x27, wM1, uxtw    |
|                                 |    LDSTx ..., [x28]           |
|                                 |    add xM1, xM1, xM2          |
|                                 |                               |
+---------------------------------+-------------------------------+

Stack pointer modification
~~~~~~~~~~~~~~~~~~~~~~~~~~

**Note**: not yet implemented.

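When the stack pointer is modified by an ``add`` or ``sub``, we write the
modified value to a temporary (``x26``), before moving it back into ``sp`` with
a safe ``add``. A plain ``mov`` into ``sp`` is replaced by the safe ``add``
directly.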

+------------------------------+-------------------------------+
|           Original           |           Rewritten           |
+------------------------------+-------------------------------+
| .. code-block::              | .. code-block::               |
|                              |                               |
|    mov sp, xN                |    add sp, x27, wN, uxtw      |
|                              |                               |
+------------------------------+-------------------------------+
| .. code-block::              | .. code-block::               |
|                              |                               |
|    {add,sub} sp, sp, {#I,xN} |    {add,sub} x26, sp, {#I,xN} |
|                              |    add sp, x27, w26, uxtw     |
|                              |                               |
+------------------------------+-------------------------------+

Link register modification
~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the link register is loaded from memory, the loaded value is guarded in
place with a safe ``add`` into ``x30`` before the following ``ret`` uses it.

+---------------------------+-------------------------------+
|         Original          |           Rewritten           |
+---------------------------+-------------------------------+
| .. code-block::           | .. code-block::               |
|                           |                               |
|    ldr x30, [...]         |    ldr x30, [...]             |
|    ret                    |    add x30, x27, w30, uxtw    |
|                           |    ret                        |
|                           |                               |
+---------------------------+-------------------------------+
| .. code-block::           | .. code-block::               |
|                           |                               |
|    ldp xN, x30, [...]     |    ldp xN, x30, [...]         |
|    ret                    |    add x30, x27, w30, uxtw    |
|                           |    ret                        |
|                           |                               |
+---------------------------+-------------------------------+

System instructions
~~~~~~~~~~~~~~~~~~~

System calls are rewritten into a sequence that loads the address of the first
runtime call entrypoint and branches to it with ``blr``. The runtime call
entrypoint table is stored at a negative offset from the sandbox base, so it
can be addressed relative to ``x27``. The rewrite also saves the link register
in ``x26`` and restores it afterwards with a safe ``add``, since ``x30`` is
used for branching into the runtime.

+-----------------+------------------------------+
|    Original     |          Rewritten           |
+-----------------+------------------------------+
| .. code-block:: | .. code-block::              |
|                 |                              |
|    svc #0       |    mov x26, x30              |
|                 |    ldur x30, [x27, #-8]      |
|                 |    blr x30                   |
|                 |    add x30, x27, w26, uxtw   |
|                 |                              |
+-----------------+------------------------------+

Thread pointer (TP)
~~~~~~~~~~~~~~~~~~~

TP accesses are rewritten into loads/stores from the context register
(``x25``), which holds the virtual thread pointer at offset 16 (see
`Context Register`_).

+----------------------+-------------------------+
|       Original       |        Rewritten        |
+----------------------+-------------------------+
| .. code-block::      | .. code-block::         |
|                      |                         |
|    mrs xN, tpidr_el0 |    ldr xN, [x25, #16]   |
|                      |                         |
+----------------------+-------------------------+
| .. code-block::      | .. code-block::         |
|                      |                         |
|    msr tpidr_el0, xN |    str xN, [x25, #16]   |
|                      |                         |
+----------------------+-------------------------+

Optimizations
=============

Basic guard elimination
~~~~~~~~~~~~~~~~~~~~~~~

**Note**: not yet implemented.

If a register is guarded multiple times in the same basic block without any
modifications to it during the intervening instructions, then subsequent guards
can be removed.

+---------------------------+---------------------------+
|         Original          |         Rewritten         |
+---------------------------+---------------------------+
| .. code-block::           | .. code-block::           |
|                           |                           |
|    add x28, x27, wN, uxtw |    add x28, x27, wN, uxtw |
|    ldur xM, [x28]         |    ldur xM, [x28]         |
|    add x28, x27, wN, uxtw |    ldur xM, [x28, #8]     |
|    ldur xM, [x28, #8]     |    ldur xM, [x28, #16]    |
|    add x28, x27, wN, uxtw |                           |
|    ldur xM, [x28, #16]    |                           |
|                           |                           |
+---------------------------+---------------------------+
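
Since this optimization is not yet implemented, the following Python sketch
only illustrates the idea on a list of textual instructions. It assumes that
the list is a single basic block (no labels or branches) and is deliberately
conservative: any instruction that mentions the guarded register at all, or
that writes ``x28``, invalidates the guard. The function name and the matching
rules are hypothetical.

.. code-block:: python

  import re

  GUARD = re.compile(r"^add x28, x27, w(\d+), uxtw$")


  def eliminate_guards(block: list[str]) -> list[str]:
      """Drop repeated guards of the same register within one basic block."""
      out = []
      guarded = None  # number of the register whose guarded value is in x28
      for insn in block:
          m = GUARD.match(insn)
          if m:
              if m.group(1) == guarded:
                  continue  # x28 already holds the guarded value
              guarded = m.group(1)
          elif guarded is not None and (
              re.search(rf"\b[xw]{guarded}\b", insn)  # may modify the register
              or re.match(r"^\S+\s+x28\b", insn)  # overwrites x28
          ):
              guarded = None
          out.append(insn)
      return out


  block = [
      "add x28, x27, w1, uxtw",
      "ldur x0, [x28]",
      "add x28, x27, w1, uxtw",  # redundant: x1 and x28 are unchanged
      "ldur x2, [x28, #8]",
      "add x1, x1, #32",  # modifies x1, so the next guard must stay
      "add x28, x27, w1, uxtw",
      "ldur x3, [x28]",
  ]
  optimized = eliminate_guards(block)

In this example only the second guard is removed; the third one is kept because
``x1`` changes between the two accesses.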

Address generation
~~~~~~~~~~~~~~~~~~

**Note**: not yet implemented.

Addresses of global symbols in position-independent executables are frequently
generated via ``adrp`` followed by ``ldr``. Since the address generated by
``adrp`` can be statically guaranteed to be within the sandbox, it is safe to
directly target ``x28`` for these sequences. This allows the omission of a
guard instruction before the ``ldr``.

+----------------------+-----------------------+
|       Original       |       Rewritten       |
+----------------------+-----------------------+
| .. code-block::      | .. code-block::       |
|                      |                       |
|    adrp xN, target   |    adrp x28, target   |
|    ldr xN, [xN, imm] |    ldr xN, [x28, imm] |
|                      |                       |
+----------------------+-----------------------+

Stack guard elimination
~~~~~~~~~~~~~~~~~~~~~~~

**Note**: not yet implemented.

If the stack pointer is modified by adding/subtracting a small immediate, and
then later used to perform a memory access without any intervening jumps, then
the guard on the stack pointer modification can be removed. This is because the
load/store is guaranteed to trap if the stack pointer has been moved outside of
the sandbox region.

+---------------------------+---------------------------+
|         Original          |         Rewritten         |
+---------------------------+---------------------------+
| .. code-block::           | .. code-block::           |
|                           |                           |
|    add x26, sp, #8        |    add sp, sp, #8         |
|    add sp, x27, w26, uxtw |    ... (same basic block) |
|    ... (same basic block) |    ldr xN, [sp]           |
|    ldr xN, [sp]           |                           |
|                           |                           |
+---------------------------+---------------------------+

Guard hoisting
~~~~~~~~~~~~~~

**Note**: not yet implemented.

In certain cases, guards may be hoisted out of loops, for example when the
guarded register is not modified inside the loop.

+-----------------------+-------------------------------+
|       Original        |           Rewritten           |
+-----------------------+-------------------------------+
| .. code-block::       | .. code-block::               |
|                       |                               |
|        mov w8, #10    |        mov w8, #10            |
|        mov w9, #0     |        mov w9, #0             |
|    .loop:             |        add x28, x27, wM, uxtw |
|        add w9, w9, #1 |    .loop:                     |
|        ldr xN, [xM]   |        add w9, w9, #1         |
|        cmp w9, w8     |        ldr xN, [x28]          |
|        b.lt .loop     |        cmp w9, w8             |
|    .end:              |        b.lt .loop             |
|                       |    .end:                      |
|                       |                               |
+-----------------------+-------------------------------+

References
++++++++++

For more information, please see the following resources:

* `LFI project page <https://github.com/lfi-project/>`__
* `LFI RFC <https://discourse.llvm.org/t/rfc-lightweight-fault-isolation-lfi-efficient-native-code-sandboxing-upstream-lfi-target-and-compiler-changes/88380>`__
* `LFI paper <https://zyedidia.github.io/papers/lfi_asplos24.pdf>`__

Contact info:

* Zachary Yedidia - zyedidia@cs.stanford.edu
* Tal Garfinkel - tgarfinkel@google.com
* Sharjeel Khan - sharjeelkhan@google.com