Blue Screen: Whose Fault Is It?

A bug can always catch you off guard in the place you least expect, except that what it brings is not a pleasant surprise but a Blue Screen Of Death!

Since that is how it is, the only option is to meet the attack as it comes.

First, a rough outline of the program flow:

NTSTATUS
XXXProcessDirents(…)
{    
    do {
        KeEnterCriticalRegion();
        ExAcquireResourceSharedLite(&fcb->Resource, TRUE);

        /* access several members of fcb structure */
        ExReleaseResourceLite(&fcb->Resource);
        KeLeaveCriticalRegion();

         XXXXProcessDirent(…);

    } while (list_is_not_empty(….));

    return status;
}

NTSTATUS
XXXXProcessDirent(…)
{
    HANDLE handle = NULL;
    XXXX_FILE_HEADE fileHead;
    ……

    /* open file */
    status = ZwCreateFile(&handle, GENERIC_READ, &oa, &iosb, NULL, 0,
                          FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                          FILE_OPEN, 0, NULL, 0);

    /* read file header */
    status = ZwReadFile(handle, ioevent, NULL, NULL, &iosb, (PVOID)&fileHead,
                        sizeof(XXXX_FILE_HEADE), &offset, NULL);

    /* check whether file is interesting to us */
    if (status == STATUS_SUCCESS && iosb.Information == sizeof(……)) {
        /* it's my taste, haha */
    }

    /* close file, not interested in it any more */

    if (handle){
        ZwClose(handle);
    }

    return status;
}

The flow is straightforward: XXXProcessDirents() calls XXXXProcessDirent() in a loop until every entry in the list has been checked.

Now let's look at the windbg analysis:

1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 0abc9867, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000001, bitfield :
bit 0 : value 0 = read operation, 1 = write operation
bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: 806e7a2a, address which referenced memory

Debugging Details:
------------------

WRITE_ADDRESS:  0abc9867

CURRENT_IRQL:  2

FAULTING_IP:
hal!KeAcquireInStackQueuedSpinLock+3a
806e7a2a 8902            mov     dword ptr [edx],eax

DEFAULT_BUCKET_ID:  DRIVER_FAULT

BUGCHECK_STR:  0xA

PROCESS_NAME:  System

TRAP_FRAME:  b9019bbc -- (.trap 0xffffffffb9019bbc)
ErrCode = 00000002
eax=b9019c40 ebx=00000000 ecx=c0000211 edx=0abc9867 esi=c0000128 edi=8842d268
eip=806e7a2a esp=b9019c30 ebp=b9019c68 iopl=0         nv up ei ng nz na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010286
hal!KeAcquireInStackQueuedSpinLock+0x3a:
806e7a2a 8902            mov     dword ptr [edx],eax  ds:0023:0abc9867=????????
Resetting default scope

LAST_CONTROL_TRANSFER:  from 806e7a2a to 80544768

STACK_TEXT:
b9019bbc 806e7a2a badb0d00 0abc9867 804f4e77 nt!KiTrap0E+0x238
b9019c68 806e7ef2 00000000 00000000 b9019c80 hal!KeAcquireInStackQueuedSpinLock+0x3a
b9019c68 b9019d24 00000000 00000000 b9019c80 hal!HalpApcInterrupt+0xc6
WARNING: Frame IP not in any known module. Following frames may be wrong.
b9019cf0 80535873 00000000 8896fb20 00000000 0xb9019d24
b9019d10 b79d87ff ba668a30 8859b7e8 00000440 nt!ExReleaseResourceLite+0x8d
b9019d2c b79d8a5c 8a3ff2f0 00000003 ba6685f0 XXXXX!XXXProcessDirents+0xef
b9019d88 b79e163a e2f6b170 00000001 00000001 XXXXX!XXXKernelQueryDirectory+0x20c
b9019ddc 8054616e b79e1530 88a8ae00 00000000 nt!PspSystemThreadStartup+0x34
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

The crash happens inside the system routines ExReleaseResourceLite() and KeAcquireInStackQueuedSpinLock(), and the address being written, 0abc9867, is obviously bogus, so stack corruption is a reasonable first inference.

The prime suspect was the lock-protected section of XXXProcessDirents(): the buffer-copy operations there are exactly the kind of code most likely to corrupt the stack. After careful review and testing, however, that part was ruled out.

With the prime suspect cleared there was no obvious target left, so it was back to the windbg log:

The address that KeAcquireInStackQueuedSpinLock() is trying to write appears to be LockHandle->LockQueue.Next, and the LockHandle is normally allocated on the current stack, which confirms the earlier inference of stack corruption. The question, though, is who corrupted the stack.
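For reference, typical use of an in-stack queued spin lock looks like this (a minimal sketch with made-up names; the point is only that the KLOCK_QUEUE_HANDLE sits in the caller's stack frame):

#include <ntddk.h>

VOID
XXXWorkUnderQueuedLock(PKSPIN_LOCK SpinLock)
{
    /* The lock handle lives in this function's stack frame, so a garbage
       write target inside KeAcquireInStackQueuedSpinLock() points straight
       at a corrupted stack. */
    KLOCK_QUEUE_HANDLE lockHandle;

    KeAcquireInStackQueuedSpinLock(SpinLock, &lockHandle);
    /* ... work done while holding the lock ... */
    KeReleaseInStackQueuedSpinLock(&lockHandle);
}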

The stack contains a frame for hal!HalpApcInterrupt(), the software interrupt that delivers APCs; it normally calls nt!KiDeliverApc() to drain the thread's APC queues. But at the moment ExReleaseResourceLite() is called, the thread is still inside a critical region, where user-mode APCs and normal kernel-mode APCs are disabled; special kernel-mode APCs, however, are not.
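The APC rules inside a critical region can be summed up in a short sketch (the function name is made up; KeAreApcsDisabled() reports whether normal APC delivery is currently blocked):

#include <ntddk.h>

VOID
XXXCriticalRegionApcRules(VOID)
{
    KeEnterCriticalRegion();

    /* Normal kernel-mode APCs and user-mode APCs are held off here. */
    ASSERT(KeAreApcsDisabled());

    /* Special kernel-mode APCs can still be delivered at APC_LEVEL
       while we are inside the critical region. */

    KeLeaveCriticalRegion();
}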

The most common special kernel APC is the one queued by IoCompleteRequest(): it runs IopCompleteRequest() at APC_LEVEL to perform the stage-2 cleanup of the IRP.

At this point the picture finally began to take shape. The only place in the code that could cause an APC to be queued is the ZwReadFile() call in XXXXProcessDirent(), and fileHead is allocated right on the stack.

With that, the root cause of the bug surfaced:

XXXXProcessDirent() does not handle the case where ZwReadFile() returns STATUS_PENDING. When that happens, XXXXProcessDirent() returns and execution moves on while the IRP completion of the earlier ZwReadFile() is still in flight, and the fileHead address that this completion writes to lies in a stack region that has long since been unwound and reused.

Once that was clear, the fix was to handle the STATUS_PENDING case explicitly after calling ZwReadFile(): wait via ZwWaitForSingleObject() until the read has fully completed before doing anything else.
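A minimal sketch of that fix, patched into the ZwReadFile() call of the listing above (ioevent, iosb, offset and fileHead are the same variables used there):

    /* read file header, this time handling STATUS_PENDING */
    status = ZwReadFile(handle, ioevent, NULL, NULL, &iosb, (PVOID)&fileHead,
                        sizeof(XXXX_FILE_HEADE), &offset, NULL);
    if (status == STATUS_PENDING) {
        /* The IRP is still in flight: wait on the event so that its completion,
           which writes into fileHead on our stack, finishes before this frame
           is unwound and reused. */
        ZwWaitForSingleObject(ioevent, FALSE, NULL);
        status = iosb.Status;
    }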

With that, problem solved!

That a single blue-screen problem could take this many twists and turns reminds me of Liu Zhenyun's novel 《一句顶一万句》 ("One Sentence Is Worth Ten Thousand"); I just wonder which sentence here is the one worth ten thousand.

<Next I plan to write something about APCs; the operating system hides them so deep that they always feel impossible to pin down!>

Use MmSetAddressRangeModified with Caution

MmSetAddressRangeModified marks the PFN entries of a range as dirty/modified and clears the dirty bit in the PTEs. Beyond that, though, it has a less obvious side effect; see the analysis below:

1: kd> !pte 0xfffff880`0c9e6000
                                 VA fffff8800c9e6000
PXE @ FFFFF6FB7DBEDF88     PPE at FFFFF6FB7DBF1000    PDE at FFFFF6FB7E200320    PTE at FFFFF6FC40064F30
contains 000000003FE84863  contains 000000003FE83863  contains 0000000014516863  contains 000000000FAD7963
pfn 3fe84      ---DA--KWEV  pfn 3fe83      ---DA--KWEV  pfn 14516      ---DA--KWEV  pfn fad7       -G-DA--KWEV

The PTE is dirty and writable. Now look at its state after MmSetAddressRangeModified is called:

1: kd> !pte 0xfffff880`0c9e6000
                                 VA fffff8800c9e6000
PXE @ FFFFF6FB7DBEDF88     PPE at FFFFF6FB7DBF1000    PDE at FFFFF6FB7E200320    PTE at FFFFF6FC40064F30
contains 000000003FE84863  contains 000000003FE83863  contains 0000000014516863  contains 000000000FAD7921
pfn 3fe84      ---DA--KWEV  pfn 3fe83      ---DA--KWEV  pfn 14516      ---DA--KWEV  pfn fad7       -G--A--KREV

The PTE's dirty bit has been cleared, but the PTE has also been set read-only, so any further write to the page is bound to trigger a page fault.

This is exactly an Ext2Fsd bug I once ran into. To pin its page cache, Ext2Fsd builds an MDL and remaps the pages into system space (via MmMapLockedPagesSpecifyCache). The freshly mapped VA is dirty and writable, so writing through it while holding a spinlock (at DISPATCH_LEVEL) raises no exception. While committing the changes, however, Ext2Fsd calls MmSetAddressRangeModified, which sets the PTE to read-only. If the next write happens to occur under the spinlock (DISPATCH_LEVEL), the result is a BSOD: DRIVER_IRQL_NOT_LESS_OR_EQUAL (D1). If instead some write is performed before the spinlock is acquired (at IRQL < DISPATCH_LEVEL), an ordinary page fault is raised, MmAccessFault resets the PTE to writable and sets the dirty bit, and later writes to the VA at DISPATCH_LEVEL no longer fault. That mix of randomness and stealth made the bug a real headache to debug.
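Condensed into code, the failure sequence looks roughly like this (a sketch only: names are made up, error handling is omitted, and the MmSetAddressRangeModified prototype is declared by hand since it is not in the WDK headers):

#include <ntddk.h>

/* Exported by ntoskrnl but undocumented; this is the commonly used
   prototype and should be treated as an assumption. */
NTKERNELAPI BOOLEAN NTAPI
MmSetAddressRangeModified(PVOID Address, SIZE_T Length);

VOID
XXXCommitCachePage(PMDL Mdl, PKSPIN_LOCK Lock, PVOID Data, SIZE_T Length)
{
    KIRQL oldIrql;
    PVOID va;

    /* Remap the locked cache pages into system space; the fresh mapping
       starts out writable with the dirty bit set. */
    va = MmMapLockedPagesSpecifyCache(Mdl, KernelMode, MmCached,
                                      NULL, FALSE, NormalPagePriority);

    KeAcquireSpinLock(Lock, &oldIrql);      /* now at DISPATCH_LEVEL */
    RtlCopyMemory(va, Data, Length);        /* fine: PTE is writable */
    KeReleaseSpinLock(Lock, oldIrql);

    /* Commit the change: marks the PFN modified, clears the PTE dirty bit,
       and, as the side effect shown above, makes the PTE read-only. */
    MmSetAddressRangeModified(va, Length);

    KeAcquireSpinLock(Lock, &oldIrql);
    RtlCopyMemory(va, Data, Length);        /* write hits a read-only PTE at
                                               DISPATCH_LEVEL: bugcheck D1 */
    KeReleaseSpinLock(Lock, oldIrql);
}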

Now that the problem is understood, it is worth a quick experiment: if we manually set the PTE back to writable and then perform a write, the CPU should set the PTE's dirty bit directly, without calling into the OS (i.e. without a page fault).

After calling MmSetAddressRangeModified on VA 0xfffffa60`04ae7000:

1: kd> !pte 0xfffffa60`04ae7000
                                 VA fffffa6004ae7000
PXE @ FFFFF6FB7DBEDFA0     PPE at FFFFF6FB7DBF4C00    PDE at FFFFF6FB7E980128    PTE at FFFFF6FD30025738
contains 000000007FFC4863  contains 000000007FFC3863  contains 00000000539A2863  contains 00000000149AD921
pfn 7ffc4      ---DA--KWEV  pfn 7ffc3      ---DA--KWEV  pfn 539a2      ---DA--KWEV  pfn 149ad      -G--A--KREV

Manually change 0xfffffa60`04ae7000 to writable, without setting the dirty flag:
1: kd> dq FFFFF6FD30025738 l1
fffff6fd`30025738  00000000`149ad921

1: kd> eb FFFFF6FD30025738 23
1: kd> !pte 0xfffffa60`04ae7000
                                 VA fffffa6004ae7000
PXE @ FFFFF6FB7DBEDFA0     PPE at FFFFF6FB7DBF4C00    PDE at FFFFF6FB7E980128    PTE at FFFFF6FD30025738
contains 000000007FFC4863  contains 000000007FFC3863  contains 00000000539A2863  contains 00000000149AD923
pfn 7ffc4      ---DA--KWEV  pfn 7ffc3      ---DA--KWEV  pfn 539a2      ---DA--KWEV  pfn 149ad      -G--A--KWEV

Before performing the write you can set breakpoints on KiPageFault or MmAccessFault and then run the experiment: watch whether the breakpoints fire, and afterwards check whether the PTE's dirty flag has been set. The actual result is left for the reader to verify.

Touching a pageable VA at DISPATCH_LEVEL, even one whose pages have been locked or remapped, always feels like walking on thin ice. This is especially true for the file cache: plenty of the Cache Manager's internal operations change PTE attributes or call MmSetAddressRangeModified, so traps lurk everywhere. The safest approach is still not to use a spinlock at all :)
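One way to follow that advice, sketched with hypothetical names: serialize the cache write with an ERESOURCE instead of a spinlock, so IRQL stays below DISPATCH_LEVEL and a write that hits a read-only PTE simply takes a normal, resolvable page fault.

#include <ntddk.h>

VOID
XXXWriteCacheSafely(PERESOURCE Resource, PVOID CacheVa, PVOID Data, SIZE_T Length)
{
    /* ERESOURCEs must be acquired inside a critical region. */
    KeEnterCriticalRegion();
    ExAcquireResourceExclusiveLite(Resource, TRUE);

    /* Still below DISPATCH_LEVEL here: if the Cache Manager (or a prior
       MmSetAddressRangeModified call) has flipped the PTE to read-only,
       this write just faults and MmAccessFault restores it to writable. */
    RtlCopyMemory(CacheVa, Data, Length);

    ExReleaseResourceLite(Resource);
    KeLeaveCriticalRegion();
}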