A Half Day

Pingping (瓶瓶) is a software engineer. Like thousands of other software engineers, every day he codes, every day he gets bugs, and so every day he debugs.

Today was just a normal day, the kind every normal software engineer normally has.

This afternoon, while testing his software, he tried to save something he had randomly typed into Windows Notepad and hit an error. To confirm, he started a new instance of Notepad and got the same error.

It was an obvious bug.

So he debugged.

Soon he was absorbed in the interesting enterprise of debugging. Round after round, after every restart he typed something to verify, until the bug met its end.

It was already midnight. On his computer screen was a long log in Windows Notepad.

I too am something of a night owl. I happened to see the log, found it interesting, and made a copy, because I know Pingping is a guy who always looks forward: he would let it go and ghost his test system back to a fresh image, destroying all the data.

So, lucky for you, here's the log. It's for Pingping too, if he isn't coding or debugging right now.

Round 1, 2012/12/07 13:06
Hi buggy, I'm waiting for you …

Round 2, 2012/12/07 13:10
Now we are acquainted. But I still don't know where you are from and when you arrived here. Tell me your story, let me know you, Pleaseeeee !

Round 3, 2012/12/07 13:17
You are really a hard nut, just my taste. I'm going to nail you.

Round 4, 2012/12/07 13:26
Huh, you are hidden so deep. But anyway, it's just a matter of time before I get you, sooner or later, haha.

Round 5, 2012/12/07 13:39
What dog's luck you have today ! But you are almost running out of it, I'm sure of that.

Round 6, 2012/12/07 14:02
Ok, fine, fine, you are good, really good. Unfortunately, I will NOT give up.

Round 7, 2012/12/07 14:30
Now I know you better, know you more. Don't bother to hide any more.

Round 8, 2012/12/07 14:37
You are good at hiding, but remember I'm damn good at seeking, top level in the world.

Round 9, 2012/12/07 14:52
Once I catch you, I'll wring your neck and squeeze your head. So pray now, before it's too late.

Round 10, 2012/12/07 15:09
pang !

Round 11, 2012/12/07 15:29
peng !

Round 12, 2012/12/07 15:40
bang !

Round 13, 2012/12/07 16:01
Yes, I'm still here, with both eyes on you. You, little damn buggy, show yourself !

Round 14, 2012/12/07 16:20
bang ……

Round 15, 2012/12/07 16:32
peng ……

Round 16, 2012/12/07 16:50
pang ……

Round 17, 2012/12/07 17:03
Ah, finally, I got you. Now you know I'm a damn good seeker.

Round 18, 2012/12/07 17:09
Hurrah !

Round 19, 2012/12/07 17:16
Yippee, did one more test just to confirm. I'm quite satisfied with your disappearance.

…………

Round 20, 2012/12/07 18:00
I'm back, little buggy, are you there ?

Round 21, 2012/12/07 18:15
I'm back again, little buggy, can you hear me ?

Round 22, 2012/12/07 18:20
Where are you ? Playing with another software engineer ? Is he cool ?

Round 23, 2012/12/07 18:25
???

Round 24, 2012/12/07 18:30
I know now: just because of the half day we spent together, a bug is not only a bug, but the gift of a secret. Of all bugs, you are the secret of all secrets.

Round 25, 2012/12/07 18:40
I … I'm starting to miss you.

Round 26, 2012/12/07 19:00
Hi there, if you can hear me. I'm here to say goodbye. I must go now, another bug has just come in. For the first time in my life, I feel so sad to leave. May I see you again ?

< Dec. 07 night, on the train from Beijing to Jining, Shandong. >

Written on: 12-10-12

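More on _chkstk

In the WDK build environment targeting XP, whenever the code generates a call to _chkstk, the link stage fails because the imported symbol __chkstk cannot be resolved.

Yet my driver built successfully and only failed when it was loaded on XP. I checked and confirmed that the WinXP ntoskrnl.lib does not export __chkstk. However, the export table of wdmsec.lib does contain __chkstk, and the driver's SOURCES file links against wdmsec.lib, which is why the build targeting XP succeeded.

For Win7, ntoskrnl.lib already contains the __chkstk symbol, so a build targeting Win7 raises no error.

To make your driver more robust, deliberately build it for XP as well: that makes this kind of stack-usage problem much easier to catch.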

Written on: 12-09-12

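Starting from _chkstk: How the User-Mode Stack Is Managed

Last week, while testing a driver on XP, I found that it would not load. The "net start" command returned only a meaningless error code, and the driver's DriverEntry() entry point never got a chance to run.

I first suspected a problem with an imported function. A look at the driver file with the WDK tool Depends.exe confirmed it: the _chkstk function could not be resolved.

_chkstk is a helper library function of the Microsoft C compiler. MSDN describes it only briefly: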

_chkstk Routine is a helper routine for the C compiler. For x86 compilers, _chkstk Routine is called when the local variables exceed 4096 bytes; for x64 compilers it is 8K.

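When the compiler sees that a function's local variables exceed the limit (4K on x86, 8K on x64), it automatically inserts a call to _chkstk to make sure the pages backing that stack space are present in memory.

The problem was found, but pinning down which function was responsible still took some thought. A fair amount of code had been ported from user mode, and the problem almost certainly lay there. Going through it function by function would have been a clumsy approach, and not in keeping with a programmer's usual style, so I disassembled the driver's .sys file with IDA and searched the assembly for the string _chkstk, which pointed straight at the offending function. A structure used by that function defined a very large array, and such heavy use of the stack is quite dangerous in the kernel.

The fix was simple: move the structure into dynamically allocated memory. Although the problem was solved, the DDK's description of _chkstk, and the questions it raised, kept bothering me. For example, why is the limit 4K on x86 but 8K on the AMD64 architecture?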
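
As a rough user-mode illustration of that kind of fix (this is not the driver's actual code), the hypothetical `struct big_record` below holds a 16 KB array. Declared as a local variable it would sit entirely on the stack and push the frame past the _chkstk threshold; allocated dynamically, the frame stays small. In a driver the memory would come from the kernel pool (for example via ExAllocatePoolWithTag) rather than malloc, and every name in the sketch is invented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure with an oversized array, standing in for the
   one found in the ported user-mode code. */
struct big_record {
    char payload[16 * 1024];
};

/* Copies text into a record and returns the stored length.
   The record lives in dynamically allocated memory, so this function's
   stack frame holds only a pointer and a length, not 16 KB of data.
   Returns 0 if the allocation fails. */
size_t process_record(const char *text)
{
    assert(text != NULL);

    struct big_record *rec = malloc(sizeof *rec);
    if (rec == NULL)
        return 0;

    /* Truncate if needed so the payload always stays NUL-terminated. */
    size_t len = strlen(text);
    if (len >= sizeof rec->payload)
        len = sizeof rec->payload - 1;
    memcpy(rec->payload, text, len);
    rec->payload[len] = '\0';

    free(rec);
    return len;
}
```

Calling `process_record("hello")` stores five bytes and returns 5. The trade-off is an allocation and a failure path on every call, which is usually a fair price for keeping a kernel stack frame small.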

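These past two days I finally had time to settle the question for good.

To answer it, we have to start with how the user-mode stack is allocated. Taking the ReactOS code as an example: when a thread is created, CreateThread() calls BasepCreateStack() to create the user stack. See the ReactOS source for details:
~/ReactOS/lib/kernel32/misc/utils.c

BasepCreateStack() does three main things: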

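  1. Allocate the virtual memory needed for the stack; its size is the Stack Reserve Size.
  2. Commit pages according to the Stack Commit Size. If the Stack Commit Size is smaller than the Stack Reserve Size, one extra page is added; this extra page serves as the guard page.
  3. Mark the page at the bottom of the stack as the guard page.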
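
The three steps can be sketched as plain arithmetic. This is a simplified model and not the ReactOS code: `plan_stack`, `struct stack_plan` and `SKETCH_PAGE_SIZE` are made-up names, sizes are rounded to 4 KB pages only, and no memory is actually reserved or committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for the sketch */

struct stack_plan {
    size_t reserve_bytes;  /* virtual address space reserved        */
    size_t commit_bytes;   /* bytes committed, guard page included  */
    bool   has_guard;      /* true when a guard page is set up      */
};

/* Rounds a byte count up to a whole number of pages. */
static size_t round_up_to_page(size_t n)
{
    return (n + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;
}

/* Works out the sizes that the three steps would use. */
struct stack_plan plan_stack(size_t reserve, size_t commit)
{
    assert(reserve > 0 && commit > 0 && commit <= reserve);

    struct stack_plan p;
    p.reserve_bytes = round_up_to_page(reserve);          /* step 1 */
    p.commit_bytes  = round_up_to_page(commit);           /* step 2 */
    p.has_guard     = p.commit_bytes < p.reserve_bytes;
    if (p.has_guard)
        p.commit_bytes += SKETCH_PAGE_SIZE;  /* extra page, the guard (step 3) */
    return p;
}
```

For a 1 MB reserve and a 4 KB commit, the plan commits 8 KB: one usable page plus the guard page below it. When commit equals reserve there is nothing left to grow into, so no guard page is planned.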

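When the committed part of the user stack is used up, the next access lands on the guard page at the bottom of the stack, and any access to a guard page raises a page fault. The page fault handler, MmAccessFault(), can tell that the fault was caused by a guard page and by default hands it to the user-stack handler, MiCheckForUserStackOverflow(). If the user stack has not actually overflowed, that is, if the Stack Commit Size is still smaller than the Stack Reserve Size, MiCheckForUserStackOverflow() automatically extends the stack downward by GUARD_PAGE_SIZE. GUARD_PAGE_SIZE is defined differently for different CPU architectures: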
X64:  #define GUARD_PAGE_SIZE   (PAGE_SIZE * 2)
X86:  #define GUARD_PAGE_SIZE   PAGE_SIZE

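This explains why the limit is 4K (that is, PAGE_SIZE) on x86 but 8K on x64.

With that covered, it is time to answer what _chkstk() actually does. Visual Studio ships the source of _chkstk; taking x86 as an example: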

; Input: eax = number of bytes of stack requested
labelP  _chkstk,       PUBLIC

        push    ecx                     ; save ecx
        cmp     eax,_PAGESIZE_          ; more than one page requested?
        lea     ecx,[esp] + 8           ;   compute new stack pointer in ecx
                                        ;   correct for return address and
                                        ;   saved ecx
        jb      short lastpage          ; no

;------------

probepages:
        sub     ecx,_PAGESIZE_          ; yes, move down a page
        sub     eax,_PAGESIZE_          ; adjust request and...

        test    dword ptr [ecx],eax     ; ...probe it (if this is the guard page,
                                        ;  the access causes a page fault and the
                                        ;  user stack ends up growing down one page)

        cmp     eax,_PAGESIZE_          ; still a page or more left to probe?
        jae     short probepages        ; yes, keep probing

lastpage:
        sub     ecx,eax                 ; move stack down by eax
        mov     eax,esp                 ; save current tos and do a...

        test    dword ptr [ecx],eax     ; ...probe in case a page was crossed
                                        ;  (this is the bottom of the stack the
                                        ;  caller is about to use; if this page is
                                        ;  the guard page, the user stack is
                                        ;  likewise extended downward)

        mov     esp,ecx                 ; set the new stack pointer
                                        ;  (the space from here up to the old ESP
                                        ;  holds the caller's local variables)

        mov     ecx,dword ptr [eax]     ; recover ecx
        mov     eax,dword ptr [eax + 4] ; recover return address
                                        ;  (the address in the caller goes into eax)

        push    eax                     ; prepare return address
                                        ;  (push it onto the current stack, ready
                                        ;  to return)
        ret

        end

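The main job of _chkstk() is to make sure the stack grows downward contiguously. Without _chkstk(), when the local variables are large enough to reach past the lower edge of the guard page, the next push causes an access violation. The stack page at that address is not valid, so the push directly triggers a page fault, and the page fault handler, unable to identify why the fault occurred, cannot judge or handle it correctly.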
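
The difference between probing and not probing can be shown with a toy model. The sketch below is not Windows or ReactOS code: `struct toy_stack`, `touch_page` and the two `alloc_*` functions are invented names, pages are counted downward from the top of the stack, and the guard page is assumed to be a single 4 KB page, as on x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size */

/* Pages are numbered from the stack top: pages [0, committed) are usable,
   page `committed` is the guard page, deeper pages are reserved only. */
struct toy_stack {
    size_t committed;  /* pages already committed */
    size_t reserved;   /* total pages reserved    */
};

/* Touches one page. Returns false when the access cannot be serviced. */
static bool touch_page(struct toy_stack *s, size_t page)
{
    if (page < s->committed)
        return true;                  /* ordinary access */
    if (page == s->committed && page + 1 < s->reserved) {
        s->committed++;               /* guard page hit: grow by one page */
        return true;
    }
    return false;                     /* below the guard page, or out of room */
}

/* Naive allocation: move the stack pointer `request` bytes down from
   `depth` and touch only the new bottom of the frame. */
bool alloc_without_probe(struct toy_stack *s, size_t depth, size_t request)
{
    return touch_page(s, (depth + request) / TOY_PAGE_SIZE);
}

/* _chkstk-style allocation: touch every page on the way down. */
bool alloc_with_probe(struct toy_stack *s, size_t depth, size_t request)
{
    while (request >= TOY_PAGE_SIZE) {
        depth += TOY_PAGE_SIZE;
        request -= TOY_PAGE_SIZE;
        if (!touch_page(s, depth / TOY_PAGE_SIZE))
            return false;
    }
    return touch_page(s, (depth + request) / TOY_PAGE_SIZE);
}
```

With two committed pages and a 12 KB request, `alloc_without_probe` jumps straight past the guard page and fails, while `alloc_with_probe` hits the guard page once per step and leaves four pages committed. When the reserve is exhausted, probing fails too, which corresponds to a genuine stack overflow.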

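Compared with user mode, things are much simpler in the kernel: in the Win7 kernel, for instance, _chkstk is effectively an empty function. The reason is that a kernel thread's stack is a fixed size. The value differs between the x86 and x64 architectures:
X64: #define KERNEL_STACK_SIZE 0x6000   /* 6 memory pages */
X86: #define KERNEL_STACK_SIZE 12288    /* 3 memory pages */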

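Stack space is a very scarce resource in the kernel, which places high demands on how drivers are written. Where recursion is involved in particular, you must watch the nesting depth, or you will very easily receive a blue screen courtesy of M$.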
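
A back-of-the-envelope bound makes the point concrete. The sketch below only divides the fixed stack size by an assumed frame size; `max_recursion_depth`, the 256-byte frame and the 4 KB headroom are illustrative assumptions, not measurements. In a real driver, the WDK routine IoGetRemainingStackSize() is one way to check the actual remaining space at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel stack sizes as quoted in the text. */
#define KERNEL_STACK_SIZE_X86 12288u    /* 3 pages */
#define KERNEL_STACK_SIZE_X64 0x6000u   /* 6 pages */

/* Rough upper bound on recursion depth: the stack left after reserving
   some headroom for callers and interrupts, divided by the bytes each
   recursive call consumes. */
size_t max_recursion_depth(size_t stack_bytes,
                           size_t headroom_bytes,
                           size_t frame_bytes)
{
    assert(frame_bytes > 0);
    if (headroom_bytes >= stack_bytes)
        return 0;
    return (stack_bytes - headroom_bytes) / frame_bytes;
}
```

Under these assumptions, `max_recursion_depth(KERNEL_STACK_SIZE_X86, 4096, 256)` allows at most 32 nested calls on x86, and the same call with `KERNEL_STACK_SIZE_X64` allows 80 on x64. Real frames vary with compiler and build options, so treat such numbers as a ceiling, not a guarantee.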

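The Windows kernel does in fact have a large-stack mechanism, to cover special cases with unusually heavy stack consumption. But that part is a complete black box, invisible to users, and not the common case, so I will not go into it here.

References:
1. http://support.microsoft.com/kb/100775/en
2. http://msdn.microsoft.com/en-us/library/ms648426(v=vs.85).aspx
3. http://www.reactos.org (ReactOS source code)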

Written on: 12-09-12

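A Little Lunar Science

We see the moon every day, but how much do we really know about it?

On the way back from the office tonight I looked up and saw an enormous moon hanging in the eastern sky. It wasn't full, but it was startlingly big; for a moment I wondered whether it was a light on a tall building.

I have seen plenty of moons before: crescent, full, bright, dim. But my memory of their size is very vague. Seeing one this big all of a sudden was a bit of a shock. Could it be because of 2012, the end of the world ?

I looked up some figures on my tablet right away, and realized how little I know about this celestial body closest to us.

The moon's orbit is close to circular, or rather not very elliptical, although with an eccentricity of about 0.055 it is in fact somewhat more elongated than the Earth's orbit around the sun (the ecliptic), at about 0.017. Since it is an ellipse, it has a perigee and an apogee.

This year's largest moon (the closest perigee) appeared on 2012/5/5. The moon's orbital period around the Earth is 27.321661 days, so by this reckoning the previous perigee fell around November 12 and the next will come around 12/10. (Strictly speaking, perigee recurs every anomalistic month of about 27.55 days, slightly longer than this sidereal period, so these dates are only rough.)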
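
The date arithmetic is easy to check with a few lines of C. This sketch simply steps forward from 2012-05-05 in whole multiples of the period and rounds to the nearest day; `nth_return` is an invented name, and the period is passed in as a parameter so that the 27.321661-day figure used here can be swapped for another value.

```c
#include <assert.h>
#include <time.h>

/* Calendar date reached n periods after 2012-05-05, with the elapsed
   time rounded to the nearest whole day. */
struct tm nth_return(int n, double period_days)
{
    assert(n >= 0 && period_days > 0.0);

    struct tm t = {0};
    t.tm_year  = 2012 - 1900;
    t.tm_mon   = 4;                               /* May (months count from 0) */
    t.tm_mday  = 5 + (int)(n * period_days + 0.5);
    t.tm_hour  = 12;                              /* midday, clear of DST edges */
    t.tm_isdst = -1;
    mktime(&t);    /* normalizes the oversized day count into month and day */
    return t;
}
```

Seven periods of 27.321661 days after May 5 land on November 12, and eight land on December 10, matching the dates in the text. The function leans on `mktime` to carry the day count across month boundaries, which avoids hand-written calendar tables.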

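Today is only December 3, so in theory today's moon is not the largest of the month. It seems to be a matter of perception that made it look so huge to me, or perhaps it is because I finally got the I/O part of my program working and am in a good mood ?!

Today's date in the lunar calendar is the 20th of the 10th month, just five days past the full moon of the 15th, and far from November 12. So the moon's waxing and waning are not directly tied to the perigee and apogee of its orbit around the Earth. How bright and how full the moon looks depends on the relative positions of the moon, the Earth and the sun, whereas the perigee and apogee of the lunar orbit depend only on the Earth and the moon, mother and child.

May 5, 2012 was when the two coincided: it was Lixia (Start of Summer), the 15th of the 4th lunar month.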

More Moon Facts: http://www.moontoday.net/facts.html

Written on: 12-03-12

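In Beijing: Jiankou in Wind and Snow

Today I hiked the wild Great Wall at Jiankou with my colleagues Lao Gu and Shuguang. It began to snow on the mountain in the afternoon, so we dropped the plan to go on to Mutianyu and descended early.

Neither of the two routes from Tianxianyu (Wofo Shanzhuang) up to Zhengbeilou is easy going. There are a lot of rocks and cliff faces, and with the snow and the cold it was a fairly tough walk.

After we reached Zhengbeilou, Lao Gu and Shuguang rested there while I went on alone to Jiankou. At the "Iron Ladder" only a smooth vertical rock face remains, and climbing down it was truly chilling. By the time I had passed Little Potala it had started to snow, and the flakes kept getting thicker, so I had to turn back and rejoin my colleagues.

On the way back Lao Gu's right knee hurt badly the whole time, and walking down was very hard for him. Shuguang is somewhat afraid of heights, and one steep roped slope on the descent nearly drove him to despair; after this trip, his threshold for fear of heights has surely gone up quite a bit. We got down safely to the car just as it was getting dark. Everything worked out just right.

Total distance: 6.5 km, with about 1,000 m of elevation gain and loss.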

[Image: Jiankou route map, 2012-12-01]

[Photos: Jiankou, including Zhengbeilou on the way out and Zhengbeilou on the return leg]

The Jiankou Iron Ladder:

[Photos: the Iron Ladder in October 2010 and in December 2012]

Written on: 12-01-12