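A file system is a special kind of kernel driver whose main job is managing data streams (that is, files). What the user sees is a variety of files and directories, whereas the underlying storage device (a hard disk, an optical disc, and so on) is a contiguous storage entity addressed in units of sectors. The file system sits between the two and is responsible for resolving and locating files and directories within the device's storage, so it can also be thought of as a translator.
Each type of driver implements different functionality, and every driver registers with the operating system when it loads, reporting what type of driver it is and what kind of device it manages. Take a hypothetical rice-cooker driver: the functions it has to implement are roughly power on/off, temperature control and a timer, and at load time it could register with the system as a driver for cooking devices. A file system driver works much the same way, except that its functionality is more complex and its ties to the kernel are closer, especially its interaction with Virtual Memory, Cache Management and the I/O Subsystem.
Different operating systems impose different specifications and requirements on file systems, but starting from the most basic needs, all file systems do roughly the same things; only the system interfaces differ. This article covers file system development on Windows and Linux only. On Windows the framework is called IFS (Installable File System); on Linux it is VFS (Virtual Filesystem, also read as Virtual Filesystem Switch).
This article focuses solely on the file system registration process; mounting a volume will be covered in a later article.
File system registration on Windows:
The Windows kernel (I/O Manager) provides a set of kernel support routines for registering and removing a file system.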
VOID
IoRegisterFileSystem(
    IN OUT PDEVICE_OBJECT DeviceObject /* Device object representing the file system */
    );
VOID
IoUnregisterFileSystem(
    IN OUT PDEVICE_OBJECT DeviceObject /* Device object representing the file system */
    );
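The registration call is simple. Take Ext2Fsd as an example: it creates one device object for optical media and one for disk media, CdromdevObject and DiskdevObject, and registers each of them.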
NTSTATUS
DriverEntry (
    IN PDRIVER_OBJECT DriverObject,
    IN PUNICODE_STRING RegistryPath
    )
{
    PDEVICE_OBJECT DiskdevObject = NULL;
    PDEVICE_OBJECT CdromdevObject = NULL;
    ......
    /* create Ext2Fsd cdrom fs device */
    RtlInitUnicodeString(&DeviceName, CDROM_NAME);
    Status = IoCreateDevice( DriverObject, 0, &DeviceName,
                             FILE_DEVICE_CD_ROM_FILE_SYSTEM,
                             0, FALSE, &CdromdevObject );
    if (!NT_SUCCESS(Status)) {
        DEBUG(DL_ERR, ( "IoCreateDevice cdrom device object error.\n"));
        goto errorout;
    }
    /* create Ext2Fsd disk fs device */
    RtlInitUnicodeString(&DeviceName, DEVICE_NAME);
    Status = IoCreateDevice( DriverObject, 0, &DeviceName,
                             FILE_DEVICE_DISK_FILE_SYSTEM,
                             0, FALSE, &DiskdevObject );
    if (!NT_SUCCESS(Status)) {
        DEBUG(DL_ERR, ( "IoCreateDevice disk deviceobject error.\n"));
        goto errorout;
    }
    ......
    /* register file system devices for disk and cdrom */
    IoRegisterFileSystem(DiskdevObject);
    ObReferenceObject(DiskdevObject);
    IoRegisterFileSystem(CdromdevObject);
    ObReferenceObject(CdromdevObject);
    ......
    return Status;
}
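You may wonder why Ext2Fsd registers twice (once for the cdrom device and once for the disk device). To answer that, let us look at how IoRegisterFileSystem works internally.
The Windows I/O Manager maintains four queues, each holding all the file systems of one category:
Taking Windows 7 x64 as an example: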
0: kd> x nt!iop*FileSystemQueueHead
fffff800`04076200 nt!IopCdRomFileSystemQueueHead = <no type information>
fffff800`04076210 nt!IopDiskFileSystemQueueHead = <no type information>
fffff800`040761e0 nt!IopTapeFileSystemQueueHead = <no type information>
fffff800`040761f0 nt!IopNetworkFileSystemQueueHead = <no type information>
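The names tell you which type of file system each queue manages. When the I/O Manager discovers a new storage device of a given type, it walks only the corresponding queue, calling the file system drivers on it one after another to recognize the volume on the device, rather than calling every file system registered in the system.
Besides disk media, Ext2Fsd also supports optical media, so it has to register separately for each storage device type.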
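The per-type queues can be pictured with a small user-space sketch. It is only an illustration of the idea described above, not the I/O Manager's code: the enum, the fixed-size arrays and the function names `queue_register` and `queue_candidates` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the four file system device types;
 * a real driver passes FILE_DEVICE_*_FILE_SYSTEM to IoCreateDevice. */
enum fs_device_type {
    FS_TYPE_DISK,
    FS_TYPE_CDROM,
    FS_TYPE_TAPE,
    FS_TYPE_NETWORK,
    FS_TYPE_COUNT
};

#define MAX_PER_QUEUE 8

/* One queue per device type, mirroring the four queue heads listed above. */
struct fs_queue {
    const char *drivers[MAX_PER_QUEUE];
    size_t count;
};

static struct fs_queue fs_queues[FS_TYPE_COUNT];

/* Put a driver on the queue that matches its device type.
 * Returns 0 on success, -1 if that queue is full. */
int queue_register(enum fs_device_type type, const char *driver)
{
    struct fs_queue *q;

    assert((int)type >= 0 && type < FS_TYPE_COUNT);
    q = &fs_queues[type];
    if (q->count == MAX_PER_QUEUE)
        return -1;
    q->drivers[q->count++] = driver;
    return 0;
}

/* How many drivers would be asked to recognize a volume on a device
 * of the given type: only those on the matching queue. */
size_t queue_candidates(enum fs_device_type type)
{
    assert((int)type >= 0 && type < FS_TYPE_COUNT);
    return fs_queues[type].count;
}
```

Registering "Ext2Fsd" once with `FS_TYPE_DISK` and once with `FS_TYPE_CDROM` makes it a candidate for both kinds of device, while a driver registered only for disks is never consulted when an optical disc appears.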
WinDbg makes it easy to list all the registered file systems. Taking IopDiskFileSystemQueueHead as an example:
0: kd> !list "-t nt!_LIST_ENTRY.Flink -e -x \"dd @$extret l4; !devobj @$extret-0x50\" poi(nt!IopDiskFileSystemQueueHead)"
dd @$extret l4; !devobj @$extret-0x50
fffffa80`04839450 044e3d50 fffffa80 04076210 fffff800
Device object (fffffa8004839400) is for:
Ext2Fsd \FileSystem\Ext2Fsd DriverObject fffffa800482d210
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000040
Dacl fffff9a100324650 DevExt 00000000 DevObjExt fffffa8004839550
ExtensionFlags (0x00000800)
Unknown flags 0x00000800
AttachedDevice (Upper) fffffa8004843040 \FileSystem\FltMgr
Device queue is not busy.
dd @$extret l4; !devobj @$extret-0x50
fffffa80`044e3d50 044e22b0 fffffa80 04839450 fffffa80
Device object (fffffa80044e3d00) is for:
ExFatRecognizer \FileSystem\Fs_Rec DriverObject fffffa80044e2060
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000040
Dacl fffff9a100324650 DevExt fffffa80044e3e50 DevObjExt fffffa80044e3e60
ExtensionFlags (0x00000800)
Unknown flags 0x00000800
Device queue is not busy.
dd @$extret l4; !devobj @$extret-0x50
fffffa80`044e22b0 044e24e0 fffffa80 044e3d50 fffffa80
Device object (fffffa80044e2260) is for:
FatDiskRecognizer \FileSystem\Fs_Rec DriverObject fffffa80044e2060
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000040
Dacl fffff9a100324650 DevExt fffffa80044e23b0 DevObjExt fffffa80044e23c0
ExtensionFlags (0x00000800)
Unknown flags 0x00000800
Device queue is not busy.
dd @$extret l4; !devobj @$extret-0x50
fffffa80`044e24e0 0448d7f0 fffffa80 044e22b0 fffffa80
Device object (fffffa80044e2490) is for:
UdfsDiskRecognizer \FileSystem\Fs_Rec DriverObject fffffa80044e2060
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000040
Dacl fffff9a100324650 DevExt fffffa80044e25e0 DevObjExt fffffa80044e25f0
ExtensionFlags (0x00000800)
Unknown flags 0x00000800
Device queue is not busy.
dd @$extret l4; !devobj @$extret-0x50
fffffa80`0448d7f0 036660b0 fffffa80 044e24e0 fffffa80
Device object (fffffa800448d7a0) is for:
Ntfs \FileSystem\Ntfs DriverObject fffffa800448d9c0
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000040
Dacl fffff9a100324650 DevExt 00000000 DevObjExt fffffa800448d8f0
ExtensionFlags (0x00000800)
Unknown flags 0x00000800
AttachedDevice (Upper) fffffa80044d2040 \FileSystem\FltMgr
Device queue is not busy.
dd @$extret l4; !devobj @$extret-0x50
fffffa80`036660b0 04076210 fffff800 0448d7f0 fffffa80
Device object (fffffa8003666060) is for:
RawDisk \FileSystem\RAW DriverObject fffffa8003667770
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000050
Dacl fffff9a100324650 DevExt 00000000 DevObjExt fffffa80036661b0
ExtensionFlags (0x00000800)
Unknown flags 0x00000800
AttachedDevice (Upper) fffffa800371b860 \FileSystem\FltMgr
Device queue is not busy.
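As the listing shows, Ext2Fsd, the most recently registered file system, sits at the head of the doubly linked list, that is, first. Ntfs, which registered much earlier, sits near the tail, with only RawDisk behind it. There is one exception: if the device object has the DO_LOW_PRIORITY_FILESYSTEM flag set, IoRegisterFileSystem links the file system in at the tail of the list instead.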
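The insertion rule can be sketched in user space as follows. This is a simplified model built on the description above, not the kernel's implementation: `struct fs_device`, the `low_priority` field (standing in for DO_LOW_PRIORITY_FILESYSTEM) and all function names are hypothetical. The `FS_FROM_ENTRY` macro performs the same pointer arithmetic as `@$extret-0x50` in the debugger command, stepping back from the embedded list link to the start of the object that contains it.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list in the style of LIST_ENTRY. */
struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
};

/* Hypothetical, simplified device object: only what the sketch needs. */
struct fs_device {
    const char *name;
    int low_priority;          /* stands in for DO_LOW_PRIORITY_FILESYSTEM */
    struct list_entry queue;   /* link embedded inside the object */
};

/* Recover the owning object from a pointer to its embedded link. */
#define FS_FROM_ENTRY(p) \
    ((struct fs_device *)((char *)(p) - offsetof(struct fs_device, queue)))

void list_init(struct list_entry *head)
{
    head->flink = head;
    head->blink = head;
}

static void link_between(struct list_entry *e,
                         struct list_entry *prev,
                         struct list_entry *next)
{
    e->blink = prev;
    e->flink = next;
    prev->flink = e;
    next->blink = e;
}

/* Newest driver goes to the head unless it asked for low priority,
 * in which case it is appended at the tail. */
void list_register(struct list_entry *head, struct fs_device *dev)
{
    assert(head != NULL && dev != NULL);
    if (dev->low_priority)
        link_between(&dev->queue, head->blink, head);
    else
        link_between(&dev->queue, head, head->flink);
}

/* Name of the n-th driver counted from the head, or NULL if the
 * list is shorter than that. */
const char *list_name_at(struct list_entry *head, size_t n)
{
    struct list_entry *e = head->flink;

    while (e != head && n > 0) {
        e = e->flink;
        n--;
    }
    return e == head ? NULL : FS_FROM_ENTRY(e)->name;
}
```

Registering a normal driver, then a low-priority one, then another normal driver leaves the last normal driver first, the earlier normal driver second and the low-priority driver last, which is the ordering the paragraph above describes.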
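Between Ntfs and Ext2Fsd there are also three file system recognizers supplied by Windows itself, all implemented in the Fs_Rec module (fs_rec.sys). A recognizer is a rather special kind of file system driver with a very simple job: when it finds a volume it can identify, it loads the corresponding file system driver to mount that volume and then removes itself. The sole reason recognizers exist is to save memory. A recognizer contains far less code than the file system driver it stands in for: fs_rec.sys, which bundles several recognizers, is only 16 KB, whereas the exfat and fastfat drivers are between 100 KB and 200 KB and Ntfs is as large as 1.5 MB. If a file system driver were loaded directly while no storage device in the system actually used that file system, the memory occupied by the driver would simply be wasted.
As an experiment, insert a FAT32-formatted SD card and look at how the IopDiskFileSystemQueueHead list changes: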
0: kd> !list "-t nt!_LIST_ENTRY.Flink -e -x \"dd @$extret l4; !devobj @$extret-0x50\" poi(nt!IopDiskFileSystemQueueHead)"
dd @$extret l4; !devobj @$extret-0x50
fffffa80`048428d0 04839450 fffffa80 04076210 fffff800
Device object (fffffa8004842880) is for:
Fat \FileSystem\fastfat DriverObject fffffa8005809e70
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000040
Dacl fffff9a100324650 DevExt 00000000 DevObjExt fffffa80048429d0
ExtensionFlags (0x00000800)
Unknown flags 0x00000800
AttachedDevice (Upper) fffffa80047a6870 \FileSystem\FltMgr
Device queue is not busy.
dd @$extret l4; !devobj @$extret-0x50
fffffa80`04839450 044e3d50 fffffa80 048428d0 fffffa80
Device object (fffffa8004839400) is for:
Ext2Fsd \FileSystem\Ext2Fsd DriverObject fffffa800482d210
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000040
Dacl fffff9a100324650 DevExt 00000000 DevObjExt fffffa8004839550
ExtensionFlags (0x00000800)
Unknown flags 0x00000800
AttachedDevice (Upper) fffffa8004843040 \FileSystem\FltMgr
Device queue is not busy.
dd @$extret l4; !devobj @$extret-0x50
fffffa80`044e3d50 044e24e0 fffffa80 04839450 fffffa80
Device object (fffffa80044e3d00) is for:
ExFatRecognizer \FileSystem\Fs_Rec DriverObject fffffa80044e2060
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000040
Dacl fffff9a100324650 DevExt fffffa80044e3e50 DevObjExt fffffa80044e3e60
ExtensionFlags (0x00000800)
Unknown flags 0x00000800
Device queue is not busy.
dd @$extret l4; !devobj @$extret-0x50
fffffa80`044e24e0 0448d7f0 fffffa80 044e3d50 fffffa80
Device object (fffffa80044e2490) is for:
UdfsDiskRecognizer \FileSystem\Fs_Rec DriverObject fffffa80044e2060
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000040
Dacl fffff9a100324650 DevExt fffffa80044e25e0 DevObjExt fffffa80044e25f0
ExtensionFlags (0x00000800)
Unknown flags 0x00000800
Device queue is not busy.
dd @$extret l4; !devobj @$extret-0x50
fffffa80`0448d7f0 036660b0 fffffa80 044e24e0 fffffa80
Device object (fffffa800448d7a0) is for:
Ntfs \FileSystem\Ntfs DriverObject fffffa800448d9c0
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000040
Dacl fffff9a100324650 DevExt 00000000 DevObjExt fffffa800448d8f0
ExtensionFlags (0x00000800)
Unknown flags 0x00000800
AttachedDevice (Upper) fffffa80044d2040 \FileSystem\FltMgr
Device queue is not busy.
dd @$extret l4; !devobj @$extret-0x50
fffffa80`036660b0 04076210 fffff800 0448d7f0 fffffa80
Device object (fffffa8003666060) is for:
RawDisk \FileSystem\RAW DriverObject fffffa8003667770
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000050
Dacl fffff9a100324650 DevExt 00000000 DevObjExt fffffa80036661b0
ExtensionFlags (0x00000800)
Unknown flags 0x00000800
AttachedDevice (Upper) fffffa800371b860 \FileSystem\FltMgr
Device queue is not busy.
Once FatDiskRecognizer finds the FAT32 volume, it loads the fastfat driver and then removes itself:
The newly added Fat device:
Fat \FileSystem\fastfat DriverObject fffffa8005809e70
The FatDiskRecognizer entry that disappeared:
FatDiskRecognizer \FileSystem\Fs_Rec DriverObject fffffa80044e2060
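The swap seen in the two listings can be modelled with a short user-space sketch. It is a deliberately simplified picture of the observed behaviour, not how fs_rec.sys is written: the array-based queue, the `format` strings and the names `disk_queue_push_front`, `mount_volume` and `disk_queue_name_at` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FS 8

struct fs_entry {
    const char *name;        /* device name as shown by !devobj */
    const char *format;      /* on-disk format this entry can identify */
    const char *full_driver; /* non-NULL: entry is only a recognizer
                                standing in for this driver */
};

static struct fs_entry disk_queue[MAX_FS];
static size_t disk_queue_len;

/* Register at the head of the queue. Returns 0, or -1 if full. */
int disk_queue_push_front(struct fs_entry e)
{
    if (disk_queue_len == MAX_FS)
        return -1;
    memmove(&disk_queue[1], &disk_queue[0],
            disk_queue_len * sizeof disk_queue[0]);
    disk_queue[0] = e;
    disk_queue_len++;
    return 0;
}

static void disk_queue_remove(size_t i)
{
    assert(i < disk_queue_len);
    memmove(&disk_queue[i], &disk_queue[i + 1],
            (disk_queue_len - i - 1) * sizeof disk_queue[0]);
    disk_queue_len--;
}

/* Walk the queue from the head and return the name of the entry that
 * ends up owning a volume of the given format, or NULL if none does. */
const char *mount_volume(const char *format)
{
    size_t i;

    for (i = 0; i < disk_queue_len; i++) {
        struct fs_entry hit = disk_queue[i];

        if (strcmp(hit.format, format) != 0)
            continue;
        if (hit.full_driver != NULL) {
            struct fs_entry real = { hit.full_driver, hit.format, NULL };

            disk_queue_remove(i);          /* recognizer removes itself */
            disk_queue_push_front(real);   /* loaded driver registers */
            return real.name;
        }
        return hit.name;
    }
    return NULL;
}

/* Name of the i-th queue entry, or NULL past the end. */
const char *disk_queue_name_at(size_t i)
{
    return i < disk_queue_len ? disk_queue[i].name : NULL;
}
```

With Ntfs, FatDiskRecognizer and Ext2Fsd pushed in that order, mounting a "fat" volume leaves Fat at the head, followed by Ext2Fsd and Ntfs, and the recognizer entry is gone, matching the change between the two listings.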
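The Windows I/O Manager also provides notification of file system registration events, and this is implemented inside IoRegisterFileSystem. After linking the file system's device object into the appropriate queue, IoRegisterFileSystem calls every NotificationRoutine to inform its registrant. (A registrant is a driver interested in this event; generally, only file system filter drivers care about file system registration.) The Windows kernel provides three functions for file system registration notifications: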
NTSTATUS
IoRegisterFsRegistrationChange(
    IN PDRIVER_OBJECT DriverObject,
    IN PDRIVER_FS_NOTIFICATION DriverNotificationRoutine
    );
NTSTATUS
IoRegisterFsRegistrationChangeEx(
    IN PDRIVER_OBJECT DriverObject,
    IN PDRIVER_FS_NOTIFICATION DriverNotificationRoutine
    ); /* available from Win2k SP4 and its successor OS */
VOID
IoUnregisterFsRegistrationChange(
    IN PDRIVER_OBJECT DriverObject,
    IN PDRIVER_FS_NOTIFICATION DriverNotificationRoutine
    );
For the usage of these three functions, see the DDK documentation; the details are not repeated here.
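To make the notification flow concrete, here is a user-space sketch of the pattern: listeners register a callback, and every file system registration invokes all of them. It is not the I/O Manager's code. The callback shape is modelled loosely on the notification routine (a file system identifier plus an active flag); the unregistration notice is an assumption added for symmetry, and every name in the block is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Listener callback: which file system changed, and whether it has
 * just been registered (1) or unregistered (0). */
typedef void (*fs_notify_fn)(const char *fs_name, int fs_active);

#define MAX_LISTENERS 4

static fs_notify_fn listeners[MAX_LISTENERS];
static size_t listener_count;

/* Subscribe to registration changes. Returns 0, or -1 if full. */
int notify_register(fs_notify_fn fn)
{
    assert(fn != NULL);
    if (listener_count == MAX_LISTENERS)
        return -1;
    listeners[listener_count++] = fn;
    return 0;
}

/* Unsubscribe; silently ignores a callback that was never added. */
void notify_unregister(fs_notify_fn fn)
{
    size_t i;

    for (i = 0; i < listener_count; i++) {
        if (listeners[i] == fn) {
            listeners[i] = listeners[--listener_count];
            return;
        }
    }
}

static void notify_all(const char *fs_name, int fs_active)
{
    size_t i;

    for (i = 0; i < listener_count; i++)
        listeners[i](fs_name, fs_active);
}

/* Simulated file system registration: the queue insertion itself is
 * omitted, only the notification step is modelled. */
void sim_register_fs(const char *fs_name)
{
    notify_all(fs_name, 1);
}

/* Simulated removal (assumed to notify listeners as well). */
void sim_unregister_fs(const char *fs_name)
{
    notify_all(fs_name, 0);
}

/* Example listener, playing the role of a filter driver that tracks
 * how many file systems are currently registered. */
int active_fs_count;

void count_active_fs(const char *fs_name, int fs_active)
{
    (void)fs_name;
    active_fs_count += fs_active ? 1 : -1;
}
```

After `notify_register(count_active_fs)`, each call to `sim_register_fs` raises `active_fs_count` by one and each `sim_unregister_fs` lowers it; once the listener is removed with `notify_unregister`, later registrations no longer reach it.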
File system registration on Linux:
The Linux kernel (VFS: Virtual Filesystem) provides two support routines for registering and removing a file system:
int register_filesystem(struct file_system_type * fs);
int unregister_filesystem(struct file_system_type * fs);
struct file_system_type represents a file system driver. Each file system driver defines its own instance of it, and the structure is declared as follows:
struct file_system_type {
    const char *name;
    int fs_flags;
    int (*get_sb) (struct file_system_type *, int,
                   const char *, void *, struct vfsmount *);
    void (*kill_sb) (struct super_block *);
    struct module *owner;
    struct file_system_type * next;
    struct list_head fs_supers;
    struct lock_class_key s_lock_key;
    struct lock_class_key s_umount_key;
    struct lock_class_key i_lock_key;
    struct lock_class_key i_mutex_key;
    struct lock_class_key i_mutex_dir_key;
    struct lock_class_key i_alloc_sem_key;
};
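Structure members:
- name: the file system name, such as ext3, ext4, nfs, nfs4 or lustre
- fs_flags: file system flags, which currently take the following values:
#define FS_REQUIRES_DEV 1 /* the file system lives on a volume device such as
                             a disk partition; not for network file systems */
#define FS_BINARY_MOUNTDATA 2 /* the file system has its own mount tool and
                                 passes mount parameters in its own format;
                                 currently used by coda, FUSE, nfs, smbfs, ncpfs */
#define FS_HAS_SUBTYPE 4 /* FUSE only; supports multiple user-mode file systems */
#define FS_REVAL_DOT 16384 /* NFS only; "." and ".." may expire or become stale */
#define FS_RENAME_DOES_D_MOVE 32768 /* NFS only; the file system already handles
                                       d_move() in its .rename operation */
- get_sb: callback invoked by VFS when a new volume is mounted, most often because the user issued mount(2)
- kill_sb: callback invoked when a mounted volume is unmounted, usually triggered by the user issuing umount(2)
- owner: the module that contains the file system
- next: used by VFS to form the singly linked list of file system drivers
- fs_supers: used by VFS; head of the doubly linked list of super blocks (super_block) for all volumes this file system has recognized
- s_lock_key … i_alloc_sem_key: used for lockdep checking
Here is the relevant ext4 code in the kernel: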
static struct file_system_type ext4_fs_type = {
    .owner    = THIS_MODULE,
    .name     = "ext4",
    .get_sb   = ext4_get_sb,
    .kill_sb  = kill_block_super,
    .fs_flags = FS_REQUIRES_DEV,
};
/* If ext3 is not configured (neither built in nor as a module), the ext4
   file system takes over ext3 volumes by default */
#if !defined(CONFIG_EXT3_FS) && !defined(CONFIG_EXT3_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
static struct file_system_type ext3_fs_type = {
    .owner    = THIS_MODULE,
    .name     = "ext3",
    .get_sb   = ext4_get_sb,
    .kill_sb  = kill_block_super,
    .fs_flags = FS_REQUIRES_DEV,
};
static inline void register_as_ext3(void)
{
    int err = register_filesystem(&ext3_fs_type);
    if (err)
        printk(KERN_WARNING
               "EXT4-fs: Unable to register as ext3 (%d)\n", err);
}
static inline void unregister_as_ext3(void)
{
    unregister_filesystem(&ext3_fs_type);
}
MODULE_ALIAS("ext3");
#endif
The ext4 module's initialization function (init_ext4_fs) calls register_filesystem to register the file system as ext4.
static int __init init_ext4_fs(void)
{
    int err;
    ......
    register_as_ext3();
    err = register_filesystem(&ext4_fs_type);
    if (err)
        goto out;
    ......
    return err;
}
Next, let us study the code of register_filesystem and unregister_filesystem (fs/filesystems.c in the kernel source tree):
static struct file_system_type *file_systems;
static DEFINE_RWLOCK(file_systems_lock);
/* Returns the address of the link that points at the matching entry,
 * or of the terminating NULL link if no entry has this name. */
static struct file_system_type **find_filesystem(const char *name, unsigned len)
{
    struct file_system_type **p;
    for (p=&file_systems; *p; p=&(*p)->next)
        if (strlen((*p)->name) == len &&
            strncmp((*p)->name, name, len) == 0)
            break;
    return p;
}
/**
 * register_filesystem - register a new filesystem
 * @fs: the file system structure
 *
 * Adds the file system passed to the list of file systems the kernel
 * is aware of for mount and other syscalls. Returns 0 on success,
 * or a negative errno code on an error.
 *
 * The &struct file_system_type that is passed is linked into the kernel
 * structures and must not be freed until the file system has been
 * unregistered.
 */
int register_filesystem(struct file_system_type * fs)
{
    int res = 0;
    struct file_system_type ** p;
    BUG_ON(strchr(fs->name, '.'));
    if (fs->next)
        return -EBUSY;
    INIT_LIST_HEAD(&fs->fs_supers);
    write_lock(&file_systems_lock);
    p = find_filesystem(fs->name, strlen(fs->name));
    if (*p)
        res = -EBUSY;
    else
        *p = fs;
    write_unlock(&file_systems_lock);
    return res;
}
EXPORT_SYMBOL(register_filesystem);
/**
 * unregister_filesystem - unregister a file system
 * @fs: filesystem to unregister
 *
 * Remove a file system that was previously successfully registered
 * with the kernel. An error is returned if the file system is not found.
 * Zero is returned on a success.
 *
 * Once this function has returned the &struct file_system_type structure
 * may be freed or reused.
 */
int unregister_filesystem(struct file_system_type * fs)
{
    struct file_system_type ** tmp;
    write_lock(&file_systems_lock);
    tmp = &file_systems;
    while (*tmp) {
        if (fs == *tmp) {
            *tmp = fs->next;
            fs->next = NULL;
            write_unlock(&file_systems_lock);
            return 0;
        }
        tmp = &(*tmp)->next;
    }
    write_unlock(&file_systems_lock);
    return -EINVAL;
}
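The code shows that, unlike the way Windows manages file system drivers, Linux maintains just one global singly linked list for all file system drivers, and a new file system is appended to the end of that list by default.
Linux does not allow a file system with an existing name to be registered again: register_filesystem() calls find_filesystem() every time to check whether a file system of the same name already exists. Windows registers by DeviceObject and performs no such check, so it allows several file system drivers for the same volume type to be loaded, but only the most recently loaded one takes effect (by default, without the DO_LOW_PRIORITY_FILESYSTEM flag), because it sits at the head of the queue and is asked first.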
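The Linux behaviour can be tried out in user space with a stripped-down sketch of the same pointer-to-pointer technique. It is not kernel code: `struct fs_type` is a hypothetical stand-in for struct file_system_type, the locking is left out, and the helper `fs_index` exists only so that the list order can be inspected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct file_system_type. */
struct fs_type {
    const char *name;
    struct fs_type *next;
};

static struct fs_type *fs_list;

/* Return the address of the link pointing at the entry with this
 * name, or of the terminating NULL link when there is none. */
static struct fs_type **fs_find(const char *name)
{
    struct fs_type **p;

    for (p = &fs_list; *p != NULL; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

/* Append to the tail; refuse a name that is already registered. */
int fs_register(struct fs_type *fs)
{
    struct fs_type **p;

    assert(fs != NULL && fs->name != NULL);
    if (fs->next != NULL)
        return -EBUSY;
    p = fs_find(fs->name);
    if (*p != NULL)
        return -EBUSY;
    *p = fs;   /* p is the tail link here, so this appends */
    return 0;
}

/* Unlink by identity; -EINVAL if the entry is not on the list. */
int fs_unregister(struct fs_type *fs)
{
    struct fs_type **p;

    for (p = &fs_list; *p != NULL; p = &(*p)->next) {
        if (*p == fs) {
            *p = fs->next;
            fs->next = NULL;
            return 0;
        }
    }
    return -EINVAL;
}

/* Position of a name in the list (0 = first), or -1 if absent. */
int fs_index(const char *name)
{
    struct fs_type *p;
    int i = 0;

    for (p = fs_list; p != NULL; p = p->next, i++)
        if (strcmp(p->name, name) == 0)
            return i;
    return -1;
}
```

Registering "ext3" and then "ext4" leaves ext3 at index 0 and ext4 at index 1, the opposite of the head insertion seen on Windows, and a second structure that also carries the name "ext4" is rejected with -EBUSY.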