Linux file system drivers
(2020-01-31 14:27:14)
Virtual Filesystem (VFS)
The general file system model
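The main entities of the model are the superblock, inode, file and dentry; these entities make up the file system metadata. The model entities interact through several VFS or kernel subsystems: the dentry cache, the inode cache and the buffer cache.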
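Each entity is treated as an object: it has an associated data structure and a pointer to a table of functions, and the behavior specific to each component is obtained by replacing the associated functions.
superblock
Localization: in the VFS, superblocks are kept in a list of struct super_block structures, and their methods are in struct super_operations.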
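The object model can be sketched in userspace C: the data structure carries a pointer to a constant table of function pointers, and generic code only calls through that table. All names here (toy_super_block, toy_super_operations, ram_ops, disk_ops, toy_describe) are hypothetical and only mimic the shape of struct super_block / struct super_operations; this is a sketch, not kernel code.

```c
#include <assert.h>
#include <string.h>

struct toy_super_block;

/* Hypothetical operations table: each "method" receives the object. */
struct toy_super_operations {
	const char *(*describe)(const struct toy_super_block *sb);
};

struct toy_super_block {
	unsigned long s_blocksize;
	const struct toy_super_operations *s_op;	/* pointer to the function table */
};

static const char *ram_describe(const struct toy_super_block *sb)
{
	(void)sb;
	return "in-memory file system";
}

static const char *disk_describe(const struct toy_super_block *sb)
{
	(void)sb;
	return "block-device file system";
}

static const struct toy_super_operations ram_ops = { .describe = ram_describe };
static const struct toy_super_operations disk_ops = { .describe = disk_describe };

/* Generic code: calls through the table, knows nothing about the file system. */
static const char *toy_describe(const struct toy_super_block *sb)
{
	assert(sb->s_op && sb->s_op->describe);
	return sb->s_op->describe(sb);
}
```

Swapping s_op is all it takes to change what the generic toy_describe() does, which is the idea behind the VFS operation tables.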
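inode
The file name is not stored in the inode but in the dentry entity, so an inode can have multiple names.
Localization: the inode has a disk counterpart. On disk, inodes are usually grouped in a dedicated area (the inode area), separate from the data areas; in some file systems (e.g. FAT), the equivalent of the inode is spread across the file system structures. As a VFS entity, an inode is represented by struct inode, and its operations by struct inode_operations.
Each inode is identified by a number; the -i option of ls shows the inode number of each file:
razvan@valhalla:~/school/so2/wiki$ ls -i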
1277956 lab10.wiki
1277962 lab9.wikibak
1277964 replace_lxr.sh
1277954 lab9.wiki
1277958 link.txt
1277955 homework.wiki
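Since the name lives in the dentry and not in the inode, two names can share one inode. A small userspace check with POSIX stat(2) and link(2) makes this visible; the helper names (same_inode, link_count, demo_hard_link) and the /tmp location are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both paths name the same inode, 0 if not, -1 on error. */
static int same_inode(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	/* inode numbers are unique only within one file system: compare st_dev too */
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Returns the number of names (hard links) of the inode behind path, or -1. */
static long link_count(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (long)st.st_nlink;
}

/* Creates a temporary file, gives it a second name with link(2), and returns
   the link count seen through the first name (2 if both share one inode). */
static long demo_hard_link(void)
{
	char first[] = "/tmp/inode_demo_XXXXXX";
	char second[sizeof first + 8];
	long n;
	int fd = mkstemp(first);

	if (fd < 0)
		return -1;
	close(fd);
	snprintf(second, sizeof second, "%s.link", first);
	if (link(first, second) != 0) {
		unlink(first);
		return -1;
	}
	n = same_inode(first, second) == 1 ? link_count(first) : -1;
	unlink(second);
	unlink(first);
	return n;
}
```

Comparing st_dev as well as st_ino matters because, as the ls -i output suggests, the number only identifies the inode inside its own file system.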
file
Localization: memory only. The VFS entity is struct file, and struct file_operations describes the operations associated with it.
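dentry
The dentry (directory entry) associates a name with an inode. For example, the path /bin/vi gives rise to three dentry objects: /, bin and vi.
*The dentry has a disk counterpart, but the correspondence is not direct, because each file system stores dentries in its own specific way.
*In the VFS, a dentry is represented by struct dentry, and the associated operations are defined in struct dentry_operations.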
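A rough userspace sketch of which dentries a lookup of an absolute path involves: the root plus one per path component. The function name path_dentries and the TOY_NAME_MAX limit are hypothetical, and the sketch ignores ".", "..", symbolic links and mount points, all of which the real path walk handles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NAME_MAX 32

/* Collects the names of the dentries a lookup of path would involve:
   "/" for an absolute path, then one per component. Stores at most max
   names (truncated to TOY_NAME_MAX - 1 chars) and returns the total count. */
static size_t path_dentries(const char *path, char names[][TOY_NAME_MAX], size_t max)
{
	size_t n = 0;
	const char *p = path;

	if (*p == '/') {		/* absolute path: the root dentry comes first */
		if (n < max)
			strcpy(names[n], "/");
		n++;
	}
	while (*p) {
		size_t len;

		while (*p == '/')	/* skip one or more separators */
			p++;
		if (*p == '\0')
			break;
		len = strcspn(p, "/");	/* length of this component */
		if (n < max) {
			size_t copy = len < TOY_NAME_MAX - 1 ? len : TOY_NAME_MAX - 1;

			memcpy(names[n], p, copy);
			names[n][copy] = '\0';
		}
		n++;
		p += len;
	}
	return n;
}
```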
Register and unregister filesystems
A file system type is described by struct file_system_type:
---------------------------------------
#include <linux/fs.h>

struct file_system_type {
	const char *name;
	int fs_flags;
	struct dentry *(*mount) (struct file_system_type *, int,
				 const char *, void *);
	void (*kill_sb) (struct super_block *);
	struct module *owner;
	struct file_system_type * next;
	struct hlist_head fs_supers;
	struct lock_class_key s_lock_key;
	struct lock_class_key s_umount_key;
	//...
};
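--name: identifies the file system; it is the argument given to mount -t, e.g. yaffs2, ubifs
--owner: THIS_MODULE when the file system is implemented as a module, NULL when it is built into the kernel
--mount: reads the superblock from disk into memory when the file system is mounted; this function is specific to each file system
--kill_sb: releases the superblock when the file system is unmounted
--fs_flags: the flags with which the file system must be mounted; for example, FS_REQUIRES_DEV indicates that the file system needs a physical disk
--fs_supers: lists all the superblocks of this file system type; since the same file system can be mounted several times, each mount has its own superblock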
----------------------------------------
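Registering a file system with the kernel is usually done in the module initialization function. To register a file system, the programmer must:
*. initialize a struct file_system_type with the file system name, the flags, the function that reads the file system superblock, and the reference to the structure that identifies the current module;
*. call register_filesystem().
When the module is unloaded, the file system must be unregistered by calling unregister_filesystem().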
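The list of registered file system types can be modeled in userspace with the same next pointer that struct file_system_type has. In this sketch (hypothetical names toy_fs_type, toy_file_systems, toy_find, toy_register, toy_unregister), registering a duplicate name fails with -EBUSY and unregistering an unknown type fails with -EINVAL; treat those return values as an assumption modeled on the kernel's behavior, and note that the sketch has no locking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_fs_type {
	const char *name;
	struct toy_fs_type *next;	/* link in the list, as in file_system_type */
};

static struct toy_fs_type *toy_file_systems;	/* head of the list */

/* Returns the address of the pointer that points (or would point) to name. */
static struct toy_fs_type **toy_find(const char *name)
{
	struct toy_fs_type **p = &toy_file_systems;

	while (*p && strcmp((*p)->name, name) != 0)
		p = &(*p)->next;
	return p;
}

static int toy_register(struct toy_fs_type *fs)
{
	struct toy_fs_type **p = toy_find(fs->name);

	if (*p)
		return -EBUSY;		/* a type with this name already exists */
	fs->next = NULL;
	*p = fs;			/* append at the tail */
	return 0;
}

static int toy_unregister(struct toy_fs_type *fs)
{
	struct toy_fs_type **p = &toy_file_systems;

	while (*p) {
		if (*p == fs) {
			*p = fs->next;	/* unlink */
			fs->next = NULL;
			return 0;
		}
		p = &(*p)->next;
	}
	return -EINVAL;			/* was not registered */
}
```

The pointer-to-pointer walk lets the same loop handle the head of the list and interior nodes without special cases.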
An example of registering a virtual file system is ramfs:
==================================
static struct file_system_type ramfs_fs_type = {
	.name		= "ramfs",
	.mount		= ramfs_mount,
	.kill_sb	= ramfs_kill_sb,
	.fs_flags	= FS_USERNS_MOUNT,
};

static int __init init_ramfs_fs(void)
{
	static unsigned long once;	/* register only once */

	if (test_and_set_bit(0, &once))
		return 0;
	return register_filesystem(&ramfs_fs_type);
}
==================================
Functions mount, kill_sb
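When the file system is mounted, the kernel calls the mount function from struct file_system_type. This function performs a series of initializations and returns a struct dentry that represents the mount-point directory of the file system. Usually, mount() is a simple function that calls one of the following:
* mount_bdev(), which mounts a file system stored on a block device;
* mount_single(), which mounts a file system that shares a single instance among all mount operations;
* mount_nodev(), which mounts a file system that is not on a physical device;
* mount_pseudo(), a helper for pseudo-file systems (sockfs, pipefs, and generally file systems that cannot be mounted).
mount_bdev(), mount_single() and mount_nodev() take as an argument a pointer to a fill_super() function, which is called after the superblock has been initialized so that the driver can finish the initialization. An example of such a function is shown in the fill_super section.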
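When the file system is unmounted, the kernel calls kill_sb(), which performs cleanup and calls one of the following:
* kill_block_super(), which unmounts a file system on a block device;
* kill_anon_super(), which unmounts a virtual file system;
* kill_litter_super(), which unmounts a file system that is not on a physical device (the information is kept in memory).
An example for a file system without disk support is the ramfs_mount() function of ramfs: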
==================
struct dentry *ramfs_mount(struct file_system_type *fs_type,
	int flags, const char *dev_name, void *data)
{
	return mount_nodev(fs_type, flags, data, ramfs_fill_super);
}
====================
An example for a file system stored on disk is the minix_mount() function of the minix file system:
-------------------------------
struct dentry *minix_mount(struct file_system_type *fs_type,
	int flags, const char *dev_name, void *data)
{
	return mount_bdev(fs_type, flags, dev_name, data, minix_fill_super);
}
------------------------------
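A userspace sketch of what a mount_nodev()-style helper does with the fill_super callback: allocate a superblock, let the driver finish it, and hand back the root dentry. All names are hypothetical; unlike the kernel helpers, which return ERR_PTR() values, this model returns NULL and reports the error through an out-parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_dentry { const char *name; };

struct toy_super_block {
	unsigned long s_magic;
	struct toy_dentry *s_root;
};

typedef int (*toy_fill_super_t)(struct toy_super_block *sb, void *data);

/* Allocates a superblock, lets fill_super finish it, returns the root dentry.
   On failure frees everything, sets *err and returns NULL. */
static struct toy_dentry *toy_mount_nodev(toy_fill_super_t fill_super, void *data,
					  struct toy_super_block **out_sb, int *err)
{
	struct toy_super_block *sb = calloc(1, sizeof(*sb));

	if (!sb) {
		*err = -ENOMEM;
		return NULL;
	}
	*err = fill_super(sb, data);
	if (*err) {
		free(sb->s_root);	/* fill_super may have set it before failing */
		free(sb);
		return NULL;
	}
	*out_sb = sb;
	return sb->s_root;
}

/* Counterpart of kill_sb for this model. */
static void toy_kill_sb(struct toy_super_block *sb)
{
	free(sb->s_root);
	free(sb);
}

/* Example fill_super: sets the magic number and creates the root dentry. */
static int toy_fill_super(struct toy_super_block *sb, void *data)
{
	(void)data;
	sb->s_magic = 0x858458f6;	/* RAMFS_MAGIC value, reused for illustration */
	sb->s_root = malloc(sizeof(*sb->s_root));
	if (!sb->s_root)
		return -ENOMEM;
	sb->s_root->name = "/";
	return 0;
}

static int toy_fill_super_fails(struct toy_super_block *sb, void *data)
{
	(void)sb;
	(void)data;
	return -EINVAL;
}
```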
Superblock in VFS
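The superblock exists both as a physical entity (on disk) and as a VFS entity (in struct super_block). It contains the metainformation used to read and write metadata (inodes, directory entries) from and to the disk. A superblock (implicitly, a struct super_block) contains information about the block device used, the list of inodes, a pointer to the inode of the file system root directory, and a pointer to the superblock operations.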
The struct super_block structure
struct super_block is partially defined as follows:
----------------------------------------
struct super_block {
//...
dev_t s_dev;
unsigned char s_blocksize_bits;
unsigned long s_blocksize;
unsigned char s_dirt;
loff_t s_maxbytes;
struct file_system_type *s_type;
const struct super_operations *s_op;
//...
unsigned long s_flags;
unsigned long s_magic;
struct dentry *s_root;
//...
char s_id[32];
void *s_fs_info;
};
----------------------------------------
The superblock stores global information about a file system instance:
*. the physical device on which it resides
*. block size
*. the maximum size of a file
*. file system type
*. the operations it supports
*. magic number (identifies the file system)
*. the root directory dentry
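The block size and its log2 (s_blocksize, s_blocksize_bits) are kept together so that byte offsets can be turned into block numbers with a shift and a mask. A userspace sketch, with hypothetical names and an assumed accepted range of 512 to 65536 bytes:

```c
#include <assert.h>

struct toy_sb_geom {
	unsigned long s_blocksize;
	unsigned char s_blocksize_bits;	/* log2(s_blocksize) */
};

/* Returns 0 on success, -1 if size is not a power of two in [512, 65536]. */
static int toy_set_blocksize(struct toy_sb_geom *g, unsigned long size)
{
	unsigned char bits = 0;

	if (size < 512 || size > 65536 || (size & (size - 1)) != 0)
		return -1;
	while ((1UL << bits) < size)
		bits++;
	g->s_blocksize = size;
	g->s_blocksize_bits = bits;
	return 0;
}

/* Block number containing byte offset pos: a shift instead of a division. */
static unsigned long toy_block_of(const struct toy_sb_geom *g, unsigned long long pos)
{
	return (unsigned long)(pos >> g->s_blocksize_bits);
}

/* Offset of pos inside its block: a mask instead of a modulo. */
static unsigned long toy_offset_in_block(const struct toy_sb_geom *g, unsigned long long pos)
{
	return (unsigned long)(pos & (g->s_blocksize - 1));
}
```

With 4096-byte blocks, byte 10000 falls in block 2 at offset 1808.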
Superblock operations
The superblock operations are described by struct super_operations:
=====================================
struct super_operations {
//...
int (*write_inode) (struct inode *, struct writeback_control *wbc);
struct inode *(*alloc_inode)(struct super_block *sb);
void (*destroy_inode)(struct inode *);
void (*put_super) (struct super_block *);
int (*statfs) (struct dentry *, struct kstatfs *);
int (*remount_fs) (struct super_block *, int *, char *);
//...
};
====================================
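The fields of the structure are function pointers:
^. write_inode, alloc_inode, destroy_inode: write, allocate and release the resources associated with an inode
^. put_super: called when the superblock is released at umount; in this function, any resources associated with the file system's private data must be released
^. remount_fs: called when the kernel detects a remount attempt (mount with MS_REMOUNT)
^. statfs: called when the statfs system call is made (try stat -f or df); this call must fill in the fields of a struct kstatfs, as ext4_statfs() does, for example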
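alloc_inode and destroy_inode usually work on a structure larger than struct inode: the file-system-specific structure embeds the VFS inode, alloc_inode returns a pointer to the embedded member, and container_of() recovers the outer structure (minix does this with its minix_i() helper). A userspace sketch with hypothetical names (toy_inode, toyfs_inode_info, TOYFS_I), using offsetof to define a container_of equivalent:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_inode {
	unsigned long i_ino;
};

struct toyfs_inode_info {
	unsigned int i_data[7];		/* fs-private part, e.g. block pointers */
	struct toy_inode vfs_inode;	/* generic part handed to the VFS */
};

static struct toyfs_inode_info *TOYFS_I(struct toy_inode *inode)
{
	return toy_container_of(inode, struct toyfs_inode_info, vfs_inode);
}

/* alloc_inode: allocate the outer structure, return the embedded inode. */
static struct toy_inode *toyfs_alloc_inode(void)
{
	struct toyfs_inode_info *ei = calloc(1, sizeof(*ei));

	return ei ? &ei->vfs_inode : NULL;
}

/* destroy_inode: free the whole outer structure, not just the inode. */
static void toyfs_destroy_inode(struct toy_inode *inode)
{
	free(TOYFS_I(inode));
}
```

Generic code only ever sees struct toy_inode, while the file system gets its private fields back at no extra allocation cost.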
The fill_super() function
The fill_super() function is called to finish the superblock initialization. This initialization involves filling in the fields of struct super_block and initializing the root directory inode.
An example implementation is ramfs_fill_super(), which is called to initialize the remaining fields of the superblock:
==================================
#define RAMFS_MAGIC	0x858458f6

static const struct super_operations ramfs_ops = {
	.statfs		= simple_statfs,
	.drop_inode	= generic_delete_inode,
	.show_options	= ramfs_show_options,
};

static int ramfs_fill_super(struct super_block *sb, void *data, int silent)
{
	struct ramfs_fs_info *fsi;
	struct inode *inode;
	int err;

	save_mount_options(sb, data);

	fsi = kzalloc(sizeof(struct ramfs_fs_info), GFP_KERNEL);
	sb->s_fs_info = fsi;
	if (!fsi)
		return -ENOMEM;

	err = ramfs_parse_options(data, &fsi->mount_opts);
	if (err)
		return err;

	sb->s_maxbytes		= MAX_LFS_FILESIZE;
	sb->s_blocksize		= PAGE_SIZE;
	sb->s_blocksize_bits	= PAGE_SHIFT;
	sb->s_magic		= RAMFS_MAGIC;
	sb->s_op		= &ramfs_ops;
	sb->s_time_gran		= 1;

	/* root directory inode and root dentry */
	inode = ramfs_get_inode(sb, NULL, S_IFDIR | fsi->mount_opts.mode, 0);
	sb->s_root = d_make_root(inode);
	if (!sb->s_root)
		return -ENOMEM;

	return 0;
}
==================================
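The kernel provides generic functions for implementing file system operations; generic_delete_inode() and simple_statfs(), used in the code above, are such functions and can be used when implementing drivers if their functionality is sufficient.
The ramfs_fill_super() function fills in some fields of the superblock, then reads the root inode and allocates the root dentry. Reading the root inode is done in ramfs_get_inode(), which allocates a new inode with new_inode() and initializes it. To release an inode, iput() is called; d_make_root() allocates the root dentry.
The minix_fill_super() function of the disk-based minix file system is similar in functionality to the one for the virtual file system, except that it uses the buffer cache. The minix file system keeps its private data in struct minix_sb_info, and a large part of this function deals with initializing that data. The private data is allocated with kzalloc() and stored in the s_fs_info field of the superblock.
VFS functions typically receive as arguments the superblock, an inode or a dentry, each of which contains a pointer to the superblock, so the private data can be accessed easily.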
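A userspace sketch of the s_fs_info pattern: fill_super attaches zeroed private data to the superblock, any inode reaches it through i_sb, and put_super frees it. The names (toy_super_block, toyfs_sb_info, TOYFS_SB, toyfs_fill_private, toyfs_ino_valid, toyfs_put_super) are hypothetical; the rule that valid inode numbers run from 1 to s_ninodes is an assumption of this toy, loosely inspired by minix.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_super_block {
	void *s_fs_info;		/* opaque to the generic layer */
};

struct toy_inode {
	struct toy_super_block *i_sb;	/* every inode points to its superblock */
};

struct toyfs_sb_info {			/* hypothetical, in the spirit of minix_sb_info */
	unsigned long s_ninodes;
};

static struct toyfs_sb_info *TOYFS_SB(struct toy_super_block *sb)
{
	return sb->s_fs_info;
}

/* Part of a fill_super: allocate zeroed private data and attach it. */
static int toyfs_fill_private(struct toy_super_block *sb, unsigned long ninodes)
{
	struct toyfs_sb_info *sbi = calloc(1, sizeof(*sbi));

	if (!sbi)
		return -ENOMEM;
	sbi->s_ninodes = ninodes;
	sb->s_fs_info = sbi;
	return 0;
}

/* An inode-level check that needs file-system-wide data. */
static int toyfs_ino_valid(struct toy_inode *inode, unsigned long ino)
{
	return ino >= 1 && ino <= TOYFS_SB(inode->i_sb)->s_ninodes;
}

/* put_super counterpart: release the private data. */
static void toyfs_put_super(struct toy_super_block *sb)
{
	free(sb->s_fs_info);
	sb->s_fs_info = NULL;
}
```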
Buffer cache
The buffer cache is the kernel subsystem that caches blocks read from and written to block devices. Its basic entity is struct buffer_head; the most important fields of this structure are:
# b_data, pointer to the memory area into which the data was read or from which it must be written
# b_size, buffer size
# b_bdev, the block device
# b_blocknr, the number of the block on the device that has been loaded or needs to be saved on the disk
# b_state, the status of the buffer
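The functions that work with these structures are:
__bread(): reads the block with the given number and size into a buffer_head structure; returns NULL on failure;
sb_bread(): same as above, but the block size is taken from the superblock;
mark_buffer_dirty(): marks the buffer as dirty (sets the BH_Dirty bit); the buffer will be written to disk at a later time (historically by the bdflush kernel thread, which periodically writes buffers to disk);
brelse(): releases the buffer, dropping the reference obtained when it was read; a dirty buffer is still written to disk later;
map_bh(): associates the buffer-head with the corresponding sector.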
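The read, modify, mark dirty, release cycle can be sketched in userspace over an in-memory "disk". This is a simplified model with hypothetical names (toy_buffer_head, toy_bread, toy_mark_buffer_dirty, toy_brelse): there is no hashing, sharing or locking, and toy_brelse() writes a dirty buffer back immediately, whereas the kernel's brelse() only drops a reference and writeback happens later.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BLOCK_SIZE 64
#define TOY_NBLOCKS    16
#define TOY_BH_DIRTY   (1UL << 0)

static unsigned char toy_disk[TOY_NBLOCKS][TOY_BLOCK_SIZE];	/* the "device" */

struct toy_buffer_head {
	unsigned char *b_data;		/* in-memory copy of the block */
	size_t b_size;
	unsigned long b_blocknr;
	unsigned long b_state;
};

/* Like sb_bread(): read one block into a new buffer, or NULL on error. */
static struct toy_buffer_head *toy_bread(unsigned long block)
{
	struct toy_buffer_head *bh;

	if (block >= TOY_NBLOCKS)
		return NULL;
	bh = calloc(1, sizeof(*bh));
	if (!bh)
		return NULL;
	bh->b_data = malloc(TOY_BLOCK_SIZE);
	if (!bh->b_data) {
		free(bh);
		return NULL;
	}
	memcpy(bh->b_data, toy_disk[block], TOY_BLOCK_SIZE);
	bh->b_size = TOY_BLOCK_SIZE;
	bh->b_blocknr = block;
	return bh;
}

static void toy_mark_buffer_dirty(struct toy_buffer_head *bh)
{
	bh->b_state |= TOY_BH_DIRTY;
}

/* Simplification: write back a dirty buffer, then free it. */
static void toy_brelse(struct toy_buffer_head *bh)
{
	if (!bh)
		return;
	if (bh->b_state & TOY_BH_DIRTY)
		memcpy(toy_disk[bh->b_blocknr], bh->b_data, bh->b_size);
	free(bh->b_data);
	free(bh);
}
```

A change made to b_data without marking the buffer dirty never reaches the "disk", which is why mark_buffer_dirty() is part of every update.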
Functions and useful macros
Inode
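The inode does not store the file name; the name is kept in the associated struct dentry. The inode refers to a file on disk. To refer to an open file (associated with a file descriptor within a process), struct file is used. An inode can be associated with any number of file structures (multiple processes can open the same file, and a process can open the same file several times).
The inode exists both as a VFS entity (in memory) and as a disk entity (for UNIX, HFS, NTFS, etc.). The VFS inode is represented by struct inode. Like the other VFS structures, struct inode is a generic structure that covers the options of all supported file types, including those of file systems without an associated disk entity (such as FAT).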
The inode structure
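The inode structure is the same for all file systems, but file systems usually also keep private information, referenced through the i_private field. By convention, the structure that holds this information is called <fsname>_inode_info, where fsname is the file system name; for example, the minix and ext4 file systems store their specific information in struct minix_inode_info and struct ext4_inode_info.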
Some important fields of struct inode are:
# i_sb: the superblock structure of the file system the inode belongs to
# i_rdev: for device special files, the number of the device the inode represents
# i_ino: the number of the inode (uniquely identifies the inode within the file system)
# i_blkbits: number of bits used for the block size == log2(block size)
# i_mode, i_uid, i_gid: access rights, uid, gid
# i_size: file/directory/etc. size in bytes
# i_mtime, i_atime, i_ctime: modification, access and inode change time
# i_nlink: the number of name entries (dentries) that use this inode; for file systems without links (either hard or symbolic) this is always set to 1
# i_blocks: the number of blocks used by the file (all blocks, not just data); it is reported to user space as st_blocks and used by the quota subsystem
# i_op, i_fop: pointers to the operations structures struct inode_operations and struct file_operations; i_mapping->a_ops contains a pointer to struct address_space_operations
# i_count: the inode reference counter, indicating how many kernel components use it
- new_inode(): creates a new inode, sets the i_nlink field to 1 and initializes i_blkbits and i_sb;
- insert_inode_hash(): adds the inode to the hash table of inodes; an interesting effect of this call is that the inode will be written to disk if it is marked as dirty;
- mark_inode_dirty(): marks the inode as dirty; it will be written to disk at a later moment;
- iget_locked(): looks up the inode with the given number in the inode cache and returns it if present; otherwise it allocates a new, locked inode (flagged I_NEW) that the caller must fill in from disk;
- unlock_new_inode(): used together with iget_locked(); releases the lock on the new inode;
- iput(): tells the kernel that the work on the inode is finished; if no one else uses it, it will be destroyed (after being written to disk if it is marked as dirty);
- make_bad_inode(): tells the kernel that the inode cannot be used; it is generally called from the function that reads the inode, when the inode could not be read from disk and is invalid.
Inode operations
Getting an inode
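One of the main inode-related operations is obtaining an inode (the struct inode in the VFS). Up to Linux kernel 2.6.24, the developer defined a read_inode function; starting with version 2.6.25, the developer must define a <fsname>_iget function, responsible for finding the VFS inode if it exists, or creating a new one and filling it with information from the disk.
Generally, this function calls iget_locked() to get the inode structure from the VFS. If the inode is newly created, it reads the inode from disk (using sb_bread()) and fills in the useful information.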
struct inode *ubifs_iget(struct super_block *sb, unsigned long inum)
{
int err;
union ubifs_key key;
struct ubifs_ino_node *ino;
struct ubifs_info *c = sb->s_fs_info;
struct inode *inode;
struct ubifs_inode *ui;
dbg_gen("inode %lu", inum);
inode = iget_locked(sb, inum);
if (!inode)
return ERR_PTR(-ENOMEM);
if (!(inode->i_state & I_NEW))
return inode;
ui = ubifs_inode(inode);
ino = kmalloc(UBIFS_MAX_INO_NODE_SZ, GFP_NOFS);
if (!ino) {
err = -ENOMEM;
goto out;
}
...
}
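ubifs_iget() calls iget_locked() to get the VFS inode. If the inode already exists (I_NEW is not set), the function returns it; otherwise it reads the information from the disk and fills in the VFS inode.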
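The <fsname>_iget pattern can be sketched in userspace together with a tiny inode cache: the lookup returns a cached inode with an extra reference, or a new inode marked I_NEW that the caller fills from disk before clearing the flag. The names (toy_iget_locked, toy_unlock_new_inode, toy_iput, toyfs_iget) are hypothetical, the cache is a plain list without locking, and the last toy_iput() frees the inode at once, whereas the kernel keeps unused inodes cached.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_I_NEW 1UL

struct toy_inode {
	unsigned long i_ino;
	unsigned long i_state;
	int i_count;			/* users of this in-memory inode */
	struct toy_inode *next;
};

static struct toy_inode *toy_icache;

/* Return the cached inode (taking a reference) or a new one marked I_NEW. */
static struct toy_inode *toy_iget_locked(unsigned long ino)
{
	struct toy_inode *inode;

	for (inode = toy_icache; inode; inode = inode->next) {
		if (inode->i_ino == ino) {
			inode->i_count++;
			return inode;
		}
	}
	inode = calloc(1, sizeof(*inode));
	if (!inode)
		return NULL;
	inode->i_ino = ino;
	inode->i_state = TOY_I_NEW;	/* caller must fill it from "disk" */
	inode->i_count = 1;
	inode->next = toy_icache;
	toy_icache = inode;
	return inode;
}

static void toy_unlock_new_inode(struct toy_inode *inode)
{
	inode->i_state &= ~TOY_I_NEW;
}

/* Drop a reference; the last one removes the inode from the cache. */
static void toy_iput(struct toy_inode *inode)
{
	struct toy_inode **p;

	if (--inode->i_count > 0)
		return;
	for (p = &toy_icache; *p; p = &(*p)->next) {
		if (*p == inode) {
			*p = inode->next;
			break;
		}
	}
	free(inode);
}

/* The _iget pattern: fill the inode only when it is new. */
static struct toy_inode *toyfs_iget(unsigned long ino, int *fills)
{
	struct toy_inode *inode = toy_iget_locked(ino);

	if (!inode || !(inode->i_state & TOY_I_NEW))
		return inode;
	(*fills)++;			/* stands in for reading the on-disk inode */
	toy_unlock_new_inode(inode);
	return inode;
}
```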
Several superoperations (components of the struct super_operations used by the superblock) are used when working with inodes:
alloc_inode: allocates an inode. Usually, this function allocates a struct <fsname>_inode_info structure and performs basic VFS inode initialization (using inode_init_once()); minix uses kmem_cache_alloc() for allocation, which interacts with the SLAB subsystem. For each allocation, the cache constructor is called, which in the case of minix is the init_once() function. Alternatively, kmalloc() can be used, in which case inode_init_once() should be called. The alloc_inode() function is called by new_inode() and iget_locked().
write_inode: saves/updates on disk the inode received as a parameter. To update the inode, though inefficient, beginners are advised to use the following sequence of operations:
- load the inode from the disk using sb_bread();
- modify the buffer according to the saved inode;
- mark the buffer as dirty using mark_buffer_dirty(); the kernel will then handle writing it to disk;
- an example is the minix_write_inode() function in the minix file system.
evict_inode: removes from disk and memory any information about the inode whose number is in the i_ino field (both the on-disk inode and the associated data blocks). This involves the following operations:
- delete the inode from the disk;
- update the disk bitmaps (if any);
- delete the inode from the page cache by calling truncate_inode_pages();
- delete the inode from memory by calling clear_inode();
- an example is the minix_evict_inode() function from the minix file system.
destroy_inode: releases the memory occupied by the inode.
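For write_inode, the first step is locating the on-disk inode: with fixed-size raw inodes packed into blocks, inode number n (counting from 1, as in minix) lives in block first + (n - 1) / per_block at offset ((n - 1) % per_block) * size. A userspace sketch with a hypothetical layout (toy_raw_inode, TOY_INODE_BLOCK0, toy_write_inode), where plain memcpy calls stand in for sb_bread(), mark_buffer_dirty() and the later writeback:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE   1024
#define TOY_NBLOCKS      8
#define TOY_INODE_BLOCK0 2		/* first block of the inode area */

struct toy_raw_inode {			/* on-disk format: 32 bytes */
	uint16_t i_mode;
	uint16_t i_nlinks;
	uint32_t i_size;
	uint32_t i_zone[6];
};

struct toy_inode {			/* in-memory (VFS-side) copy */
	unsigned long i_ino;
	unsigned int i_mode, i_nlink;
	unsigned long i_size;
};

static unsigned char toy_disk[TOY_NBLOCKS][TOY_BLOCK_SIZE];

#define TOY_IPB (TOY_BLOCK_SIZE / sizeof(struct toy_raw_inode))	/* inodes per block */

static void toy_inode_location(unsigned long ino, unsigned long *block, size_t *offset)
{
	*block = TOY_INODE_BLOCK0 + (ino - 1) / TOY_IPB;
	*offset = ((ino - 1) % TOY_IPB) * sizeof(struct toy_raw_inode);
}

/* Read the block, patch the raw inode, write the block back. */
static int toy_write_inode(const struct toy_inode *inode)
{
	unsigned char buf[TOY_BLOCK_SIZE];	/* stands in for bh->b_data */
	struct toy_raw_inode raw;
	unsigned long block;
	size_t off;

	if (inode->i_ino == 0)
		return -1;
	toy_inode_location(inode->i_ino, &block, &off);
	if (block >= TOY_NBLOCKS)
		return -1;
	memcpy(buf, toy_disk[block], TOY_BLOCK_SIZE);	/* 1. "sb_bread()" */
	memcpy(&raw, buf + off, sizeof(raw));		/* keep fields not updated (i_zone) */
	raw.i_mode = (uint16_t)inode->i_mode;		/* 2. copy from the VFS inode */
	raw.i_nlinks = (uint16_t)inode->i_nlink;
	raw.i_size = (uint32_t)inode->i_size;
	memcpy(buf + off, &raw, sizeof(raw));
	memcpy(toy_disk[block], buf, TOY_BLOCK_SIZE);	/* 3. "dirty" + writeback, done at once */
	return 0;
}
```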
inode_operations
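The inode operations are described by struct inode_operations. Inodes are of several types: file, directory, special file (pipe, fifo), block device, character device, link, etc. For this reason, the operations an inode needs to implement differ for each type of inode. The operations for file-type inodes and directory-type inodes are detailed below. The operations of an inode are initialized and accessed through the i_op field of struct inode.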
The file structure
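The file structure corresponds to a file opened by a process; it exists only in memory and is associated with an inode. It is the VFS entity closest to user space: its fields contain familiar information about a user-space file (access mode, file position, etc.), and the operations on it are performed through well-known system calls (read, write, etc.).
The file operations are described by struct file_operations. The file operations of a file system are initialized through the i_fop field of struct inode. When a file is opened, the VFS initializes the f_op field of struct file with the address from inode->i_fop, so that subsequent system calls use the value stored in file->f_op.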
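How f_op is copied from i_fop at open time and then used for every call can be sketched in userspace. The structures and functions (toy_inode, toy_file, toy_open, toy_sys_read, toy_generic_read) are hypothetical miniatures; the point is that two opens of one inode share the operations but keep separate file positions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>

struct toy_file;

struct toy_file_operations {
	ssize_t (*read)(struct toy_file *file, char *buf, size_t len);
};

struct toy_inode {
	const char *data;			/* file contents */
	size_t i_size;
	const struct toy_file_operations *i_fop;
};

struct toy_file {
	struct toy_inode *f_inode;
	const struct toy_file_operations *f_op;
	size_t f_pos;				/* per-open file position */
};

static ssize_t toy_generic_read(struct toy_file *file, char *buf, size_t len)
{
	struct toy_inode *inode = file->f_inode;
	size_t left = file->f_pos < inode->i_size ? inode->i_size - file->f_pos : 0;
	size_t n = len < left ? len : left;

	memcpy(buf, inode->data + file->f_pos, n);
	file->f_pos += n;
	return (ssize_t)n;
}

static const struct toy_file_operations toy_fops = { .read = toy_generic_read };

/* "open": initialize f_op from inode->i_fop. */
static void toy_open(struct toy_inode *inode, struct toy_file *file)
{
	file->f_inode = inode;
	file->f_op = inode->i_fop;
	file->f_pos = 0;
}

/* "read" system call: dispatch through file->f_op. */
static ssize_t toy_sys_read(struct toy_file *file, char *buf, size_t len)
{
	if (!file->f_op || !file->f_op->read)
		return -1;
	return file->f_op->read(file, buf, len);
}
```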
Regular files inodes
To work with an inode, its i_op and i_fop fields must be set; the type of the inode determines the operations it needs to implement.
Regular files inode operations
In ubifs, the operations of the file structure for regular files are defined in ubifs_file_operations:
-----------------------------------------------------------
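const struct file_operations ubifs_file_operations = {
	.llseek = generic_file_llseek,
	//...
#ifdef CONFIG_COMPAT
	.compat_ioctl = ubifs_compat_ioctl,
#endif
};
-----------------------------------------------------------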
generic_file_llseek(), generic_file_mmap(), generic_file_read_iter() and generic_file_write_iter() are generic implementations provided by the kernel.
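For simple file systems, only the truncation operation (the truncate system call) must be implemented. Although it was initially a dedicated operation, starting with kernel 3.14 it is embedded in setattr: if the size passed to setattr differs from the current size of the inode, a truncate operation must be performed. See the implementation of ubifs_setattr().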
The truncate operation involves:
- freeing the data blocks on the disk that are now extra (if the new size is smaller than the old one) or allocating new blocks (if the new size is larger);
- updating the disk bitmaps (if used);
- updating the inode;
- filling with zeros the space left unused in the last block, using the block_truncate_page() function.
An example implementation is the minix_truncate() function.
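The size arithmetic behind truncate can be sketched as plain C functions: how many blocks a size occupies, how many blocks shrinking frees (or growing needs), and how many bytes at the end of the new last block must be zeroed, which is the job block_truncate_page() does in the kernel. The function names are hypothetical; holes and indirect blocks are ignored.

```c
#include <assert.h>

/* Blocks needed for size bytes (rounded up). */
static unsigned long long blocks_for(unsigned long long size, unsigned long bs)
{
	return (size + bs - 1) / bs;
}

/* Positive: blocks to free when shrinking; negative: blocks to allocate. */
static long long block_delta(unsigned long long old_size,
			     unsigned long long new_size, unsigned long bs)
{
	return (long long)blocks_for(old_size, bs) - (long long)blocks_for(new_size, bs);
}

/* Bytes from new_size to the end of its block that must be zeroed
   (0 when new_size is block-aligned). */
static unsigned long tail_to_zero(unsigned long long new_size, unsigned long bs)
{
	unsigned long off = (unsigned long)(new_size % bs);

	return off ? bs - off : 0;
}
```

For example, with 1024-byte blocks, truncating a 5000-byte file to 1500 bytes frees 3 of its 5 blocks and leaves 548 bytes at the end of block 1 to be zeroed.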
Address space operations
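The mechanism that maps a file into the address space of a process is general enough to also be used for ordinary system calls such as read and write. The address space is described by struct address_space, and its operations by struct address_space_operations. To initialize the address-space operations, fill in inode->i_mapping->a_ops for file-type inodes. For ubifs, these are the ubifs_file_address_operations: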
====================
const struct address_space_operations ubifs_file_address_operations = {
	.readpage       = ubifs_readpage,
	.writepage      = ubifs_writepage,
	.write_begin    = ubifs_write_begin,
	.write_end      = ubifs_write_end,
	.invalidatepage = ubifs_invalidatepage,
	.set_page_dirty = ubifs_set_page_dirty,
#ifdef CONFIG_MIGRATION
	.migratepage    = ubifs_migrate_page,
#endif
	.releasepage    = ubifs_releasepage,
};
===================
Most of these functions are easy to implement; for example, in the minix file system:
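static int minix_writepage(struct page *page, struct writeback_control *wbc)
{
	return block_write_full_page(page, minix_get_block, wbc);
}
static int minix_readpage(struct file *file, struct page *page)
{
	return block_read_full_page(page, minix_get_block);
}
static void minix_write_failed(struct address_space *mapping, loff_t to)
{
	struct inode *inode = mapping->host;
	if (to > inode->i_size) {	/* drop blocks allocated past i_size */
		truncate_pagecache(inode, inode->i_size);
		minix_truncate(inode);
	}
}
static int minix_write_begin(struct file *file, struct address_space *mapping,
		loff_t pos, unsigned len, unsigned flags, struct page **pagep, void **fsdata)
{
	int ret = block_write_begin(mapping, pos, len, flags, pagep, minix_get_block);
	if (unlikely(ret))
		minix_write_failed(mapping, pos + len);
	return ret;
}
static sector_t minix_bmap(struct address_space *mapping, sector_t block)
{
	return generic_block_bmap(mapping, block, minix_get_block);
}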
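All of the minix functions above delegate the real work to minix_get_block(), which maps a block number within the file to a block on the device. A userspace sketch of such a mapping with direct pointers only (minix also uses indirect blocks) and a trivial bump allocator; the names toy_inode_map, toy_get_block and toy_bmap are hypothetical.

```c
#include <assert.h>

#define TOY_NDIRECT 7

struct toy_inode_map {
	unsigned long i_zone[TOY_NDIRECT];	/* device block per file block, 0 = hole */
};

static unsigned long toy_next_free = 100;	/* trivial bump allocator */

/* Returns 0 and sets *phys (0 for a hole when create is 0),
   or -1 if iblock is beyond the direct pointers. */
static int toy_get_block(struct toy_inode_map *m, unsigned long iblock,
			 int create, unsigned long *phys)
{
	if (iblock >= TOY_NDIRECT)
		return -1;			/* would need an indirect block */
	if (m->i_zone[iblock] == 0 && create)
		m->i_zone[iblock] = toy_next_free++;
	*phys = m->i_zone[iblock];
	return 0;
}

/* bmap-style query: never allocates, 0 means "not mapped". */
static unsigned long toy_bmap(struct toy_inode_map *m, unsigned long iblock)
{
	unsigned long phys = 0;

	if (toy_get_block(m, iblock, 0, &phys) != 0)
		return 0;
	return phys;
}
```

Reads call the mapping with create set to 0 so holes stay holes, while the write path (write_begin) sets it so that missing blocks are allocated.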
Dentry structure