点击下载
本文文档

当前位置：首页 - 正文

UNIX File Management UNIX index node (inode)

来源：动视网责编：小OO 时间：2025-10-01 10:24:17

UNIX File Management UNIX index node (inode)

3UNIXindexnode(inode)•EachfileisrepresentedbyanInode•Inodecontainsallofafile’smetadata–Accessrights,owner,accountinginfo–(partial)blockindextableofafile•Eachinodehasauniquenumber(withinapartition)–Systemorientedname–Try‘ls–i’onUnix(Linux)•Directorie

推荐度：

点击下载本文 文档为doc格式

导读3UNIXindexnode(inode)•EachfileisrepresentedbyanInode•Inodecontainsallofafile’smetadata–Accessrights,owner,accountinginfo–(partial)blockindextableofafile•Eachinodehasauniquenumber(withinapartition)–Systemorientedname–Try‘ls–i’onUnix(Linux)•Directorie

3

UNIX index node (inode)

•Each file is represented by an Inode •Inode contains all of a file’s metadata

–Access rights, owner,accounting info –(partial) block index table of a file

•Each inode has a unique number (within a partition)

–System oriented name –Try ‘ls –i’on Unix (Linux)

•Directories map file names to inode numbers

–Map human-oriented to system-oriented names –Mapping can be many-to-one

•Hard links

4

Inode Contents

•Mode

–Type

•Regular file or directory

–Access mode

•rwxrwxrwx

•Uid

–User ID

•Gid

–Group ID

mode uid gid atime ctime mtime size block count reference count direct blocks

(10)single indirect double indirect triple indirect

5

Inode Contents

•atime

–Time of last access

•ctime

–Time when file was created

•mtime

–Time when file was last modified

mode uid gid atime ctime mtime size block count reference count direct blocks

(10)single indirect double indirect triple indirect

6

Inode Contents

•Size

–Size of the file in bytes

•Block count

–Number of disk blocks used by the file.

•Note that number of blocks can be much less than expected given the file size

–Files can be sparsely populated

• E.g. write(f,“hello”); lseek(f, 1000000); write(f, “world”);•Only needs to store the start an end of file, not all the empty blocks in between.

–Size = 1000005

–Blocks = 2 + overheads

mode uid gid atime ctime mtime size block count reference count direct blocks

(10)single indirect double indirect triple indirect

9

Inode Contents

•

Single Indirect Block

–Block number of a block containing block numbers

•In this case 8

mode uid gid atime ctime mtime size block count reference count

direct blocks (10)40,58,26,8,12,44,62,30,10,42

single indirect: 32

double indirect triple indirect

Disk

03

2SI 56

017

47

63

568

914282920386143

4612151013171114

1610

Single Indirection

•Requires two disk access to read

–One for the indirect block; one for the target block

•Max File Size

–In previous example

•10 direct + 8 indirect = 18 block file

–A more realistic example

•Assume 1Kbyte block size, 4 byte block numbers

•10 * 1K + 1K/4 * 1K = 266 Kbytes

•For large majority of files (< 266 K), only one or two accesses required to read any block in file.

11

Inode Contents

•

Double Indirect Block

–Block number of a block containing block numbers of blocks containing block numbers

•Triple Indirect

–Block number of a block containing block numbers of blocks containing block

numbers of blocks containing block numbers ☺

mode uid gid atime ctime mtime size block count reference count

direct blocks (10)40,58,26,8,12,44,62,30,10,42

single indirect: 32

double indirect triple indirect

12

Unix Inode Block Addressing

Scheme

1 write (to write modified block back)

15Worst Case Access Patterns with Unallocated Indirect Blocks

•Worst to write 1 byte

– 4 writes (3 indirect blocks; 1 data)

– 1 read, 4 writes (read-write 1 indirect, write 2; write 1 data) – 2 reads, 3 writes (read 1 indirect, read-write 1 indirect, write 1; write 1 data)

– 3 reads, 2 writes (read 2, read-write 1; write 1 data)

•Worst to read 1 byte

–If reading writes an zero-filled block on disk

•Worst case is same as write 1 byte

–If not, worst-case depends on how deep is the current indirect block tree.

16

Inode Summary

•The inode contains the on disk data associated with a file

–Contains mode, owner, and other bookkeeping

–Efficient random and sequential access via indexed allocation –Small files (the majority of files) require only a single access

–

Larger files require progressively more disk accesses for random access

•Sequential access is still efficient

–Can support really large files via increasing levels of indirection

17

Where/How are Inodes Stored

•System V Disk Layout (s5fs)

–Boot Block

•contain code to bootstrap the OS

–Super Block

•Contains attributes of the file system itself

–e.g. size, number of inodes, start block of inode array, start of data block area, free inode list, free data block list

–Inode Array –Data blocks

Boot Block Super Block

Inode Array

Data Blocks

18

Some problems with s5fs

•Inodes at start of disk; data blocks end

–Long seek times

•Must read inode before reading data blocks

•Only one superblock

–Corrupt the superblock and entire file system is lost

•Block allocation suboptimal

–Consecutive free block list created at FS format time

•Allocation and de-allocation eventually randomises the list resulting the random allocation

•Inodes allocated randomly

–Directory listing resulted in random inode access patterns

21

Layout of an Ext2 Partition

•Disk divided into one or more partitions

•Partition:

–Reserved boot block,

–Collection of equally sized block groups –All block groups have the same structure

Boot Block

Block Group

….

Block Group

n

22

Layout of a Block Group

•Replicated super block

–For e2fsck

•Group descriptors

•Bitmaps identify used inodes/blocks

•All block have the same number of data blocks •Advantages of this structure:

–Replication simplifies recovery

–Proximity of inode tables and data blocks (reduces seek time)

Super Block Group Descrip-tors Data

Block

Bitmap Inode Bitmap Inode

Table

Data blocks 1 blk

n blks 1 blk 1 blk m blks

k blks

23

Superblocks

•Size of the file system, block size and similar parameters

•Overall free inode and block counters

•Data indicating whether file system check is needed:

–Uncleanly unmounted –Inconsistency

–Certain number of mounts since last check –

Certain time expired since last check

•Replicated to provide redundancy to add recoverability

24

Group Descriptors

•Location of the bitmaps

•Counter for free blocks and inodes in this group

•Number of directories in the group

27

Ext2fs Directories

•Directories are files of a special type

–Consider it a file of special format, managed by the kernel, that uses most of the same machinery to implement it

•Inodes, etc…

•Directories translate names to inode numbers •Directory entries are of variable length •Entries can be deleted in place

–inode = 0

–Add to length of previous entry

–use null terminated strings for names

inode

rec_len

name_len

type

name…

28

Ext2fs Directories

•“f1”= inode 7•“file2”= inode 43•“f3”= inode 85

7122‘f’‘1’0 043165‘f’‘i’‘l’‘e’‘2’0 0 085122‘f’‘3’0 0

Inode No Rec Length Name Length

Name

29

Ext2fs Directories

•Note that inodes can have more than one name

–Called a Hard Link –Inode (file) 7 has three names

•“f1”= inode 7•“file2”= inode 7•“f3”= inode 7

7122‘f’‘1’0 0

7165‘f’‘i’‘l’‘e’‘2’0 0 07122‘f’‘3’0 0

Inode No Rec Length Name Length

Name

30

Inode Contents

•We can have many name for the same inode.•

When we delete a file by name, i.e. remove the directory entry (link), how does the file system know when to delete the underlying inode?

–Keep a reference count in the inode

•Adding a name (directory entry) increments the count

•Removing a name decrements the count •If the reference count == 0, then we have no names for the inode (it is unreachable), we can delete the inode (underlying file or directory)

mode uid gid atime ctime mtime size block count reference count

direct blocks (10)40,58,26,8,12,44,62,30,10,42

single indirect: 32

double indirect triple indirect

33

Kernel File-related Data Structures and Interfaces

•We have reviewed how files and directories are stored on disk

•We know the UNIX file system-call interface

–open, close, read, write, lseek,…..

•What is in between?

34

What do we need to keep track

of?

•File descriptors

–Each open file has a file descriptor

–Read/Write/lseek/…. use them to specify which file to operate on.

•File pointer

–Determines where in the file the next read or write is performed

•Mode

–Was the file opened read-only, etc….

35An Option?

•Use inode numbers as file descriptors and add a file pointer to the inode •Problems

–What happens when we concurrently open the same file twice?

•We should get two separate file descriptors and file pointers….

36

An Option?

•Single global open file array

–fd is an index into the array

–Entries contain file pointer and pointer to an inode

fp i-ptr

fd

inode

39

Issue

•Fork

–Fork defines that the child shares the file pointer with the parent

•Dup2

–Also defines the file

descriptors share the file pointer

•With per-process table, we

can only have independent file pointers

–Even when accessing the same file

P1 fd

inode

fp i-ptr

P2 fd inode

40

Per-Process fd table with global

open file table

•

Per-process file descriptor array

–Contains pointers to open file table entry

•

Open file table array

–Contain entries with a fp and pointer to an inode.

•

Provides

–Shared file pointers if required

–Independent file pointers if required

•Example:

–All three fds refer to the same file, two share a file

pointer, one has an

independent file pointer P1 fd

inode

f-ptr

f-ptr P2 fd

inode

fp i-ptr fp i-ptr

Per-process File Descriptor

Tables

Open File Table

41Per-Process fd table with global

open file table

•Used by Linux and most other Unix operating systems

P1 fd inode

f-ptr

f-ptr P2 fd

inode

fp i-ptr fp i-ptr

Per-process File Descriptor

Tables

Open File Table

42

Older Systems only had a single

file system

•They had file system specific open, close, read, write, …calls.

•The open file table pointed to an in-memory representation of the inode

–inode format was specific to the file system used (s5fs, Berkley FFS, etc)

•However, modern systems need to support many file system types

–ISO9660 (CDROM), MSDOS (floppy), ext2fs, tmpfs

44

VFS architecture

45

Virtual File System (VFS)

•Provides single system call interface for many file systems

– E.g., UFS, Ext2, XFS, DOS, ISO9660,…

•Transparent handling of network file systems

– E.g., NFS, AFS, CODA

•File-based interface to arbitrary device drivers (/dev )•File-based interface to kernel data structures (/proc )•Provides an indirection layer for system calls

–File operation table set up at file open time

–Points to actual handling code for particular type –Further file operations redirected to those functions

46

The file system independent code

deals with vfs and vnodes

P1 fd

vnode f-ptr

f-ptr

f-ptr P2 fd fp v-ptr fp v-ptr

Per-process File Descriptor

Tables

Open File Table

inode

File system dependent code 47

VFS Interface

•

Reference

–S.R. Kleiman., "Vnodes: An Architecture for Multiple File System Types in Sun Unix," USENIX Association: Summer Conference Proceedings, Atlanta, 1986

–

Linux and OS/161 differ slightly, but the principles are the same

•

Two major data types

–

vfs

•Represents all file system types

•

Contains pointers to functions to manipulate each file system as a whole (e.g. mount, unmount)

–

Form a standard interface to the file system

–vnode

•Represents a file (inode) in the underlying filesystem •Points to the real inode

•

Contains pointers to functions to manipulate files/inodes (e.g. open, close, read, write,…)

48

A look at OS/161’s VFS

The OS161’s file system type

Represents interface to a mounted filesystem

struct fs {

int (*fs_sync)(struct fs *);

const char *(*fs_getvolname)(struct fs *);struct vnode *(*fs_getroot)(struct fs *);int (*fs_unmount)(struct fs *);

void *fs_data;};

Force the filesystem to flush its content to disk

Retrieve the volume name Retrieve the vnode associates with the root of the filesystem

Unmount the filesystem Note: mount called via function ptr passed to vfs_mount

Private file system

specific date

51

Vnode Ops

struct vnode_ops {

unsigned long vop_magic;

/* should always be VOP_MAGIC */

int (*vop_open)(struct vnode *object, int flags_from_open);int (*vop_close)(struct vnode *object);int (*vop_reclaim)(struct vnode *vnode);int (*vop_read)(struct vnode *file, struct uio *uio);

int (*vop_readlink)(struct vnode *link, struct uio *uio);int (*vop_getdirentry)(struct vnode *dir, struct uio *uio);int (*vop_write)(struct vnode *file, struct uio *uio);

int (*vop_ioctl)(struct vnode *object, int op, userptr_t data);int (*vop_stat)(struct vnode *object, struct stat *statbuf);int (*vop_gettype)(struct vnode *object, int *result);int (*vop_tryseek)(struct vnode *object, off_t pos);int (*vop_fsync)(struct vnode *object);

int (*vop_mmap)(struct vnode *file /* add stuff */);int (*vop_truncate)(struct vnode *file, off_t len);

int

(*vop_namefile)(struct vnode *file, struct uio *uio);

52

Vnode Ops

int (*vop_creat)(struct vnode *dir,

const char *name, int excl,struct vnode **result);

int (*vop_symlink)(struct vnode *dir,

const char *contents, const char *name);

int (*vop_mkdir)(struct vnode *parentdir,

const char *name);

int (*vop_link)(struct vnode *dir,

const char *name, struct vnode *file);

int (*vop_remove)(struct vnode *dir,

const char *name);

int (*vop_rmdir)(struct vnode *dir,

const char *name);int (*vop_rename)(struct vnode *vn1, const char *name1,

struct vnode *vn2, const char *name2);int (*vop_lookup)(struct vnode *dir,

char *pathname, struct vnode **result);

int (*vop_lookparent)(struct vnode *dir,

char *pathname, struct vnode **result,char *buf, size_t len);

};

53

Vnode Ops

•Note that most operation are on vnodes. How do we operate on file names?

–Higher level API on names that uses the internal VOP_* functions

int vfs_open(char *path, int openflags, struct vnode **ret);void vfs_close(struct vnode *vn);

int vfs_readlink(char *path, struct uio *data);int vfs_symlink(const char *contents, char *path);int vfs_mkdir(char *path);

int vfs_link(char *oldpath, char *newpath);int vfs_remove(char *path);int vfs_rmdir(char *path);

int vfs_rename(char *oldpath, char *newpath);int vfs_chdir(char *path);

int vfs_getcwd(struct uio *buf);

54

Example: OS/161 emufs vnode

ops

/*

* Function table for emufs files.*/

static const struct vnode_ops

emufs_fileops = {

VOP_MAGIC,/* mark this a valid vnode ops table */

emufs_open,emufs_close,emufs_reclaim,

emufs_read,

NOTDIR, /* readlink */NOTDIR, /* getdirentry */emufs_write,emufs_ioctl,emufs_stat,

emufs_file_gettype,emufs_tryseek,emufs_fsync,

UNIMP, /* mmap */emufs_truncate,

NOTDIR, /* namefile */NOTDIR, /* creat */NOTDIR, /* symlink */NOTDIR, /* mkdir */NOTDIR, /* link */NOTDIR, /* remove */NOTDIR, /* rmdir */NOTDIR, /* rename */NOTDIR, /* lookup */

NOTDIR, /* lookparent */};

55Buffer Cache

57

Buffering Disk Blocks

•

Allow applications to work with arbitrarily sized region of a file

–Apps can still optimise for a particular block size

Disk

47

561215101311

1416Buffers in Kernel RAM

Transfer of whole blocks

Application Program

Transfer of arbitrarily sized regions

of file

58

Buffering Disk Blocks

•

Writes can return immediately after copying to kernel buffer

–Avoids waiting until write to disk is complete

–Write is scheduled in the background

Disk

47

56121510

1311

1416Buffers in Kernel RAM

Transfer of whole blocks

Application Program

Transfer of arbitrarily sized regions

of file

59

Buffering Disk Blocks

•

Can implement read-ahead by pre-loading next block on disk into kernel buffer

–Avoids having to wait until next read is issued

Disk

47

561215101311

1416Buffers in Kernel RAM

Transfer of whole blocks

Application

Program

Transfer of arbitrarily sized regions

of file

60

Cache

•Cache:

–Fast storage used to temporarily hold data to speed up repeated access to the data

•Example: Main memory can cache disk blocks

63

Unix Buffer Cache

On read

–Hash the

device#, block#–Check if match in buffer cache –Yes, simply use in-memory copy –No, follow the collision chain –If not found, we load block from disk into cache

Replacement

•What happens when the buffer cache is full and we need to read another block into memory?

–We must choose an existing entry to replace –Similar to page replacement policy

•Can use FIFO, Clock, LRU, etc.

•Except disk accesses are much less frequent and take longer than memory references, so LRU is possible •However, is strict LRU what we want?

–What is different between paged data in RAM and file data in RAM?

65

File System Consistency

•Paged data is not expected to survive crashes or power failures

•File data is expected to survive

•Strict LRU could keep critical data in memory forever if it is frequently used.

66

File System Consistency

•Generally, cached disk blocks are prioritised in terms of how critical they are to file system consistency

–Directory blocks, inode blocks if lost can corrupt the entire filesystem

• E.g. imagine losing the root directory

•These blocks are usually scheduled for immediate write to disk

–Data blocks if lost corrupt only the file that they are associated with

•These block are only scheduled for write back to disk periodically

•In UNIX, flushd (flush daemon ) flushes all modified blocks to disk every 30 seconds

UNIX File Management UNIX index node (inode)

3UNIXindexnode(inode)•EachfileisrepresentedbyanInode•Inodecontainsallofafile’smetadata–Accessrights,owner,accountinginfo–(partial)blockindextableofafile•Eachinodehasauniquenumber(withinapartition)–Systemorientedname–Try‘ls–i’onUnix(Linux)•Directorie

推荐度：

点击下载本文 文档为doc格式

热门焦点

UNIX File Management UNIX index node (inode)

UNIX File Management UNIX index node (inode)

UNIX File Management UNIX index node (inode)

最新推荐

猜你喜欢

热门推荐