
GIOS Lecture Notes - Part 3 Lesson 5 - I/O Management


I/O Management

  • Focus on the mechanisms that OSes use to represent and manage I/O hardware devices
  • Notably block devices and file systems

I/O Devices

  • Key features of a device
    • control registers
      • command - CPU uses to control what the device does
      • data - used by the CPU to control data transfers in and out of the device
      • status - used by the CPU to find out what’s happening on the device
    • microcontroller
      • device’s CPU
    • on-device memory
    • other logic
      • e.g. analog to digital converters
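
A minimal sketch of how these control registers might look to a driver once the device is mapped in, assuming a hypothetical device with one command, one status, and one data register (the names and layout are illustrative, not a real device):

```c
#include <stdint.h>

/* Hypothetical register block for a simple device; volatile because the
 * device, not the CPU, can change these values at any time. */
struct dev_regs {
    volatile uint32_t command;  /* CPU writes here to tell the device what to do */
    volatile uint32_t status;   /* CPU reads here to find out what the device is doing */
    volatile uint32_t data;     /* CPU reads/writes here to move data in and out */
};
```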

CPU - Device Interconnect

  • Devices interface with the rest of the system via a controller that’s integrated as part of the device packaging
  • Peripheral Component Interconnect (PCI)
    • Common architecture is PCI Express (PCIe)
      • (more advanced than original PCI or PCI-X bus)
      • Usually includes PCI eXtended (PCI-X) for backwards compatibility
  • Other types of interconnects
    • SCSI Bus
    • Peripheral Bus
    • Controllers determine what kinds of interconnects a device can use
    • Bridges handle differences between different types of interconnects

Device Drivers

  • OSes support devices via device drivers
  • Device Drivers
    • one per device type
    • Responsible for device access, management, and control
      • includes how instructions can be passed from higher level systems like the kernel or applications to the actual device
      • includes how the system should respond to device-level events like errors or notifications
    • provided by device manufacturers per OS/version
    • each OS standardizes interfaces to device drivers
      • so that the manufacturers can stay within a framework
      • this achieves device independence – the OS does not need to be specialized for any given device
      • this achieves device diversity – any device manufacturer can write a driver for any given OS knowing there’s a standardized interface
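
As a rough illustration of such a standardized interface, here is a sketch of an operations table the OS might define and each manufacturer's driver would fill in; it is loosely inspired by Linux's file_operations, but the names and fields here are made up:

```c
#include <stddef.h>
#include <sys/types.h>

/* The OS defines the table; every driver for this device type provides
 * its own implementations of these entry points. */
struct device_ops {
    int     (*open)(void *dev);
    ssize_t (*read)(void *dev, char *buf, size_t len);
    ssize_t (*write)(void *dev, const char *buf, size_t len);
    int     (*close)(void *dev);
};
```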

Types of Devices

  • To deal with such device diversity, devices are usually grouped into types
  • Block: Disk
    • read/write blocks of data
    • direct access to arbitrary block
  • Character: Keyboard
    • get/put characters
    • serial sequence of characters
  • Network Devices
    • a special case somewhere between block and character devices: a stream of data chunks, sometimes of varying sizes
  • OS representation of a device == special device file
    • allows any other existing OS file mechanisms to interact with devices
  • Unix-like systems
    • all devices appear as files under /dev directory
    • handled by special filesystem, not the normal one
      • tmpfs
      • devfs
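
Because devices show up as files under /dev, the ordinary file API can drive them. For example, reading a few bytes from /dev/urandom (a character device on Linux) looks exactly like reading a regular file:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    unsigned char buf[16];
    int fd = open("/dev/urandom", O_RDONLY);   /* open the device file like any file */
    if (fd < 0) { perror("open"); return 1; }
    if (read(fd, buf, sizeof buf) != (ssize_t)sizeof buf) { perror("read"); return 1; }
    close(fd);
    for (int i = 0; i < (int)sizeof buf; i++) printf("%02x", buf[i]);
    printf("\n");
    return 0;
}
```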

CPU-Device Interactions

  • The main way PCI makes devices available to the CPU is to represent these devices’ registers in a way similar to how the CPU accesses memory
  • Device registers appear to the CPU as memory locations at a specific physical address
  • Memory-mapped I/O
    • part of ‘host’ physical memory is dedicated for device interactions
    • Base Address Registers (BAR) – determine which portion of this dedicated memory corresponds to the device
    • Configured during boot process and determined by PCI configuration protocol
  • I/O port model
    • dedicated in/out CPU instructions for device access
    • the instruction specifies the target device’s (I/O) port, and the value to transfer is held in a CPU register
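
A sketch contrasting the two models; the register address and port number are made up, and code like this would normally live inside a kernel driver rather than a user program (the port I/O variant is x86-specific):

```c
#include <stdint.h>

/* Memory-mapped I/O: the device's command register appears at a physical
 * address (assigned via a BAR during boot), so an ordinary store reaches it. */
#define DEV_CMD_REG ((volatile uint32_t *)0xFEB00000)   /* hypothetical address */

static void mmio_write_command(uint32_t cmd) {
    *DEV_CMD_REG = cmd;                     /* plain store instruction */
}

/* I/O port model (x86): a dedicated 'out' instruction targets an I/O port,
 * with the value held in a CPU register. */
static void port_write_command(uint16_t port, uint8_t value) {
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
}
```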

Path from Device to CPU

  • The path from the device to the CPU can take 2 routes
    • Interrupt
      • devices can generate interrupts to CPU
      • Pros
        • can be generated as soon as possible
      • Cons
        • interrupt handling steps generate overhead in several ways
    • Polling
      • CPU can poll devices by reading their status registers
      • Pros
        • can be done when convenient for OS
      • Cons
        • either delay or additional CPU overhead depending on implementation
  • Choosing between them will depend on a variety of characteristics
    • type of device
    • interrupt complexity
    • overall load on the system
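
A minimal sketch of the polling side: the CPU spins on the device's status register until it signals readiness. The register address and READY bit are hypothetical; the trade-off is that the check happens whenever the OS chooses, at the cost of either delay or extra CPU cycles:

```c
#include <stdint.h>

#define DEV_STATUS_REG ((volatile uint32_t *)0xFEB00004)  /* hypothetical status register */
#define STATUS_READY   0x1u

static void poll_until_ready(void) {
    /* Spin until the device sets the READY bit. With interrupts, the CPU
     * would instead do other work and be notified as soon as possible. */
    while ((*DEV_STATUS_REG & STATUS_READY) == 0) {
        /* busy-wait */
    }
}
```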

Device Access – PIO

  • Programmed I/O == PIO
  • Requires no additional hardware support
  • CPU issues instructions by writing into the command registers of the device
  • CPU controls data movement by accessing data registers of the device
  • Example: NIC, data == network packet
    • write command to request packet transmission
    • copy packet to data registers
    • repeat until packet sent
      • e.g. a 1500B packet with an 8-byte data register/bus width
        • 1 store (for the command) + 1500/8 ≈ 188 stores for data => ~189 CPU store instructions
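
A back-of-the-envelope sketch of that PIO transmit path, with hypothetical NIC register addresses: one store writes the command, then the 1500-byte packet is copied 8 bytes at a time (ceil(1500/8) = 188 stores), roughly 189 store instructions in total:

```c
#include <stdint.h>
#include <string.h>

#define NIC_CMD_REG  ((volatile uint64_t *)0xFEB10000)   /* hypothetical command register */
#define NIC_DATA_REG ((volatile uint64_t *)0xFEB10008)   /* hypothetical 8-byte data register */
#define CMD_TRANSMIT 0x1u

static void pio_send_packet(const uint8_t *pkt, size_t len) {
    *NIC_CMD_REG = CMD_TRANSMIT;                 /* 1 store: the command */
    for (size_t off = 0; off < len; off += 8) {  /* ~188 stores for a 1500B packet */
        uint64_t word = 0;
        size_t chunk = (len - off < 8) ? (len - off) : 8;
        memcpy(&word, pkt + off, chunk);
        *NIC_DATA_REG = word;                    /* one 8-byte store into the data register */
    }
}
```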

Direct Memory Access (DMA)

  • Alternative to PIO
  • Relies on special hardware support – DMA controller
  • CPU “programs” the device
    • via command registers
    • via DMA controls
  • Example: NIC
    • write command to request packet transmission
    • configure DMA controller with in-memory address and size of packet buffer
      • e.g. 1500B with 8-byte registers/bus width => 1 store instruction + 1 DMA configure operation
      • fewer steps, but the DMA configuration is more complex
        • therefore, for smaller transfers PIO is better than DMA, as DMA configuration takes many cycles
  • For DMAs:
    • data buffer must be in physical memory until transfer completes
      • cannot swap it out to disk, as DMA controller can only interact with physical memory
      • this is done by pinning the region (marking its pages non-swappable)
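
The pinning requirement can be illustrated from user level with mlock(), which marks a region non-swappable; the actual programming of a DMA controller is device-specific and not shown here:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 1500;                   /* e.g. one packet buffer */
    void *buf = malloc(len);
    if (!buf) return 1;

    if (mlock(buf, len) != 0) {          /* pin: these pages can no longer be swapped out */
        perror("mlock");
        return 1;
    }
    /* ... the buffer's physical pages now stay put until the transfer completes ... */
    munlock(buf, len);                   /* unpin once the transfer is done */
    free(buf);
    return 0;
}
```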

Typical Device Access

  • Typical ways a user process interacts with a device:
    1. system call to kernel
    2. in-kernel stack for target device
    3. device driver configures the request for the device
      • the driver knows all the device-specific details and handles the minutiae
    4. Device performs request
    5. call chain is then traversed in a reverse manner back up to user process

OS Bypass

  • It is not actually necessary to go through the kernel to get to a device
    • Some devices can be configured to be directly accessed from user level
    • This is called OS Bypass
  • OS Bypass
    • device registers and/or data directly accessible
    • OS does the configuration then gets out of the way
    • Jump straight from user process to device driver
    • requires a user-level driver (library) that the user process uses to talk to the device; it is the equivalent of a kernel device driver
    • OS retains coarse-grained control, such as enable/disable or add/remove permissions
    • relies on device features
      • the device must have sufficient registers so that the OS can map some of them to a user process while retaining access to the registers used for configuration and control
      • must also be able to share the device across multiple user processes
      • demux capability
        • when the device wants to send data back to one of multiple user processes, it must be able to deliver the data to the correct target process’ address space
        • this means the device must be able to look inside the data to determine where it should go
      • In the normal device usage path, the kernel would handle all of this, putting less burden on the device and libraries
    • ioctl() command on Linux
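
As a sketch of how device-specific configuration and control commands are issued on Linux, here is the ioctl() calling pattern; the device path and request code below are hypothetical placeholders:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define MYDEV_ENABLE_FEATURE 0x1234      /* hypothetical device-specific request code */

int main(void) {
    int fd = open("/dev/mydevice", O_RDWR);   /* hypothetical device file */
    if (fd < 0) { perror("open"); return 1; }

    if (ioctl(fd, MYDEV_ENABLE_FEATURE, 0) < 0)   /* device-specific control command */
        perror("ioctl");

    close(fd);
    return 0;
}
```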

Sync vs Async Access

  • When an I/O request is made, usually the user process needs a response from the device, even just an acknowledgement
  • What happens to the user process while waiting on that response?
    • Synchronous
      • user process blocks, placed on wait queue
    • Asynchronous
      • process continues
      • later:
        • process checks and retrieves result
        • OR process is notified that the operation completed and the results are ready
        • similar to interrupt vs polling tradeoffs
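
A sketch of the asynchronous variant using POSIX AIO: the process issues the read, keeps going, and later checks whether the result is ready. A synchronous read() would simply block at the same point. /etc/hostname is just a convenient readable file for the example; on older glibc this needs -lrt to link:

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    static char buf[4096];
    int fd = open("/etc/hostname", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;

    if (aio_read(&cb) != 0) { perror("aio_read"); return 1; }   /* request issued, no blocking */

    while (aio_error(&cb) == EINPROGRESS) {
        /* the process is free to do other useful work here */
    }

    ssize_t n = aio_return(&cb);             /* retrieve the result once it is ready */
    printf("read %zd bytes asynchronously\n", n);
    close(fd);
    return 0;
}
```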

Block Device Stack

  • Block devices are typically used as storage for files
  • Processes use files as a logical storage unit
    • Applications don’t think about disks; they think about files and request operations to be performed on files
  • kernel file system (FS; POSIX API)
    • where and how to find and access file
    • OS specifies the interface for the file system
      • this is to support using multiple different file systems
  • generic block layer
    • OS-standardized block interface
    • serves as standard for smoothing interactions between kernel and block devices by passing instructions to and interpreting errors from the driver layer
  • device driver for each device
    • protocol specific API

Virtual File System

  • What if files are on more than one device?
  • What if devices work better with different FS implementations?
  • What if files are not on a local device?
  • To deal with these issues, OSes use a virtual file system layer
    • Hides from applications all details regarding the underlying FS

VFS Abstractions

  • file == the element on which the VFS operates
  • file descriptor == OS representation of file
    • used for open/read/write/sendfile/lock/close etc.
  • inode == persistent representation of file “index”
    • list of all data blocks that correspond to the file
    • device, permissions, size, etc
    • a directory is really just a file, except its contents include information about files/inodes within
  • dentry == directory entry, corresponds to single path component
    • /users/ada => /, /users, /users/ada
    • to reach a file under ada, the path must be traversed; the VFS creates a dentry element for each path component
    • the dentry cache holds entries for all directories that have been visited, keeping traversal overhead down over time
    • soft state, no on-disk representation
  • superblock
    • superblock == filesystem-specific information regarding the FS layout on storage device
    • FS maintains some additional metadata here that helps during operations. What is stored differs among FS implementations
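
A greatly simplified sketch of how these four abstractions might relate to one another; the field names are illustrative and not the actual Linux definitions:

```c
#include <stdint.h>
#include <sys/types.h>

struct superblock_sketch {           /* FS-specific layout info for one mounted filesystem */
    uint64_t block_size;
    uint64_t total_blocks;
    void    *fs_private;             /* whatever extra metadata this FS implementation keeps */
};

struct inode_sketch {                /* persistent "index" of one file */
    uint64_t ino;                    /* unique inode number */
    mode_t   permissions;
    off_t    size;
    uint64_t blocks[15];             /* list of the data blocks that make up the file */
};

struct dentry_sketch {               /* one path component; soft state, cached in memory only */
    char                  name[256];
    struct inode_sketch  *inode;
    struct dentry_sketch *parent;    /* e.g. /users/ada -> dentries for /, /users, /users/ada */
};

struct file_sketch {                 /* what an open file descriptor refers to */
    struct dentry_sketch *dentry;
    off_t                 pos;       /* current read/write offset */
};
```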

VFS On Disk

  • VFS dentries are maintained in soft state by OS, but other components are kept on-disk
  • file => data blocks on disk
  • inode => track files’ blocks
    • also resides on disk in some block
    • example (from the lecture diagram): the blue and green blocks are two separate files, and the red blocks are the inodes that track them
  • Superblock => overall map of disk blocks
    • inode blocks
    • data blocks
    • free blocks

ext2: Second Extended Filesystem

  • Was a default FS in several versions of Linux. Most recently replaced by ext4

  • For each block group
    • superblock => number of inodes, number of disk blocks, start of free blocks
    • group descriptor => bitmaps, number of free nodes, number of directories
    • bitmaps => tracks free blocks and inodes
    • inodes => 1 to max number, 1 per file
      • owner, accounting info, how to locate actual data blocks
    • data blocks => file data
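
An illustrative view of the regions inside one ext2 block group, in their on-disk order; the sizes and types below are placeholders rather than the real ext2 structures:

```c
#include <stdint.h>

struct ext2_block_group_sketch {
    uint8_t superblock[1024];        /* # inodes, # disk blocks, start of free blocks */
    uint8_t group_descriptor[32];    /* bitmap locations, # free nodes, # directories */
    uint8_t block_bitmap[1024];      /* tracks which data blocks are free */
    uint8_t inode_bitmap[1024];      /* tracks which inodes are free */
    uint8_t inode_table[128 * 64];   /* one inode per file: owner, accounting info, block pointers */
    uint8_t data_blocks[];           /* the file data itself */
};
```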

inodes

  • are an index of all disk blocks corresponding to a file
  • file => uniquely identified by its inode
  • inode => list of all blocks and other metadata. uniquely numbered.
    • pros
      • easy to perform sequential or random accesses to a given file
    • cons
      • places a limit on file size - only so many blocks can be indexed from a single inode
        • e.g. 128B inode, 4B block pointers
          • 32 addressable blocks x 1KB per block => 32KB max file size

inodes with indirect pointers

  • are an index of all disk blocks corresponding to a file
  • inodes contain
    • metadata
    • pointers to blocks of data
      • the ones discussed above are known as direct pointers
      • indirect pointers solve the space problem
        • points to a block full of pointers
        • in the above example (128B inode, 4B pointers, 1KB blocks), a single 1KB indirect block holds 256 pointers, so one pointer with a single level of indirection can point to 256KB of file data
        • double indirect pointers are the same concept with two layers of indirection, growing this to 256 x 256 x 1KB = 64MB per entry (see the sketch at the end of this section)
        • additional levels of indirection can be layered on as needed
  • pros
    • small inode => large file size
  • cons
    • file access slows down, since each level of indirection adds another disk access during traversal
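
The file-size arithmetic from the last two sections, worked through with the lecture's example numbers (128B inode, 4B block pointers, 1KB blocks); these are illustrative values, not a real filesystem's:

```c
#include <stdio.h>

int main(void) {
    long block_size   = 1024;                     /* 1KB data blocks */
    long ptr_size     = 4;                        /* 4B block pointers */
    long direct_ptrs  = 32;                       /* pointers that fit in a 128B inode */
    long ptrs_per_blk = block_size / ptr_size;    /* 256 pointers per indirect block */

    long direct_only     = direct_ptrs * block_size;                   /* 32KB max file size */
    long single_indirect = ptrs_per_blk * block_size;                  /* 256KB per single-indirect entry */
    long double_indirect = ptrs_per_blk * ptrs_per_blk * block_size;   /* 64MB per double-indirect entry */

    printf("direct pointers only:      %ld KB\n", direct_only / 1024);
    printf("one single-indirect entry: %ld KB\n", single_indirect / 1024);
    printf("one double-indirect entry: %ld MB\n", double_indirect / (1024 * 1024));
    return 0;
}
```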

Disk Access Optimizations

  • A key goal is to reduce file access overheads
  • Caching/buffering => reduce disk accesses
    • buffer cache in main memory
    • read/write from cache
    • periodically flush the cache to disk - fsync() (see the sketch at the end of this section)
  • I/O scheduling => reduce disk head movement
    • maximize sequential access (as opposed to random access)
    • e.g. writes to blocks 25 then 17 would be reordered to go 17 -> 25 to reduce overall disk head movement
  • Prefetching => increases cache hits
    • leverages locality of data on disk
    • e.g. if you’re given an instruction to read block 17, also read/cache blocks 18 and 19 because they’re likely to be needed too
    • This does add some overhead by adding additional read volume and cache size that may not be needed, but on balance is usually a net win
  • Journaling/Logging => reduce random access
    • “describe” write in a log: block, offset, value, etc
    • periodically apply updates to proper disk locations
    • this provides safety, since any writes that have been reordered but not yet applied to their proper disk locations are still tracked in the log
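
A minimal sketch of the caching/flush interaction mentioned above: write() lands in the in-memory buffer cache, and fsync() forces the dirty blocks out to the device (the filename is arbitrary):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = open("example.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char *msg = "buffered in the in-memory cache first\n";
    if (write(fd, msg, strlen(msg)) < 0) { perror("write"); return 1; }   /* goes to the buffer cache */

    if (fsync(fd) != 0) perror("fsync");   /* force the cached blocks out to disk */
    close(fd);
    return 0;
}
```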