NOTE: The tunables and the techniques described in this section
work on a per file system basis. Use them judiciously based on the
underlying device properties and characteristics of the applications
that use the file system.

Performance of a file system can be enhanced by a suitable
choice of I/O sizes and proper alignment of the I/O requests based
on the requirements of the underlying special device. VxFS provides
tools to tune the file systems.

Tuning VxFS I/O Parameters

VxFS provides a set of tunable I/O parameters that control
some of its behavior. These I/O parameters are useful to help the
file system adjust to striped or RAID-5 volumes that could yield
performance superior to a single disk. Typically, data streaming
applications that access large files see the largest benefit from
tuning the file system.

If VxFS is being used with the VERITAS Volume Manager, the
file system queries VxVM to determine the geometry of the underlying
volume and automatically sets the I/O parameters. The mount command
also queries VxVM when the file system is mounted and downloads the
I/O parameters.

If the default parameters are not acceptable or the file system
is being used without VxVM, then the /etc/vx/tunefstab file can be
used to set values for I/O parameters. The mount command reads the
/etc/vx/tunefstab file and downloads any parameters specified for a
file system. The tunefstab file overrides any values obtained from
VxVM. While the file system is mounted, any I/O parameters can be
changed using the vxtunefs command which can have tunables specified
on the command line or can read them from the /etc/vx/tunefstab file.
For more details, see the vxtunefs(1M) and tunefstab(4) manual pages.

The vxtunefs command can be used to print the current values of the
I/O parameters:

    # vxtunefs -p mount_point

If the default alignment from mkfs is not acceptable, the -o align=n
option can be used to override alignment information obtained from
VxVM.

The following is an example tunefstab file:

    /dev/vx/dsk/userdg/netbackup
    read_pref_io=128k,write_pref_io=128k,read_nstream=4,write_nstream=4
    /dev/vx/dsk/userdg/metasave
    read_pref_io=128k,write_pref_io=128k,read_nstream=4,write_nstream=4
    /dev/vx/dsk/userdg/solbuild
    read_pref_io=64k,write_pref_io=64k,read_nstream=4,write_nstream=4
    /dev/vx/dsk/userdg/solrelease
    read_pref_io=64k,write_pref_io=64k,read_nstream=4,write_nstream=4
    /dev/vx/dsk/userdg/solpatch
    read_pref_io=128k,write_pref_io=128k,read_nstream=4,write_nstream=4
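
The option strings in this example are comma-separated name=value
pairs. As a rough illustration of how such values relate to the
defaults stated in this section (64K preferred I/O size, one stream),
the following Python sketch parses one option string and overlays it
on those defaults. The names parse_size, parse_options, and DEFAULTS
are hypothetical, and the assumption that a k, m, or g suffix is a
power-of-1024 multiplier is not taken from tunefstab(4). It does not
reproduce how mount or vxtunefs process the file.

```python
# Illustrative sketch only: parse_size, parse_options, and DEFAULTS are
# hypothetical helpers, not part of VxFS.

# ASSUMPTION: size suffixes are powers of 1024.
SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

# Defaults as stated in this section when no geometry is supplied.
DEFAULTS = {
    "read_pref_io": 64 * 1024,
    "write_pref_io": 64 * 1024,
    "read_nstream": 1,
    "write_nstream": 1,
}


def parse_size(text):
    """Convert a value such as '128k' or '4' to an integer."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


def parse_options(option_string, defaults=DEFAULTS):
    """Overlay comma-separated name=value pairs on a copy of the defaults."""
    values = dict(defaults)
    for pair in option_string.split(","):
        name, _, value = pair.partition("=")
        values[name.strip()] = parse_size(value)
    return values


opts = parse_options(
    "read_pref_io=128k,write_pref_io=128k,read_nstream=4,write_nstream=4"
)
# Read ahead size is described as read_nstream multiplied by read_pref_io.
read_ahead_bytes = opts["read_nstream"] * opts["read_pref_io"]
print(opts["read_pref_io"], read_ahead_bytes)
```

With the netbackup entry from the example, the sketch yields a
preferred read size of 131072 bytes and a read ahead size of 524288
bytes (4 streams of 128K).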

Tunable VxFS I/O Parameters

read_pref_io
    The preferred read request size. The file system uses this in
    conjunction with the read_nstream value to determine how much data
    to read ahead. The default value is 64K.

write_pref_io
    The preferred write request size. The file system uses this in
    conjunction with the write_nstream value to determine how to do
    flush behind on writes. The default value is 64K.

read_nstream
    The number of parallel read requests of size read_pref_io to have
    outstanding at one time. The file system uses the product of
    read_nstream multiplied by read_pref_io to determine its read
    ahead size. The default value for read_nstream is 1.

write_nstream
    The number of parallel write requests of size write_pref_io to
    have outstanding at one time. The file system uses the product of
    write_nstream multiplied by write_pref_io to determine when to do
    flush behind on writes. The default value for write_nstream is 1.

default_indir_size
    On VxFS, files can have up to ten direct extents of variable size
    stored in the inode. Once these extents are used up, the file must
    use indirect extents, which are a fixed size that is set when the
    file first uses indirect extents. These indirect extents are 8K by
    default. The file system does not use larger indirect extents
    because it must fail a write and return ENOSPC if there are no
    extents available that are the indirect extent size. For file
    systems with many large files, the 8K indirect extent size is too
    small. The files that get into indirect extents use many smaller
    extents instead of a few larger ones. By using this parameter, the
    default indirect extent size can be increased so that large files
    in indirects use fewer, larger extents.

    The tunable default_indir_size should be used carefully. If it is
    set too large, then writes will fail when they are unable to
    allocate extents of the indirect extent size to a file. In
    general, the fewer and the larger the files on a file system, the
    larger the default_indir_size can be set. This parameter should
    generally be set to some multiple of the read_pref_io parameter.
    default_indir_size is not applicable on Version 4 and Version 5
    disk layouts.

discovered_direct_iosz
    Any file I/O requests larger than the discovered_direct_iosz are
    handled as discovered direct I/O. A discovered direct I/O is
    unbuffered similar to direct I/O, but it does not require a
    synchronous commit of the inode when the file is extended or
    blocks are allocated. For larger I/O requests, the CPU time for
    copying the data into the page cache and the cost of using memory
    to buffer the I/O data become more expensive than the cost of
    doing the disk I/O. For these I/O requests, using discovered
    direct I/O is more efficient than regular I/O. The default value
    of this parameter is 256K.

hsm_write_prealloc
    For a file managed by a hierarchical storage management (HSM)
    application, hsm_write_prealloc preallocates disk blocks before
    data is migrated back into the file system. An HSM application
    usually migrates the data back through a series of writes to the
    file, each of which allocates a few blocks. By setting
    hsm_write_prealloc (hsm_write_prealloc=1), a sufficient number of
    disk blocks are allocated on the first write to the empty file so
    that no disk block allocation is required for subsequent writes.
    This improves the write performance during migration.

    The hsm_write_prealloc parameter is implemented outside of the
    DMAPI specification, and its usage has limitations depending on
    how the space within an HSM-controlled file is managed. It is
    advisable to use hsm_write_prealloc only when recommended by the
    HSM application controlling the file system.

initial_extent_size
    Changes the default initial extent size. VxFS determines, based on
    the first write to a new file, the size of the first extent to be
    allocated to the file. Normally the first extent is the smallest
    power of 2 that is larger than the size of the first write. If
    that power of 2 is less than 8K, the first extent allocated is 8K.
    After the initial extent, the file system increases the size of
    subsequent extents (see max_seqio_extent_size) with each
    allocation.

    Since most applications write to files using a buffer size of 8K
    or less, the increasing extents start doubling from a small
    initial extent. initial_extent_size can change the default initial
    extent size to be larger, so the doubling policy will start from a
    much larger initial size and the file system will not allocate a
    set of small extents at the start of the file. Use this parameter
    only on file systems that will have a very large average file
    size. On these file systems it will result in fewer extents per
    file and less fragmentation. initial_extent_size is measured in
    file system blocks.

max_buf_data_size
    The maximum buffer size allocated for file data; either 8K bytes
    or 64K bytes. Use the larger value for workloads where large
    reads/writes are performed sequentially. Use the smaller value on
    workloads where the I/O is random or is done in small chunks. 8K
    bytes is the default value.

max_direct_iosz
    The maximum size of a direct I/O request that will be issued by
    the file system. If a larger I/O request comes in, then it is
    broken up into max_direct_iosz chunks. This parameter defines how
    much memory an I/O request can lock at once, so it should not be
    set to more than 20 percent of memory.

max_diskq
    Limits the maximum disk queue generated by a single file. When the
    file system is flushing data for a file and the number of buffers
    being flushed exceeds max_diskq, processes will block until the
    amount of data being flushed decreases. Although this doesn't
    limit the actual disk queue, it prevents flushing processes from
    making the system unresponsive. The default value is 1 MB.

max_seqio_extent_size
    Increases or decreases the maximum size of an extent. When the
    file system is following its default allocation policy for
    sequential writes to a file, it allocates an initial extent which
    is large enough for the first write to the file. When additional
    extents are allocated, they are progressively larger (the
    algorithm tries to double the size of the file with each new
    extent) so each extent can hold several writes' worth of data.
    This is done to reduce the total number of extents in anticipation
    of continued sequential writes. When the file stops being written,
    any unused space is freed for other files to use. Normally this
    allocation stops increasing the size of extents at 2048 blocks,
    which prevents one file from holding too much unused space.
    max_seqio_extent_size is measured in file system blocks.

read_ahead
    The default for all VxFS read operations is to perform sequential
    read ahead. You can specify the read_ahead cache advisory to
    implement the VxFS enhanced read ahead functionality. This allows
    read aheads to detect more elaborate patterns (such as increasing
    or decreasing read offsets or multithreaded file accesses) in
    addition to simple sequential reads. You can specify the following
    values for read_ahead:

    0: Disables read ahead functionality
    1: Retains traditional sequential read ahead behavior
    2: Enables enhanced read ahead for all reads

    The default is 1: VxFS detects only sequential patterns.

    read_ahead detects patterns on a per-thread basis, up to a maximum
    determined by the vx_era_nthreads parameter. The default number of
    threads is 5, but you can change the default value by setting the
    vx_era_nthreads parameter in the /etc/system configuration file.

write_throttle
    The write_throttle parameter is useful in special situations where
    a computer system has a combination of a large amount of memory
    and slow storage devices. In this configuration, sync operations
    (such as fsync()) may take long enough to complete that a system
    appears to hang. This behavior occurs because the file system is
    creating dirty buffers (in-memory updates) faster than they can be
    asynchronously flushed to disk without slowing system performance.

    Lowering the value of write_throttle limits the number of dirty
    buffers per file that a file system will generate before flushing
    the buffers to disk. After the number of dirty buffers for a file
    reaches the write_throttle threshold, the file system starts
    flushing buffers to disk even if free memory is still available.

    The default value of write_throttle is zero, which puts no limit
    on the number of dirty buffers per file. If non-zero, VxFS limits
    the number of dirty buffers per file to write_throttle buffers.

    The default value typically generates a large number of dirty
    buffers, but maintains fast user writes. Depending on the speed of
    the storage device, if you lower write_throttle, user write
    performance may suffer, but the number of dirty buffers is
    limited, so sync operations will complete much faster.

    Because lowering write_throttle may in some cases delay write
    requests (for example, lowering write_throttle may increase the
    file disk queue to the max_diskq value, delaying user writes until
    the disk queue decreases), it is advisable not to change the value
    of write_throttle unless your system has a combination of large
    physical memory and slow storage devices.
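
The initial_extent_size and max_seqio_extent_size descriptions can be
made concrete with a small model. The Python sketch below is a
simplified reading of the prose, not the VxFS allocator. It assumes a
1K file system block size (hypothetical; the real block size depends
on how the file system was created), takes "tries to double the size
of the file" to mean that each new extent equals the space already
allocated, and caps extents at 2048 blocks. The function names are
invented for illustration.

```python
# Simplified model of the default sequential-write extent policy.
# Not the VxFS allocator; the names and BLOCK_SIZE are hypothetical.

BLOCK_SIZE = 1024             # ASSUMPTION: 1K file system blocks
MIN_FIRST_EXTENT = 8 * 1024   # first extent is at least 8K (from the text)
MAX_EXTENT_BLOCKS = 2048      # growth normally stops here (from the text)


def first_extent_bytes(first_write_bytes):
    """Smallest power of 2 larger than the first write, but at least 8K."""
    size = 1
    while size <= first_write_bytes:
        size *= 2
    return max(size, MIN_FIRST_EXTENT)


def extent_sizes(first_write_bytes, count, max_blocks=MAX_EXTENT_BLOCKS):
    """Return the sizes in bytes of the first `count` extents."""
    cap = max_blocks * BLOCK_SIZE
    sizes = [min(first_extent_bytes(first_write_bytes), cap)]
    while len(sizes) < count:
        # Doubling the file means adding an extent as large as what is
        # already allocated, limited by the maximum extent size.
        sizes.append(min(sum(sizes), cap))
    return sizes


# Extent sizes in K for a file whose first write is 4K.
print([s // 1024 for s in extent_sizes(4096, 6)])
```

Under these assumptions a 4K first write produces extents of 8K, 8K,
16K, 32K, 64K, and 128K. Raising the initial extent size in this model
moves the starting point of the doubling, which is the effect the
initial_extent_size description attributes to the tunable.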

If the file system is being used with VxVM, it is advisable
to let the VxFS I/O parameters get set to default values based on
the volume geometry.

If the file system is being used with a hardware disk array
or volume manager other than VxVM, try to align the parameters to
match the geometry of the logical disk. With striping or RAID-5,
it is common to set read_pref_io to the stripe unit size and
read_nstream to the number of columns in the stripe. For striped
arrays, use the same values for write_pref_io and write_nstream, but
for RAID-5 arrays, set write_pref_io to the full stripe size and
write_nstream to 1.

For an application to do efficient disk I/O, it should issue
read requests that are equal to the product of read_nstream
multiplied by read_pref_io. Generally, any multiple or factor of
read_nstream multiplied by read_pref_io should be a good size for
performance. For writing, the same rule of thumb applies to the
write_pref_io and write_nstream parameters. When tuning a file system,
the best thing to do is try out the tuning parameters under a real
life workload.

If an application is doing sequential I/O to large files,
it should try to issue requests larger than the
discovered_direct_iosz. This causes the I/O requests to be performed
as discovered direct I/O requests, which are unbuffered like direct
I/O but do not require synchronous inode updates when extending the
file. If the file is larger than can fit in the cache, using
unbuffered I/O avoids evicting useful data from the cache and lessens
CPU overhead.
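
These rules of thumb can be sketched in Python. This is a minimal
illustration, assuming the stripe unit size and column count of the
logical disk are already known. suggest_tunables, efficient_read_size,
and is_discovered_direct are hypothetical helper names, and the RAID-5
full stripe size is assumed to be the stripe unit multiplied by the
number of data columns (columns minus one), which may not match every
array.

```python
# Hypothetical helpers illustrating the rules of thumb in the text.

DISCOVERED_DIRECT_IOSZ = 256 * 1024  # documented default, in bytes


def suggest_tunables(stripe_unit, columns, layout="stripe"):
    """Map logical-disk geometry to VxFS I/O tunables (sizes in bytes)."""
    if layout not in ("stripe", "raid5"):
        raise ValueError("layout must be 'stripe' or 'raid5'")
    tunables = {
        "read_pref_io": stripe_unit,
        "read_nstream": columns,
        "write_pref_io": stripe_unit,
        "write_nstream": columns,
    }
    if layout == "raid5":
        # ASSUMPTION: one column per stripe holds parity, so a full
        # stripe of data spans (columns - 1) stripe units.
        tunables["write_pref_io"] = stripe_unit * (columns - 1)
        tunables["write_nstream"] = 1
    return tunables


def efficient_read_size(tunables):
    """Request size equal to read_nstream multiplied by read_pref_io."""
    return tunables["read_nstream"] * tunables["read_pref_io"]


def is_discovered_direct(request_bytes, threshold=DISCOVERED_DIRECT_IOSZ):
    """True if a request is larger than discovered_direct_iosz."""
    return request_bytes > threshold


t = suggest_tunables(64 * 1024, 4, layout="raid5")
size = efficient_read_size(t)
print(t, size, is_discovered_direct(size))
```

For a four-column RAID-5 layout with a 64K stripe unit, the sketch
suggests 256K read requests. A request of exactly 256K is not larger
than the default discovered_direct_iosz, so under the stated rule it
would still be buffered; a multiple such as 512K would qualify as
discovered direct I/O.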