Abstract
The ftScalable™ Storage and ftScalable Storage G2 arrays are highly flexible, scalable hardware storage subsystems. Understanding the benefits and pitfalls of the various RAID types and configuration options, especially how they interact with the OpenVOS operating system, allows you to create an optimal disk topology.
This white paper details the various RAID types supported by ftScalable Storage, outlining their strengths and weaknesses, typical usages, and ways to design a disk topology best suited to your OpenVOS application.
Terminology
The following glossary defines some common storage-industry terms as used in this document.
Degraded mode. The mode of operation of a VDISK after one of its physical disk drives has failed, but before any recovery operation starts. While in this mode of operation, the VDISK is not fully redundant, and a subsequent physical drive failure could result in loss of this VDISK.
HBA or host bus adapter. A PCI-X or PCI-e circuit board or integrated circuit adapter that provides input/output (I/O) processing and physical connectivity between a server and a storage device.
Logical disk. An OpenVOS logical volume that contains one or more member disks. Each member disk is either a single physical disk in a D910 Fibre Channel disk enclosure, a LUN in an ftScalable Storage array, or a duplexed pair of either (or both) types.
LUN or logical unit. Either a subdivision of a VDISK or an entire VDISK in an ftScalable Storage array.
Multi-member logical disk. An OpenVOS logical volume consisting of at least two pairs of duplexed member disks with data striped across all the member disk pairs.
RAID overhead. The amount of physical drive capacity used in a specific RAID type to provide redundancy. For example, in a RAID1 VDISK, this would be 50% of the total capacity of the physical drives that make up the VDISK, since the data on each drive is duplicated on a mirrored partner drive.
Recovery mode. The mode of operation of the VDISK while it is rebuilding after a drive failure. While in this mode of operation, the VDISK is not fully redundant and a subsequent physical drive failure could result in loss of this VDISK.
Striping. A method of improving I/O performance by breaking data into blocks and writing the blocks across multiple physical disks.
VDISK or virtual disk. A group of one or more physical disk drives in an ftScalable Storage array, organized using a specific RAID type into what appears to the operating system as one or more disks, depending on the number of LUNs defined.
The terms “VOS” and “OpenVOS” are used interchangeably in this document in reference to Stratus’s VOS and OpenVOS operating systems.
1.0 RAID Types
The ftScalable Storage array supports a variety of RAID types. These include: non-fault-tolerant RAID types (RAID-0, NRAID), parity-based RAID types (RAID-3, RAID-5, and RAID-6), mirroring RAID types (RAID-1), and combination RAID types (RAID-10, RAID-50). You must specify a RAID type when creating each VDISK.
Each RAID type has unique availability, cost, performance, scalability, and serviceability characteristics. Understanding them allows you to make an informed selection when creating your array disk topology.
1.1 Non-Fault-Tolerant RAID Types
There are two non-fault-tolerant RAID types available in ftScalable Storage, RAID-0 and NRAID.
1.1.1 RAID-0
A RAID-0 VDISK consists of at least two physical disk drives, with data striped across all the physical disk drives in the set. It provides the highest degree of I/O performance, but offers no fault tolerance. Loss of any physical disk drive causes total loss of the data in this VDISK.
Since RAID-0 is a non-fault-tolerant RAID type, the ftScalable Storage array cannot automatically take marginal or failing physical disk drives out of service and proactively rebuild the data using an available spare disk drive. Instead, recovery depends entirely on the traditional OpenVOS method of fault tolerance: duplexed disks.
As a result, a series of manual service operations (deleting the failing VDISK, physically removing the bad physical disk, installing a replacement physical drive, recreating the VDISK, reformatting the logical disk, and re-duplexing via VOS) is required to recreate and recover a RAID-0 VDISK. Your data is simplexed until all these recovery operations are completed. For further information regarding the impacts that physical drive insertions or removals have on I/O processing, see Section 11.0, “Physical Disk Drive Insertions and Removals: Impacts to I/O Performance”.
Stratus does not recommend using this RAID type without also using the software-based mirroring available in OpenVOS. Even with OpenVOS mirroring, you should carefully weigh the potential for data loss, given the manual service operations and the time required to restore your data to full redundancy.
1.1.2 NRAID
An NRAID VDISK is essentially a single physical disk drive without any fault tolerance. It offers no striping and thus has the performance characteristics of a single physical disk drive. NRAID VDISKs have the same availability and serviceability characteristics as RAID-0 VDISKs.
1.2 Parity-Based RAID Types: RAID-3, RAID-5, RAID-50, and RAID-6
The ftScalable Storage array supports four types of parity-based VDISKs: RAID-3, RAID-5, RAID-50 and RAID-6. Given the low usage of RAID-3 and RAID-50, this white paper focuses on the more commonly used RAID-5 and RAID-6 types.
These RAID types use parity-based algorithms and striping to offer high availability at a reduced cost compared to mirroring. A RAID-5 VDISK uses the capacity equivalent of one physical disk drive to store XOR-generated parity data, while a RAID-6 VDISK uses the equivalent of two physical disk drives, as both XOR and Reed-Solomon parity data are generated and stored. Both RAID-5 and RAID-6 VDISKs distribute parity and data blocks among all the physical disk drives in the set.
VDISKs using parity-based RAID types require less storage capacity for RAID overhead than mirroring RAID types. The minimum number of physical disk drives needed to create a RAID-5 VDISK is three, while a RAID-6 VDISK requires at least four.
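To make the RAID-overhead trade-off concrete, the following sketch (illustrative Python, not Stratus tooling) computes approximate usable capacity and overhead for the RAID types discussed, assuming equal-sized drives and ignoring formatting and meta-data overhead.

```python
# A minimal sketch comparing RAID overhead for the RAID types discussed
# above; drive counts and the 300 GB drive size are illustrative only.

def usable_capacity_gb(raid_type: str, drives: int, drive_gb: float) -> float:
    """Return approximate usable capacity for a VDISK of the given RAID type."""
    if raid_type == "RAID-1":
        assert drives == 2, "RAID-1 is a mirrored pair"
        return drive_gb                      # one drive's worth; the other is the mirror
    if raid_type == "RAID-10":
        assert drives >= 4 and drives % 2 == 0
        return (drives // 2) * drive_gb      # half the drives hold mirror copies
    if raid_type == "RAID-5":
        assert drives >= 3
        return (drives - 1) * drive_gb       # one drive's worth of XOR parity
    if raid_type == "RAID-6":
        assert drives >= 4
        return (drives - 2) * drive_gb       # two drives' worth of XOR + Reed-Solomon parity
    raise ValueError(raid_type)

for rt, n in [("RAID-1", 2), ("RAID-10", 4), ("RAID-5", 4), ("RAID-6", 4)]:
    usable = usable_capacity_gb(rt, n, 300.0)
    overhead = 1.0 - usable / (n * 300.0)
    print(f"{rt:7s}  {n} drives  usable={usable:6.0f} GB  RAID overhead={overhead:.0%}")
```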
A RAID-5 VDISK can survive a single disk drive failure without data loss, while a RAID-6 VDISK can survive two drive failures. The ftScalable Storage array can proactively remove a marginal or failing physical disk drive from the VDISK without affecting the availability of data. In addition, if a spare drive is available, recovery mode starts automatically, with no operator intervention, physical drive insertions, or re-duplexing of logical disks in OpenVOS; recovery is handled transparently to the operating system. You can then schedule a time to replace the failed disk drive and create a new spare. However, see Section 11.0, “Physical Disk Drive Insertions and Removals: Impacts to I/O Performance”, for further information regarding the impacts that a physical drive removal and insertion have on I/O processing.
Both types offer excellent read performance, but write performance is reduced because each write involves not only writing the data block, but also the calculation and read/modify/rewrite operations needed for the parity block(s). A RAID-5 or RAID-6 VDISK running in degraded mode after a single physical disk drive failure has a medium impact on throughput. However, a VDISK in recovery mode with data being rebuilt has a high impact on throughput.
A RAID-6 VDISK running in degraded mode resulting from two failed physical disk drives has a medium to high impact on throughput, while one running in recovery mode with two drives being rebuilt has an extremely high impact on throughput.
Review Table 1 for the estimated impact to I/O when running with a RAID-5 or RAID-6 VDISK in either degraded or recovery mode.
NOTE: These are estimates, and the actual impact in your environment may vary depending on your configuration, workload, and your application I/O profile.
Table 1. Estimated Degradation to I/O Performance
Category | RAID 5 / RAID 6 Degraded Mode, Single Drive Failure | RAID 5 / RAID 6 Recovery Mode, Single Drive Failure | RAID 6 Degraded Mode, Dual Drive Failure | RAID 6 Recovery Mode, Dual Drive Failure |
Read Perf. | 40 – 50% | 50 – 60% | 50 – 60% | 60 – 70% |
Write Perf. | 10 – 15% | 15 – 25% | 20 – 25% | 25 – 35% |
1.3 Mirroring RAID Types: RAID-1 and RAID-10
With ftScalable Storage, you can create two types of mirroring RAID VDISKs, RAID-1 and RAID-10.
1.3.1 RAID-1
A RAID-1 VDISK is a simple pair of mirrored physical disk drives. It offers good read and write performance and can survive the loss of a single physical disk drive without impacting data availability. Reads can be handled by either physical drive, while writes must be written to both drives. Since all data is mirrored in a RAID-1 VDISK, there is a high degree of RAID overhead compared to parity-based RAID types.
Recovery from a failed physical disk drive is a straightforward operation, requiring only re-mirroring from the surviving partner. The ftScalable Storage array can proactively remove a marginal or failing physical disk drive from a RAID-1 VDISK without affecting the availability of data. As with parity-based RAID types, if a spare drive is available, recovery mode starts automatically, with no operator intervention, physical drive insertions, or re-duplexing of the logical disk in OpenVOS; recovery is handled transparently to the operating system. You can then schedule a time to replace the failed disk drive and create a new spare. That said, see Section 11.0, “Physical Disk Drive Insertions and Removals: Impacts to I/O Performance”, for further information regarding the impacts that a physical drive removal and insertion have on I/O processing.
There is typically a small impact on performance while running in either degraded or recovery mode.
1.3.2 RAID-10
A RAID-10 VDISK consists of two or more RAID-1 disk pairs, with data blocks striped across them all. A RAID-10 VDISK offers high performance, scalability, and the ability to potentially survive multiple physical drive failures without losing data. The serviceability, RAID overhead, and impact on performance while running in degraded or recovery mode are similar to those of a RAID-1 VDISK.
1.3.3 RAID Type Characteristics Summary
Table 2 summarizes the characteristics of the RAID types discussed. It rates VDISKs of each type in several categories on a scale of 0 to 5, where 0 is very poor and 5 is very good. You should only compare values within each row; comparisons between rows are not valid.
Table 2. RAID Type Characteristics
Category | NRAID | RAID‑0 | RAID‑1 | RAID‑10 | RAID‑5 | RAID‑6 |
Availability | 0 | 0 | 3 | 5 | 4 | 5 |
RAID Overhead | 5 | 5 | 0 | 0 | 3 | 2 |
Read Performance | 3 | 5 | 4 | 5 | 4 | 4 |
Write Performance | 3 | 5 | 3 | 4 | 2 | 2 |
Degraded Mode Performance | N/A | N/A | 3 | 5 | 2 | 1 |
Recovery Mode Performance | N/A | N/A | 3 | 5 | 2 | 1 |
2.0 Selecting a RAID Type
Each RAID type has specific benefits and drawbacks. By understanding them, you can select the RAID type best suited to your environment. Keep in mind that you can create multiple VDISKs that use any of the RAID types supported by the ftScalable Storage array, allowing you to design a RAID layout that is optimal for your application and system environment. You do not need to use the same RAID type for all the VDISKs on your ftScalable Storage array.
NOTE: Stratus’s use of a specific RAID type and LUN topology for the OpenVOS system volume does not imply that is the optimal RAID type for your application or data.
For data and applications where write throughput or latency is not critical (for example, batch processing), or that are heavily biased toward reads over writes, RAID-5 is a good choice. In return for accepting lower write throughput and higher latency, you can use fewer physical disk drives for a given capacity, yet still achieve a high degree of fault tolerance. However, you must also consider the impact that running with a VDISK in either degraded or recovery mode could have on your application. Overall I/O performance and latency during degraded and recovery mode suffer more with parity-based RAID types than with mirroring RAID types.
For data and applications that require optimal write throughput with the smallest latencies (for example, online transaction processing systems), that perform more writes than reads, or that cannot tolerate degraded performance in the event of a physical drive failure, mirroring RAID types (RAID-1 or RAID-10) offer a better solution. These RAID types eliminate the additional I/Os that RAID-5 and RAID-6 incur from the read-before-write penalty for parity data, so writing data is a simple operation. RAID-10 is generally a better choice than RAID-1 because it allows you to stripe the data over multiple physical drives, which can significantly increase overall read and write performance. However, see Section 5.0, “OpenVOS Multi-Member Logical Disks Versus ftScalable RAID-10 VDISKs”, and Section 6.0, “OpenVOS Queue Depth and ftScalable Storage”, for additional information about OpenVOS I/O queuing, LUN counts, and striping considerations.
For data and applications that can tolerate longer periods with simplexed data after a drive failure, or that are not very sensitive to longer latencies, NRAID and RAID-0 VDISKs may be considered, but only when used with OpenVOS mirroring. Selecting one of these RAID types uses the fewest physical disk drives for a given capacity, but in exchange for reduced availability. Given these restrictions and availability implications, Stratus does not recommend using these RAID types.
If you cannot decide between a parity-based and a mirroring RAID type, the prudent choice is a mirroring RAID type, which offers the best performance and availability characteristics for the majority of applications.
3.0 Partitioning VDISKs into LUNs
Before a VDISK can be used by OpenVOS, it must first be partitioned into one or more LUNs. Each LUN is assigned to a specific VOS member disk. One or more member disks are combined into a single OpenVOS logical disk.
While the ftScalable Storage array supports partitioning a VDISK into multiple LUNs, this can introduce significant performance penalties that affect both I/O throughput and latency for all the LUNs on that VDISK. As a result, Stratus does not recommend configurations using multiple LUNs per VDISK for customer data.
There are several reasons for the performance penalties seen running multi-LUN VDISK configurations, but the basic ones are disk contention and head seeks. Each time the ftScalable Storage array has to satisfy an I/O request to one of the LUNs in a multi-LUN VDISK configuration, it has to seek the physical disk drive heads. The more LUNs that comprise a VDISK, the more head movement occurs. The more head movement there is, the greater the latencies become as disk contention increases. Remember, all I/O must eventually be handled by the physical disk drives that make up the VDISK; the array’s cache memory cannot replace this physical I/O.
Stratus has run benchmarks demonstrating that the aggregate I/O throughput of a 4-LUN VDISK is about half the throughput of the same VDISK configured as a single LUN, while the average latency can be over four times greater.
Charts 1 and 2 show the impacts using multiple LUNs per VDISK have on performance. These charts show the aggregate of write I/Os per second (IOPS) and maximum latencies in milliseconds (ms) seen when using a 4 drive RAID-5 VDISK configured with one, two or three LUNs.
NOTE: These charts are based on results from Stratus internal lab testing under controlled conditions. Your actual results may vary.
Charts 1 and 2. Multiple LUNs per VDISK Performance Impacts
4.0 Assigning OpenVOS Logical Disks to LUNs
The simplest approach is to assign each member disk within an OpenVOS logical disk to a LUN. If you need a VOS logical disk that is bigger than a single LUN, or if you want the performance benefits of striping, you can create a VOS multi-member logical disk, where each member disk is a single LUN.
Figure 1 shows the relationship between physical disk drives, VDISKs, and LUNs on the ftScalable Storage array and OpenVOS logical disks. This is an example of a simple OpenVOS logical disk, consisting of two member disks, where each member disk is a single RAID-1 VDISK / LUN on an ftScalable Storage array.
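The sketch below is a simplified, hypothetical model of the same relationship in Python; the names (vdisk01, lun_0, d01, and so on) are illustrative only and do not correspond to actual ftScalable or OpenVOS object names.

```python
# A simplified model of the Figure 1 topology: each RAID-1 VDISK is exported
# as a single LUN, each LUN backs one VOS member disk, and two member disks
# are duplexed into one OpenVOS logical disk. Names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Vdisk:
    name: str
    raid_type: str
    physical_drives: list                 # e.g. ["drive_1", "drive_2"]
    luns: list = field(default_factory=list)

@dataclass
class LogicalDisk:
    name: str
    member_luns: list                     # duplexed pair of LUNs, one per VDISK

# Two RAID-1 VDISKs, one LUN each, duplexed by OpenVOS into a single logical disk.
vd_a = Vdisk("vdisk01", "RAID-1", ["drive_1", "drive_2"], luns=["lun_0"])
vd_b = Vdisk("vdisk02", "RAID-1", ["drive_3", "drive_4"], luns=["lun_1"])
d01  = LogicalDisk("d01", member_luns=[vd_a.luns[0], vd_b.luns[0]])

print(f"{d01.name}: duplexed across {d01.member_luns}")
```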
5.0 OpenVOS Multi-Member Logical Disks Versus ftScalable RAID-10 VDISKs
There are now a variety of ways in which you can implement striping in OpenVOS. Prior to the release of ftScalable Storage, the only method available was to use multiple physical disk drives configured as a VOS multi-member logical disk. With the advent of ftScalable Storage, you can create RAID-10 VDISKs where the array handles all striping, or even a combination of both methods, combining multiple LUNs, each of which is a VDISK, into a single VOS multi-member logical disk.
If you want to use striping, Stratus recommends you use non-striping RAID type VDISKs (for example, RAID-1 or RAID-5), with a single LUN per VDISK, and combine them into VOS multi-member logical disks. This allows OpenVOS to maintain a separate disk queue for each LUN, maximizing throughput while minimizing latency. That said, review Section 6.0, “OpenVOS Queue Depth and ftScalable Storage”, for some considerations regarding the number of allocated LUNs and potential performance implications.
6.0 OpenVOS Queue Depth and ftScalable Storage
All storage arrays, physical disk drives, Fibre Channel HBAs, and modern operating systems have various sized queues for I/O requests. The queue depth basically defines how many unique I/O requests can be pending (queued) for a specific device at any given time.
A queue full condition occurs when a device becomes extremely busy and cannot add any additional I/O requests to its queue. When a queue full condition exists, new I/O requests are aborted and retried until there is space on the queue again. This causes increased I/O latency, increased application response times, and decreased I/O throughput.
OpenVOS maintains a separate queue with a default queue depth of twelve for every LUN. Each host (Fibre Channel) port on the ftScalable Storage array has a single queue with a depth of 128.
In OpenVOS configurations with a large number of LUNs, a relatively small number of very busy LUNs can fill the host port queues on the ftScalable Storage array. As a result, I/O requests for other LUNs receive a queue full status, delaying them and your application. You should carefully balance the number of LUNs used in your configuration and, if necessary, consult with Stratus about adjusting the OpenVOS queue depth settings.
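As a back-of-the-envelope illustration, the following sketch uses the figures above (a default per-LUN queue depth of 12 in OpenVOS and a 128-entry queue per array host port) to show how few very busy LUNs could, in the worst case, fill a host port queue; the workload pattern is assumed, not measured.

```python
# A rough illustration of queue-full exposure, assuming every busy LUN keeps
# its full OpenVOS queue outstanding on the same array host port.

OPENVOS_LUN_QUEUE_DEPTH = 12     # default per-LUN queue depth in OpenVOS
HOST_PORT_QUEUE_DEPTH   = 128    # per host (Fibre Channel) port on the array

def port_queue_usage(busy_luns: int) -> float:
    """Fraction of a host-port queue consumed in this worst-case pattern."""
    return busy_luns * OPENVOS_LUN_QUEUE_DEPTH / HOST_PORT_QUEUE_DEPTH

for luns in (4, 8, 10, 11, 16):
    usage = port_queue_usage(luns)
    state = "queue full likely" if usage >= 1.0 else "ok"
    print(f"{luns:2d} busy LUNs -> {usage:5.0%} of the port queue ({state})")
```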
7.0 Assigning Files to VOS Logical Disks
When possible, assign randomly-accessed files and sequentially-accessed files to separate logical disks. Mixing the two types of file access methods on the same logical disk increases the worst-case time (maximum latency) needed to access the random-access files and reduces the maximum possible throughput of sequentially-accessed files. Also keep in mind that you can use a different RAID type for each logical disk, to best match the I/O access type.
8.0 Balancing VDISKs between Storage Controllers
The ftScalable Storage array has an active-active storage controller design, with two controllers actively processing I/O. However, every VDISK is assigned to a specific storage controller, either controller A or controller B, when it is allocated. All I/O for a specific VDISK is handled by the assigned storage controller. If you do not specify which controller you want assigned to a particular VDISK, the ftScalable Storage array assigns them in a round-robin fashion, alternating between the two controllers.
While this may balance the number of VDISKs between the two storage controllers, it may not ensure that the I/O workload is evenly split. For example, suppose that you have 6 VDISKs in your configuration, called VDISK1 through VDISK6. VDISK1 and VDISK3 handle all your primary online data and are both very I/O intensive, while the rest of the VDISKs handle offline archival data and are much less busy.
If you do not explicitly assign the VDISKs to controllers, you end up with VDISK1, VDISK3, and VDISK5 assigned to controller A, and VDISK2, VDISK4, and VDISK6 assigned to controller B. This results in an unbalanced I/O load between the two storage controllers.
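The following sketch illustrates this scenario in Python; the per-VDISK workload figures are purely hypothetical and serve only to contrast the default round-robin assignment with a workload-aware manual assignment.

```python
# Default round-robin assignment versus assigning VDISKs by estimated workload.
# The workload figures below are illustrative, not measured values.

vdisks = {                      # name -> estimated I/O workload (arbitrary units)
    "VDISK1": 900, "VDISK2": 50, "VDISK3": 850,
    "VDISK4": 60,  "VDISK5": 40, "VDISK6": 55,
}

# Default: alternate A, B, A, B, ... in creation order.
round_robin = {name: ("A" if i % 2 == 0 else "B") for i, name in enumerate(vdisks)}

# Manual: greedily place each VDISK (busiest first) on the lighter controller.
manual, load = {}, {"A": 0, "B": 0}
for name, work in sorted(vdisks.items(), key=lambda kv: -kv[1]):
    ctrl = min(load, key=load.get)
    manual[name] = ctrl
    load[ctrl] += work

def per_controller(assignment):
    totals = {"A": 0, "B": 0}
    for name, ctrl in assignment.items():
        totals[ctrl] += vdisks[name]
    return totals

print("round-robin:", per_controller(round_robin))   # controller A carries VDISK1 and VDISK3
print("manual     :", per_controller(manual))        # workload roughly balanced
```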
You should consider the estimated I/O workloads when allocating your VDISKs and, if necessary, manually assign specific VDISKs to controllers during the VDISK creation process. If you find that your workload changes or that your I/O allocation is unbalanced, you can reassign an existing VDISK to a new storage controller.
CAUTION: Changing controller ownership of a VDISK is a disruptive operation and cannot be done without temporarily impacting access to your data while the VDISK is moved between the two controllers and the LUN number is re-assigned. This cannot be done to online OpenVOS logical disks. Consult with Stratus for assistance before doing this operation.
9.0 Single VDISK Configurations
While it is possible to create a single large VDISK on an ftScalable Storage array, you should avoid doing so; it degrades performance and is not recommended by Stratus.
As described previously, there are two storage controllers within each ftScalable Storage array, running in an active-active mode. Each VDISK is assigned to a specific storage controller that owns and executes all I/O processing for that VDISK. In a single VDISK configuration, you are halving the total available performance of the ftScalable Storage array, as you will have only one of the two storage controllers processing all I/O requests.
In OpenVOS configurations, there is a separate queue of disk I/O requests for each LUN. With only a single VDISK, you minimize OpenVOS’s ability to send parallel I/O requests to the ftScalable Storage array, again degrading your overall I/O throughput and latency.
10.0 VDISK / LUN Sizing Implications on OpenVOS
10.1 Raw Versus Usable Capacity
The OpenVOS operating system utilizes meta-data to ensure the highest degree of data integrity for disk data. This meta-data is stored in separate physical sectors from the data itself. As a result, OpenVOS uses nine physical sectors to store every eight sectors of usable data.
When configuring ftScalable VDISKs and LUNs, remember that the size presented to OpenVOS represents raw capacity and does not reflect the meta-data overhead. Your usable capacity is approximately 8/9ths (about 88%) of the raw physical size of the VDISK / LUN. In addition, OpenVOS reserves approximately 1.1 GB of space for partitioning overhead.
OpenVOS normally utilizes the storage on a LUN rounded to the nearest 5 GB boundary. This allows the partnering of LUNs with slightly dissimilar sizes. The only exception to this rounding is for LUNs that exactly match the size of certain legacy OpenVOS disk types (for example, a D913 or D914).
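The following sketch is a rough sizing helper based on the figures above; it is not an exact OpenVOS formula, and it assumes the 5 GB rounding is downward.

```python
# A rough sizing sketch: usable capacity is approximately 8/9 of the raw LUN
# size, less roughly 1.1 GB of partitioning overhead, with the result assumed
# to round down to a 5 GB boundary. Raw sizes below are illustrative.

def approx_usable_gb(raw_gb: float, round_to_5gb: bool = True) -> float:
    usable = raw_gb * 8.0 / 9.0 - 1.1          # meta-data plus partitioning overhead
    return (usable // 5) * 5 if round_to_5gb else usable

for raw in (72.0, 146.0, 300.0, 600.0):
    print(f"raw {raw:6.1f} GB -> approx. usable {approx_usable_gb(raw):6.1f} GB")
```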
10.2 OpenVOS Disk Segments, LUN Sizing and LUN Counts
OpenVOS has a maximum of 254 addressable disk segments per VOS module, where each disk segment can address approximately 34 GB of storage. This results in a maximum of roughly 8.6 TB (Terabytes) of duplexed addressable storage on OpenVOS. Every OpenVOS logical disk consumes at least one disk segment.
These two constraints need to be considered when allocating your VDISK / LUN sizes and counts. Since each logical disk requires at least one segment, using many small VDISKs / LUNs can substantially reduce the overall maximum amount of storage capacity that can be configured on your OpenVOS system. To further maximize the amount of configurable storage on your OpenVOS system, create LUNs with an integral multiple of 34 GB usable (38.6 GB raw) size to minimize the number of disk segments required for a specific logical disk.
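The sketch below applies the limits quoted above (254 segments per module, roughly 34 GB usable per segment) to estimate how many segments a logical disk consumes and how much segment capacity small LUNs leave unused; the sizes chosen are illustrative.

```python
# Estimating disk-segment consumption per logical disk and the total
# addressable capacity, using the limits quoted in this section.
import math

SEGMENTS_PER_MODULE = 254
GB_PER_SEGMENT = 34            # approximate usable GB addressed per segment

def segments_needed(usable_gb: float) -> int:
    return max(1, math.ceil(usable_gb / GB_PER_SEGMENT))

# Many small LUNs waste segment capacity compared with multiples of 34 GB.
for usable in (10, 34, 40, 68, 102, 545):
    segs = segments_needed(usable)
    print(f"{usable:4d} GB usable -> {segs} segment(s), "
          f"{segs * GB_PER_SEGMENT - usable} GB of segment capacity unused")

print("max addressable:", SEGMENTS_PER_MODULE * GB_PER_SEGMENT, "GB (~8.6 TB duplexed)")
```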
10.3 POSIX Restrictions on Logical Disk Size
The number of disk members and the total size of a VOS logical disk determine the size of generated inode numbers used by POSIX applications. In current VOS releases, that value is restricted to 32 bits, which allows a logical disk size of approximately 545 GB. If that limit is exceeded, the VOS initialize_disk command generates a warning that this could present a compatibility issue for your existing POSIX applications.
If all your POSIX applications that access logical disks have been recompiled on OpenVOS 17.1 with the preprocessor symbol _VOS_LARGE_INODE defined, and rebound with the OpenVOS 17.1 POSIX runtime routines that support both 32 and 64-bit inode numbers, there is no issue. The warning message for that disk may be ignored and/or suppressed with the -no_check_legacy_inodes option added to the initialize_disk command in OpenVOS release 17.1 and beyond.
Refer to the documents Software Release Bulletin: OpenVOS Release 17.1 (R622-01) and OpenVOS System Administration: Disk and Tape Administration (R284-12) for further information.
11.0 Physical Disk Drive Insertions and Removals: Impacts to I/O Performance
The ftScalable Storage array supports online insertion and removal of physical disk drives without powering off the array, or stopping the host and/or application. After one or more drives are inserted or removed, the ftScalable Storage array must go through a process of remapping the underlying physical disk topology to determine if there are any relocated, removed or newly inserted physical disk drives. This is called a “rescan.”
These rescans are done automatically by the ftScalable Storage array, without any manual operator commands. While this rescan process is occurring, any pending I/O requests may be temporarily delayed until it has completed.
In the first-generation ftScalable Storage array, a drive insertion could cause multiple I/O delays ranging from 4 to 7 seconds over a period of approximately 40 seconds. A drive removal typically results in two I/O delays of between 3 and 11 seconds over a period of roughly 15 seconds.
With the latest generation of the ftScalable Storage G2 array, I/O delays resulting from drive insertions or removals are now 3 seconds or less.
NOTE: These are results from Stratus internal lab testing under controlled conditions with the latest FW versions and with maximum physical disk configurations (3 enclosures per array with 36 drives for the first-generation ftScalable Storage array, or 72 drives for ftScalable Storage G2 array). Your actual results may vary depending on your specific configuration and workload.
The following recommendations can minimize the impact that the I/O delays occurring during rescan processing could have on your latency-sensitive application.
- Configure at least one physical disk drive as a spare drive and use fault-tolerant RAID type VDISKs. By allocating a spare drive and using fault-tolerant RAID types, the ftScalable Storage array can automatically remove a marginal or failing physical disk from a VDISK and start the recovery process without requiring any drive insertion or removal, avoiding a rescan. You can replace the failing drive during a less critical period.
- If using non-fault-tolerant RAID type VDISKs (RAID-0, NRAID), Stratus recommends creating an extra VDISK as a standby spare. You can use this standby VDISK as a replacement member disk and re-duplex using OpenVOS mirroring to provide redundancy. This allows you to replace the failing drive during a less critical period.
- Do not move physical drives to preserve specific enclosure slot positions after service operations. The ftScalable Storage array design does not require the physical drives for a VDISK to remain in the same enclosure slot positions as when allocated.
- Do not physically remove a marginal or failing disk drive until its replacement has been received and is ready to be installed at the same time. By coordinating physical drive removals and insertions, you minimize the number of rescans, since multiple drive topology changes can be handled within a single rescan.
Summary
The combination of the OpenVOS operating system with ftScalable Storage arrays gives you a robust, scalable and flexible storage environment to host your most critical applications. By understanding the benefits and drawbacks of the various RAID types, LUN topologies and configuration choices available, you can create an optimal storage layout to meet your business, performance and availability needs.