Definition: Fault tolerance is an IT non-functional requirement that refers to the ability of a system to continue functioning when hardware or software fails. The system should detect errors and failures and recover from them without disrupting the user experience. Fault tolerance is important for critical systems that cannot afford to fail, such as those used in healthcare, finance, and transportation. It relies on redundancy, backup systems, and failover mechanisms to keep the system operational when unexpected events occur.
Source: TOGAF
Source reference: https://pubs.opengroup.org/architecture/togaf9-doc/arch/chap03.html
Additional information: According to the TOGAF specification, fault tolerance is a non-functional requirement that describes the ability of a system to continue operating when a failure or error occurs. The system should detect faults and errors and recover from them without disrupting its overall performance or availability.
To achieve fault tolerance, the system should be designed with redundancy and failover mechanisms that automatically switch to backup components or systems when a failure occurs. This can include redundant hardware components, such as servers, storage devices, and network connections, as well as software mechanisms, such as load balancing, clustering, and replication.
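The failover idea can be illustrated with a minimal Python sketch. It is not taken from TOGAF or EIRA; the names `call_with_failover`, `primary`, `backup`, and `AllReplicasFailed` are hypothetical, and the replicas are plain functions standing in for redundant servers.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when no redundant component could serve the request."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each redundant replica in order and return the first successful reply."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:  # a failed replica triggers failover to the next one
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replica(s) failed for request {request!r}")


def primary(request: str) -> str:
    # Hypothetical primary component that is currently unavailable.
    raise ConnectionError("primary unavailable")


def backup(request: str) -> str:
    # Hypothetical backup component that is healthy.
    return f"backup handled {request}"


result = call_with_failover([primary, backup], "GET /status")
```

Here the caller never sees the failure of `primary`; the request is served by `backup`. A production design would typically add timeouts, retries with backoff, and health state so that a known-bad replica is not tried on every request.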
In addition, the system should detect faults and errors in real time, using monitoring and alerting mechanisms that notify system administrators or operators of any issues. This can include automated monitoring tools that detect performance issues, security breaches, or other problems that affect the system's availability or performance.
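One common way to detect a failed component is a heartbeat check. The sketch below is illustrative only: `HeartbeatMonitor`, the component names, and the 5-second timeout are assumptions, and timestamps are passed in explicitly so that the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatMonitor:
    """Flags components whose last heartbeat is older than `timeout_s` seconds."""

    timeout_s: float
    alert: Callable[[str], None]  # called once per failed component on each check
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, component: str, now: float) -> None:
        # Record the time at which the component last reported in.
        self.last_seen[component] = now

    def check(self, now: float) -> List[str]:
        # A component is considered failed if it has been silent for too long.
        failed = sorted(
            name for name, seen in self.last_seen.items() if now - seen > self.timeout_s
        )
        for name in failed:
            self.alert(f"ALERT: no heartbeat from {name} for more than {self.timeout_s}s")
        return failed


alerts: List[str] = []
monitor = HeartbeatMonitor(timeout_s=5.0, alert=alerts.append)
monitor.heartbeat("db-primary", now=0.0)
monitor.heartbeat("web-1", now=0.0)
monitor.heartbeat("web-1", now=8.0)  # web-1 keeps reporting, db-primary goes silent
failed = monitor.check(now=10.0)
```

In this run only `db-primary` is reported as failed, and a single alert is appended to `alerts`. In practice the `alert` callback would notify an operator or trigger an automated failover.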
Overall, fault tolerance is a critical non-functional requirement for any system that requires high availability and reliability, such as mission-critical applications, financial systems, or healthcare systems. Designing systems with fault tolerance in mind helps organizations keep them operating through unexpected failures or errors, which minimizes downtime and supports business continuity.
Example: The aviation industry provides an example of the IT non-functional requirement 'Fault tolerance'. The flight control systems of an aircraft must be fault-tolerant, meaning that they continue to operate even if one or more components fail. This is critical for the safety of passengers and crew when a system failure occurs. The system must detect and isolate faults, then continue to operate using redundant components or backup systems.
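Detecting and isolating a faulty component among redundant ones is often done by majority voting. The sketch below is a simplified illustration and does not describe how any real flight control system is implemented; the function `vote` and the sample readings are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def vote(readings: Sequence[int]) -> Tuple[int, List[int]]:
    """Return the majority value and the indices of channels that disagree with it."""
    value, count = Counter(readings).most_common(1)[0]
    if count <= len(readings) // 2:
        # Without a strict majority the fault cannot be masked.
        raise RuntimeError("no majority among redundant channels")
    faulty = [i for i, reading in enumerate(readings) if reading != value]
    return value, faulty


# Three redundant channels report a value; the third one has failed.
value, faulty = vote([120, 120, 97])
```

With three channels, one faulty reading is outvoted by the other two, and the returned index list identifies the channel to isolate. Real systems usually compare readings within a tolerance and not for exact equality.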
LOST view: Digital Solution Non-Functional Requirements Catalogue view
Identifier: http://data.europa.eu/dr8/egovera/FaultToleranceRequirement
EIRA traceability: eira:DigitalSolutionNonFunctionalRequirementRequirement
ABB name: egovera:FaultToleranceRequirement
EIRA concept: eira:ArchitectureBuildingBlock
Last modification: 2023-05-16
dct:identifier: http://data.europa.eu/dr8/egovera/FaulttoleranceRequirement
dct:title: Fault tolerance Non-Functional Requirement
dct:modified | 2024-01-28 |
dct:identifier | http://data.europa.eu/dr8/egovera/FaulttoleranceRequirement |
dct:title | Fault tolerance Non-Functional Requirement |
skos:example | The aviation industry provides an example of the IT non-functional requirement 'Fault tolerance'. The flight control systems of an aircraft must be fault-tolerant, meaning that they continue to operate even if one or more components fail. This is critical for the safety of passengers and crew when a system failure occurs. The system must detect and isolate faults, then continue to operate using redundant components or backup systems. |
skos:definition | Fault tolerance is an IT non-functional requirement that refers to the ability of a system to continue functioning when hardware or software fails. The system should detect errors and failures and recover from them without disrupting the user experience. Fault tolerance is important for critical systems that cannot afford to fail, such as those used in healthcare, finance, and transportation. It relies on redundancy, backup systems, and failover mechanisms to keep the system operational when unexpected events occur. |
eira:concept | eira:ArchitectureBuildingBlock |
eira:definitionSource | TOGAF |
eira:definitionSourceReference | https://pubs.opengroup.org/architecture/togaf9-doc/arch/chap03.html |
skos:note | According to the TOGAF specification, fault tolerance is a non-functional requirement that describes the ability of a system to continue operating when a failure or error occurs. The system should detect faults and errors and recover from them without disrupting its overall performance or availability.
To achieve fault tolerance, the system should be designed with redundancy and failover mechanisms that automatically switch to backup components or systems when a failure occurs. This can include redundant hardware components, such as servers, storage devices, and network connections, as well as software mechanisms, such as load balancing, clustering, and replication.
In addition, the system should detect faults and errors in real time, using monitoring and alerting mechanisms that notify system administrators or operators of any issues. This can include automated monitoring tools that detect performance issues, security breaches, or other problems that affect the system's availability or performance.
Overall, fault tolerance is a critical non-functional requirement for any system that requires high availability and reliability, such as mission-critical applications, financial systems, or healthcare systems. Designing systems with fault tolerance in mind helps organizations keep them operating through unexpected failures or errors, which minimizes downtime and supports business continuity. |
eira:PURI | http://data.europa.eu/dr8/FaultToleranceRequirement |
dct:type | eira:FaultToleranceRequirement |
eira:view | Digital Solution Non-Functional Requirements Catalogue view |
eira:eifLayer | N/A |
skos:broader | http://data.europa.eu/dr8/DigitalSolutionNonFunctionalRequirementRequirement |