The terms “fail open”, “fail close”, “fail safe”, and “failover” describe how a system behaves when it encounters a failure or unexpected event. Choosing between them is a trade-off between maintaining availability and ensuring security, and the right choice depends on the requirements of the specific context.
Fail Open
A system that fails open will continue to allow access or operations to proceed even if it encounters a failure.
This preserves availability: users can keep accessing the system or network resources, which minimizes downtime. It is also a security risk, because allowing all access during a failure can expose the system to unauthorized access and attacks (e.g. a firewall that stops working and lets all traffic through).
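As a minimal sketch of the fail-open behavior described above, the following Python snippet wraps an access check so that a failure of the check itself results in the request being allowed. The names `PolicyServiceDown`, `is_allowed_fail_open`, and `broken_check` are hypothetical and exist only for this illustration.

```python
class PolicyServiceDown(Exception):
    """Hypothetical error: the access check itself could not be completed."""


def is_allowed_fail_open(request, check_policy):
    """Return the policy decision, or allow the request if the check fails."""
    try:
        return check_policy(request)
    except PolicyServiceDown:
        # Fail open: the check is unavailable, so the request is let through.
        return True


def broken_check(request):
    """Simulates a policy service that is down."""
    raise PolicyServiceDown("policy service unreachable")


print(is_allowed_fail_open("GET /reports", broken_check))  # True
```

Note that a working check that returns `False` is still honored; only a failure of the check itself opens access.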
Fail Close
A system that fails closed (also written “fail close”) will deny access or halt operations if it encounters a failure.
This improves security: no unauthorized access can occur during a failure, which protects sensitive data and resources and prevents the system from being exploited while it is vulnerable. It also puts availability at risk, because it can lead to significant downtime and loss of access, which might be unacceptable in certain scenarios (e.g. a firewall that blocks all traffic when it fails).
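The firewall example can be sketched in Python as follows, assuming a hypothetical rule engine that may raise an error. The names `RuleEngineError`, `forward_packet_fail_closed`, and `crashed_engine` are illustrative assumptions, not part of any real firewall API.

```python
class RuleEngineError(Exception):
    """Hypothetical error: the rule engine could not evaluate the packet."""


def forward_packet_fail_closed(packet, evaluate_rules):
    """Forward the packet only if the rules explicitly allow it."""
    try:
        return evaluate_rules(packet)
    except RuleEngineError:
        # Fail closed: without a decision, the packet is dropped.
        return False


def crashed_engine(packet):
    """Simulates a rule engine that has failed."""
    raise RuleEngineError("rule engine crashed")


print(forward_packet_fail_closed({"port": 443}, crashed_engine))  # False
```

The only difference from a fail-open design is the value returned in the `except` branch, which is why the choice is often a single, easily overlooked line of code.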
Fail Safe
A system that fails safe is designed to default to a safe condition when something fails. The primary goal is to prevent harm and minimize damage.
The system switches to a safe state during a failure, which increases safety but might stop operations temporarily (e.g. if a payment transaction fails due to network issues, the system automatically cancels the transaction and refunds any charges).
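A rough Python sketch of the payment example: the charge is applied provisionally and rolled back if the network call fails, so the account returns to its known-safe state. `Account`, `NetworkError`, `pay`, and `unreachable_processor` are hypothetical names, and amounts are plain integers (e.g. cents) for simplicity.

```python
class NetworkError(Exception):
    """Hypothetical error: the payment processor could not be reached."""


class Account:
    def __init__(self, balance):
        self.balance = balance


def pay(account, amount, send_to_processor):
    """Charge the account; on a network failure, cancel and refund."""
    account.balance -= amount  # provisional charge
    try:
        send_to_processor(amount)
    except NetworkError:
        # Fail safe: restore the balance so no money is lost.
        account.balance += amount
        return "cancelled"
    return "completed"


def unreachable_processor(amount):
    """Simulates a network outage during the transaction."""
    raise NetworkError("connection timed out")


account = Account(balance=100)
print(pay(account, 30, unreachable_processor))  # cancelled
print(account.balance)  # 100
```

The transaction does not complete, which is the temporary loss of operation mentioned above, but the account is never left in a half-charged state.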
Failover
A system with failover switches to a backup system when the primary one fails, ensuring continuity of service with minimal disruption.
This relies on redundancy: backup systems stand ready to take over, which gives the services high availability and supports business continuity. It can be complex and costly, since it requires additional resources and management, and the switch to the backup may cause a brief delay (e.g. in a data center, if the primary server fails, a secondary server automatically takes over to keep services running).
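The data center example could look like this in Python, under the assumption that each server is modeled as a callable that either returns a response or raises an error. `ServerDown`, `call_with_failover`, `primary`, and `secondary` are hypothetical names used only for this sketch.

```python
class ServerDown(Exception):
    """Hypothetical error: a server could not handle the request."""


def call_with_failover(servers, request):
    """Try each server in order and return the first successful response."""
    last_error = None
    for server in servers:
        try:
            return server(request)
        except ServerDown as exc:
            # This server failed; fall through to the next backup.
            last_error = exc
    raise ServerDown("all servers failed") from last_error


def primary(request):
    """Simulates a primary server that is offline."""
    raise ServerDown("primary offline")


def secondary(request):
    """Simulates a healthy backup server."""
    return f"secondary handled {request}"


print(call_with_failover([primary, secondary], "ping"))  # secondary handled ping
```

Trying the primary first and waiting for it to fail is the source of the brief delay noted above; real deployments typically add health checks so that a known-dead primary is skipped.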