Resiliency

Automation / Scripting

  • Reduces risk through repeatable processes and automated courses of action.

  • Leverages monitors and sensors for continuous monitoring.

  • Configuration Validation - Uses automation and scripting to ensure that new equipment has the same settings, applications, and drivers as existing equipment.

  • Operating system scripting languages:

    • Linux Shells: Bash, Ksh

    • Windows: PowerShell
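
As a rough illustration of configuration validation, the sketch below compares a new machine's settings against a baseline and reports any drift. The setting names and values are hypothetical, and a real check would pull them from the systems themselves rather than from hard-coded dictionaries.

```python
def find_config_drift(baseline: dict, candidate: dict) -> dict:
    """Return {setting: (expected, actual)} for every setting that differs.

    A setting missing from one side is reported with None on that side.
    """
    drift = {}
    for key in baseline.keys() | candidate.keys():
        expected = baseline.get(key)
        actual = candidate.get(key)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


# Hypothetical baseline taken from existing, known-good equipment.
baseline = {"firewall": "enabled", "ntp_server": "10.0.0.1", "nic_driver": "2.4.1"}
# Hypothetical settings collected from a newly deployed machine.
new_host = {"firewall": "disabled", "ntp_server": "10.0.0.1"}

drift = find_config_drift(baseline, new_host)
for setting in sorted(drift):
    expected, actual = drift[setting]
    print(f"{setting}: expected {expected!r}, found {actual!r}")
```

The same comparison could be written in Bash or PowerShell. Python is used here only to keep the examples in one language.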

Master Image

  • AKA 'Gold' Image.

  • A model OS installation verified as 'clean'.

  • Used for system restores.

  • Needs to be secured.
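
One common way to help secure a master image is to record a cryptographic hash when the image is verified and recheck it before each restore. The sketch below assumes SHA-256 and a digest stored somewhere trustworthy. The function names are hypothetical, and a small temporary file stands in for the image.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large images are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_is_unmodified(path, recorded_digest):
    """Compare the current hash with the digest recorded when the image was verified."""
    return hmac.compare_digest(sha256_of_file(path), recorded_digest)


# Demonstration with a stand-in file (a real master image would be far larger).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pretend this is a gold image")
    image_path = tmp.name

recorded = sha256_of_file(image_path)                 # taken at verification time
still_clean = image_is_unmodified(image_path, recorded)

with open(image_path, "ab") as f:                     # simulate tampering
    f.write(b"!")
after_tamper = image_is_unmodified(image_path, recorded)

os.remove(image_path)
print(still_clean, after_tamper)
```

A hash check only detects changes. Access control and protected storage for both the image and the recorded digest are still needed.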

Managing Cloud Risk

  • Nonpersistence - Temporary system images that are discarded after use and recreated from a snapshot of a known-good state.

  • Elasticity / Scalability - Adjusting resources as needed.

  • High Availability (HA) - The measures, such as redundancy, failover, and mirroring, used to keep services and systems operational.

  • Redundancy - Replicating systems, usually at multiple sites, so that a duplicate can take over when the primary fails (failover).

  • Distributive Allocation / Load Balancing - Distributing burden across multiple systems.
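
To make load balancing concrete, here is a minimal round-robin sketch, one of several possible distribution strategies. The class name and server names are hypothetical. Production load balancers also track server health and current load, which this sketch omits.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation so requests are spread evenly."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def next_server(self):
        """Return the server that should receive the next request."""
        return next(self._rotation)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.next_server() for _ in range(5)]
print(assignments)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```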

Fault Tolerance

  • The ability of a system to sustain operations in the event of a component failure.

  • Two key components: spare parts and electrical power.

  • Power Protection:

    • Surge Protection

    • Uninterruptible Power Supply (UPS)

    • Backup Power / Generators

RAID Storage

  • Redundant Array of Inexpensive (or Independent) Disks

  • Focuses on availability of data.

  • RAID Types

    • 0 - Disk Striping (improves performance; provides no redundancy)

    • 1 - Disk Mirroring

    • 3 - Disk Striping with a Dedicated Parity Disk

    • 5 - Disk Striping with Distributed Parity (parity spread across all disks)
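
Parity in RAID 3 and RAID 5 is based on XOR: the parity block is the XOR of the data blocks in a stripe, so any single lost block can be rebuilt from the survivors. The sketch below shows that idea on tiny in-memory byte strings. The block contents and function name are illustrative only, and real arrays do this at the controller or driver level.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data disks (illustrative contents).
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the second disk, then rebuild it from the survivors plus parity.
surviving = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(surviving)

print(rebuilt == data_blocks[1])  # True
```

This works because XOR-ing a value with itself cancels out, leaving only the missing block. It also shows why a single parity block cannot recover from two simultaneous disk failures.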