Dropdown items
My Academies

Personal Library

Account settings

Cyber Hard Problems: Focused Steps Toward a Resilient Digital Future (2025)

Chapter: 2 Key Considerations for Cyber Resiliency

Visit NAP.edu/10766 to get more information about this book, to buy it in print, or to download it as a free PDF.

Previous chapter Next chapter
Page of 122
Search this publication

Page 19 Cite Bookmark

Suggested Citation: "2 Key Considerations for Cyber Resiliency." National Academies of Sciences, Engineering, and Medicine. 2025. Cyber Hard Problems: Focused Steps Toward a Resilient Digital Future. Washington, DC: The National Academies Press. doi: 10.17226/29056.

2
Key Considerations for Cyber Resiliency

Difficulties in maintaining a resilient cyber ecosystem are hardly ever caused by single-dimensional issues. There are fundamental considerations that existed at the inception of computing technologies and persisted as they grew into the cyber ecosystem of today. These considerations are impossible to capture accurately because they cannot be bound; yet, it is important to understand them in order to have a comprehensive view of why cyber hard problems are persistently hard. In this section, the committee analyzes the overarching considerations as well as two key standalone drivers—engineering of resilient cyber systems and system complexity.

OVERARCHING CONSIDERATIONS

There are overarching considerations that affect each and every cyber hard problem described in this report. Many of their adverse effects were evident and relevant in 1995 and 2005, but new considerations have emerged or have been exacerbated due to scale, globalization, policy, and new technology. With increased usage, many technology problems are affected by human, cultural, political, or economic problems. These important influences were, perhaps, less apparent in earlier lists but are undeniable.

These considerations include the following:

Computing technology is affected by global-scale economics. Scale affects personal data whose shared—and often uncontrolled—use is vast, and whose

Page 20 Cite Bookmark

impacts on privacy are increasing, sometimes in unexpected ways. Financial incentives for collecting personal information and broadly sharing, selling, and repurposing the information are large and growing.
The vastly increased complexity and interconnection of cyber systems makes it very difficult (and expensive) to implement resilient designs to prevent cyber failures. The complexity is not only due to the increased size and broad deployment of cyber systems but also to the entanglement and integration of disparate stakeholders, data providers, and technology providers within the scope of a single solution ecosystem. Modern services and apps rely on global multi-party ecosystems with interacting policies (e.g., privacy policies) that operate over many jurisdictions. This scaling and interconnection amplify the need to develop architectures that are resilient to compromises and failures in individual system elements.
Security metrics sufficient to predict or verify the security properties of a cyber system are non-existent. In the absence of metrics based on measurable external parameters of systems, risk assessment has often fallen to the level of penetration testing because establishing the correct context may be difficult. However, there are far too few skillful “pen testers,” and the quality of a test too often seems a measure of the skill and luck of the tester. Systems are so large that complete coverage can be too time consuming (and therefore expensive) to be practical. Additionally, many security attributes are contextual—so not amenable to traditional testing—but instead require analysis and inspection, including of design models.
The opacity and diversity of components in a single cyber system makes the challenges in risk assessment even more difficult. Dependencies among subsystems in a large system may create never-been-tested timing dependencies that only become evident when a disaster forces the entire collection of systems to be restarted for the first time in its life.
The increasingly complex cyber-physical systems (CPS) and their central role underpinning critical functions mean that such systems are harder to defend, and compromises can cause large, immediate real-world damage. Integration of sensors and actuators broaden the important surface area that must be assured for CPS, such as critical infrastructure, as well for other CPS that can cause real-world damage. As an example, sensors often have their values checked only periodically for exceptional conditions. Local attacks can be timed to cause irreversible damage before the next time a value is checked.
Complicated human–systems interaction and societal entanglement can confuse users, even in basic interactions, but follow-on consequences can be subtle

Page 21 Cite Bookmark

and important. A significant challenge for designers is that users—in a hurry to accomplish specific tasks or no longer vigilant against rare anomalies—cannot always be counted on to attend to security concerns.
The emergence of artificial intelligence (AI) (and statistical models) can increase capability; however, due to the inherent weaknesses and consequent vulnerabilities of neuro-network models, it is extremely difficult (in some cases, impossible) to validate decisions made by AI-enabled systems.
The emergence of new concentrated targets such as clouds, popular software systems, and mobile devices, whose failure could have extremely broad adverse effects and whose complexity and opacity defy principled risk evaluation and principled risk management.

CONSIDERATIONS FOR ENGINEERING RESILIENT CYBER SYSTEMS

There are fundamentally only two operative methodologies to assure resilience of a cyber system. However, both require a complete understanding of the behavior of the entire engineered cyber system.

A guarantee that a delivered system will operate in accordance with a well-tailored specification of security-relevant quality behavior, so that an outside observer can be assured that the operations for which the system is employed, if they meet the specification, are safe. Generally, cyber systems are not warranted to operate in accordance with an enforceable behavior; in fact, cyber systems usually come equipped with extensive liability disclaimers protecting the supplier from claims about any but the most obvious, egregious failures and even then, providing incomplete remedies.
The development of an accurate risk assessment of every component of a cyber system as well as their interactions. This requires a comprehensive understanding of the cyber system and the components and services that it encompasses, the environment in which it operates, its users, as well as the principles according to which they are composed into the overall system.

Complete understanding of cyber systems is complicated by the fact that modern cyber systems are among the most complex artifacts created by humans, and current cyber systems are literally millions of times more complex than their predecessors 20 years ago, and too complex to model completely. Engineered systems depend, for predictability, on a profound understanding and specification of component subsystems, extensive

Page 22 Cite Bookmark

testing to guarantee conformance with specifications, and complete understanding of the interaction of these subsystems (as well as their human users) to assure safety or, at a minimum, provide actionable, concrete, specific alerts when things have malfunctioned or are about to malfunction. Systems engineered for critical applications also rely on resilience, which is often provided by redundant independent functioning of sub-functions.

Cyber engineering as a discipline offers a few common ways to deal with complexity.

One mechanism is the isolation of each subsystem coupled with extensive interface specification, often called an application program interface, as well as a detailed understanding of the interaction of that subsystem with other subsystems. Indeed, most processors and platforms also incorporate isolation (enclave) technology to partition specific critical software subsystems from other software running on that platform. This is related to the long-standing software engineering principle of information hiding—and the hardware design principle of process separation. These systems also provide mechanisms for storing secrets for that software and allowing that software to cryptographically authenticate itself when interacting with other isolated components.

A modern implementation of this paradigm goes by the rubric confidential computing and incorporates cryptographic hardware roots of trust. These have already been applied to protect limited high-risk data such as storage and use of cryptographic keys or sensitive communications (e.g., the Signal messaging app). However, this level of partitioning and isolation is a rare exception in most cyber systems. Also, such confidentiality mechanisms are not without trade-offs; for example, an attack on the protocols can compromise availability (secrets cannot be decrypted) or integrity (the secrets can be corrupted).

Furthermore, security dogma insists that people and subsystems should exercise well-identified “segregated duties” and any action taken by such an (isolated) subsystem should enforce the “principle of least privilege,”¹ meaning that the segregated subsystem should only take actions when the requesting principal (a software subsystem or a person) has been authenticated and the action conforms with the access control policies for that action, and entities should only have the access to do their jobs and no more. For example, a bank deposit should only be executed if the depositor to an account has been authenticated, and that depositor might not have the right to do transfers or withdrawals.²

However, some cybersecurity practices have visibly improved. Both software and hardware development systems (e.g., GitHub) now include mechanisms to track

___________________

¹ J.H. Saltzer and M.D. Schroeder, 1975, “The Protection of Information in Computer Systems,” Proceedings of the IEEE 63(9):1278–1308.

² This is a consequence of know your customer (KYC) rules developed to make money laundering more difficult.

Page 23 Cite Bookmark

provenance of contributions. However, more is needed to ensure the integrity of design and program information—the records need to be immutable, and few systems are designed to keep such a rigorous and robust audit trail.

Modern practice also provides for automated testing of components. Recent progress suggests that a comprehensively curated repository with strong provisioning verification, extensive automated tests, and change tracking dramatically shortens the development time, increases reliability, facilitates understanding, and helps rapid reliable upgrades, including upgrades to fix vulnerabilities. Additionally, many actions in support of secure development can also have the effect of enhancing engineering productivity, both in original development and in evolution. However, cyber systems developers may fail to test or verify the majority of functionality.

Many design systems incorporate automated tools to either satisfy compliance requirements or check for identified flaws. These include type-safe languages and associated compilers, automated design simulation and test, static analysis, fuzzing tools, conventional glitch analysis for hardware, and many other tools. Software systems are often developed with type-safe languages and specification-generated internal checks (e.g., for software fault isolation and control flow integrity) to prevent the exploitation of residual software flaws.

Modern complex systems incorporate extensive monitoring and logging as well as automated tools to analyze collected data to spot problems.

The state of formal verification has advanced and can provide principled protection. Historically, the protection is usually complete only for relatively small systems, but there is a growing population of larger-scale examples.³ More often, the focus is on more narrow critical properties, in which case greater degrees of scaling can be achieved. An example of this is type-safety in languages, such as Java, Rust, and Typescript, where critical properties “come for free” and, additionally, developers may not even be aware of the security benefits that come along with the increased productivity.

Cryptography has provided a firm underpinning for communication security.

These techniques offer important “partial” solutions to the cross-cutting resilient design problem. However, challenges remain.

Side-channel attacks are very difficult to identify and often impossible to avoid. Side channels arise from unspecified, but observable, implementation characteristics (like power consumption or timing) that inadvertently disclose information that was thought to be protected.

___________________

³ B. Cook, 2024, “An Unexpected Discovery: Automated Reasoning Often Makes Systems More Efficient and Easier to Maintain,” Amazon Web Services Security Blog, October 17, https://aws.amazon.com/blogs/security/an-unexpected-discovery-automated-reasoning-often-makes-systems-more-efficient-and-easier-to-maintain.

Page 24 Cite Bookmark

Isolation and partitioning can offer protection for a relatively small system with well understood and carefully designed components but does not provide complete protection for complex systems with only partially understood or cursorily verified operations. Strict adherence to security-focused architecture designs can avert problems, but the required rigor is seldom employed.
Systems are often built in an ad hoc way from many existing and widely used subsystems, which are themselves poorly understood. The most visible example is the incorporation of a vast library of open-source components. The web-focused open-source NPM (node package manager) library, for example, has more than 3 million packages available to web developers, along with various package management capabilities. Another common example is that, in the interest of economy, systems are frequently assembled using legacy subsystems that were not even built with then existing security standards.
The way software contributions enter open-source projects varies widely. Attempts using social engineering to add flawed, exploitable code to a widely used open-source code brought additional attention to the difficulties of assurance in open-source projects.⁴
Even well-designed systems depend entirely on operational configuration and policy management to operate properly; these systems and procedures are themselves complex and often error prone, and especially difficult to debug.
Finally, modern cyber systems are heterogeneous, and a complete analysis relies on understanding the design and function of hardware, software, sensors and actuators, communications infrastructure, a bit of mathematics, and any relied-on external services.

Process assurance, including compliance regimes that include third-party verification of identified properties, can help a little but is usually an inadequate substitute for well-informed, engineered resilience based on direct modeling, analysis, and evidence. Process assurance is also often influenced by powerful stakeholders who can help shape rulemaking, placing small providers at a disadvantage in security perception and standards-based acquisition.

As a result, well-designed complex cyber systems often employ active remediation like rapid patching and reliable verifiable recovery as a resilient design crutch to compensate for future discovered flaws.

___________________

⁴ S. Sabin, 2024, “Open-Source Developers Face a Potential Social-Engineering Crisis,” Axios, April 19, https://www.axios.com/2024/04/19/open-source-software-social-engineering-hacks.

Page 25 Cite Bookmark

CONSIDERATIONS FOR COMPLEXITY

The previous section analyzed the resilience of small- or medium-scale cyber systems designed by a single supplier. However, the complexity of modern cyber systems, including having been integrated from multiple producers, introduces even greater challenges, including the following:

Verification of the function and qualities of acquired subsystem components, including open source.
The global hardware supply chain, which includes widely sourced complex components that can include deliberately inserted vulnerabilities.
The technical, economic, political, and cultural dynamics that make it difficult, if not impossible, for individual organizations to secure themselves effectively when they are dependent on diverse, complex integrated systems.
Relocated risk associated with deploying systems in one of a few large cloud providers whose infrastructure, operations, and even legal jurisdiction is undisclosed. Although most cloud providers offer safer infrastructure than less well-trained and well-practiced users provide for themselves—and they have strong business incentives to deliver high levels of security—they can still be single points of failure that become focused targets of malign nation states and rogue law enforcement and are well shielded from consequential liability by contract and law.
The sheer effect of scale that provides capability built on vast compositions of and interdependencies among complex systems.
The lack of clear metrics, monitoring, and operational procedures to accurately reflect residual risk.
Resilience and recovery (returning to a known-good state or, more precisely, to a well-understood state) from catastrophes is greatly complicated by the complexity and scale of modern cyber systems. This includes data and system configuration as well as hardware and software components. This is partly a technical problem but also part of the operational challenge discussed above.
Automated audit and forensics to find sources and effects of compromise are not widely available or widely used.
Development of standards for infrastructure that enable safety and assessment is not prescriptive enough to determine liability for a failure.
Data provenance integrity is seldom well implemented.
Protection of critical data (like PII) is not reliable against attacks from skillful threats, foreign and domestic, outsiders and insiders. Consider the attacks

Page 26 Cite Bookmark

against the U.S. Office of Personnel Management^5,6 and on unpatched vulnerable infrastructure particularly intended for legal interception of telephone calls at multiple telephone carriers internationally (“Salt Typhoon” or “RedMike”).⁷

Complexity is increasingly a factor in limiting the extent to which it is possible to understand a cyber system or CPS. However, proprietary barriers also prevent examination of code, design, hardware, and operations.

Many products have been admirably assessed by normal “market” incentives and mechanism, not cyber. Cyber is not unique in experiencing market failures that prevent reasonable risk assessment and effective regulatory oversight. Indeed, there is an extensive study of this in the used-car market under the “lemon law doctrine” of economics.⁸ Information asymmetry in the cyber-market means users and regulators cannot effectively identify, reward, or punish better or worse products or services. The result is that “good” cyber products may only sell for the same price as “bad” products because the usual market price discrimination incentives are ineffective. Correspondingly, information asymmetry in the cybersecurity market leaves little competitive pressure for security, and good security products sell for the same price as bad security products.

Absent effective liability protection, which is usually precluded by license disclaimers, information asymmetry effectively bars a principled determination of end-customer risks. Investment in cybersecurity by technology and service providers suffers when security benefits, which are obscured and difficult to measure, are scored against new revenue that can be obtained by equivalent investment in new capabilities. As a result, end-customers have no effective way to value better security, and providers are both unable to compete, pre-sale, on the basis of the security they offer, and they are at risk, post-sale, from unquantifiable losses because of liability protections. This situation is further complicated by the presence of adaptive adversaries whose own research investments in new attacks can serve to devalue past investments in defense.

Attacks on most engineered systems in the past were mounted on targets that were specific, well identified, and were motivated by comprehensible, predictable risk–reward assessments by attackers. They also required proximity, physical access, and specific tooling, so the risk of detection and punishment was significant. Many such attacks

___________________

⁵ Committee on Oversight and Government Reform, 2016, “The OPM Data Breach: How the Government Jeopardized Our National Security for More Than a Generation,” https://oversight.house.gov/report/opm-data-breach-government-jeopardized-national-security-generation.

⁶ N. Narea, 2025, “Elon Musk’s Secretive Government IT Takeover, Explained,” Vox, February 5, https://www.vox.com/politics/398366/musk-doge-treasury-sba-opm-budget.

⁷ Insikt Group, 2025, “RedMike (Salt Typhoon) Exploits Vulnerable Cisco Devices of Global Telecommunications Providers,” Recorded Future, February 13, https://go.recordedfuture.com/hubfs/reports/cta-cn-2025-0213.pdf.

⁸ G.A. Akerlof, 1970, “The Market for ‘Lemons’: Quality Uncertainty and the Market Mechanism,” The Quarterly Journal of Economics 84(3):488–500, https://doi.org/10.2307/1879431.

Page 27 Cite Bookmark

could be thwarted by simple physical protection of high-value targets. None of these circumstantial mitigations are commonly helpful in preventing cyberattacks, especially with the proliferation of systems that are reachable through radio signals, or through vulnerable WiFi access points belonging to a remotely vulnerable close-by entity (sometimes described as a “near-neighbor attack”).⁹

Attackers are often shielded from scrutiny because of the expense of investigations and, sometimes unachievable, standards of proof preventing or limiting attribution or sanction. Moreover, the existence of active attacks over a network by entities (including nation states) throughout the world makes deterrence based on legal mechanisms largely ineffective. The attackers can also plant “false flags” via use of certain tools and techniques that point investigators to a different known threat actor to avoid attribution.

A further complication is the diversity of attack surfaces—the infection vectors through which attackers and their tools engage a system. Modern AI systems, for example, can be attacked during training, while in operation, and in their delivery for use as a component in larger AI-based systems. In addition to diversity, there is also the issue of scalability of attacks. Consider, for example, the possibility of a software update that includes an adverse payload but is correctly signed by the vendor due to the theft or abuse of the code signing certificate. The scale of delivery could, for example, affect an entire fleet of vehicles or embedded CPS devices used across an entire sector.

These difficulties play a starring role in all the identified hard problems.

___________________

⁹ S. Koessel, S. Adair, and T. Lancaster, 2024, “The Nearest Neighbor Attack: How a Russian APT Weaponized Nearby Wi-Fi Networks for Covert Access,” Volexity, November 22, https://www.volexity.com/blog/2024/11/22/the-nearest-neighbor-attack-how-a-russian-apt-weaponized-nearby-wi-fi-networks-for-covert-access.