Difficulties in maintaining a resilient cyber ecosystem are hardly ever caused by single-dimensional issues. There are fundamental considerations that existed at the inception of computing technologies and have persisted as those technologies grew into the cyber ecosystem of today. These considerations are impossible to capture precisely because they cannot be bounded; yet it is important to understand them in order to have a comprehensive view of why cyber hard problems are persistently hard. In this section, the committee analyzes these overarching considerations as well as two key standalone drivers: the engineering of resilient cyber systems and system complexity.
There are overarching considerations that affect each of the cyber hard problems described in this report. Many of their adverse effects were already evident and relevant in 1995 and 2005, but new considerations have emerged or been exacerbated by scale, globalization, policy, and new technology. With increased usage, many technology problems are now entangled with human, cultural, political, and economic factors. These important influences were perhaps less apparent in earlier lists but are now undeniable.
These considerations include the following:
There are fundamentally only two operative methodologies to assure resilience of a cyber system. However, both require a complete understanding of the behavior of the entire engineered cyber system.
Complete understanding of cyber systems is complicated by the fact that modern cyber systems are among the most complex artifacts created by humans: current cyber systems are literally millions of times more complex than their predecessors of 20 years ago and are too complex to model completely. Engineered systems depend, for predictability, on a profound understanding and specification of component subsystems; on extensive testing to guarantee conformance with those specifications; and on a complete understanding of the interactions of the subsystems (as well as of their human users), so as to assure safety or, at a minimum, to provide actionable, concrete, specific alerts when things have malfunctioned or are about to malfunction. Systems engineered for critical applications also rely on resilience, which is often provided by redundant, independent implementations of sub-functions.
Cyber engineering as a discipline offers a few common ways to deal with complexity.
One mechanism is the isolation of each subsystem, coupled with extensive interface specification (often called an application programming interface, API) and a detailed understanding of the interaction of that subsystem with other subsystems. Indeed, most processors and platforms also incorporate isolation (enclave) technology to partition specific critical software subsystems from other software running on the platform. This reflects the long-standing software engineering principle of information hiding and the hardware design principle of process separation. These platforms also provide mechanisms for storing secrets on behalf of isolated software and for allowing that software to authenticate itself cryptographically when interacting with other isolated components.
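As an illustrative sketch of the isolation principle (the class names and operation are hypothetical, and Python's convention-private attribute stands in for true hardware-enforced isolation), a narrow interface might hide a subsystem's secret state as follows:

    import hashlib
    import hmac
    from abc import ABC, abstractmethod

    class KeyService(ABC):
        """Narrow interface specification: callers see only these operations."""

        @abstractmethod
        def authenticate_message(self, message: bytes) -> bytes:
            ...

    class IsolatedKeyService(KeyService):
        """Information hiding: the secret never crosses the interface boundary."""

        def __init__(self, secret: bytes) -> None:
            self._secret = secret  # internal state, deliberately hidden from callers

        def authenticate_message(self, message: bytes) -> bytes:
            # The secret is used internally; only the authentication tag is returned.
            return hmac.new(self._secret, message, hashlib.sha256).digest()

Callers can obtain authentication tags but can never extract the key through the specified interface, which is the essence of information hiding.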
A modern hardware-supported implementation of this isolation paradigm goes by the rubric of "confidential computing" and incorporates cryptographic hardware roots of trust. It has already been applied to protect limited classes of high-risk data, such as the storage and use of cryptographic keys or sensitive communications (e.g., the Signal messaging app). However, this level of partitioning and isolation remains a rare exception in most cyber systems. Nor are such confidentiality mechanisms without trade-offs; for example, an attack on the protocols can compromise availability (secrets cannot be decrypted) or integrity (secrets can be corrupted).
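The authentication step can be sketched as a challenge-response exchange. The fragment below is a toy illustration only: it assumes a shared provisioned secret, whereas real confidential-computing attestation rests on hardware-rooted keys and certificates.

    import hashlib
    import hmac
    import secrets

    # Assumption for the sketch: both components were provisioned with the same
    # secret by a trusted party; real systems use hardware roots of trust.
    PROVISIONED_SECRET = secrets.token_bytes(32)

    def respond(challenge: bytes, secret: bytes = PROVISIONED_SECRET) -> bytes:
        """The isolated component proves possession of its secret."""
        return hmac.new(secret, challenge, hashlib.sha256).digest()

    def verify(challenge: bytes, response: bytes,
               secret: bytes = PROVISIONED_SECRET) -> bool:
        """The peer checks the response; the secret never crosses the channel."""
        expected = hmac.new(secret, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)

    challenge = secrets.token_bytes(16)  # a fresh nonce defeats replay attacks
    assert verify(challenge, respond(challenge))

Note how an attack on availability maps onto this sketch directly: if either side's copy of the secret is destroyed or corrupted, verification fails and the components can no longer interoperate.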
Furthermore, security dogma insists that people and subsystems exercise well-identified "segregated duties" and that any action taken by such an (isolated) subsystem enforce the "principle of least privilege."1 That is, the segregated subsystem should take an action only when the requesting principal (a software subsystem or a person) has been authenticated and the action conforms with the applicable access control policies; entities should have only the access needed to do their jobs and no more. For example, a bank deposit should be executed only if the depositor to an account has been authenticated, and that depositor might not have the right to make transfers or withdrawals.2
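The bank example can be made concrete with a minimal, hypothetical policy check (the roles, actions, and policy table are invented for illustration):

    # Hypothetical least-privilege policy: each authenticated principal is
    # granted only the actions its role requires, and nothing more.
    POLICY: dict[str, set[str]] = {
        "depositor": {"deposit"},  # may not transfer or withdraw
        "account_owner": {"deposit", "withdraw", "transfer"},
    }

    def authorize(role: str, action: str, authenticated: bool) -> bool:
        """Allow an action only if the principal is authenticated and the
        action appears in that role's deliberately minimal grant set."""
        return authenticated and action in POLICY.get(role, set())

    assert authorize("depositor", "deposit", authenticated=True)
    assert not authorize("depositor", "withdraw", authenticated=True)  # least privilege
    assert not authorize("depositor", "deposit", authenticated=False)  # must authenticate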
However, some cybersecurity practices have visibly improved. Both software and hardware development systems (e.g., GitHub) now include mechanisms to track the provenance of contributions.
___________________
1 J.H. Saltzer and M.D. Schroeder, 1975, “The Protection of Information in Computer Systems,” Proceedings of the IEEE 63(9):1278–1308.
2 This is a consequence of know your customer (KYC) rules developed to make money laundering more difficult.
Even so, more is needed to ensure the integrity of design and program information: the records need to be immutable, and few systems are designed to keep such a rigorous and robust audit trail.
Modern practice also provides for automated testing of components. Recent experience suggests that a comprehensively curated repository with strong provenance verification, extensive automated tests, and change tracking dramatically shortens development time, increases reliability, facilitates understanding, and enables rapid, reliable upgrades, including upgrades that fix vulnerabilities. Many actions taken in support of secure development also enhance engineering productivity, both in original development and in evolution. In practice, however, cyber system developers may fail to test or verify the majority of a system's functionality.
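A minimal example of the kind of automated conformance test that such a repository would run on every change (the component and its specification are hypothetical):

    import unittest

    def saturating_add(a: int, b: int, limit: int = 255) -> int:
        """Hypothetical component under test: add two values, clamping at `limit`."""
        return min(a + b, limit)

    class ConformanceTest(unittest.TestCase):
        """Checks the component against its written specification on every commit."""

        def test_within_range(self):
            self.assertEqual(saturating_add(10, 20), 30)

        def test_saturates_at_limit(self):
            self.assertEqual(saturating_add(250, 20), 255)

    if __name__ == "__main__":
        unittest.main()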
Many design systems incorporate automated tools to either satisfy compliance requirements or check for identified flaws. These include type-safe languages and associated compilers, automated design simulation and test, static analysis, fuzzing tools, conventional glitch analysis for hardware, and many other tools. Software systems are often developed with type-safe languages and specification-generated internal checks (e.g., for software fault isolation and control flow integrity) to prevent the exploitation of residual software flaws.
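Fuzzing, one of the tools just listed, can be reduced to its essence in a few lines: feed a component large numbers of pseudo-random inputs and check that its stated contract is never violated. The parser below is hypothetical.

    import random

    def parse_length_prefixed(data: bytes) -> bytes | None:
        """Hypothetical parser: first byte is a length, the rest is the payload.
        Contract: return the payload or None; never raise."""
        if not data:
            return None
        n = data[0]
        if len(data) - 1 < n:
            return None  # truncated input is rejected, not an error
        return data[1:1 + n]

    random.seed(0)  # reproducible fuzzing run
    for _ in range(10_000):
        blob = random.randbytes(random.randint(0, 64))
        result = parse_length_prefixed(blob)  # must never raise on any input
        assert result is None or len(result) <= 255

Production fuzzers add coverage guidance and input mutation, but the underlying check is the same: residual flaws surface as contract violations on inputs no human tester would have written.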
Modern complex systems incorporate extensive monitoring and logging as well as automated tools to analyze collected data to spot problems.
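In miniature, such monitoring and automated analysis might look like the following sketch (the event names and alert threshold are assumptions made for illustration):

    import logging
    from collections import Counter

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("monitor")

    FAILED_LOGIN_THRESHOLD = 5  # assumed alerting threshold
    failures: Counter[str] = Counter()

    def record_event(source: str, event: str) -> None:
        """Log every event; raise a concrete, actionable alert on anomalies."""
        log.info("event=%s source=%s", event, source)
        if event == "login_failure":
            failures[source] += 1
            if failures[source] >= FAILED_LOGIN_THRESHOLD:
                log.warning("ALERT: %d failed logins from %s",
                            failures[source], source)

    for _ in range(6):
        record_event("10.0.0.7", "login_failure")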
The state of formal verification has advanced and can provide principled protection. Historically, such protection has been complete only for relatively small systems, but there is a growing population of larger-scale examples.3 More often, the focus is on narrower critical properties, in which case greater degrees of scaling can be achieved. An example is type safety in languages such as Java, Rust, and TypeScript, where critical properties "come for free" and developers may not even be aware of the security benefits that accompany the increased productivity.
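Real formal verification relies on proof tools rather than enumeration, but the difference from testing can be illustrated on a toy finite domain, where a property can be established for every input rather than for a sample of them:

    # Exhaustive check over a finite domain: unlike testing, which samples
    # inputs, this establishes the property for *every* case in the domain.
    def saturating_add(a: int, b: int, limit: int = 255) -> int:
        return min(a + b, limit)

    assert all(
        0 <= saturating_add(a, b) <= 255
        for a in range(256)
        for b in range(256)
    )  # 65,536 cases: a toy proof by exhaustion of the bounds property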
Cryptography has provided a firm underpinning for communication security.
These techniques offer important “partial” solutions to the cross-cutting resilient design problem. However, challenges remain.
___________________
3 B. Cook, 2024, “An Unexpected Discovery: Automated Reasoning Often Makes Systems More Efficient and Easier to Maintain,” Amazon Web Services Security Blog, October 17, https://aws.amazon.com/blogs/security/an-unexpected-discovery-automated-reasoning-often-makes-systems-more-efficient-and-easier-to-maintain.
Process assurance, including compliance regimes that include third-party verification of identified properties, can help a little but is usually an inadequate substitute for well-informed, engineered resilience based on direct modeling, analysis, and evidence. Process assurance is also often influenced by powerful stakeholders who can help shape rulemaking, placing small providers at a disadvantage in security perception and standards-based acquisition.
As a result, well-designed complex cyber systems often employ active remediation, such as rapid patching and reliable, verifiable recovery, as a resilient-design crutch to compensate for flaws discovered after deployment.
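Reliable, verifiable recovery presupposes a way to check that what is being reinstalled is the intended artifact. A minimal integrity check against a known-good manifest might look like the following sketch (the file name and digest are placeholders):

    import hashlib

    # Known-good digests, assumed to be distributed out of band and kept
    # immutable. The entry below is a placeholder, not a real artifact's digest.
    MANIFEST = {
        "service.bin": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
    }

    def verify_artifact(name: str, payload: bytes) -> bool:
        """Accept a patch or recovery image only if its digest matches the manifest."""
        expected = MANIFEST.get(name)
        return expected is not None and hashlib.sha256(payload).hexdigest() == expected

    # A corrupted or tampered image is rejected before it can be installed.
    assert not verify_artifact("service.bin", b"tampered payload")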
___________________
4 S. Sabin, 2024, “Open-Source Developers Face a Potential Social-Engineering Crisis,” Axios, April 19, https://www.axios.com/2024/04/19/open-source-software-social-engineering-hacks.
The previous section analyzed the resilience of small- or medium-scale cyber systems designed by a single supplier. However, the complexity of modern cyber systems, which are typically integrated from components supplied by multiple producers, introduces even greater challenges, including the following:
Complexity is increasingly a factor in limiting the extent to which it is possible to understand a cyber system or CPS. In addition, proprietary barriers often prevent examination of code, designs, hardware, and operations.
Many products can be adequately assessed through normal market incentives and mechanisms; cyber products cannot. Cyber is not unique in experiencing market failures that prevent reasonable risk assessment and effective regulatory oversight; indeed, the phenomenon has been studied extensively in the used-car market as the "market for lemons" in economics.8 Information asymmetry in the cyber market means that users and regulators cannot effectively identify, reward, or punish better or worse products or services. The result is that there is little competitive pressure for security: "good" cyber products sell for the same price as "bad" products because the usual market price-discrimination incentives are ineffective.
Absent effective liability, which license disclaimers usually preclude, information asymmetry effectively bars a principled determination of end-customer risks. Investment in cybersecurity by technology and service providers suffers when security benefits, which are obscured and difficult to measure, are weighed against the new revenue that an equivalent investment in new capabilities could generate. As a result, end-customers have no effective way to value better security: providers are unable to compete, pre-sale, on the basis of the security they offer, and customers are exposed, post-sale, to unquantifiable losses for which liability disclaimers leave no recourse. This situation is further complicated by the presence of adaptive adversaries, whose own research investments in new attacks can serve to devalue past investments in defense.
Attacks on most engineered systems in the past were mounted against specific, well-identified targets and were motivated by comprehensible, predictable risk–reward assessments on the attackers' part. They also required proximity, physical access, and specific tooling, so the risk of detection and punishment was significant. Many such attacks could be thwarted by simple physical protection of high-value targets.
___________________
5 Committee on Oversight and Government Reform, 2016, “The OPM Data Breach: How the Government Jeopardized Our National Security for More Than a Generation,” https://oversight.house.gov/report/opm-data-breach-government-jeopardized-national-security-generation.
6 N. Narea, 2025, “Elon Musk’s Secretive Government IT Takeover, Explained,” Vox, February 5, https://www.vox.com/politics/398366/musk-doge-treasury-sba-opm-budget.
7 Insikt Group, 2025, “RedMike (Salt Typhoon) Exploits Vulnerable Cisco Devices of Global Telecommunications Providers,” Recorded Future, February 13, https://go.recordedfuture.com/hubfs/reports/cta-cn-2025-0213.pdf.
8 G.A. Akerlof, 1970, “The Market for ‘Lemons’: Quality Uncertainty and the Market Mechanism,” The Quarterly Journal of Economics 84(3):488–500, https://doi.org/10.2307/1879431.
None of these circumstantial mitigations is commonly helpful in preventing cyberattacks, especially given the proliferation of systems that are reachable through radio signals or through vulnerable Wi-Fi access points belonging to a nearby entity that is itself remotely vulnerable (sometimes described as a "nearest-neighbor attack").9
Attackers are often shielded from scrutiny by the expense of investigations and by standards of proof, sometimes unachievable, that prevent or limit attribution or sanction. Moreover, the existence of active attacks over networks by entities (including nation-states) throughout the world makes deterrence based on legal mechanisms largely ineffective. Attackers can also plant "false flags," using tools and techniques that point investigators toward a different, known threat actor in order to avoid attribution.
A further complication is the diversity of attack surfaces, that is, the infection vectors through which attackers and their tools engage a system. Modern AI systems, for example, can be attacked during training, during operation, and in their delivery for use as components in larger AI-based systems. In addition to diversity, there is the issue of the scalability of attacks. Consider, for example, a software update that carries an adverse payload yet is correctly signed by the vendor because the code-signing certificate was stolen or abused. The scale of delivery could, for example, affect an entire fleet of vehicles or embedded CPS devices used across an entire sector.
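The limits of signature checking alone can be seen in a toy verifier. In the sketch below, an HMAC stands in for the vendor's real public-key signature (an assumption made only to keep the example self-contained): verification establishes that the signing key was used, not that the payload is benign, so a stolen key turns a single malicious update into a fleet-wide compromise.

    import hashlib
    import hmac

    VENDOR_KEY = b"assumed vendor signing key"  # if stolen, the check below still passes

    def sign(payload: bytes, key: bytes = VENDOR_KEY) -> bytes:
        return hmac.new(key, payload, hashlib.sha256).digest()

    def verify_update(payload: bytes, signature: bytes,
                      key: bytes = VENDOR_KEY) -> bool:
        """Checks only *who signed*, never *what was signed*."""
        return hmac.compare_digest(sign(payload, key), signature)

    malicious = b"update with adverse payload"
    assert verify_update(malicious, sign(malicious))  # a stolen key signs malware, too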
These difficulties play a starring role in all the identified hard problems.
___________________
9 S. Koessel, S. Adair, and T. Lancaster, 2024, “The Nearest Neighbor Attack: How a Russian APT Weaponized Nearby Wi-Fi Networks for Covert Access,” Volexity, November 22, https://www.volexity.com/blog/2024/11/22/the-nearest-neighbor-attack-how-a-russian-apt-weaponized-nearby-wi-fi-networks-for-covert-access.