Below we describe a set of exploratory research project frameworks with which we intend to bootstrap the center. They will frame the initial activity, and will evolve and be enriched as the activities unfold and new tasks and work packages are defined under the initiative of contributing faculty PIs. As research project frameworks, they are strategic and long-haul in nature, and as such are expected to be the cradle in which shorter-term projects and related activities develop under each research line.
Robust and Adaptive Fault and Intrusion Tolerance
Introduction and background
The hybrid distributed systems model (a.k.a. “Wormholes” model)[The Timely Computing Base Model and Architecture. Paulo Verissimo, António Casimiro. IEEE Trans. on Computers, vol. 51, n. 8, Aug 2002 (online first as DI/FCUL TR 99–2, U. of Lisbon, Apr. 1999). See also “Travelling through Wormholes: a new look at Distributed Systems Models”. Paulo Verissimo. SIGACT News, vol. 37, no. 1, pages 66-81, 2006.] was introduced to overcome the conflict between the advantage of having large-scale and complex systems, and the virtual infeasibility of ensuring that they meet reliability or synchrony assumptions strong enough to be useful for building dependable and secure systems, e.g., fault and intrusion tolerant (FIT) ones. The model has allowed advances in the state of the art of the theory and practice of distributed algorithms in several boundary areas, where impossibility results and conflicting goals had established dead-ends for classic, that is, homogeneous, distributed systems.
As a matter of fact, the need for hybridisation has been progressively recognised over the years, either explicitly or implicitly, both by manufacturers, with standard trusted execution environments (TEEs such as TPM (Trusted Platform Module), SGX (Software Guard Extensions), ARM TrustZone), and by R&D, with application-specific hybrids (TTCB, A2M, USIG, CASH, TRINC, etc.). The model has shown its genericity, being a powerful tool to represent all these system variants that rely on hybrid components, even if they were conceived independently. This explains why, in the past few years, a growing number of authors have published works that refer to, or explicitly or implicitly follow, the hybrid distributed systems model. In a recent (2019) informal literature survey, almost one hundred papers were identified on the topic. Its power as a pillar for solving some of the enunciated challenges for the design of resilient distributed computing systems is thus apparent.
The next step, taken by several research groups, was towards sustainability: work on reactive or proactive recovery (a.k.a. self-healing), which, in the context of replicated systems, is a mechanism aimed at guaranteeing that a system always has enough replicas to defend itself from threats and thus prevent failure or compromise. This safety predicate was later defined formally as Exhaustion Safety. Further research revealed the impossibility of exhaustion-safety in homogeneous asynchronous distributed systems (e.g., BFT). This meant that not only is fault and intrusion tolerance insufficient on its own, but even systems with self-healing provisions might fail to secure the desired resilience and sustainability guarantees. Subsequent research based on architectural hybridisation has provided first solutions to this serious problem, circumventing the impossibility result.
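As a rough illustration of the predicate discussed above, the following Python sketch checks whether the number of simultaneously faulty replicas observed over time stays within the tolerated threshold. It is a simplification under stated assumptions: the classic Byzantine replication bound n >= 3f + 1 is assumed, and the function names and the trace representation (a list of faulty-replica counts) are hypothetical, not taken from the cited works.

```python
def max_faults_tolerated(n: int) -> int:
    """Largest f such that n >= 3f + 1 (assumed Byzantine replication bound)."""
    if n < 1:
        raise ValueError("n must be a positive number of replicas")
    return (n - 1) // 3


def stays_within_threshold(faulty_counts, n: int) -> bool:
    """Return True if no observation in the trace exceeds the tolerated f.

    faulty_counts: iterable with the number of faulty replicas seen at
    successive observation points (hypothetical trace format).
    """
    f = max_faults_tolerated(n)
    return all(count <= f for count in faulty_counts)


if __name__ == "__main__":
    # With 4 replicas, one fault at a time is tolerated; recovery brings
    # the count back down before a second fault accumulates.
    print(stays_within_threshold([0, 1, 0, 1], n=4))
    # Without timely recovery, faults accumulate past f.
    print(stays_within_threshold([0, 1, 2], n=4))
```

The check only inspects a recorded trace; the impossibility result mentioned above concerns guaranteeing the predicate in advance, which such a check cannot do.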
Investigation of robust, adaptive and sustainable fault and intrusion tolerant (FIT) algorithms, models and architectures, leveraging architectural hybridisation and self-healing.
In this exploratory project framework, we intend to address some main challenges remaining w.r.t. the current s.o.t.a. The potential areas of application are manifold, and we intend to put this systems research in context with them, through proof-of-concept prototypes demonstrating results.
First, survival under extreme levels of continued threat, through adaptability, and the meeting of timing specifications with acceptable performance, reconciled with Byzantine resilience. Rethinking BFT-SMR protocol construction for distributed systems in general, in relation to reconfiguration and morphing capabilities, can open the door to the next step in the resilience ladder: seamless adaptation.
Second, in the area of cyber-physical or IoT systems with R/T communication networks under timing uncertainties and malicious attacks: investigation of Byzantine-resilient communication and consensus, such as reliable and atomic broadcast as well as state-machine replication protocols, capable of faithful perception of the environment despite threats, for CPS/IoT objects such as autonomous car ecosystems.
Third, a more long-range set of challenges consists in the integration of machine-learning and artificial intelligence (ML/AI) techniques, under the perspective of automating and enhancing the robustness and resilience of FIT systems.
Ultra-Reliable Micro Trusted Execution Environments
Introduction and background
Previous research in the area of trustworthy embedded components has focused on computing bases, software or hardware, as hardened subsystem architectures, code bases, or chips, ultra-resilient from a fault/intrusion prevention viewpoint, which can be used on their own as TCBs (trusted computing bases) in target systems.
Though trusted components have been around for decades, architectural hybridization has brought a richer perspective on their use, endowing them with important system roles in the resulting modular or distributed system models. As followed by several authors, in the context of architectural hybridization these components are seen as trusted-trustworthy “hybrids” (‘better’ parts of the architecture with very well defined functions), which play a crucial role as subsystems systematically assisting hybridization-aware algorithms and protocols. That is, whilst it is admitted that most of the system (call it “payload”) obeys weaker (but more realistic) assumptions, the hybrids must live up to their reputation of being ultra-reliable. In that sense, besides the standard implementations already mentioned (TPM, SGX, ARM TrustZone), progressively more robust implementations have become possible over the years in an application-specific way. Since the inception of the architectural hybridization concept in 1999, systems have gone from software-based hybrids, e.g., using kernel separation or hypervisor managers such as Xen Dom0, to more robust hardware-based hybrids. Here the evolution has been startling, from small PCI or USB appliances, through FPGAs.
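To make the idea of a small hybrid with a very well defined function more concrete, here is a minimal Python sketch of a trusted monotonic counter service, in the spirit of the USIG-style hybrids mentioned earlier. It is an illustration only, not the interface of any of the cited systems: the class and method names are hypothetical, the key is assumed to be sealed inside the trusted component, and verification is assumed to run inside a trusted component sharing that key.

```python
import hashlib
import hmac


class MonotonicCounterService:
    """Hypothetical trusted component binding messages to unique counter values."""

    def __init__(self, key: bytes):
        self._key = key      # assumed to never leave the trusted component
        self._counter = 0    # strictly increasing, never reused

    def _mac(self, counter: int, message: bytes) -> bytes:
        # Bind the counter value to a digest of the message.
        payload = counter.to_bytes(8, "big") + hashlib.sha256(message).digest()
        return hmac.new(self._key, payload, hashlib.sha256).digest()

    def create_ui(self, message: bytes):
        """Assign the next counter value to the message and certify the pair."""
        self._counter += 1
        return self._counter, self._mac(self._counter, message)

    def verify_ui(self, counter: int, tag: bytes, message: bytes) -> bool:
        """Check that (counter, message) was certified with this key."""
        return hmac.compare_digest(tag, self._mac(counter, message))


if __name__ == "__main__":
    service = MonotonicCounterService(key=b"demo-key-not-for-production")
    c1, t1 = service.create_ui(b"request A")
    c2, t2 = service.create_ui(b"request B")
    print(c1, c2)                                   # consecutive values
    print(service.verify_ui(c1, t1, b"request A"))  # certified pair
    print(service.verify_ui(c1, t1, b"request B"))  # same counter, other message
```

Because each counter value is certified for exactly one message, a compromised payload process cannot present two different messages under the same identifier, which is the kind of narrow guarantee a hybridization-aware protocol can build on.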
Design challenges for next-generation ultra-reliable micro trusted execution environments (TEE), roots-of-trust and enclaves, leveraging HW assistance at several levels (including SoC, FPGA). Domains of application include, but are not limited to, Internet/Cloud, control, financial, e-gov and e-admin, biomedical.
In this exploratory project framework, we intend to address some main challenges remaining w.r.t. the current s.o.t.a. perspective given above.
First, to keep lowering the level of abstraction at which hybrids are implemented and used, since the threat surface has also been lowering (see, e.g., Spectre or Meltdown, and hardware side-channel attacks in general). This suggests the investigation of reference fault and intrusion tolerant mechanisms based on tiles in many-core SoCs, and of hybrids designed at that level, reconciling performance with reliability and security, with no compromise on a 0-defect goal.
A second, longer-range challenge consists in tackling problems that have not received due attention in the optimistic landscape of the promising uses of ML/AI techniques: how do we enforce ethics on computer agents (which may be rebooted, flashed, morphed, etc., at will) in the current architectures? How do we achieve “explainable AI” just by working on trust at the data plane level, and what about the trustworthiness of the system support plane? Architectural hybridisation may provide a path to address these questions, through the right ultra-resilient hybrid guardians.
Privacy and Integrity-preserving Data Processing
Introduction and background
This line concerns the analysis of the problems of data privacy and integrity in highly sensitive sectors for citizens and organisations. We are interested in enabling techniques to support efficient and effective data processing, namely through the marriage of data-level and system-level constructs through infrastructure-awareness (a.k.a. system-awareness considered on a large scale). We intend to leverage such results in two application sub-areas as a starting point: biomedical data privacy and integrity; and blockchain/cryptocurrency resilience and privacy. Further research work is foreseen to extend to other verticals.
Investigation of infrastructure-aware data storage and processing algorithms and protocols is an important facet. In this line of thought, the approach advocated for the data plane of resilient distributed computing (RdC) systems is for data processing to be aware of the properties of the underlying system. Most data processing subsystems over distributed systems are agnostic of the underlying support, whereas advantages can be gained if the algorithms are system- or topology-aware. This has been endorsed in specialized systems, in highly homogeneous and symmetric architectures, but the same principle can be applied in general distributed systems. Examples exist in communication in large-scale distributed systems, in particular layouts of genomic processing federations in digital health, or in securing the data plane of software defined networking (SDN) fabrics.
Biomedical data privacy and integrity with high performance and sharing presents main challenges such as the reconciliation of protection with the ease of sharing and performance of use. We plan to investigate e-health architectures as ecosystems that allow reconciling data sharing with high levels of privacy, and algorithms powering those architectures. First results showed the importance of privacy-preserving early filtering, i.e., identifying and filtering sensitive genomic sequences as soon as they are generated (i.e., at the mouth of the NGS machine) to segregate and protect them before it is too late. Building on those initial steps, hardware-assisted hybrid algorithms for high-yield DNA alignment for incomplete genomes (after privacy-preserving digital excision of sensitive nucleotides) showed promising results.
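The early-filtering idea can be sketched as follows. This is a deliberately simplified illustration, not the method used in the first results mentioned above: it assumes that sensitive sequences are known in advance, that a read is flagged when it shares at least one exact k-mer with them, and all function names and the choice of k are hypothetical.

```python
def kmers(sequence: str, k: int) -> set:
    """All substrings of length k; empty set if the sequence is shorter than k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_sensitive_index(sensitive_sequences, k: int) -> set:
    """Index every k-mer occurring in the known sensitive sequences."""
    index = set()
    for seq in sensitive_sequences:
        index |= kmers(seq.upper(), k)
    return index


def split_reads(reads, index: set, k: int):
    """Separate reads sharing a k-mer with the index from the remaining ones."""
    sensitive, public = [], []
    for read in reads:
        if kmers(read.upper(), k) & index:
            sensitive.append(read)   # to be segregated and protected
        else:
            public.append(read)      # may follow the ordinary pipeline
    return sensitive, public


if __name__ == "__main__":
    index = build_sensitive_index(["ACGTACGT"], k=4)
    flagged, remaining = split_reads(["ttACGTaa", "GGGGGGGG"], index, k=4)
    print(flagged, remaining)
```

Exact matching ignores sequencing errors and variants; a realistic filter would need tolerance to both, which is outside the scope of this sketch.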
Blockchain/cryptocurrency resilience and privacy today faces main challenges stemming from fundamental problems that have been haunting blockchain technologies with respect to their resilience to attacks on integrity and privacy. We plan to address them in our workplan; examples of techniques of choice that we plan to investigate are adaptive Byzantine fault tolerance, post-compromise security, and resilience techniques. We plan to draw on previous works that have successfully investigated techniques to improve existing PoW (proof-of-work) based blockchain systems, neutralising attacks from powerful adversaries and leveraging Byzantine fault tolerance algorithms to prevent uncertainties, either by constraining the formation of colluding cliques that control the network, or by guaranteeing correct operation even when such cliques temporarily dominate the network (stochastic robustness).
Privacy and Integrity-preserving data processing (e.g., biomedical, fintech, social, etc.), leveraging system-awareness to combine decentralisation, high-performance and scalability.
In this exploratory project framework, we plan to address several sets of challenges, starting with mechanisms and architectural concepts following the infrastructure-awareness paradigm. We further plan to investigate innovative algorithms and data processing mechanisms capable of enjoying infrastructure-awareness, and we will start by biomedical data privacy and integrity, and blockchain/cryptocurrency resilience and privacy. Further research work is foreseen to extend to other verticals.
First, under infrastructure awareness, we plan to investigate hybridisation-based data processing architectures leveraging the (future) existence of hybrid subsystems in the two areas of application, biomedical and blockchain. We believe that a quantum leap may happen in both federated biomedical and blockchain/cryptocurrency ecosystems, should they be provided with adequate (reliable and performant) infrastructure support, instead of relying solely on data-level functionality agnostic of the system support.
Second, the plan for biomedical data privacy and integrity concerns the further development of the ideas explained in the introduction, for the remaining stages of the biomedical workflow cycle, e.g. genome-wide and other population association studies, which have a growing societal interest.
Third, we plan to tackle some remaining challenges in blockchain/cryptocurrency resilience and privacy, which lie in reducing the cost of the PoW approach, perhaps through other paradigms such as proof-of-stake (PoS), as well as in improving ultimate performance by sharding the blockchain flow, potentially multiplying the sheer single-thread performance. This line of work has attracted considerable interest lately.
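As a small illustration of what sharding the transaction flow can mean, the sketch below deterministically assigns transactions to shards by hashing the sender identifier, so that independent shards could be processed in parallel. It is a generic hash-partitioning sketch under stated assumptions, not the scheme of any particular blockchain: the transaction format (a dict with a "sender" field) and the function names are hypothetical, and cross-shard transactions are not handled.

```python
import hashlib


def shard_of(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using SHA-256."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(transactions, num_shards: int):
    """Group transactions by the shard of their sender, preserving order."""
    shards = [[] for _ in range(num_shards)]
    for tx in transactions:
        shards[shard_of(tx["sender"], num_shards)].append(tx)
    return shards


if __name__ == "__main__":
    txs = [
        {"sender": "alice", "amount": 5},
        {"sender": "bob", "amount": 7},
        {"sender": "alice", "amount": 1},
    ]
    for index, shard in enumerate(partition(txs, num_shards=4)):
        print(index, shard)
```

Keeping all transactions of one sender in the same shard preserves their relative order, at the cost of possible load imbalance between shards.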
Next-generation Threat and Intrusion Detection / Prevention Systems
Introduction and background
Cybersecurity has classically used as its main paradigm the prevention and ad-hoc detection of intrusions caused by attacks on system vulnerabilities. In a nutshell, it relies on techniques like vulnerability prevention by rigorous code design, vulnerability removal by verification and testing, attack prevention by blocking malicious attempts on system interfaces, or even attack removal by taking measures to discontinue ongoing attacks. Together, the above-mentioned techniques constitute what we call intrusion prevention, i.e. the attempt to avoid the occurrence of intrusion faults, which would lead to system failure.
Given the infeasibility of guaranteeing it in absolute terms, the cybersecurity B.o.K. incorporated intrusion detection techniques, the so-called intrusion-detection systems (IDS), detecting a mix of mere attacks and activated vulnerabilities producing system errors. The so-called intrusion-prevention systems (IPS) try to bring the process further, proposing automated remedies upon detection of intrusions. These techniques, though necessary, have revealed shortcomings, as threats and threat agents have evolved. Some key problems have been the subject of improvements in the field: the precision and recall of detection mechanisms; speed of detection actions; and effectiveness of mitigation decisions.
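For reference, the precision and recall of a detection mechanism mentioned above follow the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), where TP, FP and FN count true positives, false positives and false negatives. A minimal Python sketch follows; the function name and the convention of returning 0.0 when a denominator is zero are choices made for this illustration.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Return (precision, recall) from detection counts.

    tp: intrusions correctly flagged
    fp: benign events wrongly flagged
    fn: intrusions that were missed
    A ratio with a zero denominator is reported as 0.0 (illustrative convention).
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


if __name__ == "__main__":
    # 8 intrusions detected, 2 false alarms, 8 intrusions missed.
    print(precision_recall(tp=8, fp=2, fn=8))  # (0.8, 0.5)
```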
There is, however, potential for improvement by bridging fault tolerance and security, drawing on the significant body of research on distributed computing architectures, methodologies and algorithms touching both domains. Several works over the past years have illustrated this concept, from highly distributed honeypot systems, to sensor correlation from several (“replicated”) IDS, or the use of distributed streaming engines. More recently, some works have been improving threat and intrusion detection/prevention systems (IDPS), though mostly from an ad-hoc solution perspective.
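The idea of correlating sensors from several replicated IDS can be illustrated with a simple quorum vote: an alert is confirmed only if enough independent sensors report it. The sketch below is a minimal illustration under stated assumptions; the alert labels, the sensor names and the function name are hypothetical, and real correlation engines also weigh timing and context.

```python
from collections import Counter


def correlate_alerts(sensor_alerts: dict, quorum: int) -> set:
    """Return the alerts reported by at least `quorum` distinct sensors.

    sensor_alerts: mapping from sensor name to the list of alert labels
    it raised; duplicates from one sensor count only once.
    """
    votes = Counter()
    for alerts in sensor_alerts.values():
        votes.update(set(alerts))   # one vote per sensor per alert
    return {alert for alert, count in votes.items() if count >= quorum}


if __name__ == "__main__":
    reports = {
        "ids1": ["scan", "sqli"],
        "ids2": ["scan"],
        "ids3": ["scan", "scan", "xss"],
    }
    print(correlate_alerts(reports, quorum=2))  # {'scan'}
```

Requiring agreement among sensors tends to raise precision, since a single faulty or compromised sensor cannot confirm an alert on its own, while too high a quorum may reduce recall.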
Investigation of next-generation threat and intrusion detection/prevention system (IDPS) paradigms, leveraging ML/AI-assisted validation, detection and mitigation.
In this exploratory project framework, we intend to address some shortcomings w.r.t. the current s.o.t.a. of classic intrusion detection and prevention systems, by bringing in concepts from distributed systems, fault tolerance, and ML/AI.
First, we plan to improve detection mechanisms, for example in network IDS, by investigating hybrid architectures whereby the detection precision and recall can be improved by algorithms leveraging information exchange through distributed hybrids.
Second, we plan to investigate innovative IDPS algorithms which can achieve robust automation, by relying on principled F/T mechanisms of error processing, such as forward error recovery, moving target defense, leveraging ML/AI assisted assumption and threat plane validation.
Third, a longer-range challenge consists in an interesting potential use of the forthcoming ML/AI techniques to analyse and process data and code bases in response to vulnerability diagnosis from IDS, in order to synthesize continuous component diversification, a means to maintain the attacker work factor in face of persistent threats, a s.o.t.a. problem today.
High-confidence Vertical Software Verification
Introduction and background
Formal specification and verification are important pillars of any strategy seeking strong guarantees of correctness. Other forms of verification such as testing are relevant tools as well, achieving approximate but readily obtainable results, while allowing additional metrics such as quantitative performance assessment. Formal verification techniques such as interactive theorem proving (ITP) are amongst the strongest processes available. A systems perspective also advocates approaches that leverage existing tooling (e.g., VST, Coq ecosystem, CompCert) to produce vertical verification (from the high-level specification down to the machine code) of the fundamental pieces of the SW suites of interest.
Important challenges have been addressed in recent years, namely the enhancement of the Coq-based framework to reason about Byzantine (or malicious) faults. A notable result was the development of a framework for mechanical proofs of properties of BFT (Byzantine Fault-Tolerant) protocols, tested on fundamental properties, such as safety, of the PBFT implementation by Castro & Liskov. That work was later extended to develop a new Coq-based framework to implement and reason about hybrid fault tolerant protocols, verifying critical safety properties of the hybrid BFT protocol minBFT.
Investigation of innovative frameworks providing high-confidence vertical verification (from spec to code) of mid-size software against arbitrary faults (accidental or malicious). Extension to alternative methods, such as testing, allowing insights on metrics such as performance and QoS.
In this exploratory project framework, we intend to address some main challenges that remain in proof-assistant based verification of FIT protocols, homogeneous or hybrid, such as dealing with liveness and, more importantly, timeliness (fundamental in R/T systems), as well as supporting reasoning about rejuvenation and group membership.