Ian Glen Neal | Publications

Dissertation

2023

Automating the Detection and Correction of Failures in Modern Persistent Memory Systems

Ian Neal

July 2023

Modern software systems are deeply embedded into our daily lives; the failures of these systems can therefore result in massive real-world harm. Consequently, considerable resources are spent finding and fixing bugs in testing. Overall, the software industry spends billions of dollars each year on fixing bugs, and ultimately loses trillions of dollars each year due to poor software quality (as a result of bugs that escape testing and wreak havoc once deployed). One particularly challenging domain of software development is Persistent Memory (PM) programming, an abstraction in which developers write software that accesses and updates long-term storage with direct memory operations. The PM programming abstraction has become popular in recent years due to new hardware advances in low-latency, byte-addressable storage devices. Unfortunately, writing crash-consistent PM applications is challenging, as untimely program crashes can result in data corruption and loss if the application does not carefully order updates to PM, and testing all possible crashes for data consistency is intractable. Furthermore, crash-consistency bugs are difficult to manually debug and repair, taking weeks or months for a developer to correctly fix. Without advancements in PM testing and program repair tools, developers will be unable to effectively write correct and efficient applications for modern PM platforms, hampering the ease of their adoption. Motivated by these PM software development challenges, this dissertation explores research in developing software techniques that automate difficult and time-consuming PM development tasks. We study PM system design, bugs, and bug fixes and observe that we can automatically provide scalable and high-coverage bug detection and correction by approximating the reasoning performed by developers as they develop their applications.
Based on this insight, we first explore automated bug detection and correction for PM application bugs caused by the misuse of platform-specific PM primitives. We develop a testing technique that prioritizes testing program paths that heavily modify PM, as these paths are more likely to misuse PM. We implement this technique in AGAMOTTO, a symbolic-execution tool that thoroughly explores PM applications to uncover platform-specific bugs, which we use to find 84 new bugs while incurring no false positives. We then develop a technique for generating fixes for PM platform-specific bugs that are provably correct, coupled with heuristic performance optimizations that do not compromise correctness, and implement the technique in a compiler tool, HIPPOCRATES. Second, this dissertation explores automated bug detection for general crash-consistency bugs in PM applications (i.e., bugs caused by the improper ordering of PM updates). We develop a technique that automatically identifies groups of PM program behaviors that are likely to result in the same crash-consistency bugs and only tests one behavior out of the group, thus providing high testing accuracy (by testing all types of behaviors thoroughly) while also increasing efficiency (by eliminating redundant testing on functionally-similar behaviors). We implement this technique in SQUINT, a model-checking tool that selectively tests groups of PM program behaviors identified from a dynamic program trace, which we use to find 108 PM crash-consistency bugs. The works presented in this dissertation provide a holistic automated testing and program repair solution for PM software developers. In sum, these tools have been used to find and fix over two hundred PM bugs in real-world PM systems, demonstrating both the need for such tools and the efficacy of the tools presented in this dissertation.

Peer-Reviewed Publications

2021

USENIX Sec’21

DOLMA: Securing Speculation with the Principle of Transient Non-Observability

Kevin Loughlin, Ian Neal, Jiacheng Ma, Elisa Tsai, Ofir Weisse, Satish Narayanasamy, and Baris Kasikci

In 30th USENIX Security Symposium (USENIX Security 21)
August 2021

Modern processors allow attackers to leak data during transient (i.e., mis-speculated) execution through microarchitectural covert timing channels. While initial defenses were channel-specific, recent solutions employ speculative information flow control in an attempt to automatically mitigate attacks via any channel. However, we demonstrate that the current state-of-the-art defense fails to mitigate attacks using speculative stores, still allowing arbitrary data leakage during transient execution. Furthermore, we show that the state of the art does not scale to protect data in registers, incurring 30.8–63.4% overhead on SPEC 2017, depending on the threat model. We then present DOLMA, the first defense to automatically provide comprehensive protection against all known transient execution attacks. DOLMA combines a lightweight speculative information flow control scheme with a set of secure performance optimizations. By enforcing a novel principle of transient non-observability, DOLMA ensures that a time slice on a core provides a unit of isolation in the context of existing attacks. Accordingly, DOLMA can allow speculative TLB/L1 cache accesses and variable-time arithmetic without loss of security. On SPEC 2017, DOLMA achieves comprehensive protection of data in memory at 10.2–29.7% overhead, adding protection for data in registers at 22.6–42.2% overhead (8.2–21.2% less than the state of the art, with greater security).

OSDI’21

DMon: Efficient Detection and Correction of Data Locality Problems using Selective Profiling

Tanvir Ahmed Khan, Ian Neal, Gilles Pokam, Barzan Mozafari, and Baris Kasikci

In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21)
July 2021

Poor data locality hurts an application’s performance. While compiler-based techniques have been proposed to improve data locality, they depend on heuristics, which can sometimes hurt performance. Therefore, developers typically find data locality issues via dynamic profiling and repair them manually. Alas, existing profiling techniques incur high overhead and cannot be deployed in production, where programs may exhibit previously-unseen performance problems. We present selective profiling, a technique that locates data locality problems with low-enough overhead that is suitable for production use. To achieve low overhead, selective profiling gathers runtime execution information selectively and incrementally. Using selective profiling, we build DMon, a system that can automatically locate data locality problems in production, identify access patterns that hurt locality, and repair such patterns using targeted optimizations. Thanks to selective profiling, DMon’s profiling overhead is 1.36% on average, making it feasible for production use. DMon’s targeted optimizations provide 16.83% speedup on average (up to 53.14%), compared to a baseline that uses the highest level of compiler optimization. DMon speeds up PostgreSQL, one of the most popular database systems, by 6.64% on average (up to 17.48%).

ASPLOS’21

HIPPOCRATES: Healing Persistent Memory Bugs Without Doing Any Harm

Ian Neal, Andrew Quinn, and Baris Kasikci

In Proceedings of the Twenty-Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
April 2021

Persistent memory (PM) technologies aim to revolutionize storage systems, providing persistent storage at near-DRAM speeds. Alas, programming PM systems is error-prone, as the misuse or omission of the durability mechanisms (i.e., cache flushes and memory fences) can lead to durability bugs (i.e., unflushed updates in CPU caches that violate crash consistency). PM-specific testing and debugging tools can help developers find these bugs; however, even with such tools, fixing durability bugs can be challenging. To determine the reason behind this difficulty, we first study durability bugs and find that although the solution to a durability bug seems simple, the actual reasoning behind the fix can be complicated and time-consuming. Overall, the severity of these bugs coupled with the difficulty of developing fixes for them motivates us to consider automated approaches to fixing durability bugs. We introduce Hippocrates, a system that automatically fixes durability bugs in PM systems. Hippocrates automatically performs the complex reasoning behind durability bug fixes, relieving developers of time-consuming bug fixes. Hippocrates’s fixes are guaranteed to be safe, as they are guaranteed to not introduce new bugs (“do no harm”). We use Hippocrates to automatically fix 23 durability bugs in real-world and research systems. We show that Hippocrates produces fixes that are functionally equivalent to developer fixes. We then show that solely using Hippocrates’s fixes, we can create a PM port of Redis whose performance rivals and even exceeds that of a manually-developed PM port of Redis.

FAST’21

Rethinking File Mapping for Persistent Memory

Ian Neal, Gefei Zuo, Eric Shiple, Tanvir Ahmed Khan, Youngjin Kwon, Simon Peter, and Baris Kasikci

In 19th USENIX Conference on File and Storage Technologies (FAST 21)
February 2021

Persistent main memory (PM) dramatically improves IO performance. We find that this results in file systems on PM spending as much as 70% of the IO path performing file mapping (mapping file offsets to physical locations on storage media) on real workloads. However, even PM-optimized file systems perform file mapping based on decades-old assumptions. It is now critical to revisit file mapping for PM. We explore the design space for PM file mapping by building and evaluating several file-mapping designs, including different data structure, caching, as well as meta-data and block allocation approaches, within the context of a PM-optimized file system. Based on our findings, we design HashFS, a hash-based file mapping approach. HashFS uses a single hash operation for all mapping and allocation operations, bypassing the file system cache, instead prefetching mappings via SIMD parallelism and caching translations explicitly. HashFS’s resulting low latency provides superior performance compared to alternatives. HashFS increases the throughput of YCSB on LevelDB by up to 45% over page-cached extent trees in the state-of-the-art Strata PM-optimized file system.

2020

OSDI’20

AGAMOTTO: How Persistent is your Persistent Memory Application?

Ian Neal, Ben Reeves, Ben Stoler, Andrew Quinn, Youngjin Kwon, Simon Peter, and Baris Kasikci

In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)
November 2020

Awarded IEEE Micro Top Picks 2021 Honorable Mention!

Persistent Memory (PM) can be used by applications to directly and quickly persist any data structure, without the overhead of a file system. However, writing PM applications that are simultaneously correct and efficient is challenging. As a result, PM applications contain correctness and performance bugs. Prior work on testing PM systems has low bug coverage as it relies primarily on extensive test cases and developer annotations. In this paper we aim to build a system for more thoroughly testing PM applications. We inform our design using a detailed study of 63 bugs from popular PM projects. We identify two application-independent patterns of PM misuse which account for the majority of bugs in our study and can be detected automatically. The remaining application-specific bugs can be detected using compact custom oracles provided by developers. We then present AGAMOTTO, a generic and extensible system for discovering misuse of persistent memory in PM applications. Unlike existing tools that rely on extensive test cases or annotations, AGAMOTTO symbolically executes PM systems to discover bugs. AGAMOTTO introduces a new symbolic memory model that is able to represent whether or not PM state has been made persistent. AGAMOTTO uses a state space exploration algorithm, which drives symbolic execution towards program locations that are susceptible to persistency bugs. AGAMOTTO has so far identified 84 new bugs in 5 different PM applications and frameworks while incurring no false positives.

2019

MICRO’19

NDA: Preventing speculative execution attacks at their source

Ofir Weisse, Ian Neal, Kevin Loughlin, Thomas F Wenisch, and Baris Kasikci

In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
October 2019

Speculative execution attacks like Meltdown and Spectre work by accessing secret data in wrong-path execution. Secrets are then transmitted and recovered by the attacker via a covert channel. Existing mitigations either require code modifications, address only specific exploit techniques, or block only the cache covert channel. Rather than battling exploit techniques and covert channels one by one, we seek to close off speculative execution attacks at their source. Our key observation is that these attacks require a chain of dependent wrong-path instructions to access and transmit secret data. We propose NDA, a technique to restrict speculative data propagation. NDA breaks the attacks’ wrong-path dependence chains while still allowing speculation and dynamic scheduling. We describe a design space of NDA variants that differ in the constraints they place on dynamic scheduling and the classes of speculative execution attacks they prevent. NDA preserves much of the performance advantage of out-of-order execution: on SPEC CPU 2017, NDA variants close 68-96% of the performance gap between in-order and unconstrained (insecure) out-of-order execution.

2018

ATC’18

TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions

Yige Hu, Zhiting Zhu, Ian Neal, Youngjin Kwon, Tianyu Cheng, Vijay Chidambaram, and Emmett Witchel

In Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference
July 2018

We introduce TxFS, a novel transactional file system that builds upon a file system’s atomic-update mechanism such as journaling. Though prior work has explored a number of transactional file systems, TxFS has a unique set of properties: a simple API, portability across different hardware, high performance, low complexity (by building on the journal), and full ACID transactions. We port SQLite and Git to use TxFS, and experimentally show that TxFS provides strong crash consistency while providing equal or better performance.

Workshop Papers

2021

NVMW’21

Towards Bug-free Persistent Memory Applications

Ian Neal, Andrew Quinn, and Baris Kasikci

March 2021

Persistent Memory (PM) aims to revolutionize the storage-memory hierarchy, but programming these systems is error-prone. Our work investigates how to help developers write better, bug-free PM applications by automatically debugging them. We first perform a study of bugs in persistent memory applications to identify the opportunities and pain-points of debugging these systems. Then, we discuss our work on AGAMOTTO, a generic and extensible system for automatically detecting PM bugs. Unlike existing tools that rely on extensive test cases or annotations, AGAMOTTO automatically detects bugs in PM systems by extending symbolic execution to model persistent memory. AGAMOTTO has so far identified 84 new bugs in 5 different PM applications and frameworks while incurring no false positives. We then discuss HIPPOCRATES, a system that automatically fixes bugs in PM systems. HIPPOCRATES “does no harm”: its fixes are guaranteed to fix a PM bug without introducing new bugs. We show that HIPPOCRATES produces fixes that are functionally equivalent to developer fixes and that HIPPOCRATES fixes have performance that rivals manually-developed code.

Invited Talks

2021

OSDI’21

Preview: Persistent Memory

Ian Neal

At 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21)
July 2021

Patents

2020

User-specific video frame brightness filter

Matthew Richard Wozniak, Mehmet Kucukgoz, Holly Helene Pollock, Ian Glen Neal, and Rafael Vincent Prieto Vertido

September 2020

Video frame brightness filter

Matthew Richard Wozniak, Mehmet Kucukgoz, Holly Helene Pollock, Ian Glen Neal, and Rafael Vincent Prieto Vertido

June 2020

Color-specific video frame brightness filter

Matthew Richard Wozniak, Mehmet Kucukgoz, Holly Helene Pollock, Ian Glen Neal, and Rafael Vincent Prieto Vertido

June 2020

Honors Thesis

2017

The Advantages of a Transactional Interface: Porting Applications to TxFS

Ian Neal

May 2017

In this paper I explore the value of transactional file systems by showing how such systems can benefit existing applications while not adding additional complexity to the codebase. I first discuss the concept of transactions in computing and how transactional semantics are used to provide consistency and durability to an application’s state. I examine a new work developed at the University of Texas, TxFS, which provides a very simple and powerful transactional interface. I then introduce how existing systems can be modified to take advantage of TxFS by modifying SQLite, a widely used embedded database, and by modifying OpenLDAP, a widely used implementation of the Lightweight Directory Access Protocol. These modified systems benefit from running on TxFS by having a simplified transactional system, reduced locking, no user-level logging, and enhanced support for multithreaded operations. Additionally, I show how simple it is to port existing systems to TxFS, and demonstrate how easy it would be for other systems to adopt TxFS to ensure durability and consistency for their users.