Recent/Unpublished Work
-
Automatic Reasoning on Recursive Data Structures with Sharing
Duc-Hiep Chu and Joxan Jaffar.
[abstract]
[ arXiv ]
[ Draft PDF ]
Abstract:
We consider the problem of automatically verifying programs which
manipulate arbitrary data structures. Our specification language is
expressive, contains a notion of separation, and thus enables a
precise specification of frames. The main contribution then is a
program verification method which combines strongest postcondition
reasoning in the form of symbolic execution, unfolding recursive
definitions of the data structure in question, and a new frame rule to
achieve local reasoning so that proofs can be compositional. Finally,
we present an implementation of our verifier, and demonstrate
automation on a number of representative programs. In particular, we
present the first automatic proof of a classic graph marking
algorithm, paving the way for dealing with a class of programs which
traverse a complex data structure.
-
Bounded Verification and Testing of Heap Programs
Duc-Hiep Chu, Joxan Jaffar, and Andrew E. Santosa
-
S3N: An Efficient Native String Theory Solver for Z3
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar
-
Incremental Quantitative Analysis on Dynamic Costs
Duc-Hiep Chu, Joxan Jaffar, and Vijay Murali.
[abstract]
[ arXiv ]
[ Draft PDF ]
Abstract:
In quantitative program analysis, values are assigned to execution traces
to represent a quality measure. Such analyses cover important
applications, e.g. resource usage. Examining all traces is well known
to be intractable and therefore traditional algorithms reason over an
over-approximated set. Typically, inaccuracy arises due to
inclusion of infeasible paths in this set. Thus
path-sensitivity is one cure. However, there is another reason for
the inaccuracy: that the cost model, i.e., the way in which the analysis
of each trace is quantified, is dynamic. That is, the cost of
a trace is dependent on the context in which the trace is executed.
Thus the goal of accurate analysis, already challenged by
path-sensitivity, is now further challenged by context-sensitivity.
In this paper, we address the problem of quantitative analysis defined
over a dynamic cost model.
Our algorithm is an "anytime" algorithm: it generates
an answer quickly, but if the resource budget allows,
it progressively produces better solutions via refinement
iterations. The result of each iteration remains sound and,
importantly, converges to an exact analysis when given
an unlimited resource budget.
In this way, our algorithm is incremental.
We finally give evidence that a new level of practicality is achieved
by an evaluation on a realistic collection of benchmarks.
Peer-Reviewed Publications
-
S3: Syntax- and Semantic-Guided Repair Synthesis via Programming by Examples
Xuan-Bach D. Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser.
ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE'17).
[abstract]
[ Paper PDF ]
Abstract:
A notable class of techniques for automatic program repair is known as
semantics-based. Such techniques, e.g., Angelix, infer semantic
specifications via symbolic execution, and then use program synthesis
to construct new code that satisfies those inferred specifications.
However, the obtained specifications are naturally incomplete, leaving
the synthesis engine with a difficult task of synthesizing a general
solution from a sparse space of many possible solutions that are
consistent with the provided specifications but that do not
necessarily generalize.
We present S3, a new repair synthesis engine that leverages
programming-by-examples methodology to synthesize high-quality bug
repairs. The novelty in S3 that allows it to tackle the sparse search
space to create more general repairs is three-fold: (1) A systematic
way to customize and constrain the syntactic search space via a
domain-specific language, (2) An efficient enumeration-based search
strategy over the constrained search space, and (3) A number of
ranking features based on measures of the syntactic and semantic
distances between candidate solutions and the original buggy
program. We compare S3’s repair effectiveness with state-of-the-art
synthesis engines Angelix, Enumerative, and CVC4. S3 can successfully
and correctly fix at least three times more bugs than the best
baseline on datasets of 52 bugs in small programs, and 100 bugs in
real-world large programs.
-
JFIX: Semantics-Based Repair of Java Programs via Symbolic PathFinder
Xuan-Bach D. Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser.
ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA'17). Tool Demo.
[abstract]
[ Paper PDF ]
Abstract:
Recently there has been a proliferation of automated program repair
(APR) techniques, targeting various programming languages. Such
techniques can be generally classified into two families: syntactic-
and semantics-based. Semantics-based APR, on which we focus, typically
uses symbolic execution to infer semantic constraints and then program
synthesis to construct repairs conforming to them. While
syntactic-based APR techniques have been shown to be successful on bugs in
real-world programs written in both C and Java, semantics-based APR
techniques mostly target C programs. This leaves empirical comparisons
of the APR families not fully explored, and developers without a
Java-based semantics APR technique. We present JFix, a semantics-based
APR framework that targets Java, and an associated Eclipse
plugin. JFix is implemented atop Symbolic PathFinder, a well-known
symbolic execution engine for Java programs. It extends one particular
APR technique (Angelix), and is designed to be sufficiently generic to
support a variety of such techniques. We demonstrate that
semantics-based APR can indeed efficiently and effectively repair a
variety of classes of bugs in large real-world Java programs. This
supports our claim that the framework can both support developers
seeking semantics-based repair of bugs in Java programs and
enable larger-scale empirical studies comparing syntactic- and
semantics-based APR targeting Java. The demonstration of our tool is
available via the project website at:
https://xuanbachle.github.io/semanticsrepair/
-
Model Counting for Recursively-Defined Strings
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar.
29th International Conference on Computer Aided Verification (CAV'17).
[abstract]
[ Paper PDF ]
Abstract:
We present a new algorithm
for model counting of a class of string constraints. In addition to
the classic operation of concatenation, our class includes some
recursively defined operations such as Kleene closure, and replacement
of substrings. Additionally, our class also includes length
constraints on the string expressions, which means, by requiring
reasoning about numbers, that we face a multi-sorted logic. In the end,
our string constraints are motivated by their use in programming for
web applications. Our algorithm comprises two novel features: the
ability to use a technique of (1) partial derivatives for constraints
that are already in a solved form, i.e. a form where its (string)
satisfiability is clearly displayed, and (2) non-progression, where
cyclic reasoning in the reduction process may be terminated (thus
allowing for the algorithm to look elsewhere). Finally, we
experimentally compare our model counter with two recent works on
model counting of similar constraints, SMC [18] and ABC [5], to
demonstrate its superior performance.
-
Making Smart Contracts Smarter
Loi Luu, Duc-Hiep Chu, Hrishi Olickel, Prateek Saxena, Aquinas Hobor
23rd ACM SIGSAC Conference on Computer and Communications Security (CCS'16).
[abstract]
[ Paper PDF ]
Abstract:
Cryptocurrencies record transactions in a decentralized data structure
called a blockchain. Two of the most popular cryptocurrencies, Bitcoin
and Ethereum, support the feature to encode rules or scripts for
processing transactions. This feature has evolved to give practical
shape to the ideas of smart contracts, or full-fledged programs that
are run on blockchains. Recently, Ethereum's smart contract system has
seen steady adoption, supporting tens of thousands of contracts,
holding tens of millions of dollars' worth of virtual coins.
In this paper, we investigate the security of running Ethereum smart contracts
in an open distributed network like those of cryptocurrencies. We
introduce several new security problems in which an adversary can
manipulate smart contract execution to gain profit. These bugs suggest
subtle gaps in the understanding of the distributed semantics of the
underlying platform. As a refinement, we propose ways to enhance the
operational semantics of Ethereum to make contracts less
vulnerable. For developers writing contracts for the existing Ethereum
system, we build a symbolic execution tool called Oyente to find
potential security bugs. Among 19,366 existing Ethereum contracts,
Oyente flags 8,519 of them as vulnerable. We discuss the severity of
attacks for several case studies which have source code available and
confirm the attacks (which target only our accounts) in the main
Ethereum network.
-
Progressive Reasoning over Recursively-Defined Strings
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar.
28th International Conference on Computer Aided Verification (CAV'16).
[abstract]
[ Paper PDF ]
Abstract:
We consider the problem of reasoning over an expressive constraint
language for unbounded strings. The difficulty comes from
"recursively defined" functions such as replace,
making state-of-the-art algorithms non-terminating. Our first
contribution is a progressive search algorithm to not only mitigate
the problem of non-terminating reasoning but also guide the search
towards a "minimal solution" when the input formula
is in fact satisfiable. We have implemented our method using the
state-of-the-art Z3 framework. Importantly, we have enabled conflict
clause learning for string theory so that our solver can be used
effectively in the setting of program verification. Finally, our
experimental evaluation shows leadership in a large benchmark suite,
and a first deployment for another benchmark suite which requires
reasoning about string formulas of a class that has not been solved
before.
-
Symbolic Execution for Memory Consumption Analysis
Duc-Hiep Chu, Joxan Jaffar, and Rasool Maghareh.
17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools and Theory for Embedded Systems (LCTES'16).
[abstract]
[ Paper PDF ]
[ Slides PDF ]
Abstract:
With the advances in both hardware and software of embedded systems in
the past few years, dynamic memory allocation can now be safely used
in embedded software. As a result, the need to develop methods to
avoid heap overflow errors in safety-critical embedded systems has
increased. Resource analysis of imperative programs with non-regular
loop patterns and signed integers, to support both memory allocation
and deallocation, has long been an open problem. Existing methods can
generate symbolic bounds that are parametric w.r.t. the program
inputs; such bounds, however, are imprecise in the presence of
non-regular loop patterns.
In this paper, we present a worst-case memory consumption analysis,
based upon the framework of symbolic execution. Our assumption is
that loops (and recursions) of to-be-analyzed programs are indeed
bounded. We then can exhaustively unroll loops and the memory
consumption of each iteration can be precisely computed and summarized
for aggregation. Because of path-sensitivity, our algorithm generates
more precise bounds. Importantly, we demonstrate that by introducing
a new concept of reuse, symbolic execution scales to a set of
realistic benchmark programs.
-
Precise Cache Timing Analysis via Symbolic Execution
Duc-Hiep Chu, Joxan Jaffar, and Rasool Maghareh.
22nd IEEE Real-Time Embedded Technology and Applications Symposium (RTAS'16).
[abstract]
[ Paper PDF ]
Abstract:
We present a framework for WCET analysis of programs with emphasis on
cache micro-architecture. Such an analysis is challenging primarily
because of the timing model of a dynamic nature, that is, the
timing of a basic block is heavily dependent on the context in which
it is executed. At its core, our algorithm is based on symbolic
execution, and an analysis is obtained by locating the "longest"
symbolic execution path. Clearly a challenge is the intractable
number of paths in the symbolic execution tree. Traditionally this
challenge is met by performing some form of abstraction in the path
generation process but this leads to a loss of path-sensitivity and
thus precision in the analysis. The key feature of our algorithm is
the ability for reuse. This is critical for maintaining a
high level of path-sensitivity, which in turn produces significantly
increased accuracy. In other words, reuse allows scalability in
path-sensitive exploration. Finally, we present an experimental
evaluation on well known benchmarks in order to show two things: that
systematic path-sensitivity in fact brings significant accuracy gains,
and that the algorithm still scales well.
-
Automatic Induction Proofs of Data-Structures in Imperative Programs
Duc-Hiep Chu, Joxan Jaffar, and Minh-Thai Trinh.
36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'15).
[abstract]
[ Paper PDF ]
[ Slides PDF ]
[errata]
Abstract:
We consider the problem of automated reasoning about
dynamically manipulated data structures. Essential properties
are encoded as predicates whose definitions are formalised
via user-defined recursive rules. Traditionally, proving
relationships between such properties is limited to the
unfold-and-match (U+M) paradigm which employs systematic
transformation steps of folding/unfolding the rules. A
proof, using U+M, succeeds when we find a sequence of
transformations that produces a final formula which is
obviously provable by simply matching terms.
Our contribution here is the addition of the fundamental
principle of induction to this automated process. We first
show that some proof obligations that are dynamically
generated in the process can be used as induction hypotheses in
the future, and then we show how to use these hypotheses
in an induction step which generates a new proof obligation
aside from those obtained from the fold/unfold operations.
While adding induction is an obvious need in general,
no automated method has managed to include this in
a systematic and general way. The main reason for this is
the problem of avoiding circular reasoning.
Our main result overcomes this with a novel checking condition.
Our contribution is a proof method which -- beyond U+M --
performs automatic formula re-writing by treating previously
encountered obligations in each proof path as possible induction
hypotheses.
In the practical evaluation part of this paper, we show how
the commonly used technique of using unproven lemmas can
be avoided, using realistic benchmarks. This not only removes
the current burden of coming up with the appropriate
lemmas, but also significantly speeds up the verification
process, since lemma applications, coupled with unfolding,
often induce a very large search space. In the end, our method
automatically reasons about a new class of formulas arising
from practical program verification.
Errata :
In Table 1, we made a number of mistakes when compressing the lemmas in
the format of explicit heaps into SL formulas. We thank Ta Quang Trung
for pointing them out.
-
A Framework to Synergize Partial Order Reduction with State Interpolation
Duc-Hiep Chu and Joxan Jaffar.
10th Haifa Verification Conference (HVC'14).
[abstract]
[ Paper PDF ]
[ Slides PDF ]
[ Report PDF ]
Abstract:
We address the problem of reasoning about interleavings in safety verification of concurrent programs.
In the literature, there are two prominent techniques for pruning the search space.
First, there are well-investigated trace-based methods, collectively known as "Partial Order Reduction (POR)",
which operate by weakening the concept of a trace by abstracting the total order of its transitions into a partial order.
Second, there is state-based interpolation where a collection of formulas can be generalized by taking into account the
property to be verified. Our main contribution is a framework that synergistically combines POR with state
interpolation so that the whole is greater than the sum of its parts.
-
S3: A Symbolic String Solver for Vulnerability Detection in Web Applications
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar.
21st ACM SIGSAC Conference on Computer and Communications Security (CCS'14).
[abstract]
[ Paper PDF ]
Abstract:
Motivated by the vulnerability analysis of web programs which
work on string inputs, we present S3, a new symbolic string solver.
Our solver employs a new algorithm for a constraint language that
is expressive enough for widespread applicability. Specifically, our
language covers all the main string operations, such as those in
JavaScript. The algorithm first makes use of a symbolic representation
so that membership in a set defined by a regular expression
can be encoded as string equations. Secondly, there is a constraint-based
generation of instances from these symbolic expressions so
that the total number of instances can be limited. We evaluate S3
on a well-known set of practical benchmarks, demonstrating both
its robustness (more definitive answers) and its efficiency
(about 20 times faster) relative to the state-of-the-art.
-
Lazy Symbolic Execution for Enhanced Learning
Duc-Hiep Chu, Joxan Jaffar, and Vijayaraghavan Murali.
14th International Conference on Runtime Verification (RV'14).
[abstract]
[ Paper PDF ]
Abstract:
The performance of symbolic execution based verifiers relies heavily on the quality of "interpolants", formulas which
succinctly describe a generalization of states proven safe so far.
By default, symbolic execution along a path stops the moment when infeasibility
is detected in its path constraints, a property we call "eagerness". In this paper, we argue that
eagerness may hinder the discovery of good quality interpolants, and propose a systematic method
that ignores the infeasibility in pursuit of better interpolants. We demonstrate with a state-of-the-art
system on realistic benchmarks that this "lazy" symbolic execution outperforms its eager counterpart
by a factor of two or more.
-
Path-Sensitive Resource Analysis Compliant with Assertions
Duc-Hiep Chu and Joxan Jaffar.
13th International Conference on Embedded Software (EMSOFT'13).
[abstract]
[ Paper PDF ]
[ Slides PDF ]
Abstract:
We consider the problem of bounding the worst-case resource usage of programs,
where assertions about valid program executions may be enforced at selected program points.
It is folklore that to be precise, path-sensitivity (up to loops) is needed. This entails
unrolling loops in the manner of symbolic simulation. To be tractable, however,
the treatment of the individual loop iterations must be greedy in the sense that once analysis
is finished on one iteration, we cannot backtrack to change it. We show that under these conditions,
enforcing assertions produces unsound results. The fundamental reason is that complying with assertions
requires the analysis to be fully sensitive (also with loops) wrt. the assertion variables.
We then present an algorithm where the treatment of each loop is separated in two phases.
The first phase uses a greedy strategy in unrolling the loop. This phase explores what is
conceptually a symbolic execution tree, which is of enormous size, while eliminating infeasible
paths and dominated paths that are guaranteed not to contribute to the worst-case bound.
A compact representation is produced at the end of this phase. Finally, the second phase attacks
the remaining problem, to determine the worst-case path in the simplified tree,
excluding all paths that violate the assertions from bound calculation.
Scalability, in both phases, is achieved via an adaptation of a dynamic programming algorithm.
-
A Complete Method for Symmetry Reduction in Safety Verification
Duc-Hiep Chu and Joxan Jaffar.
24th International Conference on Computer Aided Verification (CAV'12).
[abstract]
[ Paper PDF ]
[ Slides PDF ]
Abstract: Symmetry reduction is a well-investigated technique to counter
the state space explosion problem for reasoning about a concurrent system of similar processes.
Here we present a general method for its application, restricted to verification of safety properties,
but without any prior knowledge about global symmetry. We start by using a notion of weak symmetry
which allows for more reduction than in previous notions of symmetry. This notion is relative
to the target safety property. The key idea is to perform symmetric transformations on state interpolation,
a concept which has been used widely for pruning in SMT and CEGAR. Our method naturally favors "quite symmetric"
systems: more similarity among the processes leads to greater pruning of the tree.
The main result is that the method is complete wrt. weak symmetry: it only considers
states which are not weakly symmetric to an already encountered state.
-
Symbolic Simulation on Complicated Loops for WCET Path Analysis
Duc-Hiep Chu and Joxan Jaffar.
11th International Conference on Embedded Software (EMSOFT'11).
[abstract]
[ Paper PDF ]
[ Slides PDF ]
Abstract:
We address the Worst-Case Execution Time (WCET) Path Analysis problem for bounded programs,
formalized as discovering a tight upper bound of a resource variable. A key challenge
is posed by complicated loops whose iterations exhibit non-uniform behavior. We adopt a
brute-force strategy by simply unrolling them, and show how to make this scalable while preserving accuracy.
Our algorithm performs symbolic simulation of the program. It maintains accuracy because it preserves,
at critical points, path-sensitivity. In other words, the simulation detects infeasible paths.
Scalability, on the other hand, is dealt with by using summarizations, compact representations
of the analyses of loop iterations. They are obtained by a judicious use of abstraction
which preserves critical information flowing from one iteration to another.
These summarizations can be compounded in order for the simulation to have linear complexity:
the symbolic execution can in fact be asymptotically shorter than a concrete execution.
Finally, we present a comprehensive experimental evaluation using a standard benchmark suite.
We show that our algorithm is fast, and importantly, we often obtain not just accurate but exact results.