Recent/Unpublished Work
-
Automatic Reasoning on Recursive Data Structures with Sharing
Duc-Hiep Chu and Joxan Jaffar.
[abstract]
[ arXiv ]
[ Draft PDF ]
Abstract:
We consider the problem of automatically verifying programs which
manipulate arbitrary data structures. Our specification language is
expressive, contains a notion of separation, and thus enables a
precise specification of frames. The main contribution then is a
program verification method which combines strongest postcondition
reasoning in the form of symbolic execution, unfolding recursive
definitions of the data structure in question, and a new frame rule to
achieve local reasoning so that proofs can be compositional. Finally,
we present an implementation of our verifier, and demonstrate
automation on a number of representative programs. In particular, we
present the first automatic proof of a classic graph marking
algorithm, paving the way for dealing with a class of programs which
traverse a complex data structure.
-
Bounded Verification and Testing of Heap Programs
Duc-Hiep Chu, Joxan Jaffar, and Andrew E. Santosa
-
S3N: An Efficient Native String Theory Solver for Z3
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar
-
Incremental Quantitative Analysis on Dynamic Costs
Duc-Hiep Chu, Joxan Jaffar, and Vijay Murali.
[abstract]
[ arXiv ]
[ Draft PDF ]
Abstract:
In quantitative program analysis, values are assigned to execution traces
to represent a quality measure. Such analyses cover important
applications, e.g. resource usage. Examining all traces is well known
to be intractable and therefore traditional algorithms reason over an
over-approximated set. Typically, inaccuracy arises due to
inclusion of infeasible paths in this set. Thus
path-sensitivity is one cure. However, there is another reason for
the inaccuracy: that the cost model, i.e., the way in which the analysis
of each trace is quantified, is dynamic. That is, the cost of
a trace is dependent on the context in which the trace is executed.
Thus the goal of accurate analysis, already challenged by
path-sensitivity, is now further challenged by context-sensitivity.
In this paper, we address the problem of quantitative analysis defined
over a dynamic cost model.
Our algorithm is an "anytime" algorithm: it generates
an answer quickly, but if the resource budget allows,
it progressively produces better solutions via refinement
iterations. The result of each iteration remains sound and,
importantly, converges to an exact analysis when given
an unlimited resource budget.
In this way, our algorithm is incremental.
We finally give evidence that a new level of practicality is achieved
by an evaluation on a realistic collection of benchmarks.
Peer-Reviewed Publications
-
S3: Syntax- and Semantic-Guided Repair Synthesis via Programming by Examples
Xuan-Bach D. Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser.
ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE'17).
[abstract]
[ Paper PDF ]
Abstract:
A notable class of techniques for automatic program repair is known as
semantics-based. Such techniques, e.g., Angelix, infer semantic
specifications via symbolic execution, and then use program synthesis
to construct new code that satisfies those inferred specifications.
However, the obtained specifications are naturally incomplete, leaving
the synthesis engine with a difficult task of synthesizing a general
solution from a sparse space of many possible solutions that are
consistent with the provided specifications but that do not
necessarily generalize.
We present S3, a new repair synthesis engine that leverages
programming-by-examples methodology to synthesize high-quality bug
repairs. The novelty in S3 that allows it to tackle the sparse search
space to create more general repairs is three-fold: (1) A systematic
way to customize and constrain the syntactic search space via a
domain-specific language, (2) An efficient enumeration-based search
strategy over the constrained search space, and (3) A number of
ranking features based on measures of the syntactic and semantic
distances between candidate solutions and the original buggy
program. We compare S3’s repair effectiveness with state-of-the-art
synthesis engines Angelix, Enumerative, and CVC4. S3 can successfully
and correctly fix at least three times more bugs than the best
baseline on datasets of 52 bugs in small programs, and 100 bugs in
real-world large programs.
-
JFIX: Semantics-Based Repair of Java Programs via Symbolic PathFinder
Xuan-Bach D. Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser.
ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA'17). Tool Demo.
[abstract]
[ Paper PDF ]
Abstract:
Recently there has been a proliferation of automated program repair
(APR) techniques, targeting various programming languages. Such
techniques can be generally classified into two families: syntactic-
and semantics-based. Semantics-based APR, on which we focus, typically
uses symbolic execution to infer semantic constraints and then program
synthesis to construct repairs conforming to them. While
syntactic-based APR techniques have been shown to be successful on bugs in
real-world programs written in both C and Java, semantics-based APR
techniques mostly target C programs. This leaves empirical comparisons
of the APR families not fully explored, and developers without a
Java-based semantics APR technique. We present JFix, a semantics-based
APR framework that targets Java, and an associated Eclipse
plugin. JFix is implemented atop Symbolic PathFinder, a well-known
symbolic execution engine for Java programs. It extends one particular
APR technique (Angelix), and is designed to be sufficiently generic to
support a variety of such techniques. We demonstrate that
semantics-based APR can indeed efficiently and effectively repair a
variety of classes of bugs in large real-world Java programs. This
supports our claim that the framework can both support developers
seeking semantics-based repair of bugs in Java programs and
enable larger-scale empirical studies comparing syntactic- and
semantics-based APR targeting Java. The demonstration of our tool is
available via the project website at:
https://xuanbachle.github.io/semanticsrepair/
-
Model Counting for Recursively-Defined Strings
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar.
29th International Conference on Computer Aided Verification (CAV'17).
[abstract]
[ Paper PDF ]
Abstract:
We present a new algorithm
for model counting of a class of string constraints. In addition to
the classic operation of concatenation, our class includes some
recursively defined operations such as Kleene closure, and replacement
of substrings. Additionally, our class also includes length
constraints on the string expressions, which means, by requiring
reasoning about numbers, that we face a multi-sorted logic. In the end,
our string constraints are motivated by their use in programming for
web applications. Our algorithm comprises two novel features: the
ability to use a technique of (1) partial derivatives for constraints
that are already in a solved form, i.e. a form where its (string)
satisfiability is clearly displayed, and (2) non-progression, where
cyclic reasoning in the reduction process may be terminated (thus
allowing for the algorithm to look elsewhere). Finally, we
experimentally compare our model counter with two recent works on
model counting of similar constraints, SMC [18] and ABC [5], to
demonstrate its superior performance.
-
Making Smart Contracts Smarter
Loi Luu, Duc-Hiep Chu, Hrishi Olickel, Prateek Saxena, Aquinas Hobor
23rd ACM SIGSAC Conference on Computer and Communications Security (CCS'16).
[abstract]
[ Paper PDF ]
Abstract:
Cryptocurrencies record transactions in a decentralized data structure
called a blockchain. Two of the most popular cryptocurrencies, Bitcoin
and Ethereum, support the feature to encode rules or scripts for
processing transactions. This feature has evolved to give practical
shape to the ideas of smart contracts, or full-fledged programs that
are run on blockchains. Recently, Ethereum's smart contract system has
seen steady adoption, supporting tens of thousands of contracts,
holding tens of millions of dollars' worth of virtual coins.
In this paper, we investigate the security of running Ethereum smart contracts
in an open distributed network like those of cryptocurrencies. We
introduce several new security problems in which an adversary can
manipulate smart contract execution to gain profit. These bugs suggest
subtle gaps in the understanding of the distributed semantics of the
underlying platform. As a refinement, we propose ways to enhance the
operational semantics of Ethereum to make contracts less
vulnerable. For developers writing contracts for the existing Ethereum
system, we build a symbolic execution tool called Oyente to find
potential security bugs. Among 19,366 existing Ethereum contracts,
Oyente flags 8,519 of them as vulnerable. We discuss the severity of
attacks for several case studies which have source code available and
confirm the attacks (which target only our accounts) in the main
Ethereum network.
-
Progressive Reasoning over Recursively-Defined Strings
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar.
28th International Conference on Computer Aided Verification (CAV'16).
[abstract]
[ Paper PDF ]
Abstract:
We consider the problem of reasoning over an expressive constraint
language for unbounded strings. The difficulty comes from
"recursively defined" functions such as replace,
making state-of-the-art algorithms non-terminating. Our first
contribution is a progressive search algorithm to not only mitigate
the problem of non-terminating reasoning but also guide the search
towards a "minimal solution" when the input formula
is in fact satisfiable. We have implemented our method using the
state-of-the-art Z3 framework. Importantly, we have enabled conflict
clause learning for string theory so that our solver can be used
effectively in the setting of program verification. Finally, our
experimental evaluation shows leadership in a large benchmark suite,
and a first deployment for another benchmark suite which requires
reasoning about string formulas of a class that has not been solved
before.
-
Symbolic Execution for Memory Consumption Analysis
Duc-Hiep Chu, Joxan Jaffar, and Rasool Maghareh.
17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools and Theory for Embedded Systems (LCTES'16).
[abstract]
[ Paper PDF ]
[ Slides PDF ]
Abstract:
With the advances in both hardware and software of embedded systems in
the past few years, dynamic memory allocation can now be safely used
in embedded software. As a result, the need to develop methods to
avoid heap overflow errors in safety-critical embedded systems has
increased. Resource analysis of imperative programs with non-regular
loop patterns and signed integers, to support both memory allocation
and deallocation, has long been an open problem. Existing methods can
generate symbolic bounds that are parametric w.r.t. the program
inputs; such bounds, however, are imprecise in the presence of
non-regular loop patterns.
In this paper, we present a worst-case memory consumption analysis,
based upon the framework of symbolic execution. Our assumption is
that loops (and recursions) of to-be-analyzed programs are indeed
bounded. We then can exhaustively unroll loops and the memory
consumption of each iteration can be precisely computed and summarized
for aggregation. Because of path-sensitivity, our algorithm generates
more precise bounds. Importantly, we demonstrate that by introducing
a new concept of reuse, symbolic execution scales to a set of
realistic benchmark programs.
-
Precise Cache Timing Analysis via Symbolic Execution
Duc-Hiep Chu, Joxan Jaffar, and Rasool Maghareh.
22nd IEEE Real-Time Embedded Technology and Applications Symposium (RTAS'16).
[abstract]
[ Paper PDF ]
Abstract:
We present a framework for WCET analysis of programs with emphasis on
cache micro-architecture. Such an analysis is challenging primarily
because of the timing model of a dynamic nature, that is, the
timing of a basic block is heavily dependent on the context in which
it is executed. At its core, our algorithm is based on symbolic
execution, and an analysis is obtained by locating the "longest"
symbolic execution path. Clearly a challenge is the intractable
number of paths in the symbolic execution tree. Traditionally this
challenge is met by performing some form of abstraction in the path
generation process but this leads to a loss of path-sensitivity and
thus precision in the analysis. The key feature of our algorithm is
the ability for reuse. This is critical for maintaining a
high level of path-sensitivity, which in turn produces significantly
increased accuracy. In other words, reuse allows scalability in
path-sensitive exploration. Finally, we present an experimental
evaluation on well known benchmarks in order to show two things: that
systematic path-sensitivity in fact brings significant accuracy gains,
and that the algorithm still scales well.
-
Automatic Induction Proofs of Data-Structures in Imperative Programs
Duc-Hiep Chu, Joxan Jaffar, and Minh-Thai Trinh.
36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'15).
[abstract]
[ Paper PDF ]
[ Slides PDF ]
[errata]
Abstract:
We consider the problem of automated reasoning about
dynamically manipulated data structures. Essential properties
are encoded as predicates whose definitions are formalised
via user-defined recursive rules. Traditionally, proving
relationships between such properties is limited to the
unfold-and-match (U+M) paradigm which employs systematic
transformation steps of folding/unfolding the rules. A
proof, using U+M, succeeds when we find a sequence of
transformations that produces a final formula which is
obviously provable by simply matching terms.
Our contribution here is the addition of the fundamental
principle of induction to this automated process. We first
show that some proof obligations that are dynamically
generated in the process can be used as induction hypotheses in
the future, and then we show how to use these hypotheses
in an induction step which generates a new proof obligation
aside from those obtained from the fold/unfold operations.
While adding induction is an obvious need in general,
no automated method has managed to include this in
a systematic and general way. The main reason for this is
the problem of avoiding circular reasoning.
Our main result overcomes this with a novel checking condition.
Our contribution is a proof method which -- beyond U+M --
performs automatic formula re-writing by treating previously
encountered obligations in each proof path as possible induction
hypotheses.
In the practical evaluation part of this paper, we show how
the commonly used technique of using unproven lemmas can
be avoided, using realistic benchmarks. This not only removes
the current burden of coming up with the appropriate
lemmas, but also significantly speeds up the verification
process, since lemma applications, coupled with unfolding,
often induce a very large search space. In the end, our method
automatically reasons about a new class of formulas arising
from practical program verification.
Errata :
In Table 1, we made a number of mistakes when compressing the lemmas in
the format of explicit heaps into SL formulas. We thank Ta Quang Trung
for pointing them out.
-
A Framework to Synergize Partial Order Reduction with State Interpolation
Duc-Hiep Chu and Joxan Jaffar.
10th Haifa Verification Conference (HVC'14).
[abstract]
[ Paper PDF ]
[ Slides PDF ]
[ Report PDF ]
Abstract:
We address the problem of reasoning about interleavings in safety verification of concurrent programs.
In the literature, there are two prominent techniques for pruning the search space.
First, there are well-investigated trace-based methods, collectively known as "Partial Order Reduction (POR)",
which operate by weakening the concept of a trace by abstracting the total order of its transitions into a partial order.
Second, there is state-based interpolation where a collection of formulas can be generalized by taking into account the
property to be verified. Our main contribution is a framework that synergistically combines POR with state
interpolation so that the whole is greater than the sum of its parts.
-
S3: A Symbolic String Solver for Vulnerability Detection in Web Applications
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar.
21st ACM SIGSAC Conference on Computer and Communications Security (CCS'14).
[abstract]
[ Paper PDF ]
Abstract:
Motivated by the vulnerability analysis of web programs which
work on string inputs, we present S3, a new symbolic string solver.
Our solver employs a new algorithm for a constraint language that
is expressive enough for widespread applicability. Specifically, our
language covers all the main string operations, such as those in
JavaScript. The algorithm first makes use of a symbolic representation
so that membership in a set defined by a regular expression
can be encoded as string equations. Secondly, there is a constraint-based
generation of instances from these symbolic expressions so
that the total number of instances can be limited. We evaluate S3
on a well-known set of practical benchmarks, demonstrating both
its robustness (more definitive answers) and its efficiency
(about 20 times faster) relative to the state-of-the-art.
-
Lazy Symbolic Execution for Enhanced Learning
Duc-Hiep Chu, Joxan Jaffar, and Vijayaraghavan Murali.
14th International Conference on Runtime Verification (RV'14).
[abstract]
[ Paper PDF ]
Abstract:
The performance of symbolic execution based verifiers relies heavily on the quality of "interpolants", formulas which
succinctly describe a generalization of states proven safe so far.
By default, symbolic execution along a path stops the moment when infeasibility
is detected in its path constraints, a property we call "eagerness". In this paper, we argue that
eagerness may hinder the discovery of good quality interpolants, and propose a systematic method
that ignores the infeasibility in pursuit of better interpolants. We demonstrate with a state-of-the-art
system on realistic benchmarks that this "lazy" symbolic execution outperforms its eager counterpart
by a factor of two or more.
-
Path-Sensitive Resource Analysis Compliant with Assertions
Duc-Hiep Chu and Joxan Jaffar.
13th International Conference on Embedded Software (EMSOFT'13).
[abstract]
[ Paper PDF ]
[ Slides PDF ]
Abstract:
We consider the problem of bounding the worst-case resource usage of programs,
where assertions about valid program executions may be enforced at selected program points.
It is folklore that to be precise, path-sensitivity (up to loops) is needed. This entails
unrolling loops in the manner of symbolic simulation. To be tractable, however,
the treatment of the individual loop iterations must be greedy in the sense that once analysis
is finished on one iteration, we cannot backtrack to change it. We show that under these conditions,
enforcing assertions produces unsound results. The fundamental reason is that complying with assertions
requires the analysis to be fully sensitive (also with loops) wrt. the assertion variables.
We then present an algorithm where the treatment of each loop is separated in two phases.
The first phase uses a greedy strategy in unrolling the loop. This phase explores what is
conceptually a symbolic execution tree, which is of enormous size, while eliminating infeasible
paths and dominated paths that are guaranteed not to contribute to the worst-case bound.
A compact representation is produced at the end of this phase. Finally, the second phase attacks
the remaining problem, to determine the worst-case path in the simplified tree,
excluding all paths that violate the assertions from bound calculation.
Scalability, in both phases, is achieved via an adaptation of a dynamic programming algorithm.
-
A Complete Method for Symmetry Reduction in Safety Verification
Duc-Hiep Chu and Joxan Jaffar.
24th International Conference on Computer Aided Verification (CAV'12).
[abstract]
[ Paper PDF ]
[ Slides PDF ]
Abstract: Symmetry reduction is a well-investigated technique to counter
the state space explosion problem for reasoning about a concurrent system of similar processes.
Here we present a general method for its application, restricted to verification of safety properties,
but without any prior knowledge about global symmetry. We start by using a notion of weak symmetry
which allows for more reduction than in previous notions of symmetry. This notion is relative
to the target safety property. The key idea is to perform symmetric transformations on state interpolation,
a concept which has been used widely for pruning in SMT and CEGAR. Our method naturally favors "quite symmetric"
systems: more similarity among the processes leads to greater pruning of the tree.
The main result is that the method is complete wrt. weak symmetry: it only considers
states which are not weakly symmetric to an already encountered state.
-
Symbolic Simulation on Complicated Loops for WCET Path Analysis
Duc-Hiep Chu and Joxan Jaffar.
11th International Conference on Embedded Software (EMSOFT'11).
[abstract]
[ Paper PDF ]
[ Slides PDF ]
Abstract:
We address the Worst-Case Execution Time (WCET) Path Analysis problem for bounded programs,
formalized as discovering a tight upper bound of a resource variable. A key challenge
is posed by complicated loops whose iterations exhibit non-uniform behavior. We adopt a
brute-force strategy by simply unrolling them, and show how to make this scalable while preserving accuracy.
Our algorithm performs symbolic simulation of the program. It maintains accuracy because it preserves,
at critical points, path-sensitivity. In other words, the simulation detects infeasible paths.
Scalability, on the other hand, is dealt with by using summarizations, compact representations
of the analyses of loop iterations. They are obtained by a judicious use of abstraction
which preserves critical information flowing from one iteration to another.
These summarizations can be compounded in order for the simulation to have linear complexity:
the symbolic execution can in fact be asymptotically shorter than a concrete execution.
Finally, we present a comprehensive experimental evaluation using a standard benchmark suite.
We show that our algorithm is fast, and importantly, we often obtain not just accurate but exact results.