Introduction
This is a tutorial aimed at training developers how to leverage the Software Analysis Workbench (SAW) and Cryptol to develop and verify cryptographic implementations in a Continuous Reasoning paradigm. Continuous Reasoning is, roughly,
formal reasoning about a (changing) codebase … done in a fashion which mirrors the iterative, continuous model of software development that is increasingly practiced in industry.
SAW and Cryptol can be used by a Continuous Integration (CI) system to enforce invariants (safety, security, and functional) that software must have at certain stages in a software development pipeline. Some industrial examples include AWS’s s2n and Supranational’s BLST.
Prerequisites
Before working through this lab, you’ll need
- A recent version of Python, Cryptol, SAW, saw-remote-api, SAW’s Python client package, and the clang C compiler to be installed and
- an editor for completing the exercises in this file.
You’ll also need experience with the following languages
- Python
- Cryptol
- SAW
- C
Skills You’ll Learn
By the end of this lab you will be comfortable using SAW’s remote API to formally verify cryptographic implementations. Since the bulk of work with SAW lies in crafting linkages between the source language and spec, this lab is filled with examples of different types of data structures and how to represent them in SAW. Hence, this lab also exists as a kind of reference that you may return to when using these tools. As well, there are some examples here and a full-featured tutorial on SAW here.
Why Use SAW?
Previous labs demonstrated how to look at algorithm specifications from math papers and formally represent them in Cryptol. A software/hardware developer now can create a performant implementation with confidence since they are armed with a bit-precise, unambiguous, machine-readable, formally verified Cryptol specification.
Of course, any developer desires that the code they write completely adheres to all of the cryptographic properties pertaining to the target algorithm. They also want to uphold best coding practices for their implementation language and avoid vulnerabilities such as buffer or integer overflows. After all, what good is cryptography if an attacker can take advantage of an implementation flaw?
This is where the Software Analysis Workbench (SAW) comes in. SAW is a tool that formally verifies properties of compiled code using automated reasoning tools. Alright, that’s a lot of fancy words, but essentially SAW can, in a fairly push-button way, verify that, for all inputs, a function does what it’s supposed to do and nothing more. This makes SAW a much more powerful tool for assuring code compared to unit tests, which are limited in their testing scope and cannot test for all possible, potentially dangerous edge cases.
In order to get SAW to work its magic, source code needs to be compiled - SAW works on low-level implementation languages such as LLVM bitcode, Java byte code, low-level Verilog, and even some assembly languages. The examples in this lab demonstrate verification of cryptographic implementations written in the C language, so below you’ll see C source code being compiled to LLVM bitcode.
Next, we need to tell SAW exactly what to verify. For every function/method, SAW needs to know:
- State (i.e. global variables/objects)
- Input arguments
- Output arguments
- Preconditions on state and arguments
- Postconditions on state and arguments
This information is packaged into a script for SAW to process. This script can either be written in Python or SAW’s special purpose language, SAWScript. For this lab, the scripts are written in Python since, as a general purpose programming language, it provides a wider range of techniques when expressing properties and it is much more widely known than SAWScript. Python also fairly easily integrates into CI systems.
Combining all of these different languages (Python, C, LLVM, SAWScript, Cryptol) and tools together into one workflow is definitely a challenging process, especially for newcomers. However, learning how to use these components provides a very powerful method for developing cryptographic implementations and proving them correct and secure.
Before setting up the lab, let’s recap the overall SAW workflow using the diagram below.
- A developer begins with a math paper algorithm specification that defines how an algorithm behaves.
- They then take that paper and write up a version of it in Cryptol (or grab one that already exists). Trust is gained in the Cryptol specification via properties defined in the paper and proven in the Cryptol specification.
- The developer begins implementing the algorithm in a general purpose programming language such as C/C++ or Java.
- As they develop their implementation, SAW is used to verify that functions/methods behave according to the properties in a Cryptol specification.
- The implementation is continuously improved and tested until verified by SAW.
Setting Everything Up
To run any of the examples in this lab, first start the Software
Analysis Workbench (SAW) remote API (saw-remote-api
). If using the
development container that comes with this course
(ghcr.io/weaversa/cryptol-course
), enter the following command in
a terminal:
$ start-saw-remote-api
Otherwise, this tool can be installed by following the instructions
for SAW found in the Installation lab. Once
installed, to run saw-remote-api
, enter the following commands in
a terminal:
$ export SAW_SERVER_URL=http://0.0.0.0:36691
$ saw-remote-api http --host 0.0.0.0 --port 36691 / &
Congrats! A server version of SAW is now running in the background
ready to accept commands on port 36691
.
SAW and Python
If you followed the instructions above, you now have SAW running in
the background, waiting for a connection on port 36691
. Galois
provides a Python client package
that allows you to write formal/logical contracts to enforce
invariants on some given software.
Python Contracts Introduction
Here’s the general layout for SAW in Python:
# Import SAW's Python modules
class contractName(Contract):
def specification(self):
# Initialization and Preconditions
# Execute Function
# Initialization for output variables (optional)
# Postconditions and Return
class testName(unittest.TestCase):
def specificTestName(self):
# Verify contracts
if __name__ == "__main__":
unittest.main()
Imports
To start making Python contracts first import the necessary files:
import unittest
from saw_client import *
from saw_client.crucible import *
from saw_client.llvm import *
from saw_client.proofscript import *
from saw_client.llvm_type import *
We put *
for simplicity, but if one only wanted certain functions,
they could refer to this table for some common imports:
Package | Values |
---|---|
saw_client.crucible |
cry , cry_f |
saw_client.llvm |
Contract , CryptolTerm , SetupVal , FreshVar , i8 , i32 , i64 , void , null , array_ty , field , struct , alias_ty |
saw_client.llvm_type |
LLVMType , LLVMArrayType |
Left Circular Shift Example
To see contracts in action we need an example. Here’s some C code for left circular shift we want to verify:
uint32_t rotl(uint32_t bits, uint32_t shift) {
return (bits << shift) | (bits >> (sizeof(bits) * 8 - shift));
}
In this example, SAW won’t actually verify C source, but rather C
compiled down to LLVM intermediate representation (IR), or
bitcode. This can be accomplished via the clang
compiler. In this
instance, we can create the bitcode by entering the following commands
in a terminal.
$ mkdir labs/SAW/rotl/artifacts
$ clang -emit-llvm labs/SAW/rotl/src/rotl1.c -c -o labs/SAW/rotl/artifacts/rotl.bc
We can inspect the bitcode, using SAW, by loading the module and
printing some meta-data. Note that when we run the following
example, we assume to be located in the labs/SAW/rotl/
directory.
$ saw
┏━━━┓━━━┓━┓━┓━┓
┃ ━━┓ ╻ ┃ ┃ ┃ ┃
┣━━ ┃ ╻ ┃┓ ╻ ┏┛
┗━━━┛━┛━┛┗━┛━┛ version 1.2.0.99 (<non-dev-build>)
sawscript> r <- llvm_load_module "artifacts/rotl.bc"
sawscript> print r
[00:30:35.338] Module: artifacts/rotl.bc
Types:
Globals:
External references:
Definitions:
i32 @rotl(i32 %0, i32 %1)
The corresponding Cryptol specification for left circular shift is:
rotl : [32] -> [32] -> [32]
rotl xs shift = xs <<< shift
For the SAW Python API we make a Contract
object with the required
specification
function:
class rotl_Contract(Contract):
def specification(self):
bits = self.fresh_var(i32, "bits")
shift = self.fresh_var(i32, "shift")
self.execute_func(bits, shift)
self.returns_f("rotl {bits} {shift}")
Let’s break down specification
piece by piece.
Fresh Symbolic Variables
The command self.fresh_var(type, name)
creates a new symbolic
variable of type type
and name name
, where name
is a
string. Names for fresh symbolic variables are not optional inputs,
though they mostly serve displaying names in counter-examples and
error messages. The string can be anything, but it makes sense to give
it the name of the variable being defined.
Execute Functions
The command self.execute_func(input1, input2, ...)
will symbolically
execute the function we’re writing a contract for. There should be
exactly as many comma separated inputs as there are in the C
function. One can only place preconditions before this command and
postconditions after this command.
Return Statements
The command self.returns_f(string)
is a postcondition that asserts
the function returns a given Cryptol term (parsed from a Python
string). To use Python variables in scope within the string use
{variable_name}
. For example,
self. |
returns_f( |
"rotl {bits} {shift}" |
) |
---|---|---|---|
In this contract | assert the current function returns the Cryptol term | left circular shift bits by shift |
. |
Sometimes we don’t want to return a Cryptol term. In those cases we
can just use returns(someSetupValue)
. The specification function of
a Contract must always have a self.returns(someSetupValue)
or
self.returns_f(string)
statement. If the function returns void
, one
can use self.returns(void)
.
Terms from Cryptol
The command cry(string)
converts a Python string into a
CryptolTerm
that can be used in SAW. The cry_f(string)
command is
similar to cry
, but the _f
indicates one can pass Python local
variables into the strings. To do this, surround the variable with
braces as we did in returns_f("{bits} >>> {shift}")
. In fact,
returns_f
is just syntactic sugar for returns(cry_f(string))
. In
general, cry_f
and friends are mostly a wrapper around formatted string literals
called “f-strings”.
The CryptolTerm
class is a subclass of SetupVal
. This allows using
CryptolTerm
as a SetupVal
.
Braces are sometimes used in Cryptol to assign type parameters or declare records. The symbols `` are used to denote literal braces for these cases when parsing Python strings. For example, let’s think about how to parse the following line:
self.returns_f("{{a = take`{{5}} {var}, b = take`{{{N}}} {var} }} == foo `eel")
If var
is a local Python variable equal to 23
and N
is a
local Python variable equal to 2
, then the string parses in Cryptol as
{a = take`{5} 23, b = take`{2} 23} == foo `eel
where foo
is some Cryptol function returning a record and eel
is
some Cryptol type in the currently loaded specification.
Unit Testing
class rotlTest(unittest.TestCase):
def test_rotl(self):
connect(reset_server=True)
if __name__ == "__main__": view(LogResults(verbose_failure=True))
bcpath = "/some/path/to/your/file.bc"
mod = llvm_load_module(bcpath)
crypath = "/some/path/to/your/file.cry"
cryptol_load_file(crypath)
rotl_result = llvm_verify(mod, 'rotl', rotl_Contract())
self.assertIs(rotl_result.is_success(), True)
For a contract, the specification function should be called
specification
. For tests, it doesn’t matter what you name your
tests. Here we named it test_rotl
. These tests will be run when you
run the Python program.
Let’s break down the first few lines of this function:
- The command
connect(reset_server=True)
connects to the server so we can use the SAW Python API. - The line
if __name__ == "__main__": view(LogResults(verbose_failure=True))
allows us to view the output with verbose error messages. If you don’t want verbose error messages, then just useif __name__ == "__main__": view(LogResults())
. - The line
bcpath = "/some/path/to/your/file.bc"
declares the bitcode file to analyze. If there are multiple bitcode files, make a variable for each file. - The line
mod = llvm_load_module(bcpath)
creates the object to pass to verification that represents the bitcode. - The line
crypath = "/some/path/to/your/file.cry"
specifies the path to a Cryptol specification. - The line
cryptol_load_file(crypath)
loads a Cryptol specification.
Now that the environment is set up, let’s actually verify our contract! This is done at the line
rotl_result = |
llvm_verify( |
mod, |
'rotl', |
rotl_Contract() |
) |
---|---|---|---|---|---|
Assign this variable | to the result of trying to verify | the bitcode | function with this name | using this contract | . |
Now that we have the result, we want to assert this result succeeded
using self.assertIs(rotl_result.is_success(), True)
.
Debugging C with SAW
from pathlib import Path
import unittest
from saw_client import *
from saw_client.crucible import *
from saw_client.llvm import *
from saw_client.proofscript import *
from saw_client.llvm_type import *
class rotl_Contract(Contract):
def specification(self):
xs = self.fresh_var(i32, "xs")
shift = self.fresh_var(i32, "shift")
self.execute_func(xs, shift)
self.returns_f("rotl {xs} {shift}")
class rotlTest(unittest.TestCase):
def test_rotl(self):
connect(reset_server=True)
if __name__ == "__main__": view(LogResults(verbose_failure=True))
basedir = Path(__file__).absolute().parents[1]
bcpath = basedir/"artifacts/rotl.bc"
crypath = basedir/"specs/rotl.cry"
cryptol_load_file(str(crypath))
mod = llvm_load_module(str(bcpath))
rotl_result = llvm_verify(mod, 'rotl', rotl_Contract())
self.assertIs(rotl_result.is_success(), True)
if __name__ == "__main__":
unittest.main()
Remember to always include
if __name__ == "__main__":
unittest.main()
or else the python script won’t do anything!
Note that we defined our bitcode and cryptol file paths (bcpath
and
crypath
, respectively) relative to the rotl lab directory using
Path
from Python’s pathlib
library.
basedir = Path(__file__).absolute().parents[1]
bcpath = basedir/"artifacts/rotl.bc"
crypath = basedir/"specs/rotl.cry"
cryptol_load_file(str(crypath))
mod = llvm_load_module(str(bcpath))
Doing so gives us freedom to call the proof script from any initial
working directory. It also provides portability to multiple platforms
since we don’t need to worry whether or not the operating system uses
/
or \
in paths. However, we must convert the path names to a
string so to satisfy SAW’s API for file loads.
We can now run the proof script.
$ python3 proof/rotl.py
[03:08:29.986] Verifying rotl ...
[03:08:29.987] Simulating rotl ...
[03:08:29.988] Checking proof obligations rotl ...
[03:08:30.007] Subgoal failed: rotl safety assertion:
internal: error: in rotl
Undefined behavior encountered
Details:
Poison value created
The second operand of `shl` was equal to or greater than the number of bits in the first operand
[03:08:30.007] SolverStats {solverStatsSolvers = fromList ["W4 ->z3"], solverStatsGoalSize = 382}
[03:08:30.007] ----------Counterexample----------
[03:08:30.007] shift0: 2147483648
[03:08:30.007] ----------------------------------
F
======================================================================
FAIL: test_rotl (__main__.rotlTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "proof/rotl.py", line 31, in test_rotl
self.assertIs(rotl_result.is_success(), True)
AssertionError: False is not True
----------------------------------------------------------------------
Ran 1 test in 0.750s
FAILED (failures=1)
🛑 The goal failed to verify.
SAW alerted us about potentially undefined behavior; the C99 standard specifies the following about bit shifts:
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
As expected, this alerts us to a bug:
The second operand of `shl` was equal to or greater than the number of bits in the first operand
[03:08:30.007] SolverStats {solverStatsSolvers = fromList ["W4 ->z3"], solverStatsGoalSize = 382}
[03:08:30.007] ----------Counterexample----------
[03:08:30.007] shift0: 2147483648
[03:08:30.007] ----------------------------------
SAW also provides a handy counterexample, namely, when shift =
2147483648
(clearly larger than 31), we encounter undefined behavior.
One remedy to this is the following:
uint32_t rotl(uint32_t bits, uint32_t shift) {
shift %= sizeof(bits) * 8;
return (bits << shift) | (bits >> (sizeof(bits) * 8 - shift));
}
Recompiling and running SAW gives:
$ clang -c -emit-llvm -o artifacts/rotl.bc src/rotl2.c && python3 proof/rotl.py
[03:11:54.334] Verifying rotl ...
[03:11:54.334] Simulating rotl ...
[03:11:54.335] Checking proof obligations rotl ...
[03:11:54.351] Subgoal failed: rotl safety assertion:
internal: error: in rotl
Undefined behavior encountered
Details:
Poison value created
The second operand of `shl` was equal to or greater than the number of bits in the first operand
[03:11:54.351] SolverStats {solverStatsSolvers = fromList ["W4 ->z3"], solverStatsGoalSize = 388}
[03:11:54.351] ----------Counterexample----------
[03:11:54.351] shift0: 0
[03:11:54.351] ----------------------------------
F
======================================================================
FAIL: test_rotl (__main__.rotlTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "proof/rotl.py", line 31, in test_rotl
self.assertIs(rotl_result.is_success(), True)
AssertionError: False is not True
----------------------------------------------------------------------
Ran 1 test in 0.723s
FAILED (failures=1)
🛑 The goal failed to verify.~
Aha! The counter example shows that we forgot about the case when
shift = 0
! This causes (sizeof(bits) * 8 - 0)
to be 32
,
which is equal to the word-size of bits
, and hence causes <<
to
exhibit undefined behavior.
Let’s try again with
uint32_t rotl(uint32_t bits, uint32_t shift) {
shift %= sizeof(bits)*8;
if(shift == 0) return bits;
return (bits << shift) | (bits >> (sizeof(bits) * 8 - shift));
}
$ clang -c -emit-llvm -o artifacts/rotl.bc src/rotl.c && python3 proof/rotl.py
[03:14:09.561] Verifying rotl ...
[03:14:09.562] Simulating rotl ...
[03:14:09.563] Checking proof obligations rotl ...
[03:14:09.614] Proof succeeded! rotl
✅ Verified: lemma_rotl_Contract (defined at cryptol-course/labs/SAW/rotl/proof/rotl.py:30)
.
----------------------------------------------------------------------
Ran 1 test in 0.780s
OK
✅ The goal was verified!
Finally, SAW is happy. More importantly, the C is correct and free of undefined behavior!
Note that we provide a Makefile for our SAW labs. If you
navigate to the target lab, you can run make prove
to generate
the bitcode for the module and run the proof!
$ cd labs/SAW/rotl
$ make prove
mkdir -p artifacts
clang -c -g -emit-llvm -o artifacts/rotl.bc src/rotl.c
python3 proof/rotl.py
Pointers and Arrays
We’ll begin by writing a function that given two arrays of a common fixed size, say five, adds the second to the first. One way to accomplish this is to pass in the two arrays, mutate the first and return nothing:
void addRow5Mutate(uint32_t a[5], uint32_t b[5]) {
for(int i = 0 ; i < 5; i++) {
a[i] += b[i];
}
return;
}
The corresponding Cryptol specification is:
addRow5 : [5][32] -> [5][32] -> [5][32]
addRow5 a b = a + b
Initializing Arrays and Pointers
To initialize the arrays and pointers we’ll use the alloc
command
and array_ty
type constructor:
class addRow5Mutate_Contract(Contract):
def specification(self):
a = self.fresh_var(array_ty(5, i32), "a")
a_p = self.alloc(array_ty(5, i32), points_to=a)
b = self.fresh_var(array_ty(5, i32), "b")
b_p = self.alloc(array_ty(5, i32), points_to=b, read_only=True)
self.execute_func(a_p, b_p)
self.points_to(a_p, cry_f("addRow {a} {b}"))
self.returns(void)
- The
array_ty(type, length)
command creates a type representing an array with entries of typetype
and lengthlength
. - The
alloc
command will initialize a symbolic pointer. Let’s break down the initialization ofb_p
:
b_p |
self.alloc( |
array_ty(5, i32) |
, |
points_to=b |
, |
read_only=True |
) |
---|---|---|---|---|---|---|---|
Assign this variable to | a symbolic pointer in the current contract | that has the following type | and | points to this object | with | read only permissions | . |
Since arrays are passed as pointers in C, when we call execute_func
we supply a_p
and b_p
rather than a
and b
.
To verify correctness, we assert after execution a_p
points to what the Cryptol specification claims it should:
self.points_to( |
a_p, |
cry_f( "addRow {a} {b}" |
)) |
---|---|---|---|
Assert in the current contract that the pointer | with this name | points to this Cryptol term | . |
Finally, specification
must contain self.returns
or self.returns_f
, so we use self.returns(void)
.
Python Helper Functions
To limit code reuse we can define helper functions in Python. Code reuse is good. Code repetition is bad! For example, the following construct is often used:
def ptr_to_fresh(c : Contract, ty : LLVMType, name : Optional[str] = None, read_only : Optional[bool] = False) -> Tuple[FreshVar, SetupVal]:
var = c.fresh_var(ty, name)
ptr = c.alloc(ty, points_to = var, read_only=read_only)
return (var, ptr)
Given a contract and a type this function outputs a tuple (var, ptr)
where var
is a fresh symbolic variable of the given type and ptr
is a pointer pointing to this variable. We give optional arguments to
name the fresh symbolic variable and to assert read-only pointer
constraints.
To see this in action, let’s rewrite our previous contract:
class addRow5Mutate_Contract(Contract):
def specification(self):
(a, a_p) = ptr_to_fresh(self, array_ty(5, i32), name="a")
(b, b_p) = ptr_to_fresh(self, array_ty(5, i32), name="b", read_only=True)
self.execute_func(a_p, b_p)
self.points_to(a_p, cry_f("addRow {a} {b}"))
self.returns(void)
Auxiliary Variables
In this section we discuss an alternative way add two rows by using an auxiliary variable. For example, consider the function
uint32_t* addRow5NewVar(uint32_t a[5], uint32_t b[5]) {
uint32_t* c = (uint32_t*) malloc(5*sizeof(uint32_t));
for(int i = 0 ; i < 5; i++) {
c[i] = a[i] + b[i];
}
return c;
}
Recall that a SAW contract is strictly divided into three parts:
- Preconditions
- Execution
- Postconditions
SAW complains if you place a precondition after execute_func
and
similarly for postcondition. If a function returns a value that was
not passed through execute_func
, then you will have to initialize
new fresh symbolic variables. For example, consider the proposed
contract for addRow5NewVar
:
class addRow5NewVar_Contract(Contract):
def specification(self):
(a, a_p) = ptr_to_fresh(self, array_ty(5, i32), name="a")
(b, b_p) = ptr_to_fresh(self, array_ty(5, i32), name="b", read_only=True)
(c, c_p) = ptr_to_fresh(self, array_ty(5, i32), name="c")
self.execute_func(a_p, b_p)
self.points_to(c_p, cry_f("addRow {a} {b}"))
self.returns(c_p)
Running a unit test yields the following error message:
[16:42:51.066] Subgoal failed: addRow5NewVar safety assertion:
internal: error: in _SAW_verify_prestate SAWServer
The following pointers had to alias, but they didn't:
(12, 0x0:[64])
(7, 0x0:[64])
[16:42:51.067] SolverStats {solverStatsSolvers = fromList ["W4 ->z3"], solverStatsGoalSize = 29}
[16:42:51.067] ----------Counterexample----------
[16:42:51.067] <<All settings of the symbolic variables constitute a counterexample>>
[16:42:51.067] ----------------------------------
Think about the precondition block, the part before execute_func
, as
setting up the symbolic variables before they enter the
function. With this in mind, it doesn’t make sense to declare c_p
in
this block because c_p
is defined within the C function. The fix
to the previous contract is moving the declaration of c_p
to the
postcondition block:
class addRow5NewVar_Contract(Contract):
def specification(self):
(a, a_p) = ptr_to_fresh(self, array_ty(5, i32), name="a")
(b, b_p) = ptr_to_fresh(self, array_ty(5, i32), name="b", read_only=True)
self.execute_func(a_p, b_p)
(c, c_p) = ptr_to_fresh(self, array_ty(5, i32), name="c")
self.points_to(c_p, cry_f("addRow {a} {b}"))
self.returns(c_p)
Python supports wildcards, denoted by _
, like Cryptol. Wildcards are placeholders for values we don’t use. For example, we could rewrite the addRow5NewVar_Contract
as follows:
class addRow5NewVar_Contract(Contract):
def specification(self):
(a, a_p) = ptr_to_fresh(self, array_ty(5, i32), name="a")
(b, b_p) = ptr_to_fresh(self, array_ty(5, i32), name="b", read_only=True)
self.execute_func(a_p, b_p)
(_, c_p) = ptr_to_fresh(self, array_ty(5, i32), name="c")
self.points_to(c_p, cry_f("addRow {a} {b}"))
self.returns(c_p)
Postconditions and points_to
One could replace the points_to
line with a postcondition_f
line to get an equivalent contract:
class addRow5NewVar_Contract(Contract):
def specification(self):
(a, a_p) = ptr_to_fresh(self, array_ty(5, i32), name="a", read_only=True)
(b, b_p) = ptr_to_fresh(self, array_ty(5, i32), name="b", read_only=True)
self.execute_func(a_p, b_p)
(c, c_p) = ptr_to_fresh(self, array_ty(5, i32), name="c")
self.postcondition_f("{c} == addRow {a} {b}")
self.returns(c_p)
While Cryptol has arrays, it doesn’t have a notion of pointer, so we can’t pass a symbolic pointer into cry_f
. For example, if we wanted to assert a_p
points to b_p
then we can use self.points_to(a_p, b_p)
but NOT self.postcondition_f("{a_p} == {b_p}")
.
Parameterized Contracts
Now let’s consider a third possible way to write the same function. Here we are given two arrays of arbitrary size and a length.
uint32_t* addRowAlias(uint32_t* a, uint32_t* b, uint8_t length) {
for(int i = 0 ; i < length; i++) {
a[i] += b[i];
}
return a;
}
A corresponding Cryptol specification might is
addRow : {length} (fin length) => [length][32] -> [length][32] -> [length][32]
addRow a b = a + b
SAW currently does not have inductive reasoning capabilities and so can only reason about concrete types. However, SAWCore terms can be formalized in Coq, a much more powerful theorem prover that has inductive reasoning (and many more) capabilities. This is obviously well beyond the scope of the course, but worth mentioning.
We could make a new contract for each value of length
used in the C
code. Instead we make a single contract with a length
parameter:
class addRowAlias_Contract(Contract):
def __init__(self, length : int):
super().__init__()
self.length = length
def specification(self):
(a, a_p) = ptr_to_fresh(self, array_ty(self.length, i32), name="a")
(b, b_p) = ptr_to_fresh(self, array_ty(self.length, i32), name="b", read_only=True)
length = fresh_var(i8, "length")
self.execute_func(a_p, b_p, length)
self.points_to(a_p, cry_f("addRow`{{{self.length}}} {a} {b}"))
self.returns(a_p)
However, the above is not quite right just yet. As well we also need to make a test for each individual length we desire to verify (say, lengths 5 and 10):
class ArrayTests(unittest.TestCase):
def test_rowAdds(self):
connect(reset_server=True)
if __name__ == "__main__": view(LogResults(verbose_failure=True))
bcpath = "/some/path/to/your/file.bc"
crypath = "/some/path/to/your/file.cry"
cryptol_load_file(crypath)
mod = llvm_load_module(bcpath)
addRowAlias05_result = llvm_verify(mod, 'addRowAlias', addRowAlias_Contract(5))
self.assertIs(addRowAlias05_result.is_success(), True)
addRowAlias10_result = llvm_verify(mod, 'addRowAlias', addRowAlias_Contract(10))
self.assertIs(addRowAlias10_result.is_success(), True)
Full Code Example and Debugging SAW
from pathlib import Path
import unittest
from saw_client import *
from saw_client.crucible import *
from saw_client.llvm import *
from saw_client.proofscript import *
from saw_client.llvm_type import *
def ptr_to_fresh(c : Contract, ty : LLVMType, name : Optional[str] = None, read_only : Optional[bool] = False) -> Tuple[FreshVar, SetupVal]:
var = c.fresh_var(ty, name)
ptr = c.alloc(ty, points_to = var, read_only=read_only)
return (var, ptr)
class addRow5Mutate_Contract(Contract):
def specification(self):
(a, a_p) = ptr_to_fresh(self, array_ty(5, i32), name="a")
(b, b_p) = ptr_to_fresh(self, array_ty(5, i32), name="b", read_only=True)
self.execute_func(a_p, b_p)
self.points_to(a_p, cry_f("addRow {a} {b}"))
self.returns(void)
class addRow5NewVar_Contract(Contract):
def specification(self):
(a, a_p) = ptr_to_fresh(self, array_ty(5, i32), name="a")
(b, b_p) = ptr_to_fresh(self, array_ty(5, i32), name="b", read_only=True)
self.execute_func(a_p, b_p)
(c, c_p) = ptr_to_fresh(self, array_ty(5, i32), name="c")
self.points_to(c_p, cry_f("addRow {a} {b}"))
self.returns(c_p)
class addRowAlias_Contract(Contract):
def __init__(self, length : int):
super().__init__()
self.length = length
def specification(self):
(a, a_p) = ptr_to_fresh(self, array_ty(self.length, i32), name="a")
(b, b_p) = ptr_to_fresh(self, array_ty(self.length, i32), name="b", read_only=True)
length = fresh_var(i8, "length")
self.execute_func(a_p, b_p, length)
self.points_to(a_p, cry_f("addRow`{{{self.length}}} {a} {b}"))
self.returns(a_p)
class ArrayTests(unittest.TestCase):
def test_rowAdds(self):
connect(reset_server=True)
if __name__ == "__main__": view(LogResults(verbose_failure=True))
basedir = Path(__file__).absolute().parents[1]
bcpath = basedir/"artifacts/addRow.bc"
crypath = basedir/"specs/addRow.cry"
cryptol_load_file(str(crypath))
mod = llvm_load_module(str(bcpath))
addRow5Mutate_result = llvm_verify(mod, 'addRowMutate', addRowMutate_Contract())
self.assertIs(addRow5Mutate_result.is_success(), True)
addRow5NewVar_result = llvm_verify(mod, 'addRowNewVar', addRowNewVar_Contract())
self.assertIs(addRow5NewVar_result.is_success(), True)
addRowAlias05_result = llvm_verify(mod, 'addRowAlias', addRowAlias_Contract(5))
self.assertIs(addRowAlias05_result.is_success(), True)
addAddAlias10_result = llvm_verify(mod, 'addRowAlias', addRowAlias_Contract(10))
self.assertIs(addRowAlias10_result.is_success(), True)
What do you think will happen if we run this code?
$ make prove -C labs/SAW/addRow
make: Entering directory '/workspace/cryptol-course/labs/SAW/addRow'
mkdir -p artifacts
clang -c -g -emit-llvm -o artifacts/addRow.bc src/addRow.c
python3 proof/addRow.py
[15:40:51.330] Verifying addRow5Mutate ...
[15:40:51.330] Simulating addRow5Mutate ...
[15:40:51.335] Checking proof obligations addRow5Mutate ...
[15:40:51.362] Proof succeeded! addRow5Mutate
✅ Verified: lemma_addRow5Mutate_Contract (defined at /home/cryptol/cryptol-course/labs/SAW/addRow/proof/addRow.py:64)
[15:40:51.430] Verifying addRow5NewVar ...
[15:40:51.430] Simulating addRow5NewVar ...
[15:40:51.435] Checking proof obligations addRow5NewVar ...
[15:40:51.462] Proof succeeded! addRow5NewVar
✅ Verified: lemma_addRow5NewVar_Contract (defined at /home/cryptol/cryptol-course/labs/SAW/addRow/proof/addRow.py:67)
SAW verifies the first two contracts but fails to verify the third contract. It alerts us there is a memory error:
[15:40:51.527] Verifying addRowAlias ...
[15:40:51.528] Simulating addRowAlias ...
[15:40:51.532] Symbolic simulation completed with side conditions.
[15:40:51.535] Checking proof obligations addRowAlias ...
[15:40:51.575] Subgoal failed: addRowAlias safety assertion:
internal: error: in addRowAlias
Error during memory load
and even produces the following counterexample:
[15:40:51.575] SolverStats {solverStatsSolvers = fromList ["W4 ->z3"], solverStatsGoalSize = 331}
[15:40:51.575] ----------Counterexample----------
[15:40:51.575] length0: 6
[15:40:51.575] : False
[15:40:51.575] ----------------------------------
⚠️ Failed to verify: lemma_addRowAlias_Contract (defined at /home/cryptol/cryptol-course/labs/SAW/addRow/proof/addRow.py:38):
error: Proof failed.
stdout:
[15:40:51.527] Verifying addRowAlias ...
[15:40:51.528] Simulating addRowAlias ...
[15:40:51.532] Symbolic simulation completed with side conditions.
[15:40:51.535] Checking proof obligations addRowAlias ...
[15:40:51.575] Subgoal failed: addRowAlias safety assertion:
internal: error: in addRowAlias
Error during memory load
[15:40:51.575] SolverStats {solverStatsSolvers = fromList ["W4 ->z3"], solverStatsGoalSize = 331}
[15:40:51.575] ----------Counterexample----------
[15:40:51.575] length0: 6
[15:40:51.575] : False
[15:40:51.575] ----------------------------------
F
======================================================================
FAIL: test_rowAdds (__main__.ArrayTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "proof/addRow.py", line 71, in test_rowAdds
self.assertIs(addRowAlias05_result.is_success(), True)
AssertionError: False is not True
----------------------------------------------------------------------
Ran 1 test in 1.735s
FAILED (failures=1)
🛑 1 out of 3 goals failed to verify.
SAW is telling us we forgot to add a precondition to assert our
symbolic length
agrees with our Python parameter self.length
. This
is an easy fix:
class addRowAlias_Contract(Contract):
def __init__(self, length : int):
super().__init__()
self.length = length
def specification(self):
(a, a_p) = ptr_to_fresh(self, array_ty(self.length, i32), name="a")
(b, b_p) = ptr_to_fresh(self, array_ty(self.length, i32), name="b", read_only=True)
length = self.fresh_var(i8, "length")
self.precondition_f("{length} == {self.length}")
self.execute_func(a_p, b_p, length)
self.points_to(a_p, cry_f("addRow`{{{self.length}}} {a} {b}"))
self.returns(a_p)
And now SAW happily verifies the third and fourth contracts!
$ python3 addRow.py
✅ Verified: lemma_addRow5Mutate_Contract (defined at /home/cryptol/cryptol-course/labs/SAW/proof/addRow.py:64)
✅ Verified: lemma_addRow5NewVar_Contract (defined at /home/cryptol/cryptol-course/labs/SAW/proof/addRow.py:67)
✅ Verified: lemma_addRowAlias_Contract (defined at /home/cryptol/cryptol-course/labs/SAW/proof/addRow.py:38)
✅ Verified: lemma_addRowAlias_Contract (defined at /home/cryptol/cryptol-course/labs/SAW/proof/addRow.py:38)
.
----------------------------------------------------------------------
Ran 1 test in 1.115s
OK
✅ All 4 goals verified!
Note that we can be even more efficient in our proof. Instead of
creating a symbolic length
variable, we can just use the Python
parameter self.length
. Doing so will simplify the solver’s proof
workload as it really only needs to consider one concrete value
rather than one value from a range of random symbolic ones.
class addRowAlias_Contract(Contract):
def __init__(self, length : int):
super().__init__()
self.length = length
def specification(self):
(a, a_p) = ptr_to_fresh(self, array_ty(self.length, i32), name="a")
(b, b_p) = ptr_to_fresh(self, array_ty(self.length, i32), name="b", read_only=True)
self.execute_func(a_p, b_p, cry(self.length))
self.points_to(a_p, cry_f("addRow`} {a} {b}"))
self.returns(a_p)
Explicit Arrays
Another way to initialize arrays is through the array
command. For
example, suppose we have the following C function (taken from
here):
#include <stdint.h>
void array_swap(uint32_t a[2]) {
uint32_t tmp = a[0];
a[0] = a[1];
a[1] = tmp;
}
Here’s a contract that doesn’t even require Cryptol (taken from here):
class ArraySwapContract(Contract):
def specification(self):
a0 = self.fresh_var(i32, "a0")
a1 = self.fresh_var(i32, "a1")
a = self.alloc(array_ty(2, i32), points_to=array(a0, a1))
self.execute_func(a)
self.points_to(a[0], a1)
self.points_to(a[1], a0)
self.returns(void)
Explicit arrays can be useful when you want to assert a condition on a particular element of the array. Of course, a drawback is you have to initialize every member of the array. This can be problematic for large arrays. Ideally, one would just have to declare the array implicitly as before and then pass this array to a Cryptol specification for postconditions as was done in the previous examples.
Null Pointers
Consider the following C code to check if a pointer is null:
int f(int *x) {
return (x == (int*)0);
}
The following contract has possibly unexpected behavior (in C booleans are encoded as 0
for false and 1
for true).
from pathlib import Path
import unittest
from saw_client import *
from saw_client.crucible import *
from saw_client.llvm import *
class FContract(Contract):
def specification(self):
p = self.alloc(i32)
self.execute_func(p)
self.returns(cry("1 : [32]"))
class LLVMAssertNullTest(unittest.TestCase):
def test_llvm_assert_null(self):
connect(reset_server=True)
if __name__ == "__main__": view(LogResults(verbose_failure=True))
basedir = Path(__file__).absolute().parents[1]
bcpath = basedir/"artifacts/null.bc"
mod = llvm_load_module(str(bcpath))
result = llvm_verify(mod, 'f', FContract())
self.assertIs(result.is_success(), True)
if __name__ == "__main__":
unittest.main()
It turns out the contract above will fail!
$ make prove -C labs/SAW/null
make: Entering directory '/workspace/cryptol-course/labs/SAW/null'
mkdir -p artifacts
clang -c -g -emit-llvm -o artifacts/null.bc src/null.c
python3 proof/null.py
[17:21:44.701] Verifying isNull ...
[17:21:44.701] Simulating isNull ...
[17:21:44.703] Checking proof obligations isNull ...
[17:21:44.724] Subgoal failed: isNull safety assertion:
internal: error: in _SAW_verify_prestate SAWServer
Literal equality postcondition
[more error message]
FAILED (failures=1)
🛑 The goal failed to verify.
The counterexample produced may seem mystical.
[17:21:44.725] ----------Counterexample----------
[17:21:44.725] <<All settings of the symbolic variables constitute a counterexample>>
[17:21:44.725] ----------------------------------
However, if every setting is a counterexample, then this is telling us the pointer must have been the null pointer! An initialized symbolic pointer that hasn’t been assigned a symbolic variable to point to is NOT equivalent to a null pointer in SAW. We can use null()
in situations where we want a null pointer. For example, if we change the contract above to
class FContract(Contract):
def specification(self):
self.execute_func(null())
self.returns(cry("1 : [32]"))
then SAW is happy.
$ python3 null.py
[17:33:50.802] Verifying isNull ...
[17:33:50.802] Simulating isNull ...
[17:33:50.804] Checking proof obligations isNull ...
[17:33:50.804] Proof succeeded! isNull
✅ Verified: lemma_isNull_Contract (defined at proof/null.py:22)
.
----------------------------------------------------------------------
Ran 1 test in 1.301s
OK
✅ The goal was verified!
This example is from the GaloisInc/saw-script repo.
Structs
In this section we will learn how to verify code involving structs by analyzing a game. The code for the game can be found here.
To complete this lab, navigate to the Game directory. In there, you’ll notice the following:
Makefile
: Provides the necessary steps to generate our bitcode and run our SAW Python scripts.src/
: Contains the source code we’ll be analyzing.proof/
: Contains our Python scripts to run our SAW contracts. Your job will be to complete theTODO
sections marked throughout Game.py. If you get stuck, you can refer to Game_answers.py or look at the discussions mentioned later in this markdown file!specs/
: Contains our Cryptol specs that our SAW contracts can call. Feel free to add your own Cryptol functions in Game.cry to help you complete this lab!DLC/
: Contains an extended version of this lab (think Downloadable Content) with even more Game functions for you to play with! While there aren’t any lab worksheets included in there, you can reference the contents to learn how to tackle additional functions. For more information regarding to what each function intends to teach, refer to GameDLC.md.
With that knowledge, make sure you have start-saw-remote-api
running, open up
Game.py, fill out your answers, and test your work by running make prove
.
Game on!
Struct Initialization
The game defines the following sprite_t
and character_t
types.
#define MAX_NAME_LENGTH 12
#define GAITS 2
#define DIRECTIONS 4
#define ANIMATION_STEPS 3
typedef struct {
uint8_t frames[GAITS][DIRECTIONS][ANIMATION_STEPS];
uint32_t xPos;
uint32_t yPos;
} sprite_t;
typedef struct {
uint8_t name[MAX_NAME_LENGTH];
uint32_t level;
uint32_t hp;
uint32_t atk;
uint32_t def;
uint32_t spd;
sprite_t* sprite;
} character_t;
typedef character_t player_t;
For now, let’s focus on the player_t
type. Note that player_t
is just an
alias for character_t
. Let’s consider the function, initDefaultPlayer
,
that initializes a player with some default values.
player_t* initDefaultPlayer()
{
player_t* player = (player_t*) malloc(sizeof(player_t));
uint8_t i = 0;
uint32_t hp_default = 10;
uint32_t atk_default = 5;
uint32_t def_default = 4;
uint32_t spd_default = 3;
for (i = 0; i < MAX_NAME_LENGTH; i++)
{
player->name[i] = 'A';
}
player->level = 1;
player->hp = hp_default;
player->atk = atk_default;
player->def = def_default;
player->spd = spd_default;
player->sprite = NULL;
return player;
}
Observing initDefaultPlayer
, we can see that the function first allocates
memory for a player_t
struct called player
. The function then assigns a
value to each of player
’s fields. When it comes to the sprite
field, the
function sets that value to NULL
. The Game
library assumes that a
character’s sprite (and by extension, player sprites too) can only be set
once. Given that the only information known to the game is to initialize a
default player, it does not yet know what sprite information it should use.
The game will rely on a separate function call to initialize the sprite
field
when more information is known later.
We can craft a contract to verify this initialization function like so:
MAX_NAME_LENGTH = 12
class initDefaultPlayer_Contract(Contract):
def specification (self):
self.execute_func()
player = self.alloc(alias_ty("struct.character_t"))
self.points_to(player['name'], cry_f("repeat 0x41 : [{MAX_NAME_LENGTH}][8]"))
self.points_to(player['level'], cry_f("1 : [32]"))
self.points_to(player['hp'], cry_f("10 : [32]"))
self.points_to(player['atk'], cry_f("5 : [32]"))
self.points_to(player['def'], cry_f("4 : [32]"))
self.points_to(player['spd'], cry_f("3 : [32]"))
self.points_to(player['sprite'], null())
self.returns(player)
Notice that our contract immediately goes into the execute state. There isn’t a
need to define any variables in the initial state since there aren’t any inputs
passed to initDefaultPlayer
. Although the function does allocate memory for
player
, that is done within the function, and passed as a return value. As
such, we represent that allocation in the contract’s final state.
For every C symbol defined using #define
in the source code, we make a
corresponding Python global variable. At this point, we only care about
MAX_NAME_LENGTH
since that is the only constant initDefaultPlayer
references.
The command alias_ty("struct.<typedef name>")
creates a type corresponding
to the structure. In our contract,
player = self.alloc(alias_ty("struct.character_t"))
creates a symbolic pointer variable, player
, pointing to a structure of type
character_t
. Since player_t
is an alias for character_t
, we need to use
the base struct name, character_t
. If we instead defined our player
symbolic pointer as
player = self.alloc(alias_ty("struct.player_t"))
SAW would throw an error saying that player_t
is an unknown identifier:
error: unsupported type: %struct.player_t
Details:
Unknown type alias Ident "struct.player_t"
It is not SAW’s fault that player_t
is an unknown type alias. The issue here
is that knowledge of player_t
isn’t maintained when the C source code is
compiled to LLVM bitcode. To illustrate this point, let’s take a look at what
is contained in Game.bc
:
$ cd labs/SAW/Game
$ make
mkdir -p artifacts
clang -c -g -emit-llvm -o artifacts/Game.bc src/Game.c
$ saw
sawscript> r <- llvm_load_module "artifacts/Game.bc"
sawscript> print r
[00:24:06.901] Module: artifacts/Game.bc
Types:
%struct.character_t = type { [12 x i8], i32, i32, i32, i32, i32,
%struct.sprite_t* }
%struct.sprite_t = type { [2 x [4 x [3 x i8]]], i32, i32 }
Globals:
External references:
declare default void @llvm.dbg.declare(metadata, metadata,
metadata)
declare default i8* @malloc(i64)
Definitions:
%struct.character_t* @initDefaultPlayer()
i32 @initDefaultSprite(%struct.character_t* %0)
void @resolveAttack(%struct.character_t* %0, i32 %1)
i32 @checkStats(%struct.character_t* %0)
Even though Game.h
contains:
typedef character_t player_t;
the generated bitcode does not keep this information. In fact, the function
prototype for initDefaultPlayer()
in Game.h
is:
player_t* initDefaultPlayer();
However, as we saw in above in the above LLVM bitcode, the definition for
initDefaultPlayer()
has the return type %struct.character_t*
.
Remember that SAW evaluates bitcode during Formal Verification. If the information isn’t contained in the bitcode, it isn’t known to SAW.
Let’s continue and breakdown one of the contract’s points_to
command:
self.points_to( |
player |
['name'] |
, |
cry_f("repeat 0x41 : [{MAX_NAME_LENGTH}][8]") |
) |
---|---|---|---|---|---|
Assert in the current contract that the following pointer | with this name | points to a struct with this named field | and the value of that field is | this expression | . |
Above, we use strings to reference fields of structures. However, we can only do
this when strings are present in the bitcode. In other words, when debug
symbols are included in the generated bitcode. The -g
clang flag tells the
compiler to include the field names of the structs in the bitcode. For the
full compilation details, check out the Makefile template,
prove.mk,
which is used for all of our SAW labs.
If we didn’t have the debug symbols in the bitcode, SAW would produce an error:
$ cd labs/SAW/Game
$ mkdir artifacts
$ clang -c -emit-llvm -o artifacts/Game.bc src/Game.c
$ python3 proof/Game.py
⚠️ Failed to verify: lemma_initDefaultPlayer_Contract (defined at proof/Game.py:513):
error: Unable to resolve struct field name: '"name"'
Could not resolve setup value debug information into a struct type.
Perhaps you need to compile with debug symbols enabled.
If the bitcode lacked debug symbols, then could instead reference the struct fields by using their corresponding indices:
class initDefaultPlayer_Contract(Contract):
def specification (self):
self.execute_func(player)
self.points_to(player[0], cry_f("repeat 0x41 : [{MAX_NAME_LENGTH}][8]"))
self.points_to(player[1], cry_f("1 : [32]"))
self.points_to(player[2], cry_f("10 : [32]"))
self.points_to(player[3], cry_f("5 : [32]"))
self.points_to(player[4], cry_f("4 : [32]"))
self.points_to(player[5], cry_f("3 : [32]"))
self.points_to(player[6], null())
self.returns(player)
Explicit Structs
We can also explicitly define a struct in SAW. Let’s go back and consider the
sprite_t
struct:
#define GAITS 2
#define DIRECTIONS 4
#define ANIMATION_STEPS 3
typedef struct {
uint8_t frames[GAITS][DIRECTIONS][ANIMATION_STEPS];
uint32_t xPos;
uint32_t yPos;
} sprite_t;
The idea behind the sprite_t
struct is to hold all of the sprites associated
with a character that we will present in our game. frames
is a 3D uint8_t
array where each element represents an asset identifier from our art collection.
Why a 3D array? Well, we want to provide animations for our characters that way
it looks like they are moving on the screen (and it’s a great excuse to discuss
multi-dimensional arrays in SAW).
- The first dimension refers to the number of gaits we want to represent, that being walking and running.
- The second dimension refers to the number of directions a character can face. Imagine that we are working with a 2D top down game, so we have 4 directions: up, down, left, and right.
- The third dimension refers to the number of frames per movement.
Let’s think about how we walk forward (feel free to try this at home). If you are walking forward, you first stand up, move one foot in front of the other, place that foot down, lift up your back foot, and move that back foot ahead of your front foot. Rinse and repeat, and you’ll be walking ‘cross the floor.
Now that’s a lot of steps! We can simplify this process by only considering 3 frames: standing up, left foot in front, and right foot in front. We can then reuse the standing up frame as a transition between the left foot forward and the right foot forward frames.
Alright, that’s enough of a crash course in animation. Let’s get back to our
struct. We have two more fields: xPos
and yPos
. These are simply positional
references for where a character is located relative to the screen.
With that understanding, let’s consider a function that uses the sprite_t
struct:
uint32_t initDefaultSprite(character_t* character)
{
if (character == NULL)
{
return FAILURE;
}
else if (character->sprite != NULL)
{
return FAILURE;
}
sprite_t* sprite = (sprite_t*) malloc(sizeof(sprite_t));
for (uint8_t i = 0; i < GAITS; i++)
{
for (uint8_t j = 0; j < DIRECTIONS; j++)
{
for (uint8_t k = 0; k < ANIMATION_STEPS; k++)
{
sprite->frames[i][j][k] = 0x00;
}
}
}
sprite->xPos = 1;
sprite->yPos = 2;
character->sprite = sprite;
return SUCCESS;
}
This function uses two more constants, SUCCESS
and FAILURE
, which are
defined as:
#define SUCCESS 170
#define FAILURE 85
Those are some strange looking values, huh? Normally we would want to represent
successes and failures as booleans TRUE
and FALSE
, so 1
and 0
respectively right? Well, we could but the
Hamming Distance
between those values is just 1. This means a single bit flip could turn a
SUCCESS
into FAILURE
and vice versa. Software tends to rely on the validity
of status codes for both functionality and security reasons. If we want to
increase our confidence that these status values are what they should be, we
should increase the Hamming Distance to better differentiate our statuses.
Decimal Value | Hex Representation | Binary Representation |
---|---|---|
170 | 0xAA | 10101010 |
85 | 0x55 | 01010101 |
We can see that 170 and 85 have a Hamming Distance of 8. Take that bit flips!
As a side note, keep in mind that the constants SUCCESS
and FAILURE
are
actually 32-bit integers in C. While we could further increase the Hamming
Distance between the two numbers, we wanted to show a simplified proof of
concept here. Alright, let’s go back to understanding initDefaultSprite
.
The function first performs input checking on character
to ensure that it is
not a NULL
pointer. After all, it would be a shame to perform a NULL
dereference. The function then checks character
’s sprite
field to uphold
the Game
library’s assumption that character sprites can only be set once.
If any of these checks fail, initDefaultSprite
immediately terminates and
returns FAILURE
. Otherwise, the function continues to do its job in
allocating memory for a sprite_t
struct and initializing the sprite
fields for character
.
Now, let’s go ahead and make a contract to represent this function:
SUCCESS = 170
FAILURE = 85
GAITS = 2
DIRECTIONS = 4
ANIMATION_STEPS = 3
class initDefaultSprite_Contract(Contract):
def specification (self):
name = self.fresh_var(array_ty(MAX_NAME_LENGTH, i8), "character.name")
level = self.fresh_var(i32, "character.level")
hp = self.fresh_var(i32, "character.hp")
atk = self.fresh_var(i32, "character.atk")
defense = self.fresh_var(i32, "character.def")
spd = self.fresh_var(i32, "character.spd")
character_p = self.alloc( alias_ty("struct.character_t")
, points_to = struct( name
, level
, hp
, atk
, defense
, spd
, null() ))
self.execute_func(character_p)
sprite_p = self.alloc( alias_ty("struct.sprite_t")
, points_to = struct( cry_f("zero : [{GAITS}][{DIRECTIONS}][{ANIMATION_STEPS}][8]")
, cry_f("1 : [32]")
, cry_f("2 : [32]") ))
self.points_to(character_p, struct( name
, level
, hp
, atk
, defense
, spd
, sprite_p ))
self.returns_f("`({SUCCESS}) : [32]")
There’s quite a bit going on with the three uses of the struct
keyword.
Let’s look at them one at a time. Like array
, the struct
keyword
declares a symbolic struct given a variable for each struct field.
The First struct
Instance:
character_p = self.alloc( alias_ty("struct.character_t")
, points_to = struct( name
, level
, hp
, atk
, defense
, spd
, null() ))
For character_p
, we assert the precondition that our pointer points to this
symbolic struct containing our fresh variables name
, level
, hp
, atk
,
defense
, spd
, and null()
. Notice that we don’t need to allocate memory
yet in the contract’s initial state for a sprite_t
struct since the function
assumes the passed character
does not have an allocated sprite. As a result,
we match the function’s assumption that the sprite
field is NULL
.
The Second struct
Instance:
sprite_p = self.alloc( alias_ty("struct.sprite_t")
, points_to = struct( cry_f("zero : [{GAITS}][{DIRECTIONS}][{ANIMATION_STEPS}][8]")
, cry_f("1 : [32]")
, cry_f("2 : [32]") ))
After the call to symbolically execute the function, we allocate memory for
a symbolic sprite_t
struct. In the same line, we also set a postcondition
using struct
to assert how the sprite
fields should look according to
initDefaultSprite
.
The Third struct
Instance:
self.points_to(character_p, struct( name
, level
, hp
, atk
, defense
, spd
, sprite_p ))
Now that sprite_p
has been allocated and its field values were asserted, we
connect sprite_p
to character_p
. The other fields for character
do not
change in the function, so we can reuse the symbolic variables that were
defined in the contract’s initial state.
As we can see, the struct
keyword is quite versatile as we can pass symbolic
variables, symbolic pointers, and even define specific Cryptol values to it.
Structs as Cryptol Tuples
Explicit structs is not the only way to represent values for preconditions and postconditions. We can also use tuples since Cryptol interprets symbolic structs as tuples.
For example, we could rewrite the second struct
instance explored above as:
sprite_p = self.alloc( alias_ty("struct.sprite_t")
, points_to = cry_f("""( zero : [{GAITS}][{DIRECTIONS}][{ANIMATION_STEPS}][8]
, 1 : [32]
, 2 : [32] )"""))
Notice that we use 3 double quotes """
in our cry_f
call. This technique is
useful when we want to separate our expected Cryptol-defined behaviors over
multiple lines to improve code readability. Python considers the 3 double
quotes as a multi-line string. Multi-line strings may also be used as block
comments in Python.
However, the setup we see above results in wide open spaces, which will be
noticeable when debugging strings passed to the family of cry
functions. These
spaces can be mitigated using dedent
from the textwrap
package that comes
with Python. For example:
from textwrap import dedent
# ...
self.returns(cry(dedent("""
(y, z)
where
y = "foo"
z = "bar"
""").strip()))
This renders (without leading/trailing whitespace) as:
(y, z)
where
y = "foo"
z = "bar"
Structs as Cryptol Records
While Cryptol’s record types could also represent structs, SAW does not currently support translating Cryptol’s record types into crucible-llvm’s type system. If we tried to represent the struct as a Cryptol record:
sprite_p = self.alloc( alias_ty("struct.sprite_t")
, points_to = cry_f("""{{ frames = zero : [{GAITS}][{DIRECTIONS}][{ANIMATION_STEPS}][8]
, xPos = 1 : [32]
, yPos = 2 : [32]
}}
"""))
SAW would return this error:
⚠️ Failed to verify: lemma_initDefaultSprite_Contract (defined at proof/Game.py:525):
error: SAW doesn't yet support translating Cryptol's record type(s) into crucible-llvm's type system.
If a Cryptol specification uses a record type to represent structs, then we can define a Python helper function for wrapping.
Exercise: Resolving an Attack!
Feeling pretty confident with our little player_t
and character_t
structs? How about we go for a full on attack then? Well, in our game of course ;)
Consider the following function:
void resolveAttack(character_t* target, uint32_t atk)
{
if ( target->def >= atk)
{
// The target's defense mitigates the attack
target->hp = target->hp;
}
else if ( target->hp <= (atk - target->def) )
{
// The attack will knock out the target
target->hp = 0;
}
else
{
// Calculate damage as normal
target->hp = target->hp - (atk - target->def);
}
}
Our function, resolveAttack
, takes two inputs:
character_t* target
: Our defending characteruint32_t atk
: The attack stat of our attacker
Think about how to make a SAW contract for this function. Go ahead and try it if you’d like! We’ll wait for you :)
Got an idea? Great! Let’s go over the steps we need to go through…
First, let’s set up our basic format for the contract:
class resolveAttack_Contract(Contract):
def specification (self):
# Declare variables
(target, target_p) = ptr_to_fresh(self, alias_ty("struct.character_t"), name="target")
atk = self.fresh_var(i32, "atk")
# Assert preconditions
# Symbolically execute the function
self.execute_func(target_p, atk)
# Assert postconditions
self.returns(void)
Alright, we got our basic setup! But wait a second, there are 3 possible assignments for the target’s hp… Each of these behaviors are determined by the stat ranges for the defender and attacker. How are we going to represent these different behaviors?
First, let’s consider the 3 possible cases for resolveAttack
:
- Case 1: Attack mitigated
- Precondition:
target->def >= atk
- Postcondition:
target->hp = target->hp
- Precondition:
- Case 2: Immediate KO
- Precondition:
target->hp <= (atk - target-> def)
- Postcondition:
target->hp = 0
- Precondition:
- Case 3: Regular attack calculation
- Precondition: Anything other than Cases 1 and 2
- Postcondition:
target->hp = target->hp - (atk - target->def)
With those 3 cases laid out, we now have a battle plan to represent the preconditions and postconditions needed for our proof. Now one option to account for all of these cases is to create 3 different contracts where each one specifies the appropriate precondition and postcondition pair for our case. Do we really want to copy and paste the bulk of our proof across 3 contracts though? Probably not. How about we bring our knowledge of parameterized contracts to battle! Tactics!
class resolveAttack_Contract(Contract):
def __init__(self, case : int):
super().__init__()
self.case = case
# There are 3 possible cases for resolveAttack
# Case 1: Attack mitigated
# Case 2: Immediate KO
# Case 3: Regular attack
def specification (self):
# Declare variables
(target, target_p) = ptr_to_fresh(self, alias_ty("struct.character_t"), name="target")
atk = self.fresh_var(i32, "atk")
# Assert preconditions
# Determine the preconditions based on the case parameter
if (self.case == 1):
# target->def >= atk
self.precondition_f("{target}.4 >= {atk}")
elif (self.case == 2):
# target->hp <= (atk - target->def)
self.precondition_f("({target}.2 + {target}.4) <= {atk}")
else:
# Assume any other case follows the formal attack calculation
self.precondition_f("{target}.4 < {atk}")
self.precondition_f("({target}.2 + {target}.4) > {atk}")
# Symbolically execute the function
self.execute_func(target_p, atk)
# Assert postconditions
# Determine the postcondition based on the case parameter
if (self.case == 1):
self.points_to(target_p['hp'], cry_f("{target}.2 : [32]"))
elif (self.case == 2):
self.points_to(target_p['hp'], cry_f("0 : [32]"))
else:
self.points_to(target_p['hp'], cry_f("resolveAttack ({target}.2) ({target}.4) {atk}"))
self.returns(void)
Our contract is looking pretty good now! Notice that we use {target}.2
and {target}.4
for our postconditions. This notation lets us access fields within our target type. Specifically, {target}.2
refers to the hp
field, and {target}.4
refers to the def
field.
It should be noted that using these cases in a parameterized contract is not the only way to set up our contract. We could have designed some fancy Cryptol that considers all of the different stat ranges for our preconditions and postconditions. However, we wanted to show this method of using cases since we can leverage Python’s if statements in our proofs.
Let’s set up our unit test:
class GameTests(unittest.TestCase):
def test_Game(self):
connect(reset_server=True)
if __name__ == "__main__": view(LogResults(verbose_failure=True))
basedir = Path(__file__).absolute().parents[1]
bcpath = basedir/"artifacts/Game.bc"
crypath = basedir/"specs/Game.cry"
cryptol_load_file(str(crypath))
module = llvm_load_module(str(bcpath))
resolveAttack_case1_result = llvm_verify(module, 'resolveAttack', resolveAttack_Contract(1))
resolveAttack_case2_result = llvm_verify(module, 'resolveAttack', resolveAttack_Contract(2))
resolveAttack_case3_result = llvm_verify(module, 'resolveAttack', resolveAttack_Contract(3))
self.assertIs(resolveAttack_case1_result.is_success(), True)
self.assertIs(resolveAttack_case2_result.is_success(), True)
self.assertIs(resolveAttack_case3_result.is_success(), True)
if __name__ == "__main__":
unittest.main()
Excellent, now for the moment of truth!
$ clang -c -emit-llvm -o artifacts/Game.bc src/Game.c
$ python3 proof/Game.py
[01:59:53.576] Verifying resolveAttack ...
[01:59:53.577] Simulating resolveAttack ...
[01:59:53.580] Checking proof obligations resolveAttack ...
[01:59:53.594] Proof succeeded! resolveAttack
✅ Verified: lemma_resolveAttack_Contract (defined at proof/Game.py:179)
[01:59:53.658] Verifying resolveAttack ...
[01:59:53.659] Simulating resolveAttack ...
[01:59:53.662] Checking proof obligations resolveAttack ...
[01:59:53.689] Subgoal failed: resolveAttack safety assertion:
internal: error: in llvm_points_to SAWServer
Literal equality postcondition
[01:59:53.689] SolverStats {solverStatsSolvers = fromList ["W4 ->z3"], solverStatsGoalSize = 438}
[01:59:53.689] ----------Counterexample----------
[01:59:53.689] target0: ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0, 4294967293, 0, 3, 0)
[01:59:53.689] atk0: 4294967295
[01:59:53.689] ----------------------------------
⚠️ Failed to verify: lemma_resolveAttack_Contract0 (defined at proof/Game.py:179):
error: Proof failed.
stdout:
[01:59:53.658] Verifying resolveAttack ...
[01:59:53.659] Simulating resolveAttack ...
[01:59:53.662] Checking proof obligations resolveAttack ...
[01:59:53.689] Subgoal failed: resolveAttack safety assertion:
internal: error: in llvm_points_to SAWServer
Literal equality postcondition
[01:59:53.689] SolverStats {solverStatsSolvers = fromList ["W4 ->z3"], solverStatsGoalSize = 438}
[01:59:53.689] ----------Counterexample----------
[01:59:53.689] target0: ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0, 4294967293, 0, 3, 0)
[01:59:53.689] atk0: 4294967295
[01:59:53.689] ----------------------------------
[01:59:53.789] Verifying resolveAttack ...
[01:59:53.789] Simulating resolveAttack ...
[01:59:53.792] Checking proof obligations resolveAttack ...
[01:59:53.905] Proof succeeded! resolveAttack
✅ Verified: lemma_resolveAttack_Contract1 (defined at proof/Game.py:179)
F
======================================================================
FAIL: test_Game (__main__.GameTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "proof/Game.py", line 549, in test_Game
self.assertIs(resolveAttack_case2_result.is_success(), True)
AssertionError: False is not True
----------------------------------------------------------------------
Ran 1 test in 1.132s
FAILED (failures=1)
🛑 1 out of 3 goals failed to verify.
make: *** [Makefile:14: all] Error 1
A counterexample?! Let’s take a closer look at what SAW provides us and reassess our strategy.
[01:59:53.689] ----------Counterexample----------
[01:59:53.689] target0: ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0, 4294967293, 0, 3, 0)
[01:59:53.689] atk0: 4294967295
[01:59:53.689] ----------------------------------
Notice that our unit case failed for case 2, but passed for cases 1 and 3. How about we look at our precondition for case 2 again:
elif (self.case == 2):
# target->hp <= (atk - target->def)
self.precondition_f("({target}.2 + {target}.4) <= {atk}")
and if we plug in the counterexample values SAW provides us…
{target}.2 + {target}.4 <= {atk}
4294967293 + 3 <= 4294967295
4294967296 <= 4294967295
Well that doesn’t make sense, now does it? Well actually, it does make some sense when we recognize one BIG detail. And yes, I’m referring to the big values SAW gave us. Doesn’t the atk stat look familiar? Like the upper bound for unsigned 32-bit integers?
Taking that into consideration, then our efforts to test out the counterexample was incorrect. We forgot to account for integer overflow!
4294967293 + 3 <= 4294967295
4294967295 + 1 <= 4294967295
0 <= 4294967295
Audible gasp Good thing we have SAW on our side! Now, let’s determine what we can do to fix this issue. First, let’s compare our precondition again to the source code’s if condition:
self.precondition_f("({target}.2 + {target}.4) <= {atk}")
else if ( target->hp <= (atk - target->def) )
So maybe we tried being too fancy showing off our associative property knowledge. While moving the defense term to the left-hand side of the expression does not violate any mathematical rules on paper, it does violate our expectations in practice when accounting for limited bit widths.
Fine, let’s toss out our fancy math skills and write our contract’s precondition for case 2 exactly as we see it in the source code:
elif (self.case == 2):
# target->hp <= (atk - target->def)
self.precondition_f("{target}.2 <= ({atk} - {target}.4)")
$ clang -c -emit-llvm -o artifacts/Game.bc src/Game.c
$ python3 proof/Game.py
[02:34:39.387] Verifying resolveAttack ...
[02:34:39.388] Simulating resolveAttack ...
[02:34:39.391] Checking proof obligations resolveAttack ...
[02:34:39.409] Proof succeeded! resolveAttack
✅ Verified: lemma_resolveAttack_Contract (defined at proof/Game.py:179)
[02:34:39.481] Verifying resolveAttack ...
[02:34:39.481] Simulating resolveAttack ...
[02:34:39.483] Checking proof obligations resolveAttack ...
[02:34:39.508] Subgoal failed: resolveAttack safety assertion:
internal: error: in llvm_points_to SAWServer
Literal equality postcondition
[02:34:39.508] SolverStats {solverStatsSolvers = fromList ["W4 ->z3"], solverStatsGoalSize = 438}
[02:34:39.509] ----------Counterexample----------
[02:34:39.509] target0: ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0, 134217728, 0, 4184634256, 0)
[02:34:39.509] atk0: 23884688
[02:34:39.509] ----------------------------------
⚠️ Failed to verify: lemma_resolveAttack_Contract0 (defined at proof/Game.py:179):
error: Proof failed.
stdout:
[02:34:39.481] Verifying resolveAttack ...
[02:34:39.481] Simulating resolveAttack ...
[02:34:39.483] Checking proof obligations resolveAttack ...
[02:34:39.508] Subgoal failed: resolveAttack safety assertion:
internal: error: in llvm_points_to SAWServer
Literal equality postcondition
[02:34:39.508] SolverStats {solverStatsSolvers = fromList ["W4 ->z3"], solverStatsGoalSize = 438}
[02:34:39.509] ----------Counterexample----------
[02:34:39.509] target0: ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0, 134217728, 0, 4184634256, 0)
[02:34:39.509] atk0: 23884688
[02:34:39.509] ----------------------------------
[02:34:39.613] Verifying resolveAttack ...
[02:34:39.613] Simulating resolveAttack ...
[02:34:39.616] Checking proof obligations resolveAttack ...
[02:34:39.735] Proof succeeded! resolveAttack
✅ Verified: lemma_resolveAttack_Contract1 (defined at proof/Game.py:179)
F
======================================================================
FAIL: test_Game (__main__.GameTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "proof/Game.py", line 550, in test_Game
self.assertIs(resolveAttack_case2_result.is_success(), True)
AssertionError: False is not True
----------------------------------------------------------------------
Ran 1 test in 1.200s
FAILED (failures=1)
🛑 1 out of 3 goals failed to verify.
make: *** [Makefile:14: all] Error 1
Nope, that didn’t work either. But hey, SAW gave us a different counterexample. Let’s look at that one:
[02:34:39.509] ----------Counterexample----------
[02:34:39.509] target0: ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0, 134217728, 0, 4184634256, 0)
[02:34:39.509] atk0: 23884688
[02:34:39.509] ----------------------------------
Plugging those values into our updated case 2 precondition:
{target}.2 <= ({atk} - {target}.4)
134217728 <= 23884688 - 4184634256
134217728 <= 134217728
Nice, we got an integer underflow counterexample. Based on these input values, the source code would actually meet the first if condition:
target->def >= atk
This means our precondition for case 2 is lacking something. Let’s adjust it in this way:
elif (self.case == 2):
# target->hp <= (atk - target->def)
self.precondition_f("{target}.4 < {atk}")
self.precondition_f("{target}.2 <= ({atk} - {target}.4)")
$ clang -c -emit-llvm -o artifacts/Game.bc src/Game.c
$ python3 proof/Game.py
[02:55:30.437] Verifying resolveAttack ...
[02:55:30.437] Simulating resolveAttack ...
[02:55:30.440] Checking proof obligations resolveAttack ...
[02:55:30.455] Proof succeeded! resolveAttack
✅ Verified: lemma_resolveAttack_Contract (defined at proof/Game.py:179)
[02:55:30.551] Verifying resolveAttack ...
[02:55:30.551] Simulating resolveAttack ...
[02:55:30.554] Checking proof obligations resolveAttack ...
[02:55:30.570] Proof succeeded! resolveAttack
✅ Verified: lemma_resolveAttack_Contract0 (defined at proof/Game.py:179)
[02:55:30.672] Verifying resolveAttack ...
[02:55:30.673] Simulating resolveAttack ...
[02:55:30.675] Checking proof obligations resolveAttack ...
[02:55:30.789] Proof succeeded! resolveAttack
✅ Verified: lemma_resolveAttack_Contract1 (defined at proof/Game.py:179)
.
----------------------------------------------------------------------
Ran 1 test in 1.140s
OK
✅ All 3 goals verified!
Woo hoo! We finally won the battle! From our exercise, we found that SAW provides us with some pretty useful counterexamples to consider for edge cases that may be lacking in traditional software unit testing.
Now let’s imagine that very large character stats is a concern in our game. In order to balance characters in our game and avoid problems with very large values, let’s say we decided to add a function (checkStats
) that checks character stats. Let’s also assume that this function is always called before functions that use those stats like what we saw in resolveAttack
.
uint32_t checkStats(character_t* character)
{
// Assume failure by default
uint32_t result = FAILURE;
// Check the stats
if (character->hp <= MAX_STAT &&
character->atk <= MAX_STAT &&
character->def <= MAX_STAT &&
character->spd <= MAX_STAT )
{
result = SUCCESS;
}
return result;
}
Is this a good idea security-wise? Eh, maybe not. The assumption that checkStats
is ALWAYS called before functions that use character stats may be missed. So what can we do? Using resolveAttack
as an example, should we rely on its caller to also call checkStats
and perform error handling ahead of time? An argument could be made for performance benefits, but at the cost of security. Should we call checkStats
within resolveAttack
? That strategy would provide more security, but the repeated checkStats
may be redundant and could hurt performance depending on the use case. As we can see, the answer for checkStats
’s best placement follows the classic “it depends”.
For the sake of argument, let’s go with the assumption that checkStats
is always called BEFORE the call to resolveAttack
. While the stat checks aren’t performed in resolveAttack
, we could include them as preconditions in our contract. Let’s add the stat checks to our original resolveAttack_Contract
, problems with case 2’s preconditions and all!
class resolveAttack_Contract(Contract):
def __init__(self, case : int):
super().__init__()
self.case = case
def specification (self):
# Declare variables
(target, target_p) = ptr_to_fresh(self, alias_ty("struct.character_t"), name="target")
atk = self.fresh_var(i32, "atk")
# Assert the precondition that the stats are below the max stat cap
# Suggests checkStats was called before resolveAttack
self.precondition_f("{atk} <= `{MAX_STAT}")
self.precondition_f("{target}.2 <= `{MAX_STAT}")
self.precondition_f("{target}.4 <= `{MAX_STAT}")
# Determine the preconditions based on the case parameter
if (self.case == 1):
# target->def >= atk
self.precondition_f("{target}.4 >= {atk}")
elif (self.case == 2):
# target->hp <= (atk - target->def)
self.precondition_f("({target}.2 + {target}.4) <= {atk}")
else:
# Assume any other case follows the formal attack calculation
self.precondition_f("{target}.4 < {atk}")
self.precondition_f("({target}.2 + {target}.4) > {atk}")
# Symbolically execute the function
self.execute_func(target_p, atk)
# Determine the postcondition based on the case parameter
if (self.case == 1):
self.points_to(target_p['hp'], cry_f("{target}.2 : [32]"))
elif (self.case == 2):
self.points_to(target_p['hp'], cry_f("0 : [32]"))
else:
self.points_to(target_p['hp'], cry_f("resolveAttack ({target}.2) ({target}.4) {atk}"))
self.returns(void)
Would this contract pass verification? Absolutely. Given that the MAX_STAT
preconditions limits our input values, we would never see SAW’s counterexample of an integer overflow/underflow from case 2.
Using Gained Knowledge
Assumptions and Lemmas
Only functions with implementations in the bitcode can be verified. If a function is imported from an external library, that function’s implementation is defined elsewhere. As such, the implementation will not be present in the generated bitcode. Instead of verifying these library functions, we can use contracts to assume their behavior.
For example, the height of
height-balanced binary
trees
on n nodes is given by the ceiling of log(n)
. In C, we might try to
implement this using
#include <math.h>
uint64_t ceilLog2(uint64_t i) {
return ceil(log2(i));
}
We may not care to verify this function if it’s from a well-known library that has been widely tested and inspected by the public. Alternatively, we may not care to verify the function if that library has already been formally verified. Even if we have verified this function ourselves, we may want to speed up the verification process by black-boxing its behavior. This is particularly important to consider if a function we formally verified took a long time to be verified.
To accomplish these goals, we make a contract outlining the behavior of
our ceilLog2
function:
class ceilLog2Contract(Contract):
def specification(self):
i = self.fresh_var(i64, "i")
self.execute_func(i)
self.returns_f("lg2 {i}")
As a quick mental exercise, does our ceilLog2
function defined earlier
always yield the behavior outlined in ceilLog2Contract
? In other words,
is ceil(log2(i))
in C always equivalent to Cryptol’s lg2 i
?
Objection, Your Honor! Leading the witness!
It turns out that ceil(log2(i))
in C is NOT equal to Cryptol’s lg2 i
for values of i
equal to 1 << j + 1 where j > 49
. Don’t believe us?
We would like to call Cryptol and C to the stand:
Cryptol> let j49 = 1 << 49 + 1 : [64]
Cryptol> j49
1125899906842624
Cryptol> lg2 j49
50
Cryptol> let j50 = 1 << 50 + 1 : [64]
Cryptol> j50
2251799813685248
Cryptol> lg2 j50
51
Cryptol> let j50m1 = j50 - 1
Cryptol> j50m1
2251799813685247
Cryptol> lg2 j50m1
51
#include <stdio.h>
#include <math.h>
#include <stdint.h>
int main()
{
uint64_t j49 = 1125899906842624; // 1 << j + 1 where j == 49
uint64_t j50 = 2251799813685248; // 1 << j + 1 where j == 50
uint64_t j50m1 = 2251799813685247; // j50 - 1
printf("ceil(log2(%ld)) = %llu\n", j49, (unsigned long long)ceil(log2(j49)));
printf("ceil(log2(%ld)) = %llu\n", j50, (unsigned long long)ceil(log2(j50)));
printf("ceil(log2(%ld)) = %llu\n", j50m1, (unsigned long long)ceil(log2(j50m1)));
return 0;
}
$ gcc -o log2Test log2Test.c
$ ./log2Test
ceil(log2(1125899906842624)) = 50
ceil(log2(2251799813685248)) = 52
ceil(log2(2251799813685247)) = 52
The lesson here, beware of float
s claiming to mimic int
s. Now instead of using
ceilLog2
as our assumption example, we could consider other functions like
CLZ instead.
int __builtin_clzll (unsigned long long)
As a bonus exercise, try to verify __builtin_clzll
. Check out
ceilLog2.py to compare your results.
However, let’s go back to our current ceilLog2
function that uses
ceil(log2(i))
. We’ll add a precondition to our SAW contract to account
for the disparity we mentioned earlier.
class ceilLog2Contract(Contract):
def specification(self):
i = self.fresh_var(i64, "i")
self.precondition_f("lg2 {i} <= 49")
self.execute_func(i)
self.returns_f("lg2 {i}")
In the unit test, we would assume the ceilLog2Contract
:
log2i_assume = llvm_assume(mod, 'ceilLog2', ceilLog2Contract())
If a C function ever used ceilLog2
, then we could pass in the
assumption as an optional argument to verification:
getHeight_result = llvm_verify(mod, 'getHeight', getHeightContract(), lemmas=[ceilLog2_assume])
The optional argument is a list because we can supply multiple assumptions or previous verifications. For example, we might have
addNode_result = llvm_verify(mod, 'addNode', addNodeContract(), lemmas=[getHeight_result])
removeNode_result = llvm_verify(mod, 'removeNode', removeNodeContract(), lemmas=[getHeight_result])
addOneRemoveTwo_result = llvm_verify(mod, 'addOneRemoveTwo', addOneRemoveTwo(), lemmas=[addNode_result, removeNode_result])
One can think of lemmas as rewrite rules under the hood. Whenever SAW
encounters ceilLog2 function in the C it will interpret its behavior as
what is specified in the ceilLog2Contract()
.
Uninterpreting Functions
Another way to simplify proofs is to uninterpret a function. This is useful when you don’t care about the specifics of the values produced by a total function, but rather, the types of the value produced. For example, hashing algorithms are often total functions. We often don’t care about the particular bits of a hash function, but rather, that the hash function returned some 32-bit integer.
This is also useful in conjunction with lemmas, reducing proof complexity by decomposing a complex specification into manageable logical steps (constrained by the lemmas), much as verifying function implementations and using results as lemmas does for the corresponding implementation.
See SMT: Equality Logic With Uninterpreted Functions, which describes how uninterpreted functions and constraints are applied to Satisfiability Modulo Theories.
Solicitation
How was your experience with this lab? Suggestions are welcome in the form of a ticket on the course GitHub page: https://github.com/weaversa/cryptol-course/issues
From here, you can go somewhere!
- Cryptographic Properties |