CC0 Spring 2021 Update

January 17, 2021

For the upcoming spring semester, we have released another, smaller, update for CC0.

Changelog

Compiler

C0VM

Libraries

Runtime

Other

Backtraces

As my experience with 15-410 taught me, backtraces are probably one of the most effective debugging tools we have. Unfortunately, the previous implementation of backtraces in CC0 was not entirely complete. It was able to pinpoint which function a problem occured in, but not where exactly in the function the problem happened. The reason for this was because the previous implementation was written in around 30 minutes, and worked by hijacking existing ‘caller location’ information which was inserted as part of contracts. For example:

int foo(int x) { return x + 4; }
int bar(int y)
//@requires y > 0;
//@ensures \result > 0;
{
    int z = 123;
    return 2 * foo(123);
}

int main() {
    return bar(123);
}

gets translated into something similar to:

int foo(int x, string caller) { 
    int return_value = x + 4;
    return return_value;
}

int bar(int y, string caller) {
    assert(y > 0, "@requires annotation failed, caller location: " + caller);

    int z = 123;
    int return_value = 2 * foo(123, "bar: 7.16-7.19");

    assert(return_value > 0, "@ensures annotation failed");
    return return_value;
}

int main() {
    return bar(123, "main: 11.12-11.15");
}

Then we can use this caller string with a pair of functions c0_push_callstack() and c0_pop_callstack like this (using the abstract syntax for assert and + for string_join):

int foo(int x, string caller) { 
    c0_push_callstack(caller);

    int return_value = x + 4;

    c0_pop_callstack();
    return return_value;
}

int bar(int y, string caller) {
    c0_push_callstack(caller);
    assert(y > 0, "@requires annotation failed, caller location: " + caller);

    int z = 123;
    int return_value = 2 * foo(123, "bar: 7.16-7.19");

    assert(return_value > 0, "@ensures annotation failed");

    c0_pop_callstack();
    return return_value;
}

int main() {
    return bar(123, "main: 11.12-11.15");
}

It’s very simple, but isn’t really what people expect from a backtrace. If foo were more complicated, we wouldn’t really know where in foo the problem occured. Furthermore, every function call requires us to invoke c0_push_callstack. So if we call a function inside a tight loop, this hurts performance since we have to constantly push and pop from the callstack.

In order to solve this, we exploit the fact that there already exists an excellent library for performing backtraces on C programs. As C0 translates quite directly to C, this already solves a large portion of the problem. Once we have a backtrace for the C program, we just need to translate the C locations to C0 locations. We do this by embedding a source map into the binary. For example, for the above program we generate this C program, with the corresponding source map at the bottom

#include "cc0lib.h"
#include "c0rt.h"
#include "contracts.c0.h"

int _c0_foo(int _c0v_x, c0_string _c0t__caller) {
  int _c0t__result;
  _c0t__result = (_c0v_x + 4);
  return _c0t__result;
}

int _c0_bar(int _c0v_y, c0_string _c0t__caller) {
  int _c0t__result;
  cc0_assert((_c0v_y > 0), c0_string_join(c0_string_fromliteral("contracts.c0: 3.4-3.19: @requires annotation failed\n"), c0_string_join(_c0t__caller, c0_string_fromliteral(": caller location"))));
  int _c0v_z = 123;
  int _c0t__tmp_1 = _c0_foo(123, c0_string_fromliteral("bar (contracts.c0: 7.16-7.24)"));
  _c0t__result = (2 * _c0t__tmp_1);
  cc0_assert((_c0t__result > 0), c0_string_fromliteral("contracts.c0: 4.4-4.24: @ensures annotation failed"));
  return _c0t__result;
}

int _c0_main() {
  c0_string _c0t__caller = c0_string_fromliteral("(program start)");
  int _c0t__result;
  int _c0t__tmp_2 = _c0_bar(123, c0_string_fromliteral("main (contracts.c0: 11.12-11.20)"));
  _c0t__result = _c0t__tmp_2;
  return _c0t__result;
}
long map_length = 29;
const char* source_map[29] = {
  [13] = "contracts.c0: 3.4-3.19",
  [15] = "contracts.c0: 7.16-7.24",
  [17] = "contracts.c0: 4.4-4.24",
  [24] = "contracts.c0: 11.12-11.20"
};

We have the nice property that source_map[i] gives the C0 location of line i of the C program (or NULL if no location information is available). Then when printing out stack frames, we just print out the corresponding C0 location.

This also takes advantage of the fact that C0 has a well-defined order of evaluation. For example

int* p = NULL;
f(1 / 0, *p);

will always generate a division by 0 instead of a segfault. However, C does not make any such guarantee. Therefore, C0 isolates potentially effectful expressions:

int* p = NULL;
int tmp1 = 1 / 0;
int tmp2 = *p;
f(tmp1, tmp2);

This means that when translated to C, we only have one effectful computation per line of C code. Since the only lines of code which can appear in a backtrace are effectful statements (e.g. function call, dereference, array subscript, division/mod/shift, etc.), this means we will not have one line need multiple locations in C.

Overall, this means that backtraces will now display the precise location of the problem, as opposed to just which function went wrong. In addition, since libbacktrace works by crawling the stack, this means that this feature has 0 performance overhead. This means that we also get backtraces when not compiling with the -d option.

Function pointer/contract wrapper bug

C0 code is capable of calling C functions from dynamic libraries. In addition, these functions can also have contracts attached to them. For example,

int* parse_int(string s, int base)
/*@requires 2 <= base && base <= 36; @*/ ;

Although parse_int is implemented in C in libparse.so/libparse.dylib, we still need to generate a C0 ‘wrapper function’ which checks the contract for the base. So we get something like:

int* parse_int__1(string s, int base, string caller) {
  assert(2 <= base && base <= 36, "@requires annotation failed: " + caller);
  return parse_int(s, base);
}

Note the extra caller argument which is used to provide an error message. The compiler replaces every use of parse_int with this wrapper parse_int__1. For example

typedef int* parse_fn(string s, int base);

int main() {
  parse_fn* f = &parse_int;
  return *(*f)("0x123", 16);
}

becomes

typedef int* parse_fn(string s, int base, string caller);

int main() {
  parse_fn* f = &parse_int__1;
  return *(*f)("0x123", 16, "main: 5.12-5.13");
}

The address-of target has been changed, and the function pointer invocation has had the caller string inserted. Unfortunate, this causes problems if a native function does not have contracts, such as println:

void println(string msg);

In this case, no wrapper is generated. Notable, there is also no extra argument for the caller.

The compiler does not replace usages of println with a wrapper. But function pointer invocations still have the caller string inserted, since the compiler cannot know whether f points to a function which accepts a caller string or doesn’t. For example:

#use <conio>

typedef void u(string s);

int main() {
    u* f = &println;

    (*f)("hello world");

    int a = 2 + 2;
    return a;
}

This generates the following C code:

int _c0_main() {
  c0_string _c0t__caller = c0_string_fromliteral("(none)");
  int _c0t__result;
  _c0_u* _c0v_f = &println;
  (cc0_deref(_c0_u, _c0v_f))(c0_string_fromliteral("hello world"), c0_string_fromliteral("fp.c1:18.5-18.24"));
  int _c0v_a = (2 + 2);
  _c0t__result = _c0v_a;
  return _c0t__result;
}

Note the extra argument to the function pointer invocation. Because of the x86 and x86_64 calling convention, this never caused any issues for around 8 years. Unfortunately, when compiling to C0VM, this creates an issue because the extra caller argument would remain on the stack and interfere.

Ultimately, we resolved this by creating a wrapper function for every native function.

Upcoming changes

These changes didn’t make it in time for Spring 2021, but should make it into the Summer or Fall 2021 release.

Compiler

C0VM

Runtime