Blocks rewriting with clang
Introduction
Back in 2009, Snow Leopard was quite an exciting OS X release. It
didn’t focus on new user-visible features but instead introduced a
handful of low-level technologies. Two of those technologies,
Grand Central Dispatch (a.k.a. GCD) and OpenCL,
were designed to help developers benefit from the new computing power
of modern computer architectures: multicore processors for the former
and GPUs for the latter.
Alongside the GCD engine came a C language extension called blocks.
Blocks are the C-based flavor of what is commonly called a closure:
a callable object that captures the context in which it was created.
The syntax for blocks is very similar to the one used for functions,
with the exception that the pointer star * is replaced by a caret ^.
This allows inline definition of callbacks, which can often improve
the readability of the code.
#include <stdio.h>
#include <stdlib.h>
static void call_blk(void (^blk)(const char *str))
{
blk("world");
}
int main(int argc, char *argv[])
{
int count = argc > 1 ? atoi(argv[1]) : 1;
call_blk(^(const char *str) {
printf("Hello %s from %s!\n", str, argv[0]);
});
call_blk(^(const char *str) {
for (int i = 0; i < count; i++) {
printf("%s!\n", str);
}
});
return 0;
}
Blocks are a desirable feature
Standard C does not contain closures. GCC supports nested functions that are closures to some extent. However, nested functions cannot escape their definition scope, and therefore cannot be used in asynchronous code.
As a consequence, continuations in C are often implemented as a pair
of a callback function and a context. The context contains variables
needed by the callback to continue the execution of the program. Thus
the type of the context is specific to the callback. That’s why APIs
that use continuations usually take a pair of a function and a void *
pointer, which is handed back to the function when it gets called.
This is very flexible, but deeply unsafe: using void * prevents any
type checking by the compiler, so the code can easily be broken during
a refactoring that introduces a mismatch between the data expected by
the callback and the actual content of the context it receives.
For example, this is how we could implement the previous example using callbacks instead of blocks:
#include <stdint.h> /* intptr_t */
#include <stdio.h>
#include <stdlib.h>
static void call_cb(void (*cb)(const char *str, void *ctx), void *ctx)
{
(*cb)("world", ctx);
}
static void say_hello_from(const char *str, void *ctx)
{
const char *from = ctx;
printf("Hello %s from %s!\n", str, from);
}
static void repeat(const char *str, void *ctx)
{
int count = (int)(intptr_t)ctx;
for (int i = 0; i < count; i++) {
printf("%s!\n", str);
}
}
int main(int argc, char *argv[]) {
int count = argc > 1 ? atoi(argv[1]) : 1;
call_cb(&say_hello_from, argv[0]);
call_cb(&repeat, (void *)(intptr_t)count);
return 0;
}
On the other hand, blocks are type-safe. The compiler checks that the programmer is passing a block of the correct type and then identifies the variables from the parent scopes that are used by the block (we say those variables are captured by the block). It then automatically generates the code that creates and forwards the context. This ensures the context is always correct.
Blocks provide a safer and more concise syntax for a very common pattern.
How it works
In practice, the compiler does almost the same thing as we did with the callback and its context. It goes a bit further, though, by putting the pointer to the function in the context itself: that way, a block is a single object that contains both a pointer to the function and the associated data. Here is a second take on our rewrite of the example in plain (blockless) C. This second approach uses something very similar to what the compiler does with blocks:
#include <stdio.h>
#include <stdlib.h>
struct take_str_block {
void (*cb)(void *ctx, const char *str);
};
static void call_cb(const struct take_str_block *blk)
{
/* The block object itself is handed back as the context. */
(*blk->cb)((void *)blk, "world");
}
struct say_hello_from_block {
void (*cb)(void *ctx, const char *str);
/* Captured variables */
const char *from;
};
static void say_hello_from(void *ctx, const char *str)
{
const struct say_hello_from_block *blk = ctx;
printf("Hello %s from %s!\n", str, blk->from);
}
struct repeat_block {
void (*cb)(void *ctx, const char *str);
/* Captured variables */
int count;
};
static void repeat(void *ctx, const char *str)
{
const struct repeat_block *blk = ctx;
for (int i = 0; i < blk->count; i++) {
printf("%s!\n", str);
}
}
int main(int argc, char *argv[])
{
int count = argc > 1 ? atoi(argv[1]) : 1;
{
struct say_hello_from_block ctx = {
.cb = &say_hello_from,
.from = argv[0]
};
call_cb((struct take_str_block *)&ctx);
}
{
struct repeat_block ctx = {
.cb = &repeat,
.count = count
};
call_cb((struct take_str_block *)&ctx);
}
return 0;
}
Here, the block objects are structures that extend the block type expected
by the function call_cb.
This ensures the function receives an object it can manage and that
each block implementation can add its own variables to its context. As
you can see in this example, both say_hello_from and repeat
extend the take_str_block type with their captured variables.
With that approach, a cast is
needed to downgrade the extended structure to the base type, but
captured variables no longer require any cast
(no more (intptr_t) cast to pass an int into a void *).
The actual ABI for blocks is a bit more complex since it must also deal with modifications of captured variables that are shared with the enclosing scope, with block persistency, and so on, but the basic principles are there.
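To give an idea of the "shared modifications" part: a variable that a block must be able to modify cannot simply be copied into the block structure. Here is a minimal sketch of one way to handle it, under a simplified layout. The names byref_int, counter_block and run_counter_demo are made up for this illustration, and the real ABI carries more fields. The variable moves into a small structure that both the enclosing scope and the block reach through a forwarding pointer:

```c
/* Hypothetical, simplified stand-in for an int captured by reference. */
struct byref_int {
    struct byref_int *forwarding; /* points to the live copy of the variable */
    int value;
};

/* Hypothetical block object: function pointer first, captures after. */
struct counter_block {
    void (*cb)(void *ctx);
    struct byref_int *hits;       /* captured by reference, not by copy */
};

static void counter_invoke(void *ctx)
{
    struct counter_block *blk = ctx;

    /* Always go through the forwarding pointer to reach the variable. */
    blk->hits->forwarding->value++;
}

/* Calls the "block" n times and returns what the enclosing scope sees. */
int run_counter_demo(int n)
{
    struct byref_int hits = { .forwarding = &hits, .value = 0 };
    struct counter_block blk = { .cb = &counter_invoke, .hits = &hits };

    for (int i = 0; i < n; i++) {
        (*blk.cb)(&blk);
    }
    return hits.forwarding->value;
}
```

Since every access goes through forwarding, the variable could later be moved elsewhere (to the heap, for instance) by retargeting that single pointer. This is the general idea only, not a description of the exact runtime behaviour.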
Why we needed a rewriter
So, we wanted blocks in our code base. Back in 2010, clang
was the only compiler supporting blocks within the Linux environment [1].
However, at that time, clang was still
a very young project and it just didn’t fit our needs in terms of
optimizations compared to GCC.
On top of that, our software products ran (and have always been
running) on Red Hat Enterprise Linux, so using a compiler supported by
RHEL was highly desirable. Unfortunately, this was not the case
for clang, not even in its most recent releases.
Thankfully, clang contained a rewriting infrastructure that
let a custom pass manipulate the AST [2]. Even more interestingly, clang
came with a built-in Objective-C to C++ rewriter… and since Objective-C
includes blocks and C++ does not, that rewriter already did the work of
rewriting blocks to C++.
However, our code is written in plain C (well, GNU C99 in fact), and
we actually use the C99 features. If C++ were a strict superset of C,
rewriting our code to C++ wouldn’t be an issue. Sadly, C++ is not
strictly backward compatible with C, and some syntax, like the
designated initializer syntax I used in the previous code sample, is
unavailable in C++.
As a consequence, rewriting our code to C++ was not acceptable as it
would forbid the use of handy C99 features. This was (and still is) a
no-brainer: we needed a rewriter from C with blocks to C supported by
GCC. That way, we could both have blocks in C and use GCC
to compile the code.
Thanks to the existing rewriter, this was not too hard [3].
Before detailing our process, let’s see what the actual
output of the original Objective-C to C++ rewriter looked like. The following code is
what we obtained by running clang -cc1 -rewrite-objc
on the program from the introduction of this article
(for the sake of clarity, this excludes the large generated preamble):
struct __block_impl {
void *isa;
int Flags;
int Reserved;
void *FuncPtr;
};
#include <stdio.h>
#include <stdlib.h>
static void call_blk(void (*blk)(const char *str))
{
((void (*)(struct __block_impl *, const char *))((struct __block_impl *)blk)->FuncPtr)((struct __block_impl *)blk,
"world");
}
struct __main_block_impl_0 {
struct __block_impl impl;
struct __main_block_desc_0* Desc;
char **argv;
__main_block_impl_0(void *fp, struct __main_block_desc_0 *desc, char **_argv, int flags=0) : argv(_argv) {
impl.isa = &_NSConcreteStackBlock;
impl.Flags = flags;
impl.FuncPtr = fp;
Desc = desc;
}
};
static void __main_block_func_0(struct __main_block_impl_0 *__cself, const char *str) {
char **argv = __cself->argv; // bound by copy
printf("Hello %s from %s!\n", str, argv[0]);
}
static struct __main_block_desc_0 {
size_t reserved;
size_t Block_size;
} __main_block_desc_0_DATA = { 0, sizeof(struct __main_block_impl_0)};
struct __main_block_impl_1 {
struct __block_impl impl;
struct __main_block_desc_1* Desc;
int count;
__main_block_impl_1(void *fp, struct __main_block_desc_1 *desc, int _count, int flags=0) : count(_count) {
impl.isa = &_NSConcreteStackBlock;
impl.Flags = flags;
impl.FuncPtr = fp;
Desc = desc;
}
};
static void __main_block_func_1(struct __main_block_impl_1 *__cself, const char *str) {
int count = __cself->count; // bound by copy
for (int i = 0; i < count; i++) {
printf("%s!\n", str);
}
}
static struct __main_block_desc_1 {
size_t reserved;
size_t Block_size;
} __main_block_desc_1_DATA = { 0, sizeof(struct __main_block_impl_1)};
int main(int argc, char *argv[])
{
int count = argc > 1 ? atoi(argv[1]) : 1;
call_blk((void (*)(const char *))&__main_block_impl_0((void *)__main_block_func_0, &__main_block_desc_0_DATA, argv));
call_blk((void (*)(const char *))&__main_block_impl_1((void *)__main_block_func_1, &__main_block_desc_1_DATA, count));
return 0;
}
If you look at this closely, you can notice the similarity with our
take_str_block example.
The compiler generates a function and an impl structure for each block.
The base structure __block_impl
contains the function pointer as well as information about the type of block
(mostly used for block persistency).
What we did
First, we had to get rid of the Objective-C specific code: no need to
keep code we don’t care about. We initially hacked the code of
the Objective-C to C++ rewriter directly. But the patched rewriter
rapidly proved really hard to maintain since we had a lot of conflicts with each
new clang release, so we actually forked the Objective-C to C++
rewriter into a specialized block rewriter, introducing a
-rewrite-blocks flag to clang.
Secondly, we needed to produce pure C code instead of C++ for blocks.
As you can see above, the generated code uses structure
constructors to initialize the implementation objects. Constructors do
not exist in C, but we have initializers (and far better initializers
than C++ ones). As a consequence, we changed the code that instantiated
the impl structure to use structure initializers. Note that
our implementation uses two C99 features: the designated initializers
we mentioned already as well as compound literals.
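As a small illustration of those two C99 features, a designated initializer wrapped in a compound literal can play the role of the C++ constructor. This is a sketch with made-up names such as adder_block and ADDER_BLOCK, not the code our rewriter emits:

```c
/* Simplified stand-in for a generated implementation structure. */
struct adder_block {
    int (*func)(struct adder_block *self, int x);
    int increment;               /* captured variable */
};

static int adder_func(struct adder_block *self, int x)
{
    return x + self->increment;
}

/* "Constructor": a compound literal filled with designated initializers. */
#define ADDER_BLOCK(_increment)              \
    ((struct adder_block){                   \
        .func = &adder_func,                 \
        .increment = (_increment),           \
    })

int add_with_block(int x, int increment)
{
    /* The compound literal lives until the end of the enclosing scope. */
    struct adder_block *blk = &ADDER_BLOCK(increment);

    return (*blk->func)(blk, x);
}
```

Fields can be listed in any order and the ones left out are zero-initialized, which makes such a macro tolerant to changes in the structure layout.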
The rewriter also uses another feature that cannot be observed in our
example: RAII (http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization),
which is used only for variables with the __block
modifier. Those variables are captured by reference instead of being
captured by value, so the runtime must perform some
reference counting on them. RAII is used to ensure we
release the reference taken by the definition scope of the variable when
we exit that scope. Standard C contains no equivalent construct, so
we would have had to manually generate the release of the reference
everywhere we exited the definition scope of the variable. Thankfully,
GCC has a cleanup attribute.
This attribute associates a cleanup callback with a variable and acts
exactly the same way as RAII does, which completely
solves our issue in a simple way.
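Here is a minimal sketch of that attribute in action, assuming GCC or a compatible compiler. The shared_int type and its helpers are made up for the illustration and are not what our rewriter generates:

```c
#include <stddef.h>

/* Hypothetical reference-counted object. */
struct shared_int {
    int refcount;
    int value;
};

/* Cleanup handlers receive a pointer to the annotated variable. */
static void shared_int_release(struct shared_int **ref)
{
    if (*ref != NULL) {
        (*ref)->refcount--;
    }
}

int use_shared(struct shared_int *obj, int bail_out_early)
{
    /* The handler runs whenever `ref` goes out of scope. */
    struct shared_int *ref __attribute__((cleanup(shared_int_release))) = obj;

    ref->refcount++;
    if (bail_out_early) {
        return -1;           /* the reference is released here too */
    }
    return ref->value;
}
```

Whatever path leaves use_shared(), the reference taken at the top is given back, without a release call before each return statement.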
With those issues fixed, here is the output of our rewriter, clang -cc1 -rewrite-blocks:
struct __block_impl {
void *isa;
int Flags;
int Reserved;
void *FuncPtr;
};
#include <stdio.h>
#include <stdlib.h>
static void call_blk(void (*blk)(const char *str))
{
((void (*)(struct __block_impl *, const char *))((struct __block_impl *)blk)->FuncPtr)((struct __block_impl *)blk,
"world");
}
struct __main_block_impl_0 {
struct __block_impl impl;
struct __main_block_desc_0* Desc;
char **argv;
#define __main_block_impl_0(__blk_fp, __blk_desc, _argv, __blk_flags) \
{ \
.argv = (_argv), \
.impl = { \
.isa = &_NSConcreteStackBlock, \
.Flags = (__blk_flags), \
.FuncPtr = (__blk_fp), \
}, \
.Desc = (__blk_desc), \
}
#define __main_block_impl_0__INST(...) ({ \
memcpy(&__main_block_impl_0__VAR, &(struct __main_block_impl_0)__main_block_impl_0(__VA_ARGS__), sizeof(__main_block_impl_0__VAR)); \
&__main_block_impl_0__VAR; \
})
};
static void __main_block_func_0(struct __main_block_impl_0 *__cself, const char *str) {
char **argv = __cself->argv; // bound by copy
printf("Hello %s from %s!\n", str, argv[0]);
}
static struct __main_block_desc_0 {
unsigned long reserved;
unsigned long Block_size;
} __main_block_desc_0_DATA = { 0, sizeof(struct __main_block_impl_0)};
struct __main_block_impl_1 {
struct __block_impl impl;
struct __main_block_desc_1* Desc;
int count;
#define __main_block_impl_1(__blk_fp, __blk_desc, _count, __blk_flags) \
{ \
.count = (_count), \
.impl = { \
.isa = &_NSConcreteStackBlock, \
.Flags = (__blk_flags), \
.FuncPtr = (__blk_fp), \
}, \
.Desc = (__blk_desc), \
}
#define __main_block_impl_1__INST(...) ({ \
memcpy(&__main_block_impl_1__VAR, &(struct __main_block_impl_1)__main_block_impl_1(__VA_ARGS__), sizeof(__main_block_impl_1__VAR)); \
&__main_block_impl_1__VAR; \
})
};
static void __main_block_func_1(struct __main_block_impl_1 *__cself, const char *str) {
int count = __cself->count; // bound by copy
for (int i = 0; i < count; i++) {
printf("%s!\n", str);
}
}
static struct __main_block_desc_1 {
unsigned long reserved;
unsigned long Block_size;
} __main_block_desc_1_DATA = { 0, sizeof(struct __main_block_impl_1)};
int main(int argc, char *argv[])
{struct __main_block_impl_1 __main_block_impl_1__VAR;struct __main_block_impl_0 __main_block_impl_0__VAR;
int count = argc > 1 ? atoi(argv[1]) : 1;
call_blk((void (*)(const char *))(void *)__main_block_impl_0__INST((void *)__main_block_func_0, &__main_block_desc_0_DATA, argv, 0
));
call_blk((void (*)(const char *))(void *)__main_block_impl_1__INST((void *)__main_block_func_1, &__main_block_desc_1_DATA, count, 0
));
return 0;
}
Limitations
One issue remains, though: the rewriter cannot rewrite blocks within
macros. This is a direct consequence of how the preprocessor works: it
is a (not so) simple text replacement and the text of a macro is not
guaranteed to be valid C. This could be worked around by rewriting the
output of the preprocessor instead of rewriting the source code.
However, since we need to compile the output of the rewriter using GCC
in order to use the optimizer
(which is the most complex part of the compiler) supported by RHEL,
this was not possible. Indeed, in our code base, we sometimes use some
compiler-specific code paths (for example, until recently, clang didn’t
know about the flatten attribute, and GCC
didn’t know about clang’s __has_feature()
preprocessor intrinsic…). Since the rewriter is based on clang,
the macros are expanded to their clang flavour. As a consequence,
had we worked on the preprocessed code, at best we would have lost some
GCC-specific optimisations and at worst we would have ended up with
code GCC could not compile. Today, rewriting within macros remains the main
limitation of our toolchain. In early versions, it even used to crash clang,
a consequence of a pointer misuse inherited
from the Objective-C rewriter [4].
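To make the problem with preprocessed code more concrete, here is a sketch of the kind of compiler-specific header code we have in mind. The macro and function names (COMPILER_FLAVOUR, ALWAYS_FLATTEN, HAVE_NATIVE_BLOCKS, sum_squares…) are hypothetical, and the clang branch simply assumes the flatten attribute is unknown, as was the case at the time:

```c
/* Compilers without __has_feature see every feature as absent. */
#ifndef __has_feature
# define __has_feature(x) 0
#endif

#if __has_feature(blocks)
# define HAVE_NATIVE_BLOCKS 1
#else
# define HAVE_NATIVE_BLOCKS 0
#endif

/* Pick a spelling per compiler; after preprocessing only one branch survives. */
#if defined(__clang__)
# define COMPILER_FLAVOUR "clang"
# define ALWAYS_FLATTEN            /* assumption: attribute unknown to clang */
#elif defined(__GNUC__)
# define COMPILER_FLAVOUR "gcc"
# define ALWAYS_FLATTEN __attribute__((flatten))
#else
# define COMPILER_FLAVOUR "unknown"
# define ALWAYS_FLATTEN
#endif

static int square(int x)
{
    return x * x;
}

/* With GCC, flatten asks for every call in the body to be inlined. */
ALWAYS_FLATTEN int sum_squares(int a, int b)
{
    return square(a) + square(b);
}

const char *compiler_flavour(void)
{
    return COMPILER_FLAVOUR;
}

int has_native_blocks(void)
{
    return HAVE_NATIVE_BLOCKS;
}
```

If such a file is preprocessed by clang and the result handed to GCC, GCC only ever sees the clang branch: here the flatten attribute is silently gone, and in less favourable cases the surviving branch may use constructs GCC does not understand.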
Since we rewrite the original source file, not the preprocessed file, we also cannot rewrite blocks in included files. This means that no blocks are allowed in header files (or in any other included code file). However, we still need to declare block types and functions that take blocks as arguments in those header files. And here comes probably the part that created the greatest amount of confusion among our engineers.
A block type is defined using a caret:
typedef int (^my_block_b)(const char *str);
int call_blk(int (^blk)(const char *str));
That won’t compile with GCC. So we had to find a way to
allow declaration of block types even in header files with two
constraints: we must use the block syntax when the compiler supports
blocks, but use some standard C syntax when the compiler does not. We
did this by introducing a dedicated BLOCK_CARET macro that expands
to a caret when the compiler supports blocks, but to a star when
the compiler does not support them:
#ifdef __BLOCKS__
# define BLOCK_CARET ^
#else
# define BLOCK_CARET *
#endif
typedef int (BLOCK_CARET my_block_b)(const char *str);
int call_blk(int (BLOCK_CARET blk)(const char *str));
This means that GCC will see block types as function pointer
types. However, we already saw that a block is not a function, but a
structure. The rewriting to a function pointer type is just a trick that makes
the code acceptable to GCC, but that trick does not
transform blocks into functions. Even if both blocks and functions are
callable C types, they are not interchangeable: a function cannot be
used where a block is expected, and vice versa. Blocks remain a patch on
top of the C standard. Most languages that have built-in closures don’t
have that kind of limitation; Apple itself fixed this in the Swift
language… five years later.
4 years later…
We’ve been using our rewriter for over 4 years. During the first few months, we had a hard time convincing our engineers that blocks really were worth the extra cost (the extra compilation time and the various limitations of our rewriter), but today, a quarter of our code base uses blocks. The benefits of the lighter syntax introduced by blocks can be observed in various use cases ranging from asynchronous code to the core of our database engines. For example, we can easily create atomic commits using blocks:
db_do_atomically(^{
do_some_write();
do_some_other_write();
});
We also spotted more limitations (mostly with __block
variables) and added basic support for C++ (ironically, we now think that
we can generate better C++ than the original Objective-C to C++ rewriter
for the small subset of C++ we support)… Probably the most frustrating
issue today is the performance impact of blocks: they are too tricky to
be inlined by GCC, even if defined and called in the same
compilation unit. Moreover, in some use cases, we spotted limitations due
to the fact that blocks get allocated on the heap when we need to persist
them (using Block_copy()), causing some contention in the
allocator.
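To see where that allocation comes from, here is a deliberately simplified sketch of what persisting a stack block involves. The types and the blk_copy_to_heap() function are hypothetical; the real runtime also handles reference counts and captured objects, which are left out here:

```c
#include <stdlib.h>
#include <string.h>

/* Simplified descriptor: only the size of the whole block object. */
struct blk_desc {
    unsigned long block_size;
};

/* Common header shared by all (hypothetical) block objects. */
struct blk_header {
    void *func;
    const struct blk_desc *desc;
};

/* A concrete block: the header followed by its captured variables. */
struct count_block {
    struct blk_header hdr;
    int count;
};

/* Sketch of persisting a stack block: one heap allocation per copy. */
void *blk_copy_to_heap(const void *stack_blk)
{
    const struct blk_header *hdr = stack_blk;
    void *heap_blk = malloc(hdr->desc->block_size);

    if (heap_blk != NULL) {
        /* The header and the captured variables are copied in one go. */
        memcpy(heap_blk, stack_blk, hdr->desc->block_size);
    }
    return heap_blk;
}
```

Every block that must outlive its definition scope pays for one such allocation, which is why code that persists many small blocks puts pressure on the allocator.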
Our hacked clang is available on GitHub for all branches
since clang 3.0 (we actually began working on it before
clang 3.0, but that early work was lost when we switched from our own
git-svn clone to the official LLVM git mirror). The code is not clean and
someday we’ll do a great cleanup pass in order to be able to submit our
patches upstream.
[1] though Apple’s GCC fork supported it too
[2] the AST, Abstract Syntax Tree, is a representation of the source code in the form of a tree which is much easier to manipulate than raw text
[3] you can still find our original discussion about this in the clang mailing list
[4] It looks like the Objective-C rewriter was some kind of experiment, and it’s clearly not of production quality