Blocks rewriting with clang


Back in 2009, Snow Leopard was quite an exciting OS X release. It didn’t focus on new user-visible features but instead introduced a handful of low-level technologies. Two of those technologies, Grand Central Dispatch (a.k.a. GCD) and OpenCL, were designed to help developers benefit from the computing power of modern computer architectures: multicore processors for the former and GPUs for the latter.

Alongside the GCD engine came a C language extension called blocks. Blocks are the C-based flavor of what is commonly called a closure: a callable object that captures the context in which it was created. The syntax for blocks is very similar to the one used for functions, with the exception that the pointer star * is replaced by a caret ^. This allows inline definition of callbacks, which can often help improve the readability of the code.
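As a minimal sketch of the syntax (our own example, not taken from the article), a block is declared and called much like a function pointer; on Linux this typically requires compiling with clang -fblocks and linking against the BlocksRuntime library:

    #include <stdio.h>

    int main(void)
    {
        int offset = 10;

        /* A block literal: same shape as a function pointer, with the
         * star replaced by a caret. */
        int (^add)(int, int) = ^int (int a, int b) {
            return a + b;
        };

        /* Blocks capture variables from their enclosing scope ('offset'). */
        int (^shift)(int) = ^int (int x) {
            return x + offset;
        };

        printf("%d\n", add(1, 2));   /* prints 3  */
        printf("%d\n", shift(5));    /* prints 15 */
        return 0;
    }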

Continue reading

Hackathon 0x1 – Pimp my Review or the Epic Birth of a Gerrit Plugin

The goal had been set a day or two prior to the beginning of the hackathon: we were hoping to make Gerrit better at recommending relevant reviewers for a given commit. For those who haven’t heard of it, Gerrit is a web-based code review system. It is a nifty Google-backed open-source project evolving amid an active community of users. We have been using it here at Intersec since 2011, and some famous software projects also rely heavily on it for their development process.

Da Review Pimpers

This would be a good metaphor to illustrate our mindset at the beginning of the hackathon! (credits: Team Fortress 2)

Our team consisted of five people: Kamal, Romain, Thomas, Louis and Romain (myself).

Continue reading

Memory – Part 6: Optimizing the FIFO and Stack allocators


The most used custom allocators at Intersec are the FIFO and the Stack allocators, detailed in a previous article. The stack allocator is extremely convenient thanks to the t_scope macro, and the FIFO is well suited to some of our use cases, such as inter-process communication. It is thus important for these allocators to be extensively optimized.
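As a rough idea of what such a frame-based stack allocator looks like (a hedged sketch with invented names, not Intersec’s actual t_stack API): allocation is a simple pointer bump, and releasing a frame returns everything allocated since the matching mark in one shot.

    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical frame-based stack allocator (names invented here). */
    typedef struct stack_pool_t {
        char  *mem;
        size_t pos;
        size_t size;
    } stack_pool_t;

    static stack_pool_t pool;

    static void pool_init(size_t size)
    {
        pool.mem  = malloc(size);
        pool.pos  = 0;
        pool.size = size;
    }

    /* Remember the current position: this is the start of a frame. */
    static size_t frame_mark(void)
    {
        return pool.pos;
    }

    /* Bump-pointer allocation inside the current frame. */
    static void *frame_alloc(size_t len)
    {
        void *res;

        len = (len + 7) & ~(size_t)7;        /* keep 8-byte alignment */
        if (pool.pos + len > pool.size) {
            return NULL;                     /* a real allocator would grow */
        }
        res = pool.mem + pool.pos;
        pool.pos += len;
        return res;
    }

    /* Free everything allocated since the mark, in O(1). */
    static void frame_release(size_t mark)
    {
        pool.pos = mark;
    }

A t_scope-like macro can then be built on top of the mark/release pair so that leaving the C scope automatically releases the frame.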

We are two interns at Intersec, and our objective for this 6-week internship was to optimize these allocators as far as possible. Optimizing an allocator can have several meanings: it can be in terms of memory overhead, resistance to contention, performance… As the FIFO allocator is designed to work in single-threaded environments, and the t_stack is thread-local, we will only cover performance and memory overhead.

Continue reading

Hackathon 0x1 – Interactive mode in Behave

Presentation of the Project

As testers, we spend a lot of time working with behave, our test automation framework1. It is a great tool, but starting and initializing the product and running tests one by one takes a lot of time.

We are not only test automation developers. We also need to explore the product under test by experimenting again and again based on the information we gather along the way. Manual testing is the basic approach for performing non-trivial experiments, but it can benefit from test automation, which offers a quick and reliable way to put the product in any given state.

The hackathon2 was the perfect opportunity to build ourselves a new exploratory tool for interactive automated testing. To do that, we needed to be able to:

  • Gain more control over the setup steps
  • Pause the run in order to manually check the state of the product
  • Select some more steps to run, check again, and so on
  • Explore with some automatic help

We designed an interactive mode to manage the run of a scenario. The goal was to be able to play each sentence on demand. This mode would also help us in the future to reproduce issues and to set up an environment faster, whether for testing purposes or for customer demonstrations.

Continue reading

  1. behave is a clone of cucumber written in Python. It is based on BDD (Behavior Driven Development) principles. Tests are described as a succession of English sentences (assumptions, then actions, then results) which are themselves mapped to the corresponding Python code. 

  2. a hackathon is an event in which computer programmers and others involved in software development, including graphic designers, interface designers and project managers, collaborate intensively on software projects. Intersec promotes such events to focus and enhance project innovation. 

DAZIO: Detecting Activity Zones based on Input/Output SMS and call activity for geomarketing and trade area analysis


Telecom data is a rich source of information for many purposes, including urban planning (Toole et al., 2012), human mobility patterns (Ficek and Kencl, 2012; Gambs et al., 2011), points of interest detection (Vieira et al., 2010), epidemic spread modeling (Lima et al., 2013), community detection (Morales et al., 2013), disaster planning (Pulse, 2013) and social interactions (Eagle et al., 2013).

One common task for these applications is to identify dense areas where many users stay for a significant time (activity zones), the regions linking these activity zones (transit zones), as well as the interactions between the identified activity zones. In the present article, we identify activity and transit zones in order to monitor and predict activity levels in the telecom operator's network, based on the input/output SMS and call activity levels from the Telecom Italia Big Data Challenge. The results of the present study could be directly applied to:

  • Location-based advertising
  • Defining a suitable place to open a new store in a city
  • Planning where to add cell towers to improve QoS

The contribution of this work is twofold: we present a model accounting for changes of activity levels over time and predict those changes using Markov chains, and we propose a methodology to detect activity and transit zones.
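As a hedged sketch of the Markov-chain part (the notation is ours, not necessarily the one used in the full article): the activity level of a zone is discretized into states, a transition matrix is estimated from the observed history, and the next level is predicted from the current state distribution:

    P_{ij} = \Pr(s_{t+1} = j \mid s_t = i), \qquad \pi_{t+1} = \pi_t P

where s_t is the discretized activity level of a zone during time slot t, P is the transition matrix estimated from the data, and \pi_t is the row vector of state probabilities; the predicted level for the next slot is then, for instance, the most likely state \arg\max_j (\pi_{t+1})_j.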

Continue reading

More about locality

In the third post of the memory series we briefly explained locality and why it is an important principle to keep in mind while developing a memory-intensive program. This new post is more concrete: it explains what actually happens behind the scenes in a very simple example.

This post is a follow-up to a recent interview with a (brilliant) candidate1. As a bonus question, we presented him with the following two structure definitions:
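(The actual definitions are only in the full article and are not reproduced in this excerpt. Purely as a hypothetical illustration of the kind of comparison involved, based on the bar_t name used below, the pair could contrast out-of-line and inline storage:)

    #include <stdint.h>

    /* Hypothetical reconstruction, not the article's actual definitions. */

    /* foo_t keeps its values out of line: reading them means following a
     * pointer to a separately allocated buffer. */
    typedef struct foo_t {
        uint32_t  len;
        uint32_t *values;
    } foo_t;

    /* bar_t stores its values inline, right after the header, using a C99
     * flexible array member: header and data share the same allocation
     * and, for small sizes, the same cache lines. */
    typedef struct bar_t {
        uint32_t len;
        uint32_t values[];
    } bar_t;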

The question was: what is the difference between these two structures, and what are the pros and cons of each? For the remainder of the article, we will assume we are working on an x86_64 architecture.

By coincidence, at more or less the same time, an intern asked why we were using bar_t-like structures in our custom database engine.

Continue reading

  1. we’re still hiring 

Memory – Part 5: Debugging Tools


Here we are! We spent 4 articles explaining what memory is, how to deal with it and what kinds of problems to expect from it. Even the best developers write bugs. A commonly accepted estimate is around a few tens of bugs per thousand lines of code, which is quite a lot. As a consequence, even if you have mastered all the concepts covered by our articles, you will still probably write a few memory-related bugs.

Memory-related bugs may be particularly hard to spot and fix. Let’s take the following program as an example:
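(The original listing is not included in this excerpt; the sketch below is a reconstruction based on the description that follows, with details such as the buffer size guessed.)

    #include <stdio.h>

    #define MAX_LINE_SIZE  128

    /* Builds the greeting in a buffer that lives in this function's own
     * stack frame, and (wrongly) returns a pointer to it. */
    static const char *build_message(const char *msg)
    {
        char buffer[MAX_LINE_SIZE];

        /* sprintf() places no limit on the number of bytes written, so a
         * long enough message also overflows 'buffer'. */
        sprintf(buffer, "hello %s!\n", msg);
        return buffer;
    }

    int main(int argc, char *argv[])
    {
        const char *msg = argc > 1 ? argv[1] : "world";

        fputs(build_message(msg), stdout);
        return 0;
    }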

This program is supposed to take a message as an argument and print “hello <message>!” (the default message being “world”).

The behavior of this program is completely undefined: it is buggy, but it will probably not crash. The function build_message returns a pointer to memory allocated in its own stack frame. Because of how the stack works, that memory is very likely to be overwritten by a later function call, possibly by fputs. As a consequence, if fputs internally uses enough stack memory to overwrite the message, the output will be corrupted (and the program may even crash); otherwise, the program will print the expected message. Moreover, the program may overflow its buffer because of the unsafe sprintf function, which places no limit on the number of bytes written.

So, the behavior of the program varies depending on the size of the message given on the command line, the value of MAX_LINE_SIZE and the implementation of fputs. What’s annoying with this kind of bug is that the result may not be obvious: the program “works” well enough in simple use cases and only fails the day it receives a parameter with the right properties to trigger the issue. That’s why it’s important for developers to be at ease with tools that help them validate (or debug) memory management.

In this last article, we will cover some free tools that we believe should be part of the minimal toolkit of a C (and C++) developer.

Continue reading

Memory – Part 4: Intersec’s custom allocators

malloc() is not the one-size-fits-all allocator

malloc() is extremely convenient because it is generic. It does not make any assumption about the context of the allocation and the deallocation: they may immediately follow each other or be separated by the execution of a whole job, and they may take place in the same thread or not… Since it is generic, malloc() cannot tell allocations apart, meaning that long-term allocations share the same pool as short-term ones.

Consequently, the implementation of malloc() is complex. Since memory can be shared by several threads, the pool must be shared and locking is required. Since modern hardware has more and more physical threads, locking the pool at every single allocation would have disastrous impacts on performance. Therefore, modern malloc() implementations have thread-local caches and will lock the main pool only if the caches get too small or too large. A side effect is that some memory gets stuck in thread-local caches and is not easily accessible from other threads.
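As a very rough sketch of the thread-cache idea (a single chunk size, invented names, nothing like a production malloc()): the fast path touches only thread-local state, the slow path takes the shared lock, and freed chunks stay in the local cache.

    #include <pthread.h>
    #include <stdlib.h>

    #define CHUNK_SIZE  64                   /* one size class, for simplicity */

    typedef struct chunk_t {
        struct chunk_t *next;
    } chunk_t;

    static __thread chunk_t *thread_cache;   /* per-thread free list */
    static chunk_t *global_pool;             /* shared pool of chunks */
    static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *alloc_chunk(void)
    {
        chunk_t *res = thread_cache;

        if (res) {                           /* fast path: no lock taken */
            thread_cache = res->next;
            return res;
        }
        pthread_mutex_lock(&pool_lock);      /* slow path: shared pool */
        res = global_pool;
        if (res) {
            global_pool = res->next;
        }
        pthread_mutex_unlock(&pool_lock);
        return res ? (void *)res : malloc(CHUNK_SIZE);
    }

    static void free_chunk(void *ptr)
    {
        chunk_t *chunk = ptr;

        /* Freed memory stays in the thread-local cache, invisible to
         * other threads until it is flushed back to the shared pool. */
        chunk->next  = thread_cache;
        thread_cache = chunk;
    }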

Since chunks of memory can get stuck in different locations (within thread-local caches, in the global pool, or simply allocated by the process), the heap gets fragmented. It becomes hard to release unused memory to the kernel, and highly probable that two successive allocations will return chunks of memory that are far from each other, generating random accesses to the heap. As we saw in the previous article, random access is far from the optimal way to access memory.

As a consequence, it is sometimes necessary to have specialized allocators with predictable behavior. At Intersec, we have several of them to use in various situations. In some specific use cases we increase performance by several orders of magnitude.

Continue reading

Memory – Part 3: Managing memory

Developer point of view

In the previous articles we dealt with memory classification and analysis from an external point of view. We saw that memory can be allocated in different ways, each with various properties. In the remaining articles of the series we will take the developer's point of view.

At Intersec we write all of our software in C, which means that we are constantly dealing with memory management. We want our developers to have a solid knowledge of the various existing memory pools. In this article, we will give an overview of the main sources of memory available to C programmers on Linux. We will also see some rules of memory management that will help you keep your programs correct and efficient.
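As a tiny preview of the kind of sources meant here (our own example, not code from the article): automatic storage on the stack, the data segment for static variables, the heap through the malloc() family, and anonymous mappings obtained directly from the kernel with mmap().

    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    static char in_data_segment[64] = "data segment";      /* static storage */

    int main(void)
    {
        char  on_stack[64];                                 /* automatic (stack) */
        char *on_heap = malloc(64);                         /* heap, malloc() family */
        char *mapped  = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (on_heap == NULL || mapped == MAP_FAILED) {
            return 1;
        }
        strcpy(on_stack, "stack");
        strcpy(on_heap, "heap");
        strcpy(mapped, "anonymous mapping");
        (void)in_data_segment;

        free(on_heap);
        munmap(mapped, 4096);
        return 0;
    }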

Continue reading