Hackathon 0x1 – Pimp my Review or the Epic Birth of a Gerrit Plugin

This series of articles describes some of the best realisations made by Intersec R&D team during the 2-day Hackathon that took place on the 3rd and 4th of July.

The goal had been set a day or two prior to the beginning of the hackathon: we were hoping to make Gerrit better at recommending relevant reviewers for a given commit. To those who haven’t heard of it, Gerrit is a web-based code review system. It is a nifty Google-backed open-source project evolving amid an active community of users. We have been using this product here at Intersec since 2011 and some famous software projects also rely heavily on it for their development process.

Read more

Memory – Part 6: Optimizing the FIFO and Stack allocators

Introduction

The most used custom allocators at Intersec are the FIFO and the Stack allocators, detailed in a previous article. The stack allocator is extremely convenient, thanks to the t_scope macro, and the FIFO is well fitted to some of our use cases, such as inter-process communication. It is thus important for these allocators to be optimized extensively.

We are two interns at Intersec, and our objective for this 6 week internship was to optimize these allocators as far as possible. Optimizing an allocator can have several meanings: it can be in terms of memory overhead, resistance to contention, performance… As the FIFO allocator is designed to work in single threaded environments, and the t_stack is thread local, we will only cover performance and memory overhead.

Read more

Hackathon 0x1 – Interactive mode in Behave

This series of articles describes some of the best realisations made by Intersec R&D team during the 2-day Hackathon that took place on the 3rd and 4th of July.

Presentation of the Project

As testers, we spend a lot of time working on behave, our test automation framework1. Our test framework is a great tool, but it takes a lot of time starting and initializing the product, running tests one by one.

Read more

DAZIO: Detecting Activity Zones based on Input/Output sms and calls activity for geomarketing and trade area analysis

Introduction

Telecom data is a rich source of information for many purposes, ranging from urban planning (Toole et al., 2012), human mobility patterns (Ficek and Kencl, 2012; Gambs et al., 2011), points of interest detection (Vieira et al., 2010), epidemic spread modeling (Lima et al., 2013), community detection (Morales et al., 2013) disaster planning (Pulse, 2013) and social interactions (Eagle et al., 2013).

One common task for these applications is to identify dense areas where many users stay for a significant time (activity zones), the regions relaying theses activity zones (transit zones) as well as the interaction between identified activity zones. Thus, in the present article we will identify activity and transit zones to monitor and predict the activity levels in the telecom operators network based on the SMS and calls input/output activity levels issued from the Telecom Italia Big Data Challenge. The results of the present study could be directly applied to:

Read more

More about locality

In the third post of the memory series, we briefly explained locality and why it is an important principle to keep in mind while developing a memory-intensive program. This new post is going to be more concrete and explains what actually happens behind the scene in a very simple example.

This post is a follow-up to a recent interview with a (brilliant) candidate1. As a subsidiary question, we resented him with the following two structure definitions:

Read more

Memory – Part 5: Debugging Tools

Introduction

Here we are! We spent 4 articles explaining what memory is, how to deal with it and what are the kind of problems you can expect from it. Even the best developers write bugs. A commonly accepted estimation seems to be around of few tens of bugs per thousand of lines of code, which is definitely quite huge. As a consequence, even if you proficiently mastered all the concepts covered by our articles, you’ll still probably have a few memory-related bugs.

Read more

Memory - Part 4: Intersec’s custom allocators

malloc() not the one-size-fits-all allocator

malloc() is extremely convenient because it is generic. It does not make any assumptions about the context of the allocation and the deallocation. Such allocators may just follow each other, or be separated by a whole job execution. They may take place in the same thread, or not… Since it is generic, each allocation is different from each other, meaning that long term allocations share the same pool as short term ones.

Read more

Memory - Part 3: Managing memory

Developer point of view

In the previous articles we dealt with memory classification and analysis from an outer point of view. We saw that memory can be allocated in different ways with various properties. In the remaining articles of the series we will take a developer point of view.

At Intersec we write all of our software in C, which means that we are constantly dealing with memory management. We want our developers to have a solid knowledge of the various existing memory pools. In this article we will have an overview of the main sources of memory available to C programmers on Linux. We will also see some rules of memory management that will help you keep your program correct and efficient.

Read more

Memory - Part 2: Understanding Process memory

From Virtual to Physical

In the previous article , we introduced a way to classify the memory a process reclaimed. We used 4 quadrants using two axis: private/shared and anonymous/file-backed. We also evoked the complexity of the sharing mechanism and the fact that all memory is basically reclaimed to the kernel.

Everything we talked about was virtual. It was all about reservation of memory addresses, but a reserved address is not always immediately mapped to physical memory by the kernel. Most of the time, the kernel delays the actual allocation of physical memory until the time of the first access (or the time of the first write in some cases)… and even then, this is done with the granularity of a page (commonly 4KiB). Moreover, some pages may be swapped out after being allocated, that means they get written to disk in order to allow other pages to be put in RAM.

Read more

Memory - Part 1: Memory Types

Introduction

At Intersec we chose the C programming language because it gives us a full control on what we’re doing, and achieves a high level of performances. For many people, performance is just about using as few CPU instructions as possible. However, on modern hardware it’s much more complicated than just CPU. Algorithms have to deal with memory, CPU, disk and network I/Os… Each of them adds to the cost of the algorithm and each of them must be properly understood in order to guarantee both the performance and the reliability of the algorithm.

Read more