Posts

Fibonacci sequences of higher order : The Nibonacci Sequence

So today I was messing around with optimising the calculation of Fibonacci sequences. Everyone has written code like this. Quickly we run into performance issues because many of the calculations are redundant, and these repeated calculations result in massive function call stacks. So the obvious solution here is to cache the intermediate results. We can do something like this. I wanted to create a Fibonacci sequence of order N (or Nibonacci sequence) using these same principles. First we need a cache that will store the N previous Nibonacci numbers. I designed this data structure, and I think it fits the bill. Glorious, right? This is essentially a circular buffer, and it will work as expected when substituted into the previous example. We could have used a std::vector, but we would then have to defer the generation of our cache to runtime and, more importantly, we would be storing unnecessary results (our memory requirements would increase w...

The magic behind deep learning : Obtaining gradients using reverse Automatic Differentiation.

Every now and again a wave of rage washes over me when I hear someone (be it a news outlet, etc.) refer to deep learning, and the like, as magic. Although, I must admit that I was a bit vague on how you can generate an error function given a network of nodes and then minimise it without using numerical differentiation or providing the actual derivative (although there are (totally) derivative-free minimisation methods such as Simplex, etc.). The process of writing software for generating this error function (and a neural network) is not my concern here, although it might be in a future blog post. What is my concern is obtaining the gradient of this error function. Automatic Differentiation  First, let's discuss the process of obtaining the derivative of an expression that we already have the computation graph for. You will see that obtaining the computation graph is half the battle, and many of the resources on the internet explaining Automatic Differentiation (AD) tend to brus...

Writing a keylogger application in Windows

This post is a little bit of a side-step. I was looking through The Linux Programming Interface and set myself the task of obtaining keyboard input using only Unix commands. One problem: I only have access to a Windows PC at the moment! While I could use a virtual machine, I thought it would be interesting to try and do the same thing in Windows using various methods. The code for this post is on my git here . The first method was to avoid using the Windows API as much as possible. The second was to use the SetWindowsHookEx method. Method I  If you look at the code on my github, inside the main function we have something like this:

while (1) {
    for (char i = 0; i < 127; i++) {
        if (GetAsyncKeyState(i) == -32767) {
            of << i;
            of.flush();
        }
    }
}

The only alien command is GetAsyncKeyState. This obtains the current state of the key. Using GetKeyState would be the wrong thing to do as we are not...

Optimising matrix multiplication by exploiting cache lines

In order to keep hold of some sort of sanity, I had better start working on something interesting. Here is solvant ! Being naive  When working with matrices we usually translate the algorithm for matrix-matrix multiplication directly into code. And it works. Most of the time, we just don't care, so long as we get the result. But if there's one thing I learned from working with AVX (see here ), it is that if we work on data that is contiguous in memory, the compiler will automatically apply vectorised (SIMD) operations. In series notation, the matrix multiplication $\mathbf{C} = \mathbf{A}\mathbf{B}$ looks like $$C_{i,j} = \sum_{k=0}^{K-1} A_{i,k} B_{k,j}.$$ Translating this directly into code gives us

template <typename T, std::size_t R, std::size_t K, std::size_t C>
inline void matrix_prod(const matrix<T, R, K>& a,
                        const matrix<T, K, C>& b,
                        matrix<T, R, C>& c) { ...

Why can't I "template" virtual member functions?

Setting the scene...  This is something I have wanted to do a few times throughout my C++ journey, which started about a year ago. Only when I started to understand templates and virtual functions did it become clear why this is not currently possible, although it would be nice if the standard introduced it. Suppose we have the following data structures

struct name { std::string first, middle, last; };
struct address { int door; std::string street; };
//...

and the following Io modules and worker class

class worker {
public:
    template <typename T>
    void work(T& data) {
        std::cout << "work! : " << typeid(data).name() << std::endl;
    }
};

class Io {
protected:
    std::unique_ptr<worker> w;
public:
    Io() : w(new worker) {}
    template <typename T>
    void push(T& data) {
        std::cout << "from Io" << std::endl...

Going lock-free with atomic variables - Cushion for the Pushing

Recap of locking data structures  When working with more than one thread, care needs to be taken to eliminate problems like data races, which sometimes lead to undefined behavior , and deadlocks, which stall the application altogether. These problems are related: we use locks to try and prevent data races, but in using locks we might introduce a deadlock (see this ). Our favorite C++ containers ( std::deque , std::stack , etc.) can be wrapped (coarse-grained locking) to obtain thread-safe variants. Although this does not harness a lot of the available concurrency, it is usually sufficient for the job. If we want to extract more concurrency out of the data structure we can use fine-grained locking. To illustrate the difference, I will give two code examples: one where we implement a push on a thread-safe stack using coarse-grained locking and another using fine-grained locking.

template <typename T>
class coarse_stack {...

std::equal_range

Today we are talking about equal_range . Take a look at the signatures here . This is a handy little algorithm that returns the range (as a pair of forward iterators) in which every item matches the value (where what "matches" means is defined by the Compare). We will look at a practical use in this post that actually relates to a previous post on a piece of software called Contex. Suppose we have a vector of strings and a vector of unique queries. We want to count the number of times each query appears in the vector of strings. A very obvious way to do it takes $O(ns)$, where $n$ is the length of the string vector and $s$ is the number of queries. This is not very good. There are (at least) two better ideas:

1. We can attempt to sweep the strings "at the same time" as the queries.
2. We can carry out a binary search on a sorted version of the strings for the beginning and end of the range of strings that satisfy each query, for each query.

Both...