i’m back!
TweetOk 1year+ off, and it’s time to get back into it. Once you break the habit it’s hard to restart. A big reason I haven’t restarted writing is because wordpress just depresses me. This place has fallen into total disarray. Spam comments. I’m running an old version, meh. Anyway…
Step #1. Kill wordpress. Tomorrow I’m going to have a personal hackathon to write a wordpress->something converter. Something is almost certainly a static blog generator. I’m thinking ruhoh.
Step #2. Upload all my old svn repos to github because why the hell not. I have the code for my sc2 replay aggregator site, the code for that time I hacked (in the good way) the allegheny county tax system, and the project I did to get an interview at reddit.
Step #3. Profit and stuff.
The good news is I have tons of stuff to possibly write about, and a bunch of drafts that I can polish, so yay content.
Google Voice, LOL
TweetIt would appear the recent “migration” emails that google apps has been sending me weren’t messing around. At some point my “apps” got migrated, and with it went my google voice number. None of my google voice accounts show the google-voice number anymore, the number no longer forwards to my phone, and there appears to be no way to get it back. Google just turned my phone off. The best part is I still get emails for the account that apparently has the phone number (it thinks my account name is: louis%lbrandy.com@gtempaccount.com) hooked up, but I can’t actually log into it, or use it, or transfer it. At least no way I can figure out.
So the very reason for google’s voice attractiveness (existence?), to save me the pain of having to update people with new phone numbers, appears to have backfired gloriously.
I’m buying facebook at $50b
TweetI’ve been silent for a few weeks (months, gah) now, but with good reason. I’ve spent the better part of the last few weeks interviewing all over SF and silicon valley. I’ll just skip to the end: I’ve accepted a job at Facebook and my wife and I will be relocating to silicon valley in the next few weeks.
So up first, my current job. As you may or may not know, I work at a startup face recognition company in Pittsburgh. Our decision to relocate was largely for personal reasons, and choosing to leave my job wasn’t easy. It’s an awesome place to work. If you are interested, we are still hiring.
Interviewing
The job market in California seems quite good for software engineers right now. If you are thinking about heading west, I would wager a guess that now is a great time to do it. The whole process was extremely quick (once I was in contact with a recruiter). I ended up with a handful of offers from several really awesome places (names ranging from the biggest, to awesome little places you’ve only maybe heard of). Having spent quite a bit of time the last 5 years being the interviewer, it was quite enlightening to be on the other side of it, for once. To be sure, there’s another blog post hiding behind that last sentence.
Though the vast majority of my interviews ended up with offers, it wasn’t all good news. I actually got phone-screened by one startup company. I have tons of opinions about the quality of my interviewers throughout this process (and it’s not all good, by any means), but I can’t blame that phone screener. I bumblefucked my way through a fairly easy question (I was unprepared, more on that in a bit) and knew I was blowing it. My wife tried to make me feel better after the call but I told her, having done a ton phone screens myself, that was a definite ‘no’. In fact, if he wanted to interview me after that phone screen, I’d have questioned his whole interview philosophy to begin with! There’s probably another blog post hiding there, too.
In the end, the blown phone screen was a nice wake-up call that I needed to do some more review. I did, and the rest of my interviews went extremely well. Had that phone screen been later in the process, I’m certain it would have turned out much better for me, but c’est la vie.
In the end, the decision to choose Facebook was fairly difficult (and somewhat personal) because I had such a fantastic group of offers. All of them had huge upsides, and the thought of saying “no” to any one elicited this “are you crazy?” feeling in me. The reasons Facebook stood out, for me, was because 1) they have tons of the types of problems I want to learn about, and solve, 2) they are one of the few places in the world where you can learn about things at that scale, and 3) over the next few years, Facebook is almost certainly going to be a fascinating place to be.
This post was mainly an update to anyone who frequently reads. In the next few posts, I’ll go back to the more general style, probably talking a bit more about the interview and the interviewing process. I feel like I could write a book, at this point.
Memory mapped IO for fun and profit
TweetWhat I am going to describe below is a fairly straightforward application of memory mapped IO to get huge benefits versus normal IO when loading static, but large, data structures. You can find the code for it here (note: linux only, though it would be an easy windows port). For our face recognition SDK, our model files are large binary files that are full of various bits of (odd-sized) data. We recently flattened everything out and made our initialization, effectively, zero-copy. This is a write-up of that process on a contrived example to make it a bit simpler. If you’ve ever used mmap to persist a large static data structure, you’ll probably find nothing of value here, but for the rest, read on.
For this example, consider a very large list of strings (and their lengths) that we need to persist to disk. Or, a very large array of these:
typedef struct {
char *data;
int len;
} data_t;
To make the problem non-trivial, we will consider the case of different length strings (in my tests I was using a million strings around 150 bytes in length — plus or minus a few to keep them from all being the same size).
Let’s get started
Here is a naive read/write for this data structure:
void naive_write(char* filename, data_t* data, int n)
{
FILE *f;
int i;
f = fopen (filename, "wb" );
fwrite(&n, sizeof(n), 1, f);
for (i=0;i<n;i++)
{
fwrite(&data[i].len, sizeof(data[i].len), 1, f);
fwrite(data[i].data, sizeof(char), data[i].len, f);
}
fclose(f);
}
data_t* naive_read(char* filename)
{
data_t* answer;
FILE *f;
int i, n;
f = fopen (filename, "rb");
fread(&n, sizeof(n), 1, f);
answer = malloc(n * sizeof(data_t));
for (i=0;i<n;i++)
{
fread(&answer[i].len, sizeof(answer[i].len), 1, f);
answer[i].data = malloc(sizeof(char) * answer[i].len);
fread(answer[i].data, sizeof(char), answer[i].len, f);
}
return answer;
}
We first write the number of strings, and then for each string we write its length followed by the string data. When reading, we read the number of strings (and malloc the array of strings) and then for each string we read its length (to malloc the string data itself) and then read the string. This implementation does many mallocs() and it does many freads(). It is, consequently, fairly slow.
With this data format on disk, though, this would be difficult to (easily) optimize much more. So, let’s change the layout.
A better file format
void flat_write(char* filename, data_t* data, int n)
{
FILE *f;
int i;
f = fopen (filename, "wb" );
fwrite(&n, sizeof(n), 1, f);
for (i=0;i<n;i++)
fwrite(&data[i].len, sizeof(data[i].len), 1, f);
for (i=0;i<n;i++)
fwrite(data[i].data, sizeof(char), data[i].len, f);
fclose(f);
}
We made a very minor change to how we write the data structure to disk. This time, we put all of the lengths up front, and then all of the strings after that. This is going to allow us two rather large simplifications: 1) we can malloc() once for all of the data strings, 2) we can read them all with a single fread(). This is a pretty big change because it effectively removes all malloc() pressure and puts everything into fread(). The read function does, though, become slightly more complicated:
data_t* onecopy_read(char* filename)
{
data_t* answer;
FILE *f;
int i, n, total;
f = fopen (filename, "rb");
fread(&n, sizeof(n), 1, f);
answer = malloc(n * sizeof(data_t));
total = 0;
for (i=0;i<n;i++)
{
fread(&answer[i].len, sizeof(answer[i].len), 1, f);
total += answer[i].len;
}
answer[0].data = malloc(sizeof(char)*total);
fread(answer[0].data, sizeof(char), total, f); /* one giant fread */
/* set all the ptrs */
total=0;
for (i=0;i<n;i++)
{
answer[i].data = answer[0].data + sizeof(char)*total;
total += answer[i].len;
}
return answer;
}
This is a decent optimization but not ideal. fread, still, involves a copy (copying from disk into our program’s address space). The only way around this problem is to not use fread.
Towards zero-copy
Using the exact same file, we can open it using mmap() (windows users see: MappedViewOfFile). This maps the file directly into our address space without having to do any copying whatsoever. We can then just setup all the data pointers to point to the memory mapped file, and we’ve initialized our data structure with zero copies.
data_t* zerocopy_read(char* filename)
{
data_t* answer;
int f;
int i, n, total;
char *data_ptr;
int* hdr_ptr;
struct stat sb;
f = open (filename, O_RDONLY);
fstat(f, &sb);
hdr_ptr = mmap(0, sb.st_size, PROT_READ, MAP_PRIVATE, f, 0);
n = *hdr_ptr++;
answer = malloc(n*sizeof(data_t));
for (i=0;i<n;i++)
answer[i].len = *hdr_ptr++;
data_ptr = (char*) hdr_ptr;
total=0;
for (i=0;i<n;i++)
{
answer[i].data = data_ptr + sizeof(char)*total;
total += answer[i].len;
}
return answer;
}
Timings
So the moral of the story, then, is how much faster are the various versions. Here’s some timings:
(for num_strings=1,000,000 and str_len= ~150 bytes) naive_read time: 0.373289 1copy_read time: 0.202408 0copy_read time: 0.011233
It’s important to understand what we are actually measuring here. I ran this test on Linux (Windows isn’t much different) which has a file cache. The actual disk wasn’t ever really involved in this program or these timings. We are really testing the summed speed of our processing, malloc() (in the first case), fread(), and the kernel’s handling of cached files. (note: if you modified this program to only read the files, and then reboot your machine to clear the file-cache, the first run would be much slower as the file was read in from disk and cached).
Benefits
Having these large static data structures persist using memory mapped files has some rather significant advantages beyond just initializing extremely fast. We’ve shifted the entire burden of managing this memory onto the operating system. If multiple processes are being run, this memory will be shared amongst them. This is not true of the fread() solution. The OS’s file cache will effectively keep the file in memory and ready to go whenever it’s been recently run. If there are portions of your data you aren’t using (in face recognition imagine a scenario where you only do face detection), the rest will never get loaded (or will eventually be swapped out). And last, but not least, the flattening process puts all of the bits contiguous in memory (compare with the naive approach), allowing for the OS to handle the pages of data much more efficiently.
twitter