Some Notes on 'D for the Win' August 23, 2014
So it turns out my last post, D for the Win, got all over the place. I published it late at night, went to bed, and when I got back in the morning all hell had broken loose. And even more surprisingly, the vast majority of the reactions (at least the ones I've seen on Reddit and Hacker News) were on-topic. I hope it means I got my message through.
Since I'm not able to reply to everything, I'll try to provide answers to some repeating questions and concerns people had.
No, "Dlang Dlang uber Alles" was not meant to be offensive or avert discussion by means of pulling a Godwin. Yes, I am aware of the context. I just thought it was funny and liked the sound of it. My sincere apologies to anyone who got offended by this.
Sorry, I had never heard of Nimrod until now. If I had, I'd surely have tried it out. But the product I'm working on is already committed to D at this point, and from what I gather Nimrod is still quite unstable.
Some people said that system programming is anything that's run natively (i.e., compiles to machine code). Well no. I'm not sure if there's a definitive definition, but system programming involves interfacing with hardware, drivers and other beasts. It usually also requires fine-grained control over resources (CPU, memory, affinity, etc.), and generally such systems are expected to deliver steady performance (soft-realtime behavior).
Therefore, I'd characterize a system-programming language as one that provides:
- Native pointers, unsafe casts and atomic operations - hardware devices don't chew on your fancy object model. I mean, they gladly will, but I don't suppose you'll be as glad.
- C ABI - interfacing with system libraries and even the kernel directly. Some system calls are not exported through libraries, so being able to call a cdecl variadic function is extremely useful.
- Manual memory management - GC is a great thing in general, but when your system can't pause for indefinite times at random you have to think twice. Also, when you have to control overall memory consumption, preallocating memory is the way to go.
- Generate efficient machine code, inline-assembly a plus - when you reach bottlenecks, every CPU cycle counts.
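To make the C ABI point concrete, here is a minimal sketch in C (not D) of calling the kernel through a cdecl variadic function. It assumes Linux with glibc, where `syscall()` is the variadic entry point and the `SYS_*` numbers come from `<sys/syscall.h>`. The helper names `raw_gettid` and `raw_write` are made up for illustration.

```c
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>     /* SYS_* system call numbers */
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread id of the caller, fetched by invoking the system call
 * by number through the variadic syscall() function. */
static pid_t raw_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* The same mechanism with arguments: write(2) without the libc wrapper.
 * Returns the number of bytes written, or -1 with errno set. */
static long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);   /* precondition of this sketch */
    return syscall(SYS_write, fd, buf, len);
}
```

Any language that can declare and call such a function with C linkage gets the same reach into the kernel without a wrapper library.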
D has garbage collection and the standard library makes use of it. As I just explained, GC and system programming don't mix well. On the other hand, system programming requires tight control over all aspects of the program, which means you end up with tailor-made data structures anyway. You're not afraid to get your hands dirty and reimplement things the right way (for your needs). So no, the GC is not a real problem.
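As an illustration of what "tailor-made" tends to mean in practice, here is a minimal sketch in C of a preallocated fixed-size pool: all memory is reserved up front and handed out through a free list, so the hot path never touches a general-purpose allocator. The `packet` type, the capacity, and every name here are hypothetical, and the sketch is single-threaded.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAPACITY 1024            /* hypothetical sizing */

typedef struct packet {               /* hypothetical payload type */
    unsigned char data[64];
} packet;

typedef union slot {
    union slot *next_free;            /* valid while the slot is free */
    packet      item;                 /* valid while the slot is handed out */
} slot;

typedef struct pool {
    slot   slots[POOL_CAPACITY];      /* all memory reserved up front */
    slot  *free_head;                 /* singly linked list of free slots */
    size_t in_use;
} pool;

/* Thread every slot onto the free list. */
static void pool_init(pool *p)
{
    for (size_t i = 0; i + 1 < POOL_CAPACITY; i++)
        p->slots[i].next_free = &p->slots[i + 1];
    p->slots[POOL_CAPACITY - 1].next_free = NULL;
    p->free_head = &p->slots[0];
    p->in_use = 0;
}

/* O(1) allocation; returns NULL when exhausted instead of growing. */
static packet *pool_alloc(pool *p)
{
    slot *s = p->free_head;
    if (s == NULL)
        return NULL;
    p->free_head = s->next_free;
    p->in_use++;
    return &s->item;
}

/* O(1) release; the packet must have come from this pool. */
static void pool_free(pool *p, packet *pk)
{
    slot *s = (slot *)pk;             /* item sits at offset 0 of the union */
    assert(s >= p->slots && s < p->slots + POOL_CAPACITY);
    s->next_free = p->free_head;
    p->free_head = s;
    p->in_use--;
}
```

Memory consumption is fixed at `sizeof(pool)`, and exhaustion is an explicit condition the caller handles rather than a pause at a random moment.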
On the other hand, having GC is great. It means that outside of the critical paths we can use a modern language with all the benefits of one. But even more importantly -- it means you can write a minimum viable product much more easily, and optimize as you make progress. In other words: start with GC and remove it where it hurts.
The C10K problem moved people from a thread-per-IO methodology to asynchronous IO (IOCP, etc.) in a thread-per-core mainloop. The C10M problem moves IO from the kernel into userland ("the kernel is the problem, not the solution"). You simply can't process enough data when you have several system calls per IO. And it's a pity, because you're limited by operating system architecture, not hardware.
Context switches are expensive, the kernel requires accounting on its own, lots of memory gets copied, and you don't get to utilize hardware offloading when you go through the standard kernel stacks. For instance, I got a factor 12 improvement just by moving my code to userland polling instead of relying on epoll. In both cases my program consumed 100% CPU, but kernel time plummeted from 40% to 1%.
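The general shape of userland polling can be sketched in C as follows. This is not the setup described above, whose details are not given here. It assumes a hypothetical single producer (a device queue mapped into the process, or another thread) that fills a ring buffer, and a consumer that spins on the ring's indices with C11 atomics, so the receive path makes no system calls at all. All names and sizes are made up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 256u                /* hypothetical capacity */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
              "RING_SIZE must be a power of two");

typedef struct ring {
    _Atomic uint32_t head;            /* next slot to write (producer) */
    _Atomic uint32_t tail;            /* next slot to read (consumer) */
    uint64_t         slots[RING_SIZE];
} ring;

/* Producer side: returns false when the ring is full. */
static bool ring_push(ring *r, uint64_t v)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;
    r->slots[head & (RING_SIZE - 1)] = v;
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_pop(ring *r, uint64_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Busy-poll the ring for max_spins iterations, adding every value that
 * arrives to *sum. Returns how many items were consumed. Nothing here
 * enters the kernel; the loop burns CPU in user mode instead. */
static size_t ring_poll(ring *r, size_t max_spins, uint64_t *sum)
{
    size_t   got = 0;
    uint64_t v;
    for (size_t i = 0; i < max_spins; i++) {
        if (ring_pop(r, &v)) {
            *sum += v;
            got++;
        }
    }
    return got;
}
```

The trade-off is the same one noted above: the core stays at 100% either way, but the time is spent in your own loop rather than in kernel accounting and copies.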
And no, you cannot approach the C10M problem in Java or Python.
C++ is a horrible language. In fact, I don't consider it a language any more. It's a conglomerate of plagiarisms with an awful syntax. I mean, every language borrows ideas from neighboring ones, but C++ has this Perl-6 aroma to it: if it exists, we'll take it in. It doesn't even have to make sense. As a friend put it, "it can do EVERY paradigm but equally badly".
Compiler support is always lacking, compilation times are ever increasing, and you have to fight your way to get the compiler to do what you have in mind. Even shooting yourself in the foot isn't fun anymore.
Go is nice for application programming. In fact, it may be great for application programming. As a web-oriented language with easy string manipulation, built-in coroutines and garbage collection, it surely beats Python in terms of speed. But I find it not that readable and in general I'm not sure if moving to Go is worth the trouble. It feels like a mix of too-high and too-low level constructs, with nothing in between. Awkward is the best word I can think of.
But back to my original point, Go doesn't come near system programming. Unless your systems run in the cloud, that is.
D has poor adoption and lacks the codebase and community of C/C++/Java/Python/Go. But as I explained, system programming teaches you to be self-reliant. You almost never rely on external code as-is (it might allocate/block/be inefficient) so you end up living in a rather closed world. On the other hand, D has C ABI so with a bit of porting effort (converting H files), you gain access to virtually everything.
People also repeatedly said (perhaps even mockingly) that there are no success stories or large projects using D. First of all, that's not true. Google around. But I agree there are not that many, and it's intriguing why D hasn't gotten the attention it deserves in the long while since it came to be. Anyway, being a startup, my colleagues and I are not afraid of hard work or living on the edge. And hopefully you'll hear about another D success story in a year or two :)