System Programming
System programming is more than just knowing all the system calls of the operating system by heart. System programming means:
designing the components so that control and data flow between them reliably, quickly, and efficiently
keeping an eye on possible errors, and ensuring that error paths are reasonably well defined
knowing the hardware well, and utilizing it in the best way
Because of this, system programming is not low-level programming! The challenge is to design the component architecture at a high level so that it makes the best use of the OS and the hardware. The system level is also where the basic properties of the application programming environment are defined.
Also, system programming has nothing to do with administration, which is a completely different task.
Overcoming shared-memory parallelism
In recent years, the software industry has mostly promoted multi-threading, also known as shared-memory parallelism, to tackle problems that require concurrent program execution. On current SMP and cc-NUMA machines (i.e. "multicores") multi-threading shows good performance. However, it is questionable whether this success story can continue as more and more cores are integrated ("manycores"), because memory bandwidth cannot keep up, and the costs of frequent synchronization explode. There are other issues as well: the programming model is complicated and error-prone, and most application programmers are overwhelmed when they are faced with shared-memory parallelism. This programming model is probably responsible for most software crashes and frozen systems. Last but not least, this model cannot be easily extended from multicores to clusters.
It should be analyzed carefully whether multi-threading can be replaced by other programming models for concurrency:
Often, the purpose of concurrency is not to maximize CPU utilization. In this case, event-driven programming is a good replacement. It also has the nice side effects of keeping response latencies low, making synchronization trivial, and increasing the reliability of the programs (no crashes because of uncontrolled accesses to shared data).
If utilization of all cores is the goal, one should also look at the veteran: multi-processing, combined with message passing or RPC for IPC. Here, the parallel tasks run independently of each other, and it is not possible to access the data structures of other processes. The data paths of the processes should also be kept separate, to avoid synchronization and non-local cache accesses. Multi-processing will still be usable when future manycores no longer support cache coherency (or only at great runtime cost). Also, multi-processing can be easily extended to clusters - the switch from local IPC to network IPC is a comparatively small change.
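The multi-processing model with message passing can be illustrated with a minimal sketch. It assumes a Unix-like system (the "fork" start method) and Python's standard multiprocessing module. The names worker and parallel_sum_of_squares are hypothetical and chosen only for this illustration. Each process works on its own slice of the data and sends exactly one message back, so no locks and no shared data structures are involved.

```python
import multiprocessing as mp


def worker(conn, numbers):
    """Runs in a child process: works only on its own data slice and
    reports the result as a single message (hypothetical helper)."""
    conn.send(sum(n * n for n in numbers))
    conn.close()


def parallel_sum_of_squares(numbers, n_workers=4):
    """Split the input across processes and combine the partial results.

    Assumption: a Unix-like system, so the "fork" start method exists.
    """
    ctx = mp.get_context("fork")
    procs = []
    receivers = []
    for i in range(n_workers):
        # One-way pipe: the parent reads, the child writes.
        recv_end, send_end = ctx.Pipe(duplex=False)
        chunk = numbers[i::n_workers]  # separate data path per process
        p = ctx.Process(target=worker, args=(send_end, chunk))
        p.start()
        send_end.close()  # the parent keeps only the reading end
        procs.append(p)
        receivers.append(recv_end)

    total = 0
    for recv_end in receivers:
        total += recv_end.recv()  # blocks until this worker's message arrives
        recv_end.close()
    for p in procs:
        p.join()
    return total


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1000))))
```

Because the workers communicate only through messages, the pipe could in principle be replaced by a network connection without changing the structure of the program, which is the property that makes the step from local IPC to a cluster comparatively small.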
Services
These services are only available for Linux and, with restrictions, for Unix and BSD:
Analysis of the component architecture with respect to control and data parallelism
Choosing the right system platform (libraries, frameworks) on top of which the application development will take place
Performance measurements and optimization