26.6 System Calls

A system call is a request for service that a program makes of the kernel. The service is generally something that only the kernel has the privilege to do, such as doing I/O. Programmers don’t normally need to be concerned with system calls because there are functions in the GNU C Library to do virtually everything that system calls do. These functions work by making system calls themselves. For example, there is a system call that changes the permissions of a file, but you don’t need to know about it because you can just use the GNU C Library’s chmod function.

System calls are sometimes called syscalls or kernel calls, and this interface is mostly a purely mechanical translation from the kernel’s ABI to the C ABI. For the set of syscalls where we do not guarantee POSIX Thread cancellation the wrappers only organize the incoming arguments from the C calling convention to the calling convention of the target kernel. For the set of syscalls where we provided POSIX Thread cancellation the wrappers set some internal state in the library to support cancellation, but this does not impact the behaviour of the syscall provided by the kernel.

In some cases, if the GNU C Library detects that a system call has been superseded by a more capable one, the wrapper may map the old call to the new one. For example, dup2 is implemented via dup3 by passing an additional empty flags argument, and open calls openat passing the additional AT_FDCWD. Sometimes even more is done, such as converting between 32-bit and 64-bit time values. In general, though, such processing is only to make the system call better match the C ABI, rather than change its functionality.

However, there are times when you want to make a system call explicitly, and for that, the GNU C Library provides the syscall function. syscall is harder to use and less portable than functions like chmod, but easier and more portable than coding the system call in assembler instructions.

syscall is most useful when you are working with a system call which is special to your system or is newer than the GNU C Library you are using. syscall is implemented in an entirely generic way; the function does not know anything about what a particular system call does or even if it is valid.

The description of syscall in this section assumes a certain protocol for system calls on the various platforms on which the GNU C Library runs. That protocol is not defined by any strong authority, but we won’t describe it here either because anyone who is coding syscall probably won’t accept anything less than kernel and C library source code as a specification of the interface between them anyway.

syscall does not provide cancellation logic, even if the system call you’re calling is listed as cancellable above.

syscall is declared in unistd.h.

Function: long int syscall (long int sysno, …)

Preliminary: | MT-Safe | AS-Safe | AC-Safe | See POSIX Safety Concepts.

syscall performs a generic system call.

sysno is the system call number. Each kind of system call is identified by a number. Macros for all the possible system call numbers are defined in sys/syscall.h

The remaining arguments are the arguments for the system call, in order, and their meanings depend on the kind of system call. If you code more arguments than the system call takes, the extra ones to the right are ignored.

The return value is the return value from the system call, unless the system call failed. In that case, syscall returns -1 and sets errno to an error code that the system call returned. Note that system calls do not return -1 when they succeed.

If you specify an invalid sysno, syscall returns -1 with errno = ENOSYS.

Example:

#include <unistd.h>
#include <sys/syscall.h>
#include <errno.h>

...

int rc;

rc = syscall(SYS_chmod, "/etc/passwd", 0444);

if (rc == -1)
   fprintf(stderr, "chmod failed, errno = %d\n", errno);

This, if all the compatibility stars are aligned, is equivalent to the following preferable code:

#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>

...

int rc;

rc = chmod("/etc/passwd", 0444);
if (rc == -1)
   fprintf(stderr, "chmod failed, errno = %d\n", errno);