- <HTML>
- <HEAD>
- <TITLE>State Threads Library Programming Notes</TITLE>
- </HEAD>
- <BODY BGCOLOR=#FFFFFF>
- <H2>Programming Notes</H2>
- <P>
- <B>
- <UL>
- <LI><A HREF=#porting>Porting</A></LI>
- <LI><A HREF=#signals>Signals</A></LI>
- <LI><A HREF=#intra>Intra-Process Synchronization</A></LI>
- <LI><A HREF=#inter>Inter-Process Synchronization</A></LI>
- <LI><A HREF=#nonnet>Non-Network I/O</A></LI>
- <LI><A HREF=#timeouts>Timeouts</A></LI>
- </UL>
- </B>
- <P>
- <HR>
- <P>
- <A NAME="porting"></A>
- <H3>Porting</H3>
- The State Threads library uses OS concepts that are available in some
- form on most UNIX platforms, making the library very portable across
- many flavors of UNIX. However, there are several parts of the library
- that rely on platform-specific features. Here is the list of such parts:
- <P>
- <UL>
- <LI><I>Thread context initialization</I>: Two ingredients of the
- <TT>jmp_buf</TT>
- data structure (the program counter and the stack pointer) have to be
- manually set in the thread creation routine. The <TT>jmp_buf</TT> data
- structure is defined in the <TT>setjmp.h</TT> header file and differs from
- platform to platform. Usually the program counter is a structure member
- with <TT>PC</TT> in the name and the stack pointer is a structure member
- with <TT>SP</TT> in the name. One can also consult
- <A HREF="http://www.mozilla.org/source.html">Netscape's NSPR library source</A>,
- which already has this code for many UNIX-like platforms
- (<TT>mozilla/nsprpub/pr/include/md/*.h</TT> files).
- <P>
- Note that on some BSD-derived platforms the <TT>_setjmp(3)/_longjmp(3)</TT>
- calls should be used instead of <TT>setjmp(3)/longjmp(3)</TT> (that is,
- the calls that manipulate only the stack and registers and do <I>not</I>
- save and restore the process's signal mask).
- <P>
- Starting with glibc 2.4 on Linux, the opacity of the <TT>jmp_buf</TT> data
- structure is enforced by <TT>setjmp(3)/longjmp(3)</TT>, so the
- <TT>jmp_buf</TT> ingredients can no longer be accessed directly (unless
- the special environment variable <TT>LD_POINTER_GUARD</TT> is set before the
- application is executed). To avoid depending on a custom environment, the
- State Threads library provides <TT>setjmp/longjmp</TT> replacement functions
- for all Intel CPU architectures. Other CPU architectures can also be easily
- supported (the <TT>setjmp/longjmp</TT> source code is widely available for
- many CPU architectures).</LI>
- <P>
- <LI><I>High resolution time function</I>: Some platforms (IRIX, Solaris)
- provide a high resolution time function based on the free running hardware
- counter. This function returns the time counted since some arbitrary
- moment in the past (usually machine power up time). It is not correlated in
- any way to the time of day, and thus is not subject to resetting,
- drifting, etc. This type of time is ideal for tasks where cheap, accurate
- interval timing is required. If such a function is not available on a
- particular platform, the <TT>gettimeofday(3)</TT> function can be used
- (though on some platforms it involves a system call).</LI>
- <P>
- <LI><I>The stack growth direction</I>: The library needs to know whether the
- stack grows toward lower (down) or higher (up) memory addresses.
- One can write a simple test program that detects the stack growth direction
- on a particular platform.</LI>
- <P>
- <LI><I>Non-blocking attribute inheritance</I>: On some platforms (e.g. IRIX)
- the socket created as a result of the <TT>accept(2)</TT> call inherits the
- non-blocking attribute of the listening socket. One needs to consult the manual
- pages or write a simple test program to see if this applies to a specific
- platform.</LI>
- <P>
- <LI><I>Anonymous memory mapping</I>: The library allocates memory segments
- for thread stacks by doing anonymous memory mapping (<TT>mmap(2)</TT>). This
- mapping is somewhat different on SVR4 and BSD4.3 derived platforms.
- <P>
- The memory mapping can be avoided altogether by using <TT>malloc(3)</TT> for
- stack allocation. In this case the <TT>MALLOC_STACK</TT> macro should be
- defined.</LI>
- </UL>
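- <P>
- The stack growth direction test mentioned in the list above can be sketched
- as follows. This is only an illustration, not the library's own probe: the
- function names are hypothetical, and the result relies on the compiler
- keeping the two functions in separate stack frames (hence the
- <TT>noinline</TT> attribute, which assumes a GCC-compatible compiler).

```c
#include <stdint.h>

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical helper: compares the address of a local variable in this
   (deeper) frame with the address of a local in the caller's frame. */
static NOINLINE int callee_is_lower(volatile char *caller_local)
{
    volatile char callee_local = 0;

    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Returns 1 if the stack appears to grow toward lower addresses,
   0 if it appears to grow toward higher addresses. */
int stack_grows_down(void)
{
    volatile char caller_local = 0;

    return callee_is_lower(&caller_local);
}
```

- A port would typically run such a probe once and record the answer as a
- feature test macro rather than calling it at run time.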
- <P>
- All machine-dependent feature test macros should be defined in the
- <TT>md.h</TT> header file. The assembly code for <TT>setjmp/longjmp</TT>
- replacement functions for all CPU architectures should be placed in
- the <TT>md.S</TT> file.
- <P>
- The current version of the library is ported to:
- <UL>
- <LI>IRIX 6.x (both 32 and 64 bit)</LI>
- <LI>Linux (kernel 2.x and glibc 2.x) on x86, Alpha, MIPS and MIPSEL,
- SPARC, ARM, PowerPC, 68k, HPPA, S390, IA-64, and Opteron (AMD-64)</LI>
- <LI>Solaris 2.x (SunOS 5.x) on x86, AMD64, SPARC, and SPARC-64</LI>
- <LI>AIX 4.x</LI>
- <LI>HP-UX 11 (both 32 and 64 bit)</LI>
- <LI>Tru64/OSF1</LI>
- <LI>FreeBSD on x86, AMD64, and Alpha</LI>
- <LI>OpenBSD on x86, AMD64, Alpha, and SPARC</LI>
- <LI>NetBSD on x86, Alpha, SPARC, and VAX</LI>
- <LI>MacOS X (Darwin) on PowerPC (32 bit) and Intel (both 32 and 64 bit) [universal]</LI>
- <LI>Cygwin</LI>
- </UL>
- <P>
- <A NAME="signals"></A>
- <H3>Signals</H3>
- Signal handling in an application using State Threads should be treated the
- same way as in a classical UNIX process application. There is no such
- thing as a per-thread signal mask; all threads share the same signal handlers,
- and only async-signal-safe functions can be used in signal handlers.
- However, there is a way to process signals synchronously by converting a
- signal event to an I/O event: a signal catching function does a write to
- a pipe which will be processed synchronously by a dedicated signal handling
- thread. The following code demonstrates this technique (error handling is
- omitted for clarity):
- <PRE>
- /* Per-process pipe which is used as a signal queue. */
- /* Up to PIPE_BUF/sizeof(int) signals can be queued up. */
- int sig_pipe[2];
- /* Signal catching function. */
- /* Converts signal event to I/O event. */
- void sig_catcher(int signo)
- {
-     int err;
-     /* Save errno to restore it after the write() */
-     err = errno;
-     /* write() is reentrant/async-safe */
-     write(sig_pipe[1], &signo, sizeof(int));
-     errno = err;
- }
- /* Signal processing function. */
- /* This is the "main" function of the signal processing thread. */
- void *sig_process(void *arg)
- {
-     st_netfd_t nfd;
-     int signo;
-     nfd = st_netfd_open(sig_pipe[0]);
-     for ( ; ; ) {
-         /* Read the next signal from the pipe */
-         st_read(nfd, &signo, sizeof(int), ST_UTIME_NO_TIMEOUT);
-         /* Process signal synchronously */
-         switch (signo) {
-         case SIGHUP:
-             /* do something here - reread config files, etc. */
-             break;
-         case SIGTERM:
-             /* do something here - cleanup, etc. */
-             break;
-         /* .
-            .
-            Other signals
-            .
-            .
-         */
-         }
-     }
-     return NULL;
- }
- int main(int argc, char *argv[])
- {
-     struct sigaction sa;
-     .
-     .
-     .
-     /* Create signal pipe */
-     pipe(sig_pipe);
-     /* Create signal processing thread */
-     st_thread_create(sig_process, NULL, 0, 0);
-     /* Install sig_catcher() as a signal handler */
-     sa.sa_handler = sig_catcher;
-     sigemptyset(&sa.sa_mask);
-     sa.sa_flags = 0;
-     sigaction(SIGHUP, &sa, NULL);
-     sa.sa_handler = sig_catcher;
-     sigemptyset(&sa.sa_mask);
-     sa.sa_flags = 0;
-     sigaction(SIGTERM, &sa, NULL);
-     .
-     .
-     .
- 
- }
- </PRE>
- <P>
- Note that if multiple processes are used (see below), the signal pipe should
- be initialized after the <TT>fork(2)</TT> call so that each process has its
- own private pipe.
- <P>
- <A NAME="intra"></A>
- <H3>Intra-Process Synchronization</H3>
- Due to the event-driven nature of the library scheduler, the thread context
- switch (process state change) can only happen in a well-known set of
- library functions. This set includes functions in which a thread may
- "block": I/O functions (<TT>st_read()</TT>, <TT>st_write()</TT>, etc.),
- sleep functions (<TT>st_sleep()</TT>, etc.), and thread synchronization
- functions (<TT>st_thread_join()</TT>, <TT>st_cond_wait()</TT>, etc.). As a result,
- process-specific global data need not be protected by locks since a thread
- cannot be rescheduled while in a critical section (and only one thread at a
- time can access the same memory location). By the same token,
- non-thread-safe functions (in the traditional sense) can be safely used with
- State Threads. The library's mutex facilities are practically useless
- for a correctly written application (no blocking functions in critical
- sections) and are provided mostly for completeness. This absence of locking
- greatly simplifies an application design and provides a foundation for
- scalability.
- <P>
- <A NAME="inter"></A>
- <H3>Inter-Process Synchronization</H3>
- The State Threads library makes it possible to multiplex a large number
- of simultaneous connections onto a much smaller number of separate
- processes, where each process uses a many-to-one user-level threading
- implementation (<B>N</B> of <B>M:1</B> mappings rather than one <B>M:N</B>
- mapping used in native threading libraries on some platforms). This design
- is key to the application's scalability. One can think about it as if a
- set of all threads is partitioned into separate groups (processes) where
- each group has a separate pool of resources (virtual address space, file
- descriptors, etc.). An application designer has full control of how many
- groups (processes) an application creates and what resources, if any,
- are shared among different groups via standard UNIX inter-process
- communication (IPC) facilities.<P>
- There are several reasons for creating multiple processes:
- <P>
- <UL>
- <LI>To take advantage of multiple hardware entities (CPUs, disks, etc.)
- available in the system (hardware parallelism).</LI>
- <P>
- <LI>To reduce the risk of losing a large number of user connections when one of
- the processes crashes. For example, if <B>C</B> user connections (threads)
- are multiplexed onto <B>P</B> processes and one of the processes crashes,
- only about <B>C/P</B> connections (a <B>1/P</B> fraction of the total) will be lost.</LI>
- <P>
- <LI>To overcome per-process resource limitations imposed by the OS. For
- example, if <TT>select(2)</TT> is used for event polling, the number of
- simultaneous connections (threads) per process is
- limited by the <TT>FD_SETSIZE</TT> parameter (see <TT>select(2)</TT>).
- If <TT>FD_SETSIZE</TT> is equal to 1024 and each connection needs one file
- descriptor, then an application should create at least 10 processes to
- support 10,000 simultaneous connections.</LI>
- </UL>
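- <P>
- The arithmetic behind the last item can be written down as a small helper.
- This is a sketch only: the function and its parameters are hypothetical,
- and <TT>reserved_fds</TT> is an assumed allowance for descriptors each
- process needs for itself (listening sockets, log files, pipes).

```c
/* Returns the number of processes needed to serve 'connections'
   simultaneous connections, or -1 if the limits leave no room for any
   connection. All parameter names are illustrative. */
long processes_needed(long connections, long fd_limit,
                      long reserved_fds, long fds_per_conn)
{
    long per_process;

    if (connections < 0 || fds_per_conn <= 0)
        return -1;
    /* Connections one process can hold under its descriptor limit */
    per_process = (fd_limit - reserved_fds) / fds_per_conn;
    if (per_process <= 0)
        return -1;
    /* Round up so that a partly filled process is still counted */
    return (connections + per_process - 1) / per_process;
}
```

- With a limit of 1024 and no reserved descriptors, 10,000 connections need
- 10 processes; reserving 25 descriptors per process raises that to 11.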
- <P>
- Ideally all user sessions are completely independent, so there is no need for
- inter-process communication. It is always better to have several separate
- smaller process-specific resources (e.g., data caches) than to have one large
- resource shared (and modified) by all processes. Sometimes, however, there
- is a need to share a common resource among different processes. In that case,
- standard UNIX IPC facilities can be used. In addition to that, there is a way
- to synchronize different processes so that only the thread accessing the
- shared resource will be suspended (but not the entire process) if that resource
- is unavailable. In the following code fragment a pipe is used as a counting
- semaphore for inter-process synchronization:
- <PRE>
- #ifndef PIPE_BUF
- #define PIPE_BUF 512 /* POSIX */
- #endif
- /* Semaphore data structure */
- typedef struct ipc_sem {
-     st_netfd_t rdfd; /* read descriptor */
-     st_netfd_t wrfd; /* write descriptor */
- } ipc_sem_t;
- /* Create and initialize the semaphore. Should be called before fork(2). */
- /* 'value' must be less than PIPE_BUF. */
- /* If 'value' is 1, the semaphore works as mutex. */
- ipc_sem_t *ipc_sem_create(int value)
- {
-     ipc_sem_t *sem;
-     int p[2];
-     char b[PIPE_BUF];
-     /* Error checking is omitted for clarity */
-     sem = malloc(sizeof(ipc_sem_t));
-     /* Create the pipe */
-     pipe(p);
-     sem->rdfd = st_netfd_open(p[0]);
-     sem->wrfd = st_netfd_open(p[1]);
-     /* Initialize the semaphore: put 'value' bytes into the pipe */
-     write(p[1], b, value);
-     return sem;
- }
- /* Try to decrement the "value" of the semaphore. */
- /* If "value" is 0, the calling thread blocks on the semaphore. */
- int ipc_sem_wait(ipc_sem_t *sem)
- {
-     char c;
-     /* Read one byte from the pipe */
-     if (st_read(sem->rdfd, &c, 1, ST_UTIME_NO_TIMEOUT) != 1)
-         return -1;
-     return 0;
- }
- /* Increment the "value" of the semaphore. */
- int ipc_sem_post(ipc_sem_t *sem)
- {
-     char c;
-     if (st_write(sem->wrfd, &c, 1, ST_UTIME_NO_TIMEOUT) != 1)
-         return -1;
-     return 0;
- }
- </PRE>
- <P>
- Generally, the following steps should be followed when writing an application
- using the State Threads library:
- <P>
- <OL>
- <LI>Initialize the library (<TT>st_init()</TT>).</LI>
- <P>
- <LI>Create resources that will be shared among different processes:
- create and bind listening sockets, create shared memory segments, IPC
- channels, synchronization primitives, etc.</LI>
- <P>
- <LI>Create several processes (<TT>fork(2)</TT>). The parent process should
- either exit or become a "watchdog" (e.g., it starts a new process when
- an existing one crashes, does a cleanup upon application termination,
- etc.).</LI>
- <P>
- <LI>In each child process create a pool of threads
- (<TT>st_thread_create()</TT>) to handle user connections.</LI>
- </OL>
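- <P>
- Steps 3 and 4 above can be outlined with plain <TT>fork(2)</TT> and
- <TT>wait(2)</TT>. This is a minimal sketch under stated assumptions:
- <TT>worker_main</TT> and <TT>run_workers</TT> are hypothetical names, the
- worker body is a placeholder for the thread pool a real application would
- create with <TT>st_thread_create()</TT>, and the watchdog only reaps its
- children instead of restarting them.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker body. A real server would create its pool of
   threads here and serve connections until shutdown. */
static int worker_main(int index)
{
    (void)index;
    return 0;
}

/* Forks 'nworkers' child processes and then acts as a simple watchdog
   that reaps them. Returns the number of children that exited normally,
   or -1 if not all of them could be forked. */
int run_workers(int nworkers)
{
    int i, forked = 0, exited = 0;

    for (i = 0; i < nworkers; i++) {
        pid_t pid = fork();

        if (pid < 0)
            break;
        if (pid == 0)
            _exit(worker_main(i));   /* child: never returns to the loop */
        forked++;
    }
    for (i = 0; i < forked; i++) {
        int status;

        if (wait(&status) > 0 && WIFEXITED(status))
            exited++;
    }
    return forked == nworkers ? exited : -1;
}
```

- Shared resources from step 2 (listening sockets, semaphores) would be set
- up before the first <TT>fork()</TT> so that every child inherits them.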
- <P>
- <A NAME="nonnet"></A>
- <H3>Non-Network I/O</H3>
- The State Threads architecture uses non-blocking I/O on
- <TT>st_netfd_t</TT> objects for concurrent processing of multiple user
- connections. This architecture has a drawback: the entire process and
- all its threads may block for the duration of a <I>disk</I> or other
- non-network I/O operation, whether through State Threads I/O functions,
- direct system calls, or standard I/O functions. (This is applicable
- mostly to disk <I>reads</I>; disk <I>writes</I> are usually performed
- asynchronously -- data goes to the buffer cache to be written to disk
- later.) Fortunately, disk I/O (unlike network I/O) usually takes a
- finite and predictable amount of time, but this may not be true for
- special devices or user input devices (including stdin). Nevertheless,
- such I/O reduces throughput of the system and increases response times.
- There are several ways to design an application to overcome this
- drawback:
- <P>
- <UL>
- <LI>Create several identical main processes as described above (symmetric
- architecture). This will improve CPU utilization and thus improve the
- overall throughput of the system.</LI>
- <P>
- <LI>Create multiple "helper" processes in addition to the main process that
- will handle blocking I/O operations (asymmetric architecture).
- This approach was suggested for Web servers in a
- <A HREF="http://www.cs.rice.edu/~vivek/flash99/">paper</A> by Peter
- Druschel et al. In this architecture the main process communicates with
- a helper process via an IPC channel (<TT>pipe(2), socketpair(2)</TT>).
- The main process instructs a helper to perform the potentially blocking
- operation. Once the operation completes, the helper returns a
- notification via IPC.</LI>
- </UL>
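- <P>
- The asymmetric architecture can be illustrated with a single helper
- process that reads files on behalf of the main process. The sketch below
- is an assumption-laden outline, not code from the library or the paper:
- the request format (a fixed-size path buffer), the reply (a byte count)
- and all function names are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REQ_PATH_MAX 256   /* hypothetical fixed request size */

/* Read exactly 'len' bytes; fails on EOF or error. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);

        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Helper process: serves requests until the channel is closed. */
static void helper_loop(int fd)
{
    char path[REQ_PATH_MAX];

    while (read_full(fd, path, sizeof path) == 0) {
        char buf[4096];
        long total = -1;
        int file;

        path[sizeof path - 1] = '\0';
        file = open(path, O_RDONLY);
        if (file >= 0) {
            ssize_t n;

            total = 0;
            /* These reads may block on disk; only the helper stalls */
            while ((n = read(file, buf, sizeof buf)) > 0)
                total += n;
            if (n < 0)
                total = -1;
            close(file);
        }
        if (write(fd, &total, sizeof total) != (ssize_t)sizeof total)
            break;
    }
    _exit(0);
}

/* Starts the helper; returns the main process's end of the channel. */
int helper_start(pid_t *pid)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    *pid = fork();
    if (*pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (*pid == 0) {
        close(sv[0]);
        helper_loop(sv[1]);   /* does not return */
    }
    close(sv[1]);
    return sv[0];
}

/* Asks the helper how many bytes 'path' contains; -1 on failure. */
long helper_count_bytes(int fd, const char *path)
{
    char req[REQ_PATH_MAX];
    long total;

    if (strlen(path) >= sizeof req)
        return -1;
    memset(req, 0, sizeof req);
    strcpy(req, path);
    if (write(fd, req, sizeof req) != (ssize_t)sizeof req)
        return -1;
    return read_full(fd, &total, sizeof total) == 0 ? total : -1;
}
```

- In a State Threads application the descriptor returned by
- <TT>helper_start()</TT> would be wrapped with <TT>st_netfd_open()</TT> and
- the reply awaited with <TT>st_read()</TT>, so that only the requesting
- thread waits while the helper performs the blocking operation.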
- <P>
- <A NAME="timeouts"></A>
- <H3>Timeouts</H3>
- The <TT>timeout</TT> parameter to <TT>st_cond_timedwait()</TT> and the
- I/O functions, and the arguments to <TT>st_sleep()</TT> and
- <TT>st_usleep()</TT> specify a maximum time to wait <I>since the last
- context switch</I>, not since the beginning of the function call.
- <P>The State Threads' time resolution is actually the time interval
- between context switches. That time interval may be large in some
- situations, for example, when a single thread does a lot of work
- continuously. Note that a steady, uninterrupted stream of network I/O
- qualifies for this description; a context switch occurs only when a
- thread blocks.
- <P>If a specified I/O timeout is less than the time interval between
- context switches, the function may return with a timeout error before
- that amount of time has elapsed since the beginning of the function
- call. For example, if eight milliseconds have passed since the last
- context switch and an I/O function with a timeout of 10 milliseconds
- blocks, causing a switch, the call may return with a timeout error as
- little as two milliseconds after it was called. (On Linux,
- <TT>select()</TT>'s timeout is an <I>upper</I> bound on the amount of
- time elapsed before select returns.) Similarly, if 12 ms have passed
- already, the function may return immediately.
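- <P>
- The two cases in the example above follow from a simple subtraction. The
- helper below is a hypothetical model of that arithmetic, not a library
- function: it returns the shortest time, measured from the call, after
- which a timeout error could be reported.

```c
/* Model only: 'timeout_ms' is the timeout passed to the call and
   'since_switch_ms' is the time already elapsed since the last context
   switch when the call is made. */
long earliest_timeout_ms(long timeout_ms, long since_switch_ms)
{
    long remaining = timeout_ms - since_switch_ms;

    return remaining > 0 ? remaining : 0;
}
```

- For a 10 ms timeout this gives 2 ms when 8 ms have already passed and 0 ms
- (an immediate return) when 12 ms have passed.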
- <P>In almost all cases I/O timeouts should be used only for detecting a
- broken network connection or for preventing a peer from holding an idle
- connection for too long. Therefore, for most applications realistic I/O
- timeouts should be on the order of seconds. Furthermore, there is
- probably no point in retrying operations that time out. Rather than
- retrying, simply use a larger timeout in the first place.
- <P>The largest valid timeout value is platform-dependent and may be
- significantly less than <TT>INT_MAX</TT> seconds for <TT>select()</TT>
- or <TT>INT_MAX</TT> milliseconds for <TT>poll()</TT>. Generally, you
- should not use timeouts exceeding several hours. Use
- <tt>ST_UTIME_NO_TIMEOUT</tt> (<tt>-1</tt>) as a special value to
- indicate infinite timeout or indefinite sleep. Use
- <tt>ST_UTIME_NO_WAIT</tt> (<tt>0</tt>) to indicate no waiting at all.
- <P>
- <HR>
- <P>
- </BODY>
- </HTML>