Writing Reentrant and Thread-Safe Code (藍色情懷)

In single-threaded processes, only one flow of control exists. The code executed by these processes thus need not be reentrant or thread-safe. In multi-threaded programs, the same functions and the same resources may be accessed concurrently by several flows of control. To protect resource integrity, code written for multi-threaded programs must be reentrant and thread-safe.

Understanding Reentrance and Thread Safety

Reentrance and thread safety are both related to the way that functions handle resources. Reentrance and thread safety are separate concepts: a function can be either reentrant, thread-safe, both, or neither.

[Figure: Relationships between the sets of reentrant, thread-safe, and non-thread-safe functions]

This section provides information about writing reentrant and thread-safe programs. It does not cover writing thread-efficient programs, that is, programs that are efficiently parallelized. Thread efficiency must be considered while the program is being designed. Existing single-threaded programs can be made thread-efficient, but only by completely redesigning and rewriting them.

Reentrance

A reentrant function does not hold static data over successive calls, nor does it return a pointer to static data. All data is provided by the caller of the function. A reentrant function must not call non-reentrant functions.

A non-reentrant function can often, but not always, be identified by its external interface and its usage. For example, the strtok subroutine is not reentrant, because it keeps an internal pointer into the string being broken into tokens from one call to the next. The ctime subroutine is also not reentrant; it returns a pointer to static data that is overwritten by each call.
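The ctime problem can be observed directly. The following sketch assumes a POSIX system that provides ctime_r, the _r variant discussed later in this section. It compares two ctime calls with two ctime_r calls.

On common C libraries such as glibc and musl, both ctime calls return the same static buffer, so the first result is lost. The C standard only says that the buffer may be overwritten, so treat the pointer comparison as an illustration rather than a guarantee. The function names ctime_reuses_buffer and format_both are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* ctime() formats into a static buffer: on typical implementations both
   calls return the same pointer, and the second call overwrites the
   text produced by the first. Returns 1 when that sharing is observed. */
int ctime_reuses_buffer(time_t first, time_t second)
{
	char *p1 = ctime(&first);
	char *p2 = ctime(&second);

	return p1 != NULL && p1 == p2;
}

/* ctime_r() writes into caller-provided buffers (at least 26 bytes each),
   so both formatted times remain valid at the same time. */
void format_both(time_t first, time_t second, char out1[26], char out2[26])
{
	ctime_r(&first, out1);
	ctime_r(&second, out2);
}
```

The 26-byte size is the minimum buffer length that POSIX requires for ctime_r.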

Thread Safety

A thread-safe function protects shared resources from concurrent access by locks. Thread safety concerns only the implementation of a function and does not affect its external interface.

In the C language, local (automatic) variables are allocated on the stack, and each thread has its own stack. Therefore, any function that does not use static data or other shared resources is trivially thread-safe, as in the following example:

/* thread-safe function */  
int diff(int x, int y)
{
	int delta;

	delta = y - x;
	if (delta < 0)
		delta = -delta;

	return delta;
}

Using global data without protection is thread-unsafe. Global data should be maintained per thread, or encapsulated so that access to it can be serialized. Otherwise a thread may, for example, read an error code that corresponds to an error caused by another thread. For this reason, in AIX® each thread has its own errno value.
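Per-thread global data can be sketched in portable C with the C11 _Thread_local storage class. The example only shows that an error recorded by one thread does not change the value another thread reads, which is the property a per-thread errno provides. It assumes a compiler with C11 support and POSIX threads. The names lib_error, lib_set_error, lib_get_error, worker, and run_worker are hypothetical.

```c
#include <pthread.h>

/* Hypothetical library error code: _Thread_local (C11) gives every thread
   its own copy, so threads cannot see each other's error codes. */
static _Thread_local int lib_error = 0;

void lib_set_error(int code) { lib_error = code; }
int lib_get_error(void) { return lib_error; }

/* Thread body: records an error in this thread's copy, then reports back
   the value this thread reads. */
static void *worker(void *arg)
{
	int *code = arg;

	lib_set_error(*code);
	*code = lib_get_error();
	return NULL;
}

/* Runs one worker that sets error code `code`; *seen receives the value
   the worker read. The calling thread's own lib_error is not touched. */
void run_worker(int code, int *seen)
{
	pthread_t t;

	*seen = code;
	pthread_create(&t, NULL, worker, seen);
	pthread_join(t, NULL);
}
```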

Making a Function Reentrant

In most cases, a non-reentrant function can be made reentrant only by replacing it with a function that has a modified interface. Non-reentrant functions cannot be used by multiple threads. Furthermore, it may be impossible to make a non-reentrant function thread-safe.

Returning Data

Many non-reentrant functions return a pointer to static data. This can be avoided in the following ways:

★ Returning dynamically allocated data. In this case, the caller is responsible for freeing the storage. The benefit is that the interface does not need to be modified. However, backward compatibility is not ensured: existing single-threaded programs that use the modified functions without being changed themselves would never free the storage, which leads to memory leaks.

★ Using caller-provided storage. This method is recommended, although the interface must be modified.

For example, a strtoupper function, converting a string to uppercase, could be implemented as in the following code fragment:

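/* non-reentrant function */
#include <ctype.h>

#define MAX_STRING_SIZE 256	/* assumed value, for illustration */

char *strtoupper(char *string)
{
	static char buffer[MAX_STRING_SIZE];	/* one buffer shared by every caller */
	int index;

	for (index = 0; string[index] && index < MAX_STRING_SIZE - 1; index++)
		buffer[index] = toupper((unsigned char)string[index]);
	buffer[index] = 0;

	return buffer;
}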

This function is not reentrant (nor thread-safe). To make the function reentrant by returning dynamically allocated data, the function would be similar to the following code fragment:

/* reentrant function (a poor solution) */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

char *strtoupper(char *string)
{
	char *buffer;
	size_t index;

	/* allocate exactly enough room; the caller must free() the result */
	buffer = malloc(strlen(string) + 1);
	if (buffer == NULL)
		return NULL;
	for (index = 0; string[index]; index++)
		buffer[index] = toupper((unsigned char)string[index]);
	buffer[index] = 0;

	return buffer;
}

A better solution consists of modifying the interface. The caller must provide the storage for both input and output strings, as in the following code fragment:

/* reentrant function (a better solution) */
#include <ctype.h>

/* out_str must have room for strlen(in_str) + 1 characters */
char *strtoupper_r(const char *in_str, char *out_str)
{
	int index;

	for (index = 0; in_str[index]; index++)
		out_str[index] = toupper((unsigned char)in_str[index]);
	out_str[index] = 0;

	return out_str;
}
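With caller-provided storage of this kind, the function cannot know how large out_str is, so the caller must guarantee enough room. A common refinement, used by snprintf, is to pass the buffer size as well. The following is a sketch of such a variant. The name strtoupper_rn and its return convention are assumptions for illustration, not part of any standard library. It returns the full length of in_str, so a result greater than or equal to out_size signals truncation.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical bounded variant: copies at most out_size - 1 uppercased
   characters, always NUL-terminates when out_size > 0, and returns
   strlen(in_str) so that the caller can detect truncation. */
size_t strtoupper_rn(const char *in_str, char *out_str, size_t out_size)
{
	size_t index;

	for (index = 0; in_str[index]; index++) {
		if (out_size > 0 && index < out_size - 1)
			out_str[index] = (char)toupper((unsigned char)in_str[index]);
	}
	if (out_size > 0)
		out_str[index < out_size - 1 ? index : out_size - 1] = '\0';

	return index;
}
```

For example, with a 4-byte buffer, strtoupper_rn("hello", buf, 4) stores "HEL" and returns 5.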

The non-reentrant standard C library subroutines were made reentrant using caller-provided storage. This is discussed in Reentrant and Thread-Safe Libraries.

Keeping Data over Successive Calls

No data should be kept over successive calls, because different threads may successively call the function. If a function must maintain some data over successive calls, such as a working buffer or a pointer, the caller should provide this data.

Consider the following example. The function lowercase_c returns the successive lowercase characters of a string. As with the strtok subroutine, the string is provided only on the first call. The function returns 0 when it reaches the end of the string. The function could be implemented as in the following code fragment:

/* non-reentrant function */
#include <ctype.h>
#include <stddef.h>

char lowercase_c(char *string)
{
	static char *buffer;
	static int index;
	char c = 0;

	/* stores the string on first call */
	if (string != NULL) {
		buffer = string;
		index = 0;
	}

	/* searches a lowercase character */
	for (; (c = buffer[index]) != 0; index++) {
		if (islower((unsigned char)c)) {
			index++;
			break;
		}
	}

	return c;
}

This function is not reentrant. To make it reentrant, the static data (the string pointer and the index variable) must be maintained by the caller. The reentrant version of the function could be implemented as in the following code fragment:

/* reentrant function */
#include <ctype.h>

char reentrant_lowercase_c(const char *string, int *p_index)
{
	char c = 0;

	/* no initialization - the caller should have done it */
	/* searches a lowercase character */
	for (; (c = string[*p_index]) != 0; (*p_index)++) {
		if (islower((unsigned char)c)) {
			(*p_index)++;
			break;
		}
	}

	return c;
}

The interface of the function changed and so did its usage. The caller must provide the string on each call and must initialize the index to 0 before the first call, as in the following code fragment:

	char *my_string;
	char my_char;
	int my_index;
	...
	my_index = 0;
	while ((my_char = reentrant_lowercase_c(my_string, &my_index)) != 0) {
		...
	}
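Because all the scanning state now lives in the caller's index variable, two scans can be interleaved without disturbing each other. The static-variable version cannot do this even in a single thread. The following sketch repeats reentrant_lowercase_c so that it compiles on its own. It adds a hypothetical helper, interleave_lowercase, that alternates between two strings.

```c
#include <ctype.h>

/* Reentrant scanner: the only state is the caller's *p_index. */
char reentrant_lowercase_c(const char *string, int *p_index)
{
	char c;

	for (; (c = string[*p_index]) != '\0'; (*p_index)++) {
		if (islower((unsigned char)c)) {
			(*p_index)++;
			break;
		}
	}
	return c;
}

/* Hypothetical helper: alternately takes the next lowercase character of
   a and of b and appends it to out, which needs room for
   strlen(a) + strlen(b) + 1 characters. Returns the number stored. */
int interleave_lowercase(const char *a, const char *b, char *out)
{
	int index_a = 0, index_b = 0;	/* one independent scan state per string */
	int n = 0;
	char ca, cb;

	do {
		ca = reentrant_lowercase_c(a, &index_a);
		cb = reentrant_lowercase_c(b, &index_b);
		if (ca != '\0')
			out[n++] = ca;
		if (cb != '\0')
			out[n++] = cb;
	} while (ca != '\0' || cb != '\0');
	out[n] = '\0';
	return n;
}
```

For example, interleave_lowercase("aBc", "XyZ", out) stores "ayc" in out and returns 3.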

Making a Function Thread-Safe

In multi-threaded programs, all functions called by multiple threads must be thread-safe. However, a workaround exists for using thread-unsafe subroutines in multi-threaded programs. Non-reentrant functions usually are thread-unsafe, but making them reentrant often makes them thread-safe, too.

Locking Shared Resources

Functions that use static data or any other shared resources, such as files or terminals, must serialize the access to these resources by locks in order to be thread-safe. For example, the following function is thread-unsafe:

/* thread-unsafe function */
int increment_counter(void)
{
	static int counter = 0;
	counter++;

	return counter;
}

To be thread-safe, the static variable counter must be protected by a static lock, as in the following example:

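/* thread-safe function (POSIX threads version of the generic lock) */
#include <pthread.h>

int increment_counter(void)
{
	static int counter = 0;
	static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
	int value;

	pthread_mutex_lock(&counter_lock);
	counter++;
	value = counter;	/* read the new value while still holding the lock */
	pthread_mutex_unlock(&counter_lock);

	/* returning counter here instead would race with other threads */
	return value;
}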
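The effect of the lock can be checked with a small stress test. In the following sketch, which assumes POSIX threads, several threads call a mutex-protected increment_counter. The final count equals the number of threads times the number of iterations. The helper names worker, read_counter, and run_workers and the constants MAX_THREADS and ITERATIONS are hypothetical. With the lock removed, the count would typically come out lower, because concurrent counter++ operations can overwrite each other's updates.

```c
#include <pthread.h>

#define MAX_THREADS 16
#define ITERATIONS 100000

/* Shared counter and the static lock that serializes access to it. */
static int counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

int increment_counter(void)
{
	int value;

	pthread_mutex_lock(&counter_lock);
	value = ++counter;	/* read-modify-write done entirely under the lock */
	pthread_mutex_unlock(&counter_lock);
	return value;
}

int read_counter(void)
{
	int value;

	pthread_mutex_lock(&counter_lock);
	value = counter;
	pthread_mutex_unlock(&counter_lock);
	return value;
}

/* Thread body: increments the shared counter ITERATIONS times. */
static void *worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++)
		increment_counter();
	return NULL;
}

/* Starts nthreads workers (capped at MAX_THREADS), waits for all of them,
   and returns the final counter value. */
int run_workers(int nthreads)
{
	pthread_t threads[MAX_THREADS];

	if (nthreads > MAX_THREADS)
		nthreads = MAX_THREADS;
	for (int i = 0; i < nthreads; i++)
		pthread_create(&threads[i], NULL, worker, NULL);
	for (int i = 0; i < nthreads; i++)
		pthread_join(threads[i], NULL);
	return read_counter();
}
```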

In a multi-threaded application program that uses the threads library, mutexes should be used to serialize access to shared resources. Independent libraries may need to work outside the context of threads and may therefore use other kinds of locks.

Workarounds for Thread-Unsafe Functions

A workaround makes it possible for multiple threads to call thread-unsafe functions. This can be useful, especially when a thread-unsafe library is used in a multi-threaded program, either for testing or while waiting for a thread-safe version of the library to become available. The workaround adds some overhead, because it serializes calls to an entire function or even to a group of functions. The following are possible workarounds:

★ Use a global lock for the library, and lock it each time you use the library (calling a library routine or using a library global variable). This solution can create performance bottlenecks because only one thread can access any part of the library at any given time. The solution in the following pseudocode is acceptable only if the library is seldom accessed, or as an initial, quickly implemented workaround.

      /* this is pseudo-code! */
      lock(library_lock);
      library_call();
      unlock(library_lock);

      lock(library_lock);
      x = library_var;
      unlock(library_lock);

★ Use a lock for each library component (routine or global variable) or group of components. This solution is somewhat more complicated to implement than the previous one, but it can improve performance. Because this workaround should only be used in application programs and not in libraries, mutexes can be used for locking the library, as in the following pseudocode:

      /* this is pseudo-code! */
      lock(library_moduleA_lock);
      library_moduleA_call();
      unlock(library_moduleA_lock);  
      
      lock(library_moduleB_lock);
      x = library_moduleB_var;
      unlock(library_moduleB_lock);
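The global-lock workaround can be sketched with a POSIX threads mutex as the library lock. One detail matters when the library returns pointers to static data: the result must be copied out before the lock is released, because the next call from another thread may overwrite it. In the following sketch, unsafe_lib_greet stands in for a routine of a thread-unsafe library, and greet_locked is a hypothetical wrapper; neither comes from a real library.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a thread-unsafe library routine: the result lives in a
   static buffer that every call overwrites. */
static const char *unsafe_lib_greet(const char *name)
{
	static char buffer[64];

	strcpy(buffer, "hello, ");
	strncat(buffer, name, sizeof buffer - strlen(buffer) - 1);
	return buffer;
}

/* One global lock for the whole library; every use goes through it. */
static pthread_mutex_t library_lock = PTHREAD_MUTEX_INITIALIZER;

/* Calls the routine under the lock and copies the result into the
   caller's buffer (truncating if needed) before unlocking. */
void greet_locked(const char *name, char *out, size_t out_size)
{
	pthread_mutex_lock(&library_lock);
	snprintf(out, out_size, "%s", unsafe_lib_greet(name));
	pthread_mutex_unlock(&library_lock);
}
```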

Reentrant and Thread-Safe Libraries

Reentrant and thread-safe libraries are useful in a wide range of parallel (and asynchronous) programming environments, not just within threads. It is a good programming practice to always use and write reentrant and thread-safe functions.

Using Libraries

Several libraries shipped with the AIX® Base Operating System are thread-safe. In the current version of AIX®, the following libraries are thread-safe:

★ Standard C library (libc.a)

★ Berkeley compatibility library (libbsd.a)

Some of the standard C subroutines, such as the ctime and strtok subroutines, are non-reentrant. The reentrant version of each such subroutine has the name of the original subroutine with the suffix _r (an underscore followed by the letter r).

When writing multi-threaded programs, use the reentrant versions of subroutines instead of the original version. For example, the following code fragment:

	token[0] = strtok(string, separators);
	i = 0;
	do {
		i++;
		token[i] = strtok(NULL, separators);

	} while (token[i] != NULL);

should be replaced in a multi-threaded program by the following code fragment:

	char *pointer;
	...
	token[0] = strtok_r(string, separators, &pointer);
	i = 0;
	do {
		i++;
		token[i] = strtok_r(NULL, separators, &pointer);

	} while (token[i] != NULL);
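The reentrant version also matters in a single thread whenever two tokenizations are nested, because strtok has only one hidden position. The following sketch assumes a POSIX system that provides strtok_r. It splits a "key=value,key=value" string with two nested loops, each with its own save pointer (outer_save and inner_save). The function name parse_pairs is illustrative. Written with strtok instead, the inner calls would overwrite the position of the outer loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Splits "k1=v1,k2=v2,..." in place and stores up to max pairs.
   Returns the number of pairs found. */
int parse_pairs(char *input, char *keys[], char *values[], int max)
{
	char *outer_save, *inner_save;	/* independent tokenizer states */
	int n = 0;

	for (char *pair = strtok_r(input, ",", &outer_save);
	     pair != NULL && n < max;
	     pair = strtok_r(NULL, ",", &outer_save)) {
		char *key = strtok_r(pair, "=", &inner_save);
		char *value = strtok_r(NULL, "=", &inner_save);

		if (key != NULL && value != NULL) {
			keys[n] = key;
			values[n] = value;
			n++;
		}
	}
	return n;
}
```

Like strtok, strtok_r modifies its input in place, so the keys and values point into the caller's buffer.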

Thread-unsafe libraries may be used by only one thread in a program. Ensure that only one thread uses such a library; otherwise, the program may behave unexpectedly or even stop.

Converting Libraries

Consider the following when converting an existing library to a reentrant and thread-safe library. This information applies only to C language libraries.

★ Identify exported global variables. Those variables are usually declared in a header file with the extern keyword. Exported global variables should be encapsulated. Each variable should be made private (defined with the static keyword in the library source code), and access (read and write) subroutines should be created.

★ Identify static variables and other shared resources. Static variables are usually defined with the static keyword. A lock should be associated with every shared resource. The granularity of the locking, and therefore the number of locks, affects the performance of the library. To initialize the locks, the one-time initialization facility may be used. For more information, see One-Time Initializations.

★ Identify non-reentrant functions and make them reentrant. For more information, see Making a Function Reentrant.

★ Identify thread-unsafe functions and make them thread-safe. For more information, see Making a Function Thread-Safe.

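The first two conversion steps can be sketched together for a single exported variable. In this example, a hypothetical variable lib_debug_level, formerly declared extern in the library header, becomes static and gains locked accessor routines. Its lock is initialized through pthread_once, the POSIX one-time initialization facility. A statically initialized mutex (PTHREAD_MUTEX_INITIALIZER) would work equally well; pthread_once is shown because the steps above mention one-time initialization. All names are illustrative.

```c
#include <pthread.h>

/* Formerly exported through the header; now private to the library. */
static int lib_debug_level = 0;

static pthread_mutex_t lib_lock;
static pthread_once_t lib_once = PTHREAD_ONCE_INIT;

/* Runs exactly once, in whichever thread reaches pthread_once first. */
static void lib_init_lock(void)
{
	pthread_mutex_init(&lib_lock, NULL);
}

/* Read accessor: serialized with all other accesses. */
int lib_get_debug_level(void)
{
	int value;

	pthread_once(&lib_once, lib_init_lock);
	pthread_mutex_lock(&lib_lock);
	value = lib_debug_level;
	pthread_mutex_unlock(&lib_lock);
	return value;
}

/* Write accessor: serialized with all other accesses. */
void lib_set_debug_level(int level)
{
	pthread_once(&lib_once, lib_init_lock);
	pthread_mutex_lock(&lib_lock);
	lib_debug_level = level;
	pthread_mutex_unlock(&lib_lock);
}
```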