
Converting a String to Upper Case

Last updated Mar 20, 2006.

I recently received an e-mail message suggesting a "simpler technique" for converting a string object to uppercase letters. This technique is allegedly simpler than the original solution I presented elsewhere. To make this discussion more interesting and edifying, I will not post my original solution yet. The reader's message is as follows (as usual, I omitted identification details):

I saw your article titled Using the Transform() Algorithm to Change a String's Case. It was fascinating and informative, I was looking for a way to take my string and convert to uppercase, and although your method worked I figured out another solution that was simpler for me. You might want to add this to your article. Or maybe you already knew about it but here is my way:

#include <string>
string myString("HeLlO, WoRlD!");
strupr((char *) myString.c_str());

This worked for me with no problems using gcc 3.3.1.

Take a deep breath and relax. Yes, the proposed code is horrid; no, I don't intend to add it to my original article. However, this is an excellent case study from which we can all learn something.

Portability

Let's admit it: we all sin sometimes. We write brittle code that offers a quick and dirty solution to a burning problem, hoping that one day we'll dedicate more time to improving the design. As contemptible as these hacks are, at least they work... sort of.

Here, however, the code uses the nonstandard function strupr(). Consequently, even this quick and dirty patch might not work at all, depending on the target compiler.

NOTE

Lesson 1: Determine the importance of portability at design time.

If you're writing general-purpose code that should work independently of a specific compiler or platform, always stick to standard libraries. If you don't know whether a certain function, class template, etc. is standard, consult a good reference guide or your IDE's online help.

NOTE

The fact that a certain function is declared in a standard header such as <string.h> still doesn't guarantee portability. Vendors often "augment" standard headers with nonstandard extensions.

In this specific example, the conversion to uppercase is meant to be a portable solution. Therefore, strupr() is a bad choice to begin with.

What's in a string?

For a C++ programmer, there is no such thing as "a string". What you have is at least three loosely related concepts:

  • Zero-terminated char arrays, also known as "C-strings". strupr() operates on this type of string. It cannot handle wchar_t arrays, and it is certainly incapable of handling string objects directly.
  • std::string objects. Normally, a portable C++ program would use this type of abstraction.
  • Custom-made strings. These include string classes defined by platform-specific frameworks such as MFC's CString class, or Borland's VCL AnsiString. Generally speaking, these string classes aren't portable.

NOTE

Lesson 2: Define design concepts clearly. For instance, clearly state which type(s) of strings your program handles.

const Revisited

To overcome the mismatch between std::string and strupr()'s expected argument, the author uses a brute force cast that removes the const qualifier (recall that c_str() returns const char *). However, when a function returns something that is const, this usually means that you are not supposed to modify the result directly. There are indeed rare occasions in which poorly-designed libraries use const excessively. In such cases, you might remove const violently. However, even if you're absolutely convinced that it is safe and justified to remove the const qualifier, there are better ways to do it.

NOTE

Lesson 3: Respect const/volatile qualification.

Before removing the const qualifier, think for a moment: was the object defined as const, or is it merely "contractually const"? In this example, the string object wasn't declared const. Good. Now back to the previous problem: how do you remove the const qualifier in the least offensive manner? You use const_cast. const_cast has three advantages over a C-style cast:

  • Explicitness. const_cast describes your intent clearly.
  • Standardization. C-style casts remain in the standard, but C++ coding guidelines widely discourage them. Don't use them unless you have a compelling reason to.
  • Safety. Unlike a C-style cast, which can perform several operations at once, const_cast can only affect an object's CV qualification. As such, it protects you from inadvertent errors such as this:
    unsigned int n;
    const unsigned int * p=&n;
    *((int *) p)=3; //oops, didn't mean to remove 'unsigned'
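
To see the difference in practice, consider a hedged sketch in which a hypothetical legacy function, legacy_length(), takes a char * that it only reads. const_cast is tolerable here precisely because nothing writes through the pointer. Note that writing through the result of c_str(), as the reader's code does, is not permitted by the standard no matter which cast you use.

```cpp
#include <cstddef>  // std::size_t
#include <string>

// Hypothetical legacy API: declared with char* although it never
// modifies the characters it receives.
std::size_t legacy_length(char* text)
{
    std::size_t n = 0;
    while (text[n] != '\0')
        ++n;
    return n;
}

// Bridges std::string to the legacy API.
std::size_t length_of(const std::string& s)
{
    // const_cast can change only the cv-qualification. Writing
    // const_cast<int*>(s.c_str()) by mistake would be rejected at
    // compile time, whereas (int*) s.c_str() would compile silently.
    return legacy_length(const_cast<char*>(s.c_str()));
}
```

The cast documents exactly one thing: that constness is being removed on purpose, and nothing else about the type changes.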

Flexible Design or Over-Engineering?

While future maintenance is never to be taken lightly, it is easy to get carried away into an over-engineering mania. How likely is it that the string in question will always undergo the same type of transformation (i.e. to uppercase)? How likely are other transformation types (say, transliteration to Cyrillic or encryption)?

Furthermore, is this string meant to be read by humans? After all, it is not unusual to see strings that actually represent binary numbers, decimal values or even JPEG images!

If you're certain that uppercase conversion of Latin characters is all that this string will ever undergo, you can feel OK about hard-wiring this type of transformation into your code. However, if a different transformation isn't an unlikely scenario, use an extra layer of delegation, as shown in my original solution.

NOTE

Lesson 4: Make it flexible, but not too flexible.

Epilog

Having learned which factors to consider and which design mistakes to avoid, stop for a moment and think: how would you tackle this problem? One possible solution can be found below.


Using the Transform() Algorithm to Change a String's Case
By Danny Kalev, C++ Pro

Command line interpreters, HTTP requests, and SMS messages are only a few of the applications in which different letter cases merely cause noise. To overcome this problem, such applications usually convert all strings to uppercase before any further processing.



Sadly, most of these apps use C-style strings and ad-hoc, in-house conversion routines that more often than not suffer from bugs, illegibility, and performance overhead. The std::string class provides more than a hundred member functions and overloaded operators. Yet, none of these functions transforms a string to uppercase or lowercase letters.



Use the STL transform() algorithm to change a string's case easily.

Converting the Hard Way
There are numerous ways to change the case of a string. A naive implementation might look like this:

 
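#include <cctype> // for toupper
#include <string>
using namespace std;

int main()
{
    string s="hello";

    for (string::size_type j=0; j<s.length(); ++j)
    {
        // the cast keeps negative char values out of toupper()
        s[j]=static_cast<char>(toupper(static_cast<unsigned char>(s[j])));
    } // s now contains "HELLO"
}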
Though functionally correct, this loop is a maintenance headache. To apply a different type of transformation to the string, say to convert it to lowercase or transliterate all characters to their Cyrillic equivalent, you'll have to rewrite the loop's body. To improve the design, separate the string transformation into two operations: one that iterates through the string's elements and one that actually transforms every element. You gain more flexibility by decoupling these operations and simplify future maintenance.

Step 1: Iteration
The transform() algorithm, defined in <algorithm>, is rather flexible. Not only does it separate the iteration and transformation operations, it also allows you to transform only a portion of the string. In addition, you can store the result in a different destination, should you prefer to keep the original string intact. The transform() algorithm has two overloaded versions, but we will use only the following one:


OutputIterator transform(InputIterator first,
                         InputIterator last,
                         OutputIterator result,
                         UnaryOperation unary_op);


The first and second arguments are iterators pointing to the beginning and the end of the sequence being transformed. The third argument is an iterator pointing to the beginning of the destination sequence. If you wish to overwrite the current string, result and first should have identical values.
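
The two options mentioned above, writing to a separate destination and transforming only part of a string, can be sketched as follows. The helper names (to_upper_char, upper_copy, upper_prefix) are hypothetical and serve only to illustrate how the iterator arguments are chosen.

```cpp
#include <algorithm>  // std::transform
#include <cctype>     // std::toupper
#include <iterator>   // std::back_inserter
#include <string>

// Hypothetical per-character operation passed to transform().
char to_upper_char(char c)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Different destination: src is left intact, the result is appended
// to a new string through a back_inserter.
std::string upper_copy(const std::string& src)
{
    std::string dest;
    dest.reserve(src.size());
    std::transform(src.begin(), src.end(), std::back_inserter(dest),
                   to_upper_char);
    return dest;
}

// Partial range: only the first n characters are transformed, and
// result == first, so the string is overwritten in place.
void upper_prefix(std::string& s, std::string::size_type n)
{
    if (n > s.size())
        n = s.size();
    std::transform(s.begin(),
                   s.begin() + static_cast<std::string::difference_type>(n),
                   s.begin(), to_upper_char);
}
```

With s holding "hello", upper_copy(s) returns "HELLO" while s stays unchanged, and upper_prefix(s, 2) turns s into "HEllo".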

Step 2: Transformation
The fourth argument is a unary operation. It can be either the address of a function that takes a single argument or a function object. STL algorithms don't really care which of the two they receive because they merely append () to it and let the compiler take care of the rest. This example uses the standard toupper() function declared in <cctype>:

#include <cctype> // for toupper
#include <string>
#include <algorithm>
using namespace std;

string s="hello";
transform(s.begin(), s.end(), s.begin(), toupper);


Alas, on many implementations the program above will not compile because the name 'toupper' is ambiguous. It can refer either to:

int std::toupper(int); // from <cctype>
or
template <class charT>
charT std::toupper(charT, const locale&); // from <locale>

Use an explicit cast to resolve the ambiguity:
std::transform(s.begin(), s.end(), s.begin(), (int(*)(int)) toupper);
This will instruct the compiler to choose the right toupper().
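
In keeping with Lesson 3, the same disambiguation can be written with static_cast rather than a C-style cast. This is a sketch under two assumptions: the input is plain ASCII (negative char values must not reach toupper()), and the compiler still allows taking the address of a standard library function, which the standard no longer guarantees since C++20. The function name to_upper_ascii is hypothetical.

```cpp
#include <algorithm>  // std::transform
#include <cctype>     // std::toupper
#include <string>

// Returns an uppercase copy of s. Assumes ASCII input.
std::string to_upper_ascii(std::string s)
{
    // static_cast names the target type explicitly, so the compiler
    // selects int std::toupper(int) and ignores the <locale> template.
    std::transform(s.begin(), s.end(), s.begin(),
                   static_cast<int (*)(int)>(std::toupper));
    return s;
}
```

For instance, to_upper_ascii("hello") yields "HELLO". Where either assumption does not hold, a small wrapper function that casts its argument to unsigned char before calling toupper() is the more defensive choice.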

Design Improvements
There are ways to further benefit from using transform(). Suppose you need to transform a string to lowercase rather than uppercase. You change the transform() call to:
std::transform(s.begin(),
s.end(),
s.begin(),
(int(*)(int)) tolower);

It's not much of an improvement compared to the original for loop, is it? To avoid intrusive code changes such as this, use an additional level of indirection. Instead of passing a function's name as an argument, use a pointer to a function. This way, you can decouple the transform() call from the customers' requirements. Furthermore, the use of a pointer enables you to postpone the function binding to runtime:
int (*pf)(int)=tolower;
transform(s.begin(), s.end(), s.begin(), pf); // lowercase
Notice that you don't need to change the transform() call now if you wish to apply yet another transformation:
pf=tocyrillic; // just an example
transform(s.begin(), s.end(), s.begin(), pf); // Cyrillic

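
To make the runtime binding concrete, here is a minimal sketch in which the transformation is picked by a flag that could come from a configuration file or a command line switch. Everything named here (CharTransform, upper_char, lower_char, select_transform, apply_transform) is hypothetical and introduced only for illustration.

```cpp
#include <algorithm>  // std::transform
#include <cctype>     // std::toupper, std::tolower
#include <string>

typedef int (*CharTransform)(int);

// Hypothetical wrappers with the int(int) signature. The cast to
// unsigned char keeps negative char values out of the C functions.
int upper_char(int c) { return std::toupper(static_cast<unsigned char>(c)); }
int lower_char(int c) { return std::tolower(static_cast<unsigned char>(c)); }

// The decision is made at runtime; the caller never names a
// concrete transformation.
CharTransform select_transform(bool uppercase)
{
    return uppercase ? upper_char : lower_char;
}

// The transform() call itself stays the same for every transformation.
std::string apply_transform(std::string s, CharTransform pf)
{
    std::transform(s.begin(), s.end(), s.begin(), pf);
    return s;
}
```

apply_transform("HeLLo", select_transform(true)) gives "HELLO", and passing false gives "hello". Supporting another transformation means adding one function and one branch in select_transform(), with no change to the code that iterates.
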
Conclusions
If using transform() to convert a string to uppercase seems like overkill to you, you're probably right. The string transformation was a red herring, though. The point was to show how to use transform() to manipulate sequences in a generic fashion. By using transform(), transforming a sequence of integers to their negative or square root values is a cinch:

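// may clash with std::negate from <functional> when a using-directive is in effect
template <class T> class negate
{
public:
T operator()(T t) const { return -t;}
};

int arr[]={1, 2, 3};

transform(arr,
arr+sizeof(arr)/sizeof(int),
arr,
negate<int>()); // arr = {-1, -2, -3}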
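
The standard library already ships a function object of this kind, std::negate, in <functional>. A short sketch that uses it with a vector instead of a built-in array follows; the function name negated is hypothetical.

```cpp
#include <algorithm>   // std::transform
#include <functional>  // std::negate
#include <vector>

// Returns a new vector holding the negated values; the input is
// left untouched because the destination is a separate sequence.
std::vector<int> negated(const std::vector<int>& values)
{
    std::vector<int> result(values.size());
    std::transform(values.begin(), values.end(), result.begin(),
                   std::negate<int>());
    return result;
}
```

Given a vector holding 1, 2, 3, negated() returns a vector holding -1, -2, -3. Only the fourth argument differs from the string examples, which is the whole point of separating iteration from transformation.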

A complete example that converts a string to uppercase and compares the result with the original string:

//StrToUpper.cc
#include <cctype> // for toupper
#include <string>
#include <algorithm>
#include <iostream>
using namespace std;
int main(int argc, char* argv[])
{
    string s="hello";
    string s1=s; // same length as s, so it can serve as the destination
    bool status = false;

    // the cast selects int toupper(int), as explained above
    transform(s.begin(), s.end(), s1.begin(), (int(*)(int)) toupper);
    cout<<s<<endl;  // prints "hello"
    cout<<s1<<endl; // prints "HELLO"
    status = (s==s1); // false: the comparison is case-sensitive
    if(status)
        cout<<"True"<<endl;
    else
        cout<<"False"<<endl;
    return 0;
}
//---------------------------------------------------------------------------

#g++ -o StrToUpper StrToUpper.cc  //compiling and linking the example.



