Converting a String to Upper Case
I recently received an e-mail message suggesting a "simpler technique" for converting a string object to uppercase letters. This technique is allegedly simpler than the original solution I presented elsewhere. To make this discussion more interesting and edifying, I will not post my original solution yet. The reader's message is as follows (as usual, I omitted identification details):
I saw your article titled Using the Transform() Algorithm to Change a String's Case. It was fascinating and informative, I was looking for a way to take my string and convert to uppercase, and although your method worked I figured out another solution that was simpler for me. You might want to add this to your article. Or maybe you already knew about it but here is my way:
#include
string myString("HeLlO, WoRlD!");
strupr((char *) myString.c_str());
This worked for me with no problems using gcc 3.3.1.
Take a deep breath and relax. Yes, the proposed code is horrid; no, I don't intend to add it to my original article. However, this is an excellent case study from which we can all learn something.
Portability
Let's admit it: we all sin sometimes. We write brittle code that offers a quick and dirty solution to a burning problem, hoping that one day we'll dedicate more time to improving the design. As contemptible as these hacks are, at least they work... sort of.
Here, however, the code uses the nonstandard function strupr(). Consequently, even this quick and dirty patch might not work at all, depending on the target compiler.
NOTE
Lesson 1: Determine the importance of portability at design time.
If you're writing general-purpose code that should work independently of a specific compiler or platform, always stick to the standard library. If you don't know whether a certain function, class template, etc. is standard, consult a good reference guide or your IDE's online help.
NOTE
The fact that a certain function is declared in a standard header such as … doesn't necessarily make that function standard.
In this specific example, the conversion to uppercase is meant to be a portable solution. Therefore, strupr() is a bad choice to begin with.
What's in a string?
For a C++ programmer, there is no single thing called a string. Rather, there are at least three loosely related concepts:
- Zero-terminated char arrays, also known as "C-strings". strupr() operates on this type of string. It cannot handle wchar_t arrays, and it is certainly incapable of handling string objects directly.
- std::string objects. Normally, a portable C++ program would use this type of abstraction.
- Custom-made strings. These include string classes defined by platform-specific frameworks such as MFC's CString class, or Borland's VCL AnsiString. Generally speaking, these string classes aren't portable.
NOTE
Lesson 2: Define design concepts clearly. For instance, clearly state which type(s) of strings your program handles.
const Revisited
To overcome the mismatch between std::string and strupr()'s expected argument, the reader's code uses a brute-force C-style cast that removes the const qualifier (recall that c_str() returns const char *). However, when a function returns something that is const, this usually means that you are not supposed to modify the result directly. There are indeed rare occasions in which poorly designed libraries use const excessively. In such cases, you might remove const violently. However, even if you're absolutely convinced that it is safe and justified to remove the const qualifier, there are better ways to do it.
NOTE
Lesson 3: Respect const/volatile qualification.
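Before removing the const qualifier, think for a moment: was the object defined as const, or is it "contractually const"? In this example, the string object wasn't declared const. Good. Bear in mind, though, that the standard forbids modifying the character array returned by c_str() no matter how the string itself was declared; with a reference-counted implementation such as the one that shipped with gcc 3.x, such a write can even change other string objects that share the same buffer. Now back to the previous problem: how do you remove the const qualifier in the least offensive manner? You use const_cast. const_cast has three advantages over a C-style cast: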
- Explicitness. const_cast describes your intent clearly.
- Standardization. While C-style casts are still part of the standard, they are widely discouraged in C++ code. Don't use them unless you have a compelling reason to.
- Safety. Unlike a C-style cast, which can perform several conversions at once, const_cast can only affect an object's CV qualification. As such, it protects you from inadvertent errors such as this:
unsigned int n;
const unsigned int * p=&n;
*((int *) p)=3; //oops, didn't mean to remove 'unsigned'
Flexible Design or Over-Engineering?
While future maintenance is never to be taken lightly, it is easy to get carried away into an over-engineering mania. How likely is it that the string in question will always undergo the same type of transformation (i.e. to uppercase)? How likely are other transformation types (say, transliteration to Cyrillic or encryption)?
Furthermore, is this string meant to be read by humans? After all, it is not unusual to see strings that actually represent binary numbers, decimal values, or even JPEG images!
If you're certain that uppercase conversion of Latin characters is all that this string will ever undergo, you can feel OK about hard-wiring this type of transformation into your code. However, if a different transformation is a plausible scenario, use an extra layer of delegation, as shown in my original solution.
NOTE
Lesson 4: Make it flexible, but not too flexible.
Epilog
Having learned which factors to consider and which design mistakes to avoid, stop for a moment and think: how would you tackle this problem? One possible solution can be found below.
Using the Transform() Algorithm to Change a String's Case
By Danny Kalev, C++ Pro
Command-line interpreters, HTTP requests, and SMS messages are only a few of the applications in which different letter cases merely cause noise. To overcome this problem, such applications usually convert all strings to uppercase before any further processing.
Sadly, most of these apps use C-style strings and ad hoc, in-house conversion routines that more often than not suffer from bugs, illegibility, and performance overhead. The std::string class provides more than a hundred member functions and overloaded operators. Yet none of these functions transforms a string to uppercase or lowercase letters.
Use the STL transform() algorithm to change a string's case easily.
Converting the Hard Way
There are numerous ways to change the case of a string. A naive implementation might look like this:
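#include <cctype>   // for toupper
#include <string>
using namespace std;
int main()
{
    string s="hello";
    for (string::size_type j=0; j<s.length(); ++j)
    {
        // convert via unsigned char: toupper() is undefined for negative values
        s[j]=toupper(static_cast<unsigned char>(s[j]));
    } // s now contains "HELLO"
}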
Though functionally correct, this loop is a maintenance headache. To apply a different type of transformation to the string, say to convert it to lowercase or transliterate all characters to their Cyrillic equivalents, you'll have to rewrite the loop's body. To improve the design, separate the string transformation into two operations: one that iterates through the string's elements and one that actually transforms every element. Decoupling these operations gives you more flexibility and simplifies future maintenance.
Step 1: Iteration
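The transform() algorithm, defined in <algorithm>, has the following signature:
template <class InputIterator, class OutputIterator, class UnaryOperation>
OutputIterator transform(InputIterator first,
                         InputIterator last,
                         OutputIterator result,
                         UnaryOperation unary_op);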
The first and second arguments are iterators pointing to the beginning of the sequence being transformed and one past its end. The third argument is an iterator pointing to the beginning of the destination sequence. If you wish to overwrite the current string, result and first should have identical values.
Step 2: Transformation
The fourth argument is a unary operation. It can be either the address of a function that takes a single argument or a function object. STL algorithms don't really care which of the two they get, because they merely append () to it and let the compiler take care of the rest. This example uses the standard toupper() function declared in <cctype>:
#include <cctype> // for toupper
#include <string>
#include <algorithm>
using namespace std;
string s="hello";
transform(s.begin(), s.end(), s.begin(), toupper);
Alas, on many implementations the code above will not compile because the name 'toupper' is ambiguous. It can refer either to:
int std::toupper(int); // from <cctype>
or
template <class charT> charT std::toupper(charT, const locale&); // from <locale>
Use an explicit cast to resolve the ambiguity:
std::transform(s.begin(), s.end(), s.begin(), (int(*)(int)) toupper);
This cast instructs the compiler to choose the right toupper().
Design Improvements
There are ways to further benefit from using transform(). Suppose you need to transform a string to lowercase rather than uppercase. You change the transform() call to:
std::transform(s.begin(), s.end(), s.begin(), (int(*)(int)) tolower);
It's not much of an improvement compared to the original for loop, is it? To avoid intrusive code changes such as this, use an additional level of indirection. Instead of hard-coding a function's name in the transform() call, pass a pointer to a function. This way, you decouple the transform() call from the customers' requirements. Furthermore, the use of a pointer enables you to postpone the function binding to runtime:
int (*pf)(int)=tolower;
transform(s.begin(), s.end(), s.begin(), pf); //lowercase
Notice that you don't need to change the transform() call now if you wish to apply yet another transformation:
pf=tocyrillic; // just an example
transform(s.begin(), s.end(), s.begin(), pf); // Cyrillic
Conclusions
If using transform() to convert a string to uppercase seems like overkill to you, you're probably right. The string transformation was a red herring, though. The point was to show how to use transform() to manipulate sequences in a generic fashion. With transform(), converting a sequence of integers to their negative or square root values is a cinch:
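#include <algorithm> // for std::transform

// A hand-written function object; std::negate<T> from <functional> does the same job
template <class T> class negate
{
public:
    T operator()(T t) const { return -t; }
};

int main()
{
    int arr[]={1, 2, 3};
    std::transform(arr,
                   arr+sizeof(arr)/sizeof(arr[0]),
                   arr,
                   negate<int>()); // arr = {-1, -2, -3}
}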
A complete example that converts a string to uppercase and compares the result with the original string:
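//StrToUpper.cc
#include <cctype>    // for toupper
#include <string>
#include <algorithm>
#include <iostream>
using namespace std;

// Wrapper that selects the one-argument <cctype> toupper() unambiguously
// and keeps negative char values away from it
char upper_char(char c)
{
    return static_cast<char>(toupper(static_cast<unsigned char>(c)));
}

int main()
{
    string s="hello";
    string s1=s;   // same length as s, so transform() has room to write
    transform(s.begin(), s.end(), s1.begin(), upper_char);
    cout<<s<<endl;    // hello
    cout<<s1<<endl;   // HELLO
    bool status = (s==s1);
    if(status)
        cout<<"True"<<endl;
    else
        cout<<"False"<<endl;   // printed: the strings differ in case
    return 0;
}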
//----------------------------
$ g++ -o StrToUpper StrToUpper.cc   # compiling and linking the example