You should use the <b>Unicode</b> concept.
UNICODE and Revolution
In the 1980s, work began on a single 16-bit (2-byte) multilingual character encoding system intended to represent nearly all characters used in the major languages of the world. The resulting standard was called Unicode.
What is Unicode?
(from the Unicode consortium web site.)
Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.
These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially servers) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption.
Unicode is changing all that!
Incorporating Unicode into client-server or multi-tiered applications and websites offers significant cost savings over the use of legacy character sets. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption.
There is a good general introduction to Unicode that is not overly technical on the SIL International web site.
Do the exercise "Exploring Unicode".
Useful Information for Using Unicode in Revolution
ASCII: American Standard Code for Information Interchange.
Character: A single text symbol: a letter, number, punctuation mark, or control character. Characters can be single-byte or double-byte (Unicode).
When using the word character in a chunk expression, single-byte characters are assumed.
Unicode: A standard for representing all characters from all known writing systems in a single character set. Three encoding forms are part of the Unicode specification: UTF-8 (one to four bytes per character), UTF-16 (two bytes for most characters, four bytes for the rest) and UTF-32 (four bytes). UTF-16 is the most commonly implemented form (and the form implemented in Revolution).
Double-byte font: A font in which each character is represented by 2 bytes. Languages such as Japanese, Chinese, and Korean, which contain more symbols than can be represented by 256 code points, require double-byte character sets.
Double-byte fonts usually also contain a full complement of alphabetic characters occupying the first 256 positions in the font, so you can display Roman-alphabet text and non-Roman text using the same font.
Double-byte character: A character from a character set that supports more than 256 characters; a Unicode character.
The numeric equivalent of a double-byte Unicode character is between zero and 65,535, for 65,536 total characters. 65,536 is 2^16, so it takes 16 bits (two bytes) to store such a character.
UTF-8: Unicode Transformation Format, 8-bit encoding form.
The UTF-8 encoding form was developed to work with existing software implementations that were designed for processing 8-bit text data. Most web pages that use Unicode text use the UTF-8 format. Since Revolution fields primarily use UTF-16, you must convert UTF-8 text to UTF-16 before displaying it in a Revolution field.
UTF-16: Unicode Transformation Format, 16-bit encoding form.
The 16-bit (2 byte) implementation of Unicode most commonly supported by applications that support Unicode. It is the format represented by the unicodeText property of Revolution fields.
Byte order: The order in which the bytes of two- and four-byte characters are stored in the computer's memory. It is determined by the CPU. Intel processors store the most significant byte last ("little-endian"), whereas Motorola and IBM PowerPC processors store the most significant byte first ("big-endian"). Because this can be reflected in the way the data is stored in files, Unicode (UTF-16) data strings produced on one processor may not transfer properly to a system that uses a different byte order. This issue affects UTF-16, but not UTF-8.
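To make the encoding forms and the byte-order issue concrete, here is a small sketch in Python (not Revolution), chosen only because its standard codecs make the raw bytes easy to inspect. The helper name hex_bytes and the sample character are illustrative choices and have nothing to do with Revolution's own functions.

```python
# Show how one character is stored in each Unicode encoding form.
SAMPLE = "\u4e2d"  # U+4E2D, a Chinese character from the 16-bit range


def hex_bytes(text, codec):
    """Encode text with the given codec and return the bytes as hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode(codec))


if __name__ == "__main__":
    # Explicit -be / -le codecs are used so that no byte order mark is added.
    for codec in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be"):
        print(f"{codec:<10} {hex_bytes(SAMPLE, codec)}")
```

The two UTF-16 lines contain the same two bytes in opposite order, which is the byte-order issue described above. The UTF-8 line needs three bytes for this character, and its bytes come out the same on every processor.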
Unicode fonts commonly included in Mac and Windows systems
Mac OS X: see http://www.alanwood.net/unicode/fonts_macosx.html for an exhaustive list.
Lucida Grande (Latin, Cyrillic, Greek...)
Fang Song (Chinese)
Times New Roman
(Can use Windows Unicode fonts)
Windows: see http://www.alanwood.net/unicode/fonts.html for an exhaustive list.
Lucida family (Latin, Cyrillic, Greek...)
MS Hei, MS Song (Chinese)
Times New Roman
How Tos and Abouts
From About chunk expressions:
Important! Characters in chunk expressions are assumed to be single-byte characters. To successfully use chunk expressions with Unicode (double-byte) text, you must treat each double-byte character as a set of two single-byte characters. For example, to get the numeric value of the third Unicode character in a field, use statements like the following:
set the useUnicode to true
get charToNum(char 5 to 6 of field "Chinese Text")
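The arithmetic behind "char 5 to 6" can be sketched outside Revolution. The following Python fragment is only an illustration, under the assumption that every character occupies exactly two bytes; the function names are hypothetical.

```python
def char_byte_range(n):
    """Return the 1-based (first, last) byte positions of the n-th two-byte character."""
    if n < 1:
        raise ValueError("character positions start at 1")
    return (2 * n - 1, 2 * n)


def code_of_nth_char(utf16_bytes, n, byteorder="big"):
    """Return the numeric value of the n-th two-byte character in UTF-16 data."""
    first, last = char_byte_range(n)
    pair = utf16_bytes[first - 1:last]  # convert 1-based positions to a slice
    if len(pair) != 2:
        raise IndexError("text has fewer than n two-byte characters")
    return int.from_bytes(pair, byteorder)


if __name__ == "__main__":
    data = "\u65e5\u672c\u8a9e".encode("utf-16-be")  # three Japanese characters
    print(char_byte_range(3))               # byte positions of the third character
    print(hex(code_of_nth_char(data, 3)))   # its numeric value in hex
```

The same rule gives bytes 9 to 10 for the fifth character, and in general bytes 2n-1 to 2n for the n-th character.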
How to enter or display Unicode text in a field.
You display double-byte text in its correct language by setting its textFont property to a Unicode font. You can either put the text into the field and set the textFont in a handler or the message box, or manually enter the text after using the operating system's built-in text entry tools to choose a language.
For example, to display double-byte Japanese characters that are on line 12 of a field, use a statement like the following:
set the textFont of line 12 of field 1 to "Osaka,Japanese"
When you manually enter text in a language that does not use the Roman alphabet, using the operating system's tools, Revolution automatically sets the textFont of the text you enter to the appropriate font for the language you have chosen.
How to find out whether text in a field is Unicode
You find out whether text in a field is Unicode text by examining its textFont property. The textFont of Unicode text consists of the font name, a comma, and either Unicode or the language the text is in. The following example statement checks whether line 3 of a field is Unicode:
if the effective textFont of line 3 of field 1 contains comma then answer "It's Unicode!"
Note: Characters in chunk expressions are assumed to be single-byte characters. To check a Unicode character's textFont using a chunk expression, treat it as two single-byte characters. For example, to check the fifth character in a field consisting of double-byte characters, use the expression the effective textFont of char 9 to 10 of field 1.
How to convert between Unicode (UTF-16) and UTF-8 text.
Revolution displays non-Roman-alphabet languages using Unicode (UTF-16). You use the uniDecode and uniEncode functions to convert between UTF-16 and UTF-8.
The following statement converts a variable's contents from UTF-8 to UTF-16, and places the resulting Unicode text in a field by way of the field's unicodeText property:
set the unicodeText of field "My Field" to uniEncode(myVariable,"UTF8")
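For readers who want to see what such a conversion does to the bytes, here is a rough Python equivalent. It is a sketch, not Revolution's implementation: the function names are hypothetical, and it assumes the UTF-16 side carries no byte order mark and uses an explicitly chosen byte order.

```python
def _utf16_codec(byteorder):
    """Map 'little' or 'big' to the matching BOM-free UTF-16 codec name."""
    if byteorder not in ("little", "big"):
        raise ValueError("byteorder must be 'little' or 'big'")
    return "utf-16-le" if byteorder == "little" else "utf-16-be"


def utf8_to_utf16(utf8_bytes, byteorder="little"):
    """Decode UTF-8 bytes and re-encode them as UTF-16 without a byte order mark."""
    return utf8_bytes.decode("utf-8").encode(_utf16_codec(byteorder))


def utf16_to_utf8(utf16_bytes, byteorder="little"):
    """Decode BOM-free UTF-16 bytes and re-encode them as UTF-8."""
    return utf16_bytes.decode(_utf16_codec(byteorder)).encode("utf-8")


if __name__ == "__main__":
    utf8 = "caf\u00e9".encode("utf-8")   # 5 bytes: the accented e takes two
    utf16 = utf8_to_utf16(utf8)          # 8 bytes: two per character
    print(len(utf8), len(utf16))
    print(utf16_to_utf8(utf16) == utf8)  # the conversion round-trips
```

The round trip only succeeds if both directions agree on the byte order, which is why it is passed explicitly here.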
How to convert between Unicode and ASCII text.
You use the uniEncode and uniDecode functions to convert text from double-byte (Unicode) to single-byte ASCII, or vice versa.
To convert a string of single-byte characters to Unicode text, use a statement like the following:
put uniEncode(field "Text") into myUnicodeText
To convert a string of double-byte characters to single-byte, use a statement like the following:
put uniDecode(the unicodeText of field "Japanese Text") into convertedText
How to import a Unicode text file.
You use the unicodeText property to import a file that contains Unicode text. To put the text from a Unicode file into a field, use a statement like the following in a handler or the message box:
set the unicodeText of field "Text" to URL "binfile:my.txt"
If the file contains text in multiple languages, Revolution automatically sets the textFont of language runs to the appropriate Unicode font.
Important! This method works only if the file you are importing contains Unicode (UTF-16) data. It will not work for other encoding methods such as UTF-8 or Shift-JIS.
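Because this import only works for UTF-16 data, it can help to check a file before loading it. One common heuristic, sketched here in Python rather than Revolution, is to look for a byte order mark at the start of the file. The function name sniff_bom is hypothetical, and a file saved without a byte order mark cannot be identified this way.

```python
import codecs

# Longer marks are listed first: the UTF-32 LE mark begins with the UTF-16 LE mark.
# Caveat: UTF-16 LE text whose first character is NUL would be reported as UTF-32 LE.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data):
    """Return the encoding named by a leading byte order mark, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    samples = {
        "UTF-16 LE with mark": codecs.BOM_UTF16_LE + "A".encode("utf-16-le"),
        "UTF-8 with mark": codecs.BOM_UTF8 + b"A",
        "no mark": b"plain text",
    }
    for label, data in samples.items():
        print(label, "->", sniff_bom(data))
```

A result of "utf-8" or None suggests the data should be converted (or inspected further) before it is assigned to the unicodeText property.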
Transcript language elements
useUnicode property: Specifies whether the charToNum and numToChar functions assume a character is double-byte.
unicodeText property: Specifies the text in a field, represented as Unicode (double-byte characters).
uniDecode function: Converts a string from Unicode to single-byte text.
uniEncode function: Converts a string from single-byte text to Unicode.
fontLanguage function: Returns the language associated with a Unicode font.