IBM 15 Manual


Appendix B

Unicode Support

Unicode Support in IBM SPSS Modeler

IBM® SPSS® Modeler is fully Unicode-enabled for both IBM® SPSS® Modeler and IBM® SPSS® Modeler Server. This makes it possible to exchange data with other applications that support Unicode, including multi-language databases, without any loss of information that might be caused by conversion to or from a locale-specific encoding scheme.

• SPSS Modeler stores Unicode data internally and can read and write multi-language data stored as Unicode in databases without loss.

• SPSS Modeler can read and write UTF-8 encoded text files. Text file import and export default to the locale encoding but support UTF-8 as an alternative. This setting can be specified in the file import and export nodes, or the default encoding can be changed in the stream properties dialog box. For more information, see the topic Setting general options for streams in Chapter 5 on p. 55.

• Statistics, SAS, and text data files stored in the locale encoding will be converted to UTF-8 on import and back again on export. When writing to any file, if there are Unicode characters that do not exist in the locale character set, they will be substituted and a warning will be displayed. This should occur only where the data has been imported from a data source that supports Unicode (a database or UTF-8 text file) and that contains characters from a different locale or from multiple locales or character sets.

• IBM® SPSS® Modeler Solution Publisher images are UTF-8 encoded and are truly portable between platforms and locales.
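The difference between a UTF-8 export and a locale-encoded export can be illustrated outside SPSS Modeler with a short Python sketch. It is not the product's implementation: the helper name encode_for_locale, the choice of cp1252 as the locale encoding, and the "?" substitution character are all assumptions made for illustration.

```python
import warnings


def encode_for_locale(text: str, encoding: str = "cp1252") -> bytes:
    """Hypothetical helper: encode text for a locale-specific file.

    Characters missing from the locale character set are substituted
    with '?' and a warning is raised, loosely mirroring the behavior
    described above (an assumption, not SPSS Modeler code).
    """
    try:
        # Lossless when every character exists in the locale character set.
        return text.encode(encoding)
    except UnicodeEncodeError:
        warnings.warn(
            f"Some characters cannot be represented in {encoding}; substituting '?'"
        )
        return text.encode(encoding, errors="replace")


# "cafe" with an acute accent, followed by two CJK characters.
mixed = "caf\u00e9 \u6771\u4eac"

# UTF-8 can represent every Unicode character, so the round trip is lossless.
utf8_bytes = mixed.encode("utf-8")
assert utf8_bytes.decode("utf-8") == mixed

# cp1252 has no CJK characters, so they are substituted and a warning is issued.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lossy = encode_for_locale(mixed)

print(lossy.decode("cp1252"))
print(len(caught), "warning(s) raised")
```

Only the two CJK characters are replaced; the accented character survives because it exists in the assumed locale character set.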

About Unicode

The goal of the Unicode standard is to provide a consistent way to encode multilingual text so that it can be easily shared across borders, locales, and applications. The Unicode Standard, now at version 4.0.1, defines a character set that is a superset of all of the character sets in common use in the world today and assigns to each character a unique name and code point. The characters and their code points are identical to those of the Universal Character Set (UCS) defined by ISO-10646. For more information, see the Unicode Home Page (http://www.unicode.org).
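As a small illustration of the name and code point assigned to each character, the following Python sketch looks both up with the standard unicodedata module. The helper name describe is hypothetical, and the Unicode version bundled with a given Python release will generally be newer than the one cited above.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str]:
    """Hypothetical helper: return (code point, character name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch))


# A Latin letter, an accented Latin letter, and a CJK ideograph.
for ch in "A\u00e9\u6771":
    code_point, name = describe(ch)
    print(code_point, name)
```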

© Copyright IBM Corporation 1994, 2012.
