Frequently Asked Questions

Q: Is the recognition always accurate?

A: Utrac can easily differentiate charsets from different systems, such as ISO-8859-1 (Unix), CP1252 (Windows), CP437 (DOS), Mac Roman (Apple) or UTF-8. However, it is much more difficult to distinguish between charsets that are almost identical, such as ISO-8859-1 and ISO-8859-15, or ISO-8859-7 and CP1253.

For this reason, it is possible to specify a language (or more specifically a charset), which increases the probability that a charset that "fits" this language is selected.
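To illustrate why near-identical charsets are hard to tell apart, here is a minimal Python sketch that counts the high byte values on which two charsets disagree. It relies on Python's own codec tables, not on Utrac's configuration file, and the helper name differing_bytes is made up for this example.

```python
def differing_bytes(charset_a: str, charset_b: str) -> list[int]:
    """Return the byte values 0x80..0xFF that decode differently in the two charsets."""
    diffs = []
    for value in range(0x80, 0x100):
        raw = bytes([value])
        # errors="replace" keeps the loop going if a codec leaves a byte undefined.
        char_a = raw.decode(charset_a, errors="replace")
        char_b = raw.decode(charset_b, errors="replace")
        if char_a != char_b:
            diffs.append(value)
    return diffs


close = differing_bytes("iso-8859-1", "iso-8859-15")
far = differing_bytes("iso-8859-1", "mac_roman")
print("ISO-8859-1 vs ISO-8859-15:", [hex(v) for v in close])
print("ISO-8859-1 vs Mac Roman: %d differing bytes" % len(far))
```

ISO-8859-1 and ISO-8859-15 disagree on only eight byte values, so a text that uses none of them gives no evidence either way. This is where a language hint can help to break the tie.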

Q: How does it work?

A: It rates the given text in many different charsets (those listed in its configuration file) and selects the one with the highest mark. It can then give the name of this charset or convert the text to another charset.

To rate a text in a specific charset, Utrac assumes that the text is encoded in this charset, then analyses each word, symbol and punctuation mark and rates it according to simple rules, for instance the presence of uppercase letters in a word: "café" (ISO-8859-1) will have a better mark than "cafÈ" (Mac Roman).
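The rating idea described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Utrac's actual code: it implements a single rule (an uppercase letter directly after a lowercase one lowers the mark), the candidate list is hard-coded instead of read from a configuration file, and the names rate_word, rate_text, guess_charset and convert are hypothetical.

```python
# Hypothetical candidate list; Utrac reads its candidates from a configuration file.
CANDIDATES = ["iso-8859-1", "mac_roman", "cp437"]


def rate_word(word: str) -> int:
    """Toy rule: subtract one point for each uppercase letter that follows a lowercase one."""
    mark = 0
    for previous, current in zip(word, word[1:]):
        if previous.islower() and current.isupper():
            mark -= 1
    return mark


def rate_text(data: bytes, charset: str) -> int:
    """Assume data is encoded in charset, decode it, and sum the marks of its words."""
    text = data.decode(charset, errors="replace")
    return sum(rate_word(word) for word in text.split())


def guess_charset(data: bytes, candidates: list[str]) -> str:
    """Return the candidate with the highest mark (the first one wins a tie)."""
    return max(candidates, key=lambda charset: rate_text(data, charset))


def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data from the source charset to the target charset."""
    return data.decode(source).encode(target)


sample = b"caf\xe9"  # "café" encoded in ISO-8859-1
best = guess_charset(sample, CANDIDATES)
print(best, convert(sample, best, "utf-8"))
```

With the sample bytes, the Mac Roman reading "cafÈ" loses a point for the trailing uppercase letter, while the ISO-8859-1 reading "café" does not. A real recognizer needs many more rules than this to separate charsets reliably.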

Q: Which charsets are supported?

A: All the ASCII-derived, single-byte charsets can be supported. This covers:

ISO-8859-1 to ISO-8859-16 (Unix)
CP1250 to CP1258 (Windows)
CP437, CP7xx, CP8xx (DOS)
MacRoman, MacCentralEuropean, etc. (Apple)
KOI8-U and KOI8-R

Other charsets can easily be added by appending their charmap (correspondence with Unicode) to the configuration file, but only if they meet the preceding restriction.

UTF-8 is also supported (and its detection is unambiguous).

Encodings such as UTF-16, UTF-32, HTML (&#nnn; form) and quoted-printable are not yet supported (OK, Utrac is not as universal as it claims to be).
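UTF-8 is easy to recognize because its multi-byte sequences follow strict rules that text in a single-byte charset rarely satisfies by accident. The Python sketch below shows one simple way to test for it with a strict decode. It is an assumption about how such a check could be written, not a description of Utrac's own implementation, and the name looks_like_utf8 is hypothetical.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is a valid UTF-8 byte sequence (pure ASCII also qualifies)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8("café".encode("utf-8")))       # valid UTF-8
print(looks_like_utf8("café".encode("iso-8859-1")))  # lone 0xE9 byte is not valid UTF-8
```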

Q: How can I help?

A: Send me the text files that Utrac did not recognize correctly!

Q: Is Utrac released under a proprietary license? Why?

A: Utrac is released under both the GPL and a proprietary license (like Qt).

Before becoming Utrac, this project was in fact a module of a closed-source application, SafeSMS, developed by the Alliance MCA company. It was eventually decided to spin off this module and release it as free software, since few tools were available for character set recognition. Because SafeSMS remains proprietary and is linked against Utrac, the library is also available under a proprietary license.

If you are just a user, you don't need to worry. Just use Utrac like any other piece of GPL software. But if you want to contribute and your contribution is integrated into Utrac, you should know that it will also be used in applications developed as proprietary software at Alliance MCA (but nowhere else). Don't think that we want to steal the work of generous contributors ;^). It's just that developing and releasing Utrac has already cost the company time and money, and there is not enough left to maintain two different versions of Utrac.