ut_recognition2.c File Reference

Extended ASCII charset pass.

#include <stdlib.h>
#include <stdio.h>
#include "ut_text.h"
#include "ut_charset.h"
#include "utrac.h"
#include "debug.h"

Functions

char ut_get_pre_char (char **scan_pre, UtText *text)
 Move the scan_pre pointer to the previous character and return it.
char ut_get_post_char (char **scan_post, UtText *text, char *scan_end)
 Move the scan_post pointer to the next character and return it.
UtCode ut_xascii_pass (UtText *text)
 Rate each charset relative to the text and register lines with extended characters.


Detailed Description

Extended ASCII charset pass.

Author:
Antoine Calando (antoine@alliancemca.net)

Definition in file ut_recognition2.c.


Function Documentation

UtCode ut_xascii_pass (UtText *text)

Rate each charset relative to the text and register lines with extended characters.

  • Rate single-byte extended ASCII charsets: the function scans the whole text. Each time an extended character is found, it is interpreted in each candidate charset and compared with the preceding and following character(s); depending on the result, points are added to that charset's rating. For instance, "café" (Latin1) gets more points than "cafÈ" (MacRoman). A checksum of all the extended characters is also computed for each charset, to determine which charsets yield the same result (see UtCharsetEval).
  • Register lines with extended chars: each time an extended character is found that has not been seen before, its line is registered in a linked list (see UtExtCharLine). After the whole text has been scanned, the linked list of lines is filtered and sorted to keep only the most relevant lines.
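
The rating step described above can be illustrated with a minimal sketch. This is not the Utrac implementation: the Toy* types, the toy_* functions, the partial MacRoman table and the point values are all hypothetical, chosen only to show why the byte 0xE9 after "caf" favours Latin1 (é, a small letter) over MacRoman (È, a capital).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical categories; Utrac's real UtCateg values are not shown here. */
typedef enum { TOY_OTHER, TOY_LOWER, TOY_UPPER } ToyCateg;

/* Hypothetical charset record: a name, a classifier, and a running rating. */
typedef struct {
    const char *name;
    ToyCateg (*classify)(unsigned char c);
    long rating;
} ToyCharset;

/* ISO-8859-1: 0xC0-0xDE are capitals and 0xDF-0xFF are small letters,
 * except the multiplication (0xD7) and division (0xF7) signs. */
ToyCateg toy_latin1_classify(unsigned char c)
{
    if (c == 0xD7 || c == 0xF7) return TOY_OTHER;
    if (c >= 0xC0 && c <= 0xDE) return TOY_UPPER;
    if (c >= 0xDF) return TOY_LOWER;
    return TOY_OTHER;
}

/* MacRoman, partial table: only some accented-letter ranges are covered. */
ToyCateg toy_macroman_classify(unsigned char c)
{
    if (c >= 0x80 && c <= 0x86) return TOY_UPPER;  /* capitals */
    if (c >= 0x87 && c <= 0x9F) return TOY_LOWER;  /* small letters */
    if (c >= 0xE5 && c <= 0xEF) return TOY_UPPER;  /* capitals, 0xE9 included */
    return TOY_OTHER;
}

static int toy_is_ascii_lower(int c)
{
    return c >= 'a' && c <= 'z';
}

/* Made-up scoring rule: a small letter next to ASCII small letters looks
 * like part of a word; a capital right after a small letter does not. */
int toy_score(ToyCateg cat, int prev, int next)
{
    int score = 0;
    if (cat == TOY_LOWER) {
        if (toy_is_ascii_lower(prev)) score += 2;
        if (toy_is_ascii_lower(next)) score += 1;
    } else if (cat == TOY_UPPER) {
        if (toy_is_ascii_lower(prev)) score -= 1;
    }
    return score;
}

/* Scan the whole buffer; for each extended byte (>= 0x80), update the
 * rating of every candidate charset from the byte's neighbours. */
void toy_rate(const unsigned char *data, size_t size,
              ToyCharset *charsets, size_t nb_charsets)
{
    assert(data != NULL && charsets != NULL);
    for (size_t i = 0; i < size; i++) {
        if (data[i] < 0x80) continue;                /* plain ASCII */
        int prev = (i > 0) ? data[i - 1] : 0;        /* 0 means no neighbour */
        int next = (i + 1 < size) ? data[i + 1] : 0;
        for (size_t k = 0; k < nb_charsets; k++) {
            ToyCateg cat = charsets[k].classify(data[i]);
            charsets[k].rating += toy_score(cat, prev, next);
        }
    }
}
```

Rating the four bytes 'c', 'a', 'f', 0xE9 with these two toy charsets leaves the Latin1 rating at 2 and the MacRoman rating at -1 under this made-up rule. The real pass also computes a per-charset checksum and registers lines, which the sketch leaves out.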

Todo:
check if charmap exists!
Returns:
UT_OK on success, error code otherwise.

Definition at line 82 of file ut_recognition2.c.

References UtCharType::categorie, UtCharset::char_type, UtSession::charset, UtText::charset, UtCharsetEval::checksum, UtText::data, UtText::distribution, UtText::evaluation, UtText::ext_char, UtText::flags, UtExtCharLine::line_i, UtExtCharLine::line_p, UtCharset::name, UtSession::nb_charsets, UtExtCharLine::nb_ext_chars, UtExtCharLine::next, UtSession::progress_function, UtCharsetEval::rating, UtCharType::script, UtText::size, UtCharset::type, UtCharset::unicode, ut_crc32(), ut_get_post_char(), ut_get_pre_char(), UT_PROCESS_STEP, ut_update_progress(), ut_xascii_pass(), UtCateg, and UtScript.

Referenced by ut_recognize(), and ut_xascii_pass().


Generated on Fri Feb 25 18:30:16 2005 for Utrac by doxygen 1.3.9