ut_recognition1.c File Reference

Byte-distribution/UTF-8 pass and end-of-line (EOL) pass.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "utrac.h"
#include "debug.h"

Defines

#define UT_DEBUG   1

Functions

bool ut_unicode_invalid (ulong unicode)
 Return false if unicode scalar value is invalid.
UtCode ut_distrib_utf_pass (UtText *text)
 Scan the text to calculate frequency distribution and UTF-8 correctness.
void ut_change_EOL1toEOL2 (char *beg, char *end)
 Change all UT_EOL_CHAR to UT_EOL_ALT_CHAR, from beg to end-1.
UtCode ut_eol_pass (UtText *text)
 Scan the text to detect EOL type and replace EOL by UT_EOL_CHAR or UT_EOL_ALT_CHAR.


Detailed Description

Byte-distribution/UTF-8 pass and end-of-line (EOL) pass.

Author:
Antoine Calando (antoine@alliancemca.net)

Definition in file ut_recognition1.c.


Function Documentation

void ut_change_EOL1toEOL2 (char *beg, char *end)

Change all UT_EOL_CHAR to UT_EOL_ALT_CHAR, from beg to end-1.

Note:
EC: Why go back? AC: In case the EOL type was misidentified (for example, an LF was scanned before a CRLF).
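
The replacement over the half-open range from beg to end-1 can be illustrated with a minimal sketch. This is not the Utrac source: the sketch_ and SKETCH_ names are hypothetical, the value chosen for the alternate marker is an assumption (this page only states that UT_EOL_CHAR is the null character), and the sketch returns a count for illustration whereas ut_change_EOL1toEOL2() returns void.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed placeholder values: only the null character for the primary
 * marker is documented; the alternate marker value is hypothetical. */
#define SKETCH_EOL_CHAR     '\0'
#define SKETCH_EOL_ALT_CHAR '\1'

/* Replace every primary EOL marker in [beg, end) with the alternate marker.
 * Returns the number of bytes changed (illustration only). */
static size_t sketch_change_eol1_to_eol2(char *beg, char *end)
{
    size_t changed = 0;

    assert(beg <= end);
    for (char *p = beg; p < end; ++p) {
        if (*p == SKETCH_EOL_CHAR) {
            *p = SKETCH_EOL_ALT_CHAR;
            ++changed;
        }
    }
    return changed;
}
```

Because end is exclusive, a marker stored exactly at end is left untouched, which lets a caller rewrite only the part of the buffer scanned before the EOL type was corrected.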

Definition at line 194 of file ut_recognition1.c.

References UT_EOL_CHAR.

Referenced by ut_eol_pass().

UtCode ut_distrib_utf_pass (UtText *text)

Scan the text to calculate frequency distribution and UTF-8 correctness.

This function calculates the frequency distribution: for each i between 0 and 255, text->distribution[i] is the number of bytes with value i in the text. This distribution is used to determine whether the file is binary or ASCII. The text is simultaneously scanned for UTF-8 errors.
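
A single pass that fills a 256-entry byte histogram while checking UTF-8 well-formedness might look like the sketch below. It is an illustration under assumptions, not the Utrac implementation: SketchStats and the sketch_ functions are hypothetical names, and the way UT_THRESHOLD_UTF8 and UT_THRESHOLD_CONTROL_CHAR are applied to the counts is not documented on this page, so no threshold logic is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result record; the real data lives in UtText. */
typedef struct {
    unsigned long distribution[256]; /* distribution[i] = count of byte i */
    unsigned long utf8_sequences;    /* well-formed multibyte sequences   */
    unsigned long utf8_errors;       /* malformed or invalid sequences    */
} SketchStats;

/* Reject overlong encodings, surrogates, and values above U+10FFFF. */
static bool sketch_scalar_ok(unsigned long cp, int len)
{
    static const unsigned long min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};

    if (cp < min_cp[len]) return false;
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;
    return cp <= 0x10FFFF;
}

static void sketch_distrib_utf_pass(const unsigned char *data, size_t size,
                                    SketchStats *st)
{
    size_t i = 0;

    assert(st != NULL && (data != NULL || size == 0));
    for (int k = 0; k < 256; ++k) st->distribution[k] = 0;
    st->utf8_sequences = 0;
    st->utf8_errors = 0;

    while (i < size) {
        unsigned char b = data[i++];
        unsigned long cp;
        int len, got = 1;

        st->distribution[b]++;              /* every byte is counted once */
        if (b < 0x80) continue;             /* plain ASCII */

        if ((b & 0xE0) == 0xC0)      { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else { st->utf8_errors++; continue; } /* stray or invalid lead byte */

        /* Consume the continuation bytes (10xxxxxx) that follow. */
        while (got < len && i < size && (data[i] & 0xC0) == 0x80) {
            cp = (cp << 6) | (data[i] & 0x3F);
            st->distribution[data[i]]++;
            ++i;
            ++got;
        }
        if (got == len && sketch_scalar_ok(cp, len)) st->utf8_sequences++;
        else st->utf8_errors++;
    }
}
```

A caller could then compare the error and sequence counts, or the share of control characters in the histogram, against thresholds of its own choosing to classify the input.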

Returns:
UT_OK on success, UT_BINARY_DATA_ERROR if file is binary, error code otherwise.

Definition at line 62 of file ut_recognition1.c.

References UtText::charset, UtSession::charset, UtText::data, UtText::distribution, UtText::flags, UtSession::nb_charsets, UtSession::progress_function, UtText::size, UtCharset::type, ut_distrib_utf_pass(), UT_PROCESS_STEP, UT_THRESHOLD_CONTROL_CHAR, UT_THRESHOLD_UTF8, ut_unicode_invalid(), and ut_update_progress().

Referenced by ut_distrib_utf_pass(), and ut_recognize().

UtCode ut_eol_pass (UtText *text)

Scan the text to detect EOL type and replace EOL by UT_EOL_CHAR or UT_EOL_ALT_CHAR.

EOLs are recognized and replaced by UT_EOL_CHAR (the null character), and possibly by UT_EOL_ALT_CHAR if the EOL type is UT_EOL_CRLF_CR or UT_EOL_CRLF_LF (see UtEolType). ut_session->progress_function() is called only if (text->flags & UT_F_TRANSFORM_EOL) is nonzero.
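
The general idea of detecting the EOL convention while overwriting each EOL in place can be sketched as follows. This is a simplified illustration, not the Utrac algorithm: SketchEol, SketchEolResult and sketch_eol_pass() are hypothetical names, only the first byte of each EOL sequence is overwritten with the null character, and mixed conventions are merely reported rather than tracked through an alternate marker as the real function does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for UtEolType, reduced to the basic cases. */
typedef enum {
    SK_EOL_NONE, SK_EOL_LF, SK_EOL_CR, SK_EOL_CRLF, SK_EOL_MIX
} SketchEol;

typedef struct {
    SketchEol type;     /* convention detected over the whole buffer */
    size_t    nb_lines; /* number of EOL sequences found             */
} SketchEolResult;

static SketchEolResult sketch_eol_pass(char *data, size_t size)
{
    SketchEolResult r = { SK_EOL_NONE, 0 };

    assert(data != NULL || size == 0);
    for (size_t i = 0; i < size; ++i) {
        SketchEol seen;
        size_t start = i;

        if (data[i] == '\r') {
            if (i + 1 < size && data[i + 1] == '\n') {
                seen = SK_EOL_CRLF;
                ++i;                 /* the LF belongs to this EOL */
            } else {
                seen = SK_EOL_CR;
            }
        } else if (data[i] == '\n') {
            seen = SK_EOL_LF;
        } else {
            continue;                /* ordinary byte */
        }

        data[start] = '\0';          /* assumed marker: the null character */
        r.nb_lines++;
        if (r.type == SK_EOL_NONE) r.type = seen;
        else if (r.type != seen)   r.type = SK_EOL_MIX;
    }
    return r;
}
```

For the input "a\r\nb\r\nc" the sketch reports SK_EOL_CRLF with two EOL sequences; an input that mixes LF and CRLF is reported as SK_EOL_MIX.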

Returns:
UT_OK on success, error code otherwise.

Definition at line 277 of file ut_recognition1.c.

References UtText::data, UtText::eol, UtText::eol_alt, UtText::flags, UtText::nb_lines, UtText::nb_lines_alt, UtSession::progress_function, UtText::size, UtText::skip_char, ut_change_EOL1toEOL2(), UT_EOF_CHAR, UT_EOL_CHAR, UT_EOL_MIX, ut_eol_pass(), UT_PROCESS_STEP, ut_update_progress(), and UtEolType.

Referenced by ut_eol_pass(), and ut_recognize().


Generated on Fri Feb 25 18:30:16 2005 for Utrac by  doxygen 1.3.9