ut_text.h File Reference

Classes

struct  UtCharsetEval
 Contains evaluation of a charset. More...
struct  UtExtCharLine
 Refers to a line with extended characters. More...
struct  UtText
 Contains all the information about a text and its processing. More...

Typedefs

typedef enum UtTextFlags UtTextFlags
 Flags that control the recognition of a text.
typedef enum UtPassFlags UtPassFlags
 Flags that describe each step in the processing of a text.
typedef struct UtCharsetEval UtCharsetEval
 Contains evaluation of a charset.
typedef struct UtExtCharLine UtExtCharLine
 Refers to a line with extended characters.
typedef enum UtEolType UtEolType
 Types of End-of-line characters.
typedef short UtCharsetIndex
typedef struct UtText UtText
 Contains all the information about a text and its processing.

Enumerations

enum  UtTextFlags {
  UT_F_UNSET = 0, UT_F_FORCE_BINARY = 1<<0, UT_F_IDENTIFY_EOL = 1<<1, UT_F_TRANSFORM_EOL = 1<<2,
  UT_F_REMOVE_ILLEGAL_CHAR = 1<<3, UT_F_ADD_FINAL_EOL = 1<<4, UT_F_IDENTIFY_CHARSET = 1<<5, UT_F_REFERENCE_EXT_CHAR = 1<<6,
  UT_F_DEFAULT = UT_F_REMOVE_ILLEGAL_CHAR | UT_F_IDENTIFY_CHARSET
}
 Flags that control the recognition of a text. More...
enum  UtPassFlags {
  UT_PF_UNSET = 0, UT_PF_NONE = 1<<0, UT_PF_LOAD = 1<<1, UT_PF_RECOGNIZE = 1<<2,
  UT_PF_DISTRIB_PASS = 1<<3, UT_PF_EOL_PASS = 1<<4, UT_PF_XASCII_PASS = 1<<5, UT_PF_CONVERT = 1<<6,
  UT_PF_MAX = 1<<6
}
 Flags that describe each step in the processing of a text. More...
enum  UtEolType {
  UT_EOL_UNSET = -1, UT_EOL_CR, UT_EOL_LF, UT_EOL_CRLF,
  UT_EOL_LFCR, UT_EOL_MIX, UT_EOL_BSN, UT_EOL_NUL,
  UT_EOL_NONE
}
 Types of End-of-line characters. More...

Variables

const char * UT_EOL_NAME []
 Names of the EOL types.


Detailed Description

Author:
Antoine Calando (antoine@alliancemca.net)

Definition in file ut_text.h.


Typedef Documentation

typedef struct UtCharsetEval UtCharsetEval
 

Contains evaluation of a charset.

An array of this structure is instantiated in UtText and holds the result of the evaluation of each charset. The charset that gets the best rating is chosen for the conversion.
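
The selection described above amounts to picking the best-rated entry of an array. The following is only a sketch under stated assumptions: the fields of UtCharsetEval are not listed on this page, so `ExampleCharsetEval`, its `rating` field, and `best_charset()` are hypothetical names, and "best" is assumed to mean "highest".

```c
#include <stddef.h>

typedef short UtCharsetIndex;  /* as declared in ut_text.h */

/* Hypothetical stand-in: the real UtCharsetEval fields are not shown here. */
typedef struct ExampleCharsetEval {
    double rating;  /* assumption: a higher rating means a better match */
} ExampleCharsetEval;

/* Return the index of the best-rated entry, or -1 if the array is empty.
 * On a tie, the lowest index wins. */
static UtCharsetIndex best_charset(const ExampleCharsetEval *evals, size_t count)
{
    UtCharsetIndex best = -1;
    for (size_t i = 0; i < count; i++) {
        if (best < 0 || evals[i].rating > evals[best].rating)
            best = (UtCharsetIndex)i;
    }
    return best;
}
```

Returning a UtCharsetIndex keeps the result usable as an index into the same array that holds the evaluations.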

typedef enum UtEolType UtEolType
 

Types of End-of-line characters.

The different types are CRLF (DOS/Windows), LF (Unix), and CR (Mac). The types CRLF_CR and CRLF_LF exist in some CSV databases: entries end with CRLF, but some fields may contain a lone LF or CR to indicate a "carriage return" inside the field. CR is the character 0xD, LF is 0xA.

Note:
EC: the LFCR case is not taken into account (does it not exist?). AC: It does! I have not come across it, but it should be added... (in fact, quite a lot of the end-of-line recognition would have to be modified).

Referenced by ut_eol_pass().

typedef struct UtExtCharLine UtExtCharLine
 

Refers to a line with extended characters.

This structure refers to a line with extended characters. The list of such lines is filtered to exclude lines containing the same characters, and is stored in a linked list accessible from UtText.

typedef enum UtPassFlags UtPassFlags
 

Flags that describe each step in the processing of a text.

They are set by the user or by utrac to select which passes will be run, in order to compute how much of the processing is done for the 'progress bar' callback.
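
One way a progress value could be derived from these flags is sketched below. This is an illustration only, not utrac's actual computation: `count_passes()` and `progress_ratio()` are hypothetical helpers, every selected bit is assumed to carry the same weight, and UT_PF_NONE is assumed not to count as a pass.

```c
/* Values copied from the enumeration listing in this file reference. */
typedef enum UtPassFlags {
    UT_PF_UNSET        = 0,
    UT_PF_NONE         = 1 << 0,
    UT_PF_LOAD         = 1 << 1,
    UT_PF_RECOGNIZE    = 1 << 2,
    UT_PF_DISTRIB_PASS = 1 << 3,
    UT_PF_EOL_PASS     = 1 << 4,
    UT_PF_XASCII_PASS  = 1 << 5,
    UT_PF_CONVERT      = 1 << 6,
    UT_PF_MAX          = 1 << 6
} UtPassFlags;

/* Hypothetical helper: count the pass bits set in `mask`,
 * from UT_PF_LOAD up to UT_PF_MAX (UT_PF_NONE is skipped). */
static int count_passes(unsigned mask)
{
    int n = 0;
    for (unsigned bit = UT_PF_LOAD; bit <= UT_PF_MAX; bit <<= 1) {
        if (mask & bit)
            n++;
    }
    return n;
}

/* Hypothetical helper: fraction of the selected passes already finished,
 * in the range [0, 1]. With nothing selected, the work counts as done. */
static double progress_ratio(unsigned selected, unsigned finished)
{
    int total = count_passes(selected);
    if (total == 0)
        return 1.0;
    return (double)count_passes(selected & finished) / total;
}
```

A callback could call `progress_ratio()` after each pass completes and scale the result to a percentage.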

typedef struct UtText UtText
 

Contains all the information about a text and its processing.

This structure is created by ut_init_text() and destroyed by ut_free_text(). It is used to pass different arguments to ut_process_text(), and to store information about the text throughout its processing.

typedef enum UtTextFlags UtTextFlags
 

Flags that control the recognition of a text.

They are set by the user to tune the way the text will be analysed (during the function ut_recognize()). Some of them are unimplemented (UT_F_REFERENCE_EXT_CHAR, always true).
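
Because each flag occupies its own bit, flags combine with bitwise OR and are tested with bitwise AND. The sketch below repeats the enumeration values from this page so that it compiles on its own; `flags_has()` and `flags_default_with_eol()` are hypothetical helpers, not part of ut_text.h.

```c
#include <stdbool.h>

/* Values copied from the enumeration listing in this file reference. */
typedef enum UtTextFlags {
    UT_F_UNSET               = 0,
    UT_F_FORCE_BINARY        = 1 << 0,
    UT_F_IDENTIFY_EOL        = 1 << 1,
    UT_F_TRANSFORM_EOL       = 1 << 2,
    UT_F_REMOVE_ILLEGAL_CHAR = 1 << 3,
    UT_F_ADD_FINAL_EOL       = 1 << 4,
    UT_F_IDENTIFY_CHARSET    = 1 << 5,
    UT_F_REFERENCE_EXT_CHAR  = 1 << 6,
    UT_F_DEFAULT = UT_F_REMOVE_ILLEGAL_CHAR | UT_F_IDENTIFY_CHARSET
} UtTextFlags;

/* Hypothetical helper: true if every bit of `flag` is set in `flags`. */
static bool flags_has(unsigned flags, UtTextFlags flag)
{
    return (flags & (unsigned)flag) == (unsigned)flag;
}

/* Hypothetical helper: the defaults plus EOL identification and a final EOL. */
static unsigned flags_default_with_eol(void)
{
    return UT_F_DEFAULT | UT_F_IDENTIFY_EOL | UT_F_ADD_FINAL_EOL;
}
```

Starting from UT_F_DEFAULT and adding bits keeps the library defaults in place while enabling extra behaviour.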


Enumeration Type Documentation

enum UtEolType
 

Types of End-of-line characters.

The different types are CRLF (DOS/Windows), LF (Unix), and CR (Mac). The types CRLF_CR and CRLF_LF exist in some CSV databases: entries end with CRLF, but some fields may contain a lone LF or CR to indicate a "carriage return" inside the field. CR is the character 0xD, LF is 0xA.

Note:
EC: the LFCR case is not taken into account (does it not exist?). AC: It does! I have not come across it, but it should be added... (in fact, quite a lot of the end-of-line recognition would have to be modified).
Enumeration values:
UT_EOL_MIX  Detection only.
UT_EOL_BSN  Conversion only.
UT_EOL_NUL  ASCII NUL character.
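
As an illustration of how a buffer might be classified into these types, here is a minimal sketch. It is not the logic of ut_eol_pass(): `classify_eol()` is a hypothetical function, it only distinguishes CR, LF, and CRLF (LFCR is not detected, in line with the note above), and mapping "no line ending found" to UT_EOL_NONE is an assumption about that value's meaning.

```c
#include <stddef.h>

/* Values copied from the enumeration listing in this file reference. */
typedef enum UtEolType {
    UT_EOL_UNSET = -1, UT_EOL_CR, UT_EOL_LF, UT_EOL_CRLF,
    UT_EOL_LFCR, UT_EOL_MIX, UT_EOL_BSN, UT_EOL_NUL,
    UT_EOL_NONE
} UtEolType;

/* Hypothetical classifier: count lone CR, lone LF, and CRLF pairs. */
static UtEolType classify_eol(const char *buf, size_t len)
{
    size_t n_cr = 0, n_lf = 0, n_crlf = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x0D) {                      /* CR */
            if (i + 1 < len && buf[i + 1] == 0x0A) {
                n_crlf++;                          /* CR followed by LF */
                i++;                               /* skip the LF */
            } else {
                n_cr++;
            }
        } else if (buf[i] == 0x0A) {               /* lone LF */
            n_lf++;
        }
    }

    int kinds = (n_cr > 0) + (n_lf > 0) + (n_crlf > 0);
    if (kinds == 0)
        return UT_EOL_NONE;   /* assumption: no EOL found at all */
    if (kinds > 1)
        return UT_EOL_MIX;    /* more than one style in the buffer */
    if (n_crlf > 0)
        return UT_EOL_CRLF;
    return n_cr > 0 ? UT_EOL_CR : UT_EOL_LF;
}
```

A CSV file whose records end with CRLF but whose fields contain lone LF characters would be reported as UT_EOL_MIX by this sketch.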

Definition at line 132 of file ut_text.h.

enum UtPassFlags
 

Flags that describe each step in the processing of a text.

They are set by the user or by utrac to select which passes will be run, in order to compute how much of the processing is done for the 'progress bar' callback.

Definition at line 67 of file ut_text.h.

enum UtTextFlags
 

Flags that control the recognition of a text.

They are set by the user to tune the way the text will be analysed (during the function ut_recognize()). Some of them are unimplemented (UT_F_REFERENCE_EXT_CHAR, always true).

Enumeration values:
UT_F_FORCE_BINARY  Force processing of the file even if it is detected as binary data.
UT_F_TRANSFORM_EOL  Replace each EOL with a null character to simplify the processing.
UT_F_REMOVE_ILLEGAL_CHAR  Remove control characters (except CR, LF and TAB).
UT_F_ADD_FINAL_EOL  Add a final EOL to the text if the last line is not empty.
UT_F_REFERENCE_EXT_CHAR  Register the lines that contain extended characters (unimplemented, always true).
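
To make the effect of UT_F_REMOVE_ILLEGAL_CHAR concrete, the sketch below filters a buffer in place. It is an assumption-laden illustration rather than utrac's code: `remove_illegal_chars()` is a hypothetical function, and "control characters" is taken to mean bytes 0x00 to 0x1F plus 0x7F (DEL).

```c
#include <stddef.h>

/* Hypothetical filter: drop control characters in place, keeping
 * CR (0x0D), LF (0x0A) and TAB (0x09). Returns the new length.
 * Assumption: "control" means 0x00-0x1F and 0x7F. */
static size_t remove_illegal_chars(char *buf, size_t len)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        int is_control = (c < 0x20) || (c == 0x7F);
        int is_kept    = (c == 0x0D) || (c == 0x0A) || (c == 0x09);

        if (!is_control || is_kept)
            buf[out++] = buf[i];   /* compact kept bytes to the front */
    }
    return out;
}
```

The function does not write a terminator, so a caller working with C strings would set `buf[new_len] = '\0'` afterwards.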

Definition at line 44 of file ut_text.h.


Generated on Fri Feb 25 18:30:16 2005 for Utrac by  doxygen 1.3.9