NAME
tconv_ext - tconv extended API
SYNOPSIS
#include <tconv.h>
tconv_t tconv_open_ext(const char *tocodes, const char *fromcodes, tconv_option_t *tconvOptionp);
void tconv_trace_on(tconv_t tconvp);
void tconv_trace_off(tconv_t tconvp);
void tconv_trace(tconv_t tconvp, const char *fmts, ...);
char *tconv_error_set(tconv_t tconvp, const char *msgs);
char *tconv_error(tconv_t tconvp);
char *tconv_fromcode(tconv_t tconvp);
char *tconv_tocode(tconv_t tconvp);
short tconv_helper(tconv_t tconvp,
void *contextp,
short (*producerp)(void *contextp, char **bufpp, size_t *countlp, short *eofbp),
short (*consumerp)(void *contextp, char *bufp, size_t countl, short eofb, size_t *resultlp)
);
DESCRIPTION
The tconv extended API provides additional entry points to query or control how tconv behaves. Because tconv is a generic layer on top of iconv(), ICU, etc., additional semantics are needed.
METHODS
tconv_open_ext
tconv_t tconv_open_ext(const char *tocodes, const char *fromcodes, tconv_option_t *tconvOptionp);
typedef void (*tconvTraceCallback_t)(void *userDatavp, const char *msgs);
typedef struct tconv_option {
tconv_charset_t *charsetp;
tconv_convert_t *convertp;
tconvTraceCallback_t traceCallbackp;
void *traceUserDatavp;
const char *fallbacks;
} tconv_option_t;
tconv supports two engine types: one for charset detection, one for character conversion. Each engine has its own option structure:
- charsetp
-
Describes the charset engine options.
- convertp
-
Describes the conversion engine options.
Logging is provided through the genericLogger package, and the developer may provide a function pointer with an associated context:
- traceCallbackp
-
A function pointer.
- traceUserDatavp
-
Opaque context passed to the function pointer.
- fallbacks
-
Fallback charset, used when the user gave none and the guess failed.
If tconvOptionp is NULL, defaults apply. Otherwise, if charsetp is NULL the charset defaults apply, if convertp is NULL the conversion defaults apply, and if traceCallbackp is NULL no logging is possible.
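As a minimal sketch of the traceCallbackp / traceUserDatavp pair, the callback below receives its opaque context back on every call and records the message. The tconvTraceCallback_t typedef is repeated from above only so that the block compiles without tconv.h; demo_trace_ctx_t and demo_trace_cb are hypothetical names, not part of tconv.

```c
#include <stdio.h>
#include <string.h>

/* Repeated from the typedef above so the sketch compiles without tconv.h. */
typedef void (*tconvTraceCallback_t)(void *userDatavp, const char *msgs);

/* Hypothetical user data: a prefix, a message counter, and the last message. */
typedef struct demo_trace_ctx {
  const char *prefixs;
  unsigned    countu;
  char        lasts[256];
} demo_trace_ctx_t;

/* Hypothetical callback: prints to stderr and remembers the last message. */
static void demo_trace_cb(void *userDatavp, const char *msgs)
{
  demo_trace_ctx_t *ctxp = (demo_trace_ctx_t *) userDatavp;

  if (ctxp == NULL || msgs == NULL) {
    return;
  }
  ctxp->countu++;
  snprintf(ctxp->lasts, sizeof(ctxp->lasts), "%s", msgs);
  fprintf(stderr, "[%s] %s\n", ctxp->prefixs, msgs);
}
```

With the real header, such a pair would presumably be stored in the traceCallbackp and traceUserDatavp members of a tconv_option_t before calling tconv_open_ext().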
charset engine
A charset engine may support three entry points:
typedef void *(*tconv_charset_new_t) (tconv_t tconvp, void *optionp);
typedef char *(*tconv_charset_run_t) (tconv_t tconvp, void *contextp, char *bytep, size_t bytel);
typedef void (*tconv_charset_free_t)(tconv_t tconvp, void *contextp);
All entry points start with a tconvp pointer (which they can use to trigger logging or to set an error).
The new entry point is optional; it receives a pointer to an opaque (from tconv's point of view) data area and returns a charset specific opaque context. If new is not NULL, then free must not be NULL, and it will be called with the charset specific context pointer returned by new. When new is NULL, the charset specific context will be NULL.
The only required entry point is run, which receives a pointer to bytes and the number of bytes.
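The shape of a run entry point can be sketched as follows: a deliberately naive detector that only looks at byte-order marks. The tconv_t typedef is a stand-in so the block compiles without tconv.h, demo_charset_run is a hypothetical name, and returning a string literal is an assumption, since who owns the returned string is not stated here.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the opaque handle so the sketch compiles without tconv.h. */
typedef struct tconv *tconv_t;

/* Hypothetical run entry point matching tconv_charset_run_t.
   Returns a charset name when a byte-order mark is recognised, else NULL. */
static char *demo_charset_run(tconv_t tconvp, void *contextp, char *bytep, size_t bytel)
{
  const unsigned char *p = (const unsigned char *) bytep;

  (void) tconvp;    /* a real engine could use it for tracing or error setting */
  (void) contextp;  /* NULL here: this engine has no new entry point */

  if (p == NULL) {
    return NULL;
  }
  /* The UTF-32 marks are tested first because FF FE also starts UTF-16LE. */
  if (bytel >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00) {
    return "UTF-32LE";
  }
  if (bytel >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF) {
    return "UTF-32BE";
  }
  if (bytel >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) {
    return "UTF-8";
  }
  if (bytel >= 2 && p[0] == 0xFE && p[1] == 0xFF) {
    return "UTF-16BE";
  }
  if (bytel >= 2 && p[0] == 0xFF && p[1] == 0xFE) {
    return "UTF-16LE";
  }
  return NULL;  /* nothing recognised */
}
```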
charsetp must point to a structure defined as:
typedef struct tconv_charset {
enum {
TCONV_CHARSET_EXTERNAL = 0,
TCONV_CHARSET_PLUGIN,
TCONV_CHARSET_ICU,
TCONV_CHARSET_CCHARDET,
} charseti;
union {
tconv_charset_external_t external;
tconv_charset_plugin_t plugin;
tconv_charset_ICU_option_t *ICUOptionp;
tconv_charset_cchardet_option_t *cchardetOptionp;
} u;
} tconv_charset_t;
i.e. a charset engine can be of four types:
- TCONV_CHARSET_EXTERNAL
-
An external charset engine type is a structure that explicitly gives the three entry points described at the beginning of this section, and a pointer to an opaque charset specific option area. It is defined as:
typedef struct tconv_charset_external {
void *optionp;
tconv_charset_new_t tconv_charset_newp;
tconv_charset_run_t tconv_charset_runp;
tconv_charset_free_t tconv_charset_freep;
} tconv_charset_external_t;
- TCONV_CHARSET_PLUGIN
-
The charset engine is dynamically loaded. A plugin definition is:
typedef struct tconv_charset_plugin {
void *optionp;
char *news;
char *runs;
char *frees;
char *filenames;
} tconv_charset_plugin_t;
i.e. tconv will use filenames as the path of a shared library and will try to load it. optionp is a pointer to a charset specific option area. tconv will look for the three entry points named news, runs and frees:
- news
-
If news is NULL, the environment variable TCONV_ENV_CHARSET_NEW, else the name tconv_charset_newp, will be looked at.
- runs
-
If runs is NULL, the environment variable TCONV_ENV_CHARSET_RUN, else the name tconv_charset_runp, will be looked at.
- frees
-
If frees is NULL, the environment variable TCONV_ENV_CHARSET_FREE, else the name tconv_charset_freep, will be looked at.
Please note that dynamic loading is not always thread-safe, and tconv will not try to adapt to this situation. Therefore, it is up to the caller to make sure that tconv_open_ext() is called within a context that is not affected by a possibly non-thread-safe workflow (e.g. typically within a critical section, or at program startup).
- TCONV_CHARSET_ICU
-
ICU built-in, available when tconv has been compiled with ICU. If tconv has not been compiled with such support, TCONV_CHARSET_ICU remains available, but using it will fail.
If ICUOptionp is not NULL, it must be a pointer to a structure defined as:
typedef struct tconv_charset_ICU_option {
int confidencei;
} tconv_charset_ICU_option_t;
where confidencei is the minimum accepted confidence level. If ICUOptionp is NULL, a default of 10 is used, unless the environment variable TCONV_ENV_CHARSET_ICU_CONFIDENCE is set.
- TCONV_CHARSET_CCHARDET
-
cchardet built-in, always available.
If cchardetOptionp is not NULL, it must be a pointer to a structure defined as:
typedef struct tconv_charset_cchardet_option {
float confidencef;
} tconv_charset_cchardet_option_t;
where confidencef is the minimum accepted confidence level. If cchardetOptionp is NULL, a default of 0.4f is used. This can also be set via the environment variable TCONV_ENV_CHARSET_CCHARDET_CONFIDENCE.
convert engine
A convert engine may support three entry points:
typedef void *(*tconv_convert_new_t) (tconv_t tconvp, const char *tocodes, const char *fromcodes, void *optionp);
typedef size_t (*tconv_convert_run_t) (tconv_t tconvp, void *contextp, char **inbufsp, size_t *inbytesleftlp, char **outbufsp, size_t *outbytesleftlp);
typedef int (*tconv_convert_free_t)(tconv_t tconvp, void *contextp);
All entry points start with a tconvp pointer.
The new entry point is optional; it receives the destination and source codesets along with a pointer to an opaque (from tconv's point of view) data area, and returns a convert specific opaque context. If new is not NULL, then free must not be NULL, and it will be called with the convert specific context pointer returned by new. When new is NULL, the convert specific context will be NULL.
The only required entry point is run, with additional parameters that follow the iconv() semantics: pointers to the input buffer, the number of input bytes left, the output buffer, and the number of output bytes left.
convertp must point to a structure defined as:
typedef struct tconv_convert {
enum {
TCONV_CONVERT_EXTERNAL = 0,
TCONV_CONVERT_PLUGIN,
TCONV_CONVERT_ICU,
TCONV_CONVERT_ICONV
} converti;
union {
tconv_convert_external_t external;
tconv_convert_plugin_t plugin;
tconv_convert_ICU_option_t *ICUOptionp;
tconv_convert_iconv_option_t *iconvOptionp;
} u;
} tconv_convert_t;
i.e. a convert engine can be of four types:
- TCONV_CONVERT_EXTERNAL
-
An external convert engine type is a structure that explicitly gives the three entry points described above, and a pointer to an opaque convert specific option area. It is defined as:
typedef struct tconv_convert_external {
void *optionp;
tconv_convert_new_t tconv_convert_newp;
tconv_convert_run_t tconv_convert_runp;
tconv_convert_free_t tconv_convert_freep;
} tconv_convert_external_t;
- TCONV_CONVERT_PLUGIN
-
The convert engine is dynamically loaded. A plugin definition is:
typedef struct tconv_convert_plugin {
void *optionp;
char *news;
char *runs;
char *frees;
char *filenames;
} tconv_convert_plugin_t;
i.e. tconv will use filenames as the path of a shared library and will try to load it. optionp is a pointer to a convert specific option area. tconv will look for the three entry points named news, runs and frees:
- news
-
If news is NULL, the environment variable TCONV_ENV_CONVERT_NEW, else the name tconv_convert_newp, will be looked at.
- runs
-
If runs is NULL, the environment variable TCONV_ENV_CONVERT_RUN, else the name tconv_convert_runp, will be looked at.
- frees
-
If frees is NULL, the environment variable TCONV_ENV_CONVERT_FREE, else the name tconv_convert_freep, will be looked at.
Same remark about thread-safety as for the charset engine.
- TCONV_CONVERT_ICU
-
ICU built-in, available when tconv has been compiled with ICU. If tconv has not been compiled with such support, TCONV_CONVERT_ICU remains available, but using it will fail.
If ICUOptionp is not NULL, it must be a pointer to a structure defined as:
typedef struct tconv_convert_ICU_option {
size_t uCharCapacityl;
short fallbackb;
int signaturei;
} tconv_convert_ICU_option_t;
containing:
- uCharCapacityl
-
ICU conversion always goes through a UTF-16 internal buffer by design. uCharCapacityl is the number of bytes of this internal intermediary buffer. The default is 4096, unless the environment variable TCONV_ENV_CONVERT_ICU_UCHARCAPACITY is set.
- fallbackb
-
ICU conversion has an optional fallback mechanism for unknown characters. The default is a false value, unless TCONV_ENV_CONVERT_ICU_FALLBACK is set.
- signaturei
-
A signature may be added or removed on demand. If signaturei is lower than zero, the signature is removed. If signaturei is higher than zero, the signature is added. Otherwise the ICU default applies. The default is 0, unless TCONV_ENV_CONVERT_ICU_SIGNATURE is set.
- TCONV_CONVERT_ICONV
-
iconv built-in, always available. No special option.
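To illustrate the iconv()-style contract of a convert engine's run entry point, here is a minimal sketch of an identity converter that copies input to output. The tconv_t typedef is a stand-in so the block compiles without tconv.h, demo_convert_run is a hypothetical name, and the EINVAL check on NULL arguments is a defensive choice of this sketch rather than something iconv() specifies.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the opaque handle so the sketch compiles without tconv.h. */
typedef struct tconv *tconv_t;

/* Hypothetical run entry point matching tconv_convert_run_t. */
static size_t demo_convert_run(tconv_t tconvp, void *contextp,
                               char **inbufsp, size_t *inbytesleftlp,
                               char **outbufsp, size_t *outbytesleftlp)
{
  size_t copyl;

  (void) tconvp;
  (void) contextp;

  /* iconv() convention: a NULL input asks for a reset; nothing to do here. */
  if (inbufsp == NULL || *inbufsp == NULL) {
    return 0;
  }
  /* Defensive check added by this sketch. */
  if (inbytesleftlp == NULL || outbufsp == NULL || *outbufsp == NULL || outbytesleftlp == NULL) {
    errno = EINVAL;
    return (size_t) -1;
  }

  /* Copy as much as fits, then advance pointers and counters like iconv(). */
  copyl = (*inbytesleftlp < *outbytesleftlp) ? *inbytesleftlp : *outbytesleftlp;
  memcpy(*outbufsp, *inbufsp, copyl);
  *inbufsp        += copyl;
  *inbytesleftlp  -= copyl;
  *outbufsp       += copyl;
  *outbytesleftlp -= copyl;

  if (*inbytesleftlp > 0) {
    errno = E2BIG;  /* output buffer exhausted before the input */
    return (size_t) -1;
  }
  return 0;  /* number of non-reversible conversions: none */
}
```

The caller is expected to react to E2BIG by draining the output buffer and calling run again with the updated input pointer, exactly as with iconv().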
tconv_trace_on
void tconv_trace_on(tconv_t tconvp);
Turn tracing on. Any subsequent call to tconv_trace() then triggers a call to the traceCallbackp given in tconv_open_ext()'s option structure.
tconv_trace_off
void tconv_trace_off(tconv_t tconvp);
Turn tracing off.
tconv_trace
void tconv_trace(tconv_t tconvp, const char *fmts, ...);
Format a message string and call traceCallbackp if tracing is on.
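The interplay between the on/off switch, the printf-style formatting and the callback can be sketched as below. This is not tconv's implementation: demo_tracer_t, demo_trace, demo_sink_t and demo_sink_cb are hypothetical names, and the 1024-byte message buffer is an arbitrary choice for the sketch.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Repeated from the option structure section so the sketch compiles alone. */
typedef void (*tconvTraceCallback_t)(void *userDatavp, const char *msgs);

/* Hypothetical stand-in for the tracing state kept behind a handle. */
typedef struct demo_tracer {
  short                 traceb;           /* non-zero when tracing is on */
  tconvTraceCallback_t  traceCallbackp;
  void                 *traceUserDatavp;
} demo_tracer_t;

/* Format the message, then hand it to the callback only if tracing is on. */
static void demo_trace(demo_tracer_t *tracerp, const char *fmts, ...)
{
  char    msgs[1024];
  va_list ap;

  if (tracerp == NULL || !tracerp->traceb || tracerp->traceCallbackp == NULL) {
    return;
  }
  va_start(ap, fmts);
  vsnprintf(msgs, sizeof(msgs), fmts, ap);
  va_end(ap);
  tracerp->traceCallbackp(tracerp->traceUserDatavp, msgs);
}

/* Hypothetical callback and user data used to observe the messages. */
typedef struct demo_sink {
  char lasts[64];
} demo_sink_t;

static void demo_sink_cb(void *userDatavp, const char *msgs)
{
  demo_sink_t *sinkp = (demo_sink_t *) userDatavp;

  snprintf(sinkp->lasts, sizeof(sinkp->lasts), "%s", msgs);
}
```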
tconv_error_set
char *tconv_error_set(tconv_t tconvp, const char *msgs);
Set a string that should contain a more accurate description of the last error. Any engine should use it when a specific description exists. The default is to use the system's errno description.
tconv_error
char *tconv_error(tconv_t tconvp);
Get the latest value of the specific error string.
tconv_fromcode
char *tconv_fromcode(tconv_t tconvp);
Get the source codeset.
tconv_tocode
char *tconv_tocode(tconv_t tconvp);
Get the destination codeset.
tconv_helper
short tconv_helper(tconv_t tconvp,
void *contextp,
short (*producerp)(void *contextp, char **bufpp, size_t *countlp, short *eofbp),
short (*consumerp)(void *contextp, char *bufp, size_t countl, short eofb, size_t *resultlp)
);
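From an end-user point of view, the only important thing is to produce the bytes that must be converted and to consume the result. The tconv_helper method hides all the subtleties of the iconv API, leaving only the two callbacks that are meaningful for the vast majority of applications. The parameters are:
- tconvp
-
The tconv handle.
- contextp
-
An opaque context, passed as-is to both callbacks.
- producerp
-
A callback that produces the bytes to convert.
- consumerp
-
A callback that consumes the converted bytes.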
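A producer/consumer pair with the signatures expected by tconv_helper might look like the sketch below. The demo_io_t context and both function names are hypothetical. Two points are assumptions rather than documented behaviour: that a return value of 1 means success, and that *resultlp reports how many bytes the consumer accepted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical context shared by both callbacks. */
typedef struct demo_io {
  char   *inputs;        /* bytes not yet handed to the converter */
  size_t  inputl;
  size_t  chunkl;        /* maximum bytes produced per call */
  char    outputs[256];  /* converted bytes are collected here */
  size_t  outputl;
} demo_io_t;

/* Producer: hands out the next chunk and raises the EOF flag on the last one.
   Returning 1 for success is an assumption. */
static short demo_producer(void *contextp, char **bufpp, size_t *countlp, short *eofbp)
{
  demo_io_t *iop = (demo_io_t *) contextp;
  size_t     nl  = (iop->inputl < iop->chunkl) ? iop->inputl : iop->chunkl;

  *bufpp       = iop->inputs;
  *countlp     = nl;
  iop->inputs += nl;
  iop->inputl -= nl;
  *eofbp       = (iop->inputl == 0) ? 1 : 0;
  return 1;
}

/* Consumer: appends the converted bytes; fails (returns 0) on overflow.
   The meaning given to *resultlp is an assumption. */
static short demo_consumer(void *contextp, char *bufp, size_t countl, short eofb, size_t *resultlp)
{
  demo_io_t *iop = (demo_io_t *) contextp;

  (void) eofb;  /* nothing special to do at end of input in this sketch */
  if (countl > sizeof(iop->outputs) - iop->outputl) {
    return 0;
  }
  if (countl > 0) {
    memcpy(iop->outputs + iop->outputl, bufp, countl);
  }
  iop->outputl += countl;
  *resultlp     = countl;
  return 1;
}
```

With the real library, the pair would be passed as tconv_helper(tconvp, &io, demo_producer, demo_consumer), where io is a demo_io_t prepared by the caller.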
NOTES
- tracing
-
tconv can trace itself, unless tconv has been compiled with -DTCONV_NDEBUG, which is the default. When compiled without -DTCONV_NDEBUG, the default tracing level is 0, unless the environment variable TCONV_ENV_TRACE is set and the value of the latter is a true value.
- specific error string
-
tconv internally limits the length of such a string to 1024 bytes (including NUL).
- normalized charset name
-
A normalized charset name contains only characters in the range [a-z0-9+.:].
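The character set above can be checked with a few lines of C. The helper below is a hypothetical illustration, not a tconv function; treating an empty or NULL name as not normalized is a choice made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 when names is made only of characters from
   [a-z0-9+.:], 0 otherwise (including NULL or empty input). */
static int demo_is_normalized_charset(const char *names)
{
  static const char alloweds[] = "abcdefghijklmnopqrstuvwxyz0123456789+.:";

  if (names == NULL || *names == '\0') {
    return 0;
  }
  /* strspn() gives the length of the leading run of allowed characters;
     the name is normalized when that run reaches the terminating NUL. */
  return names[strspn(names, alloweds)] == '\0';
}
```

For instance, "utf8" passes the check while "UTF-8" does not, because of the upper-case letters and the hyphen.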