NAME

tconv - iconv-like interface with automatic charset detection

SYNOPSIS

#include <tconv.h>

size_t tconv(tconv_t cd,
             char **inbuf, size_t *inbytesleft,
             char **outbuf, size_t *outbytesleft);

DESCRIPTION

tconv is like iconv, but without the need to know the input charset. Callers who want to reuse existing iconv code can map the iconv names onto tconv with macros, e.g.

#define iconv_t                       tconv_t
#define iconv_open(tocode, fromcode)  tconv_open(tocode, fromcode)
#define iconv(cd, ipp, ilp, opp, olp) tconv(cd, ipp, ilp, opp, olp)
#define iconv_close(cd)               tconv_close(cd)

When calling tconv_open:

tconv_open(const char *tocode, const char *fromcode)

it is legal to pass NULL for fromcode. In that case the first chunk of input is used for charset detection, so it is recommended to supply enough bytes in that first chunk. If fromcode is not NULL, no charset detection occurs and tconv behaves like iconv(3), modulo the engine being used (see below). If tocode is NULL, it defaults to fromcode.

ENGINES

tconv supports two engine types: one for charset detection and one for character conversion. Please refer to the tconv_open_ext documentation for technical details. Engines, whatever their type, are expected to provide three entry points: new, run and free. They can be:

external

The application already has the new, run and free entry points.

plugin

The application gives the path of a shared library, and tconv looks up the entry points in it.

built-in

Python's cchardet charset detection engine, bundled with tconv, is always available. If tconv is compiled with ICU support, the ICU charset detection and conversion engines are available. If tconv is compiled with ICONV support, the ICONV conversion engine is available.
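To illustrate the new/run/free shape described above, here is a minimal, self-contained sketch of a detection engine and of a caller driving it through its life cycle. Every type and function name in it (demo_detect_engine_t, demo_new, demo_run, demo_free, demo_detect) is hypothetical and chosen for illustration only. The real entry point signatures are those given in the tconv_open_ext documentation and may well differ. The toy detector only recognises a UTF-8 byte order mark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL engine shape: three entry points, as in the text above.
 * These are NOT tconv's real signatures (see tconv_open_ext for those). */
typedef struct demo_detect_engine {
    void       *(*new_fn)(void);                                   /* create context  */
    const char *(*run_fn)(void *ctx, const char *bytes, size_t n); /* detect charset  */
    void        (*free_fn)(void *ctx);                             /* release context */
} demo_detect_engine_t;

/* Per-engine private state; here just a call counter. */
typedef struct demo_ctx {
    size_t calls;
} demo_ctx_t;

static void *demo_new(void)
{
    return calloc(1, sizeof(demo_ctx_t));
}

/* Toy detector: reports "UTF-8" when the chunk starts with a UTF-8 BOM,
 * and NULL (no opinion) otherwise. */
static const char *demo_run(void *ctx, const char *bytes, size_t n)
{
    demo_ctx_t *c = ctx;

    assert(c != NULL);
    c->calls++;
    if (n >= 3 && memcmp(bytes, "\xEF\xBB\xBF", 3) == 0)
        return "UTF-8";
    return NULL;
}

static void demo_free(void *ctx)
{
    free(ctx);
}

/* An "external" engine: the application already owns the three entry points. */
static const demo_detect_engine_t demo_engine = { demo_new, demo_run, demo_free };

/* Drives any engine through new -> run -> free on a single chunk. */
static const char *demo_detect(const demo_detect_engine_t *e,
                               const char *bytes, size_t n)
{
    void *ctx = e->new_fn();
    const char *name;

    if (ctx == NULL)
        return NULL;
    name = e->run_fn(ctx, bytes, n);
    e->free_fn(ctx);
    return name;   /* points to a string literal, still valid after free */
}
```

Under these assumptions, demo_detect(&demo_engine, buf, len) yields "UTF-8" for a buffer starting with the bytes EF BB BF and NULL otherwise. A plugin engine would differ only in where the three function pointers come from, presumably a lookup in the shared library rather than the application's own symbols.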

DEFAULTS

charset detection

The default charset detection engine is cchardet, bundled statically with tconv.

character conversion

The default character conversion engine is ICU, if tconv has been compiled with ICU support, else ICONV if compiled with ICONV support, else none.

NOTES

Windows platform

On Windows, an ICONV-like conversion engine is always available, via the win-iconv package, bundled with tconv.

iconv compliance

semantics

tconv() only guarantees that its plug-ins support the //TRANSLIT and //IGNORE iconv notation.

output

It is guaranteed that tconv() behaves exactly like iconv() if the character conversion engine is ICONV on a UNIX platform, since in this case tconv() calls iconv() internally. In any other case, the plug-ins make a best effort to behave like iconv().

POSIX compliance

By POSIX compliance, we mean that, when the output buffer is too small, iconv should stop updating the input and output pointers before the limit is reached. When the character conversion engine is ICONV on a UNIX platform, the behaviour is that of the platform's iconv(). In any other case, the plug-ins guarantee at least that the input and output pointers are left in a state from which a subsequent call correctly continues the conversion.
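The continuation guarantee above is what makes the usual "retry on E2BIG" loop work. The sketch below shows that loop with plain iconv(3), which is the reference behaviour. Since tconv() takes the same arguments, the same loop should carry over by substituting the tconv calls, for instance through the macros shown in DESCRIPTION. The helper name convert_in_chunks and the 4-byte staging buffer are assumptions made for this illustration, not part of tconv.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL helper, not part of tconv: converts inlen bytes of `in`
 * into `out` (capacity outcap) through a deliberately tiny staging buffer,
 * so that the "output buffer too small" case is hit repeatedly.
 * Returns the number of bytes written to `out`, or (size_t)-1 on failure.
 * Stateful target encodings would also need a final flush call,
 * iconv(cd, NULL, NULL, &outp, &outleft), which is omitted here. */
static size_t convert_in_chunks(const char *tocode, const char *fromcode,
                                const char *in, size_t inlen,
                                char *out, size_t outcap)
{
    char    stage[4];            /* small on purpose: forces E2BIG */
    char   *inp    = (char *)in; /* iconv() wants char **, input is not modified */
    size_t  inleft = inlen;
    size_t  total  = 0;
    iconv_t cd;

    assert(in != NULL && out != NULL);

    cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    for (;;) {
        char  *outp     = stage;
        size_t outleft  = sizeof(stage);
        size_t rc       = iconv(cd, &inp, &inleft, &outp, &outleft);
        int    err      = (rc == (size_t)-1) ? errno : 0;
        size_t produced = sizeof(stage) - outleft;

        /* Fail on real errors, on a staging buffer too small for even one
         * character, or when the caller's buffer cannot hold the result. */
        if ((err != 0 && err != E2BIG) ||
            (err == E2BIG && produced == 0) ||
            produced > outcap - total) {
            total = (size_t)-1;
            break;
        }

        memcpy(out + total, stage, produced);
        total += produced;

        if (err == 0)
            break;               /* all input consumed */
        /* E2BIG: inp and inleft already point at the first unconverted
         * character, so simply call again with a fresh output buffer. */
    }

    iconv_close(cd);
    return total;
}
```

The loop never rewinds or re-feeds input: it relies entirely on the pointers being left at a character boundary after each short write, which is the property described above.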

SEE ALSO

tconv_ext(3), iconv(3), cchardet, win-iconv, ICU