Character set handling

OpenConnect development started in 2008 on a modern Linux box, so its character set handling was correspondingly simplistic: it rested on the simple but reasonable assumption that "everything is UTF-8, all of the time". That remained the case up to and including the OpenConnect 6.00 release in July 2014.

Since then, however, OpenConnect has been ported to various less progressive POSIX-based systems and also to Windows, which has its own particular style of charset insanity. This made explicit character set conversion necessary.

The design principle is that the libopenconnect library still treats every string internally as UTF-8. All input to and output from the library remains UTF-8, and callers are expected to convert their own strings to and from UTF-8 as appropriate. For the GNOME and KDE GUI tools this should come naturally, since all strings there are expected to be UTF-8 anyway. For the command-line tool openconnect itself, implemented in main.c, it means that character set conversion is performed on all terminal input and output, and on every argument provided on the command line.
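To illustrate what converting at the command-line boundary might involve on a POSIX system, here is a minimal sketch built on iconv(3). The names convert_charset(), legacy_to_utf8() and utf8_to_legacy() are hypothetical and are not libopenconnect's or main.c's actual API; the sketch assumes the caller already knows the legacy charset name.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not libopenconnect API): convert the NUL-terminated
 * string 'in' from charset 'from' to charset 'to' with iconv(3).
 * Returns a malloc'd string the caller must free, or NULL on failure
 * (unknown charset, invalid input sequence, out of memory).
 */
static char *convert_charset(const char *to, const char *from, const char *in)
{
	assert(to && from && in);

	iconv_t ic = iconv_open(to, from);
	if (ic == (iconv_t)-1)
		return NULL;

	size_t ileft = strlen(in);
	size_t outsize = ileft * 4 + 1;    /* initial guess; grown on E2BIG */
	char *out = malloc(outsize);
	char *ip = (char *)in;             /* iconv() takes a non-const char ** */
	char *op = out;
	size_t oleft = outsize - 1;        /* keep one byte for the NUL */

	while (out && ileft) {
		if (iconv(ic, &ip, &ileft, &op, &oleft) != (size_t)-1)
			continue;
		if (errno != E2BIG) {
			/* EILSEQ or EINVAL: invalid input, or (with many iconv
			 * implementations) a character the target cannot represent. */
			free(out);
			out = NULL;
			break;
		}
		/* Output buffer full: double it and resume where iconv stopped. */
		size_t used = (size_t)(op - out);
		char *bigger = realloc(out, outsize * 2);
		if (!bigger) {
			free(out);
			out = NULL;
			break;
		}
		out = bigger;
		outsize *= 2;
		op = out + used;
		oleft = outsize - 1 - used;
	}

	if (out) {
		/* Emit any final shift sequence (only stateful charsets need one). */
		iconv(ic, NULL, NULL, &op, &oleft);
		*op = '\0';
	}
	iconv_close(ic);
	return out;
}

/* Command-line arguments and terminal input: legacy charset -> UTF-8. */
static char *legacy_to_utf8(const char *legacy_charset, const char *s)
{
	return convert_charset("UTF-8", legacy_charset, s);
}

/* Text printed to the terminal: UTF-8 -> legacy charset. */
static char *utf8_to_legacy(const char *legacy_charset, const char *s)
{
	return convert_charset(legacy_charset, "UTF-8", s);
}
```

In a real program the legacy charset name would typically come from nl_langinfo(CODESET) after calling setlocale(LC_ALL, ""), and the conversion can be skipped entirely when that name is already "UTF-8". Each argv[] entry would pass through legacy_to_utf8() before reaching the library, and library messages through utf8_to_legacy() before being printed.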

Where libopenconnect needs to open files or otherwise interact with the system using the legacy character set, it performs the required conversion transparently. On POSIX systems whose locale uses a legacy non-UTF-8 character set, it converts with iconv; on Windows, it converts to UTF-16 and calls the wide-character (so-called "Unicode") APIs instead.
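As a rough sketch of what such transparent conversion could look like for opening a file, the hypothetical fopen_utf8() below takes a UTF-8 file name. On Windows it converts the name and mode to UTF-16 with MultiByteToWideChar() and calls _wfopen(); elsewhere it converts the name to the locale's charset with iconv and calls fopen(). This is an illustration under those assumptions, not libopenconnect's actual implementation, and it assumes the program has already called setlocale(LC_ALL, "").

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>

/* UTF-8 -> newly allocated UTF-16 string, or NULL on error. */
static wchar_t *utf8_to_wide(const char *s)
{
	int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
	if (n <= 0)
		return NULL;
	wchar_t *w = malloc(n * sizeof(*w));
	if (w && MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) != n) {
		free(w);
		w = NULL;
	}
	return w;
}

/* Hypothetical helper: fopen() for a UTF-8 file name, via _wfopen(). */
static FILE *fopen_utf8(const char *name, const char *mode)
{
	assert(name && mode);
	wchar_t *wname = utf8_to_wide(name);
	wchar_t *wmode = utf8_to_wide(mode);
	FILE *f = (wname && wmode) ? _wfopen(wname, wmode) : NULL;
	free(wname);
	free(wmode);
	return f;
}

#else /* POSIX */
#include <iconv.h>
#include <langinfo.h>

/* Hypothetical helper: fopen() for a UTF-8 file name, converting it to the
 * locale's charset first. Assumes setlocale(LC_ALL, "") was called. */
static FILE *fopen_utf8(const char *name, const char *mode)
{
	assert(name && mode);
	const char *cs = nl_langinfo(CODESET);
	if (!strcmp(cs, "UTF-8"))
		return fopen(name, mode);       /* no conversion needed */

	iconv_t ic = iconv_open(cs, "UTF-8");
	if (ic == (iconv_t)-1)
		return NULL;

	size_t ileft = strlen(name);
	size_t oleft = ileft * 4;           /* generous for a file name */
	char *legacy = malloc(oleft + 1);
	char *ip = (char *)name, *op = legacy;
	FILE *f = NULL;

	/* iconv() usually reports a name the legacy charset cannot represent
	 * as EILSEQ, so the open fails instead of using a mangled name. */
	if (legacy && iconv(ic, &ip, &ileft, &op, &oleft) != (size_t)-1) {
		*op = '\0';
		f = fopen(legacy, mode);
	}
	free(legacy);
	iconv_close(ic);
	return f;
}
#endif
```

Keeping the conversion inside one wrapper like this means every caller can pass UTF-8 strings without knowing which platform it is running on.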