Perl, MySQL and UTF-8

Posted Mon, 02 Oct 2006 15:35:00 GMT

One Perl mystery to me is why, as of yet, DBD::mysql has no UTF-8 support, even though the issue has been discussed on the msql-mysql-modules list since at least 2003 (judging by the MARC archives) and MySQL itself supports UTF-8.
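Until the driver handles this itself, one workaround is to decode fetched values by hand. The sketch below assumes the connection is already returning UTF-8 bytes (for example after running SET NAMES utf8 on the handle); decode_row_utf8 is a hypothetical helper, not part of DBI or DBD::mysql.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of DBI or DBD::mysql).
# Assumes the server sends UTF-8 bytes, e.g. after SET NAMES utf8.
# Decodes every defined column of a row into Perl character strings.
sub decode_row_utf8 {
    my ($row) = @_;    # array ref, e.g. from $sth->fetchrow_arrayref
    # Build a new array: fetchrow_arrayref reuses the same ref on each call.
    return [ map { defined $_ ? decode('UTF-8', $_) : undef } @$row ];
}

# Byte strings standing in for what the driver would return.
my $decoded = decode_row_utf8([ "\xE3\x82\xAB", 'plain', undef ]);
print length($decoded->[0]), "\n";    # prints 1 (one character, U+30AB)
```

Returning a fresh array also means the decoded row can be kept around safely between fetches.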

Perl - Strictify utf8 to UTF-8

Posted Fri, 29 Sep 2006 18:21:00 GMT

Perl has two UTF-8 encodings: utf8, Perl's lax version, and UTF-8, a strict interpretation also known as utf-8-strict. The lax version accepts code points that strict UTF-8 rejects, such as surrogates and values above U+10FFFF, which can cause problems when interoperating with applications that expect strict UTF-8, such as PostgreSQL. Here's a function I wrote that uses the core Encode module to strictify utf8 data to UTF-8:

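use strict;
use warnings;
use Encode;

# Convert a string flagged as Perl's lax utf8 into strict UTF-8.
sub strictify_utf8 {
    my $data = shift;
    # Act only when the UTF8 flag is on but the CHECK argument (1)
    # reports that the internal buffer is not well-formed.
    if (Encode::is_utf8($data) && !Encode::is_utf8($data, 1)) {
        Encode::_utf8_off($data);                 # expose the raw internal bytes
        Encode::from_to($data, 'utf8', 'UTF-8');  # lax -> strict; rejected chars are substituted by default
        Encode::_utf8_on($data);                  # mark the result as characters again
    }
    return $data;
}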
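A related check is useful before handing raw bytes to a strict consumer: ask the strict UTF-8 decoder to croak on the first malformed sequence. This is a minimal sketch; is_strict_utf8 is a hypothetical name, and passing this check doesn't guarantee a given database will accept the data for other reasons.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: returns 1 if $octets decode cleanly as strict
# UTF-8, 0 otherwise.
sub is_strict_utf8 {
    my ($octets) = @_;    # a copy: decode with a CHECK flag may modify its input
    return eval { decode('UTF-8', $octets, FB_CROAK); 1 } ? 1 : 0;
}

print is_strict_utf8("\xE3\x82\xAB") ? "ok\n" : "rejected\n";  # ok
print is_strict_utf8("\xFF")         ? "ok\n" : "rejected\n";  # rejected
```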

Perl - Getting a Unicode Character's Hex Codepoint

Posted Fri, 29 Sep 2006 18:00:00 GMT

I recently answered a question on DevShed Forums about how to get a Unicode character's hex code point from a Unicode literal. Since it may be more generally useful, here's my solution. The following function takes a Unicode literal, converts it to its numeric code point with unpack, and then formats that number as hex with sprintf:

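use strict;
use warnings;

# Return the hex code point of the first character of $char, or undef for
# undef/empty input ('0' is a valid character, so test definedness, not truth).
sub codepoint_hex {
    my $char = shift;
    return unless defined $char && length $char;
    # U0 puts unpack in UTF-8 mode so U yields whole-character code points;
    # sprintf formats the first one, zero-padded to at least two hex digits.
    return sprintf '%2.2x', unpack('U0U*', $char);
}

my $cp = codepoint_hex('カ'); # eq '30ab'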
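For comparison, here is a sketch of the same idea using the built-in ord, which returns a character's code point directly once the text is a character string. It handles every character rather than only the first; codepoints_hex is a hypothetical name, and the "\x{30AB}" escape stands in for the カ literal so the example doesn't depend on the source file's encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: hex code point of every character in a character
# string, using ord rather than unpack.
sub codepoints_hex {
    my ($str) = @_;
    return map { sprintf '%04x', ord($_) } split //, $str;
}

my @cps = codepoints_hex("\x{30AB}A");   # ('30ab', '0041')
print 'U+', uc($_), "\n" for @cps;       # prints U+30AB, then U+0041
```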