<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Transcoding adventures with C</title>
	<atom:link href="http://blog.reverberate.org/2007/04/21/transcoding-adventures-with-c/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.reverberate.org/2007/04/21/transcoding-adventures-with-c/</link>
	<description>parsing, performance, minimalism with C99</description>
	<lastBuildDate>Mon, 06 Feb 2012 23:44:51 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Andrew</title>
		<link>http://blog.reverberate.org/2007/04/21/transcoding-adventures-with-c/comment-page-1/#comment-675</link>
		<dc:creator>Andrew</dc:creator>
		<pubDate>Thu, 07 Jun 2007 19:49:27 +0000</pubDate>
		<guid isPermaLink="false">http://blog.reverberate.org/2007/04/21/transcoding-adventures-with-c/#comment-675</guid>
		<description>Counting Multi-Byte Characters using C99 WCS
======================================
Multi-byte encodings such as UTF-8 use a variable number of bytes per character. The mbstowcs(3) call has a special form, described in its manpage, that counts the characters.

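To count characters (not bytes), ask mbstowcs() how many wchar_t elements the converted string would need: pass NULL as the destination wide-character buffer and zero as its length. mbstowcs() then only counts the characters, without building the wchar_t string. The count depends on the current locale, so call setlocale() first; a return value of (size_t)-1 means the input contained an invalid multi-byte sequence.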

const char *str = &quot;Some UTF-8 String&quot;;
/* Count the characters (not bytes) in a multi-byte string: */
size_t chars = mbstowcs(NULL, str, 0);</description>
		<content:encoded><![CDATA[<p>Counting Multi-Byte Characters using C99 WCS<br />
======================================<br />
Multi-byte encodings such as UTF-8 use a variable number of bytes per character. The mbstowcs(3) call has a special form, described in its manpage, that counts the characters.</p>
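<p>To count characters (not bytes), ask mbstowcs() how many wchar_t elements the converted string would need: pass NULL as the destination wide-character buffer and zero as its length. mbstowcs() then only counts the characters, without building the wchar_t string. The count depends on the current locale, so call setlocale() first; a return value of (size_t)-1 means the input contained an invalid multi-byte sequence.</p>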
<p>const char *str = &quot;Some UTF-8 String&quot;;<br />
/* Count the characters (not bytes) in a multi-byte string: */<br />
size_t chars = mbstowcs(NULL, str, 0);</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: josh</title>
		<link>http://blog.reverberate.org/2007/04/21/transcoding-adventures-with-c/comment-page-1/#comment-674</link>
		<dc:creator>josh</dc:creator>
		<pubDate>Thu, 07 Jun 2007 16:08:33 +0000</pubDate>
		<guid isPermaLink="false">http://blog.reverberate.org/2007/04/21/transcoding-adventures-with-c/#comment-674</guid>
		<description>Good catch, Andrew! My OS X box returns zero as well. I guess I wasn&#039;t paying enough attention when I ran the test program initially.</description>
		<content:encoded><![CDATA[<p>Good catch, Andrew! My OS X box returns zero as well. I guess I wasn&#8217;t paying enough attention when I ran the test program initially.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew</title>
		<link>http://blog.reverberate.org/2007/04/21/transcoding-adventures-with-c/comment-page-1/#comment-673</link>
		<dc:creator>Andrew</dc:creator>
		<pubDate>Thu, 07 Jun 2007 15:10:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.reverberate.org/2007/04/21/transcoding-adventures-with-c/#comment-673</guid>
		<description>ICONV(3) Return Value
===================
The manpage for iconv(3) describes its return value as &quot;the number of characters converted in a non-reversible way&quot;. This is very different from the total number of characters in the converted buffer! A non-zero return value means some characters were converted lossily, so the output can&#039;t be converted back to the original charset. An outright failure is reported as (size_t)-1 with errno set.

On my Fedora Linux box, the sample program reports a return value of zero in the &quot;chars&quot; variable. This means that the Latin-1 character (octal 370, i.e. 0xF8) was converted to UTF-8 reversibly.

size_t chars = iconv(...);</description>
		<content:encoded><![CDATA[<p>ICONV(3) Return Value<br />
===================<br />
The manpage for iconv(3) describes its return value as &#8220;the number of characters converted in a non-reversible way&#8221;. This is very different from the total number of characters in the converted buffer! A non-zero return value means some characters were converted lossily, so the output can&#8217;t be converted back to the original charset. An outright failure is reported as (size_t)-1 with errno set.</p>
<p>On my Fedora Linux box, the sample program reports a return value of zero in the &#8220;chars&#8221; variable. This means that the Latin-1 character (octal 370, i.e. 0xF8) was converted to UTF-8 reversibly.</p>
<p>size_t chars = iconv(&#8230;);</p>
]]></content:encoded>
	</item>
</channel>
</rss>

