I set out a couple of weeks ago to port the Killer Deals application from the iPhone 2.2.1 SDK to 3.0. “No problem”, I thought; the application doesn’t veer far from the standard APIs and controls, and it is a relatively straightforward application, implementation-wise.
As I was making my updates (some minor method and property name changes here and there), I stumbled upon a showstopper: NSXMLParser was no longer parsing my XML data. After scratching my head for a few hours, I finally figured out the problem.
It seems Apple has removed support for Windows-1252 character encoding in the latest iteration of the NSXMLParser. I used this encoding to prevent the appearance of those funny characters you occasionally see on a web page. Those funny characters are caused by a character-encoding mismatch between a text renderer and the document. Some Microsoft tools use the Windows-1252 character encoding for documents by default. Unfortunately, some of the Killer Deals data is only available in this encoding.
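To make the mismatch concrete, here is a minimal sketch (not taken from the Killer Deals code) that decodes the same four bytes two ways. The function name and byte values are purely illustrative: 0x93 and 0x94 are the curly double quotes in Windows-1252, but on their own they are not valid UTF-8. The sketch assumes manual reference counting, as on the 2.x and 3.0 SDKs.

```objc
#import <Foundation/Foundation.h>

// Hypothetical demo function; the name is not from the original post.
static void DemonstrateEncodingMismatch(void) {
    // Curly-quoted "hi" as a Windows-1252 tool would write it.
    const unsigned char bytes[] = { 0x93, 'h', 'i', 0x94 };
    NSData *data = [NSData dataWithBytes:bytes length:sizeof(bytes)];

    // Decoded with the encoding the bytes were written in: “hi”
    NSString *asCP1252 = [[NSString alloc] initWithData:data
                                               encoding:NSWindowsCP1252StringEncoding];

    // Decoded as UTF-8: initWithData:encoding: returns nil, because
    // 0x93 cannot start a UTF-8 sequence.
    NSString *asUTF8 = [[NSString alloc] initWithData:data
                                             encoding:NSUTF8StringEncoding];

    NSLog(@"CP1252: %@, UTF-8: %@", asCP1252, asUTF8);

    [asCP1252 release];
    [asUTF8 release]; // messaging nil is a no-op
}
```

A renderer that guesses the wrong encoding but does not reject the bytes outright is what produces the funny characters on a web page.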
While the previous version would report an NSXMLParserUnknownEncodingError and continue to parse, the new version simply stops parsing after the error (although, strangely, the parserDidStartDocument: delegate method is still called after the error).
At this point, I was going to force the encoding to UTF-8 on the server and put a hack in the code to convert the ‘invalid’ characters to their proper UTF-8 equivalents. That attempt only produced an invalid-character error from the parser.
It seems Apple has degraded the parser and offers less tolerance for ‘unsupported’ formats. Does this have anything to do with bad blood between Apple and Microsoft? Hmmm…

I found a way around this problem by loading a string from the URL with Windows-1252 encoding, converting it to a data object, converting it back to a UTF-8 string, then parsing the string. Thanks to this article for setting me straight: http://stackoverflow.com/questions/525001/parsing-a-url-from-xml-with-amp-in-it
Posted by: Doug Kadlecek | May 26, 2009 at 12:32 PM
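The workaround described in the comment above might look roughly like the sketch below. It is one reading of that description, not the commenter's actual code. The helper name is hypothetical, and rewriting the XML declaration is an added assumption: it presumes the feed declares its encoding literally as `encoding="windows-1252"` near the top of the document, and that the parser would otherwise still trip over that label. The sketch assumes manual reference counting, as on the 3.0 SDK.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: download a Windows-1252 XML feed and return a parser
// that is fed UTF-8 bytes instead. Returns nil (and fills *error) on failure.
static NSXMLParser *ParserForWindows1252Feed(NSURL *url, NSError **error) {
    // 1. Load the document as a string, decoding the bytes as Windows-1252.
    NSString *xml = [NSString stringWithContentsOfURL:url
                                             encoding:NSWindowsCP1252StringEncoding
                                                error:error];
    if (xml == nil) {
        return nil;
    }

    // 2. Assumption: relabel the declaration so it matches the new bytes.
    //    Only the first 200 characters are searched, to leave the body alone.
    NSRange head = NSMakeRange(0, MIN([xml length], (NSUInteger)200));
    xml = [xml stringByReplacingOccurrencesOfString:@"encoding=\"windows-1252\""
                                         withString:@"encoding=\"UTF-8\""
                                            options:NSCaseInsensitiveSearch
                                              range:head];

    // 3. Re-encode the string as UTF-8 and give that data to the parser.
    NSData *utf8Data = [xml dataUsingEncoding:NSUTF8StringEncoding];
    return [[[NSXMLParser alloc] initWithData:utf8Data] autorelease];
}
```

The caller would then set its delegate on the returned parser and call `parse` as usual. Note that `stringWithContentsOfURL:encoding:error:` blocks until the download finishes, so it is best kept off the main thread.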