This framework was developed for Evergreen and is made available here for developers who just need the parsing code. It has no depencies that aren’t provided by the system.
This framework includes parsers for:
It also includes Objective-C wrappers for libXML2’s XML SAX and HTML SAX parsers. You can write your own parsers on top of these.
This framework builds for macOS. It could be made to build for iOS also, but I haven’t gotten around to it yet.
How to parse feeds
To get the type of a feed, even with partial data, call
FeedParser.feedType(parserData), which will return a
To parse a feed, call
FeedParser.parse(parserData), which will return a ParsedFeed. Also see related structs:
You do not need to know the type of feed when calling
FeedParser.parse — it will figure it out and use the correct concrete parser.
(Note: if you want to write a feed reader app, please do! You have my blessing and encouragement. Let me know when it’s shipping so I can check it out.)
How to parse OPML
+[RSOPMLParser parseOPMLWithParserData:error:], which returns an
RSOPMLDocument. See related objects:
How to parse dates
RSDateParser). These handle the common internet date formats. You don’t need to know which format.
How to parse HTML
To get an array of
<a href=… links from from an HTML document, call
+[RSHTMLLinkParser htmlLinksWithParserData:]. It returns an array of
To parse the metadata in an HTML document, call
+[RSHTMLMetadataParser HTMLMetadataWithParserData:]. It returns an
To write your own HTML parser, see
RSSAXHTMLParser. The two parsers above can serve as examples.
How to parse HTML entities
When you have a string with things like
ë and you want to turn those into the correct characters, call
-[NSString rsparser_stringByDecodingHTMLEntities]. (See
How to parse XML
If you need to parse some XML that isn’t RSS, Atom, or OPML, you can use
RSSAXParser. Don’t subclass it — instead, create an
RSOPMLParser as examples.
Why use libXML2’s SAX API?
SAX is kind of a pain because of all the state you have to manage.
An alternative is to use
NSXMLParser, which is event-driven like SAX. However,
RSSAXParser was written to avoid allocating Objective-C objects except when absolutely needed. You’ll note use of things like
Normally I avoid this kind of thing strenuously. I prefer to work at the highest level possible.
But my more-than-a-decade of experience parsing XML has led me to this solution, which — last time I checked, which was, admittedly, a few years ago — was not only fastest but also uses the least memory. (The two things are related, of course: creating objects is bad for performance, so this code attempts to do the minimum possible.)
All that low-level stuff is encapsulated, however. If you just want to parse one of the popular feed formats, see
FeedParser, which makes it easy and Swift-y.
Everything here is thread-safe.
Everything’s pretty fast, too, so you probably could just use the main thread/queue. But it’s totally a-okay to use a non-serial background queue.