.Dd December 25, 2014
.Dt SFEED 1
.Os
.Sh NAME
.Nm sfeed
.Nd simple RSS and Atom parser
.Sh SYNOPSIS
.Nm
.Op Ar baseurl
.Sh DESCRIPTION
.Nm
reads RSS or Atom feed data (XML) from stdin.
It writes the feed data in a TAB-separated format to stdout.
A
.Ar baseurl
can be specified if the links in the feed are relative URLs.
It is recommended to always have absolute URLs in your feeds.
.Sh TAB-SEPARATED FORMAT FIELDS
The items are saved in a TSV-like format.
.Pp
The title, id and author fields cannot contain newlines or TABs.
All whitespace is replaced by a single space character.
Control characters are removed.
.Pp
The content field can contain newlines and is escaped.
TABs, newlines and '\\' are escaped with '\\', so: '\\n', '\\t', and '\\\\'.
Other whitespace characters except space are removed.
Control characters are removed.
.Pp
The timestamp field is converted to a UNIX timestamp.
The timestamp is also added as a formatted text field.
.Pp
The order and format of the fields are:
.Bl -tag -width 17n
.It Ar item timestamp
UNIX timestamp in UTC+0, empty on parse failure.
.It Ar item timestamp
Date and time in the format: YYYY-mm-dd HH:MM:SS (UTC[+-][HHMM])|tz.
.It Ar item title
Title text.
HTML in titles is treated as plain text (on purpose).
.It Ar item link
Absolute URL.
Unsafe characters are encoded.
.It Ar item content
Newlines and TABs are escaped.
Control characters are removed.
See the escaping rules at the start of the
.Sx TAB-SEPARATED FORMAT FIELDS
section.
.It Ar item content\-type
"html" or "plain".
.It Ar item id
RSS item GUID or Atom id.
.It Ar item author
Item author.
.It Ar feed type
"rss" or "atom".
.El
.Sh SEE ALSO
.Xr sfeed_plain 1 ,
.Xr sfeed_update 1 ,
.Xr sh 1
.Sh AUTHORS
.An Hiltjo Posthuma Aq Mt hiltjo@codemadness.org
.Sh CAVEATS
If a time zone is not supported (non-RFC-822), the UNIX timestamp is
interpreted as UTC+0.
HTML in titles is treated as plain text (on purpose).
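.Sh EXAMPLES
The following Python sketch is not part of
.Nm ;
it only illustrates how a consumer might split one output line into the
nine fields listed in
.Sx TAB-SEPARATED FORMAT FIELDS
and undo the escaping of the content field.
The names
.Va FIELDS ,
.Fn unescape
and
.Fn parse_line
are hypothetical.
Keeping an unrecognized escape sequence as written and padding short lines
with empty fields are assumptions, not documented behaviour.
The listing uses chr() instead of backslash escapes so that it survives
roff processing unchanged.
.Bd -literal
```python
# Sketch only: a hypothetical consumer of sfeed output, not code from sfeed.
# chr() is used instead of backslash escapes so this listing is not altered
# by roff when the manual page is rendered.
TAB = chr(9)
NEWLINE = chr(10)
BACKSLASH = chr(92)

# Field order as documented in TAB-SEPARATED FORMAT FIELDS.
FIELDS = ("timestamp", "timestamp_text", "title", "link", "content",
          "content_type", "id", "author", "feed_type")

# Maps the character after a backslash to its decoded character.
UNESCAPE = {"n": NEWLINE, "t": TAB, BACKSLASH: BACKSLASH}


def unescape(text):
    """Decode the three documented escapes in a content field."""
    out = []
    chars = iter(text)
    for c in chars:
        if c != BACKSLASH:
            out.append(c)
            continue
        nxt = next(chars, "")
        # Assumption: an unknown escape is kept as written.
        out.append(UNESCAPE.get(nxt, BACKSLASH + nxt))
    return "".join(out)


def parse_line(line):
    """Split one output line into a dict keyed by the names in FIELDS."""
    values = line.rstrip(NEWLINE).split(TAB)
    # Assumption: pad short lines so every field name is present.
    values += [""] * (len(FIELDS) - len(values))
    item = dict(zip(FIELDS, values))
    item["content"] = unescape(item["content"])
    # The UNIX timestamp is empty when sfeed could not parse the date.
    item["timestamp"] = int(item["timestamp"]) if item["timestamp"] else None
    return item
```
.Ed
.Pp
Under these assumptions, each line read from the standard output of
.Nm
can be passed to
.Fn parse_line ,
which returns the fields by name with the content field decoded.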