.Dd December 25, 2014
.Dt SFEED 1
.Os
.Sh NAME
.Nm sfeed
.Nd simple RSS and Atom parser
.Sh SYNOPSIS
.Nm
.Op Ar baseurl
.Sh DESCRIPTION
.Nm
reads RSS or Atom feed data (XML) from stdin. It writes the feed data in a
TAB-separated format to stdout. A
.Ar baseurl
can be specified if the links in the feed are relative URLs. It is
recommended to always have absolute URLs in your feeds.
.Sh TAB-SEPARATED FORMAT FIELDS
The items are written in a TSV-like format, one item per line.
.Pp
The title, id and author fields cannot contain newlines or TABs. All
whitespace is replaced by a single space character. Control characters are
removed.
.Pp
The content field can contain newlines and TABs, so it is escaped. Newlines,
TABs and '\\' are escaped with '\\', so: '\\n', '\\t', and '\\\\'. Other
whitespace characters except space are removed. Control characters are removed.
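.Pp
The escaping rules above can be reversed by a consumer of the output. The
following is a minimal sketch in Python, not part of sfeed itself. The function
name unescape_content is hypothetical, and passing unknown escape sequences
through unchanged is an assumption.

```python
def unescape_content(field: str) -> str:
    """Reverse the backslash escaping described for the content field."""
    # Escape letter -> literal character, per the three escapes listed above.
    mapping = {"n": "\n", "t": "\t", "\\": "\\"}
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field) and field[i + 1] in mapping:
            # Known two-character escape: emit the literal character.
            out.append(mapping[field[i + 1]])
            i += 2
        else:
            # Assumption: anything else, including a trailing backslash,
            # is copied through unchanged.
            out.append(ch)
            i += 1
    return "".join(out)
```

Scanning left to right, rather than chaining string replacements, keeps an
escaped backslash followed by the letter n from being misread as a newline.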
.Pp
The timestamp field is converted to a UNIX timestamp. The timestamp is also
added as a formatted text field.
.Pp
The order and format of the fields are:
.Bl -tag -width 17n
.It item timestamp
UNIX timestamp in UTC+0, empty on parse failure.
.It item timestamp
The same timestamp as formatted text: YYYY-mm-dd HH:MM:SS (UTC[+-][HHMM])|tz.
.It item title
Title text. HTML in titles is treated as plain text (on purpose).
.It item link
Absolute URL; unsafe characters are encoded.
.It item content
Newlines and TABs are escaped. Control characters are removed. See the
.Sx TAB-SEPARATED FORMAT FIELDS
text.
.It item content\-type
"html" or "plain".
.It item id
RSS item GUID or Atom id.
.It item author
Item author.
.It feed type
"rss" or "atom".
.El
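.Pp
The field order above maps directly onto a split on TAB characters. The
following is a minimal sketch in Python of reading one output line. The names
Item and parse_line and the attribute names are hypothetical, and padding short
lines with empty fields is an assumption, not documented behaviour.

```python
from typing import NamedTuple, Optional


class Item(NamedTuple):
    timestamp: Optional[int]  # UNIX timestamp, None when the field is empty
    timestamp_text: str       # formatted date and time
    title: str
    link: str
    content: str              # still escaped, see the escaping rules
    content_type: str         # "html" or "plain"
    id: str
    author: str
    feed_type: str            # "rss" or "atom"


def parse_line(line: str) -> Item:
    """Split one TAB-separated line into the nine documented fields."""
    fields = line.rstrip("\n").split("\t")
    # Assumption: pad short lines so missing trailing fields become empty.
    fields += [""] * (9 - len(fields))
    # The first field is empty when the date could not be parsed.
    ts = int(fields[0]) if fields[0] else None
    return Item(ts, *fields[1:9])
```

A script built on this could read sfeed output line by line from stdin and
print, for example, parse_line(line).title for each item.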
.Sh SEE ALSO
.Xr sfeed_plain 1 ,
.Xr sfeed_update 1 ,
.Xr sh 1
.Sh AUTHORS
.An Hiltjo Posthuma Aq Mt hiltjo@codemadness.org
.Sh CAVEATS
If a timezone is not supported (non-RFC-822), the UNIX timestamp is interpreted
as UTC+0.
.Pp
HTML in titles is treated as plain text (on purpose).