.Dd April 27, 2021
.Dt SFEED 1
.Os
.Sh NAME
.Nm sfeed
.Nd RSS and Atom parser
.Sh SYNOPSIS
.Nm
.Op Ar baseurl
.Sh DESCRIPTION
.Nm
reads RSS or Atom feed data (XML) from stdin.
It writes the feed data in a TAB-separated format to stdout.
A
.Ar baseurl
can be specified if the links or enclosures in the feed are relative URLs.
If the
.Ar baseurl
is a valid absolute URL then the relative links or enclosures will be
made absolute.
.Sh TAB-SEPARATED FORMAT FIELDS
The items are output per line in a TSV-like format.
.Pp
The title, id and author fields cannot contain newlines or TABs.
In these fields all whitespace characters are replaced by a single space
character.
Control characters are removed.
.Pp
The content field can contain newlines and is escaped.
TABs, newlines and '\\' are escaped with '\\', so they become '\\t', '\\n'
and '\\\\'.
Other whitespace characters except spaces are removed.
Control characters are removed.
.Pp
The order and content of the fields are:
.Bl -tag -width 12n
.It timestamp
UNIX timestamp in UTC+0, empty if missing or on parse failure.
.It title
Title text.
HTML code in titles is not interpreted and is treated as plain-text.
.It link
Item link.
.It content
Content, which is plain-text or HTML code depending on the content-type field.
.It content-type
"html" or "plain" if it has content.
.It id
RSS item GUID or Atom id.
.It author
Item author.
.It enclosure
First enclosure of the item.
.It category
Item categories.
Multiple values are separated by |.
.El
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
.Bd -literal
curl -s 'https://codemadness.org/atom.xml' | sfeed
.Ed
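.Pp
A minimal sketch of passing a
.Ar baseurl
and selecting fields by position.
The base URL shown is an assumption chosen for illustration, and the
example relies on
.Xr cut 1
splitting on TABs by default, so fields 2 and 3 are the title and link:
.Bd -literal
# The baseurl argument below is an assumed value for illustration.
curl -s 'https://codemadness.org/atom.xml' |
	sfeed 'https://codemadness.org/' |
	cut -f 2,3
.Ed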
.Sh SEE ALSO
.Xr sfeed_plain 1 ,
.Xr sfeed_update 1 ,
.Xr sfeed 5 ,
.Xr sfeedrc 5
.Sh AUTHORS
.An Hiltjo Posthuma Aq Mt hiltjo@codemadness.org
.Sh CAVEATS
A timezone that is not in the RFC-822 or RFC-3339 format is not supported.
In that case the UNIX timestamp is interpreted as UTC+0.
.Pp
HTML in titles is treated as plain-text.