summaryrefslogtreecommitdiff
path: root/sfeed.5
blob: b21b7fea3c1653f3c822b6012e5acdb2a5e7c412 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
.Dd May 2, 2019
.Dt SFEED 5
.Os
.Sh NAME
.Nm sfeed
.Nd feed format
.Sh SYNOPSIS
.Nm
.Sh DESCRIPTION
.Xr sfeed 1
reads RSS or Atom feed data (XML) from stdin.
It writes the feed data in a TAB-separated format to stdout.
.Sh TAB-SEPARATED FORMAT FIELDS
The items are output per line in a TSV-like format.
.Pp
The fields: title, id, author are not allowed to have newlines and TABs, all
whitespace characters are replaced by a single space character.
Control characters are removed.
.Pp
The content field can contain newlines and is escaped.
TABs, newlines and '\\' are escaped with '\\', so it becomes: '\\t', '\\n'
and '\\\\'.
Other whitespace characters except spaces are removed.
Control characters are removed.
.Pp
The order and content of the fields are:
.Bl -tag -width 17n
.It timestamp
UNIX timestamp in UTC+0, empty on parse failure.
.It title
Title text, HTML code in titles is ignored and is treated as plain-text.
.It link
Absolute url, unsafe characters are encoded.
.It content
Content, can have plain-text or HTML code depending on the content-type field.
.It content-type
"html" or "plain".
.It id
RSS item GUID or Atom id.
.It author
Item author.
.It enclosure
Item, first enclosure.
.El
.Sh SEE ALSO
.Xr sfeed 1 ,
.Xr sfeed_plain 1
.Sh AUTHORS
.An Hiltjo Posthuma Aq Mt hiltjo@codemadness.org
.Sh CAVEATS
If a timezone is not in the RFC-822 or RFC-3339 format it is not supported and
the UNIX timestamp is interpreted as UTC+0.
.Pp
HTML in titles is treated as plain-text.