| field | value | date |
|---|---|---|
| author | Hiltjo Posthuma <hiltjo@codemadness.org> | 2015-07-31 21:06:52 +0200 |
| committer | Hiltjo Posthuma <hiltjo@codemadness.org> | 2015-07-31 21:12:07 +0200 |
| commit | 356e7d79925f91b9b703ee63e3680694c53a59a4 | |
| tree | bc06b59ee637c2695055b62221abad696d66eb7c /README | |
| parent | eb586eda26967183de91c314a57d323b124110bb | |
Various improvements
- Only escape characters in the "content" field, since it can contain newlines.
- Trim newlines, tabs, etc. from the title, id and author fields.
- Make the decodefield and xmlencode functions easier to "chain" without
allocating new buffers.
- Move printutf8pad from util (only used by sfeed_plain) to sfeed_plain.
- Update README; the man page still needs to be updated and the documentation
in general improved.
- Code cleanup.
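The escaping and trimming rules described by this commit can be sketched in C, the language sfeed itself is written in. This is a minimal illustration under stated assumptions, not sfeed's code: the helper names `escape_content` and `clean_field` are hypothetical, and `clean_field` assumes that each run of whitespace collapses to one space and that leading and trailing whitespace is trimmed. Both write into a caller-supplied buffer, in the spirit of the "chain without allocating new buffers" change.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not sfeed's code): escape the "content" field.
 * TAB, newline and '\' become the two-character sequences \t, \n and \\;
 * other whitespace except space, and other control characters, are dropped.
 * Writes at most `size` bytes, including the terminating NUL, to `out` and
 * returns the number of bytes written, excluding the NUL.
 */
size_t escape_content(const char *in, char *out, size_t size)
{
	size_t n = 0;

	if (size == 0)
		return 0;
	for (; *in; in++) {
		unsigned char c = (unsigned char)*in;
		char esc = 0; /* second character of an escape sequence, if any */

		if (c == '\t')
			esc = 't';
		else if (c == '\n')
			esc = 'n';
		else if (c == '\\')
			esc = '\\';
		else if (c != ' ' && (isspace(c) || iscntrl(c)))
			continue; /* drop other whitespace and control characters */

		if (esc) {
			if (n + 2 >= size) /* room for two bytes and the NUL */
				break;
			out[n++] = '\\';
			out[n++] = esc;
		} else {
			if (n + 1 >= size)
				break;
			out[n++] = (char)c;
		}
	}
	out[n] = '\0';
	return n;
}

/*
 * Hypothetical helper for title, id and author, which may not contain
 * newlines or tabs. Assumption: each run of whitespace becomes one space,
 * leading and trailing whitespace is trimmed, control characters are dropped.
 */
size_t clean_field(const char *in, char *out, size_t size)
{
	size_t n = 0;
	int pending_space = 0;

	if (size == 0)
		return 0;
	for (; *in; in++) {
		unsigned char c = (unsigned char)*in;

		if (isspace(c)) {
			pending_space = 1; /* emitted only before the next visible byte */
			continue;
		}
		if (iscntrl(c))
			continue;
		if (pending_space && n > 0) {
			if (n + 2 >= size) /* room for ' ', c and the NUL */
				break;
			out[n++] = ' ';
		}
		pending_space = 0;
		if (n + 1 >= size)
			break;
		out[n++] = (char)c;
	}
	out[n] = '\0';
	return n;
}
```

For example, `escape_content("a\tb", buf, sizeof(buf))` stores the four bytes `a`, `\`, `t`, `b` in `buf` and returns 4. Because the caller owns the output buffer, the result of one step can be passed to the next without a heap allocation.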
Diffstat (limited to 'README')
-rw-r--r-- | README | 23 |
1 files changed, 14 insertions, 9 deletions
@@ -78,25 +78,30 @@ feeds.new - Temporary file used by sfeed_update to merge items.
 TAB-separated format
 --------------------
 
-The items are saved in a TSV-like format except newlines, tabs and
-backslash are escaped with \ (\n, \t and \\). Other whitespace except
-spaces are removed.
+The items are saved in a TSV-like format.
+
+The fields: title, id, author are not allowed to have newlines, tabs, all
+whitespace is replaced by a single space character. Control characters are
+removed.
+
+The content field can contain newlines and is escaped. TABs, newline and '\'
+are escaped with '\', so: '\n', '\t', and '\\'. Other whitespace characters
+except space are removed. Control characters are also removed.
 
 The timestamp field is converted to a UNIX timestamp. The timestamp is also
-stored as formatted as a separate field. The other fields are left untouched
-(including HTML).
+stored as formatted as a separate field.
 
 The order and format of the fields are:
 
-item UNIX timestamp - string UNIX timestamp (UTC+0)
+item UNIX timestamp - string UNIX timestamp (UTC+0).
 item formatted timestamp - string timestamp, YYYY-mm-dd HH:MM:SS (UTC[+-]HH:MM)|tz
 item title - string
-item link - string, absolute url, unsafe characters are encoded
+item link - string, absolute url, unsafe characters are encoded.
 item content - string
-item contenttype - string, "html" or "plain"
+item contenttype - string, "html" or "plain".
 item id - string
 item author - string
-feed type - string, "rss" or "atom"
+feed type - string, "rss" or "atom".
 
 CAVEAT: if a timezone is not supported (non-RFC-822) the UNIX timestamp is
 interpreted as UTC+0.
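Because only the content field may contain TABs and newlines, and it escapes them, a reader of this format can split each line on TAB and then unescape just the content field. The following C sketch assumes one item per line, the nine fields in the order listed in the README, and that no other field contains a raw TAB; the enum names, `split_fields` and `unescape_content` are hypothetical and not part of sfeed.

```c
#include <stddef.h>
#include <string.h>

/* Field indices in the order listed in the README (names are hypothetical). */
enum {
	F_TIMESTAMP, F_TIMEFMT, F_TITLE, F_LINK, F_CONTENT,
	F_CONTENTTYPE, F_ID, F_AUTHOR, F_FEEDTYPE, F_LAST
};

/*
 * Split one line in place on TAB. Returns the number of fields found
 * (at most F_LAST); missing trailing fields point at an empty string and
 * any fields beyond F_LAST are ignored.
 */
size_t split_fields(char *line, char *fields[F_LAST])
{
	char *p = line, *end;
	size_t i = 0, n;

	line[strcspn(line, "\n")] = '\0'; /* strip the line terminator */
	end = line + strlen(line);        /* points at the final NUL */

	while (i < F_LAST) {
		fields[i++] = p;
		if ((p = strchr(p, '\t')) == NULL)
			break;
		*p++ = '\0';
	}
	for (n = i; i < F_LAST; i++)
		fields[i] = end;
	return n;
}

/* Undo the content escaping in place: \n, \t and \\ become a newline,
 * a TAB and a backslash; other backslash sequences are kept as-is. */
void unescape_content(char *s)
{
	char *w = s;

	for (; *s; s++) {
		if (s[0] == '\\') {
			char c = 0;

			if (s[1] == 'n')
				c = '\n';
			else if (s[1] == 't')
				c = '\t';
			else if (s[1] == '\\')
				c = '\\';
			if (c) {
				*w++ = c;
				s++; /* skip the escaped character */
				continue;
			}
		}
		*w++ = *s;
	}
	*w = '\0';
}
```

Pointing missing trailing fields at an empty string lets a caller index any field without a NULL check, even for lines written by an older version with fewer columns.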