Age | Commit message | Author |
|
|
|
"man sfeed" now hopefully more quickly gives a better overview how the tools
work together. Reference the README for extended examples and use-cases.
|
|
|
|
The Atom RFC supports multiple authors, but usually there is only one.
Only the first author is parsed.
|
|
|
|
|
|
This makes it slightly easier to look up fields and map the fields by field
number in scripts (awk, cut, etc.).
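
As a rough illustration of mapping fields by number, the sketch below splits one
TAB-separated line and looks fields up by their 1-based position, similar to
awk's $N or cut -f N. The field names, their order and the sample line are
assumptions made up for this example; sfeed(5) is the reference for the real
format.

```python
# Hypothetical field order for illustration only; check sfeed(5) for the real one.
FIELDS = ("timestamp", "title", "link")


def field(line, number):
    """Return the 1-based TAB-separated field, or "" if it is missing."""
    parts = line.rstrip("\n").split("\t")
    return parts[number - 1] if 1 <= number <= len(parts) else ""


def as_record(line):
    """Map the leading fields of a line onto the assumed names."""
    return {name: field(line, i) for i, name in enumerate(FIELDS, start=1)}


# Made-up sample line, not real feed data.
sample = "1609459200\tExample title\thttps://example.org/post\n"
print(field(sample, 2))
print(as_record(sample)["link"])
```

Looking fields up by number keeps a script short, while the small name table
documents which number means what.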
|
|
... if there is no content.
|
|
The shellscript is optional, but reference it in the documentation.
|
|
Removed or rewrote the functions:
absuri, parseuri, and encodeuri() (for percent-encoding).
The functions are now split up, each with its own purpose:
- uri_format: format struct uri into a string.
- uri_hasscheme: quick check if a string is absolute or not.
- uri_makeabs: make a URI absolute using a base uri and the original URI.
- uri_parse: parse a string into a struct uri.
The following URLs are better parsed:
- URLs with extra leading "/" characters in the path are kept as is, and no "/"
is added for empty paths.
- URLs like "http://codemadness.org" are not changed to
"http://codemadness.org/" anymore (paths are kept as is, unless they are
non-empty and do not start with "/").
- Paths are not percent-encoded anymore.
- URLs with userinfo field (username, password) are parsed.
like: ftp://user:password@[2001:db8::7]:2121/rfc/rfc1808.txt
- Non-authoritative URLs like mailto:some@email.org, magnet URIs, ISBN URIs/urn,
like: urn:isbn:0-395-36341-1 are allowed and parsed correctly.
- Both local (file:///) and non-local (file://) file URIs are supported.
- Specifying a base URL with a port will now only use it when the relative URL
has no host and port set and follows RFC3986 5.2.2 more closely.
- Parsing numeric port: parse as signed long and check <= 0, empty port is
allowed.
- Parsing URIs containing query, fragment, but no path separator (/) will now
parse the component properly.
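
The port rule above could be sketched roughly as follows. This is an
illustration in Python, not the C code from the commit: the name parse_port is
hypothetical, and the upper bound of 65535 is an assumption of this sketch that
the commit message does not mention.

```python
import re


def parse_port(text):
    """Return the port as an int, or None when the port is empty.

    Raises ValueError for non-numeric, zero or negative ports.
    The upper bound of 65535 is an assumption of this sketch.
    """
    if text == "":
        return None  # an empty port is allowed
    # Accept an optional sign, mirroring a signed integer parse.
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        raise ValueError("port is not numeric: " + text)
    port = int(text, 10)
    if port <= 0 or port > 65535:
        raise ValueError("port out of range: " + text)
    return port


print(parse_port("2121"))
print(parse_port(""))
```

Parsing as a signed value first and rejecting values <= 0 afterwards means a
port such as "-1" is reported as invalid instead of wrapping around.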
For sfeed:
- Parse the baseURI only once (no need to do it every time for making absolute
URIs).
- If a link/enclosure is absolute already or if there is no base URL specified
then just print the link directly. There have also been other small performance
improvements related to handling URIs.
References:
- https://tools.ietf.org/html/rfc3986
- Section "5.2.2. Transform References" has also been helpful.
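
The link handling described for sfeed can be sketched as below. This is a
minimal sketch under assumptions, not the code from this commit: has_scheme and
link_for_output are hypothetical names, and Python's urllib.parse.urljoin stands
in for uri_makeabs, so results may differ from sfeed in edge cases.

```python
import re
from urllib.parse import urljoin

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def has_scheme(uri):
    """Quick check whether a string starts with a URI scheme (is absolute)."""
    return _SCHEME.match(uri) is not None


def link_for_output(link, base=""):
    """Return the link unchanged if it is absolute or there is no base URL,
    otherwise resolve it against the base (urljoin is a stand-in here)."""
    if not base or has_scheme(link):
        return link
    return urljoin(base, link)


print(link_for_output("/a/b", "http://codemadness.org"))
print(link_for_output("urn:isbn:0-395-36341-1", "http://codemadness.org"))
```

Checking for a scheme first avoids resolving links that are already absolute,
which is the cheap path for most feed items.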
|
|
|
|
|
|
Fields with multiple values are separated by '|'. In the future multiple
enclosure support might be added.
The categories tags are now parsed. This feature is useful for filtering and
categorizing.
Parsing of nested tags such as <author><name> has been improved. This code has
been refactored.
RSS <guid> isPermaLink is now also handled differently and will now prefer a
permalink with "true" (link) over the ID. In practice multiple <guid> in an
item does not happen.
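
For filtering on a field with multiple values separated by '|', a script could
split the field as sketched below. The function names and the sample category
values are made up for this example.

```python
def split_values(field):
    """Split a multi-value field on '|', dropping empty entries."""
    return [value for value in field.split("|") if value]


def has_category(field, wanted):
    """Return True if the field contains exactly the wanted value."""
    return wanted in split_values(field)


# Made-up category field, not real feed data.
categories = "news|tech"
print(split_values(categories))
print(has_category(categories, "tech"))
```

Splitting before comparing avoids false matches that a plain substring search
would give, for example "tech" inside "technology".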
|
|
|
|
In particular for RSS feeds where a pubDate is optional.
|
|
Document it so it can be relied upon in scripts.
|
|
This program does not store anything; it just writes to stdout.
|
|
This is useful for example for podcasts (audio attachment), newsposts (usually
some image) or comic strips (link to page, image as enclosure).
thanks leot for the feedback!
|
|
|
|
- fix new warning check (start sentence at each line).
- improve a few words.
|
|
link to sfeed(5) in README to avoid having to duplicate documentation
text.
|
|
Remove type of feed per item, it is not that interesting. sfeed(1) can parse
both RSS and Atom feeds.
|
|
separate sfeed(5) page for just the feed file format.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Clarify the CAVEAT concerning timezone parsing: some feeds incorrectly
use non-RFC-822 timezones in RSS feeds; these will be interpreted as
UTC+0. The formatted time will contain this timezone but without an
offset.
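
The fallback described in this caveat can be illustrated with the sketch below.
It is not sfeed's parser: the function name is hypothetical, and the zone table
covers only a subset of the names in RFC 822 section 5.1 (the single-letter
military zones are left out). Any zone it does not recognize is treated as
UTC+0.

```python
import re

# Subset of the zone names from RFC 822 section 5.1, offsets in hours.
RFC822_ZONES = {
    "UT": 0, "GMT": 0,
    "EST": -5, "EDT": -4,
    "CST": -6, "CDT": -5,
    "MST": -7, "MDT": -6,
    "PST": -8, "PDT": -7,
}


def tz_offset_seconds(zone):
    """Return the UTC offset in seconds; unknown zones fall back to 0."""
    m = re.fullmatch(r"([+-])([0-9]{2})([0-9]{2})", zone)
    if m:
        sign = -1 if m.group(1) == "-" else 1
        return sign * (int(m.group(2)) * 3600 + int(m.group(3)) * 60)
    # Unknown names such as "CEST" are not in the table and yield 0.
    return RFC822_ZONES.get(zone.upper(), 0) * 3600


print(tz_offset_seconds("+0200"))
print(tz_offset_seconds("CEST"))
```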
|
|
|
|
and regenerate old man-style (make doc-oldman).
|
|
- mandoc: fix mandoc errors and warnings.
- remove pre-generated HTML documentation.
|
|
|
|
|
|
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|