Age  Commit message  Author
2021-07-25  man page improvements  Hiltjo Posthuma
- Some rewording and typo fixes.
- Specify in more detail how sfeed_web detects links from HTML code.
2021-07-24  sfeed_{web,xmlenc}.1: use my site as an example  Hiltjo Posthuma
2021-07-22  sfeed_update.1: just use ~/ instead of $HOME consistently in examples  Hiltjo Posthuma
2021-07-19  code-style: change gmtime to the reentrant/thread-safe gmtime_r  Hiltjo Posthuma
No functional or performance difference (intended) because these programs are not threaded.
2021-07-11  sfeed.c: parsetime: support short digit years for RSS pubDate fields (RFC822)  Hiltjo Posthuma
RSS (pubDate) uses RFC822 dates. This standard is obsoleted by RFC2822.

The RSS 2.0 spec says for the pubDate field:
"[...] All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred)."

RFC822 section 5.1 describes the syntax with 2 digit years:
https://datatracker.ietf.org/doc/html/rfc822#section-5.1

It was obsoleted/fixed in RFC2822 section 4.3:
https://datatracker.ietf.org/doc/html/rfc2822#section-4.3
"Where a two or three digit year occurs in a date, the year is to be interpreted as follows: If a two digit year is encountered whose value is between 00 and 49, the year is interpreted by adding 2000, ending up with a value between 2000 and 2049. If a two digit year is encountered with a value between 50 and 99, or any three digit year is encountered, the year is interpreted by adding 1900."

In the real world, all sites using RSS that I've seen use the 4-digit format.

For historic context of changes and what feeds it might affect:
- RFC822 was published on 13 August 1982, obsoleted by RFC2822.
- RFC2822 was published in April 2001, obsoleted by RFC5322.
- RFC5322 was published in October 2008.
- RDF was started around 1996. It was published around 2004.
- March 15, 1999: RSS 0.90 (Netscape), published by Netscape and authored by Ramanathan Guha.
- July 10, 1999: RSS 0.91 (Netscape), published by Netscape and authored by Dan Libby.
- June 9, 2000: RSS 0.91 (UserLand), published by UserLand Software and authored by Dave Winer.
- Dec. 25, 2000: RSS 0.92, UserLand.
- Aug. 19, 2002: RSS 2.0, UserLand.
- July 15, 2003: RSS 2.0 (version 2.0.1), published by the Berkman Center for Internet & Society at Harvard Law School and authored by Dave Winer.
- July 15, 2003: RSS 2.0 (version 2.0.1-rv-1), published by the RSS Advisory Board.
- July 17, 2003: RSS 2.0 (version 2.0.1-rv-2), RSS Advisory Board.
- April 6, 2004: RSS 2.0 (version 2.0.1-rv-3), RSS Advisory Board.
- May 31, 2004: RSS 2.0 (version 2.0.1-rv-4), RSS Advisory Board.
- June 19, 2004: RSS 2.0 (version 2.0.1-rv-5), RSS Advisory Board.
- January 25, 2005: RSS 2.0 (version 2.0.1-rv-6), RSS Advisory Board.
- Aug. 12, 2006: RSS 2.0 (version 2.0.8), RSS Advisory Board.
- June 5, 2007: RSS 2.0 (version 2.0.9), RSS Advisory Board.
- Oct. 15, 2007: RSS 2.0 (version 2.0.10), RSS Advisory Board.
- March 30, 2009 (current): RSS 2.0 (version 2.0.11), RSS Advisory Board.

RSS history source: https://www.rssboard.org/rss-history
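The two and three digit year rule quoted from RFC2822 section 4.3 can be sketched in POSIX shell as follows. The function name expand_year is hypothetical and this is not the code from sfeed.c; it only illustrates the arithmetic of the rule.

```shell
# expand_year: hypothetical helper illustrating RFC2822 section 4.3.
# It is NOT taken from sfeed.c.
expand_year() {
	y=$1
	len=${#y}
	# Strip leading zeros so the shell does not read "08" as invalid octal.
	while [ "${#y}" -gt 1 ] && [ "${y#0}" != "$y" ]; do
		y=${y#0}
	done
	if [ "$len" -eq 2 ] && [ "$y" -le 49 ]; then
		echo $((y + 2000))	# 00-49 -> 2000-2049
	elif [ "$len" -eq 2 ] || [ "$len" -eq 3 ]; then
		echo $((y + 1900))	# 50-99 and any three digit year
	else
		echo "$y"		# other lengths (e.g. four digits) are kept as is
	fi
}

expand_year 21
expand_year 82
```

The length of the original string decides which branch applies, so "08" is still treated as a two digit year after the leading zero is stripped.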
2021-07-10  bump version to 0.9.25  Hiltjo Posthuma
2021-07-07  sfeed_web.1: fix typo: url -> URL  Hiltjo Posthuma
2021-07-06  sfeed_mbox: add option to print content  Hiltjo Posthuma
- Add SFEED_MBOX_CONTENT environment option. When set to "1" it outputs the content as well. This is disabled by default for security reasons, because many clients handle HTML in an insecure way.
- Print link and enclosure on one line and align them.
2021-07-06  sfeedrc.5: add an example how to override the options in the man page as well  Hiltjo Posthuma
2021-07-06  sfeed.{1,5}: number fields in the man page  Hiltjo Posthuma
This makes it slightly easier to look up fields and map the fields by field number in scripts (awk, cut) etc.
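With numbered fields a script can address a column directly. A minimal sketch, assuming the TAB-separated layout described in sfeed(5) with the title as field 2 and the link as field 3; the function name and the sample line are made up for illustration:

```shell
# print_title_link: hypothetical helper; reads sfeed(5) lines on stdin and
# prints fields 2 and 3 (assumed here to be title and link), TAB-separated.
print_title_link() {
	awk -F '\t' '{ print $2 "\t" $3 }'
}

# Made-up sample line, not real feed data.
printf '1625529600\tSome title\thttps://example.org/item\n' | print_title_link
```

Setting the field separator to a single TAB keeps empty fields in place, so the field numbers stay stable even when a feed item lacks some values.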
2021-07-06  README.xml: remove newline before EOF  Hiltjo Posthuma
2021-07-06  README: add a simplified version of printing the first enclosure  Hiltjo Posthuma
This works on sfeed(5) feed output since they are already sorted.
2021-07-06  sfeed: change comment which reflects printing relative URLs behaviour  Hiltjo Posthuma
This URL printing behaviour was changed recently in commit f305b032bc19b4e81c0dd6c0398370028ea910ca
2021-07-06  sfeed: printtrimmed function does not change or modify the buffer  Hiltjo Posthuma
Make it const char *.
2021-06-05  README: fix typo in a comment  Hiltjo Posthuma
2021-06-05  Makefile: switch to use CPPFLAGS -D_DEFAULT_SOURCE  Hiltjo Posthuma
This fixes a warning on Linux glibc:
/usr/include/features.h:187:3: warning: #warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE" [-Wcpp]
  187 | # warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE"
      |   ^~~~~~~
Tested on Void GNU/Linux glibc with gcc.
Tested on various other platforms for regressions too, namely: OpenBSD, NetBSD, FreeBSD, Void GNU/Linux musl.
2021-06-05  README: fix escape sequence which is non-POSIX  Hiltjo Posthuma
The "\s" escape sequence is non-POSIX and GNU awk gives a warning:
gawk: cmd. line:69: warning: escape sequence `\s' treated as plain `s'
BSD awk does not give this warning and supports it.
Use the POSIX [[:space:]] character class instead.
References:
- https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html The table in the section "Regular Expressions".
- https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap05.html#tag_05
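As an illustration of the portable form, the following sketch trims whitespace with [[:space:]] where a GNU-only script might have used "\s". The function name trim_space is hypothetical and is not the README code that was changed.

```shell
# trim_space: hypothetical helper; strips leading and trailing whitespace
# from each input line using the POSIX [[:space:]] class instead of "\s".
trim_space() {
	awk '{
		sub(/^[[:space:]]+/, "")
		sub(/[[:space:]]+$/, "")
		print
	}'
}

printf '  hello world \t\n' | trim_space
```

This relies on the awk implementation supporting POSIX character classes, which very old awk versions may not.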
2021-06-03  bump version to 0.9.24  Hiltjo Posthuma
2021-06-01  util.c: err() do not print colon formatted  Hiltjo Posthuma
Most commonly used compilers (gcc, clang) optimize this away though.
2021-06-01  sfeed_gopher: unveil: show path when it failed  Hiltjo Posthuma
2021-06-01  portability and standards: add BSD-like err() and errx() functions  Hiltjo Posthuma
These are BSD functions.
- HaikuOS now compiles without having to use libbsd.
- Tested on SerenityOS (for fun), which doesn't have these functions (yet). With a small change to support wcwidth() sfeed works on SerenityOS.
2021-05-30  sfeed_frames.1/sfeed_html.1: reference the style.css example file  Hiltjo Posthuma
2021-05-29  sfeed_opml_export: sync loadconfig() function fixes from sfeed_update  Hiltjo Posthuma
- Do not show stderr of readlink.
- Show the reference to the example sfeedrc (like sfeed_update).
- Make the error message a bit shorter.
- Fix showing the path if it does not exist, for example:
  $ sfeed_opml_export "a"
  readlink: a: No such file or directory
  Configuration file "" does not exist or is not readable.
  Now shows:
  $ sfeed_opml_export "a"
  Configuration file "a" cannot be read.
  See sfeedrc.example for an example.
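The behaviour described in the list above can be sketched roughly like this. It is a hypothetical reimplementation for illustration, not the actual loadconfig() function, and it assumes a readlink that supports the non-POSIX -f option (as on GNU and the BSDs).

```shell
# load_config_sketch: hypothetical reimplementation for illustration only,
# not the loadconfig() from sfeed_update or sfeed_opml_export.
load_config_sketch() {
	config="$1"
	# Resolve the path; hide readlink's own error message.
	path=$(readlink -f "$config" 2>/dev/null)
	if [ -n "$path" ] && [ -r "$path" ]; then
		. "$path"
	else
		# Report the name the user passed, not the (possibly empty)
		# resolved path.
		printf 'Configuration file "%s" cannot be read.\n' "$config" >&2
		echo "See sfeedrc.example for an example." >&2
		return 1
	fi
}

load_config_sketch "/nonexistent/dir/sfeedrc" || echo "failed as expected"
```

Printing the original argument rather than the resolved path is what avoids the empty "" in the error message when resolving fails.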
2021-05-27  sfeed_frames/sfeed_html: show the total counts and improve the title format  Hiltjo Posthuma
This title format now matches the one in sfeed_curses. It shows the count at the far left, which makes it more readable imho. It also works better when the titlebar is small.
2021-05-27  sfeed_update: fix message when the configuration file does not exist  Hiltjo Posthuma
When sfeed_update was called without a parameter it used the default path. If this path did not exist it would incorrectly print:
Configuration file "" does not exist or is not readable.
See sfeedrc.example for an example.
Make the error message a bit shorter too.
This was a partial regression of commit df74ba274c4ea5d9b7388c33500ba601ed0c991d
2021-04-29  bump version to 0.9.23  Hiltjo Posthuma
2021-04-28  Makefile: fix typo in comment  Hiltjo Posthuma
2021-04-28  fixup: a regression with RSS guid, by default ispermalink="true"  Hiltjo Posthuma
2021-04-28  use the last href attribute value if there are multiple set  Hiltjo Posthuma
Input to reproduce:
<entry>
<link href="https://codemadness.org/a" href="https://codemadness.org/b"/>
</entry>
Old value: "https://codemadness.org/ahttps://codemadness.org/b"
New value: "https://codemadness.org/b"
Same with RSS <enclosure url="" />
2021-04-28  add support for old/legacy Atom 0.3 feeds  Hiltjo Posthuma
This standard was a draft used around 2005-2006. Instead of the fields "published" and "updated" it used "issued" (mandatory field) and "modified" (optional). Add support for them, while still giving preference to the Atom 1.0 fields and to creation dates first. I don't know any real-life examples that still use this though.
Some references:
- http://rakaz.nl/2005/07/moving-from-atom-03-to-10.html
- https://www.dokuwiki.org/syndication (rss_type "atom" parameter value).
- https://support.google.com/merchants/answer/160598?hl=en
2021-04-28  sfeed.{1,5}: improve documentation, the content-type field can be empty...  Hiltjo Posthuma
... if there is no content.
2021-04-28  enable unlocked I/O by default  Hiltjo Posthuma
getchar_unlocked is part of POSIX and should be supported by most platforms. On all tested platforms it has a performance benefit, sometimes smallish (<12%), sometimes large (~40%).
2021-04-28  README: update newsboat export script  Hiltjo Posthuma
Since newsboat version 2.22 (2020-12-21) it stores the content mime-type of a field, so allow exporting this. The older entries are empty and will be exported as "html" (even though they might have been plain-text).
... also add the (empty) category field.
2021-04-28  improve "ispermalink", "rel" and "type" attribute handling/buffering  Hiltjo Posthuma
2021-04-28  improve content-type "type" attribute handling/buffering  Hiltjo Posthuma
2021-04-27  sfeed.c: detect the proper mime-type for XHTML  Hiltjo Posthuma
Reference: https://www.w3.org/2003/01/xhtml-mimetype/
2021-04-24  fix a comment code-style  Hiltjo Posthuma
This fix is very important *ahem*.
2021-03-13  bump version to 0.9.22  Hiltjo Posthuma
2021-03-12  sfeed_web.1, sfeed_xmlenc.1: remove unneeded mdoc escape sequence  Hiltjo Posthuma
2021-03-03  sfeed_update: return instead of exit in main() on success  Hiltjo Posthuma
This is useful so the script can be included, call main and then have additional post-main functionality.
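The difference matters when the script is sourced: exit would terminate the including shell as well, while return only leaves the function. A minimal sketch, with a made-up main_sketch function standing in for the real main():

```shell
# main_sketch: hypothetical stand-in for main() in sfeed_update.
main_sketch() {
	echo "updating feeds"
	# "exit 0" here would also end the script that included this one.
	return 0
}

main_sketch
# Post-main functionality is still reached because main_sketch returned.
echo "post-main step"
```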
2021-03-02  README: workaround empty fields with *BSD xargs -0  Hiltjo Posthuma
Workaround it by setting the empty "middle" fields to some value. The last field can be empty.
Some feeds were incorrectly using the wrong base URL if the `baseurl` field was empty but the encoding field was set. So it incorrectly used the encoding field instead.
Only now noticed some feeds were failing because the baseURL is validated since commit f305b032bc19b4e81c0dd6c0398370028ea910ca and returning a non-zero exit status.
This doesn't happen with GNU xargs, busybox or toybox xargs.
Affected (at least): OpenBSD, NetBSD, FreeBSD and DragonFlyBSD xargs which share similar code.
Simple way to reproduce the difference:
printf 'a\0\0c\0' | xargs -0 echo
Prints "a c" on *BSD (the empty field is dropped).
Prints "a  c" (two spaces, the empty field is kept) on GNU xargs (and some other implementations).
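One way to picture the workaround is a helper that never emits an empty field. This is only a sketch: the function name nul_fields and the "-" placeholder are assumptions, and unlike the README it also replaces an empty last field.

```shell
# nul_fields: hypothetical helper; prints each argument NUL-terminated and
# substitutes "-" (an arbitrary placeholder) for empty arguments, so that
# xargs -0 implementations which drop empty fields still see every field.
nul_fields() {
	for f in "$@"; do
		printf '%s\0' "${f:--}"
	done
}

# Count the arguments xargs passes on: 3 with the placeholder in place.
nul_fields "a" "" "c" | xargs -0 sh -c 'echo "$#"' sh
```

The receiving command then has to treat the placeholder as "no value", which is why a value that cannot be a valid field is the safer choice.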
2021-03-01  sfeed_update: fix baseurl substitution  Hiltjo Posthuma
Follow-up from a rushed commit:
  commit 58555779d123be68c0acf9ea898931d656ec6d63
  Author: Hiltjo Posthuma <hiltjo@codemadness.org>
  Date: Sun Feb 28 13:33:21 2021 +0100
  sfeed_update: simplify, use feedurl directly
  This also makes it possible to use non-authoritative URLs as a baseurl, like "magnet:" URLs.
2021-03-01  util.c: uri_makeabs: check initial base URI field, not dest `a` (style)  Hiltjo Posthuma
No functional difference because the base URI host is copied beforehand.
2021-03-01  sfeed.1: reference sfeed_update and sfeedrc  Hiltjo Posthuma
The shellscript is optional, but reference it in the documentation.
2021-03-01  sfeed_update: simplify, use feedurl directly  Hiltjo Posthuma
This also makes it possible to use non-authoritative URLs as a baseurl, like "magnet:" URLs.
2021-03-01  util: improve/refactor URI parsing and formatting  Hiltjo Posthuma
Removed/rewritten the functions: absuri, parseuri, and encodeuri() for percent-encoding.

The functions are now split separately with the following purpose:
- uri_format: format struct uri into a string.
- uri_hasscheme: quick check if a string is absolute or not.
- uri_makeabs: make a URI absolute using a base uri and the original URI.
- uri_parse: parse a string into a struct uri.

The following URLs are better parsed:
- URLs with extra "/"'s in the path prepended are kept as is, no "/" is added either for empty paths.
- URLs like "http://codemadness.org" are not changed to "http://codemadness.org/" anymore (paths are kept as is, unless they are non-empty and do not start with "/").
- Paths are not percent-encoded anymore.
- URLs with userinfo field (username, password) are parsed, like: ftp://user:password@[2001:db8::7]:2121/rfc/rfc1808.txt
- Non-authoritative URLs like mailto:some@email.org, magnet URIs, ISBN URIs/urn, like: urn:isbn:0-395-36341-1 are allowed and parsed correctly.
- Both local (file:///) and non-local (file://) are supported.
- Specifying a base URL with a port will now only use it when the relative URL has no host and port set and follows RFC3986 5.2.2 more closely.
- Parsing numeric port: parse as signed long and check <= 0, empty port is allowed.
- Parsing URIs containing query, fragment, but no path separator (/) will now parse the component properly.

For sfeed:
- Parse the baseURI only once (no need to do it every time for making absolute URIs).
- If a link/enclosure is absolute already or if there is no base URL specified then just print the link directly.

There have also been other small performance improvements related to handling URIs.

References:
- https://tools.ietf.org/html/rfc3986
- Section "5.2.2. Transform References" has also been helpful.
2021-03-01  README: combine bandwidth saving options into one section  Hiltjo Posthuma
Combine E-Tags, If-Modified-Since in one section. Also mention the curl --compressed option for typically GZIP decompression.
Note that E-Tags were broken in curl <7.73 due to a bug with "weak" e-tags. https://github.com/curl/curl/issues/5610
From a question/feedback by e-mail from Hadrien Lacour, thanks.
2021-02-05  sfeed_update: $SFEED_UPDATE_INCLUDE: be a bit more precise/pedantic  Hiltjo Posthuma
2021-02-04  sfeed.c: fix time parsing regression with non-standard date format  Hiltjo Posthuma
The commit that introduced the regression was:
  commit 33c50db302957bca2a850ac8d0b960d05ee0520e
  Author: Hiltjo Posthuma <hiltjo@codemadness.org>
  Date: Mon Oct 12 18:55:35 2020 +0200
  simplify time parsing
Noticed on a RSS feed with the following date:
<pubDate>2021-02-03 05:13:03</pubDate>
This format is non-standard, but sfeed should support this.
A standard format would be (for Atom): 2021-02-03T05:13:03Z
Partially revert it.
2021-01-28  README: fix xargs -P example when there are no feeds  Hiltjo Posthuma
Kind of a non-issue, but if there's a sfeedrc with no feeds then xargs will still be executed and give an error.
The xargs -r option (GNU extension) fixes this. From the OpenBSD xargs(1) man page:
"-r Do not run the command if there are no arguments. Normally the command is executed at least once even if there are no arguments."
Reproducible with the sfeedrc:
feeds() {
	true
}
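The effect of -r can be seen in isolation with an empty input. In this sketch no_feeds is a made-up stand-in for a feeds() function that lists nothing, and echo stands in for the real fetch command.

```shell
# no_feeds: hypothetical stand-in for a feeds() function that lists nothing.
no_feeds() {
	true
}

# Without -r some xargs implementations run the command once with no
# arguments; with -r nothing is executed for empty input.
no_feeds | xargs -r echo "fetching:"
```

Since -r is an extension rather than POSIX, a script that must run on other xargs implementations would need to check for empty input itself.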