sfeed.git - Suckless rss Feed reader with my configs

Age	Commit message	Author
2021-04-28	fixup: a regression with RSS guid, by default ispermalink="true"	Hiltjo Posthuma

2021-04-28	use the last href attribute value if there are multiple set	Hiltjo Posthuma
	Input to reproduce: <entry> <link href="https://codemadness.org/a" href="https://codemadness.org/b"/> </entry>
	Old value: "https://codemadness.org/ahttps://codemadness.org/b"
	New value: "https://codemadness.org/b"
	The same applies to RSS <enclosure url="" />.
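The "last value wins" behaviour described above can be sketched as a buffer that is reset whenever a new attribute value starts. This is only an illustration: struct attrbuf, attr_start() and attr_append() are hypothetical names and the fixed 256-byte buffer is an assumption, not sfeed's actual code.

```c
#include <string.h>

/* Hypothetical fixed-size attribute buffer; sfeed's real buffers differ. */
struct attrbuf {
	char data[256];
	size_t len;
};

/* A new attribute value starts: drop whatever an earlier one left behind. */
void
attr_start(struct attrbuf *b)
{
	b->len = 0;
	b->data[0] = '\0';
}

/* Append one chunk of the attribute value, truncating if the buffer is full. */
void
attr_append(struct attrbuf *b, const char *s, size_t n)
{
	size_t room = sizeof(b->data) - 1 - b->len;

	if (n > room)
		n = room;
	memcpy(b->data + b->len, s, n);
	b->len += n;
	b->data[b->len] = '\0';
}
```

Without the reset in attr_start() the two href values would be concatenated, which matches the old value shown in the commit message.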
2021-04-28	add support for old/legacy Atom 0.3 feeds	Hiltjo Posthuma
	This standard was a draft used around 2005-2006. Instead of the fields "published" and "updated" it used "issued" (mandatory field) and "modified" (optional). Add support for them, but still prefer the Atom 1.0 fields and creation dates first. I don't know any real-life examples that still use this though.
	Some references:
	- http://rakaz.nl/2005/07/moving-from-atom-03-to-10.html
	- https://www.dokuwiki.org/syndication (rss_type "atom" parameter value).
	- https://support.google.com/merchants/answer/160598?hl=en
2021-04-28	improve "ispermalink", "rel" and "type" attribute handling/buffering	Hiltjo Posthuma

2021-04-28	improve content-type "type" attribute handling/buffering	Hiltjo Posthuma

2021-04-27	sfeed.c: detect the proper mime-type for XHTML	Hiltjo Posthuma
	Reference: https://www.w3.org/2003/01/xhtml-mimetype/
2021-04-24	fix a comment code-style	Hiltjo Posthuma
	This fix is very important ahem.
2021-03-01	util: improve/refactor URI parsing and formatting	Hiltjo Posthuma
	Removed/rewritten the functions: absuri, parseuri, and encodeuri() for percent-encoding.
	The functions are now split separately with the following purpose:
	- uri_format: format struct uri into a string.
	- uri_hasscheme: quick check if a string is absolute or not.
	- uri_makeabs: make a URI absolute using a base uri and the original URI.
	- uri_parse: parse a string into a struct uri.
	The following URLs are better parsed:
	- URLs with extra "/"'s in the path prepended are kept as is, no "/" is added either for empty paths.
	- URLs like "http://codemadness.org" are not changed to "http://codemadness.org/" anymore (paths are kept as is, unless they are non-empty and do not start with "/").
	- Paths are not percent-encoded anymore.
	- URLs with a userinfo field (username, password) are parsed, like: ftp://user:password@[2001:db8::7]:2121/rfc/rfc1808.txt
	- Non-authoritative URLs like mailto:some@email.org, magnet URIs, ISBN URIs/urn, like: urn:isbn:0-395-36341-1 are allowed and parsed correctly.
	- Both local (file:///) and non-local (file://) are supported.
	- Specifying a base URL with a port will now only use it when the relative URL has no host and port set and follows RFC3986 5.2.2 more closely.
	- Parsing numeric port: parse as signed long and check <= 0, empty port is allowed.
	- Parsing URIs containing query, fragment, but no path separator (/) will now parse the component properly.
	For sfeed:
	- Parse the baseURI only once (no need to do it every time for making absolute URIs).
	- If a link/enclosure is absolute already or if there is no base URL specified then just print the link directly.
	There have also been other small performance improvements related to handling URIs.
	References:
	- https://tools.ietf.org/html/rfc3986
	- Section "5.2.2. Transform References" has also been helpful.
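As an illustration of the "quick check if a string is absolute" idea, here is a minimal sketch based on the scheme grammar in RFC 3986 section 3.1 (a letter followed by letters, digits, "+", "-" or ".", then ":"). The name has_scheme() is hypothetical and this is not sfeed's uri_hasscheme() implementation, which may differ in details.

```c
#include <ctype.h>

/* Return 1 if s starts with an RFC 3986 scheme followed by ':', else 0. */
int
has_scheme(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;

	/* a scheme must start with a letter */
	if (!isalpha(*p))
		return 0;
	/* followed by any number of letters, digits, '+', '-' or '.' */
	for (p++; isalnum(*p) || *p == '+' || *p == '-' || *p == '.'; p++)
		;
	return *p == ':';
}
```

With a check like this, a link such as "mailto:some@email.org" or "urn:isbn:0-395-36341-1" counts as absolute and could be printed directly, while "/rfc/rfc1808.txt" would still need the base URI.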
2021-02-04	sfeed.c: fix time parsing regression with non-standard date format	Hiltjo Posthuma
	The commit that introduced the regression was:
	commit 33c50db302957bca2a850ac8d0b960d05ee0520e
	Author: Hiltjo Posthuma <hiltjo@codemadness.org>
	Date: Mon Oct 12 18:55:35 2020 +0200
	simplify time parsing
	Noticed on an RSS feed with the following date: <pubDate>2021-02-03 05:13:03</pubDate>
	This format is non-standard, but sfeed should support this. A standard format would be (for Atom): 2021-02-03T05:13:03Z
	Partially revert it.
2021-01-22	sfeed: fix regression with parsing content fields	Hiltjo Posthuma
	This regression was introduced in commit e43b7a48 on Tue Oct 6 18:51:33 2020 +0200. After a content tag was parsed the "iscontenttag" variable was not reset. This caused 2 regressions:
	- It ignored other tags such as links after it.
	- It incorrectly set the content-type of a lesser priority field.
	Thanks to pazz0 for reporting it!
2020-10-22	Do not change the referenced matched tag data (from gettag()).	Hiltjo Posthuma
	Fixes a regression introduced in the refactor in commit e43b7a48b08a6bbcb4e730e80395b3257681b33e. The tag ID of the matched tag was modified in place, which made subsequently parsed tags of this type behave strangely: ctx.tag->id = RSSTagGuidPermalinkTrue;
	Now copy the data by value. This structure is small and no performance regression has been seen.
	Input data to reproduce: <rss> <channel> <item> <guid isPermaLink="false">https://def/</guid> </item> <item> <guid>https://abc/</guid> </item> </channel> </rss>
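The difference between modifying a shared table entry and modifying a copy can be sketched as follows. The tag table, struct tag and lookup_tag() are hypothetical stand-ins for sfeed's gettag() machinery; only the RSSTagGuidPermalinkTrue name is taken from the commit message.

```c
#include <string.h>

enum tagid { TagUnknown, RSSTagGuid, RSSTagGuidPermalinkTrue };

struct tag {
	const char *name;
	enum tagid id;
};

/* Shared lookup table: entries must never be changed while parsing. */
static const struct tag tags[] = {
	{ "guid", RSSTagGuid },
};

/* Copy the matched entry into *out by value; return 1 if found, else 0. */
int
lookup_tag(const char *name, struct tag *out)
{
	size_t i;

	for (i = 0; i < sizeof(tags) / sizeof(tags[0]); i++) {
		if (strcmp(tags[i].name, name) == 0) {
			*out = tags[i]; /* struct copy, the table stays untouched */
			return 1;
		}
	}
	return 0;
}
```

A caller can then set the id of its own copy for one <guid> element, and the next <guid> in the feed still starts from the unmodified table entry.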
2020-10-12	add a comment about the intended date priority	Hiltjo Posthuma

2020-10-12	Revert "RSS: give Dublin Core <dc:date> higher priority over <pubDate>"	Hiltjo Posthuma
	This reverts commit a1516cb7869a0dd99ebaacf846ad4161f2b9b9a2.
2020-10-12	simplify time parsing	Hiltjo Posthuma

2020-10-12	remove unneeded check for NUL terminator	Hiltjo Posthuma

2020-10-12	RSS: give Dublin Core <dc:date> higher priority over <pubDate>	Hiltjo Posthuma
	This way dc:date could be the updated time of the item. For Atom there is <published> and <updated> with the same logic.
2020-10-12	parse categories, add multiple field values support (for categories)	Hiltjo Posthuma
	Fields with multiple values are separated by '|'. In the future multiple enclosure support might be added. The categories tags are now parsed. This feature is useful for filtering and categorizing. Parsing of nested tags such as <author><name> has been improved. This code has been refactored. RSS <guid> isPermaLink is now handled differently also and will now prefer a permalink with "true" (link) over the ID. In practice multiple <guid> tags in one item do not occur.
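A minimal sketch of collecting several category values into one field, assuming the separator is the single character '|'. The function name field_add() and the caller-supplied fixed buffer are assumptions for illustration and not how sfeed stores its fields.

```c
#include <string.h>

/*
 * Append value to the '|'-separated list in buf. buf must hold a
 * NUL-terminated string and cap is its total size in bytes.
 * Returns 0 on success, -1 if the value does not fit.
 */
int
field_add(char *buf, size_t cap, const char *value)
{
	size_t len = strlen(buf);
	size_t vlen = strlen(value);
	size_t sep = len > 0 ? 1 : 0; /* no separator before the first value */

	/* need room for separator + value + terminating NUL */
	if (sep + vlen >= cap - len)
		return -1;
	if (sep)
		buf[len++] = '|';
	memcpy(buf + len, value, vlen + 1);
	return 0;
}
```

A consumer can split the field on '|' again to filter items by category. Values that themselves contain '|' are not handled in this sketch.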
2020-10-09	sfeed: parse day with max 2 digits (instead of 4)	Hiltjo Posthuma

2020-10-09	sfeed: support the ISO8601 time format without separators	Hiltjo Posthuma
	For example "19720229T132245Z" is now supported.
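One way to accept both the compact form and forms with separators is to make each separator optional. The sketch below handles "19720229T132245Z", "2021-02-03T05:13:03Z" and the non-standard "2021-02-03 05:13:03". The names struct datetime, getdigits() and parse_datetime() are hypothetical, field ranges are not validated, and the timezone suffix is ignored; sfeed's own parsetime() may work differently.

```c
#include <ctype.h>

struct datetime {
	int year, mon, day, hour, min, sec;
};

/* Read exactly n decimal digits from *s into *v; return 0 on success. */
static int
getdigits(const char **s, int n, int *v)
{
	int i;

	*v = 0;
	for (i = 0; i < n; i++, (*s)++) {
		if (!isdigit((unsigned char)**s))
			return -1;
		*v = *v * 10 + (**s - '0');
	}
	return 0;
}

/* Parse "YYYY[-]MM[-]DD(T| )hh[:]mm[:]ss"; return 0 on success, -1 on error. */
int
parse_datetime(const char *s, struct datetime *dt)
{
	if (getdigits(&s, 4, &dt->year))
		return -1;
	if (*s == '-')
		s++;
	if (getdigits(&s, 2, &dt->mon))
		return -1;
	if (*s == '-')
		s++;
	if (getdigits(&s, 2, &dt->day))
		return -1;
	if (*s != 'T' && *s != ' ')
		return -1;
	s++;
	if (getdigits(&s, 2, &dt->hour))
		return -1;
	if (*s == ':')
		s++;
	if (getdigits(&s, 2, &dt->min))
		return -1;
	if (*s == ':')
		s++;
	if (getdigits(&s, 2, &dt->sec))
		return -1;
	return 0;
}
```

Reading the day with a fixed width of 2 digits is what keeps the compact form unambiguous: the digits after the month cannot run into the time part.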
2020-10-09	XML cdata callback: handle CDATA as data	Hiltjo Posthuma
	This improves handling CDATA for example in Atom feeds with: <author><email><![CDATA[abc]]></email><name><![CDATA[[person]]></name></author>
2020-05-28	sfeed: simplify/optimize checking end tags while inside a RSS/Atom tag	Hiltjo Posthuma
	Instead of doing a binary search, set a pointer to the expected end tag. This makes more sense and is also a minor optimization. No behavioural change intended.
2020-01-24	cleanup some includes	Hiltjo Posthuma

2020-01-18	minor style: use plain int for xml_entitytostr()	Hiltjo Posthuma

2019-10-12	string_append: check for addition and multiplication overflow	Hiltjo Posthuma
	This could overflow / wrap the buffer. Note: SIZE_MAX is defined in POSIX to be at least 65535. On most 64-bit platforms this is 0xffffffffffffffffUL bytes.
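The two checks named in the subject line can be sketched for a growable string as below. struct str and str_append() are hypothetical, and the doubling growth strategy with an initial size of 64 bytes is an assumption; sfeed's string_append() may grow its buffer differently.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct str {
	char *data;
	size_t len; /* bytes used, excluding the terminating NUL */
	size_t cap; /* bytes allocated */
};

/* Append n bytes of data; return 0 on success, -1 on overflow or ENOMEM. */
int
str_append(struct str *s, const char *data, size_t n)
{
	size_t need, cap;
	char *p;

	/* addition overflow: len + n + 1 must not wrap around */
	if (n > SIZE_MAX - s->len - 1)
		return -1;
	need = s->len + n + 1;

	if (need > s->cap) {
		cap = s->cap ? s->cap : 64;
		while (cap < need) {
			/* multiplication overflow: cap * 2 must not wrap around */
			if (cap > SIZE_MAX / 2)
				return -1;
			cap *= 2;
		}
		if ((p = realloc(s->data, cap)) == NULL)
			return -1;
		s->data = p;
		s->cap = cap;
	}
	memcpy(s->data + s->len, data, n);
	s->len += n;
	s->data[s->len] = '\0';
	return 0;
}
```

Both checks are written as comparisons against SIZE_MAX before doing the arithmetic, because an unsigned size_t wraps silently and cannot be checked after the fact.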
2019-09-05	sfeed.c: fix typo in comment	Hiltjo Posthuma

2019-06-17	sfeed: optimization: xmlattr: when not in some RSS/Atom tag skip further checks	Hiltjo Posthuma

2019-06-11	fix typo in comment	Hiltjo Posthuma

2019-06-11	optimization: only convert entities when we are inside a RSS/Atom tag	Hiltjo Posthuma

2019-06-11	reorder function	Hiltjo Posthuma

2019-06-11	Handle entities in attribute values.	Julian Schweinsberg

2019-05-25	gettzoffset: fix possible arithmetic overflow if int is 16-bit	Hiltjo Posthuma
	Also reduce the size of the return type (32-bit+ should be enough).
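To see why a 16-bit int is a problem here: an offset of 10 hours is already 36000 seconds, which exceeds the 32767 a 16-bit int can hold. A minimal sketch that does the arithmetic in long (at least 32 bits) follows. The name tz_offset() and the accepted "+hhmm" / "+hh:mm" forms are assumptions for illustration, not sfeed's gettzoffset().

```c
#include <ctype.h>

/*
 * Parse a "+hhmm", "-hhmm", "+hh:mm" or "-hh:mm" UTC offset.
 * Returns the offset in seconds, or 0 if the string is malformed.
 */
long
tz_offset(const char *s)
{
	long hh, mm;
	long sign;

	if (*s == '+')
		sign = 1;
	else if (*s == '-')
		sign = -1;
	else
		return 0;
	s++;

	if (!isdigit((unsigned char)s[0]) || !isdigit((unsigned char)s[1]))
		return 0;
	hh = (s[0] - '0') * 10 + (s[1] - '0');
	s += 2;
	if (*s == ':')
		s++;
	if (!isdigit((unsigned char)s[0]) || !isdigit((unsigned char)s[1]))
		return 0;
	mm = (s[0] - '0') * 10 + (s[1] - '0');

	/* long arithmetic: hh * 3600 would not fit in a 16-bit int */
	return sign * (hh * 3600L + mm * 60L);
}
```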
2019-05-10	remove unused variables	Hiltjo Posthuma

2019-05-10	sfeed: remove support for military zones and simplify	Hiltjo Posthuma
	see RFC2822 4.3 page 32: " [...] However, because of the error in [RFC822], they SHOULD all be considered equivalent to "-0000" unless there is out-of-band information confirming their meaning. "
2019-05-02	sfeed: improve content type (attribute) handling	Hiltjo Posthuma
	- handle type attribute for MRSS media:description, media:description type="plain" is now parsed properly.
	- handle default content-types per tag now.
	- when multiple content-like fields are specified use the proper content-type.
	- be flexible about type attribute handling.
	- minor code tweaks.
2019-04-14	sfeed: add support for the first enclosure of an item	Hiltjo Posthuma
	This is useful for example for podcasts (audio attachment), newsposts (usually some image) or comic strips (link to page, image as enclosure). Thanks leot for the feedback!
2019-04-06	optimization: define GETNEXT as an inline macro	Hiltjo Posthuma
	This greatly reduces function call overhead. getnext is defined in xml.h for inline optimization. sfeed only uses one XML parser context per program, which also allows further compiler optimizations.
	On OpenBSD it was noticeable because of retpoline etc function call overhead. Using clang and a 500MB test XML file reduces processing time from about 12s to 5s.
	Tested using some crazy optimization flags:
	SFEED_CFLAGS = -O3 -std=c99 -DGETNEXT=getchar_unlocked -fno-ret-protector \
		-mno-retpoline -static
	A GETNEXT macro is also nice for programs which mmap(2) some big XML file. Then you can simply define:
	#define GETNEXT() (off >= len ? EOF : reg[off++])
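The in-memory GETNEXT definition from the commit message can be tried without mmap(2) by pointing it at any buffer. In this sketch the variables reg, len and off follow the names used in the macro above, while count_open() is a hypothetical consumer that stands in for the XML parser loop.

```c
#include <stddef.h>
#include <stdio.h>

/* Region to read from; with mmap(2) reg would point at the mapped file. */
static const unsigned char *reg;
static size_t len, off;

#define GETNEXT() (off >= len ? EOF : reg[off++])

/* Count '<' characters in buf, reading it byte by byte through GETNEXT(). */
size_t
count_open(const char *buf, size_t n)
{
	size_t count = 0;
	int c;

	reg = (const unsigned char *)buf;
	len = n;
	off = 0;
	while ((c = GETNEXT()) != EOF) {
		if (c == '<')
			count++;
	}
	return count;
}
```

Because the macro expands inline, reading a byte is a bounds check and an array access rather than a function call, which is where the speedup described above comes from.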
2019-04-06	sfeed: gettag: simplify and use ANSI bsearch()	Hiltjo Posthuma

2019-03-03	gettzoffset: bit more strict UTC offset parsing	Hiltjo Posthuma

2019-03-03	skip spaces in parsetime() itself	Hiltjo Posthuma

2019-03-03	sfeed: style, break in switch instead of return	Hiltjo Posthuma
	this style change is useful for my local coverage profile.
2019-02-27	atomlinktype make enum TagId instead of int	Hiltjo Posthuma

2019-02-27	improve RSS2 permalink support	Hiltjo Posthuma
	In RSS2 (but not RSS0.9), a <link> is optional and it can also be specified by <guid isPermaLink="true"> (permalink is "true" by default). When a <link> is also present this will be used instead of the GUID permalink.
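The precedence described above can be sketched as a small selection function: a non-empty <link> wins, otherwise a <guid> is used only when it is a permalink. The names guid_ispermalink() and item_link() are hypothetical, and treating any attribute value other than the exact string "true" as not a permalink is an assumption of this sketch.

```c
#include <string.h>

/* isPermaLink defaults to "true" when the attribute is absent (NULL). */
int
guid_ispermalink(const char *attr)
{
	return attr == NULL || strcmp(attr, "true") == 0;
}

/*
 * Pick the link for an RSS2 item: prefer <link>, fall back to a
 * permalink <guid>, else return an empty string.
 */
const char *
item_link(const char *link, const char *guid, const char *ispermalink_attr)
{
	if (link != NULL && link[0] != '\0')
		return link;
	if (guid != NULL && guid[0] != '\0' &&
	    guid_ispermalink(ispermalink_attr))
		return guid;
	return "";
}
```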
2019-02-27	sfeed.c: improve comment	Hiltjo Posthuma

2019-02-24	stricter Atom link parsing	Hiltjo Posthuma
	The Atom link parsing is more strict now and checks the rel attribute. When the rel attribute is empty it is handled as a normal link ("alternate"). This makes sure that a link with another type (such as "enclosure", "related", "self" or "via") specified before a normal link is not used. sfeed does not handle enclosures, but the code is reworked so it is very simple to add this. Enclosures are often used for example to attach some image to a newspost or an audio file to a podcast.
2019-02-24	fix RFC822 ANSI and military zones parsing	Hiltjo Posthuma

2019-02-08	don't read XML data inside tag for Atom <link href/>	Hiltjo Posthuma
	Noticed in the webcomic "amphibian": http://amphibian.com/feeds/atom
2019-02-08	trim whitespace around uri field value	Hiltjo Posthuma
	... and abstract printing timestamp and uri to string_print_{timestamp,uri} similar to string_print_trimmed (normal string) and string_print_encoded (content). Noticed with whitespace around the field in the webcomic "amphibian": http://amphibian.com/feeds/atom
2019-02-08	shorten some callback variable names, change "name" to "t" (tag)	Hiltjo Posthuma

2019-01-29	sfeed: use the same handler names as the XMLParser	Hiltjo Posthuma

2018-12-14	sfeed: rename buffer to buf, change entitytostr check, it can never happen	Hiltjo Posthuma