sfeed - Suckless RSS reader

Age	Commit message	Author
2021-01-27	typofixes	Hiltjo Posthuma

2021-01-22	xml.c: fix typo / regression in checking codepoint range for utf-16 surrogate pair	Hiltjo Posthuma
	Regression in commit 12b279581fbbcde2b36eb4b78d70a1c52d4a209a.
	0xdffff should be 0xdfff.
	printf '<item><title>&#x1f448;</title></item>' | sfeed
	Before (bad): &#x1f448;
	After: 👈
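The effect of the one-digit typo is easy to see in isolation. Below is a minimal sketch of such a range check, assuming a hypothetical helper named is_surrogate(); it is not the code from xml.c.

```c
/*
 * Hypothetical helper, not taken from xml.c.
 * UTF-16 surrogates occupy U+D800..U+DFFF inclusive.
 */
int
is_surrogate(long cp)
{
	return cp >= 0xd800 && cp <= 0xdfff;
}
```

With the mistyped upper bound 0xdffff, a codepoint such as U+1F448 (👈) would also fall inside the range and be treated as a surrogate.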
2021-01-22	xml.c: do not convert UTF-16 surrogate pairs to an invalid sequence	Hiltjo Posthuma
	Simple way to reproduce:
		printf '<item><title>&#xdc00;</title></item>' | sfeed | iconv -t utf-8
	Result:
		iconv: (stdin):1:8: cannot convert
	Output result:
		printf '<item><title>&#xdc00;</title></item>' | sfeed
	Before:
		00000000  09 ed b0 80 09 09 09 09  09 09 09 0a  |............|
		0000000c
	After:
		00000000  09 26 23 78 64 63 30 30  3b 09 09 09 09 09 09 09  |.&#xdc00;.......|
		00000010  0a  |.|
		00000011
	The entity is output as a literal string. This makes it easier to see what is wrong and to debug the feed, and it is consistent with the current behaviour for invalid named entities (&bla;). An alternative could be a UTF-8 replacement symbol (codepoint 0xfffd).
	Reference: https://unicode.org/faq/utf_bom.html, specifically:
	"Q: How do I convert an unpaired UTF-16 surrogate to UTF-8? "
	"A: A different issue arises if an unpaired surrogate is encountered when converting ill-formed UTF-16 data. By representing such an unpaired surrogate on its own as a 3-byte sequence, the resulting UTF-8 data stream would become ill-formed. While it faithfully reflects the nature of the input, Unicode conformance requires that encoding form conversion always results in a valid data stream. Therefore a converter must treat this as an error. [AF]"
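To illustrate where the bytes ed b0 80 come from, here is a minimal sketch of a codepoint-to-UTF-8 encoder that refuses surrogates. The name encode_utf8() and its return convention are assumptions made for this example; it is not the codepointtoutf8() function from xml.c.

```c
#include <stddef.h>

/*
 * Hypothetical encoder, not the function from xml.c.
 * Writes the UTF-8 form of cp into buf (which must hold 4 bytes) and
 * returns the number of bytes written, or 0 when cp cannot be encoded.
 */
size_t
encode_utf8(long cp, unsigned char *buf)
{
	/* out of range, or a UTF-16 surrogate (U+D800..U+DFFF) */
	if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
		return 0;
	if (cp < 0x80) {
		buf[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		buf[0] = (unsigned char)(0xc0 | (cp >> 6));
		buf[1] = (unsigned char)(0x80 | (cp & 0x3f));
		return 2;
	}
	if (cp < 0x10000) {
		buf[0] = (unsigned char)(0xe0 | (cp >> 12));
		buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
		buf[2] = (unsigned char)(0x80 | (cp & 0x3f));
		return 3;
	}
	buf[0] = (unsigned char)(0xf0 | (cp >> 18));
	buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
	buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
	buf[3] = (unsigned char)(0x80 | (cp & 0x3f));
	return 4;
}
```

Without the surrogate check, the 3-byte branch would turn 0xdc00 into ed b0 80, the ill-formed sequence that iconv rejects. A caller that receives 0 can fall back to printing the original entity text.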
2020-10-18	xml.c: initialize i = 0	Hiltjo Posthuma
	Forgot it in the cleanup commit 37afcf334fa1ba0b668bde08e8fcaaa9fd7dfa0d
2020-10-09	xml: remove unused code for sfeed	Hiltjo Posthuma

2020-10-09	xml.c: remove buffering of comment data, which is unused anyway	Hiltjo Posthuma

2020-06-01	fix typo	Hiltjo Posthuma

2020-01-24	cleanup some includes	Hiltjo Posthuma

2020-01-18	improve XML entity conversion	Hiltjo Posthuma
	- return -1 for invalid XML entities.
	- separate between NUL (&#0;) and invalid entities, although both are unwanted in sfeed.
	- validate the number range more strictly and don't wrap to unsigned. Entities like "&#-1;" are handled as invalid now. "&#;" is also invalid instead of being treated the same as "&#0;".
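The rules above can be sketched as a small stand-alone parser. The function name parse_numeric_entity() and the convention of passing the complete entity string are assumptions for this example, not the interface of xml.c. Surrogates are deliberately not handled here.

```c
/*
 * Hypothetical parser, not the function from xml.c.
 * e is a complete entity such as "&#65;" or "&#x41;".
 * Returns the codepoint (0 for the NUL entity) or -1 when invalid.
 */
long
parse_numeric_entity(const char *e)
{
	long cp = 0;
	int base = 10, digit, ndigits = 0;

	if (e[0] != '&' || e[1] != '#')
		return -1;
	e += 2;
	if (*e == 'x') {
		base = 16;
		e++;
	}
	for (; *e; e++, ndigits++) {
		if (*e >= '0' && *e <= '9')
			digit = *e - '0';
		else if (base == 16 && *e >= 'a' && *e <= 'f')
			digit = *e - 'a' + 10;
		else if (base == 16 && *e >= 'A' && *e <= 'F')
			digit = *e - 'A' + 10;
		else
			break; /* '-', '+', whitespace and ';' all stop here */
		cp = cp * base + digit;
		if (cp > 0x10ffff)
			return -1; /* out of range; also prevents overflow */
	}
	/* need at least one digit, then exactly ";" */
	if (!ndigits || e[0] != ';' || e[1] != '\0')
		return -1;
	return cp;
}
```

Because digits are accumulated by hand and a sign is never accepted, "&#-1;" cannot wrap around to a large unsigned value, and "&#;" is rejected for having no digits rather than being read as 0.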
2019-11-22	xml.c: upper-case named-entities are invalid in XML	Hiltjo Posthuma
	Named entities are case-sensitive, and the predefined XML entities are lower-case. (In HTML some upper-case forms are valid, although &APOS; is invalid there too.)
	References:
	4.6 Predefined entities: https://www.w3.org/TR/xml/#sec-predefined-ent
	In the definition of "match": https://www.w3.org/TR/xml/#dt-match
	"No case folding is performed."
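A case-sensitive lookup of the five predefined XML entities could look like the sketch below. The function name named_entity_to_char() and the table layout are assumptions for this example, not the code in xml.c.

```c
#include <string.h>

/*
 * Hypothetical lookup, not the function from xml.c.
 * Returns the character for a predefined XML entity, or -1 if unknown.
 */
int
named_entity_to_char(const char *e)
{
	static const struct {
		const char *name;
		int c;
	} entities[] = {
		{ "&amp;",  '&'  },
		{ "&lt;",   '<'  },
		{ "&gt;",   '>'  },
		{ "&apos;", '\'' },
		{ "&quot;", '"'  },
	};
	size_t i;

	for (i = 0; i < sizeof(entities) / sizeof(entities[0]); i++) {
		/* strcmp, not strcasecmp: no case folding is performed */
		if (!strcmp(e, entities[i].name))
			return entities[i].c;
	}
	return -1;
}
```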
2019-06-11	xml: improve cdata and comment callback logic	Hiltjo Posthuma
	it used to call both handlers twice at the end for "-->" (comment) or "]]>" (CDATA) with the data "" and length 0. Now it is only called when non-empty. The start and end handlers can still be used.
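The "only when non-empty" rule can be sketched as a guard around the callback. The handler type, flush_data() and count_calls() are hypothetical names for this example; the real handler signatures in xml.c may differ.

```c
#include <stddef.h>

/* Hypothetical data handler type; ctx is opaque user state. */
typedef void (*datahandler)(const char *data, size_t len, void *ctx);

/* Pass buffered data on only if a handler is set and there is data. */
void
flush_data(datahandler cb, const char *buf, size_t len, void *ctx)
{
	if (cb && len)
		cb(buf, len, ctx);
}

/* Example handler: counts how often it was called. */
void
count_calls(const char *data, size_t len, void *ctx)
{
	(void)data;
	(void)len;
	++*(int *)ctx;
}
```

With this guard the terminator "-->" or "]]>" no longer triggers an extra call with empty data, and an unset handler is simply skipped.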
2019-03-16	xml: write x->getnext to a default GETNEXT macro	Hiltjo Posthuma
	this allows overriding x->getnext so it expands to global context parsing, and lets the compiler optimize it inline. also remove the check whether the x->getnext function exists (just crash hard).
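The override pattern can be sketched as follows. Everything here is an assumption for illustration: the default of getchar, the reader demo_getnext() with its global demo_input, and the stand-in loop count_tag_starts(). None of it is the macro definition from xml.c.

```c
#include <stddef.h>
#include <stdio.h>

/* Caller side: define GETNEXT before the parser code to use a custom reader. */
const char *demo_input = "";

int
demo_getnext(void)
{
	/* reads from a global string instead of a per-parser context */
	return *demo_input ? (unsigned char)*demo_input++ : EOF;
}
#define GETNEXT demo_getnext

/* Parser side: fall back to a default only when nothing was defined. */
#ifndef GETNEXT
#define GETNEXT getchar
#endif

/* Stand-in for a parser loop: counts '<' characters until EOF. */
size_t
count_tag_starts(void)
{
	size_t n = 0;
	int c;

	while ((c = GETNEXT()) != EOF) {
		if (c == '<')
			n++;
	}
	return n;
}
```

Since GETNEXT expands to a direct call rather than a call through a function pointer, the compiler is free to inline the reader.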
2019-01-08	xml: remove unnecessary checks	Hiltjo Posthuma
	- reduce amount of data to check.
	- remove unnecessary checks from (now) internal functions.
2018-12-02	XML tag parse improvements for PI and end tags	Hiltjo Posthuma
	- Stricter parsing of tags, no whitespace stripping after <.
	- For end tags the "internal" context x->tag would be "/sometag". Make sure this matches exactly with the parameter tag.
	- Reset tagname after parsing an end tag.
	- Make end tag handling more consistent.
	- Remove temporary variable taglen.
2018-08-26	xml: use ANSI types and struct initialization	Hiltjo Posthuma
	long is at least 32 bits, codepointtoutf8() works with >= 32-bit types. Valid codepoint ranges are not larger than this. unsigned char is not needed because converted unicode bytes don't use this range. tested all valid codepoints and output on amd64, i386 and SPARC64.
2018-08-23	xml: remove TODO comments and add a note	Hiltjo Posthuma

2018-08-22	xml: improve parsing of invalid attribute values separated by whitespace	Hiltjo Posthuma
	It is invalid XML, but this allows parsing old HTML pages as well. For example:
	<input id=cb checked type="checkbox" title='checkbox' />
	or
	<FONT FACE=wingdings SIZE=12><BLINK>oh hai</BLINK></FONT>
2018-08-22	xml: improve handling of invalid long data entities	Hiltjo Posthuma
	this also fixes an issue with truncating and missing data on invalid input.
2018-08-21	xml: rewrite codepointtoutf8 function	Hiltjo Posthuma
	No more converting to a uint32_t type. Just convert to a byte buffer. Tested on little- and big-endian. The code should be more clear too hopefully.
2018-08-21	xml: don't reset internal tagname when parsing non-tag types like CDATA	Hiltjo Posthuma
	... this affects "tags" starting with < such as CDATA and processing instructions.
2018-08-21	xml: fix missing first byte when parsing a long incorrect attribute entity	Hiltjo Posthuma
	... the entity had to be invalid (start with &) and longer than the buffer size. + tiny style fix.
2018-08-21	xml: interface change: make some functions private	Hiltjo Posthuma
	... this does not expose the uint* types either.
2018-08-21	xml: increase allowed size of attribute names	Hiltjo Posthuma

2018-08-16	XML parser: numeric entity: check unicode codepoint range	Hiltjo Posthuma

2018-03-11	include <sys/types.h> for types size_t, ssize_t etc	Hiltjo Posthuma
	This makes sure xml.c in particular can be compiled without further feature macros.
2018-03-11	xml: improve comment parsing	Hiltjo Posthuma
	note that ---> is officially invalid XML, but we allow it anyway.
2018-03-11	xml: fix parsing of cdata when a handler is unset	Hiltjo Posthuma

2018-03-11	xml: improve CDATA parsing	Hiltjo Posthuma
	thanks Svyatoslav Mishyn for the feedback!
2017-12-24	xml: make name entities static, minor clarifications	Hiltjo Posthuma

2016-04-10	xml: stricter check of entity: must end with ';', ...	Hiltjo Posthuma
	... zero output buffer if codepoint length is 0
2015-08-22	xml: fix includes	Hiltjo Posthuma

2015-08-22	xml: simplify XML reader	Hiltjo Posthuma

2015-08-16	xml: change xml_parse_string to xml_parse_buf	Hiltjo Posthuma
	In the parser itself allow reading '\0' in the XML itself. Add a length parameter to specify the buffer size.
2015-08-14	minor code-style improvements	Hiltjo Posthuma

2015-08-14	xml: whoops, remove leftover xml_getnext_stdin	Hiltjo Posthuma

2015-08-14	xml: separate reader context from parser	Hiltjo Posthuma
	also:
	- rename xmlparser_ prefix to xml_.
	- make xml_parse public, this allows a custom reader like a direct mmap, see: XMLParser.getnext and (optionally) XMLParser.getnext_data.
	- improve the README text.
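A custom in-memory reader of the kind mentioned above might be wired up as in the sketch below. The struct layout and the callback signature are assumptions modelled loosely on the two field names in the commit message; struct membuf, membuf_getnext() and count_bytes() are hypothetical and not part of sfeed.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical parser context; only the two reader fields are modelled. */
struct parser {
	int (*getnext)(void *); /* returns the next byte, or EOF */
	void *getnext_data;     /* opaque state handed to getnext */
};

/* Reader state for a memory region, e.g. one obtained via mmap. */
struct membuf {
	const unsigned char *data;
	size_t len;
	size_t off;
};

int
membuf_getnext(void *p)
{
	struct membuf *m = p;

	/* length-based, so a '\0' byte in the data does not end the input */
	return m->off < m->len ? m->data[m->off++] : EOF;
}

/* Stand-in for the parser loop: consumes the input and counts bytes. */
size_t
count_bytes(struct parser *x)
{
	size_t n = 0;

	while (x->getnext(x->getnext_data) != EOF)
		n++;
	return n;
}
```

Keeping the reader state behind an opaque pointer means the parser loop does not need to know whether the bytes come from stdin, a file, or a mapped buffer.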
2015-08-08	xml: move entity to namedentitystr()	Hiltjo Posthuma

2015-08-06	xml: remove forced __inline__ attribute	Hiltjo Posthuma

2015-08-06	general cleanups	Hiltjo Posthuma

2015-08-01	xml: only allow full uppercase or full lowercase for entities	Hiltjo Posthuma

2015-07-31	xml: fix xml_namedentitytostr loop	Hiltjo Posthuma

2015-07-31	xml: fix missing include strings.h, for strncasecmp	Hiltjo Posthuma

2015-07-29	improve includes (don't include headers in .h), fix build on Linux	Hiltjo Posthuma

2015-07-28	improve code-style and consistency	Hiltjo Posthuma

2015-06-23	xml: fix comment issue, improve cdata and comment while encountering separator	Hiltjo Posthuma

2015-06-22	xml: fix cdata issue	Hiltjo Posthuma

2015-06-21	separate xml specific code into xml.c	Hiltjo Posthuma

2015-06-21	xml.c: fix empty cdata callback	Hiltjo Posthuma

2015-05-16	xml: only call data handler if set	Hiltjo Posthuma

2015-05-16	xml: call parse	Hiltjo Posthuma