sfeed - Suckless RSS reader

Age	Commit message	Author
2022-02-06	add compile-time option to improve output on dumb non-UTF8 terminals	Hiltjo Posthuma
	This at least makes feeds with simple ASCII work.
2022-02-04	improve some code comments	Hiltjo Posthuma

2022-01-19	fix inconsistencies in comments	Hiltjo Posthuma

2022-01-14	util: strtotime: expand on comment about 2038-readiness	Hiltjo Posthuma
	Also tested on MIPS32BE which has 32-bit time_t and which wraps the time value.
2022-01-14	util: parsetime(): fix comment, long long supports at least 64-bit range	Hiltjo Posthuma

2021-11-26	sfeed_curses: reuse some functions in util	Hiltjo Posthuma

2021-06-01	util.c: err() do not print colon formatted	Hiltjo Posthuma
	Most commonly used compilers (gcc, clang) optimize this away though.
2021-06-01	portability and standards: add BSD-like err() and errx() functions	Hiltjo Posthuma
	These are BSD functions.
	- HaikuOS now compiles without having to use libbsd.
	- Tested on SerenityOS (for fun), which doesn't have these functions (yet). With a small change to support wcwidth() sfeed works on SerenityOS.
2021-03-01	util.c: uri_makeabs: check initial base URI field, not dest `a` (style)	Hiltjo Posthuma
	No functional difference because the base URI host is copied beforehand.
2021-03-01	util: improve/refactor URI parsing and formatting	Hiltjo Posthuma
	Removed/rewritten the functions: absuri, parseuri, and encodeuri() for percent-encoding.
	The functions are now split separately with the following purpose:
	- uri_format: format struct uri into a string.
	- uri_hasscheme: quick check if a string is absolute or not.
	- uri_makeabs: make a URI absolute using a base uri and the original URI.
	- uri_parse: parse a string into a struct uri.
	The following URLs are better parsed:
	- URLs with extra "/"'s in the path prepended are kept as is, no "/" is added either for empty paths.
	- URLs like "http://codemadness.org" are not changed to "http://codemadness.org/" anymore (paths are kept as is, unless they are non-empty and do not start with "/").
	- Paths are not percent-encoded anymore.
	- URLs with userinfo field (username, password) are parsed, like: ftp://user:password@[2001:db8::7]:2121/rfc/rfc1808.txt
	- Non-authoritative URLs like mailto:some@email.org, magnet URIs, ISBN URIs/urn, like: urn:isbn:0-395-36341-1 are allowed and parsed correctly.
	- Both local (file:///) and non-local (file://) are supported.
	- Specifying a base URL with a port will now only use it when the relative URL has no host and port set and follows RFC3986 5.2.2 more closely.
	- Parsing numeric port: parse as signed long and check <= 0, empty port is allowed.
	- Parsing URIs containing query, fragment, but no path separator (/) will now parse the component properly.
	For sfeed:
	- Parse the baseURI only once (no need to do it every time for making absolute URIs).
	- If a link/enclosure is absolute already or if there is no base URL specified then just print the link directly.
	There have also been other small performance improvements related to handling URIs.
	References:
	- https://tools.ietf.org/html/rfc3986
	- Section "5.2.2. Transform References" has also been helpful.
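The "quick check if a string is absolute or not" can be illustrated with a small sketch. This is not the code from util.c: the name has_scheme() is hypothetical, and the rule used is the scheme grammar from RFC 3986 (a letter, followed by letters, digits, "+", "-" or ".", terminated by ":"). It assumes the default "C" locale, where isalpha() and isalnum() only match ASCII.

```c
#include <ctype.h>

/* Hypothetical helper, not sfeed's uri_hasscheme(): return 1 if s starts
 * with a non-empty RFC 3986 scheme followed by ':', otherwise 0. */
int
has_scheme(const char *s)
{
	const char *p = s;

	/* a scheme must start with a letter */
	if (!isalpha((unsigned char)*p))
		return 0;
	/* remaining scheme characters: letter, digit, '+', '-' or '.' */
	for (p++; isalnum((unsigned char)*p) ||
	    *p == '+' || *p == '-' || *p == '.'; p++)
		;
	return *p == ':';
}
```

With this check "urn:isbn:0-395-36341-1" and "mailto:some@email.org" count as absolute, while "//host/path" and "/rfc/rfc1808.txt" do not and would need a base URI.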
2021-01-27	typofixes	Hiltjo Posthuma

2021-01-09	printutf8pad: fix byte-seek issue with negative width codepoints in the range >= 127	Hiltjo Posthuma
	For example: "\xef\xbf\xb7" (codepoint 0xfff7), returns wcwidth(wc) == -1. The next byte was incorrectly seeked, but the codepoint itself was valid (mbtowc).
2021-01-09	printutf8pad: small code-style/clarify changes	Hiltjo Posthuma

2021-01-08	util.c: printutf8pad(): improve padded printing and printing invalid unicode characters	Hiltjo Posthuma
	This affects sfeed_plain.
	- Use unicode replacement character (codepoint 0xfffd) when a codepoint is invalid and proceed printing the rest of the characters.
	- When a codepoint is invalid reset the internal state of mbtowc(3), from the OpenBSD man page: "If a call to mbtowc() resulted in an undefined internal state, mbtowc() must be called with s set to NULL to reset the internal state before it can safely be used again."
	- Optimize for the common ASCII case and use a macro to print the character instead of a wasteful fwrite() function call. With 250k lines (+- 350MB) this improves printing performance from 1.7s to 1.0s on my laptop. On another system it improved by +- 25%. Tested with clang and gcc and also tested the worst-case (non-ASCII) with no penalty.
	To test:
		printf '0\tabc\xc3 def' | sfeed_plain
	Before:
		1970-01-01 01:00 abc
	After:
		1970-01-01 01:00 abc� def
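The replacement-character and state-reset behaviour described above can be sketched as follows. This is a simplified illustration, not the printutf8pad() code: the function name replace_invalid() and its buffer-based interface are hypothetical, and padding and column counting are left out. It assumes the caller has selected a UTF-8 locale, for example with setlocale(LC_CTYPE, ""), since what counts as an invalid sequence depends on the locale.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy s into out (size outsiz), writing U+FFFD for
 * each byte that mbtowc() cannot decode. Output is truncated if it does
 * not fit. Returns the number of bytes written, excluding the NUL. */
size_t
replace_invalid(const char *s, char *out, size_t outsiz)
{
	static const char repl[] = "\xef\xbf\xbd"; /* U+FFFD as UTF-8 */
	size_t len = strlen(s), i = 0, o = 0;
	wchar_t wc;
	int r;

	if (outsiz == 0)
		return 0;
	while (i < len) {
		r = mbtowc(&wc, s + i, len - i);
		if (r > 0) {
			/* valid character: copy its bytes as is */
			if (o + (size_t)r >= outsiz)
				break;
			memcpy(out + o, s + i, (size_t)r);
			o += (size_t)r;
			i += (size_t)r;
		} else {
			/* invalid: reset the conversion state, emit U+FFFD
			 * and continue with the next byte */
			mbtowc(NULL, NULL, 0);
			if (o + 3 >= outsiz)
				break;
			memcpy(out + o, repl, 3);
			o += 3;
			i++;
		}
	}
	out[o] = '\0';
	return o;
}
```

Skipping a single byte after a decode error keeps the rest of the line printable, which matches the "abc� def" result in the test above.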
2021-01-08	xmlencode: optimize common character output function	Hiltjo Posthuma
	Use putc instead of fputc, it can be optimized to macros. From the OpenBSD man page: "putc() acts essentially identically to fputc(), but is a macro that expands in-line. It may evaluate stream more than once, so arguments given to putc() should not be expressions with potential side effects."
	sfeed_atom, sfeed_frames and sfeed_html are using this function.
	Mini-benchmarked sfeed_html and it went from 1.45s to 1.0s with feed files in total 250k lines (+- 350MB). Tested with clang and gcc on OpenBSD on an older laptop.
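A minimal sketch of such an encoder, assuming the five usual XML special characters are escaped and everything else goes through putc(). The name xmlencode_sketch() is hypothetical and the body is not copied from util.c. The stream argument is a plain variable, so evaluating it more than once inside the putc() macro is harmless.

```c
#include <stdio.h>

/* Hypothetical sketch: write s to fp with XML special characters escaped.
 * The common case (ordinary character) uses putc(), which may be a macro. */
void
xmlencode_sketch(const char *s, FILE *fp)
{
	for (; *s; s++) {
		switch (*s) {
		case '<':  fputs("&lt;", fp); break;
		case '>':  fputs("&gt;", fp); break;
		case '&':  fputs("&amp;", fp); break;
		case '"':  fputs("&quot;", fp); break;
		case '\'': fputs("&#39;", fp); break; /* numeric entity */
		default:   putc(*s, fp); break;
		}
	}
}
```

Calling xmlencode_sketch("a<b", stdout) would write "a&lt;b".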
2020-10-12	remove unneeded check for NUL terminator	Hiltjo Posthuma

2020-05-27	util: encodeuri: simplify condition	Hiltjo Posthuma
	iscntrl is c < ' ' || c == 127. I want to encode a space and everything above 127 also. So this condition can be simplified to this.
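The commit text does not show the resulting condition. One form that matches the description, given here as an assumption, is "c <= ' ' || c >= 127". The sketch below spells out both the long and the short form with hypothetical function names so they can be compared for every byte value.

```c
/* Long form from the description: control character (c < ' ' or c == 127),
 * a space, or anything above 127. */
int
needs_encode_long(unsigned char c)
{
	return (c < ' ' || c == 127) || c == ' ' || c > 127;
}

/* Assumed simplified form covering the same byte values. */
int
needs_encode_short(unsigned char c)
{
	return c <= ' ' || c >= 127;
}
```

Both functions return the same result for all 256 byte values, so the two range checks are enough.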
2020-04-01	util: improve/cleanup parseline()	Hiltjo Posthuma
	- remove a check that has no use/can never happen.
	- remove the return value as it's unused and the input size is known.
	- fix an old comment that doesn't reflect what the function does anymore.
2020-03-10	sfeed_plain: optimize utf8-decoding and column position calculation	Hiltjo Posthuma
	Optimize for the common-case: assuming ASCII. The input is assumed to be valid UTF-8 input (output of sfeed). This saves 2 function calls for determining the width of a single ASCII character, which of course is 1.
	Ranges:
	- < 32 are control-characters and are skipped.
	- < 127 is typical ASCII and is 1 column wide.
	- >= 127 is the normal path (control-character and multi-byte UTF-8).
	Tested on OpenBSD and Linux with various compilers (clang, gcc, pcc and tcc). On OpenBSD and Linux glibc much improvement. On Linux musl (almost) no change.
	In a common-case up to 40% performance improvement. In the worst-case negligible performance degradation (<1%).
2020-01-24	cleanup some includes	Hiltjo Posthuma

2019-04-25	util: small code-style fix	Hiltjo Posthuma

2019-04-21	util: keep brackets when parsing IPv6 addresses	Hiltjo Posthuma

2019-04-15	util: remove unneeded err.h header	Hiltjo Posthuma

2019-04-06	util: remove unnecessary cast and initialization	Hiltjo Posthuma

2019-03-08	util: pedantic snprintf improvement	Hiltjo Posthuma
	POSIX says about snprintf: "If an output error was encountered, these functions shall return a negative value". So check for < 0 instead of -1. Afaik all implementations return -1 though.
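A short sketch of the check, using a hypothetical helper joinpath() that is not part of sfeed. It treats any negative return value as an output error and additionally checks for truncation, which snprintf() reports by returning a length that does not fit in the buffer.

```c
#include <stdio.h>

/* Hypothetical helper: format "dir/name" into buf.
 * Returns 0 on success, -1 on output error or truncation. */
int
joinpath(char *buf, size_t bufsiz, const char *dir, const char *name)
{
	int r;

	r = snprintf(buf, bufsiz, "%s/%s", dir, name);
	/* r < 0: output error (POSIX only promises "a negative value");
	 * r >= bufsiz: the result did not fit and was truncated */
	if (r < 0 || (size_t)r >= bufsiz)
		return -1;
	return 0;
}
```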
2019-02-27	util: parseuri: fix typo in cast (ssize_t)	Hiltjo Posthuma

2018-09-07	util.c: remove remaining uint8_t type, we assume a sane CHAR_BIT == 8	Hiltjo Posthuma

2018-09-07	fix many cases of undefined behaviour in usage of ctype functions	Hiltjo Posthuma
	- cast all ctype(3) function arguments to (unsigned char) to avoid UB.
	  POSIX says: "The c argument is an int, the value of which the application shall ensure is a character representable as an unsigned char or equal to the value of the macro EOF. If the argument has any other value, the behavior is undefined."
	  Many libc implementations cast the value implicitly, but NetBSD does not, which is probably the correct way to interpret it.
	- no need to cast for putchar + rename some fputc(..., stdout) to putchar.
	  POSIX says: "The fputc() function shall write the byte specified by c (converted to an unsigned char) to the output stream pointed to by stream [...]"
	Major thanks to Leonardo Taccari <iamleot@gmail.com> for reporting and testing it on NetBSD!
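The cast matters on platforms where plain char is signed: a byte such as 0xc3 then becomes a negative int, which is outside the range the ctype functions accept. A minimal sketch with a hypothetical helper skipspace(), assuming the default "C" locale:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return the number of leading whitespace bytes in s.
 * The cast keeps the isspace() argument in the range 0..UCHAR_MAX. */
size_t
skipspace(const char *s)
{
	size_t n = 0;

	while (isspace((unsigned char)s[n]))
		n++;
	return n;
}
```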
2018-06-24	util: printutf8pad: proper counting of multiwidth characters	Hiltjo Posthuma
	for example the string "\xef\xbc\xb5".
2018-02-18	util: improve a cast	Hiltjo Posthuma

2018-02-16	util.c: parseuri(): fix incorrect NUL termination for IPv6 addresses	Hiltjo Posthuma

2017-12-09	sfeed_mbox: move murmur to this file, cleanup	Hiltjo Posthuma

2017-06-29	improve printutf8pad for sfeed_plain	Hiltjo Posthuma
	- use a UTF-8 ellipsis (1 column width) for "...".
	- do proper truncation at the specified length.
2017-04-27	compatibility with browsers: use numeric entity for apos	Hiltjo Posthuma
	The apos entity is XHTML; it is not supported by some (older) browsers.
2016-08-06	add USE_PLEDGE, remove pledge dummy function	Hiltjo Posthuma

2016-04-10	absuri, encodeuri: make encodeuri static, change argument order	Hiltjo Posthuma

2016-04-10	util: standard pattern to check for valid number strtoul	Hiltjo Posthuma

2016-04-10	remove basename, just use last part of the path...	Hiltjo Posthuma
	... as a bonus it also saves an allocation.
2016-04-10	add comment for strtotime	Hiltjo Posthuma

2016-04-10	strtotime: improve	Hiltjo Posthuma
	Use long long: at least 64-bit, so a (real) 32-bit or 64-bit time_t is now supported. long long is C99 though, but that is fine. Check errno, it can be ERANGE.
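The parsing pattern described here (and the strict input checking from the 2015 strtotime commit) can be sketched as below. The helper name parse_timestamp() is hypothetical and this is not the strtotime() code from util.c. Conversion to time_t is left to the caller, because whether the value fits depends on the platform's time_t width.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal UNIX timestamp.
 * Returns 0 and stores the value in *out on success, -1 on error. */
int
parse_timestamp(const char *s, long long *out)
{
	char *end;
	long long v;

	errno = 0;
	v = strtoll(s, &end, 10);
	/* errno: out of range (ERANGE); end == s: no digits were parsed;
	 * *end != '\0': trailing non-digit characters */
	if (errno || end == s || *end != '\0')
		return -1;
	*out = v;
	return 0;
}
```

Note that strtoll() itself accepts leading whitespace and an optional sign, so a stricter parser would have to reject those separately.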
2016-03-10	remove cast of unused variables	Hiltjo Posthuma

2016-02-28	util: simplify encodehex, use inline	Hiltjo Posthuma

2016-02-27	various improvements	Hiltjo Posthuma
	- pledge tools and add define to enable it on platforms that support it, currently only OpenBSD 5.9+
	- separate getline and parseline functionality.
	- use murmur3 hash instead of jenkins1: faster and less collisions.
	- make some error messages a bit more clear, for example with path truncation.
	- some small cleanups, move printutf8pad to util.
2015-08-22	util: absuri: simplify + fix port in url with prefix "//"	Hiltjo Posthuma
	use the port specified in the link for urls starting with "//" (use protocol).
2015-08-22	util: absuri handle port separately	Hiltjo Posthuma

2015-08-22	util: support ipv6 address, parse port separately	Hiltjo Posthuma

2015-08-16	code-style, wrap some lines, etc	Hiltjo Posthuma

2015-08-10	util: parseuri: nul-terminate, bug introduced by 7f11ef506465896705f15c39bd0416d96ca651a8	Hiltjo Posthuma

2015-08-08	util: just zero strings by null-terminating first byte	Hiltjo Posthuma

2015-08-07	util: strtotime: stricter time parsing	Hiltjo Posthuma
	As input, an empty string or a string containing non-digit characters is now considered an error. The format tools still output the formatted time string for time_t 0 on a parse error.