Diffstat (limited to 'README')
-rw-r--r--  README | 335
1 file changed, 251 insertions(+), 84 deletions(-)
diff --git a/README b/README
index 40037ab..47d4d76 100644
--- a/README
+++ b/README
@@ -38,7 +38,7 @@ Initial setup:
cp sfeedrc.example "$HOME/.sfeed/sfeedrc"
Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
-is included and evaluated as a shellscript for sfeed_update, so it's functions
+is included and evaluated as a shellscript for sfeed_update, so its functions
and behaviour can be overridden:
$EDITOR "$HOME/.sfeed/sfeedrc"
@@ -76,6 +76,7 @@ HTML view (no frames), copy style.css for a default style:
HTML view with the menu as frames, copy style.css for a default style:
mkdir -p "$HOME/.sfeed/frames"
+ cp style.css "$HOME/.sfeed/frames/style.css"
cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*
To automatically update your feeds periodically and format them in a way you
@@ -107,7 +108,8 @@ Optional dependencies
- POSIX sh(1),
used by sfeed_update(1) and sfeed_opml_export(1).
- POSIX utilities such as awk(1) and sort(1),
- used by sfeed_content(1), sfeed_markread(1) and sfeed_update(1).
+ used by sfeed_content(1), sfeed_markread(1), sfeed_opml_export(1) and
+ sfeed_update(1).
- curl(1) binary: https://curl.haxx.se/ ,
used by sfeed_update(1), but can be replaced with any tool like wget(1),
OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
@@ -115,6 +117,8 @@ Optional dependencies
used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
encoded then you don't need this. For a minimal iconv implementation:
https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
+- xargs with support for the -P and -0 options,
+ used by sfeed_update(1).
- mandoc for documentation: https://mdocml.bsd.lv/
- curses (typically ncurses), otherwise see minicurses.h,
used by sfeed_curses(1).
@@ -140,12 +144,12 @@ sfeed supports a subset of XML 1.0 and a subset of:
- Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
- Atom 0.3 (draft, historic).
-- RSS 0.91+.
+- RSS 0.90+.
- RDF (when used with RSS).
- MediaRSS extensions (media:).
- Dublin Core extensions (dc:).
-Other formats like JSONfeed, twtxt or certain RSS/Atom extensions can be
+Other formats like JSON Feed, twtxt or certain RSS/Atom extensions are
supported by converting them to RSS/Atom or to the sfeed(5) format directly.
@@ -153,7 +157,7 @@ OS tested
---------
- Linux,
- compilers: clang, gcc, chibicc, cproc, lacc, pcc, tcc,
+ compilers: clang, gcc, chibicc, cproc, lacc, pcc, scc, tcc,
libc: glibc, musl.
- OpenBSD (clang, gcc).
- NetBSD (with NetBSD curses).
@@ -164,7 +168,7 @@ OS tested
- Windows (cygwin gcc + mintty, mingw).
- HaikuOS
- SerenityOS
-- FreeDOS (djgpp).
+- FreeDOS (djgpp, Open Watcom).
- FUZIX (sdcc -mz80, with the sfeed parser program).
@@ -185,6 +189,7 @@ sfeed_curses - Format feed data (TSV) to a curses interface.
sfeed_frames - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gopher - Format feed data (TSV) to Gopher files.
sfeed_html - Format feed data (TSV) to HTML.
+sfeed_json - Format feed data (TSV) to JSON Feed.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_markread - Mark items as read/unread, for use with sfeed_curses.
@@ -245,13 +250,13 @@ Find RSS/Atom feed URLs from a webpage:
output example:
- https://codemadness.org/blog/rss.xml application/rss+xml
- https://codemadness.org/blog/atom.xml application/atom+xml
+ https://codemadness.org/atom.xml application/atom+xml
+ https://codemadness.org/atom_content.xml application/atom+xml
- - -
-Make sure your sfeedrc config file exists, see sfeedrc.example. To update your
-feeds (configfile argument is optional):
+Make sure your sfeedrc config file exists, see the sfeedrc.example file. To
+update your feeds (configfile argument is optional):
sfeed_update "configfile"
@@ -287,10 +292,12 @@ Just like the other format programs included in sfeed you can run it like this:
sfeed_curses < ~/.sfeed/feeds/xkcd
-By default sfeed_curses marks the items of the last day as new/bold. To manage
-read/unread items in a different way a plain-text file with a list of the read
-URLs can be used. To enable this behaviour the path to this file can be
-specified by setting the environment variable $SFEED_URL_FILE to the URL file:
+By default sfeed_curses marks the items of the last day as new/bold. This
+limit can be overridden by setting the environment variable $SFEED_NEW_AGE to
+the desired maximum age in seconds. To manage read/unread items in a different
+way, a plain-text file with a list of the read URLs can be used. To enable
+this behaviour, the path to this file can be specified by setting the
+environment variable $SFEED_URL_FILE to the URL file:
export SFEED_URL_FILE="$HOME/.sfeed/urls"
[ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
@@ -332,7 +339,7 @@ filtering items per feed. It can be used to shorten URLs, filter away
advertisements, strip tracking parameters and more.
# filter fields.
- # filter(name)
+ # filter(name, url)
filter() {
case "$1" in
"tweakers")
@@ -578,7 +585,7 @@ procmail_maildirs.sh file:
mkdir -p "${maildir}/.cache"
if ! test -r "${procmailconfig}"; then
- echo "Procmail configuration file \"${procmailconfig}\" does not exist or is not readable." >&2
+ printf "Procmail configuration file \"%s\" does not exist or is not readable.\n" "${procmailconfig}" >&2
echo "See procmailrc.example for an example." >&2
exit 1
fi
@@ -675,8 +682,8 @@ additional metadata from the previous request.
CDNs blocking requests due to a missing HTTP User-Agent request header
sfeed_update will not send the "User-Agent" header by default for privacy
-reasons. Some CDNs like Cloudflare don't like this and will block such HTTP
-requests.
+reasons. Some CDNs like Cloudflare or websites like Reddit.com don't like this
+and will block such HTTP requests.
A custom User-Agent can be set by using the curl -H option, like so:
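
A sketch of such a fetch() override in the sfeedrc file (the User-Agent string
here is made up, any sensible value works):

	# fetch(name, url, feedfile)
	fetch() {
		curl -L --max-redirs 0 -H "User-Agent: Mozilla/5.0" \
			-f -s -m 15 "$2" 2>/dev/null
	}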
@@ -701,56 +708,6 @@ sfeedrc file and change the curl options "-L --max-redirs 0".
- - -
-Shellscript to update feeds in parallel more efficiently using xargs -P.
-
-It creates a queue of the feeds with its settings, then uses xargs to process
-them in parallel using the common, but non-POSIX -P option. This is more
-efficient than the more portable solution in sfeed_update which can stall a
-batch of $maxjobs in the queue if one item is slow.
-
-sfeed_update_xargs shellscript:
-
- #!/bin/sh
- # update feeds, merge with old feeds using xargs in parallel mode (non-POSIX).
-
- # include script and reuse its functions, but do not start main().
- SFEED_UPDATE_INCLUDE="1" . sfeed_update
- # load config file, sets $config.
- loadconfig "$1"
-
- # process a single feed.
- # args are: config, tmpdir, name, feedurl, basesiteurl, encoding
- if [ "${SFEED_UPDATE_CHILD}" = "1" ]; then
- sfeedtmpdir="$2"
- _feed "$3" "$4" "$5" "$6"
- exit $?
- fi
-
- # ...else parent mode:
-
- # feed(name, feedurl, basesiteurl, encoding)
- feed() {
- # workaround: *BSD xargs doesn't handle empty fields in the middle.
- name="${1:-$$}"
- feedurl="${2:-http://}"
- basesiteurl="${3:-${feedurl}}"
- encoding="$4"
-
- printf '%s\0%s\0%s\0%s\0%s\0%s\0' "${config}" "${sfeedtmpdir}" \
- "${name}" "${feedurl}" "${basesiteurl}" "${encoding}"
- }
-
- # fetch feeds and store in temporary directory.
- sfeedtmpdir="$(mktemp -d '/tmp/sfeed_XXXXXX')"
- # make sure path exists.
- mkdir -p "${sfeedpath}"
- # print feeds for parallel processing with xargs.
- feeds | SFEED_UPDATE_CHILD="1" xargs -r -0 -P "${maxjobs}" -L 6 "$(readlink -f "$0")"
- # cleanup temporary files etc.
- cleanup
-
-- - -
-
Shellscript to handle URLs and enclosures in parallel using xargs -P.
This can be used to download and process URLs for downloading podcasts,
@@ -764,7 +721,7 @@ arguments are specified then the data is read from stdin.
#!/bin/sh
# sfeed_download: downloader for URLs and enclosures in sfeed(5) files.
- # Dependencies: awk, curl, flock, xargs (-P), youtube-dl.
+ # Dependencies: awk, curl, flock, xargs (-P), yt-dlp.
cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}"
jobs="${SFEED_JOBS:-4}"
@@ -777,14 +734,14 @@ arguments are specified then the data is read from stdin.
else
s="$2"
fi
- printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3" >&2
+ printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3"
}
# fetch(url, feedname)
fetch() {
case "$1" in
*youtube.com*)
- youtube-dl "$1";;
+ yt-dlp "$1";;
*.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm)
# allow 2 redirects, hide User-Agent, connect timeout is 15 seconds.
curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";;
@@ -803,14 +760,13 @@ arguments are specified then the data is read from stdin.
if [ "${feedname}" != "-" ]; then
mkdir -p "${feedname}"
if ! cd "${feedname}"; then
- log "${feedname}" "${msg}: ${feedname}" "DIR FAIL"
- exit 1
+ log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" >&2
+ return 1
fi
fi
log "${feedname}" "${msg}" "START"
- fetch "${url}" "${feedname}"
- if [ $? = 0 ]; then
+ if fetch "${url}" "${feedname}"; then
log "${feedname}" "${msg}" "OK"
# append it safely in parallel to the cachefile on a
@@ -819,21 +775,23 @@ arguments are specified then the data is read from stdin.
printf '%s\n' "${url}" >> "${cachefile}"
) 9>"${lockfile}"
else
- log "${feedname}" "${msg}" "FAIL"
+ log "${feedname}" "${msg}" "FAIL" >&2
+ return 1
fi
+ return 0
}
if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then
# Downloader helper for parallel downloading.
# Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-".
- # It should write the URI to the cachefile if it is succesful.
+ # It should write the URI to the cachefile if it is successful.
downloader "$1" "$2" "$3"
exit $?
fi
# ...else parent mode:
- tmp=$(mktemp)
+ tmp="$(mktemp)" || exit 1
trap "rm -f ${tmp}" EXIT
[ -f "${cachefile}" ] || touch "${cachefile}"
@@ -963,8 +921,199 @@ TSV format.
- - -
-Running custom commands inside the program
-------------------------------------------
+Progress indicator
+------------------
+
+The below sfeed_update wrapper script counts the number of feeds in a sfeedrc
+config. It then calls sfeed_update and pipes the output lines to a function
+that counts the current progress. It writes the total progress to stderr.
+Alternative: pv -l -s totallines
+
+ #!/bin/sh
+ # Progress indicator script.
+
+ # Pass lines as input to stdin and write progress status to stderr.
+ # progress(totallines)
+ progress() {
+ total="$(($1 + 0))" # must be a number, no divide by zero.
+ test "${total}" -le 0 -o "$1" != "${total}" && return
+ LC_ALL=C awk -v "total=${total}" '
+ {
+ counter++;
+ percent = (counter * 100) / total;
+ printf("\033[K") > "/dev/stderr"; # clear EOL
+ print $0;
+ printf("[%s/%s] %.0f%%\r", counter, total, percent) > "/dev/stderr";
+ fflush(); # flush all buffers per line.
+ }
+ END {
+ printf("\033[K") > "/dev/stderr";
+ }'
+ }
+
+ # Counts the feeds from the sfeedrc config.
+ countfeeds() {
+ count=0
+ . "$1"
+ feed() {
+ count=$((count + 1))
+ }
+ feeds
+ echo "${count}"
+ }
+
+ config="${1:-$HOME/.sfeed/sfeedrc}"
+ total=$(countfeeds "${config}")
+ sfeed_update "${config}" 2>&1 | progress "${total}"
+
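+With pv(1) installed, the alternative mentioned above can replace the last
+line, reusing the countfeeds() helper (a sketch):
+
+	sfeed_update "${config}" 2>&1 | pv -l -s "${total}"
+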
+- - -
+
+Counting unread and total items
+-------------------------------
+
+It can be useful to show the count of unread items, for example in a
+window manager or status bar.
+
+The below example script counts the items of the last day in the same way the
+formatting tools do:
+
+ #!/bin/sh
+ # Count the new items of the last day.
+ LC_ALL=C awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
+ {
+ total++;
+ }
+ int($1) >= old {
+ totalnew++;
+ }
+ END {
+ print "New: " totalnew;
+ print "Total: " total;
+ }' ~/.sfeed/feeds/*
+
+The below example script counts the unread items using the sfeed_curses URL
+file:
+
+ #!/bin/sh
+ # Count the unread and total items from feeds using the URL file.
+ LC_ALL=C awk -F '\t' '
+  # URL file: number of fields is 1.
+ NF == 1 {
+ u[$0] = 1; # lookup table of URLs.
+ next;
+ }
+ # feed file: check by URL or id.
+ {
+ total++;
+ if (length($3)) {
+ if (u[$3])
+ read++;
+ } else if (length($6)) {
+ if (u[$6])
+ read++;
+ }
+ }
+ END {
+ print "Unread: " (total - read);
+ print "Total: " total;
+ }' ~/.sfeed/urls ~/.sfeed/feeds/*
+
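+Such a count can be fed to a status bar. A sketch for dwm, assuming the script
+above was saved as ~/bin/sfeed_unread (a hypothetical path):
+
+	#!/bin/sh
+	# set the dwm status text to the unread counts, refresh every minute.
+	while :; do
+		xsetroot -name "$(~/bin/sfeed_unread | tr '\n' ' ')"
+		sleep 60
+	done
+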
+- - -
+
+sfeed.c: adding new XML tags or sfeed(5) fields to the parser
+-------------------------------------------------------------
+
+sfeed.c contains definitions to parse XML tags and map them to sfeed(5) TSV
+fields. Parsed RSS and Atom tag names are first stored as a TagId, which is a
+number. This TagId is then mapped to the output field index.
+
+Steps to modify the code:
+
+* Add a new TagId enum for the tag.
+
+* (optional) Add a new FeedField* enum for the new output field, or map it to
+  an existing field.
+
+* Add the new XML tag name to the array variable of parsed RSS or Atom
+ tags: rsstags[] or atomtags[].
+
+  These must be defined in alphabetical order, because they are looked up with
+  a binary search which compares names using the strcasecmp() function.
+
+* Add the parsed TagId to the output field in the array variable fieldmap[].
+
+  When another tag is also mapped to the same output field then the tag with
+  the highest TagId number value overrides the mapped field: the order is from
+  least to most important.
+
+* If this defined tag only uses the inner data of the XML tag, then this
+  definition is enough. If it, for example, has to parse a certain attribute,
+  you have to add a check for the TagId to the xmlattr() callback function.
+
+* (optional) Print the new field in the printfields() function.
+
+Below is a patch example to add the MRSS "media:content" tag as a new field:
+
+diff --git a/sfeed.c b/sfeed.c
+--- a/sfeed.c
++++ b/sfeed.c
+@@ -50,7 +50,7 @@ enum TagId {
+ RSSTagGuidPermalinkTrue,
+ /* must be defined after GUID, because it can be a link (isPermaLink) */
+ RSSTagLink,
+- RSSTagEnclosure,
++ RSSTagMediaContent, RSSTagEnclosure,
+ RSSTagAuthor, RSSTagDccreator,
+ RSSTagCategory,
+ /* Atom */
+@@ -81,7 +81,7 @@ typedef struct field {
+ enum {
+ FeedFieldTime = 0, FeedFieldTitle, FeedFieldLink, FeedFieldContent,
+ FeedFieldId, FeedFieldAuthor, FeedFieldEnclosure, FeedFieldCategory,
+- FeedFieldLast
++ FeedFieldMediaContent, FeedFieldLast
+ };
+
+ typedef struct feedcontext {
+@@ -137,6 +137,7 @@ static const FeedTag rsstags[] = {
+ { STRP("enclosure"), RSSTagEnclosure },
+ { STRP("guid"), RSSTagGuid },
+ { STRP("link"), RSSTagLink },
++ { STRP("media:content"), RSSTagMediaContent },
+ { STRP("media:description"), RSSTagMediaDescription },
+ { STRP("pubdate"), RSSTagPubdate },
+ { STRP("title"), RSSTagTitle }
+@@ -180,6 +181,7 @@ static const int fieldmap[TagLast] = {
+ [RSSTagGuidPermalinkFalse] = FeedFieldId,
+ [RSSTagGuidPermalinkTrue] = FeedFieldId, /* special-case: both a link and an id */
+ [RSSTagLink] = FeedFieldLink,
++ [RSSTagMediaContent] = FeedFieldMediaContent,
+ [RSSTagEnclosure] = FeedFieldEnclosure,
+ [RSSTagAuthor] = FeedFieldAuthor,
+ [RSSTagDccreator] = FeedFieldAuthor,
+@@ -677,6 +679,8 @@ printfields(void)
+ string_print_uri(&ctx.fields[FeedFieldEnclosure].str);
+ putchar(FieldSeparator);
+ string_print_trimmed_multi(&ctx.fields[FeedFieldCategory].str);
++ putchar(FieldSeparator);
++ string_print_trimmed(&ctx.fields[FeedFieldMediaContent].str);
+ putchar('\n');
+
+ if (ferror(stdout)) /* check for errors but do not flush */
+@@ -718,7 +722,7 @@ xmlattr(XMLParser *p, const char *t, size_t tl, const char *n, size_t nl,
+ }
+
+ if (ctx.feedtype == FeedTypeRSS) {
+- if (ctx.tag.id == RSSTagEnclosure &&
++ if ((ctx.tag.id == RSSTagEnclosure || ctx.tag.id == RSSTagMediaContent) &&
+ isattr(n, nl, STRP("url"))) {
+ string_append(&tmpstr, v, vl);
+ } else if (ctx.tag.id == RSSTagGuid &&
+
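+After this change the new field is appended as the last TSV column, which can
+be inspected quickly (a sketch; file.xml is any feed using media:content):
+
+	sfeed < file.xml | awk -F '\t' '{ print $NF }'
+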
+- - -
+
+Running custom commands inside the sfeed_curses program
+-------------------------------------------------------
Running commands inside the sfeed_curses program can be useful for example to
sync items or mark all items across all feeds as read. It can be comfortable to
@@ -983,14 +1132,13 @@ or
forkexec((char *[]) { "syncnews.sh", NULL }, 1);
break;
-The specified script should be in $PATH or an absolute path.
+The specified script should be in $PATH or be an absolute path.
Example of a `markallread.sh` shellscript to mark all URLs as read:
#!/bin/sh
# mark all items/URLs as read.
-
- tmp=$(mktemp)
+ tmp="$(mktemp)" || exit 1
(cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
awk '!x[$0]++' > "$tmp" &&
mv "$tmp" ~/.sfeed/urls &&
@@ -999,7 +1147,23 @@ Example of a `markallread.sh` shellscript to mark all URLs as read:
Example of a `syncnews.sh` shellscript to update the feeds and reload them:
#!/bin/sh
- sfeed_update && pkill -SIGHUP sfeed_curses
+ sfeed_update
+ pkill -SIGHUP sfeed_curses
+
+
+Running programs in a new session
+---------------------------------
+
+By default processes are spawned in the same session and process group as
+sfeed_curses. When sfeed_curses is closed, this can also close the spawned
+process in some cases.
+
+When the setsid command-line program is available, the following wrapper
+command can be used to run a program in a new session, for example as the
+plumb program:
+
+ setsid -f xdg-open "$@"
+
+Alternatively the code can be changed to call setsid() before execvp().
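+
+To use the wrapper as the plumb program it can be saved as a script, for
+example ~/bin/plumb.sh (a hypothetical path):
+
+	#!/bin/sh
+	# plumb.sh: open the URL in a new session so it outlives sfeed_curses.
+	setsid -f xdg-open "$@"
+
+and sfeed_curses can be pointed at it via the $SFEED_PLUMBER environment
+variable (see sfeed_curses(1)):
+
+	export SFEED_PLUMBER="$HOME/bin/plumb.sh"
+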
Open an URL directly in the same terminal
@@ -1030,6 +1194,9 @@ testing sfeed_curses. Some of them might be fixed already upstream:
middle-button, right-button is incorrect / reversed.
- putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
window title.
+- The mouse button encoding for extended buttons (like side-buttons) is
+  unsupported in some terminals or maps to the same button: for example
+  side-buttons 7 and 8 map to the scroll buttons 4 and 5 in urxvt.
License