      1 sfeed
      2 -----
      3 
      4 RSS and Atom parser (and some format programs).
      5 
      6 It converts RSS or Atom feeds from XML to a TAB-separated file. There are
      7 formatting programs included to convert this TAB-separated format to various
      8 other formats. There are also some programs and scripts included to import and
      9 export OPML and to fetch, filter, merge and order feed items.
     10 
     11 
     12 Build and install
     13 -----------------
     14 
     15 $ make
     16 # make install
     17 
     18 
     19 To build sfeed without sfeed_curses set SFEED_CURSES to an empty string:
     20 
     21 $ make SFEED_CURSES=""
     22 # make SFEED_CURSES="" install
     23 
     24 
     25 To change the theme for sfeed_curses you can set SFEED_THEME.  See the themes/
     26 directory for the theme names.
     27 
     28 $ make SFEED_THEME="templeos"
     29 # make SFEED_THEME="templeos" install
     30 
     31 
     32 Usage
     33 -----
     34 
     35 Initial setup:
     36 
     37 	mkdir -p "$HOME/.sfeed/feeds"
     38 	cp sfeedrc.example "$HOME/.sfeed/sfeedrc"
     39 
Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
is included and evaluated as a shellscript for sfeed_update, so its functions
and behaviour can be overridden:
     43 
     44 	$EDITOR "$HOME/.sfeed/sfeedrc"
     45 
     46 or you can import existing OPML subscriptions using sfeed_opml_import(1):
     47 
     48 	sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"
     49 
an example to export from another RSS/Atom reader called newsboat and import it
for sfeed_update:
     52 
     53 	newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"
     54 
an example to export from another RSS/Atom reader called rss2email (3.x+) and
import it for sfeed_update:
     57 
     58 	r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"
     59 
Update feeds. This script merges the new items; see sfeed_update(1) for more
information about what it can do:
     62 
     63 	sfeed_update
     64 
     65 Format feeds:
     66 
     67 Plain-text list:
     68 
     69 	sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"
     70 
     71 HTML view (no frames), copy style.css for a default style:
     72 
     73 	cp style.css "$HOME/.sfeed/style.css"
     74 	sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"
     75 
     76 HTML view with the menu as frames, copy style.css for a default style:
     77 
     78 	mkdir -p "$HOME/.sfeed/frames"
     79 	cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*
     80 
To update your feeds periodically and format them in a way you like, you can
make a wrapper script and add it as a cronjob, for example as sketched below.
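
A minimal sketch of such a wrapper, assuming it is saved as ~/.sfeed/update.sh
(the path and the chosen output formats are just examples):

	#!/bin/sh
	# update feeds and (re)generate the plain-text and HTML views.
	sfeed_update
	sfeed_plain "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.txt"
	sfeed_html "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.html"

A crontab(5) entry could then run it every hour:

	0 * * * * /bin/sh $HOME/.sfeed/update.sh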
     83 
Most protocols are supported because curl(1) is used by default; proxy settings
from the environment (such as the $http_proxy environment variable) are also
respected.
     87 
     88 The sfeed(1) program itself is just a parser that parses XML data from stdin
     89 and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
     90 Gopher, SSH, etc.
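
For example, feed data fetched with any HTTP client can be piped to sfeed(1),
which writes the TAB-separated format to stdout (the URL is just an example):

	curl -s https://codemadness.org/blog/rss.xml | sfeed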
     91 
See the section "Usage and examples" below and the man-pages for more
information on how to use sfeed(1) and the additional tools.
     94 
     95 
     96 Dependencies
     97 ------------
     98 
     99 - C compiler (C99).
    100 - libc (recommended: C99 and POSIX >= 200809).
    101 
    102 
    103 Optional dependencies
    104 ---------------------
    105 
    106 - POSIX make(1) for the Makefile.
    107 - POSIX sh(1),
    108   used by sfeed_update(1) and sfeed_opml_export(1).
    109 - POSIX utilities such as awk(1) and sort(1),
    110   used by sfeed_content(1), sfeed_markread(1) and sfeed_update(1).
    111 - curl(1) binary: https://curl.haxx.se/ ,
    112   used by sfeed_update(1), but can be replaced with any tool like wget(1),
    113   OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
    114 - iconv(1) command-line utilities,
  used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
    116   encoded then you don't need this. For a minimal iconv implementation:
    117   https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
    118 - mandoc for documentation: https://mdocml.bsd.lv/
    119 - curses (typically ncurses), otherwise see minicurses.h,
    120   used by sfeed_curses(1).
    121 - a terminal (emulator) supporting UTF-8 and the used capabilities,
    122   used by sfeed_curses(1).
    123 
    124 
    125 Optional run-time dependencies for sfeed_curses
    126 -----------------------------------------------
    127 
    128 - xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
    129 - xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
    130 - awk, used by the sfeed_content and sfeed_markread script.
    131   See the ENVIRONMENT VARIABLES section in the man page to change it.
    132 - lynx, used by the sfeed_content script to convert HTML content.
    133   See the ENVIRONMENT VARIABLES section in the man page to change it.
    134 
    135 
    136 Formats supported
    137 -----------------
    138 
    139 sfeed supports a subset of XML 1.0 and a subset of:
    140 
    141 - Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
    142 - Atom 0.3 (draft, historic).
    143 - RSS 0.91+.
    144 - RDF (when used with RSS).
    145 - MediaRSS extensions (media:).
    146 - Dublin Core extensions (dc:).
    147 
    148 Other formats like JSONfeed, twtxt or certain RSS/Atom extensions can be
    149 supported by converting them to RSS/Atom or to the sfeed(5) format directly.
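
For example, a rough sketch (assuming GNU date(1) for parsing the RFC 3339
timestamps) that converts a twtxt feed on stdin to the sfeed(5) TSV format,
using the text as the title and leaving the other fields empty:

	#!/bin/sh
	# convert twtxt lines ("timestamp<TAB>text") to the sfeed(5) format.
	tab="$(printf '\t')"
	while IFS="$tab" read -r ts text; do
		printf '%s\t%s\t\t\t\t\t\t\n' "$(date -d "$ts" +'%s')" "$text"
	done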
    150 
    151 
    152 OS tested
    153 ---------
    154 
    155 - Linux,
    156   compilers: clang, gcc, chibicc, cproc, lacc, pcc, tcc,
    157   libc: glibc, musl.
    158 - OpenBSD (clang, gcc).
    159 - NetBSD (with NetBSD curses).
    160 - FreeBSD
    161 - DragonFlyBSD
    162 - GNU/Hurd
    163 - Illumos (OpenIndiana).
    164 - Windows (cygwin gcc + mintty, mingw).
    165 - HaikuOS
    166 - SerenityOS
    167 - FreeDOS (djgpp).
    168 - FUZIX (sdcc -mz80, with the sfeed parser program).
    169 
    170 
    171 Architectures tested
    172 --------------------
    173 
    174 amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.
    175 
    176 
    177 Files
    178 -----
    179 
    180 sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
    181                     in TAB-separated format to stdout.
    182 sfeed_atom        - Format feed data (TSV) to an Atom feed.
    183 sfeed_content     - View item content, for use with sfeed_curses.
    184 sfeed_curses      - Format feed data (TSV) to a curses interface.
    185 sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
    186 sfeed_gopher      - Format feed data (TSV) to Gopher files.
    187 sfeed_html        - Format feed data (TSV) to HTML.
    188 sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
    189 sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
    190 sfeed_markread    - Mark items as read/unread, for use with sfeed_curses.
    191 sfeed_mbox        - Format feed data (TSV) to mbox.
    192 sfeed_plain       - Format feed data (TSV) to a plain-text list.
    193 sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
    194 sfeed_update      - Update feeds and merge items.
sfeed_web         - Find URLs to RSS/Atom feeds in a webpage.
sfeed_xmlenc      - Detect character-set encoding from an XML stream.
    197 sfeedrc.example   - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
    198 style.css         - Example stylesheet to use with sfeed_html(1) and
    199                     sfeed_frames(1).
    200 
    201 
    202 Files read at runtime by sfeed_update(1)
    203 ----------------------------------------
    204 
    205 sfeedrc - Config file. This file is evaluated as a shellscript in
    206           sfeed_update(1).
    207 
    208 At least the following functions can be overridden per feed:
    209 
- fetch: to use wget(1), OpenBSD ftp(1) or another download program.
    211 - filter: to filter on fields.
    212 - merge: to change the merge logic.
    213 - order: to change the sort order.
    214 
    215 See also the sfeedrc(5) man page documentation for more details.
    216 
The feeds() function is called to process the feeds. The default feed()
function, which is called for each feed in your sfeedrc(5) config file, is
executed concurrently as a background job to make updating faster. The variable
maxjobs can be changed to limit or increase the number of concurrent jobs (8 by
default), as shown in the sketch below.
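
A minimal sfeedrc(5) sketch that lowers maxjobs and lists some feeds (the feed
names and URLs here are just examples):

	# limit the maximum amount of concurrent jobs.
	maxjobs=4

	# list of feeds to fetch:
	feeds() {
		# feed <name> <feedurl> [basesiteurl] [encoding]
		feed "codemadness" "https://codemadness.org/blog/atom.xml"
		feed "xkcd" "https://xkcd.com/atom.xml"
	}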
    221 
    222 
    223 Files written at runtime by sfeed_update(1)
    224 -------------------------------------------
    225 
    226 feedname     - TAB-separated format containing all items per feed. The
    227                sfeed_update(1) script merges new items with this file.
    228                The format is documented in sfeed(5).
    229 
    230 
    231 File format
    232 -----------
    233 
    234 man 5 sfeed
    235 man 5 sfeedrc
    236 man 1 sfeed
    237 
    238 
    239 Usage and examples
    240 ------------------
    241 
    242 Find RSS/Atom feed URLs from a webpage:
    243 
    244 	url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"
    245 
    246 output example:
    247 
    248 	https://codemadness.org/blog/rss.xml	application/rss+xml
    249 	https://codemadness.org/blog/atom.xml	application/atom+xml
    250 
    251 - - -
    252 
    253 Make sure your sfeedrc config file exists, see sfeedrc.example. To update your
    254 feeds (configfile argument is optional):
    255 
    256 	sfeed_update "configfile"
    257 
    258 Format the feeds files:
    259 
    260 	# Plain-text list.
    261 	sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
    262 	# HTML view (no frames), copy style.css for a default style.
    263 	sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
    264 	# HTML view with the menu as frames, copy style.css for a default style.
    265 	mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*
    266 
    267 View formatted output in your browser:
    268 
    269 	$BROWSER "$HOME/.sfeed/feeds.html"
    270 
    271 View formatted output in your editor:
    272 
    273 	$EDITOR "$HOME/.sfeed/feeds.txt"
    274 
    275 - - -
    276 
    277 View formatted output in a curses interface.  The interface has a look inspired
    278 by the mutt mail client.  It has a sidebar panel for the feeds, a panel with a
    279 listing of the items and a small statusbar for the selected item/URL. Some
    280 functions like searching and scrolling are integrated in the interface itself.
    281 
    282 Just like the other format programs included in sfeed you can run it like this:
    283 
    284 	sfeed_curses ~/.sfeed/feeds/*
    285 
    286 ... or by reading from stdin:
    287 
    288 	sfeed_curses < ~/.sfeed/feeds/xkcd
    289 
By default sfeed_curses marks the items of the last day as new/bold. To manage
read/unread items in a different way, a plain-text file with a list of the read
URLs can be used. To enable this behaviour, set the environment variable
$SFEED_URL_FILE to the path of this URL file:
    294 
    295 	export SFEED_URL_FILE="$HOME/.sfeed/urls"
    296 	[ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
    297 	sfeed_curses ~/.sfeed/feeds/*
    298 
    299 It then uses the shellscript "sfeed_markread" to process the read and unread
    300 items.
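
For example, to mark a single URL as read from a script (this assumes
sfeed_markread(1) reads URLs from stdin and takes the mode and the URL file as
arguments; check its man page for the exact usage):

	echo "https://codemadness.org/some-article.html" | \
		sfeed_markread read "$HOME/.sfeed/urls"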
    301 
    302 - - -
    303 
    304 Example script to view feed items in a vertical list/menu in dmenu(1). It opens
    305 the selected URL in the browser set in $BROWSER:
    306 
    307 	#!/bin/sh
    308 	url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
    309 		sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
    310 	test -n "${url}" && $BROWSER "${url}"
    311 
    312 dmenu can be found at: https://git.suckless.org/dmenu/
    313 
    314 - - -
    315 
    316 Generate a sfeedrc config file from your exported list of feeds in OPML
    317 format:
    318 
    319 	sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc
    320 
    321 - - -
    322 
    323 Export an OPML file of your feeds from a sfeedrc config file (configfile
    324 argument is optional):
    325 
    326 	sfeed_opml_export configfile > myfeeds.opml
    327 
    328 - - -
    329 
    330 The filter function can be overridden in your sfeedrc file. This allows
    331 filtering items per feed. It can be used to shorten URLs, filter away
    332 advertisements, strip tracking parameters and more.
    333 
    334 	# filter fields.
    335 	# filter(name)
    336 	filter() {
    337 		case "$1" in
    338 		"tweakers")
    339 			awk -F '\t' 'BEGIN { OFS = "\t"; }
    340 			# skip ads.
    341 			$2 ~ /^ADV:/ {
    342 				next;
    343 			}
    344 			# shorten link.
    345 			{
    346 				if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
    347 					$3 = substr($3, RSTART, RLENGTH);
    348 				}
    349 				print $0;
    350 			}';;
    351 		"yt BSDNow")
    352 			# filter only BSD Now from channel.
    353 			awk -F '\t' '$2 ~ / \| BSD Now/';;
    354 		*)
    355 			cat;;
    356 		esac | \
    357 			# replace youtube links with embed links.
    358 			sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \
    359 
    360 			awk -F '\t' 'BEGIN { OFS = "\t"; }
    361 			function filterlink(s) {
    362 				# protocol must start with http, https or gopher.
    363 				if (match(s, /^(http|https|gopher):\/\//) == 0) {
    364 					return "";
    365 				}
    366 
    367 				# shorten feedburner links.
    368 				if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
    369 					s = substr($3, RSTART, RLENGTH);
    370 				}
    371 
    372 				# strip tracking parameters
    373 				# urchin, facebook, piwik, webtrekk and generic.
    374 				gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
    375 				gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);
    376 
    377 				gsub(/\?&/, "?", s);
    378 				gsub(/[\?&]+$/, "", s);
    379 
    380 				return s
    381 			}
    382 			{
    383 				$3 = filterlink($3); # link
    384 				$8 = filterlink($8); # enclosure
    385 
    386 				# try to remove tracking pixels: <img/> tags with 1px width or height.
    387 				gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"'"'"' ]?1[\"'"'"' ]?[^0-9>]+[^>]*>", "", $4);
    388 
    389 				print $0;
    390 			}'
    391 	}
    392 
    393 - - -
    394 
Aggregate feeds. This filters new entries (maximum one day old), sorts them by
newest first and prefixes the feed name to the title. Convert the TSV output
data to an Atom XML feed (again):
    398 
    399 	#!/bin/sh
    400 	cd ~/.sfeed/feeds/ || exit 1
    401 
    402 	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
    403 	BEGIN {	OFS = "\t"; }
    404 	int($1) >= old {
    405 		$2 = "[" FILENAME "] " $2;
    406 		print $0;
    407 	}' * | \
    408 	sort -k1,1rn | \
    409 	sfeed_atom
    410 
    411 - - -
    412 
To have a "tail(1) -f"-like FIFO stream that filters new unique feed items and
shows them as a plain-text line per item, similar to sfeed_plain(1):
    415 
    416 Create a FIFO:
    417 
    418 	fifo="/tmp/sfeed_fifo"
    419 	mkfifo "$fifo"
    420 
    421 On the reading side:
    422 
    423 	# This keeps track of unique lines so might consume much memory.
    424 	# It tries to reopen the $fifo after 1 second if it fails.
    425 	while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'
    426 
    427 On the writing side:
    428 
    429 	feedsdir="$HOME/.sfeed/feeds/"
    430 	cd "$feedsdir" || exit 1
    431 	test -p "$fifo" || exit 1
    432 
    433 	# 1 day is old news, don't write older items.
    434 	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
    435 	BEGIN { OFS = "\t"; }
    436 	int($1) >= old {
    437 		$2 = "[" FILENAME "] " $2;
    438 		print $0;
    439 	}' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"
    440 
    441 cut -b is used to trim the "N " prefix of sfeed_plain(1).
    442 
    443 - - -
    444 
For a podcast feed the following code can be used to filter the latest
enclosure URL (probably some audio file):
    447 
    448 	awk -F '\t' 'BEGIN { latest = 0; }
    449 	length($8) {
    450 		ts = int($1);
    451 		if (ts > latest) {
    452 			url = $8;
    453 			latest = ts;
    454 		}
    455 	}
    456 	END { if (length(url)) { print url; } }'
    457 
    458 ... or on a file already sorted from newest to oldest:
    459 
    460 	awk -F '\t' '$8 { print $8; exit }'
    461 
    462 - - -
    463 
Over time your feeds file might become quite big. You can keep only the items
of (roughly) the last week and archive the previous version of a feed, for
example:
    466 
    467 	awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
    468 	mv feed feed.bak
    469 	mv feed.new feed
    470 
    471 This could also be run weekly in a crontab to archive the feeds. Like throwing
    472 away old newspapers. It keeps the feeds list tidy and the formatted output
    473 small.
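
A minimal sketch of such a weekly job, assuming it is saved as
~/.sfeed/archive.sh (the paths and the backup directory are just examples):

	#!/bin/sh
	# keep only the items of (roughly) the last week per feed and move the
	# previous version of each feed file to a backup directory.
	cd "$HOME/.sfeed/feeds" || exit 1
	mkdir -p "$HOME/.sfeed/feeds.bak"
	for feed in *; do
		[ -f "$feed" ] || continue
		awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' \
			< "$feed" > "$feed.new"
		mv "$feed" "$HOME/.sfeed/feeds.bak/$feed"
		mv "$feed.new" "$feed"
	done

A crontab(5) entry could then run it once a week:

	0 3 * * 0 /bin/sh $HOME/.sfeed/archive.sh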
    474 
    475 - - -
    476 
    477 Convert mbox to separate maildirs per feed and filter duplicate messages using the
    478 fdm program.
    479 fdm is available at: https://github.com/nicm/fdm
    480 
    481 fdm config file (~/.sfeed/fdm.conf):
    482 
    483 	set unmatched-mail keep
    484 
    485 	account "sfeed" mbox "%[home]/.sfeed/mbox"
    486 		$cachepath = "%[home]/.sfeed/fdm.cache"
    487 		cache "${cachepath}"
    488 		$maildir = "%[home]/feeds/"
    489 
    490 		# Check if message is in the cache by Message-ID.
    491 		match case "^Message-ID: (.*)" in headers
    492 			action {
    493 				tag "msgid" value "%1"
    494 			}
    495 			continue
    496 
    497 		# If it is in the cache, stop.
    498 		match matched and in-cache "${cachepath}" key "%[msgid]"
    499 			action {
    500 				keep
    501 			}
    502 
    503 		# Not in the cache, process it and add to cache.
    504 		match case "^X-Feedname: (.*)" in headers
    505 			action {
    506 				# Store to local maildir.
    507 				maildir "${maildir}%1"
    508 
    509 				add-to-cache "${cachepath}" key "%[msgid]"
    510 				keep
    511 			}
    512 
    513 Now run:
    514 
    515 	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
    516 	$ fdm -f ~/.sfeed/fdm.conf fetch
    517 
    518 Now you can view feeds in mutt(1) for example.
    519 
    520 - - -
    521 
Read from mbox, filter duplicate messages using the fdm program and deliver
them to an SMTP server. This works similarly to the rss2email program.
    524 fdm is available at: https://github.com/nicm/fdm
    525 
    526 fdm config file (~/.sfeed/fdm.conf):
    527 
    528 	set unmatched-mail keep
    529 
    530 	account "sfeed" mbox "%[home]/.sfeed/mbox"
    531 		$cachepath = "%[home]/.sfeed/fdm.cache"
    532 		cache "${cachepath}"
    533 
    534 		# Check if message is in the cache by Message-ID.
    535 		match case "^Message-ID: (.*)" in headers
    536 			action {
    537 				tag "msgid" value "%1"
    538 			}
    539 			continue
    540 
    541 		# If it is in the cache, stop.
    542 		match matched and in-cache "${cachepath}" key "%[msgid]"
    543 			action {
    544 				keep
    545 			}
    546 
    547 		# Not in the cache, process it and add to cache.
    548 		match case "^X-Feedname: (.*)" in headers
    549 			action {
    550 				# Connect to a SMTP server and attempt to deliver the
    551 				# mail to it.
    552 				# Of course change the server and e-mail below.
    553 				smtp server "codemadness.org" to "hiltjo@codemadness.org"
    554 
    555 				add-to-cache "${cachepath}" key "%[msgid]"
    556 				keep
    557 			}
    558 
    559 Now run:
    560 
    561 	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
    562 	$ fdm -f ~/.sfeed/fdm.conf fetch
    563 
    564 Now you can view feeds in mutt(1) for example.
    565 
    566 - - -
    567 
    568 Convert mbox to separate maildirs per feed and filter duplicate messages using
    569 procmail(1).
    570 
    571 procmail_maildirs.sh file:
    572 
    573 	maildir="$HOME/feeds"
    574 	feedsdir="$HOME/.sfeed/feeds"
    575 	procmailconfig="$HOME/.sfeed/procmailrc"
    576 
    577 	# message-id cache to prevent duplicates.
    578 	mkdir -p "${maildir}/.cache"
    579 
    580 	if ! test -r "${procmailconfig}"; then
    581 		echo "Procmail configuration file \"${procmailconfig}\" does not exist or is not readable." >&2
    582 		echo "See procmailrc.example for an example." >&2
    583 		exit 1
    584 	fi
    585 
    586 	find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
    587 		name=$(basename "${d}")
    588 		mkdir -p "${maildir}/${name}/cur"
    589 		mkdir -p "${maildir}/${name}/new"
    590 		mkdir -p "${maildir}/${name}/tmp"
    591 		printf 'Mailbox %s\n' "${name}"
    592 		sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
    593 	done
    594 
    595 Procmailrc(5) file:
    596 
    597 	# Example for use with sfeed_mbox(1).
    598 	# The header X-Feedname is used to split into separate maildirs. It is
    599 	# assumed this name is sane.
    600 
    601 	MAILDIR="$HOME/feeds/"
    602 
    603 	:0
    604 	* ^X-Feedname: \/.*
    605 	{
    606 		FEED="$MATCH"
    607 
    608 		:0 Wh: "msgid_$FEED.lock"
    609 		| formail -D 1024000 ".cache/msgid_$FEED.cache"
    610 
    611 		:0
    612 		"$FEED"/
    613 	}
    614 
    615 Now run:
    616 
    617 	$ procmail_maildirs.sh
    618 
    619 Now you can view feeds in mutt(1) for example.
    620 
    621 - - -
    622 
The fetch function can be overridden in your sfeedrc file. This allows
replacing the default curl(1) client for sfeed_update with any other client to
fetch the RSS/Atom data, or changing the default curl options:
    626 
    627 	# fetch a feed via HTTP/HTTPS etc.
    628 	# fetch(name, url, feedfile)
    629 	fetch() {
    630 		hurl -m 1048576 -t 15 "$2" 2>/dev/null
    631 	}
    632 
    633 - - -
    634 
    635 Caching, incremental data updates and bandwidth-saving
    636 
For servers that support it, incremental updates and bandwidth-saving can be
done by using the "ETag" HTTP header.
    639 
    640 Create a directory for storing the ETags per feed:
    641 
    642 	mkdir -p ~/.sfeed/etags/
    643 
    644 The curl ETag options (--etag-save and --etag-compare) can be used to store and
    645 send the previous ETag header value. curl version 7.73+ is recommended for it
    646 to work properly.
    647 
The curl -z option can be used to send the modification date of a local file as
an HTTP "If-Modified-Since" request header. The server can then respond whether
the data is modified or respond with only the incremental data.
    651 
    652 The curl --compressed option can be used to indicate the client supports
    653 decompression. Because RSS/Atom feeds are textual XML content this generally
    654 compresses very well.
    655 
    656 These options can be set by overriding the fetch() function in the sfeedrc
    657 file:
    658 
    659 	# fetch(name, url, feedfile)
    660 	fetch() {
    661 		etag="$HOME/.sfeed/etags/$(basename "$3")"
    662 		curl \
    663 			-L --max-redirs 0 -H "User-Agent:" -f -s -m 15 \
    664 			--compressed \
    665 			--etag-save "${etag}" --etag-compare "${etag}" \
    666 			-z "${etag}" \
    667 			"$2" 2>/dev/null
    668 	}
    669 
These options can come at the cost of some privacy, because they expose
additional metadata from the previous request.
    672 
    673 - - -
    674 
    675 CDNs blocking requests due to a missing HTTP User-Agent request header
    676 
    677 sfeed_update will not send the "User-Agent" header by default for privacy
    678 reasons.  Some CDNs like Cloudflare don't like this and will block such HTTP
    679 requests.
    680 
    681 A custom User-Agent can be set by using the curl -H option, like so:
    682 
    683 	curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'
    684 
    685 The above example string pretends to be a Windows 10 (x86-64) machine running
    686 Firefox 78.
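
For sfeed_update this can be done by overriding the fetch() function in the
sfeedrc file. A sketch, reusing the curl options shown in the caching example
above:

	# fetch(name, url, feedfile)
	fetch() {
		curl -L --max-redirs 0 -f -s -m 15 \
			-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0' \
			"$2" 2>/dev/null
	}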
    687 
    688 - - -
    689 
    690 Page redirects
    691 
For security and efficiency reasons redirects are not allowed by default and
are treated as an error.

For example, this prevents hijacking an unencrypted http:// to https://
redirect and avoids the latency of an unnecessary page redirect on each
request.  It is encouraged to use the final redirected URL in the sfeedrc
config file.

If you want to ignore this advice you can override the fetch() function in the
sfeedrc file and change the curl options "-L --max-redirs 0".
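
A sketch of such an override, allowing a few redirects on top of the otherwise
default-like curl options:

	# fetch(name, url, feedfile)
	fetch() {
		# allow up to 2 redirects instead of treating them as an error.
		curl -L --max-redirs 2 -H "User-Agent:" -f -s -m 15 \
			"$2" 2>/dev/null
	}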
    701 
    702 - - -
    703 
    704 Shellscript to update feeds in parallel more efficiently using xargs -P.
    705 
It creates a queue of the feeds with their settings, then uses xargs to process
them in parallel using the common, but non-POSIX, -P option. This is more
efficient than the more portable solution in sfeed_update, which can stall a
batch of $maxjobs in the queue if one item is slow.
    710 
    711 sfeed_update_xargs shellscript:
    712 
    713 	#!/bin/sh
    714 	# update feeds, merge with old feeds using xargs in parallel mode (non-POSIX).
    715 	
    716 	# include script and reuse its functions, but do not start main().
    717 	SFEED_UPDATE_INCLUDE="1" . sfeed_update
    718 	# load config file, sets $config.
    719 	loadconfig "$1"
    720 	
    721 	# process a single feed.
    722 	# args are: config, tmpdir, name, feedurl, basesiteurl, encoding
    723 	if [ "${SFEED_UPDATE_CHILD}" = "1" ]; then
    724 		sfeedtmpdir="$2"
    725 		_feed "$3" "$4" "$5" "$6"
    726 		exit $?
    727 	fi
    728 	
    729 	# ...else parent mode:
    730 	
    731 	# feed(name, feedurl, basesiteurl, encoding)
    732 	feed() {
    733 		# workaround: *BSD xargs doesn't handle empty fields in the middle.
    734 		name="${1:-$$}"
    735 		feedurl="${2:-http://}"
    736 		basesiteurl="${3:-${feedurl}}"
    737 		encoding="$4"
    738 	
    739 		printf '%s\0%s\0%s\0%s\0%s\0%s\0' "${config}" "${sfeedtmpdir}" \
    740 			"${name}" "${feedurl}" "${basesiteurl}" "${encoding}"
    741 	}
    742 	
    743 	# fetch feeds and store in temporary directory.
    744 	sfeedtmpdir="$(mktemp -d '/tmp/sfeed_XXXXXX')"
    745 	# make sure path exists.
    746 	mkdir -p "${sfeedpath}"
    747 	# print feeds for parallel processing with xargs.
    748 	feeds | SFEED_UPDATE_CHILD="1" xargs -r -0 -P "${maxjobs}" -L 6 "$(readlink -f "$0")"
    749 	# cleanup temporary files etc.
    750 	cleanup
    751 
    752 - - -
    753 
    754 Shellscript to handle URLs and enclosures in parallel using xargs -P.
    755 
This can be used to download and process URLs: for example to download podcasts
or webcomics, download and convert webpages, mirror videos, etc. It uses a
plain-text cache file to remember processed URLs. The match patterns are
defined in the shellscript fetch() function and in the awk script, and can be
modified to handle items differently depending on their context.
    761 
    762 The arguments for the script are files in the sfeed(5) format. If no file
    763 arguments are specified then the data is read from stdin.
    764 
    765 	#!/bin/sh
    766 	# sfeed_download: downloader for URLs and enclosures in sfeed(5) files.
    767 	# Dependencies: awk, curl, flock, xargs (-P), youtube-dl.
    768 	
    769 	cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}"
    770 	jobs="${SFEED_JOBS:-4}"
    771 	lockfile="${HOME}/.sfeed/sfeed_download.lock"
    772 	
    773 	# log(feedname, s, status)
    774 	log() {
    775 		if [ "$1" != "-" ]; then
    776 			s="[$1] $2"
    777 		else
    778 			s="$2"
    779 		fi
    780 		printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3" >&2
    781 	}
    782 	
    783 	# fetch(url, feedname)
    784 	fetch() {
    785 		case "$1" in
    786 		*youtube.com*)
    787 			youtube-dl "$1";;
    788 		*.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm)
    789 			# allow 2 redirects, hide User-Agent, connect timeout is 15 seconds.
    790 			curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";;
    791 		esac
    792 	}
    793 	
    794 	# downloader(url, title, feedname)
    795 	downloader() {
    796 		url="$1"
    797 		title="$2"
    798 		feedname="${3##*/}"
    799 	
    800 		msg="${title}: ${url}"
    801 	
    802 		# download directory.
    803 		if [ "${feedname}" != "-" ]; then
    804 			mkdir -p "${feedname}"
    805 			if ! cd "${feedname}"; then
    806 				log "${feedname}" "${msg}: ${feedname}" "DIR FAIL"
    807 				exit 1
    808 			fi
    809 		fi
    810 	
    811 		log "${feedname}" "${msg}" "START"
    812 		fetch "${url}" "${feedname}"
    813 		if [ $? = 0 ]; then
    814 			log "${feedname}" "${msg}" "OK"
    815 	
    816 			# append it safely in parallel to the cachefile on a
    817 			# successful download.
    818 			(flock 9 || exit 1
    819 			printf '%s\n' "${url}" >> "${cachefile}"
    820 			) 9>"${lockfile}"
    821 		else
    822 			log "${feedname}" "${msg}" "FAIL"
    823 		fi
    824 	}
    825 	
    826 	if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then
    827 		# Downloader helper for parallel downloading.
    828 		# Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-".
	# It should write the URI to the cachefile if it is successful.
    830 		downloader "$1" "$2" "$3"
    831 		exit $?
    832 	fi
    833 	
    834 	# ...else parent mode:
    835 	
    836 	tmp=$(mktemp)
    837 	trap "rm -f ${tmp}" EXIT
    838 	
    839 	[ -f "${cachefile}" ] || touch "${cachefile}"
    840 	cat "${cachefile}" > "${tmp}"
    841 	echo >> "${tmp}" # force it to have one line for awk.
    842 	
    843 	LC_ALL=C awk -F '\t' '
    844 	# fast prefilter what to download or not.
    845 	function filter(url, field, feedname) {
    846 		u = tolower(url);
    847 		return (match(u, "youtube\\.com") ||
    848 		        match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$"));
    849 	}
    850 	function download(url, field, title, filename) {
    851 		if (!length(url) || urls[url] || !filter(url, field, filename))
    852 			return;
    853 		# NUL-separated for xargs -0.
    854 		printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0);
    855 		urls[url] = 1; # print once
    856 	}
    857 	{
    858 		FILENR += (FNR == 1);
    859 	}
    860 	# lookup table from cachefile which contains downloaded URLs.
    861 	FILENR == 1 {
    862 		urls[$0] = 1;
    863 	}
    864 	# feed file(s).
    865 	FILENR != 1 {
    866 		download($3, 3, $2, FILENAME); # link
    867 		download($8, 8, $2, FILENAME); # enclosure
    868 	}
    869 	' "${tmp}" "${@:--}" | \
    870 	SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")"
    871 
    872 - - -
    873 
    874 Shellscript to export existing newsboat cached items from sqlite3 to the sfeed
    875 TSV format.
    876 
    877 	#!/bin/sh
    878 	# Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format.
    879 	# The data is split per file per feed with the name of the newsboat title/url.
    880 	# It writes the URLs of the read items line by line to a "urls" file.
    881 	#
    882 	# Dependencies: sqlite3, awk.
    883 	#
    884 	# Usage: create some directory to store the feeds then run this script.
    885 	
    886 	# newsboat cache.db file.
    887 	cachefile="$HOME/.newsboat/cache.db"
    888 	test -n "$1" && cachefile="$1"
    889 	
    890 	# dump data.
    891 	# .mode ascii: Columns/rows delimited by 0x1F and 0x1E
    892 	# get the first fields in the order of the sfeed(5) format.
    893 	sqlite3 "$cachefile" <<!EOF |
    894 	.headers off
    895 	.mode ascii
    896 	.output
    897 	SELECT
    898 		i.pubDate, i.title, i.url, i.content, i.content_mime_type,
    899 		i.guid, i.author, i.enclosure_url,
    900 		f.rssurl AS rssurl, f.title AS feedtitle, i.unread
    901 		-- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted, i.base
    902 	FROM rss_feed f
    903 	INNER JOIN rss_item i ON i.feedurl = f.rssurl
    904 	ORDER BY
    905 		i.feedurl ASC, i.pubDate DESC;
    906 	.quit
    907 	!EOF
    908 	# convert to sfeed(5) TSV format.
    909 	LC_ALL=C awk '
    910 	BEGIN {
    911 		FS = "\x1f";
    912 		RS = "\x1e";
    913 	}
    914 	# normal non-content fields.
    915 	function field(s) {
    916 		gsub("^[[:space:]]*", "", s);
    917 		gsub("[[:space:]]*$", "", s);
    918 		gsub("[[:space:]]", " ", s);
    919 		gsub("[[:cntrl:]]", "", s);
    920 		return s;
    921 	}
    922 	# content field.
    923 	function content(s) {
    924 		gsub("^[[:space:]]*", "", s);
    925 		gsub("[[:space:]]*$", "", s);
    926 		# escape chars in content field.
    927 		gsub("\\\\", "\\\\", s);
    928 		gsub("\n", "\\n", s);
    929 		gsub("\t", "\\t", s);
    930 		return s;
    931 	}
    932 	function feedname(feedurl, feedtitle) {
    933 		if (feedtitle == "") {
    934 			gsub("/", "_", feedurl);
    935 			return feedurl;
    936 		}
    937 		gsub("/", "_", feedtitle);
    938 		return feedtitle;
    939 	}
    940 	{
    941 		fname = feedname($9, $10);
    942 		if (!feed[fname]++) {
    943 			print "Writing file: \"" fname "\" (title: " $10 ", url: " $9 ")" > "/dev/stderr";
    944 		}
    945 	
    946 		contenttype = field($5);
    947 		if (contenttype == "")
    948 			contenttype = "html";
    949 		else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
    950 			contenttype = "html";
    951 		else
    952 			contenttype = "plain";
    953 	
    954 		print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
    955 			contenttype "\t" field($6) "\t" field($7) "\t" field($8) "\t" \
    956 			> fname;
    957 	
    958 		# write URLs of the read items to a file line by line.
    959 		if ($11 == "0") {
    960 			print $3 > "urls";
    961 		}
    962 	}'
    963 
    964 - - -
    965 
    966 Running custom commands inside the program
    967 ------------------------------------------
    968 
Running commands inside the sfeed_curses program can be useful, for example to
sync items or mark all items across all feeds as read. It can be convenient to
have a keybind for this inside the program to perform a scripted action and
then reload the feeds by sending the signal SIGHUP.
    973 
    974 In the input handling code you can then add a case:
    975 
    976 	case 'M':
    977 		forkexec((char *[]) { "markallread.sh", NULL }, 0);
    978 		break;
    979 
    980 or
    981 
    982 	case 'S':
    983 		forkexec((char *[]) { "syncnews.sh", NULL }, 1);
    984 		break;
    985 
The specified script should be in $PATH or be specified as an absolute path.
    987 
    988 Example of a `markallread.sh` shellscript to mark all URLs as read:
    989 
    990 	#!/bin/sh
    991 	# mark all items/URLs as read.
    992 
    993 	tmp=$(mktemp)
    994 	(cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
    995 	awk '!x[$0]++' > "$tmp" &&
    996 	mv "$tmp" ~/.sfeed/urls &&
    997 	pkill -SIGHUP sfeed_curses # reload feeds.
    998 
    999 Example of a `syncnews.sh` shellscript to update the feeds and reload them:
   1000 
   1001 	#!/bin/sh
   1002 	sfeed_update && pkill -SIGHUP sfeed_curses
   1003 
   1004 
   1005 Open an URL directly in the same terminal
   1006 -----------------------------------------
   1007 
   1008 To open an URL directly in the same terminal using the text-mode lynx browser:
   1009 
   1010 	SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*
   1011 
   1012 
   1013 Yank to tmux buffer
   1014 -------------------
   1015 
   1016 This changes the yank command to set the tmux buffer, instead of X11 xclip:
   1017 
   1018 	SFEED_YANKER="tmux set-buffer \`cat\`"
   1019 
   1020 
   1021 Known terminal issues
   1022 ---------------------
   1023 
Below is a list of some bugs or missing features in terminals that were found
while testing sfeed_curses.  Some of them might already be fixed upstream:
   1026 
   1027 - cygwin + mintty: the xterm mouse-encoding of the mouse position is broken for
   1028   scrolling.
- HaikuOS terminal: the xterm mouse-encoding of the button number for the
  middle and right mouse button is incorrect / reversed.
   1031 - putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
   1032   window title.
   1033 
   1034 
   1035 License
   1036 -------
   1037 
   1038 ISC, see LICENSE file.
   1039 
   1040 
   1041 Author
   1042 ------
   1043 
   1044 Hiltjo Posthuma <hiltjo@codemadness.org>