sfeed
-----

RSS and Atom parser (and some format programs).

It converts RSS or Atom feeds from XML to a TAB-separated file. There are
formatting programs included to convert this TAB-separated format to various
other formats. There are also some programs and scripts included to import and
export OPML and to fetch, filter, merge and order feed items.


Build and install
-----------------

$ make
# make install


To build sfeed without sfeed_curses set SFEED_CURSES to an empty string:

$ make SFEED_CURSES=""
# make SFEED_CURSES="" install


To change the theme for sfeed_curses you can set SFEED_THEME.  See the themes/
directory for the theme names.

$ make SFEED_THEME="templeos"
# make SFEED_THEME="templeos" install


Usage
-----

Initial setup:

	mkdir -p "$HOME/.sfeed/feeds"
	cp sfeedrc.example "$HOME/.sfeed/sfeedrc"

Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
is included and evaluated as a shellscript for sfeed_update, so its functions
and behaviour can be overridden:

	$EDITOR "$HOME/.sfeed/sfeedrc"

or you can import existing OPML subscriptions using sfeed_opml_import(1):

	sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

an example to export from another RSS/Atom reader called newsboat and import
for sfeed_update:

	newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

an example to export from another RSS/Atom reader called rss2email (3.x+) and
import for sfeed_update:

	r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

Update feeds. This script merges the new items; see sfeed_update(1) for more
information on what it can do:

	sfeed_update

Format feeds:

Plain-text list:

	sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

	cp style.css "$HOME/.sfeed/style.css"
	sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with the menu as frames, copy style.css for a default style:

	mkdir -p "$HOME/.sfeed/frames"
	cp style.css "$HOME/.sfeed/frames/style.css"
	cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*

To automatically update your feeds periodically and format them in a way you
like you can make a wrapper script and add it as a cronjob.

Most protocols are supported because curl(1) is used by default and also proxy
settings from the environment (such as the $http_proxy environment variable)
are used.

The sfeed(1) program itself is just a parser that parses XML data from stdin
and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
Gopher, SSH, etc.

See the section "Usage and examples" below and the man-pages for more
information on how to use sfeed(1) and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- POSIX make(1) for the Makefile.
- POSIX sh(1),
  used by sfeed_update(1) and sfeed_opml_export(1).
- POSIX utilities such as awk(1) and sort(1),
  used by sfeed_content(1), sfeed_markread(1) and sfeed_update(1).
- curl(1) binary: https://curl.haxx.se/ ,
  used by sfeed_update(1), but can be replaced with any tool like wget(1),
  OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
- iconv(1) command-line utilities,
  used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For a minimal iconv implementation:
  https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- mandoc for documentation: https://mdocml.bsd.lv/
- curses (typically ncurses), otherwise see minicurses.h,
  used by sfeed_curses(1).
- a terminal (emulator) supporting UTF-8 and the used capabilities,
  used by sfeed_curses(1).


Optional run-time dependencies for sfeed_curses
-----------------------------------------------

- xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
- xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
- awk, used by the sfeed_content and sfeed_markread scripts.
  See the ENVIRONMENT VARIABLES section in the man page to change it.
- lynx, used by the sfeed_content script to convert HTML content.
  See the ENVIRONMENT VARIABLES section in the man page to change it.


Formats supported
-----------------

sfeed supports a subset of XML 1.0 and a subset of:

- Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
- Atom 0.3 (draft, historic).
- RSS 0.91+.
- RDF (when used with RSS).
- MediaRSS extensions (media:).
- Dublin Core extensions (dc:).

Other formats like JSON Feed, twtxt or certain RSS/Atom extensions are
supported by converting them to RSS/Atom or to the sfeed(5) format directly.


OS tested
---------

- Linux,
  compilers: clang, gcc, chibicc, cproc, lacc, pcc, scc, tcc,
  libc: glibc, musl.
- OpenBSD (clang, gcc).
- NetBSD (with NetBSD curses).
- FreeBSD
- DragonFlyBSD
- GNU/Hurd
- Illumos (OpenIndiana).
- Windows (cygwin gcc + mintty, mingw).
- HaikuOS
- SerenityOS
- FreeDOS (djgpp, Open Watcom).
- FUZIX (sdcc -mz80, with the sfeed parser program).


Architectures tested
--------------------

amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.


Files
-----

sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
                    in TAB-separated format to stdout.
sfeed_atom        - Format feed data (TSV) to an Atom feed.
sfeed_content     - View item content, for use with sfeed_curses.
sfeed_curses      - Format feed data (TSV) to a curses interface.
sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gopher      - Format feed data (TSV) to Gopher files.
sfeed_html        - Format feed data (TSV) to HTML.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_markread    - Mark items as read/unread, for use with sfeed_curses.
sfeed_mbox        - Format feed data (TSV) to mbox.
sfeed_plain       - Format feed data (TSV) to a plain-text list.
sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
sfeed_update      - Update feeds and merge items.
sfeed_web         - Find URLs to RSS/Atom feed from a webpage.
sfeed_xmlenc      - Detect character-set encoding from an XML stream.
sfeedrc.example   - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
style.css         - Example stylesheet to use with sfeed_html(1) and
                    sfeed_frames(1).


Files read at runtime by sfeed_update(1)
----------------------------------------

sfeedrc - Config file. This file is evaluated as a shellscript in
          sfeed_update(1).

At least the following functions can be overridden per feed:

- fetch: to use wget(1), OpenBSD ftp(1) or another download program.
- filter: to filter on fields.
- merge: to change the merge logic.
- order: to change the sort order.

See also the sfeedrc(5) man page documentation for more details.

The feeds() function is called to process the feeds. The default feed()
function is executed concurrently as a background job in your sfeedrc(5) config
file to make updating faster. The variable maxjobs can be changed to limit or
increase the number of concurrent jobs (8 by default).


Files written at runtime by sfeed_update(1)
-------------------------------------------

feedname     - TAB-separated format containing all items per feed. The
               sfeed_update(1) script merges new items with this file.
               The format is documented in sfeed(5).


File format
-----------

man 5 sfeed
man 5 sfeedrc
man 1 sfeed


Usage and examples
------------------

Find RSS/Atom feed URLs from a webpage:

	url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output example:

	https://codemadness.org/atom.xml	application/atom+xml
	https://codemadness.org/atom_content.xml	application/atom+xml

- - -

Make sure your sfeedrc config file exists, see the sfeedrc.example file. To
update your feeds (configfile argument is optional):

	sfeed_update "configfile"

Format the feeds files:

	# Plain-text list.
	sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
	# HTML view (no frames), copy style.css for a default style.
	sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
	# HTML view with the menu as frames, copy style.css for a default style.
	mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

View formatted output in your browser:

	$BROWSER "$HOME/.sfeed/feeds.html"

View formatted output in your editor:

	$EDITOR "$HOME/.sfeed/feeds.txt"

- - -

View formatted output in a curses interface.  The interface has a look inspired
by the mutt mail client.  It has a sidebar panel for the feeds, a panel with a
listing of the items and a small statusbar for the selected item/URL. Some
functions like searching and scrolling are integrated in the interface itself.

Just like the other format programs included in sfeed you can run it like this:

	sfeed_curses ~/.sfeed/feeds/*

... or by reading from stdin:

	sfeed_curses < ~/.sfeed/feeds/xkcd

By default sfeed_curses marks the items of the last day as new/bold. To manage
read/unread items in a different way a plain-text file with a list of the read
URLs can be used. To enable this behaviour the path to this file can be
specified by setting the environment variable $SFEED_URL_FILE to the URL file:

	export SFEED_URL_FILE="$HOME/.sfeed/urls"
	[ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
	sfeed_curses ~/.sfeed/feeds/*

It then uses the shellscript "sfeed_markread" to process the read and unread
items.

- - -

Example script to view feed items in a vertical list/menu in dmenu(1).
It opens the selected URL in the browser set in $BROWSER:

	#!/bin/sh
	url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
		sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
	test -n "${url}" && $BROWSER "${url}"

dmenu can be found at: https://git.suckless.org/dmenu/

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

	sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc

- - -

Export an OPML file of your feeds from a sfeedrc config file (configfile
argument is optional):

	sfeed_opml_export configfile > myfeeds.opml

- - -

The filter function can be overridden in your sfeedrc file. This allows
filtering items per feed. It can be used to shorten URLs, filter away
advertisements, strip tracking parameters and more.

	# filter fields.
	# filter(name)
	filter() {
		case "$1" in
		"tweakers")
			awk -F '\t' 'BEGIN { OFS = "\t"; }
			# skip ads.
			$2 ~ /^ADV:/ {
				next;
			}
			# shorten link.
			{
				if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
					$3 = substr($3, RSTART, RLENGTH);
				}
				print $0;
			}';;
		"yt BSDNow")
			# filter only BSD Now from channel.
			awk -F '\t' '$2 ~ / \| BSD Now/';;
		*)
			cat;;
		esac | \
			# replace youtube links with embed links.
			sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \

			awk -F '\t' 'BEGIN { OFS = "\t"; }
			function filterlink(s) {
				# protocol must start with http, https or gopher.
				if (match(s, /^(http|https|gopher):\/\//) == 0) {
					return "";
				}

				# shorten feedburner links.
				# use the argument s (not $3) so enclosures are handled too.
				if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
					s = substr(s, RSTART, RLENGTH);
				}

				# strip tracking parameters
				# urchin, facebook, piwik, webtrekk and generic.
				gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
				gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);

				gsub(/\?&/, "?", s);
				gsub(/[\?&]+$/, "", s);

				return s
			}
			{
				$3 = filterlink($3); # link
				$8 = filterlink($8); # enclosure

				# try to remove tracking pixels: <img/> tags with 1px width or height.
				gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"'"'"' ]?1[\"'"'"' ]?[^0-9>]+[^>]*>", "", $4);

				print $0;
			}'
	}

- - -

Aggregate feeds. This filters new entries (maximum one day old) and sorts them
by newest first. Prefix the feed name in the title. Convert the TSV output data
to an Atom XML feed (again):

	#!/bin/sh
	cd ~/.sfeed/feeds/ || exit 1

	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	BEGIN { OFS = "\t"; }
	int($1) >= old {
		$2 = "[" FILENAME "] " $2;
		print $0;
	}' * | \
	sort -k1,1rn | \
	sfeed_atom

- - -

To have a "tail(1) -f"-like FIFO stream filtering for new unique feed items and
showing them as plain-text per line similar to sfeed_plain(1):

Create a FIFO:

	fifo="/tmp/sfeed_fifo"
	mkfifo "$fifo"

On the reading side:

	# This keeps track of unique lines so might consume much memory.
	# It tries to reopen the $fifo after 1 second if it fails.
	while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'

On the writing side:

	feedsdir="$HOME/.sfeed/feeds/"
	cd "$feedsdir" || exit 1
	test -p "$fifo" || exit 1

	# 1 day is old news, don't write older items.
	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	BEGIN { OFS = "\t"; }
	int($1) >= old {
		$2 = "[" FILENAME "] " $2;
		print $0;
	}' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"

cut -b is used to trim the "N " prefix of sfeed_plain(1).

- - -

For some podcast feed the following code can be used to filter the latest
enclosure URL (probably some audio file):

	awk -F '\t' 'BEGIN { latest = 0; }
	length($8) {
		ts = int($1);
		if (ts > latest) {
			url = $8;
			latest = ts;
		}
	}
	END { if (length(url)) { print url; } }'

... or on a file already sorted from newest to oldest:

	awk -F '\t' '$8 { print $8; exit }'

- - -

Over time your feeds file might become quite big. You can archive a feed and
keep only the items from (roughly) the last week by doing for example:

	awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
	mv feed feed.bak
	mv feed.new feed

This could also be run weekly in a crontab to archive the feeds.
Like throwing away old newspapers. It keeps the feeds list tidy and the
formatted output small.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
the fdm program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"
		$cachepath = "%[home]/.sfeed/fdm.cache"
		cache "${cachepath}"
		$maildir = "%[home]/feeds/"

		# Check if message is in the cache by Message-ID.
		match case "^Message-ID: (.*)" in headers
			action {
				tag "msgid" value "%1"
			}
			continue

		# If it is in the cache, stop.
		match matched and in-cache "${cachepath}" key "%[msgid]"
			action {
				keep
			}

		# Not in the cache, process it and add to cache.
		match case "^X-Feedname: (.*)" in headers
			action {
				# Store to local maildir.
				maildir "${maildir}%1"

				add-to-cache "${cachepath}" key "%[msgid]"
				keep
			}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Read from mbox and filter duplicate messages using the fdm program and deliver
it to an SMTP server. This works similarly to the rss2email program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"
		$cachepath = "%[home]/.sfeed/fdm.cache"
		cache "${cachepath}"

		# Check if message is in the cache by Message-ID.
		match case "^Message-ID: (.*)" in headers
			action {
				tag "msgid" value "%1"
			}
			continue

		# If it is in the cache, stop.
		match matched and in-cache "${cachepath}" key "%[msgid]"
			action {
				keep
			}

		# Not in the cache, process it and add to cache.
		match case "^X-Feedname: (.*)" in headers
			action {
				# Connect to an SMTP server and attempt to deliver the
				# mail to it.
				# Of course change the server and e-mail below.
				smtp server "codemadness.org" to "hiltjo@codemadness.org"

				add-to-cache "${cachepath}" key "%[msgid]"
				keep
			}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
procmail(1).

procmail_maildirs.sh file:

	maildir="$HOME/feeds"
	feedsdir="$HOME/.sfeed/feeds"
	procmailconfig="$HOME/.sfeed/procmailrc"

	# message-id cache to prevent duplicates.
	mkdir -p "${maildir}/.cache"

	if ! test -r "${procmailconfig}"; then
		printf "Procmail configuration file \"%s\" does not exist or is not readable.\n" "${procmailconfig}" >&2
		echo "See procmailrc.example for an example." >&2
		exit 1
	fi

	find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
		name=$(basename "${d}")
		mkdir -p "${maildir}/${name}/cur"
		mkdir -p "${maildir}/${name}/new"
		mkdir -p "${maildir}/${name}/tmp"
		printf 'Mailbox %s\n' "${name}"
		sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
	done

Procmailrc(5) file:

	# Example for use with sfeed_mbox(1).
	# The header X-Feedname is used to split into separate maildirs. It is
	# assumed this name is sane.

	MAILDIR="$HOME/feeds/"

	:0
	* ^X-Feedname: \/.*
	{
		FEED="$MATCH"

		:0 Wh: "msgid_$FEED.lock"
		| formail -D 1024000 ".cache/msgid_$FEED.cache"

		:0
		"$FEED"/
	}

Now run:

	$ procmail_maildirs.sh

Now you can view feeds in mutt(1) for example.

- - -

The fetch function can be overridden in your sfeedrc file. This allows
replacing the default curl(1) for sfeed_update with any other client to fetch
the RSS/Atom data, or changing the default curl options:

	# fetch a feed via HTTP/HTTPS etc.
	# fetch(name, url, feedfile)
	fetch() {
		hurl -m 1048576 -t 15 "$2" 2>/dev/null
	}

- - -

Caching, incremental data updates and bandwidth-saving

For servers that support it some incremental updates and bandwidth-saving can
be done by using the "ETag" HTTP header.
Create a directory for storing the ETags per feed:

	mkdir -p ~/.sfeed/etags/

The curl ETag options (--etag-save and --etag-compare) can be used to store and
send the previous ETag header value. curl version 7.73+ is recommended for it
to work properly.

The curl -z option can be used to send the modification date of a local file as
an HTTP "If-Modified-Since" request header. The server can then respond if the
data is modified or not or respond with only the incremental data.

The curl --compressed option can be used to indicate the client supports
decompression. Because RSS/Atom feeds are textual XML content this generally
compresses very well.

These options can be set by overriding the fetch() function in the sfeedrc
file:

	# fetch(name, url, feedfile)
	fetch() {
		etag="$HOME/.sfeed/etags/$(basename "$3")"
		curl \
			-L --max-redirs 0 -H "User-Agent:" -f -s -m 15 \
			--compressed \
			--etag-save "${etag}" --etag-compare "${etag}" \
			-z "${etag}" \
			"$2" 2>/dev/null
	}

These options can come at a cost of some privacy, because they expose
additional metadata from the previous request.

- - -

CDNs blocking requests due to a missing HTTP User-Agent request header

sfeed_update will not send the "User-Agent" header by default for privacy
reasons.  Some CDNs like Cloudflare don't like this and will block such HTTP
requests.

A custom User-Agent can be set by using the curl -H option, like so:

	curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'

The above example string pretends to be a Windows 10 (x86-64) machine running
Firefox 78.

- - -

Page redirects

For security and efficiency reasons by default redirects are not allowed and
are treated as an error.

For example to prevent hijacking an unencrypted http:// to https:// redirect or
to not add time of an unnecessary page redirect each time.  It is encouraged to
use the final redirected URL in the sfeedrc config file.
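
One way to look up the final redirected URL before putting it in the sfeedrc
config file is to let curl follow the redirects once and print where it ended
up. This is only a sketch: the function name finalurl is made up for this
example, and it assumes a curl that supports the -w "%{url_effective}" write-out
variable.

```sh
#!/bin/sh
# finalurl(url): print the URL curl ends up at after following redirects.
# Hypothetical helper for one-off lookups, not part of sfeed.
finalurl() {
	# -L follows redirects, -o /dev/null discards the body,
	# -w prints the last URL that was fetched.
	curl -s -L -o /dev/null -w '%{url_effective}\n' "$1"
}
```

For example: finalurl "http://codemadness.org/atom.xml". If the printed URL
differs from the one in your sfeedrc, use the printed one.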
If you want to ignore this advice you can override the fetch() function in the
sfeedrc file and change the curl options "-L --max-redirs 0".

- - -

Shellscript to update feeds in parallel more efficiently using xargs -P.

It creates a queue of the feeds with their settings, then uses xargs to process
them in parallel using the common, but non-POSIX -P option. This is more
efficient than the more portable solution in sfeed_update which can stall a
batch of $maxjobs in the queue if one item is slow.

sfeed_update_xargs shellscript:

	#!/bin/sh
	# update feeds, merge with old feeds using xargs in parallel mode (non-POSIX).

	# include script and reuse its functions, but do not start main().
	SFEED_UPDATE_INCLUDE="1" . sfeed_update
	# load config file, sets $config.
	loadconfig "$1"

	# process a single feed.
	# args are: config, tmpdir, name, feedurl, basesiteurl, encoding
	if [ "${SFEED_UPDATE_CHILD}" = "1" ]; then
		sfeedtmpdir="$2"
		_feed "$3" "$4" "$5" "$6"
		exit $?
	fi

	# ...else parent mode:

	# feed(name, feedurl, basesiteurl, encoding)
	feed() {
		# workaround: *BSD xargs doesn't handle empty fields in the middle.
		name="${1:-$$}"
		feedurl="${2:-http://}"
		basesiteurl="${3:-${feedurl}}"
		encoding="$4"

		printf '%s\0%s\0%s\0%s\0%s\0%s\0' "${config}" "${sfeedtmpdir}" \
			"${name}" "${feedurl}" "${basesiteurl}" "${encoding}"
	}

	# fetch feeds and store in temporary directory.
	sfeedtmpdir="$(mktemp -d '/tmp/sfeed_XXXXXX')"
	mkdir -p "${sfeedtmpdir}/feeds"
	touch "${sfeedtmpdir}/ok"
	# make sure path exists.
	mkdir -p "${sfeedpath}"
	# print feeds for parallel processing with xargs.
	feeds | SFEED_UPDATE_CHILD="1" xargs -r -0 -P "${maxjobs}" -L 6 "$(readlink -f "$0")"
	status=$?
	# check error exit status indicator for parallel jobs.
	test -f "${sfeedtmpdir}/ok" || status=1
	# cleanup temporary files etc.
	cleanup

	exit ${status}

- - -

Shellscript to handle URLs and enclosures in parallel using xargs -P.
This can be used to download and process URLs for downloading podcasts,
webcomics, download and convert webpages, mirror videos, etc. It uses a
plain-text cache file for remembering processed URLs. The match patterns are
defined in the shellscript fetch() function and in the awk script and can be
modified to handle items differently depending on their context.

The arguments for the script are files in the sfeed(5) format. If no file
arguments are specified then the data is read from stdin.

	#!/bin/sh
	# sfeed_download: downloader for URLs and enclosures in sfeed(5) files.
	# Dependencies: awk, curl, flock, xargs (-P), youtube-dl.

	cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}"
	jobs="${SFEED_JOBS:-4}"
	lockfile="${HOME}/.sfeed/sfeed_download.lock"

	# log(feedname, s, status)
	log() {
		if [ "$1" != "-" ]; then
			s="[$1] $2"
		else
			s="$2"
		fi
		printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3"
	}

	# fetch(url, feedname)
	fetch() {
		case "$1" in
		*youtube.com*)
			youtube-dl "$1";;
		*.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm)
			# allow 2 redirects, hide User-Agent, connect timeout is 15 seconds.
			curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";;
		esac
	}

	# downloader(url, title, feedname)
	downloader() {
		url="$1"
		title="$2"
		feedname="${3##*/}"

		msg="${title}: ${url}"

		# download directory.
		if [ "${feedname}" != "-" ]; then
			mkdir -p "${feedname}"
			if ! cd "${feedname}"; then
				log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" >&2
				return 1
			fi
		fi

		log "${feedname}" "${msg}" "START"
		if fetch "${url}" "${feedname}"; then
			log "${feedname}" "${msg}" "OK"

			# append it safely in parallel to the cachefile on a
			# successful download.
			(flock 9 || exit 1
			printf '%s\n' "${url}" >> "${cachefile}"
			) 9>"${lockfile}"
		else
			log "${feedname}" "${msg}" "FAIL" >&2
			return 1
		fi
		return 0
	}

	if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then
		# Downloader helper for parallel downloading.
		# Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-".
		# It should write the URI to the cachefile if it is successful.
		downloader "$1" "$2" "$3"
		exit $?
	fi

	# ...else parent mode:

	tmp=$(mktemp)
	trap "rm -f ${tmp}" EXIT

	[ -f "${cachefile}" ] || touch "${cachefile}"
	cat "${cachefile}" > "${tmp}"
	echo >> "${tmp}" # force it to have one line for awk.

	LC_ALL=C awk -F '\t' '
	# fast prefilter what to download or not.
	function filter(url, field, feedname) {
		u = tolower(url);
		return (match(u, "youtube\\.com") ||
		        match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$"));
	}
	function download(url, field, title, filename) {
		if (!length(url) || urls[url] || !filter(url, field, filename))
			return;
		# NUL-separated for xargs -0.
		printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0);
		urls[url] = 1; # print once
	}
	{
		FILENR += (FNR == 1);
	}
	# lookup table from cachefile which contains downloaded URLs.
	FILENR == 1 {
		urls[$0] = 1;
	}
	# feed file(s).
	FILENR != 1 {
		download($3, 3, $2, FILENAME); # link
		download($8, 8, $2, FILENAME); # enclosure
	}
	' "${tmp}" "${@:--}" | \
	SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")"

- - -

Shellscript to export existing newsboat cached items from sqlite3 to the sfeed
TSV format.

	#!/bin/sh
	# Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format.
	# The data is split per file per feed with the name of the newsboat title/url.
	# It writes the URLs of the read items line by line to a "urls" file.
	#
	# Dependencies: sqlite3, awk.
	#
	# Usage: create some directory to store the feeds then run this script.

	# newsboat cache.db file.
	cachefile="$HOME/.newsboat/cache.db"
	test -n "$1" && cachefile="$1"

	# dump data.
	# .mode ascii: Columns/rows delimited by 0x1F and 0x1E
	# get the first fields in the order of the sfeed(5) format.
	sqlite3 "$cachefile" < "/dev/stderr";
		}

		contenttype = field($5);
		if (contenttype == "")
			contenttype = "html";
		else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
			contenttype = "html";
		else
			contenttype = "plain";

		print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
			contenttype "\t" field($6) "\t" field($7) "\t" field($8) "\t" \
			> fname;

		# write URLs of the read items to a file line by line.
		if ($11 == "0") {
			print $3 > "urls";
		}
	}'

- - -

Progress indicator
------------------

The below sfeed_update wrapper script counts the number of feeds in a sfeedrc
config. It then calls sfeed_update and pipes the output lines to a function
that counts the current progress. It writes the total progress to stderr.
Alternative: pv -l -s totallines

	#!/bin/sh
	# Progress indicator script.

	# Pass lines as input to stdin and write progress status to stderr.
	# progress(totallines)
	progress() {
		total="$(($1 + 0))" # must be a number, no divide by zero.
		test "${total}" -le 0 -o "$1" != "${total}" && return
		LC_ALL=C awk -v "total=${total}" '
		{
			counter++;
			percent = (counter * 100) / total;
			printf("\033[K") > "/dev/stderr"; # clear EOL
			print $0;
			printf("[%s/%s] %.0f%%\r", counter, total, percent) > "/dev/stderr";
			fflush(); # flush all buffers per line.
		}
		END {
			printf("\033[K") > "/dev/stderr";
		}'
	}

	# Counts the feeds from the sfeedrc config.
	countfeeds() {
		count=0
		. "$1"
		feed() {
			count=$((count + 1))
		}
		feeds
		echo "${count}"
	}

	config="${1:-$HOME/.sfeed/sfeedrc}"
	total=$(countfeeds "${config}")
	sfeed_update "${config}" 2>&1 | progress "${total}"

- - -

Counting unread and total items
-------------------------------

It can be useful to show the counts of unread items, for example in a
windowmanager or statusbar.

The below example script counts the items of the last day in the same way the
formatting tools do:

	#!/bin/sh
	# Count the new items of the last day.
	LC_ALL=C awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	{
		total++;
	}
	int($1) >= old {
		totalnew++;
	}
	END {
		# "+ 0" prints 0 instead of an empty string when nothing matched.
		print "New: " (totalnew + 0);
		print "Total: " (total + 0);
	}' ~/.sfeed/feeds/*

The below example script counts the unread items using the sfeed_curses URL
file:

	#!/bin/sh
	# Count the unread and total items from feeds using the URL file.
	LC_ALL=C awk -F '\t' '
	# URL file: amount of fields is 1.
	NF == 1 {
		u[$0] = 1; # lookup table of URLs.
		next;
	}
	# feed file: check by URL or id.
	{
		total++;
		if (length($3)) {
			if (u[$3])
				read++;
		} else if (length($6)) {
			if (u[$6])
				read++;
		}
	}
	END {
		print "Unread: " (total - read);
		print "Total: " (total + 0);
	}' ~/.sfeed/urls ~/.sfeed/feeds/*

- - -

sfeed.c: adding new XML tags or sfeed(5) fields to the parser
-------------------------------------------------------------

sfeed.c contains definitions to parse XML tags and map them to sfeed(5) TSV
fields. Parsed RSS and Atom tag names are first stored as a TagId, which is a
number.  This TagId is then mapped to the output field index.

Steps to modify the code:

* Add a new TagId enum for the tag.

* (optional) Add a new FeedField* enum for the new output field or you can map
  it to an existing field.

* Add the new XML tag name to the array variable of parsed RSS or Atom
  tags: rsstags[] or atomtags[].

  These must be defined in alphabetical order, because a binary search is used
  which uses the strcasecmp() function.

* Add the parsed TagId to the output field in the array variable fieldmap[].

  When another tag is also mapped to the same output field then the tag with
  the highest TagId number value overrides the mapped field: the order is from
  least important to most important.

* If this defined tag is just using the inner data of the XML tag, then this
  definition is enough. If it for example has to parse a certain attribute you
  have to add a check for the TagId to the xmlattr() callback function.

* (optional) Print the new field in the printfields() function.
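
Because a misordered entry in rsstags[] or atomtags[] makes the binary search
miss tags without any error, it can help to check the order after editing. The
sketch below is not part of sfeed: the function name checktagorder is made up,
and it assumes you feed it the tag names one per line. It lowercases each name
and compares in byte order, which should match strcasecmp() for plain ASCII tag
names.

```sh
#!/bin/sh
# checktagorder: read tag names from stdin, one per line, and report the
# first name that is out of order. Hypothetical helper, not part of sfeed.
checktagorder() {
	LC_ALL=C awk '
	{
		cur = tolower($0); # compare case-insensitively.
		if (NR > 1 && cur < prev) {
			printf("out of order: %s (after %s)\n", $0, prevorig);
			bad = 1;
			exit; # jumps to the END rule.
		}
		prev = cur;
		prevorig = $0;
	}
	END { exit (bad + 0); }'
}
```

For example, this should exit with status 0 and print nothing:

	printf '%s\n' enclosure guid link media:content pubdate title | checktagorder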

Below is a patch example to add the MRSS "media:content" tag as a new field:

diff --git a/sfeed.c b/sfeed.c
--- a/sfeed.c
+++ b/sfeed.c
@@ -50,7 +50,7 @@ enum TagId {
 	RSSTagGuidPermalinkTrue,
 	/* must be defined after GUID, because it can be a link (isPermaLink) */
 	RSSTagLink,
-	RSSTagEnclosure,
+	RSSTagMediaContent, RSSTagEnclosure,
 	RSSTagAuthor, RSSTagDccreator,
 	RSSTagCategory,
 	/* Atom */
@@ -81,7 +81,7 @@ typedef struct field {
 enum {
 	FeedFieldTime = 0, FeedFieldTitle, FeedFieldLink, FeedFieldContent,
 	FeedFieldId, FeedFieldAuthor, FeedFieldEnclosure, FeedFieldCategory,
-	FeedFieldLast
+	FeedFieldMediaContent, FeedFieldLast
 };
 
 typedef struct feedcontext {
@@ -137,6 +137,7 @@ static const FeedTag rsstags[] = {
 	{ STRP("enclosure"), RSSTagEnclosure },
 	{ STRP("guid"), RSSTagGuid },
 	{ STRP("link"), RSSTagLink },
+	{ STRP("media:content"), RSSTagMediaContent },
 	{ STRP("media:description"), RSSTagMediaDescription },
 	{ STRP("pubdate"), RSSTagPubdate },
 	{ STRP("title"), RSSTagTitle }
@@ -180,6 +181,7 @@ static const int fieldmap[TagLast] = {
 	[RSSTagGuidPermalinkFalse] = FeedFieldId,
 	[RSSTagGuidPermalinkTrue] = FeedFieldId, /* special-case: both a link and an id */
 	[RSSTagLink] = FeedFieldLink,
+	[RSSTagMediaContent] = FeedFieldMediaContent,
 	[RSSTagEnclosure] = FeedFieldEnclosure,
 	[RSSTagAuthor] = FeedFieldAuthor,
 	[RSSTagDccreator] = FeedFieldAuthor,
@@ -677,6 +679,8 @@ printfields(void)
 	string_print_uri(&ctx.fields[FeedFieldEnclosure].str);
 	putchar(FieldSeparator);
 	string_print_trimmed_multi(&ctx.fields[FeedFieldCategory].str);
+	putchar(FieldSeparator);
+	string_print_trimmed(&ctx.fields[FeedFieldMediaContent].str);
 	putchar('\n');
 
 	if (ferror(stdout)) /* check for errors but do not flush */
@@ -718,7 +722,7 @@ xmlattr(XMLParser *p, const char *t, size_t tl, const char *n, size_t nl,
 	}
 
 	if (ctx.feedtype == FeedTypeRSS) {
-		if (ctx.tag.id == RSSTagEnclosure &&
+		if ((ctx.tag.id == RSSTagEnclosure || ctx.tag.id == RSSTagMediaContent) &&
 		    isattr(n, nl, STRP("url"))) {
 			string_append(&tmpstr, v, vl);
 		} else if (ctx.tag.id == RSSTagGuid &&

- - -

Running custom commands inside the sfeed_curses program
-------------------------------------------------------

Running commands inside the sfeed_curses program can be useful for example to
sync items or mark all items across all feeds as read. It can be comfortable to
have a keybind for this inside the program to perform a scripted action and
then reload the feeds by sending the signal SIGHUP.

In the input handling code you can then add a case:

	case 'M':
		forkexec((char *[]) { "markallread.sh", NULL }, 0);
		break;

or

	case 'S':
		forkexec((char *[]) { "syncnews.sh", NULL }, 1);
		break;

The specified script should be in $PATH or be an absolute path.

Example of a `markallread.sh` shellscript to mark all URLs as read:

	#!/bin/sh
	# mark all items/URLs as read.
	tmp=$(mktemp)
	(cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
	awk '!x[$0]++' > "$tmp" &&
	mv "$tmp" ~/.sfeed/urls &&
	pkill -SIGHUP sfeed_curses # reload feeds.

Example of a `syncnews.sh` shellscript to update the feeds and reload them:

	#!/bin/sh
	sfeed_update
	pkill -SIGHUP sfeed_curses


Running programs in a new session
---------------------------------

By default processes are spawned in the same session and process group as
sfeed_curses.  When sfeed_curses is closed this can also close the spawned
process in some cases.

When the setsid command-line program is available the following wrapper command
can be used to run the program in a new session, for a plumb program:

	setsid -f xdg-open "$@"

Alternatively the code can be changed to call setsid() before execvp().
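
The setsid wrapper command above can also be put in a small plumb script that
still does something sensible when setsid(1) is missing. This is a sketch, not
part of sfeed: the function name plumb and the nohup fallback are assumptions,
it assumes a setsid that supports -f as used above, and the fallback only
ignores SIGHUP instead of creating a new session.

```sh
#!/bin/sh
# plumb(program, args...): start a program detached from sfeed_curses.
# Hypothetical helper: the name and the nohup fallback are assumptions.
plumb() {
	if command -v setsid >/dev/null 2>&1; then
		# run in a new session; -f always forks, so this returns at once.
		setsid -f "$@"
	else
		# fallback: same session, but ignore SIGHUP and run in background.
		nohup "$@" >/dev/null 2>&1 &
	fi
}

# example: end the script with this line to open URLs with xdg-open.
# plumb xdg-open "$@"
```

With the last line uncommented and the script saved somewhere in $PATH, it
could be used as the plumb program by pointing $SFEED_PLUMBER at it.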


Open a URL directly in the same terminal
----------------------------------------

To open a URL directly in the same terminal using the text-mode lynx browser:

	SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*


Yank to tmux buffer
-------------------

This changes the yank command to set the tmux buffer, instead of X11 xclip:

	SFEED_YANKER="tmux set-buffer \`cat\`"


Known terminal issues
---------------------

Listed below are some bugs or missing features in terminals that were found
while testing sfeed_curses.  Some of them might be fixed already upstream:

- cygwin + mintty: the xterm mouse-encoding of the mouse position is broken for
  scrolling.
- HaikuOS terminal: the xterm mouse-encoding of the mouse button number of the
  middle-button, right-button is incorrect / reversed.
- putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
  window title.
- Mouse button encoding for extended buttons (like side-buttons) is
  unsupported in some terminals or maps to the same button: for example
  side-buttons 7 and 8 map to the scroll buttons 4 and 5 in urxvt.


License
-------

ISC, see LICENSE file.


Author
------

Hiltjo Posthuma