author     Benjamin Chausse <benjamin@chausse.xyz>   2024-08-09 14:11:50 -0400
committer  Benjamin Chausse <benjamin@chausse.xyz>   2024-08-09 14:11:50 -0400
commit     5857d82e8e596d6fda406a0c4d8d68ca7a03c124 (patch)
tree       553916894dee907825360580c5d9a05c82c5af16 /README
parent     3574e3cbf9d99546e868aeb995ce2c171cdc36a6 (diff)
parent     19957bc272e745af7b56b79fa648e8b6b77113b1 (diff)
Diffstat (limited to 'README')

 -rw-r--r--  README | 335
 1 file changed, 251 insertions, 84 deletions
@@ -38,7 +38,7 @@ Initial setup: cp sfeedrc.example "$HOME/.sfeed/sfeedrc" Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file -is included and evaluated as a shellscript for sfeed_update, so it's functions +is included and evaluated as a shellscript for sfeed_update, so its functions and behaviour can be overridden: $EDITOR "$HOME/.sfeed/sfeedrc" @@ -76,6 +76,7 @@ HTML view (no frames), copy style.css for a default style: HTML view with the menu as frames, copy style.css for a default style: mkdir -p "$HOME/.sfeed/frames" + cp style.css "$HOME/.sfeed/frames/style.css" cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/* To automatically update your feeds periodically and format them in a way you @@ -107,7 +108,8 @@ Optional dependencies - POSIX sh(1), used by sfeed_update(1) and sfeed_opml_export(1). - POSIX utilities such as awk(1) and sort(1), - used by sfeed_content(1), sfeed_markread(1) and sfeed_update(1). + used by sfeed_content(1), sfeed_markread(1), sfeed_opml_export(1) and + sfeed_update(1). - curl(1) binary: https://curl.haxx.se/ , used by sfeed_update(1), but can be replaced with any tool like wget(1), OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/ @@ -115,6 +117,8 @@ Optional dependencies used by sfeed_update(1). If the text in your RSS/Atom feeds are already UTF-8 encoded then you don't need this. For a minimal iconv implementation: https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c +- xargs with support for the -P and -0 option, + used by sfeed_update(1). - mandoc for documentation: https://mdocml.bsd.lv/ - curses (typically ncurses), otherwise see minicurses.h, used by sfeed_curses(1). @@ -140,12 +144,12 @@ sfeed supports a subset of XML 1.0 and a subset of: - Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287 - Atom 0.3 (draft, historic). -- RSS 0.91+. +- RSS 0.90+. - RDF (when used with RSS). - MediaRSS extensions (media:). - Dublin Core extensions (dc:). -Other formats like JSONfeed, twtxt or certain RSS/Atom extensions can be +Other formats like JSON Feed, twtxt or certain RSS/Atom extensions are supported by converting them to RSS/Atom or to the sfeed(5) format directly. @@ -153,7 +157,7 @@ OS tested --------- - Linux, - compilers: clang, gcc, chibicc, cproc, lacc, pcc, tcc, + compilers: clang, gcc, chibicc, cproc, lacc, pcc, scc, tcc, libc: glibc, musl. - OpenBSD (clang, gcc). - NetBSD (with NetBSD curses). @@ -164,7 +168,7 @@ OS tested - Windows (cygwin gcc + mintty, mingw). - HaikuOS - SerenityOS -- FreeDOS (djgpp). +- FreeDOS (djgpp, Open Watcom). - FUZIX (sdcc -mz80, with the sfeed parser program). @@ -185,6 +189,7 @@ sfeed_curses - Format feed data (TSV) to a curses interface. sfeed_frames - Format feed data (TSV) to HTML file(s) with frames. sfeed_gopher - Format feed data (TSV) to Gopher files. sfeed_html - Format feed data (TSV) to HTML. +sfeed_json - Format feed data (TSV) to JSON Feed. sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file. sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file. sfeed_markread - Mark items as read/unread, for use with sfeed_curses. @@ -245,13 +250,13 @@ Find RSS/Atom feed URLs from a webpage: output example: - https://codemadness.org/blog/rss.xml application/rss+xml - https://codemadness.org/blog/atom.xml application/atom+xml + https://codemadness.org/atom.xml application/atom+xml + https://codemadness.org/atom_content.xml application/atom+xml - - - -Make sure your sfeedrc config file exists, see sfeedrc.example. 
To update your -feeds (configfile argument is optional): +Make sure your sfeedrc config file exists, see the sfeedrc.example file. To +update your feeds (configfile argument is optional): sfeed_update "configfile" @@ -287,10 +292,12 @@ Just like the other format programs included in sfeed you can run it like this: sfeed_curses < ~/.sfeed/feeds/xkcd -By default sfeed_curses marks the items of the last day as new/bold. To manage -read/unread items in a different way a plain-text file with a list of the read -URLs can be used. To enable this behaviour the path to this file can be -specified by setting the environment variable $SFEED_URL_FILE to the URL file: +By default sfeed_curses marks the items of the last day as new/bold. This limit +might be overridden by setting the environment variable $SFEED_NEW_AGE to the +desired maximum in seconds. To manage read/unread items in a different way a +plain-text file with a list of the read URLs can be used. To enable this +behaviour the path to this file can be specified by setting the environment +variable $SFEED_URL_FILE to the URL file: export SFEED_URL_FILE="$HOME/.sfeed/urls" [ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE" @@ -332,7 +339,7 @@ filtering items per feed. It can be used to shorten URLs, filter away advertisements, strip tracking parameters and more. # filter fields. - # filter(name) + # filter(name, url) filter() { case "$1" in "tweakers") @@ -578,7 +585,7 @@ procmail_maildirs.sh file: mkdir -p "${maildir}/.cache" if ! test -r "${procmailconfig}"; then - echo "Procmail configuration file \"${procmailconfig}\" does not exist or is not readable." >&2 + printf "Procmail configuration file \"%s\" does not exist or is not readable.\n" "${procmailconfig}" >&2 echo "See procmailrc.example for an example." >&2 exit 1 fi @@ -675,8 +682,8 @@ additional metadata from the previous request. CDNs blocking requests due to a missing HTTP User-Agent request header sfeed_update will not send the "User-Agent" header by default for privacy -reasons. Some CDNs like Cloudflare don't like this and will block such HTTP -requests. +reasons. Some CDNs like Cloudflare or websites like Reddit.com don't like this +and will block such HTTP requests. A custom User-Agent can be set by using the curl -H option, like so: @@ -701,56 +708,6 @@ sfeedrc file and change the curl options "-L --max-redirs 0". - - - -Shellscript to update feeds in parallel more efficiently using xargs -P. - -It creates a queue of the feeds with its settings, then uses xargs to process -them in parallel using the common, but non-POSIX -P option. This is more -efficient than the more portable solution in sfeed_update which can stall a -batch of $maxjobs in the queue if one item is slow. - -sfeed_update_xargs shellscript: - - #!/bin/sh - # update feeds, merge with old feeds using xargs in parallel mode (non-POSIX). - - # include script and reuse its functions, but do not start main(). - SFEED_UPDATE_INCLUDE="1" . sfeed_update - # load config file, sets $config. - loadconfig "$1" - - # process a single feed. - # args are: config, tmpdir, name, feedurl, basesiteurl, encoding - if [ "${SFEED_UPDATE_CHILD}" = "1" ]; then - sfeedtmpdir="$2" - _feed "$3" "$4" "$5" "$6" - exit $? - fi - - # ...else parent mode: - - # feed(name, feedurl, basesiteurl, encoding) - feed() { - # workaround: *BSD xargs doesn't handle empty fields in the middle. 
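- # Substitute a placeholder for every empty field: the shell PID for
- # the name, a dummy "http://" for the feed URL and the feed URL itself
- # for the base site URL. Only the trailing encoding field may be empty.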
- name="${1:-$$}" - feedurl="${2:-http://}" - basesiteurl="${3:-${feedurl}}" - encoding="$4" - - printf '%s\0%s\0%s\0%s\0%s\0%s\0' "${config}" "${sfeedtmpdir}" \ - "${name}" "${feedurl}" "${basesiteurl}" "${encoding}" - } - - # fetch feeds and store in temporary directory. - sfeedtmpdir="$(mktemp -d '/tmp/sfeed_XXXXXX')" - # make sure path exists. - mkdir -p "${sfeedpath}" - # print feeds for parallel processing with xargs. - feeds | SFEED_UPDATE_CHILD="1" xargs -r -0 -P "${maxjobs}" -L 6 "$(readlink -f "$0")" - # cleanup temporary files etc. - cleanup - -- - - - Shellscript to handle URLs and enclosures in parallel using xargs -P. This can be used to download and process URLs for downloading podcasts, @@ -764,7 +721,7 @@ arguments are specified then the data is read from stdin. #!/bin/sh # sfeed_download: downloader for URLs and enclosures in sfeed(5) files. - # Dependencies: awk, curl, flock, xargs (-P), youtube-dl. + # Dependencies: awk, curl, flock, xargs (-P), yt-dlp. cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}" jobs="${SFEED_JOBS:-4}" @@ -777,14 +734,14 @@ arguments are specified then the data is read from stdin. else s="$2" fi - printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3" >&2 + printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3" } # fetch(url, feedname) fetch() { case "$1" in *youtube.com*) - youtube-dl "$1";; + yt-dlp "$1";; *.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm) # allow 2 redirects, hide User-Agent, connect timeout is 15 seconds. curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";; @@ -803,14 +760,13 @@ arguments are specified then the data is read from stdin. if [ "${feedname}" != "-" ]; then mkdir -p "${feedname}" if ! cd "${feedname}"; then - log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" - exit 1 + log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" >&2 + return 1 fi fi log "${feedname}" "${msg}" "START" - fetch "${url}" "${feedname}" - if [ $? = 0 ]; then + if fetch "${url}" "${feedname}"; then log "${feedname}" "${msg}" "OK" # append it safely in parallel to the cachefile on a @@ -819,21 +775,23 @@ arguments are specified then the data is read from stdin. printf '%s\n' "${url}" >> "${cachefile}" ) 9>"${lockfile}" else - log "${feedname}" "${msg}" "FAIL" + log "${feedname}" "${msg}" "FAIL" >&2 + return 1 fi + return 0 } if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then # Downloader helper for parallel downloading. # Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-". - # It should write the URI to the cachefile if it is succesful. + # It should write the URI to the cachefile if it is successful. downloader "$1" "$2" "$3" exit $? fi # ...else parent mode: - tmp=$(mktemp) + tmp="$(mktemp)" || exit 1 trap "rm -f ${tmp}" EXIT [ -f "${cachefile}" ] || touch "${cachefile}" @@ -963,8 +921,199 @@ TSV format. - - - -Running custom commands inside the program ------------------------------------------- +Progress indicator +------------------ + +The below sfeed_update wrapper script counts the amount of feeds in a sfeedrc +config. It then calls sfeed_update and pipes the output lines to a function +that counts the current progress. It writes the total progress to stderr. +Alternative: pv -l -s totallines + + #!/bin/sh + # Progress indicator script. + + # Pass lines as input to stdin and write progress status to stderr. + # progress(totallines) + progress() { + total="$(($1 + 0))" # must be a number, no divide by zero. 
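+ # Return early when the argument is not a positive number; this also
+ # guards the percentage division in the awk script below.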
+ test "${total}" -le 0 -o "$1" != "${total}" && return + LC_ALL=C awk -v "total=${total}" ' + { + counter++; + percent = (counter * 100) / total; + printf("\033[K") > "/dev/stderr"; # clear EOL + print $0; + printf("[%s/%s] %.0f%%\r", counter, total, percent) > "/dev/stderr"; + fflush(); # flush all buffers per line. + } + END { + printf("\033[K") > "/dev/stderr"; + }' + } + + # Counts the feeds from the sfeedrc config. + countfeeds() { + count=0 + . "$1" + feed() { + count=$((count + 1)) + } + feeds + echo "${count}" + } + + config="${1:-$HOME/.sfeed/sfeedrc}" + total=$(countfeeds "${config}") + sfeed_update "${config}" 2>&1 | progress "${total}" + +- - - + +Counting unread and total items +------------------------------- + +It can be useful to show the counts of unread items, for example in a +windowmanager or statusbar. + +The below example script counts the items of the last day in the same way the +formatting tools do: + + #!/bin/sh + # Count the new items of the last day. + LC_ALL=C awk -F '\t' -v "old=$(($(date +'%s') - 86400))" ' + { + total++; + } + int($1) >= old { + totalnew++; + } + END { + print "New: " totalnew; + print "Total: " total; + }' ~/.sfeed/feeds/* + +The below example script counts the unread items using the sfeed_curses URL +file: + + #!/bin/sh + # Count the unread and total items from feeds using the URL file. + LC_ALL=C awk -F '\t' ' + # URL file: amount of fields is 1. + NF == 1 { + u[$0] = 1; # lookup table of URLs. + next; + } + # feed file: check by URL or id. + { + total++; + if (length($3)) { + if (u[$3]) + read++; + } else if (length($6)) { + if (u[$6]) + read++; + } + } + END { + print "Unread: " (total - read); + print "Total: " total; + }' ~/.sfeed/urls ~/.sfeed/feeds/* + +- - - + +sfeed.c: adding new XML tags or sfeed(5) fields to the parser +------------------------------------------------------------- + +sfeed.c contains definitions to parse XML tags and map them to sfeed(5) TSV +fields. Parsed RSS and Atom tag names are first stored as a TagId, which is a +number. This TagId is then mapped to the output field index. + +Steps to modify the code: + +* Add a new TagId enum for the tag. + +* (optional) Add a new FeedField* enum for the new output field or you can map + it to an existing field. + +* Add the new XML tag name to the array variable of parsed RSS or Atom + tags: rsstags[] or atomtags[]. + + These must be defined in alphabetical order, because a binary search is used + which uses the strcasecmp() function. + +* Add the parsed TagId to the output field in the array variable fieldmap[]. + + When another tag is also mapped to the same output field then the tag with + the highest TagId number value overrides the mapped field: the order is from + least important to high. + +* If this defined tag is just using the inner data of the XML tag, then this + definition is enough. If it for example has to parse a certain attribute you + have to add a check for the TagId to the xmlattr() callback function. + +* (optional) Print the new field in the printfields() function. 
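
The ordering note in the steps above is easy to get wrong: a tag inserted out
of alphabetical order is silently never found. The following sketch shows the
kind of case-insensitive binary search that depends on that ordering; the tag
table and findtag() are made-up illustrations, not sfeed's actual internals:

	#include <stdio.h>
	#include <strings.h> /* strcasecmp() */

	/* Illustrative tag table: must stay sorted alphabetically,
	 * case-insensitively, just like rsstags[] and atomtags[]. */
	static const char *tags[] = {
		"enclosure", "guid", "link", "media:content", "pubdate", "title"
	};

	/* Case-insensitive binary search: index of the tag or -1. */
	static int
	findtag(const char *name)
	{
		int lo = 0, hi = (int)(sizeof(tags) / sizeof(tags[0])) - 1, mid, r;

		while (lo <= hi) {
			mid = lo + (hi - lo) / 2;
			r = strcasecmp(name, tags[mid]);
			if (r == 0)
				return mid;
			if (r < 0)
				hi = mid - 1; /* name sorts before tags[mid] */
			else
				lo = mid + 1; /* name sorts after tags[mid] */
		}
		return -1; /* unknown tag */
	}

	int
	main(void)
	{
		printf("%d\n", findtag("GUID")); /* 1: found despite the case */
		printf("%d\n", findtag("foo"));  /* -1: not in the table */
		return 0;
	}

An entry appended at the end instead of at its sorted position would make a
lookup like findtag() miss it, which is exactly the failure mode the ordering
rule prevents.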
+ +Below is a patch example to add the MRSS "media:content" tag as a new field: + +diff --git a/sfeed.c b/sfeed.c +--- a/sfeed.c ++++ b/sfeed.c +@@ -50,7 +50,7 @@ enum TagId { + RSSTagGuidPermalinkTrue, + /* must be defined after GUID, because it can be a link (isPermaLink) */ + RSSTagLink, +- RSSTagEnclosure, ++ RSSTagMediaContent, RSSTagEnclosure, + RSSTagAuthor, RSSTagDccreator, + RSSTagCategory, + /* Atom */ +@@ -81,7 +81,7 @@ typedef struct field { + enum { + FeedFieldTime = 0, FeedFieldTitle, FeedFieldLink, FeedFieldContent, + FeedFieldId, FeedFieldAuthor, FeedFieldEnclosure, FeedFieldCategory, +- FeedFieldLast ++ FeedFieldMediaContent, FeedFieldLast + }; + + typedef struct feedcontext { +@@ -137,6 +137,7 @@ static const FeedTag rsstags[] = { + { STRP("enclosure"), RSSTagEnclosure }, + { STRP("guid"), RSSTagGuid }, + { STRP("link"), RSSTagLink }, ++ { STRP("media:content"), RSSTagMediaContent }, + { STRP("media:description"), RSSTagMediaDescription }, + { STRP("pubdate"), RSSTagPubdate }, + { STRP("title"), RSSTagTitle } +@@ -180,6 +181,7 @@ static const int fieldmap[TagLast] = { + [RSSTagGuidPermalinkFalse] = FeedFieldId, + [RSSTagGuidPermalinkTrue] = FeedFieldId, /* special-case: both a link and an id */ + [RSSTagLink] = FeedFieldLink, ++ [RSSTagMediaContent] = FeedFieldMediaContent, + [RSSTagEnclosure] = FeedFieldEnclosure, + [RSSTagAuthor] = FeedFieldAuthor, + [RSSTagDccreator] = FeedFieldAuthor, +@@ -677,6 +679,8 @@ printfields(void) + string_print_uri(&ctx.fields[FeedFieldEnclosure].str); + putchar(FieldSeparator); + string_print_trimmed_multi(&ctx.fields[FeedFieldCategory].str); ++ putchar(FieldSeparator); ++ string_print_trimmed(&ctx.fields[FeedFieldMediaContent].str); + putchar('\n'); + + if (ferror(stdout)) /* check for errors but do not flush */ +@@ -718,7 +722,7 @@ xmlattr(XMLParser *p, const char *t, size_t tl, const char *n, size_t nl, + } + + if (ctx.feedtype == FeedTypeRSS) { +- if (ctx.tag.id == RSSTagEnclosure && ++ if ((ctx.tag.id == RSSTagEnclosure || ctx.tag.id == RSSTagMediaContent) && + isattr(n, nl, STRP("url"))) { + string_append(&tmpstr, v, vl); + } else if (ctx.tag.id == RSSTagGuid && + +- - - + +Running custom commands inside the sfeed_curses program +------------------------------------------------------- Running commands inside the sfeed_curses program can be useful for example to sync items or mark all items across all feeds as read. It can be comfortable to @@ -983,14 +1132,13 @@ or forkexec((char *[]) { "syncnews.sh", NULL }, 1); break; -The specified script should be in $PATH or an absolute path. +The specified script should be in $PATH or be an absolute path. Example of a `markallread.sh` shellscript to mark all URLs as read: #!/bin/sh # mark all items/URLs as read. - - tmp=$(mktemp) + tmp="$(mktemp)" || exit 1 (cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \ awk '!x[$0]++' > "$tmp" && mv "$tmp" ~/.sfeed/urls && @@ -999,7 +1147,23 @@ Example of a `markallread.sh` shellscript to mark all URLs as read: Example of a `syncnews.sh` shellscript to update the feeds and reload them: #!/bin/sh - sfeed_update && pkill -SIGHUP sfeed_curses + sfeed_update + pkill -SIGHUP sfeed_curses + + +Running programs in a new session +--------------------------------- + +By default processes are spawned in the same session and process group as +sfeed_curses. When sfeed_curses is closed this can also close the spawned +process in some cases. 
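
One way to avoid that, also mentioned below, is to call setsid() in the child
before execvp() so the spawned program gets its own session. Below is a minimal
sketch of that idea only; spawn_detached() is a made-up helper, not
sfeed_curses' actual forkexec() code:

	#include <stdio.h>
	#include <unistd.h>

	/* Spawn a command in a new session so it is not torn down together
	 * with the spawning program. Illustration only. */
	static void
	spawn_detached(char *argv[])
	{
		switch (fork()) {
		case -1:
			perror("fork");
			break;
		case 0:
			setsid(); /* leave the parent's session and process group */
			execvp(argv[0], argv);
			perror("execvp"); /* only reached when exec fails */
			_exit(127);
		default:
			break; /* parent: a real program would reap or ignore SIGCHLD */
		}
	}

	int
	main(void)
	{
		spawn_detached((char *[]){ "xdg-open", "https://codemadness.org", NULL });
		return 0;
	}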
+
+When the setsid command-line program is available the following wrapper command
+can be used to run the program in a new session, for a plumb program:
+
+	setsid -f xdg-open "$@"
+
+Alternatively the code can be changed to call setsid() before execvp().


Open an URL directly in the same terminal
@@ -1030,6 +1194,9 @@ testing sfeed_curses. Some of them might be fixed already upstream:
   middle-button, right-button is incorrect / reversed.
 - putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
   window title.
+- Mouse button encoding for extended buttons (like side-buttons) is unsupported
+  in some terminals or maps to the same button: for example, side-buttons 7 and
+  8 map to the scroll buttons 4 and 5 in urxvt.

License