sfeed
-----

RSS and Atom parser (and some format programs).


Build and install
-----------------

    $ make
    # make install


Usage
-----

Initial setup:

    mkdir -p "$HOME/.sfeed/feeds"
    cp sfeedrc.example "$HOME/.sfeed/sfeedrc"

Edit the configuration file and change any RSS/Atom feeds:

    $EDITOR "$HOME/.sfeed/sfeedrc"

or you can import existing OPML subscriptions using sfeed_opml_import(1):

    sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

Update feeds; this script merges the new items:

    sfeed_update

Format feeds:

Plain-text list:

    sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

    cp style.css "$HOME/.sfeed/style.css"
    sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with frames and content, copy style.css for a default style:

    mkdir -p "$HOME/.sfeed/frames"
    cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*

To update your feeds periodically and format them in a view you like, you can
make a wrapper script and add it as a cronjob.

Most protocols are supported because curl(1) is used by default, so proxy
settings from the environment (such as the $http_proxy environment variable)
are used. The sfeed(1) program itself is just a parser and therefore
protocol-agnostic: it can be used with HTTP, HTTPS, Gopher, SSH, etc.

See the section "Usage and examples" below and the man pages for more
information on how to use sfeed(1) and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- make(1) (for the Makefile).
- POSIX sh(1), used by sfeed_update(1) and sfeed_opml_export(1).
- curl(1) binary: http://curl.haxx.se/ , used by sfeed_update(1); it can be
  replaced with a similar tool such as wget(1) or OpenBSD ftp(1).
- iconv(1) command-line utility, used by sfeed_update(1). If the text in your
  RSS/Atom feeds is already UTF-8 encoded then you don't need this.
  For an alternative minimal iconv implementation:
  http://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- mandoc for documentation: http://mdocml.bsd.lv/


OS tested
---------

- Linux (glibc+gcc, musl+gcc, clang).
- OpenBSD (gcc, clang).
- NetBSD.
- FreeBSD.
- Windows (Cygwin gcc, mingw).
- HaikuOS.


Architectures tested
--------------------

amd64, ARM, aarch64, i386, SPARC64.


Files
-----

sfeed             - Read XML RSS or Atom feed data from stdin. Write feed
                    data in TAB-separated format to stdout.
sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gph         - Format feed data (TSV) to geomyidae .gph files.
sfeed_html        - Format feed data (TSV) to HTML.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_mbox        - Format feed data (TSV) to mbox.
sfeed_plain       - Format feed data (TSV) to a plain-text list.
sfeed_tail        - Format unseen feed data (TSV) to a plain-text list.
sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
sfeed_update      - Update feeds and merge them with the old feeds in the
                    directory $HOME/.sfeed/feeds by default.
sfeed_web         - Find URLs to RSS/Atom feeds on a webpage.
sfeed_xmlenc      - Detect character-set encoding from an XML stream.
sfeedrc.example   - Example config file. Can be copied to
                    $HOME/.sfeed/sfeedrc.
style.css         - Example stylesheet to use with sfeed_html(1) and
                    sfeed_frames(1).


Files read at runtime by sfeed_update(1)
----------------------------------------

sfeedrc - Config file. This file is evaluated as a shellscript in
          sfeed_update(1).

At least the following functions can be overridden per feed:

- fetchfeed: to use wget(1), OpenBSD ftp(1) or another download program.
- merge: to change the merge logic.
- filter: to filter on fields.
- order: to change the sort order.

The function feeds() is called to fetch the feeds. The function feed() can
safely be executed concurrently as a background job in your sfeedrc(5) config
file to make updating faster.
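As an illustration of such an override, a sfeedrc could replace the default
curl(1) download with OpenBSD ftp(1). This is only a sketch: the fetchfeed()
parameter order shown is an assumption (check sfeedrc(5) for the exact
signature in your version), and the feed names and URLs are placeholders.

```shell
# Hypothetical $HOME/.sfeed/sfeedrc sketch: override fetchfeed() to use
# OpenBSD ftp(1) instead of curl(1).
# Assumed signature (verify against sfeedrc(5)): fetchfeed(name, url, feedfile)
fetchfeed() {
	ftp -o - "$2" 2>/dev/null
}

# feeds() is called by sfeed_update(1) to fetch all feeds; per the note
# above, feed() can be backgrounded with "&" to update concurrently.
feeds() {
	feed "codemadness" "https://codemadness.org/blog/rss.xml" &
	feed "example" "https://example.org/atom.xml" &
	wait
}
```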
Files written at runtime by sfeed_update(1)
-------------------------------------------

feedname     - TAB-separated format containing all items per feed. The
               sfeed_update(1) script merges new items with this file.
feedname.new - Temporary file used by sfeed_update(1) to merge items.


File format
-----------

man 5 sfeed
man 5 sfeedrc
man 1 sfeed


Usage and examples
------------------

Find RSS/Atom feed URLs on a webpage:

    url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output example:

    https://codemadness.org/blog/rss.xml	application/rss+xml
    https://codemadness.org/blog/atom.xml	application/atom+xml

- - -

Make sure your sfeedrc config file exists, see sfeedrc.example. To update
your feeds (the configfile argument is optional):

    sfeed_update "configfile"

Format the feed files:

    # Plain-text list.
    sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
    # HTML view (no frames), copy style.css for a default style.
    sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
    # HTML view with frames and content, copy style.css for a default style.
    mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

View in your browser:

    $BROWSER "$HOME/.sfeed/feeds.html"

View in your editor:

    $EDITOR "$HOME/.sfeed/feeds.txt"

- - -

Example script to view feeds with dmenu(1); it opens the selected URL in
$BROWSER:

    #!/bin/sh
    url=$(sfeed_plain $HOME/.sfeed/feeds/* | dmenu -l 35 -i |
        sed 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@')
    [ ! "$url" = "" ] && $BROWSER "$url"

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

    sfeed_opml_import < opmlfile.xml > "$HOME/.sfeed/sfeedrc"

- - -

Export an OPML file of your feeds from a sfeedrc config file (the configfile
argument is optional):

    sfeed_opml_export configfile > myfeeds.opml

- - -

    # filter fields.
    # filter(name)
    filter() {
        case "$1" in
        "tweakers")
            LC_ALL=C awk -F '\t' 'BEGIN { OFS = "\t"; }
            # skip ads.
            $2 ~ /^ADV:/ { next; }
            # shorten link.
            {
                if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
                    $3 = substr($3, RSTART, RLENGTH);
                }
                print $0;
            }';;
        "yt BSDNow")
            # filter only BSD Now from the channel.
            LC_ALL=C awk -F '\t' '$2 ~ / \| BSD Now/';;
        *)
            cat;;
        esac | \
        # replace youtube links with embed links.
        sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \
        LC_ALL=C awk -F '\t' 'BEGIN { OFS = "\t"; }
        {
            # shorten feedburner links.
            if (match($3, /^(http|https):\/\/[^/]+\/~r\/.*\/~3\/[^\/]+\//)) {
                $3 = substr($3, RSTART, RLENGTH);
            }
            # strip tracking parameters:
            # urchin, facebook, piwik, webtrekk and generic.
            gsub(/\?(ad|campaign|pk|tm|wt)_([^&]+)/, "?", $3);
            gsub(/&(ad|campaign|pk|tm|wt)_([^&]+)/, "", $3);
            gsub(/\?&/, "?", $3);
            gsub(/[\?&]+$/, "", $3);
            print $0;
        }'
    }

- - -

Over time your feeds file might become quite big. You can archive the items
starting from a specific date (discarding older ones), for example:

File sfeed_archive.c:

    #include <sys/types.h>

    #include <err.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #include "util.h"

    int
    main(int argc, char *argv[])
    {
        char *line = NULL, *p;
        time_t parsedtime, comparetime;
        struct tm tm;
        size_t size = 0;
        int r, c, y, m, d;

        if (argc != 2 || strlen(argv[1]) != 8 ||
            sscanf(argv[1], "%4d%2d%2d", &y, &m, &d) != 3) {
            fputs("usage: sfeed_archive yyyymmdd\n", stderr);
            exit(1);
        }

        memset(&tm, 0, sizeof(tm));
        tm.tm_isdst = -1; /* don't use DST */
        tm.tm_year = y - 1900;
        tm.tm_mon = m - 1;
        tm.tm_mday = d;
        if ((comparetime = mktime(&tm)) == -1)
            err(1, "mktime");

        while (getline(&line, &size, stdin) > 0) {
            if (!(p = strchr(line, '\t')))
                continue;
            c = *p;
            *p = '\0'; /* temporarily NUL-terminate the timestamp field */
            if ((r = strtotime(line, &parsedtime)) != -1 &&
                parsedtime >= comparetime) {
                *p = c; /* restore */
                fputs(line, stdout);
            }
        }
        return 0;
    }

Now compile and run:

    $ cc -std=c99 -o sfeed_archive util.c sfeed_archive.c
    $ ./sfeed_archive 20150101 < feeds > feeds.new
    $ mv feeds feeds.bak
    $ mv feeds.new feeds

- - -

Convert the mbox to separate maildirs per feed and filter duplicate messages
using fdm(1): https://github.com/nicm/fdm .
fdm config file (~/.sfeed/fdm.conf):

    set unmatched-mail keep

    account "sfeed" mbox "%[home]/.sfeed/mbox"

    $cachepath = "%[home]/.sfeed/mbox.cache"
    cache "${cachepath}"
    $feedsdir = "%[home]/feeds/"

    # check if in cache by message-id.
    match case "^Message-ID: (.*)" in headers
        action {
            tag "msgid" value "%1"
        } continue

    # if in cache, stop.
    match matched and in-cache "${cachepath}" key "%[msgid]"
        action {
            keep
        }

    # not in cache, process it and add to the cache.
    match case "^X-Feedname: (.*)" in headers
        action {
            maildir "${feedsdir}%1"
            add-to-cache "${cachepath}" key "%[msgid]"
            keep
        }

Now run:

    $ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
    $ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view the feeds in mutt(1) for example.

- - -

Convert the mbox to separate maildirs per feed and filter duplicate messages
using procmail(1).

procmail_maildirs.sh file:

    maildir="$HOME/feeds"
    feedsdir="$HOME/.sfeed/feeds"
    procmailconfig="$HOME/.sfeed/procmailrc"

    # message-id cache to prevent duplicates.
    mkdir -p "${maildir}/.cache"

    if ! test -r "${procmailconfig}"; then
        echo "Procmail configuration file \"${procmailconfig}\" does not exist or is not readable." >&2
        echo "See procmailrc.example for an example." >&2
        exit 1
    fi

    find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
        (name=$(basename "${d}")
        mkdir -p "${maildir}/${name}/cur"
        mkdir -p "${maildir}/${name}/new"
        mkdir -p "${maildir}/${name}/tmp"
        printf 'Mailbox %s\n' "${name}"
        sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}") &
    done
    wait

procmailrc(5) file:

    # Example for use with sfeed_mbox(1).
    # The header X-Feedname is used to split into separate maildirs. It is
    # assumed this name is sane.

    MAILDIR="$HOME/feeds/"

    :0
    * ^X-Feedname: \/.*
    {
        FEED="$MATCH"

        :0 Wh: "msgid_$FEED.lock"
        | formail -D 1024000 ".cache/msgid_$FEED.cache"

        :0
        "$FEED"/
    }

Now run:

    $ procmail_maildirs.sh

Now you can view the feeds in mutt(1) for example.


License
-------

ISC, see LICENSE file.


Author
------

Hiltjo Posthuma