sfeed v0.9
----------

Simple RSS and Atom parser (and some format programs).


Build and install
-----------------

$ make
# make install


Usage
-----

Initial setup:

	mkdir -p "$HOME/.sfeed/feeds"
	cp sfeedrc.example "$HOME/.sfeed/sfeedrc"
	cp style.css "$HOME/.sfeed/"

Edit the feeds:

	$EDITOR "$HOME/.sfeed/sfeedrc"

or you can use sfeed_opml_import to import your existing subscriptions from
OPML format:

	sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

Update feeds, this script merges and sorts the items as well:

	sfeed_update

Format feeds:

Plain-text list:

	sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

	sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with frames and content, copy style.css for a default style:

	mkdir -p "$HOME/.sfeed/frames"
	cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*

To automatically update your feeds periodically and format them in a view you
like, you can make a simple wrapper script and add it as a cronjob.

See the section "Usage and examples" below and the man-pages for more
information on how to use sfeed and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- make (for Makefile).
- POSIX shell used by sfeed_update and sfeed_opml_export.
- curl binary: http://curl.haxx.se/
  used by sfeed_update, can be replaced with any tool like wget, fetch.
- iconv command-line utilities: http://www.gnu.org/software/libiconv/
  used by sfeed_update. If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For an alternative minimal iconv
  implementation: http://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- mandoc for documentation: http://mdocml.bsd.lv/ .


Platforms tested
----------------

- Linux (glibc+gcc, musl-gcc, clang).
- OpenBSD
- Windows (cygwin gcc, mingw).

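
The wrapper script for cron mentioned under "Usage" could be sketched roughly
as follows. This is only an illustration, not a script shipped with sfeed: the
function name update_and_format and the ".tmp" file suffix are assumptions
made for this example. It runs sfeed_update and then regenerates the
plain-text and HTML views, writing each view to a temporary file first so a
half-written file is never left in place.

```shell
#!/bin/sh
# Hypothetical cron wrapper for sfeed (a sketch; the function name and the
# ".tmp" suffix are assumptions, not part of sfeed).

# Update all feeds, then regenerate the plain-text and HTML views.
# Each view is written to a temporary file and renamed afterwards.
update_and_format() {
	sfeed_update || return 1

	sfeed_plain "$HOME/.sfeed"/feeds/* > "$HOME/.sfeed/feeds.txt.tmp" &&
		mv "$HOME/.sfeed/feeds.txt.tmp" "$HOME/.sfeed/feeds.txt" || return 1

	sfeed_html "$HOME/.sfeed"/feeds/* > "$HOME/.sfeed/feeds.html.tmp" &&
		mv "$HOME/.sfeed/feeds.html.tmp" "$HOME/.sfeed/feeds.html"
}

# Only run when sfeed is installed, otherwise report it on stderr.
if command -v sfeed_update >/dev/null 2>&1; then
	update_and_format
else
	echo "sfeed_update not found in PATH" >&2
fi
```

Assuming the script is saved as an executable file at the hypothetical path
$HOME/bin/sfeed_cron.sh, a crontab(5) entry to run it every hour could be:

	0 * * * * $HOME/bin/sfeed_cron.sh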

Files
-----

sfeed             - Binary (from sfeed.c); read XML RSS or Atom feed data from
                    stdin. Write feed data in TAB-separated format to stdout.
sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
sfeed_html        - Format feed data (TSV) to HTML.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_mbox        - Format feed data (TSV) to mbox.
sfeed_plain       - Format feed data (TSV) to a plain-text list.
sfeed_update      - Shell script; update feeds and merge with old feeds in the
                    directory $HOME/.sfeed/feeds by default.
sfeed_web         - Find urls to RSS/Atom feeds from a webpage.
sfeed_xmlenc      - Detect character-set encoding from XML stream.
sfeedrc.example   - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
style.css         - Example stylesheet to use with sfeed_html and sfeed_frames.


Files read at runtime by sfeed_update
-------------------------------------

sfeedrc - Config file. This file is evaluated as a shell script in
          sfeed_update.

You can for example override the fetchfeed() function to use wget, fetch or
another download program, or you can override the merge() function to change
the merge logic. The function feeds() is called to fetch the feeds. The
function feed() can safely be executed concurrently as a background job in
your sfeedrc config file to speed up updating.


Files written at runtime by sfeed_update
----------------------------------------

feeds     - TAB-separated format containing all feeds.
            The sfeed_update script merges new items with this file.
feeds.new - Temporary file used by sfeed_update to merge items.


TAB-separated format fields
---------------------------

The items are saved in a TSV-like format.

The fields: title, id, author are not allowed to have newlines and TABs. All
whitespace is replaced by a single space character. Control characters are
removed.

The content field can contain newlines and is escaped.
TABs, newlines and '\' are escaped with '\', so: '\n', '\t', and '\\'. Other
whitespace characters except space are removed. Control characters are
removed.

The timestamp field is converted to a UNIX timestamp. The timestamp is also
added as a formatted text field.

The order and format of the fields are:

item UNIX timestamp      - UNIX timestamp (UTC+0), empty on parse failure.
item formatted timestamp - Date and time in the format:
                           YYYY-mm-dd HH:MM:SS (UTC[+-][HHMM])|tz.
item title               - Title text, HTML in titles is treated as
                           plain-text.
item link                - Absolute url, unsafe characters are encoded.
item content             - Newlines and TABs are escaped. Control characters
                           are removed. See the "TAB-separated format fields"
                           text.
item contenttype         - "html" or "plain".
item id                  - RSS item GUID or Atom id.
item author              - Item author.
feed type                - "rss" or "atom".

CAVEATS:

- if a timezone is not supported (non-RFC-822) the UNIX timestamp is
  interpreted as UTC+0.
- HTML in titles is not supported on purpose.


Usage and examples
------------------

Find RSS/Atom feed urls from a webpage:

	url="codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output:

	http://codemadness.org/blog/rss.xml	application/rss+xml
	http://codemadness.org/blog/atom.xml	application/atom+xml

- - -

Make sure your sfeedrc config file exists, see sfeedrc.example. To update your
feeds (configfile argument is optional):

	sfeed_update "configfile"

Format the feeds files:

	# Plain-text list.
	sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"
	# HTML view (no frames), copy style.css for a default style.
	sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"
	# HTML view with frames and content, copy style.css for a default style.
	mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

View in your browser:

	$BROWSER "$HOME/.sfeed/feeds.html"

View in your editor:

	$EDITOR "$HOME/.sfeed/feeds.txt"

- - -

Example script to view feeds with dmenu, opens selected url in $BROWSER:

	#!/bin/sh
	url=$(sfeed_plain $HOME/.sfeed/feeds/* | dmenu -l 35 -i |
		sed 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@')
	[ ! "$url" = "" ] && $BROWSER "$url"

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

	sfeed_opml_import < opmlfile.xml > "$HOME/.sfeed/sfeedrc"

- - -

Export an OPML file of your feeds from a sfeedrc config file (configfile
argument is optional):

	sfeed_opml_export configfile > myfeeds.opml

- - -

Over time your feeds file might become quite big. You can archive items from
before a specific date: the example program below keeps only the items from
the given date onwards, and the old file is kept as a backup.

File sfeed_archive.c:

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <time.h>

	#include "util.h"

	static void
	usage(void)
	{
		fputs("usage: sfeed_archive yyyymmdd\n", stderr);
		exit(1);
	}

	int
	main(int argc, char *argv[])
	{
		char *line = NULL, *p;
		time_t parsedtime, comparetime;
		struct tm tm;
		size_t size = 0;
		int r, c, y, m, d;

		if (argc != 2 || strlen(argv[1]) != 8 ||
		    sscanf(argv[1], "%4d%2d%2d", &y, &m, &d) != 3)
			usage();

		memset(&tm, 0, sizeof(tm));
		tm.tm_isdst = -1; /* let mktime() determine whether DST applies */
		tm.tm_year = y - 1900;
		tm.tm_mon = m - 1;
		tm.tm_mday = d;
		if ((comparetime = mktime(&tm)) == -1)
			usage();

		while ((getline(&line, &size, stdin)) > 0) {
			/* the first field is the UNIX timestamp */
			if (!(p = strchr(line, '\t')))
				continue;
			c = *p;
			*p = '\0'; /* temporary null-terminate */
			/* strtotime() is provided by util.c */
			if ((r = strtotime(line, &parsedtime)) != -1 &&
			    parsedtime >= comparetime) {
				*p = c; /* restore */
				fputs(line, stdout);
			}
		}
		return 0;
	}

Now compile and run:

	$ cc util.c sfeed_archive.c -o sfeed_archive -std=c99
	$ ./sfeed_archive 20150101 < feeds > feeds.new
	$ mv feeds feeds.bak
	$ mv feeds.new feeds

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
fdm: https://github.com/nicm/fdm .

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"
		$cachepath = "%[home]/.sfeed/mbox.cache"
		cache "${cachepath}"
		$feedsdir = "%[home]/feeds/"

	# check if in cache by message-id.
	match case "^Message-ID: (.*)" in headers
		action {
			tag "msgid" value "%1"
		}
		continue

	# if in cache, stop.
	match matched and in-cache "${cachepath}" key "%[msgid]"
		action {
			keep
		}

	# not in cache, process it and add to cache.
	match case "^X-Feedname: (.*)" in headers
		action {
			maildir "${feedsdir}%1"
			add-to-cache "${cachepath}" key "%[msgid]"
			keep
		}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
procmail.

procmail_maildirs.sh file:

	maildir="$HOME/feeds"
	feedsdir="$HOME/.sfeed/feeds"
	procmailconfig="$HOME/.sfeed/procmailrc"

	# message-id cache to prevent duplicates.
	mkdir -p "${maildir}/.cache"

	if ! test -r "${procmailconfig}"; then
		echo "Procmail configuration file \"${procmailconfig}\" does not exist or is not readable." >&2
		echo "See procmailrc.example for an example." >&2
		exit 1
	fi

	find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
		(name=$(basename "${d}")
		mkdir -p "${maildir}/${name}/cur"
		mkdir -p "${maildir}/${name}/new"
		mkdir -p "${maildir}/${name}/tmp"
		printf 'Mailbox %s\n' "${name}"
		sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}") &
	done
	wait

Procmailrc file:

	# Example for use with sfeed_maildir.
	# The header X-Feedname is used to split into separate maildirs. It is
	# assumed this name is sane.

	MAILDIR="$HOME/feeds/"

	:0
	* ^X-Feedname: \/.*
	{
		FEED="$MATCH"

		:0 Wh: "msgid_$FEED.lock"
		| formail -D 1024000 ".cache/msgid_$FEED.cache"

		:0
		"$FEED"/
	}

Now run:

	$ procmail_maildirs.sh

Now you can view feeds in mutt(1) for example.


License
-------

MIT, see LICENSE file.


Author
------

Hiltjo Posthuma