sfeed v0.9
----------

Simple RSS and Atom parser (and some format programs).


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- make (for the Makefile).
- POSIX shell: used by sfeed_update and sfeed_opml_export.
- curl binary: http://curl.haxx.se/ , used by sfeed_update. It can be
  replaced with any similar tool, such as wget or fetch.
- iconv command-line utilities: http://www.gnu.org/software/libiconv/ , used
  by sfeed_update. If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For an alternative minimal iconv
  implementation: http://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- mandoc for documentation: http://mdocml.bsd.lv/


Platforms tested
----------------

- Linux (glibc+gcc, musl-gcc, clang, tcc).
- OpenBSD
- Windows (cygwin gcc, mingw).


Files
-----

sfeed             - Binary (from sfeed.c); reads XML RSS or Atom feed data from
                    stdin and writes feed data in TAB-separated format to stdout.
sfeed_html        - Format feeds file (TSV) to HTML.
sfeed_frames      - Format feeds file (TSV) to HTML file(s) with frames.
sfeed_mbox        - Format feeds file (TSV) to mbox.
sfeed_opml_import - Generate a sfeedrc config file based on an OPML file.
sfeed_opml_export - Generate an OPML file based on a sfeedrc config file.
sfeed_plain       - Format feeds file (TSV) to a plain-text list.
sfeed_update      - Shell script; update feeds and merge them with the old
                    feeds in the file $HOME/.sfeed/feeds by default.
sfeed_web         - Find URLs of RSS/Atom feeds on a webpage.
sfeed_xmlenc      - Detect the character-set encoding of an XML stream.
sfeedrc.example   - Example config file.
style.css         - Example stylesheet to use with sfeed_html and sfeed_frames.


Files read at runtime by sfeed_update
-------------------------------------

sfeedrc - Config file. This file is evaluated as a shell script in sfeed_update.
          You can, for example, override the fetchfeed() function to use
          wget, fetch or another download program, or override the merge()
          function to change the merge logic. The function feeds() is called
          to fetch the feeds. The function feed() can safely be executed as
          a parallel job in your sfeedrc config file to speed up updating.


Files written at runtime by sfeed_update
----------------------------------------

feeds     - TAB-separated format containing all feeds. The sfeed_update
            script merges new items with this file.
feeds.new - Temporary file used by sfeed_update to merge items.


TAB-separated format
--------------------

The items are saved in a TSV-like format.

The fields title, id and author are not allowed to contain newlines or TABs:
all whitespace is replaced by a single space character and control characters
are removed.

The content field can contain newlines and is escaped: TABs, newlines and '\'
are escaped with '\', so they become '\t', '\n' and '\\'. Other whitespace
characters except space are removed. Control characters are removed.

The timestamp field is converted to a UNIX timestamp. The timestamp is also
stored in formatted form as a separate field.

The order and format of the fields are:

item UNIX timestamp      - string UNIX timestamp (UTC+0).
item formatted timestamp - string timestamp, YYYY-mm-dd HH:MM:SS (UTC[+-]HH:MM)|tz
item title               - string
item link                - string, absolute URL, characters are URI-encoded.
item content             - string
item contenttype         - string, "html" or "plain".
item id                  - string
item author              - string
feed type                - string, "rss" or "atom".

CAVEAT: if a timezone is not supported (non-RFC-822) the UNIX timestamp is
interpreted as UTC+0.
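To illustrate reading the format above, here is a minimal sketch of two POSIX shell/awk helpers. It assumes only the field order listed above. sfeed_titles prints the title and link of each item, and sfeed_unescape reverses the escaping of the content field. Both function names are hypothetical and are not part of sfeed.

```sh
# Hypothetical helpers for reading sfeed's TSV format (not part of sfeed).
# Field order: 1 UNIX timestamp, 2 formatted timestamp, 3 title, 4 link,
# 5 content, 6 contenttype, 7 id, 8 author, 9 feed type.

# Print "title<TAB>link" for each item read from stdin.
sfeed_titles() {
	awk -F '\t' '{ print $3 "\t" $4 }'
}

# Reverse the content-field escaping for each line read from stdin:
# "\n" -> newline, "\t" -> TAB, "\\" -> "\".
# Any other backslash is copied unchanged (the format defines only these three).
sfeed_unescape() {
	awk '{
		out = ""
		n = length($0)
		for (i = 1; i <= n; i++) {
			c = substr($0, i, 1)
			if (c == "\\" && i < n) {
				d = substr($0, i + 1, 1)
				if (d == "n") { out = out "\n"; i++; continue }
				if (d == "t") { out = out "\t"; i++; continue }
				if (d == "\\") { out = out "\\"; i++; continue }
			}
			out = out c
		}
		print out
	}'
}
```

For example, cut -f 5 feeds | sfeed_unescape prints the unescaped content of every item. The unescaping walks each line one character at a time instead of chaining gsub() calls. Sequential substitutions would wrongly turn an escaped backslash followed by 'n' ('\\n') into a newline.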
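The merge step that sfeed_update performs on the feeds file can be pictured with the following sketch. It is not the merge() function shipped with sfeed_update and it is not a drop-in replacement for it. The helper name sfeed_merge_sketch and its argument order (old file, then new file) are hypothetical. It assumes items are identified by the id field (field 7), falling back to title and link when the id is empty, and ordered by UNIX timestamp (field 1), newest first.

```sh
# Hypothetical merge helper (not sfeed_update's own merge()).
# $1: old TSV feeds file, $2: file with newly fetched TSV items.
# Prints the merged items to stdout, newest first.
sfeed_merge_sketch() {
	# New items are listed first, so for a duplicate item the newly
	# fetched line is the one kept. Each item is keyed by its id
	# (field 7), or by title (field 3) and link (field 4) if the id is
	# empty; only the first line per key is printed. The result is then
	# sorted numerically on the UNIX timestamp (field 1), descending.
	cat "$2" "$1" |
	awk -F '\t' '{
		key = ($7 != "") ? $7 : $3 "\t" $4
	}
	!seen[key]++' |
	sort -t "$(printf '\t')" -k1,1nr
}
```

For example: sfeed_merge_sketch feeds feeds.new > feeds.merged. Keeping the newest copy of a duplicate lets an updated item (for example, with an edited title) replace the stored one.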


Build and install
-----------------

Using make (respects $DESTDIR and $PREFIX):

	make install


Usage and examples
------------------

Find RSS/Atom feed URLs on a webpage:

	url="codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output:

	application/rss+xml http://codemadness.org/blog/rss.xml
	application/atom+xml http://codemadness.org/blog/atom.xml

- - -

To update feeds and format the feeds file (the configfile argument is
optional):

	sfeed_update "configfile"
	sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
	sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
	mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

Example script to view feeds with dmenu; it opens the selected URL in $BROWSER:

	#!/bin/sh
	# pick an item with dmenu, then strip everything before the URL.
	url=$(sfeed_plain $HOME/.sfeed/feeds/* | dmenu -l 35 -i | \
		sed 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@')
	[ -n "$url" ] && $BROWSER "$url"

or to view in your browser:

	$BROWSER "$HOME/.sfeed/feeds.html"

or to view in your editor:

	$EDITOR "$HOME/.sfeed/feeds.txt"

Generate a sfeedrc config file from your exported list of feeds in OPML format:

	sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc

- - -

Export an OPML file of your feeds from a sfeedrc config file (the configfile
argument is optional):

	sfeed_opml_export configfile > myfeeds.opml

- - -

Over time your feeds file might become quite big. You can archive items older
than a specific date (the old file is kept as feeds.old), for example (make
sure to change the date in mktime("YYYY mm dd HH mm ss")):

	#!/bin/sh
	set -x -e

	# keep only items with a timestamp at or after the given date.
	gawk -F '\t' 'BEGIN {
		time = mktime("2012 01 01 12 34 56");
	}
	{
		if (int($1) >= int(time)) {
			print $0;
		}
	}' < feeds > feeds.clean

	mv feeds feeds.old
	mv feeds.clean feeds

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
fdm: https://github.com/nicm/fdm .
For example, using the following config (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"

	$cachepath = "%[home]/.sfeed/mbox.cache"
	cache "${cachepath}"
	$feedsdir = "%[home]/feeds/"

	# check if in cache by message-id.
	match case "^Message-ID: (.*)" in headers
		action {
			tag "msgid" value "%1"
		}
		continue

	# if in cache, stop.
	match matched and in-cache "${cachepath}" key "%[msgid]"
		action {
			keep
		}

	# not in cache, process it and add to cache.
	match case "^X-Feedname: (.*)" in headers
		action {
			maildir "${feedsdir}%1"
			add-to-cache "${cachepath}" key "%[msgid]"
			keep
		}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1), for example.

- - -

Use procmail to format mbox to separate maildirs per feed.

Depends on: procmail, formail, sfeed_mbox.

procmail_maildirs.sh file:

	#!/bin/sh
	maildir="$HOME/feeds"
	feedsdir="$HOME/.sfeed/feeds"
	procmailconfig="$HOME/.sfeed/procmailrc"

	# message-id cache to prevent duplicates.
	mkdir -p "${maildir}/.cache"

	if ! test -r "${procmailconfig}"; then
		echo "Procmail configuration file \"${procmailconfig}\" does not exist or is not readable." >&2
		echo "See procmailrc.example for an example." >&2
		exit 1
	fi

	find "${feedsdir}" -type f | {
		while read -r d; do
			(name=$(basename "${d}")
			mkdir -p "${maildir}/${name}/cur"
			mkdir -p "${maildir}/${name}/new"
			mkdir -p "${maildir}/${name}/tmp"
			printf 'Mailbox %s\n' "${name}"
			sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}") &
		done
		# the loop runs in a subshell (right side of the pipe), so wait
		# here for its background jobs.
		wait
	}

Procmailrc file:

	# Example for use with procmail_maildirs.sh.
	# The header X-Feedname is used to split into separate maildirs. It is
	# assumed this name is sane.

	MAILDIR="$HOME/feeds/"

	:0
	* ^X-Feedname: \/.*
	{
		FEED="$MATCH"

		:0 Wh: "msgid_$FEED.lock"
		| formail -D 1024000 ".cache/msgid_$FEED.cache"

		:0
		"$FEED"/
	}

Now run:

	$ procmail_maildirs.sh

Now you can view feeds in mutt(1), for example.


License
-------

MIT, see LICENSE file.


Author
------

Hiltjo Posthuma