sfeed
-----

RSS and Atom parser (and some format programs).


Build and install
-----------------

$ make
# make install


Usage
-----

Initial setup:

	mkdir -p "$HOME/.sfeed/feeds"
	cp sfeedrc.example "$HOME/.sfeed/sfeedrc"
	cp style.css "$HOME/.sfeed/style.css"

Edit the configuration file and append any RSS/Atom feeds:

	$EDITOR "$HOME/.sfeed/sfeedrc"

or you can import existing OPML subscriptions using sfeed_opml_import(1):

	sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

Update the feeds; this script fetches them and merges the new items:

	sfeed_update

Format feeds:

Plain-text list:

	sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

	sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with frames and content, copy style.css for a default style:

	mkdir -p "$HOME/.sfeed/frames"
	cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*

To automatically update your feeds periodically and format them in a view you
like, you can write a wrapper script and add it as a cronjob.
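A minimal sketch of such a wrapper, assuming the default paths from the setup
above (the script name and location are placeholders):

```shell
#!/bin/sh
# Hypothetical wrapper: update all feeds, then regenerate the
# plain-text and HTML views using the default paths.
sfeed_update
sfeed_plain "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.txt"
sfeed_html "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.html"
```

It could then be run hourly with a crontab(5) entry such as
0 * * * * /path/to/the/script (adjust the path to wherever you saved it).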

Most protocols are supported because curl(1) is used by default; this also
means proxy settings from the environment (such as the $http_proxy environment
variable) are applied.

The sfeed(1) program itself is just a parser and therefore protocol-agnostic.
It can be used with HTTP, HTTPS, Gopher, SSH, etc.
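For example, any tool that writes the XML to stdout can act as the transport;
a rough sketch (the URLs below are placeholders, not real feeds):

```shell
# HTTPS via curl(1):
curl -s 'https://codemadness.org/blog/rss.xml' | sfeed | sfeed_plain
# Gopher works the same way:
curl -s 'gopher://example.org/0/feed.xml' | sfeed | sfeed_plain
# Or parse a local file, no network involved:
sfeed < feed.xml | sfeed_plain
```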

See the section "Usage and examples" below and the man pages for more
information on how to use sfeed(1) and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- make(1) (for Makefile).
- POSIX sh(1),
  used by sfeed_update(1) and sfeed_opml_export(1).
- curl(1) binary: http://curl.haxx.se/ ,
  used by sfeed_update(1), can be replaced with any tool like wget(1),
  OpenBSD ftp(1).
- iconv(1) command-line utilities,
  used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For an alternative minimal iconv
  implementation: http://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- mandoc for documentation: http://mdocml.bsd.lv/ .


OS tested
---------

- Linux (glibc+gcc, musl+gcc, clang).
- OpenBSD (gcc, clang).
- NetBSD.
- FreeBSD.
- Windows (cygwin gcc, mingw).
- HaikuOS.


Architectures tested
--------------------

amd64, ARM, aarch64, i386, SPARC64.


Files
-----

sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
                    in TAB-separated format to stdout.
sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gph         - Format feed data (TSV) to geomyidae .gph files.
sfeed_html        - Format feed data (TSV) to HTML.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_mbox        - Format feed data (TSV) to mbox.
sfeed_plain       - Format feed data (TSV) to a plain-text list.
sfeed_tail        - Format unseen feed data (TSV) to a plain-text list.
sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
sfeed_update      - Update feeds and merge with old feeds in the directory
                    $HOME/.sfeed/feeds by default.
sfeed_web         - Find URLs to RSS/Atom feeds on a webpage.
sfeed_xmlenc      - Detect character-set encoding from XML stream.
sfeedrc.example   - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
style.css         - Example stylesheet to use with sfeed_html(1) and
                    sfeed_frames(1).


Files read at runtime by sfeed_update(1)
----------------------------------------

sfeedrc - Config file. This file is evaluated as a shellscript in
          sfeed_update(1).

At least the following functions can be overridden per feed:

- fetchfeed: to use wget(1), OpenBSD ftp(1) or another download program.
- merge: to change the merge logic.
- filter: to filter on fields.
- order: to change the sort order.

The feeds() function is called to fetch all the feeds. The feed() function can
safely be run concurrently as a background job in your sfeedrc(5) config file
to speed up updating.
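A sketch of such an override in a sfeedrc file, swapping curl(1) for OpenBSD
ftp(1). The exact fetchfeed() parameters are an assumption here; see
sfeedrc(5) and sfeed_update(1) for the authoritative interface:

```shell
# Assumption: the feed URL is passed as an argument to fetchfeed().
fetchfeed() {
	ftp -o - "$1" 2>/dev/null
}

feeds() {
	feed "codemadness" "https://codemadness.org/blog/rss.xml"
}
```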


Files written at runtime by sfeed_update(1)
-------------------------------------------

feedname     - TAB-separated format containing all items per feed. The
               sfeed_update(1) script merges new items with this file.
feedname.new - Temporary file used by sfeed_update(1) to merge items.
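Because the feed files are plain TAB-separated values, they compose with
standard tools. A small self-contained illustration using a made-up two-item
file (only the first three fields are shown; see sfeed(5) for the full
format):

```shell
# Create a hypothetical feed file: timestamp, title and link fields.
printf '1577836800\tFirst post\thttps://example.org/1\n' > feedname
printf '1577923200\tSecond post\thttps://example.org/2\n' >> feedname
# Print the title of each item (field 2).
cut -f 2 feedname
# Prints:
# First post
# Second post
```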


File format
-----------

man 5 sfeed
man 5 sfeedrc
man 1 sfeed


Usage and examples
------------------

Find RSS/Atom feed URLs on a webpage:

	url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output example:

	https://codemadness.org/blog/rss.xml	application/rss+xml
	https://codemadness.org/blog/atom.xml	application/atom+xml

- - -

Make sure your sfeedrc config file exists; see sfeedrc.example for an example.
To update your feeds (the configfile argument is optional):

	sfeed_update "configfile"

Format the feeds files:

	# Plain-text list.
	sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
	# HTML view (no frames), copy style.css for a default style.
	sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
	# HTML view with frames and content, copy style.css for a default style.
	mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

View in your browser:

	$BROWSER "$HOME/.sfeed/feeds.html"

View in your editor:

	$EDITOR "$HOME/.sfeed/feeds.txt"

- - -

Example script to view feeds with dmenu(1), opens selected url in $BROWSER:

	#!/bin/sh
	url=$(sfeed_plain $HOME/.sfeed/feeds/* | dmenu -l 35 -i |
		sed 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@')
	[ ! "$url" = "" ] && $BROWSER "$url"

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

	sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc

- - -

Export an OPML file of your feeds from a sfeedrc config file (configfile
argument is optional):

	sfeed_opml_export configfile > myfeeds.opml

- - -

An example sfeedrc filter() function, which skips ads, shortens links and
strips tracking parameters per feed:

# filter fields.
# filter(name)
filter() {
	case "$1" in
	"tweakers")
		LC_ALL=C awk -F '	' 'BEGIN { OFS = "	"; }
		# skip ads.
		$2 ~ /^ADV:/ {
			next;
		}
		# shorten link.
		{
			if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
				$3 = substr($3, RSTART, RLENGTH);
			}
			print $0;
		}';;
	"yt BSDNow")
		# filter only BSD Now from channel.
		LC_ALL=C awk -F '	' '$2 ~ / \| BSD Now/';;
	*)
		cat;;
	esac | \
		# replace youtube links with embed links.
		sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \

		LC_ALL=C awk -F '	' 'BEGIN { OFS = "	"; }
		{
			# shorten feedburner links.
			if (match($3, /^(http|https):\/\/[^/]+\/~r\/.*\/~3\/[^\/]+\//)) {
				$3 = substr($3, RSTART, RLENGTH);
			}

			# strip tracking parameters

			# urchin, facebook, piwik, webtrekk and generic.
			gsub(/\?(ad|campaign|pk|tm|wt)_([^&]+)/, "?", $3);
			gsub(/&(ad|campaign|pk|tm|wt)_([^&]+)/, "", $3);

			gsub(/\?&/, "?", $3);
			gsub(/[\?&]+$/, "", $3);

			print $0;
		}'
}

- - -

Over time your feeds files might become quite big. You can archive them by
removing items older than a specific date, for example:

File sfeed_archive.c:

	#include <sys/types.h>

	#include <err.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <time.h>

	#include "util.h"

	int
	main(int argc, char *argv[])
	{
		char *line = NULL, *p;
		time_t parsedtime, comparetime;
		struct tm tm;
		size_t size = 0;
		int r, c, y, m, d;

		if (argc != 2 || strlen(argv[1]) != 8 ||
		    sscanf(argv[1], "%4d%2d%2d", &y, &m, &d) != 3) {
			fputs("usage: sfeed_archive yyyymmdd\n", stderr);
			exit(1);
		}

		memset(&tm, 0, sizeof(tm));
		tm.tm_isdst = -1; /* let mktime() determine whether DST applies */
		tm.tm_year = y - 1900;
		tm.tm_mon = m - 1;
		tm.tm_mday = d;
		if ((comparetime = mktime(&tm)) == -1)
			err(1, "mktime");

		while ((getline(&line, &size, stdin)) > 0) {
			if (!(p = strchr(line, '\t')))
				continue;
			c = *p;
			*p = '\0'; /* temporary null-terminate */
			if ((r = strtotime(line, &parsedtime)) != -1 &&
			    parsedtime >= comparetime) {
				*p = c; /* restore */
				fputs(line, stdout);
			}
		}
		return 0;
	}

Now compile and run:

	$ cc -std=c99 -o sfeed_archive util.c sfeed_archive.c
	$ ./sfeed_archive 20150101 < feeds > feeds.new
	$ mv feeds feeds.bak
	$ mv feeds.new feeds
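
Roughly the same filtering can also be done with awk(1) alone, since the first
field is the item timestamp. A sketch with a hard-coded UNIX timestamp
(converting a date to a timestamp portably is left out, as date(1) flags
differ per system):

```shell
# Keep only items dated on or after Jan 1 2015 00:00 UTC (1420070400).
awk -F '\t' '$1 >= 1420070400' < feeds > feeds.new
```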

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
fdm(1): https://github.com/nicm/fdm .

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"
		$cachepath = "%[home]/.sfeed/mbox.cache"
		cache "${cachepath}"
		$feedsdir = "%[home]/feeds/"

		# check if in cache by message-id.
		match case "^Message-ID: (.*)" in headers
			action {
				tag "msgid" value "%1"
			}
			continue
			# if in cache, stop.
			match matched and in-cache "${cachepath}" key "%[msgid]"
			action {
				keep
			}

		# not in cache, process it and add to cache.
		match case "^X-Feedname: (.*)" in headers
		action {
			maildir "${feedsdir}%1"
			add-to-cache "${cachepath}" key "%[msgid]"
			keep
		}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
procmail(1).

procmail_maildirs.sh file:

	maildir="$HOME/feeds"
	feedsdir="$HOME/.sfeed/feeds"
	procmailconfig="$HOME/.sfeed/procmailrc"

	# message-id cache to prevent duplicates.
	mkdir -p "${maildir}/.cache"

	if ! test -r "${procmailconfig}"; then
		echo "Procmail configuration file \"${procmailconfig}\" does not exist or is not readable." >&2
		echo "See procmailrc.example for an example." >&2
		exit 1
	fi

	find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
		(name=$(basename "${d}")
		mkdir -p "${maildir}/${name}/cur"
		mkdir -p "${maildir}/${name}/new"
		mkdir -p "${maildir}/${name}/tmp"
		printf 'Mailbox %s\n' "${name}"
		sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}") &
	done
	wait

Example procmailrc(5) file:

	# Example for use with sfeed_mbox(1).
	# The header X-Feedname is used to split into separate maildirs. It is
	# assumed this name is sane.

	MAILDIR="$HOME/feeds/"

	:0
	* ^X-Feedname: \/.*
	{
		FEED="$MATCH"

		:0 Wh: "msgid_$FEED.lock"
		| formail -D 1024000 ".cache/msgid_$FEED.cache"

		:0
		"$FEED"/
	}

Now run:

	$ procmail_maildirs.sh

Now you can view feeds in mutt(1) for example.


License
-------

ISC, see LICENSE file.


Author
------

Hiltjo Posthuma <hiltjo@codemadness.org>