sfeed v0.9
----------

Simple RSS and Atom parser (and some format programs).


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- make (for Makefile).
- POSIX shell
  used by sfeed_update and sfeed_opml_export.
- curl binary: http://curl.haxx.se/
  used by sfeed_update; can be replaced with another download tool such as
  wget or fetch.
- iconv command-line utilities: http://www.gnu.org/software/libiconv/
  used by sfeed_update. If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For an alternative minimal iconv
  implementation: http://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- mandoc for documentation: http://mdocml.bsd.lv/ .


Platforms tested
----------------

- Linux (glibc+gcc, musl-gcc, clang, tcc).
- OpenBSD
- Windows (cygwin gcc, mingw).


Files
-----

sfeed             - Binary (from sfeed.c); read XML RSS or Atom feed data from
                    stdin. Write feed data in tab-separated format to stdout.
sfeed_html        - Format feeds file (TSV) to HTML.
sfeed_frames      - Format feeds file (TSV) to HTML file(s) with frames.
sfeed_mbox        - Format feeds file (TSV) to mbox.
sfeed_opml_import - Generate a sfeedrc config file based on an opml file.
sfeed_opml_export - Generate an opml file based on a sfeedrc config file.
sfeed_plain       - Format feeds file (TSV) to a plain-text list.
sfeed_update      - Shellscript; update feeds and merge them with the old feeds
                    in the directory $HOME/.sfeed/feeds by default.
sfeed_web         - Find URLs of RSS/Atom feeds on a webpage.
sfeed_xmlenc      - Detect character-set encoding from XML stream.
sfeedrc.example   - Example config file.
style.css         - Example stylesheet to use with sfeed_html and sfeed_frames.


Files read at runtime by sfeed_update
-------------------------------------

sfeedrc   - Config file. This file is evaluated as a shellscript in
            sfeed_update. You can, for example, override the fetchfeed()
            function to use wget, fetch or another download program, or
            override the merge() function to change the merge logic. The
            function feeds() is called to fetch the feeds. The function feed()
            can safely be executed concurrently as a background job in your
            sfeedrc config file to speed up updating.


Files written at runtime by sfeed_update
----------------------------------------

feeds     - Directory with one TAB-separated file per feed. The sfeed_update
            script merges new items with these files.
feeds.new - Temporary file used by sfeed_update to merge items.


TAB-separated format
--------------------

The items are saved in a TSV-like format.

The fields title, id and author are not allowed to contain newlines or TABs.
All whitespace is replaced by a single space character. Control characters
are removed.
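As an illustration only, the rule for these single-line fields can be sketched
in C. normalize_oneline() is a hypothetical helper, not a function from
sfeed.c, and it assumes each whitespace character is mapped to one space (the
text above does not say whether runs of whitespace are collapsed):

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from sfeed.c: return a newly allocated
 * copy of src in which every whitespace character (space, TAB, newline,
 * CR, VT, FF) is replaced by one space and every other control
 * character is removed. Returns NULL if allocation fails; the caller
 * frees the result. */
char *
normalize_oneline(const char *src)
{
	char *dst, *out;
	unsigned char c;

	/* the result is never longer than the input */
	if (!(out = malloc(strlen(src) + 1)))
		return NULL;
	for (dst = out; (c = (unsigned char)*src) != '\0'; src++) {
		if (isspace(c))
			*dst++ = ' ';
		else if (!iscntrl(c))
			*dst++ = (char)c;
		/* else: other control character, dropped */
	}
	*dst = '\0';
	return out;
}
```

Bytes outside ASCII (for example UTF-8 sequences) are neither whitespace nor
control characters in the default "C" locale, so they are copied unchanged.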

The content field can contain newlines and is escaped: newlines, TABs and '\'
are escaped with '\' as '\n', '\t' and '\\'. Other whitespace characters
except space are removed. Control characters are removed.
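The content escaping can be sketched as two hypothetical helpers,
escape_content() and unescape_content(). This is an illustration of the
format described above, not sfeed's own implementation; unknown escape
sequences are assumed not to occur and are passed through unchanged when
decoding:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not sfeed's own code: return a newly allocated,
 * escaped copy of s for the content field. Newline becomes "\n", TAB
 * becomes "\t" and backslash becomes "\\"; other control characters
 * (including CR, VT and FF) are dropped. Returns NULL if allocation
 * fails; the caller frees the result. */
char *
escape_content(const char *s)
{
	char *out, *p;
	unsigned char c;

	/* worst case: every byte expands to two bytes */
	if (!(out = malloc(strlen(s) * 2 + 1)))
		return NULL;
	for (p = out; (c = (unsigned char)*s) != '\0'; s++) {
		if (c == '\n') {
			*p++ = '\\';
			*p++ = 'n';
		} else if (c == '\t') {
			*p++ = '\\';
			*p++ = 't';
		} else if (c == '\\') {
			*p++ = '\\';
			*p++ = '\\';
		} else if (c >= 0x20 && c != 0x7f) {
			*p++ = (char)c; /* space, printable ASCII or UTF-8 byte */
		}
		/* else: other control character, dropped */
	}
	*p = '\0';
	return out;
}

/* Hypothetical helper: undo escape_content() in place. A lone trailing
 * backslash or an unknown escape such as "\x" is copied unchanged. */
void
unescape_content(char *s)
{
	char *r, *w;

	for (r = w = s; *r != '\0'; r++) {
		if (*r != '\\' || r[1] == '\0') {
			*w++ = *r;
			continue;
		}
		r++; /* look at the character after the backslash */
		if (*r == 'n')
			*w++ = '\n';
		else if (*r == 't')
			*w++ = '\t';
		else if (*r == '\\')
			*w++ = '\\';
		else {
			*w++ = '\\';
			*w++ = *r;
		}
	}
	*w = '\0';
}
```

Because '\' is itself escaped, the sequence \\n in a feeds file always decodes
to a backslash followed by the letter n, never to a newline.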

The timestamp field is converted to a UNIX timestamp. The timestamp is also
stored in a formatted form as a separate field.
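As a sketch, the UNIX timestamp field can be converted back to a readable UTC
date with the standard strtoll(), gmtime() and strftime() functions.
format_utc() is a hypothetical helper; it always prints UTC, while the
formatted field written by sfeed may carry the feed's original offset or
timezone name:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: parse the UNIX timestamp field (a decimal string,
 * UTC+0) and write it to buf as "YYYY-mm-dd HH:MM:SS" in UTC.
 * Returns 0 on success or -1 if the field is empty, not a plain decimal
 * number, out of range or does not fit in buf. */
int
format_utc(const char *field, char *buf, size_t bufsiz)
{
	char *end;
	long long v;
	time_t t;
	struct tm *tm;

	/* strtoll() would skip leading blanks and accept '+'; reject those */
	if (field[0] == '\0' || strchr(" \t+", field[0]))
		return -1;
	errno = 0;
	v = strtoll(field, &end, 10);
	if (errno != 0 || *end != '\0')
		return -1;
	t = (time_t)v;
	if (!(tm = gmtime(&t)))
		return -1;
	if (strftime(buf, bufsiz, "%Y-%m-%d %H:%M:%S", tm) == 0)
		return -1;
	return 0;
}
```

For example, a field containing 1420070400 is formatted as
2015-01-01 00:00:00.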

The order and format of the fields are:

item UNIX timestamp      - string UNIX timestamp (UTC+0).
item formatted timestamp - string timestamp, YYYY-mm-dd HH:MM:SS (UTC[+-]HH:MM)|tz
item title               - string
item link                - string, absolute url, characters are uri encoded.
item content             - string
item contenttype         - string, "html" or "plain".
item id                  - string
item author              - string
feed type                - string, "rss" or "atom".

CAVEAT: if a timezone is not supported (non-RFC-822), the time is interpreted
        as UTC+0 when it is converted to a UNIX timestamp.
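A program reading the feeds files can split each line into the fields listed
above. The following is a minimal sketch with hypothetical names (the F_* enum
and split_fields()); it assumes one item per line, as written by sfeed, and
leaves the content field escaped:

```c
#include <string.h>

/* Field indices in the order listed above (names are hypothetical). */
enum {
	F_UNIXTIME, F_TIMEFMT, F_TITLE, F_LINK, F_CONTENT,
	F_CONTENTTYPE, F_ID, F_AUTHOR, F_FEEDTYPE, F_LAST
};

/* Hypothetical helper: split one TSV line in place. Each TAB (and a
 * trailing newline) is replaced by '\0' and fields[i] points at the
 * start of field i. Fields missing from a short line point to "".
 * Returns the number of fields actually found (at most F_LAST). */
size_t
split_fields(char *line, char *fields[F_LAST])
{
	size_t i, n = 0;
	char *p;

	line[strcspn(line, "\n")] = '\0'; /* strip newline, e.g. from getline() */
	for (p = line; n < F_LAST; n++) {
		fields[n] = p;
		if (!(p = strchr(p, '\t'))) {
			n++; /* count the last field */
			break;
		}
		*p++ = '\0';
	}
	for (i = n; i < F_LAST; i++)
		fields[i] = ""; /* read-only empty string */
	return n;
}
```

Fields missing from a short line are set to an empty string, so a caller can
look up any field by index without checking the count first.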


Build and install
-----------------

Using make (respects $DESTDIR and $PREFIX):

	make install


Usage and examples
------------------

Find RSS/Atom feed urls from a webpage:

	url="codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output:
	application/rss+xml http://codemadness.org/blog/rss.xml
	application/atom+xml http://codemadness.org/blog/atom.xml

- - -

To update feeds and format the feeds file (configfile argument is optional):

	sfeed_update "configfile"
	sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
	sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
	mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

Example script to view feeds with dmenu, opens selected url in $BROWSER:

	#!/bin/sh
	url=$(sfeed_plain $HOME/.sfeed/feeds/* | dmenu -l 35 -i |
		sed 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@')
	[ ! "$url" = "" ] && $BROWSER "$url"


or to view in your browser:

	$BROWSER "$HOME/.sfeed/feeds.html"


or to view in your editor:

	$EDITOR "$HOME/.sfeed/feeds.txt"


Generate a sfeedrc config file from your exported list of feeds in opml
format:

	sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc

- - -

Export an opml file of your feeds from a sfeedrc config file (configfile
argument is optional):

	sfeed_opml_export configfile > myfeeds.opml

- - -

Over time your feed files might become quite big. You can archive old items
and keep only the items from a specific date onwards, for example:

File sfeed_archive.c:

	#define _POSIX_C_SOURCE 200809L /* for getline() with -std=c99 */

	#include <sys/types.h>

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <time.h>

	#include "util.h"

	static void
	usage(void)
	{
		fputs("usage: sfeed_archive yyyymmdd\n", stderr);
		exit(1);
	}

	int
	main(int argc, char *argv[])
	{
		char *line = NULL, *p;
		time_t parsedtime, comparetime;
		struct tm tm;
		size_t size = 0;
		int y, m, d;

		if (argc != 2 || strlen(argv[1]) != 8 ||
		    sscanf(argv[1], "%4d%2d%2d", &y, &m, &d) != 3)
			usage();

		/* compare time: midnight (local time) of the given date */
		memset(&tm, 0, sizeof(tm));
		tm.tm_isdst = -1; /* let mktime() determine whether DST applies */
		tm.tm_year = y - 1900;
		tm.tm_mon = m - 1;
		tm.tm_mday = d;
		if ((comparetime = mktime(&tm)) == -1)
			usage();

		/* print only lines whose first field (UNIX timestamp) is on or
		 * after the compare time; lines without a TAB are skipped */
		while (getline(&line, &size, stdin) > 0) {
			if (!(p = strchr(line, '\t')))
				continue;
			*p = '\0'; /* temporarily terminate the timestamp field */
			if (strtotime(line, &parsedtime) != -1 &&
			    parsedtime >= comparetime) {
				*p = '\t'; /* restore the TAB */
				fputs(line, stdout);
			}
		}
		free(line);
		return 0;
	}

Now compile and run:

	$ cc util.c sfeed_archive.c -o sfeed_archive -std=c99
	$ ./sfeed_archive 20150101 < feeds > feeds.new
	$ mv feeds feeds.bak
	$ mv feeds.new feeds

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages
using fdm: https://github.com/nicm/fdm .

For example using the following config (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"
		$cachepath = "%[home]/.sfeed/mbox.cache"
		cache "${cachepath}"
		$feedsdir = "%[home]/feeds/"

		# check if in cache by message-id.
		match case "^Message-ID: (.*)" in headers
			action {
				tag "msgid" value "%1"
			}
			continue

		# if in cache, stop.
		match matched and in-cache "${cachepath}" key "%[msgid]"
			action {
				keep
			}

		# not in cache, process it and add to cache.
		match case "^X-Feedname: (.*)" in headers
			action {
				maildir "${feedsdir}%1"
				add-to-cache "${cachepath}" key "%[msgid]"
				keep
			}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Use procmail to split the sfeed_mbox output into separate maildirs per feed.
Depends on: procmail, formail, sfeed_mbox.

procmail_maildirs.sh file:

	maildir="$HOME/feeds"
	feedsdir="$HOME/.sfeed/feeds"
	procmailconfig="$HOME/.sfeed/procmailrc"

	# message-id cache to prevent duplicates.
	mkdir -p "${maildir}/.cache"

	if ! test -r "${procmailconfig}"; then
		echo "Procmail configuration file \"${procmailconfig}\" does not exist or is not readable." >&2
		echo "See procmailrc.example for an example." >&2
		exit 1
	fi

	find "${feedsdir}" -type f -exec printf '%s\n' {} \; | {
		while read -r d; do
			(name=$(basename "${d}")
			mkdir -p "${maildir}/${name}/cur"
			mkdir -p "${maildir}/${name}/new"
			mkdir -p "${maildir}/${name}/tmp"
			printf 'Mailbox %s\n' "${name}"
			sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}") &
		done
		# wait in the same (sub)shell that started the background jobs.
		wait
	}

Procmailrc file:

	# Example for use with procmail_maildirs.sh.
	# The header X-Feedname is used to split into separate maildirs. It is
	# assumed this name is sane.

	MAILDIR="$HOME/feeds/"

	:0
	* ^X-Feedname: \/.*
	{
		FEED="$MATCH"

		:0 Wh: "msgid_$FEED.lock"
		| formail -D 1024000 ".cache/msgid_$FEED.cache"

		:0
		"$FEED"/
	}

Now run:

	$ procmail_maildirs.sh

Now you can view feeds in mutt(1) for example.


License
-------

MIT, see LICENSE file.


Author
------

Hiltjo Posthuma <hiltjo@codemadness.org>