sfeed
-----

RSS and Atom parser (and some format programs).


Build and install
-----------------

	$ make
	# make install


Usage
-----

Initial setup:

	mkdir -p "$HOME/.sfeed/feeds"
	cp sfeedrc.example "$HOME/.sfeed/sfeedrc"
	cp style.css "$HOME/.sfeed/style.css"

Edit the configuration file and list the feeds you want to update:

	$EDITOR "$HOME/.sfeed/sfeedrc"

or you can use sfeed_opml_import(1) to import your existing
subscriptions from the OPML format:

	sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

Update the feeds. This script also merges and sorts the items:

	sfeed_update

Format feeds:

Plain-text list:

	sfeed_plain "$HOME"/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

	sfeed_html "$HOME"/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with frames and content, copy style.css for a default style:

	mkdir -p "$HOME/.sfeed/frames"
	cd "$HOME/.sfeed/frames" && sfeed_frames "$HOME"/.sfeed/feeds/*

To update your feeds periodically and format them in a view you like, you can
write a wrapper script and add it as a cron job.
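
A minimal sketch of such a wrapper, assuming the default layout from the
initial setup above. The function name sfeed_refresh and the temporary file
names are made up for illustration. Each view is written to a temporary file
first and then moved into place, so a half-written view is never visible:

```shell
#!/bin/sh
# Hypothetical wrapper: update all feeds, then regenerate the views.
# Assumes the files live in $HOME/.sfeed as in the initial setup.
sfeed_refresh() {
	dir="$HOME/.sfeed"

	sfeed_update "$dir/sfeedrc"

	# Plain-text list.
	sfeed_plain "$dir"/feeds/* > "$dir/feeds.txt.tmp" &&
		mv "$dir/feeds.txt.tmp" "$dir/feeds.txt"

	# HTML view (no frames).
	sfeed_html "$dir"/feeds/* > "$dir/feeds.html.tmp" &&
		mv "$dir/feeds.html.tmp" "$dir/feeds.html"
}
```

Saved as a script with a call to sfeed_refresh on the last line, a crontab(5)
entry such as the following would run it every hour (the path is only an
example):

	0 * * * * /home/user/bin/sfeed_refresh.sh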

Most protocols are supported because curl(1) is used by default. For the same
reason, proxy settings from the environment (such as the $http_proxy
environment variable) are used.

See the section "Usage and examples" below and the man-pages for more
information on how to use sfeed(1) and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- make(1) (for Makefile).
- POSIX sh(1),
  used by sfeed_update(1) and sfeed_opml_export(1).
- curl(1) binary: http://curl.haxx.se/ ,
  used by sfeed_update(1), can be replaced with any similar tool such as
  wget(1) or OpenBSD ftp(1).
- iconv(1) command-line utility,
  used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For an alternative minimal iconv
  implementation: http://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- mandoc for documentation: http://mdocml.bsd.lv/ .


Platforms tested
----------------

- Linux (glibc+gcc, musl-gcc, clang).
- NetBSD
- OpenBSD
- Windows (cygwin gcc, mingw).


Files
-----

sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
                    in TAB-separated format to stdout.
sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
sfeed_html        - Format feed data (TSV) to HTML.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_mbox        - Format feed data (TSV) to mbox.
sfeed_plain       - Format feed data (TSV) to a plain-text list.
sfeed_tail        - Format unseen feed data (TSV) to a plain-text list.
sfeed_update      - Update feeds and merge with old feeds in the directory
                    $HOME/.sfeed/feeds by default.
sfeed_web         - Find URLs of RSS/Atom feeds on a webpage.
sfeed_xmlenc      - Detect character-set encoding from XML stream.
sfeedrc.example   - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
style.css         - Example stylesheet to use with sfeed_html(1) and
                    sfeed_frames(1).


Files read at runtime by sfeed_update(1)
----------------------------------------

sfeedrc - Config file. This file is evaluated as a shellscript in
          sfeed_update(1). You can for example override the fetchfeed()
          function to use wget(1), OpenBSD ftp(1) or another download program,
          or you can override the merge() function to change the merge logic.
          The function feeds() is called to fetch the feeds. The function
          feed() can safely be executed concurrently as a background job in
          your sfeedrc(5) config file to make updating faster.
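
A sketch of a feeds() function that runs feed() concurrently. It assumes
feed() takes the feed name followed by the feed URL, as in sfeedrc.example;
check that file for the exact arguments. The second feed is a placeholder:

```shell
# Hypothetical sfeedrc fragment: fetch all feeds as background jobs.
feeds() {
	# Assumed form: feed <name> <feedurl>
	feed "codemadness" "http://codemadness.org/blog/rss.xml" &
	feed "example" "http://example.org/atom.xml" &
	# Wait for the background jobs started above.
	wait
}
```

With many feeds on slow servers the total update time is then closer to the
slowest single feed than to the sum of all of them.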


Files written at runtime by sfeed_update(1)
-------------------------------------------

feedname     - TAB-separated format containing all items per feed. The
               sfeed_update(1) script merges new items with this file.
feedname.new - Temporary file used by sfeed_update(1) to merge items.


TAB-separated format fields
---------------------------

The items are saved in a TSV-like format.

The fields title, id and author are not allowed to contain newlines or TABs:
all whitespace characters are replaced by a space character. Control characters
are removed.

The content field can contain newlines and TABs, which are escaped. TABs,
newlines and '\' are escaped with '\', so they become: '\t', '\n' and '\\'.
Other whitespace characters except space are removed. Control characters are
removed.
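
The escaping can be undone with a few lines of awk(1). This is only a sketch:
the function name sfeed_content is made up, and it assumes the content is the
4th field as listed below. It scans for each '\' and translates the character
that follows it, so an escaped backslash followed by 'n' is not mistaken for a
newline:

```shell
# Print the unescaped content (4th field) of each item read from stdin.
sfeed_content() {
	awk -F '\t' '{
		s = $4; out = ""
		while ((i = index(s, "\\")) > 0) {
			c = substr(s, i + 1, 1)
			if (c == "n")
				c = "\n"
			else if (c == "t")
				c = "\t"
			# Any other character after the backslash, including
			# a second backslash, is kept as-is.
			out = out substr(s, 1, i - 1) c
			s = substr(s, i + 2)
		}
		print out s
	}'
}
```

For example (the feed file name is hypothetical):

	sfeed_content < "$HOME/.sfeed/feeds/codemadness"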

The order and format of the fields are:

item UNIX timestamp      - UNIX timestamp (UTC+0), empty on parse failure.
item title               - Title text, HTML in titles is treated as
                           plain-text.
item link                - Absolute url, unsafe characters are encoded.
item content             - Newlines and TABs are escaped. Control characters
                           are removed. See the "TAB-separated format fields"
                           text.
item contenttype         - "html" or "plain".
item id                  - RSS item GUID or Atom id.
item author              - Item author.

CAVEATS:
- If a timezone is not supported (non-RFC-822) the timestamp is interpreted
  as UTC+0.
- HTML in titles is not supported on purpose.
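
Because the fields have a fixed order, the files can be processed with
standard tools. A small sketch, using the made-up function name sfeed_since,
that prints the timestamp, title and link of every item at or after a given
UNIX timestamp:

```shell
# Print "timestamp<TAB>title<TAB>link" for items at or after UNIX time $1.
# Items with an empty timestamp (parse failure) are skipped.
sfeed_since() {
	awk -F '\t' -v since="$1" '
		$1 != "" && $1 + 0 >= since + 0 { print $1 "\t" $2 "\t" $3 }'
}
```

For example, to list the items since 2015-01-01 00:00:00 UTC (the feed file
name is hypothetical):

	sfeed_since 1420070400 < "$HOME/.sfeed/feeds/codemadness"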


Usage and examples
------------------

Find RSS/Atom feed urls from a webpage:

	url="codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output:

	http://codemadness.org/blog/rss.xml	application/rss+xml
	http://codemadness.org/blog/atom.xml	application/atom+xml

- - -

Make sure your sfeedrc config file exists, see sfeedrc.example. To update
your feeds (configfile argument is optional):

	sfeed_update "configfile"

Format the feeds files:

	# Plain-text list.
	sfeed_plain "$HOME"/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"
	# HTML view (no frames), copy style.css for a default style.
	sfeed_html "$HOME"/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"
	# HTML view with frames and content, copy style.css for a default style.
	mkdir -p somedir && cd somedir && sfeed_frames "$HOME"/.sfeed/feeds/*

View in your browser:

	$BROWSER "$HOME/.sfeed/feeds.html"

View in your editor:

	$EDITOR "$HOME/.sfeed/feeds.txt"

- - -

Example script to view feeds with dmenu(1), opens selected url in $BROWSER:

	#!/bin/sh
	url=$(sfeed_plain "$HOME"/.sfeed/feeds/* | dmenu -l 35 -i |
		sed 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@')
	[ -n "$url" ] && $BROWSER "$url"

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

	sfeed_opml_import < opmlfile.xml > "$HOME/.sfeed/sfeedrc"

- - -

Export an OPML file of your feeds from a sfeedrc config file (configfile
argument is optional):

	sfeed_opml_export configfile > myfeeds.opml

- - -

Over time your feeds file might become quite big. You can keep only the items
from a specific date onwards and drop the older ones, for example:

File sfeed_archive.c:

	#include <sys/types.h>

	#include <err.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <time.h>

	#include "util.h"

	int
	main(int argc, char *argv[])
	{
		char *line = NULL, *p;
		time_t parsedtime, comparetime;
		struct tm tm;
		size_t size = 0;
		int r, c, y, m, d;

		if (argc != 2 || strlen(argv[1]) != 8 ||
		    sscanf(argv[1], "%4d%2d%2d", &y, &m, &d) != 3) {
			fputs("usage: sfeed_archive yyyymmdd\n", stderr);
			exit(1);
		}

		memset(&tm, 0, sizeof(tm));
		tm.tm_isdst = -1; /* let mktime(3) determine if DST is in effect */
		tm.tm_year = y - 1900;
		tm.tm_mon = m - 1;
		tm.tm_mday = d;
		if ((comparetime = mktime(&tm)) == -1)
			err(1, "mktime");

		while ((getline(&line, &size, stdin)) > 0) {
			if (!(p = strchr(line, '\t')))
				continue;
			c = *p;
			*p = '\0'; /* temporary null-terminate */
			if ((r = strtotime(line, &parsedtime)) != -1 &&
			    parsedtime >= comparetime) {
				*p = c; /* restore */
				fputs(line, stdout);
			}
		}
		return 0;
	}

Now compile and run:

	$ cc util.c sfeed_archive.c -o sfeed_archive -std=c99
	$ ./sfeed_archive 20150101 < feeds > feeds.new
	$ mv feeds feeds.bak
	$ mv feeds.new feeds

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages
using fdm(1): https://github.com/nicm/fdm .

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"
		$cachepath = "%[home]/.sfeed/mbox.cache"
		cache "${cachepath}"
		$feedsdir = "%[home]/feeds/"

		# check if in cache by message-id.
		match case "^Message-ID: (.*)" in headers
			action {
				tag "msgid" value "%1"
			}
			continue
			# if in cache, stop.
			match matched and in-cache "${cachepath}" key "%[msgid]"
			action {
				keep
			}

		# not in cache, process it and add to cache.
		match case "^X-Feedname: (.*)" in headers
		action {
			maildir "${feedsdir}%1"
			add-to-cache "${cachepath}" key "%[msgid]"
			keep
		}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages
using procmail(1).

procmail_maildirs.sh file:

	#!/bin/sh
	maildir="$HOME/feeds"
	feedsdir="$HOME/.sfeed/feeds"
	procmailconfig="$HOME/.sfeed/procmailrc"

	# message-id cache to prevent duplicates.
	mkdir -p "${maildir}/.cache"

	if ! test -r "${procmailconfig}"; then
		echo "Procmail configuration file \"${procmailconfig}\" does not exist or is not readable." >&2
		echo "See procmailrc.example for an example." >&2
		exit 1
	fi

	find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
		(name=$(basename "${d}")
		mkdir -p "${maildir}/${name}/cur"
		mkdir -p "${maildir}/${name}/new"
		mkdir -p "${maildir}/${name}/tmp"
		printf 'Mailbox %s\n' "${name}"
		sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}") &
	done
	wait

Procmailrc(5) file:

	# Example for use with sfeed_mbox(1).
	# The header X-Feedname is used to split into separate maildirs. It is
	# assumed this name is sane.

	MAILDIR="$HOME/feeds/"

	:0
	* ^X-Feedname: \/.*
	{
		FEED="$MATCH"

		:0 Wh: "msgid_$FEED.lock"
		| formail -D 1024000 ".cache/msgid_$FEED.cache"

		:0
		"$FEED"/
	}

Now run:

	$ procmail_maildirs.sh

Now you can view feeds in mutt(1) for example.


License
-------

ISC, see LICENSE file.


Author
------

Hiltjo Posthuma <hiltjo@codemadness.org>