summaryrefslogtreecommitdiff
path: root/README
blob: 40037ab5e34fccc1c8fe3dc22e41bed5035bb674 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
sfeed
-----

RSS and Atom parser (and some format programs).

It converts RSS or Atom feeds from XML to a TAB-separated file. There are
formatting programs included to convert this TAB-separated format to various
other formats. There are also some programs and scripts included to import and
export OPML and to fetch, filter, merge and order feed items.


Build and install
-----------------

$ make
# make install


To build sfeed without sfeed_curses set SFEED_CURSES to an empty string:

$ make SFEED_CURSES=""
# make SFEED_CURSES="" install


To change the theme for sfeed_curses you can set SFEED_THEME.  See the themes/
directory for the theme names.

$ make SFEED_THEME="templeos"
# make SFEED_THEME="templeos" install


Usage
-----

Initial setup:

	mkdir -p "$HOME/.sfeed/feeds"
	cp sfeedrc.example "$HOME/.sfeed/sfeedrc"

Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
is included and evaluated as a shellscript for sfeed_update, so it's functions
and behaviour can be overridden:

	$EDITOR "$HOME/.sfeed/sfeedrc"

or you can import existing OPML subscriptions using sfeed_opml_import(1):

	sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

an example to export from an other RSS/Atom reader called newsboat and import
for sfeed_update:

	newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

an example to export from an other RSS/Atom reader called rss2email (3.x+) and
import for sfeed_update:

	r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

Update feeds, this script merges the new items, see sfeed_update(1) for more
information what it can do:

	sfeed_update

Format feeds:

Plain-text list:

	sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

	cp style.css "$HOME/.sfeed/style.css"
	sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with the menu as frames, copy style.css for a default style:

	mkdir -p "$HOME/.sfeed/frames"
	cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*

To automatically update your feeds periodically and format them in a way you
like you can make a wrapper script and add it as a cronjob.

Most protocols are supported because curl(1) is used by default and also proxy
settings from the environment (such as the $http_proxy environment variable)
are used.

The sfeed(1) program itself is just a parser that parses XML data from stdin
and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
Gopher, SSH, etc.

See the section "Usage and examples" below and the man-pages for more
information how to use sfeed(1) and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- POSIX make(1) for the Makefile.
- POSIX sh(1),
  used by sfeed_update(1) and sfeed_opml_export(1).
- POSIX utilities such as awk(1) and sort(1),
  used by sfeed_content(1), sfeed_markread(1) and sfeed_update(1).
- curl(1) binary: https://curl.haxx.se/ ,
  used by sfeed_update(1), but can be replaced with any tool like wget(1),
  OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
- iconv(1) command-line utilities,
  used by sfeed_update(1). If the text in your RSS/Atom feeds are already UTF-8
  encoded then you don't need this. For a minimal iconv implementation:
  https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- mandoc for documentation: https://mdocml.bsd.lv/
- curses (typically ncurses), otherwise see minicurses.h,
  used by sfeed_curses(1).
- a terminal (emulator) supporting UTF-8 and the used capabilities,
  used by sfeed_curses(1).


Optional run-time dependencies for sfeed_curses
-----------------------------------------------

- xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
- xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
- awk, used by the sfeed_content and sfeed_markread script.
  See the ENVIRONMENT VARIABLES section in the man page to change it.
- lynx, used by the sfeed_content script to convert HTML content.
  See the ENVIRONMENT VARIABLES section in the man page to change it.


Formats supported
-----------------

sfeed supports a subset of XML 1.0 and a subset of:

- Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
- Atom 0.3 (draft, historic).
- RSS 0.91+.
- RDF (when used with RSS).
- MediaRSS extensions (media:).
- Dublin Core extensions (dc:).

Other formats like JSONfeed, twtxt or certain RSS/Atom extensions can be
supported by converting them to RSS/Atom or to the sfeed(5) format directly.


OS tested
---------

- Linux,
  compilers: clang, gcc, chibicc, cproc, lacc, pcc, tcc,
  libc: glibc, musl.
- OpenBSD (clang, gcc).
- NetBSD (with NetBSD curses).
- FreeBSD
- DragonFlyBSD
- GNU/Hurd
- Illumos (OpenIndiana).
- Windows (cygwin gcc + mintty, mingw).
- HaikuOS
- SerenityOS
- FreeDOS (djgpp).
- FUZIX (sdcc -mz80, with the sfeed parser program).


Architectures tested
--------------------

amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.


Files
-----

sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
                    in TAB-separated format to stdout.
sfeed_atom        - Format feed data (TSV) to an Atom feed.
sfeed_content     - View item content, for use with sfeed_curses.
sfeed_curses      - Format feed data (TSV) to a curses interface.
sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gopher      - Format feed data (TSV) to Gopher files.
sfeed_html        - Format feed data (TSV) to HTML.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_markread    - Mark items as read/unread, for use with sfeed_curses.
sfeed_mbox        - Format feed data (TSV) to mbox.
sfeed_plain       - Format feed data (TSV) to a plain-text list.
sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
sfeed_update      - Update feeds and merge items.
sfeed_web         - Find URLs to RSS/Atom feed from a webpage.
sfeed_xmlenc      - Detect character-set encoding from a XML stream.
sfeedrc.example   - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
style.css         - Example stylesheet to use with sfeed_html(1) and
                    sfeed_frames(1).


Files read at runtime by sfeed_update(1)
----------------------------------------

sfeedrc - Config file. This file is evaluated as a shellscript in
          sfeed_update(1).

At least the following functions can be overridden per feed:

- fetch: to use wget(1), OpenBSD ftp(1) or an other download program.
- filter: to filter on fields.
- merge: to change the merge logic.
- order: to change the sort order.

See also the sfeedrc(5) man page documentation for more details.

The feeds() function is called to process the feeds. The default feed()
function is executed concurrently as a background job in your sfeedrc(5) config
file to make updating faster. The variable maxjobs can be changed to limit or
increase the amount of concurrent jobs (8 by default).


Files written at runtime by sfeed_update(1)
-------------------------------------------

feedname     - TAB-separated format containing all items per feed. The
               sfeed_update(1) script merges new items with this file.
               The format is documented in sfeed(5).


File format
-----------

man 5 sfeed
man 5 sfeedrc
man 1 sfeed


Usage and examples
------------------

Find RSS/Atom feed URLs from a webpage:

	url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output example:

	https://codemadness.org/blog/rss.xml	application/rss+xml
	https://codemadness.org/blog/atom.xml	application/atom+xml

- - -

Make sure your sfeedrc config file exists, see sfeedrc.example. To update your
feeds (configfile argument is optional):

	sfeed_update "configfile"

Format the feeds files:

	# Plain-text list.
	sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
	# HTML view (no frames), copy style.css for a default style.
	sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
	# HTML view with the menu as frames, copy style.css for a default style.
	mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

View formatted output in your browser:

	$BROWSER "$HOME/.sfeed/feeds.html"

View formatted output in your editor:

	$EDITOR "$HOME/.sfeed/feeds.txt"

- - -

View formatted output in a curses interface.  The interface has a look inspired
by the mutt mail client.  It has a sidebar panel for the feeds, a panel with a
listing of the items and a small statusbar for the selected item/URL. Some
functions like searching and scrolling are integrated in the interface itself.

Just like the other format programs included in sfeed you can run it like this:

	sfeed_curses ~/.sfeed/feeds/*

... or by reading from stdin:

	sfeed_curses < ~/.sfeed/feeds/xkcd

By default sfeed_curses marks the items of the last day as new/bold. To manage
read/unread items in a different way a plain-text file with a list of the read
URLs can be used. To enable this behaviour the path to this file can be
specified by setting the environment variable $SFEED_URL_FILE to the URL file:

	export SFEED_URL_FILE="$HOME/.sfeed/urls"
	[ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
	sfeed_curses ~/.sfeed/feeds/*

It then uses the shellscript "sfeed_markread" to process the read and unread
items.

- - -

Example script to view feed items in a vertical list/menu in dmenu(1). It opens
the selected URL in the browser set in $BROWSER:

	#!/bin/sh
	url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
		sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
	test -n "${url}" && $BROWSER "${url}"

dmenu can be found at: https://git.suckless.org/dmenu/

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

	sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc

- - -

Export an OPML file of your feeds from a sfeedrc config file (configfile
argument is optional):

	sfeed_opml_export configfile > myfeeds.opml

- - -

The filter function can be overridden in your sfeedrc file. This allows
filtering items per feed. It can be used to shorten URLs, filter away
advertisements, strip tracking parameters and more.

	# filter fields.
	# filter(name)
	filter() {
		case "$1" in
		"tweakers")
			awk -F '\t' 'BEGIN { OFS = "\t"; }
			# skip ads.
			$2 ~ /^ADV:/ {
				next;
			}
			# shorten link.
			{
				if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
					$3 = substr($3, RSTART, RLENGTH);
				}
				print $0;
			}';;
		"yt BSDNow")
			# filter only BSD Now from channel.
			awk -F '\t' '$2 ~ / \| BSD Now/';;
		*)
			cat;;
		esac | \
			# replace youtube links with embed links.
			sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \

			awk -F '\t' 'BEGIN { OFS = "\t"; }
			function filterlink(s) {
				# protocol must start with http, https or gopher.
				if (match(s, /^(http|https|gopher):\/\//) == 0) {
					return "";
				}

				# shorten feedburner links.
				if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
					s = substr($3, RSTART, RLENGTH);
				}

				# strip tracking parameters
				# urchin, facebook, piwik, webtrekk and generic.
				gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
				gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);

				gsub(/\?&/, "?", s);
				gsub(/[\?&]+$/, "", s);

				return s
			}
			{
				$3 = filterlink($3); # link
				$8 = filterlink($8); # enclosure

				# try to remove tracking pixels: <img/> tags with 1px width or height.
				gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"'"'"' ]?1[\"'"'"' ]?[^0-9>]+[^>]*>", "", $4);

				print $0;
			}'
	}

- - -

Aggregate feeds. This filters new entries (maximum one day old) and sorts them
by newest first. Prefix the feed name in the title. Convert the TSV output data
to an Atom XML feed (again):

	#!/bin/sh
	cd ~/.sfeed/feeds/ || exit 1

	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	BEGIN {	OFS = "\t"; }
	int($1) >= old {
		$2 = "[" FILENAME "] " $2;
		print $0;
	}' * | \
	sort -k1,1rn | \
	sfeed_atom

- - -

To have a "tail(1) -f"-like FIFO stream filtering for new unique feed items and
showing them as plain-text per line similar to sfeed_plain(1):

Create a FIFO:

	fifo="/tmp/sfeed_fifo"
	mkfifo "$fifo"

On the reading side:

	# This keeps track of unique lines so might consume much memory.
	# It tries to reopen the $fifo after 1 second if it fails.
	while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'

On the writing side:

	feedsdir="$HOME/.sfeed/feeds/"
	cd "$feedsdir" || exit 1
	test -p "$fifo" || exit 1

	# 1 day is old news, don't write older items.
	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	BEGIN { OFS = "\t"; }
	int($1) >= old {
		$2 = "[" FILENAME "] " $2;
		print $0;
	}' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"

cut -b is used to trim the "N " prefix of sfeed_plain(1).

- - -

For some podcast feed the following code can be used to filter the latest
enclosure URL (probably some audio file):

	awk -F '\t' 'BEGIN { latest = 0; }
	length($8) {
		ts = int($1);
		if (ts > latest) {
			url = $8;
			latest = ts;
		}
	}
	END { if (length(url)) { print url; } }'

... or on a file already sorted from newest to oldest:

	awk -F '\t' '$8 { print $8; exit }'

- - -

Over time your feeds file might become quite big. You can archive items of a
feed from (roughly) the last week by doing for example:

	awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
	mv feed feed.bak
	mv feed.new feed

This could also be run weekly in a crontab to archive the feeds. Like throwing
away old newspapers. It keeps the feeds list tidy and the formatted output
small.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using the
fdm program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"
		$cachepath = "%[home]/.sfeed/fdm.cache"
		cache "${cachepath}"
		$maildir = "%[home]/feeds/"

		# Check if message is in the cache by Message-ID.
		match case "^Message-ID: (.*)" in headers
			action {
				tag "msgid" value "%1"
			}
			continue

		# If it is in the cache, stop.
		match matched and in-cache "${cachepath}" key "%[msgid]"
			action {
				keep
			}

		# Not in the cache, process it and add to cache.
		match case "^X-Feedname: (.*)" in headers
			action {
				# Store to local maildir.
				maildir "${maildir}%1"

				add-to-cache "${cachepath}" key "%[msgid]"
				keep
			}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Read from mbox and filter duplicate messages using the fdm program and deliver
it to a SMTP server. This works similar to the rss2email program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"
		$cachepath = "%[home]/.sfeed/fdm.cache"
		cache "${cachepath}"

		# Check if message is in the cache by Message-ID.
		match case "^Message-ID: (.*)" in headers
			action {
				tag "msgid" value "%1"
			}
			continue

		# If it is in the cache, stop.
		match matched and in-cache "${cachepath}" key "%[msgid]"
			action {
				keep
			}

		# Not in the cache, process it and add to cache.
		match case "^X-Feedname: (.*)" in headers
			action {
				# Connect to a SMTP server and attempt to deliver the
				# mail to it.
				# Of course change the server and e-mail below.
				smtp server "codemadness.org" to "hiltjo@codemadness.org"

				add-to-cache "${cachepath}" key "%[msgid]"
				keep
			}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
procmail(1).

procmail_maildirs.sh file:

	maildir="$HOME/feeds"
	feedsdir="$HOME/.sfeed/feeds"
	procmailconfig="$HOME/.sfeed/procmailrc"

	# message-id cache to prevent duplicates.
	mkdir -p "${maildir}/.cache"

	if ! test -r "${procmailconfig}"; then
		echo "Procmail configuration file \"${procmailconfig}\" does not exist or is not readable." >&2
		echo "See procmailrc.example for an example." >&2
		exit 1
	fi

	find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
		name=$(basename "${d}")
		mkdir -p "${maildir}/${name}/cur"
		mkdir -p "${maildir}/${name}/new"
		mkdir -p "${maildir}/${name}/tmp"
		printf 'Mailbox %s\n' "${name}"
		sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
	done

Procmailrc(5) file:

	# Example for use with sfeed_mbox(1).
	# The header X-Feedname is used to split into separate maildirs. It is
	# assumed this name is sane.

	MAILDIR="$HOME/feeds/"

	:0
	* ^X-Feedname: \/.*
	{
		FEED="$MATCH"

		:0 Wh: "msgid_$FEED.lock"
		| formail -D 1024000 ".cache/msgid_$FEED.cache"

		:0
		"$FEED"/
	}

Now run:

	$ procmail_maildirs.sh

Now you can view feeds in mutt(1) for example.

- - -

The fetch function can be overridden in your sfeedrc file. This allows to
replace the default curl(1) for sfeed_update with any other client to fetch the
RSS/Atom data or change the default curl options:

	# fetch a feed via HTTP/HTTPS etc.
	# fetch(name, url, feedfile)
	fetch() {
		hurl -m 1048576 -t 15 "$2" 2>/dev/null
	}

- - -

Caching, incremental data updates and bandwidth-saving

For servers that support it some incremental updates and bandwidth-saving can
be done by using the "ETag" HTTP header.

Create a directory for storing the ETags per feed:

	mkdir -p ~/.sfeed/etags/

The curl ETag options (--etag-save and --etag-compare) can be used to store and
send the previous ETag header value. curl version 7.73+ is recommended for it
to work properly.

The curl -z option can be used to send the modification date of a local file as
a HTTP "If-Modified-Since" request header. The server can then respond if the
data is modified or not or respond with only the incremental data.

The curl --compressed option can be used to indicate the client supports
decompression. Because RSS/Atom feeds are textual XML content this generally
compresses very well.

These options can be set by overriding the fetch() function in the sfeedrc
file:

	# fetch(name, url, feedfile)
	fetch() {
		etag="$HOME/.sfeed/etags/$(basename "$3")"
		curl \
			-L --max-redirs 0 -H "User-Agent:" -f -s -m 15 \
			--compressed \
			--etag-save "${etag}" --etag-compare "${etag}" \
			-z "${etag}" \
			"$2" 2>/dev/null
	}

These options can come at a cost of some privacy, because it exposes
additional metadata from the previous request.

- - -

CDNs blocking requests due to a missing HTTP User-Agent request header

sfeed_update will not send the "User-Agent" header by default for privacy
reasons.  Some CDNs like Cloudflare don't like this and will block such HTTP
requests.

A custom User-Agent can be set by using the curl -H option, like so:

	curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'

The above example string pretends to be a Windows 10 (x86-64) machine running
Firefox 78.

- - -

Page redirects

For security and efficiency reasons by default redirects are not allowed and
are treated as an error.

For example to prevent hijacking an unencrypted http:// to https:// redirect or
to not add time of an unnecessary page redirect each time.  It is encouraged to
use the final redirected URL in the sfeedrc config file.

If you want to ignore this advise you can override the fetch() function in the
sfeedrc file and change the curl options "-L --max-redirs 0".

- - -

Shellscript to update feeds in parallel more efficiently using xargs -P.

It creates a queue of the feeds with its settings, then uses xargs to process
them in parallel using the common, but non-POSIX -P option. This is more
efficient than the more portable solution in sfeed_update which can stall a
batch of $maxjobs in the queue if one item is slow.

sfeed_update_xargs shellscript:

	#!/bin/sh
	# update feeds, merge with old feeds using xargs in parallel mode (non-POSIX).
	
	# include script and reuse its functions, but do not start main().
	SFEED_UPDATE_INCLUDE="1" . sfeed_update
	# load config file, sets $config.
	loadconfig "$1"
	
	# process a single feed.
	# args are: config, tmpdir, name, feedurl, basesiteurl, encoding
	if [ "${SFEED_UPDATE_CHILD}" = "1" ]; then
		sfeedtmpdir="$2"
		_feed "$3" "$4" "$5" "$6"
		exit $?
	fi
	
	# ...else parent mode:
	
	# feed(name, feedurl, basesiteurl, encoding)
	feed() {
		# workaround: *BSD xargs doesn't handle empty fields in the middle.
		name="${1:-$$}"
		feedurl="${2:-http://}"
		basesiteurl="${3:-${feedurl}}"
		encoding="$4"
	
		printf '%s\0%s\0%s\0%s\0%s\0%s\0' "${config}" "${sfeedtmpdir}" \
			"${name}" "${feedurl}" "${basesiteurl}" "${encoding}"
	}
	
	# fetch feeds and store in temporary directory.
	sfeedtmpdir="$(mktemp -d '/tmp/sfeed_XXXXXX')"
	# make sure path exists.
	mkdir -p "${sfeedpath}"
	# print feeds for parallel processing with xargs.
	feeds | SFEED_UPDATE_CHILD="1" xargs -r -0 -P "${maxjobs}" -L 6 "$(readlink -f "$0")"
	# cleanup temporary files etc.
	cleanup

- - -

Shellscript to handle URLs and enclosures in parallel using xargs -P.

This can be used to download and process URLs for downloading podcasts,
webcomics, download and convert webpages, mirror videos, etc. It uses a
plain-text cache file for remembering processed URLs. The match patterns are
defined in the shellscript fetch() function and in the awk script and can be
modified to handle items differently depending on their context.

The arguments for the script are files in the sfeed(5) format. If no file
arguments are specified then the data is read from stdin.

	#!/bin/sh
	# sfeed_download: downloader for URLs and enclosures in sfeed(5) files.
	# Dependencies: awk, curl, flock, xargs (-P), youtube-dl.
	
	cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}"
	jobs="${SFEED_JOBS:-4}"
	lockfile="${HOME}/.sfeed/sfeed_download.lock"
	
	# log(feedname, s, status)
	log() {
		if [ "$1" != "-" ]; then
			s="[$1] $2"
		else
			s="$2"
		fi
		printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3" >&2
	}
	
	# fetch(url, feedname)
	fetch() {
		case "$1" in
		*youtube.com*)
			youtube-dl "$1";;
		*.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm)
			# allow 2 redirects, hide User-Agent, connect timeout is 15 seconds.
			curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";;
		esac
	}
	
	# downloader(url, title, feedname)
	downloader() {
		url="$1"
		title="$2"
		feedname="${3##*/}"
	
		msg="${title}: ${url}"
	
		# download directory.
		if [ "${feedname}" != "-" ]; then
			mkdir -p "${feedname}"
			if ! cd "${feedname}"; then
				log "${feedname}" "${msg}: ${feedname}" "DIR FAIL"
				exit 1
			fi
		fi
	
		log "${feedname}" "${msg}" "START"
		fetch "${url}" "${feedname}"
		if [ $? = 0 ]; then
			log "${feedname}" "${msg}" "OK"
	
			# append it safely in parallel to the cachefile on a
			# successful download.
			(flock 9 || exit 1
			printf '%s\n' "${url}" >> "${cachefile}"
			) 9>"${lockfile}"
		else
			log "${feedname}" "${msg}" "FAIL"
		fi
	}
	
	if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then
		# Downloader helper for parallel downloading.
		# Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-".
		# It should write the URI to the cachefile if it is succesful.
		downloader "$1" "$2" "$3"
		exit $?
	fi
	
	# ...else parent mode:
	
	tmp=$(mktemp)
	trap "rm -f ${tmp}" EXIT
	
	[ -f "${cachefile}" ] || touch "${cachefile}"
	cat "${cachefile}" > "${tmp}"
	echo >> "${tmp}" # force it to have one line for awk.
	
	LC_ALL=C awk -F '\t' '
	# fast prefilter what to download or not.
	function filter(url, field, feedname) {
		u = tolower(url);
		return (match(u, "youtube\\.com") ||
		        match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$"));
	}
	function download(url, field, title, filename) {
		if (!length(url) || urls[url] || !filter(url, field, filename))
			return;
		# NUL-separated for xargs -0.
		printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0);
		urls[url] = 1; # print once
	}
	{
		FILENR += (FNR == 1);
	}
	# lookup table from cachefile which contains downloaded URLs.
	FILENR == 1 {
		urls[$0] = 1;
	}
	# feed file(s).
	FILENR != 1 {
		download($3, 3, $2, FILENAME); # link
		download($8, 8, $2, FILENAME); # enclosure
	}
	' "${tmp}" "${@:--}" | \
	SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")"

- - -

Shellscript to export existing newsboat cached items from sqlite3 to the sfeed
TSV format.

	#!/bin/sh
	# Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format.
	# The data is split per file per feed with the name of the newsboat title/url.
	# It writes the URLs of the read items line by line to a "urls" file.
	#
	# Dependencies: sqlite3, awk.
	#
	# Usage: create some directory to store the feeds then run this script.
	
	# newsboat cache.db file.
	cachefile="$HOME/.newsboat/cache.db"
	test -n "$1" && cachefile="$1"
	
	# dump data.
	# .mode ascii: Columns/rows delimited by 0x1F and 0x1E
	# get the first fields in the order of the sfeed(5) format.
	sqlite3 "$cachefile" <<!EOF |
	.headers off
	.mode ascii
	.output
	SELECT
		i.pubDate, i.title, i.url, i.content, i.content_mime_type,
		i.guid, i.author, i.enclosure_url,
		f.rssurl AS rssurl, f.title AS feedtitle, i.unread
		-- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted, i.base
	FROM rss_feed f
	INNER JOIN rss_item i ON i.feedurl = f.rssurl
	ORDER BY
		i.feedurl ASC, i.pubDate DESC;
	.quit
	!EOF
	# convert to sfeed(5) TSV format.
	LC_ALL=C awk '
	BEGIN {
		FS = "\x1f";
		RS = "\x1e";
	}
	# normal non-content fields.
	function field(s) {
		gsub("^[[:space:]]*", "", s);
		gsub("[[:space:]]*$", "", s);
		gsub("[[:space:]]", " ", s);
		gsub("[[:cntrl:]]", "", s);
		return s;
	}
	# content field.
	function content(s) {
		gsub("^[[:space:]]*", "", s);
		gsub("[[:space:]]*$", "", s);
		# escape chars in content field.
		gsub("\\\\", "\\\\", s);
		gsub("\n", "\\n", s);
		gsub("\t", "\\t", s);
		return s;
	}
	function feedname(feedurl, feedtitle) {
		if (feedtitle == "") {
			gsub("/", "_", feedurl);
			return feedurl;
		}
		gsub("/", "_", feedtitle);
		return feedtitle;
	}
	{
		fname = feedname($9, $10);
		if (!feed[fname]++) {
			print "Writing file: \"" fname "\" (title: " $10 ", url: " $9 ")" > "/dev/stderr";
		}
	
		contenttype = field($5);
		if (contenttype == "")
			contenttype = "html";
		else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
			contenttype = "html";
		else
			contenttype = "plain";
	
		print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
			contenttype "\t" field($6) "\t" field($7) "\t" field($8) "\t" \
			> fname;
	
		# write URLs of the read items to a file line by line.
		if ($11 == "0") {
			print $3 > "urls";
		}
	}'

- - -

Running custom commands inside the program
------------------------------------------

Running commands inside the sfeed_curses program can be useful for example to
sync items or mark all items across all feeds as read. It can be comfortable to
have a keybind for this inside the program to perform a scripted action and
then reload the feeds by sending the signal SIGHUP.

In the input handling code you can then add a case:

	case 'M':
		forkexec((char *[]) { "markallread.sh", NULL }, 0);
		break;

or

	case 'S':
		forkexec((char *[]) { "syncnews.sh", NULL }, 1);
		break;

The specified script should be in $PATH or an absolute path.

Example of a `markallread.sh` shellscript to mark all URLs as read:

	#!/bin/sh
	# mark all items/URLs as read.

	tmp=$(mktemp)
	(cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
	awk '!x[$0]++' > "$tmp" &&
	mv "$tmp" ~/.sfeed/urls &&
	pkill -SIGHUP sfeed_curses # reload feeds.

Example of a `syncnews.sh` shellscript to update the feeds and reload them:

	#!/bin/sh
	sfeed_update && pkill -SIGHUP sfeed_curses


Open an URL directly in the same terminal
-----------------------------------------

To open an URL directly in the same terminal using the text-mode lynx browser:

	SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*


Yank to tmux buffer
-------------------

This changes the yank command to set the tmux buffer, instead of X11 xclip:

	SFEED_YANKER="tmux set-buffer \`cat\`"


Known terminal issues
---------------------

Below lists some bugs or missing features in terminals that are found while
testing sfeed_curses.  Some of them might be fixed already upstream:

- cygwin + mintty: the xterm mouse-encoding of the mouse position is broken for
  scrolling.
- HaikuOS terminal: the xterm mouse-encoding of the mouse button number of the
  middle-button, right-button is incorrect / reversed.
- putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
  window title.


License
-------

ISC, see LICENSE file.


Author
------

Hiltjo Posthuma <hiltjo@codemadness.org>