sfeed
-----

RSS and Atom parser (and some format programs).

It converts RSS or Atom feeds from XML to a TAB-separated file. There are
formatting programs included to convert this TAB-separated format to various
other formats. There are also some programs and scripts included to import and
export OPML and to fetch, filter, merge and order feed items.


Build and install
-----------------

	$ make
	# make install


To build sfeed without sfeed_curses set SFEED_CURSES to an empty string:

	$ make SFEED_CURSES=""
	# make SFEED_CURSES="" install


To change the theme for sfeed_curses you can set SFEED_THEME. See the themes/
directory for the theme names.

	$ make SFEED_THEME="templeos"
	# make SFEED_THEME="templeos" install


Usage
-----

Initial setup:

	mkdir -p "$HOME/.sfeed/feeds"
	cp sfeedrc.example "$HOME/.sfeed/sfeedrc"

Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
is included and evaluated as a shellscript for sfeed_update, so its functions
and behaviour can be overridden:

	$EDITOR "$HOME/.sfeed/sfeedrc"

or you can import existing OPML subscriptions using sfeed_opml_import(1):

	sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader called newsboat and import
for sfeed_update:

	newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader called rss2email (3.x+) and
import for sfeed_update:

	r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

Update feeds; this script merges the new items. See sfeed_update(1) for more
information on what it can do:

	sfeed_update

Format feeds:

Plain-text list:

	sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

	cp style.css "$HOME/.sfeed/style.css"
	sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with the menu as frames, copy style.css for a default style:

	mkdir -p "$HOME/.sfeed/frames"
	cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*

To automatically update your feeds periodically and format them in a way you
like, you can make a wrapper script and add it as a cronjob.

Most protocols are supported because curl(1) is used by default, and proxy
settings from the environment (such as the $http_proxy environment variable)
are used as well.

The sfeed(1) program itself is just a parser that parses XML data from stdin
and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
Gopher, SSH, etc.
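
For example, to fetch a feed over HTTPS and format it as a plain-text list in
one pipeline (the feed URL here is just an example):

	curl -s "https://codemadness.org/blog/atom.xml" | sfeed | sfeed_plain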

See the section "Usage and examples" below and the man-pages for more
information on how to use sfeed(1) and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- POSIX make(1) for the Makefile.
- POSIX sh(1),
  used by sfeed_update(1) and sfeed_opml_export(1).
- POSIX utilities such as awk(1) and sort(1),
  used by sfeed_content(1), sfeed_markread(1) and sfeed_update(1).
- curl(1) binary: https://curl.haxx.se/ ,
  used by sfeed_update(1), but can be replaced with any tool like wget(1),
  OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
- iconv(1) command-line utilities,
  used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For a minimal iconv implementation:
  https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- mandoc for documentation: https://mdocml.bsd.lv/
- curses (typically ncurses), otherwise see minicurses.h,
  used by sfeed_curses(1).
- a terminal (emulator) supporting UTF-8 and the used capabilities,
  used by sfeed_curses(1).


Optional run-time dependencies for sfeed_curses
-----------------------------------------------

- xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
- xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
- awk, used by the sfeed_content and sfeed_markread scripts.
  See the ENVIRONMENT VARIABLES section in the man page to change it.
- lynx, used by the sfeed_content script to convert HTML content.
  See the ENVIRONMENT VARIABLES section in the man page to change it.


Formats supported
-----------------

sfeed supports a subset of XML 1.0 and a subset of:

- Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
- Atom 0.3 (draft, historic).
- RSS 0.91+.
- RDF (when used with RSS).
- MediaRSS extensions (media:).
- Dublin Core extensions (dc:).

Other formats like JSONfeed, twtxt or certain RSS/Atom extensions can be
supported by converting them to RSS/Atom or to the sfeed(5) format directly.


OS tested
---------

- Linux,
  compilers: clang, gcc, chibicc, cproc, lacc, pcc, tcc,
  libc: glibc, musl.
- OpenBSD (clang, gcc).
- NetBSD (with NetBSD curses).
- FreeBSD.
- DragonFlyBSD.
- GNU/Hurd.
- Illumos (OpenIndiana).
- Windows (cygwin gcc + mintty, mingw).
- HaikuOS.
- SerenityOS.
- FreeDOS (djgpp).
- FUZIX (sdcc -mz80, with the sfeed parser program).


Architectures tested
--------------------

amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.


Files
-----

sfeed              - Read XML RSS or Atom feed data from stdin. Write feed data
                     in TAB-separated format to stdout.
sfeed_atom         - Format feed data (TSV) to an Atom feed.
sfeed_content      - View item content, for use with sfeed_curses.
sfeed_curses       - Format feed data (TSV) to a curses interface.
sfeed_frames       - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gopher       - Format feed data (TSV) to Gopher files.
sfeed_html         - Format feed data (TSV) to HTML.
sfeed_opml_export  - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import  - Generate a sfeedrc config file from an OPML XML file.
sfeed_markread     - Mark items as read/unread, for use with sfeed_curses.
sfeed_mbox         - Format feed data (TSV) to mbox.
sfeed_plain        - Format feed data (TSV) to a plain-text list.
sfeed_twtxt        - Format feed data (TSV) to a twtxt feed.
sfeed_update       - Update feeds and merge items.
sfeed_web          - Find URLs to RSS/Atom feeds from a webpage.
sfeed_xmlenc       - Detect character-set encoding from an XML stream.
sfeedrc.example    - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
style.css          - Example stylesheet to use with sfeed_html(1) and
                     sfeed_frames(1).


Files read at runtime by sfeed_update(1)
----------------------------------------

sfeedrc            - Config file. This file is evaluated as a shellscript in
                     sfeed_update(1).
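
A minimal sfeedrc defines a feeds() function which calls feed once per
subscription, as "feed <name> <feedurl> [basesiteurl] [encoding]". A small
sketch based on sfeedrc.example (the feed names and URLs below are just
examples):

	# the path to store the feeds files can be overridden, for example:
	#sfeedpath="$HOME/.sfeed/feeds"

	# list of feeds to fetch:
	feeds() {
		# feed <name> <feedurl> [basesiteurl] [encoding]
		feed "codemadness" "https://codemadness.org/blog/atom.xml"
		feed "xkcd" "https://xkcd.com/atom.xml"
	}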

At least the following functions can be overridden per feed:

- fetch: to use wget(1), OpenBSD ftp(1) or another download program.
- filter: to filter on fields.
- merge: to change the merge logic.
- order: to change the sort order.

See also the sfeedrc(5) man page documentation for more details.

The feeds() function is called to process the feeds. The default feed()
function is executed concurrently as a background job to make updating faster.
The variable maxjobs can be changed to limit or increase the amount of
concurrent jobs (8 by default).


Files written at runtime by sfeed_update(1)
-------------------------------------------

feedname           - TAB-separated format containing all items per feed. The
                     sfeed_update(1) script merges new items with this file.
                     The format is documented in sfeed(5).


File format
-----------

	man 5 sfeed
	man 5 sfeedrc
	man 1 sfeed


Usage and examples
------------------

Find RSS/Atom feed URLs from a webpage:

	url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output example:

	https://codemadness.org/blog/rss.xml	application/rss+xml
	https://codemadness.org/blog/atom.xml	application/atom+xml

- - -

Make sure your sfeedrc config file exists, see sfeedrc.example. To update your
feeds (the configfile argument is optional):

	sfeed_update "configfile"

Format the feeds files:

	# Plain-text list.
	sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
	# HTML view (no frames), copy style.css for a default style.
	sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
	# HTML view with the menu as frames, copy style.css for a default style.
	mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

View formatted output in your browser:

	$BROWSER "$HOME/.sfeed/feeds.html"

View formatted output in your editor:

	$EDITOR "$HOME/.sfeed/feeds.txt"

- - -

View formatted output in a curses interface. The interface has a look inspired
by the mutt mail client. It has a sidebar panel for the feeds, a panel with a
listing of the items and a small statusbar for the selected item/URL. Some
functions like searching and scrolling are integrated in the interface itself.

Just like the other format programs included in sfeed you can run it like this:

	sfeed_curses ~/.sfeed/feeds/*

... or by reading from stdin:

	sfeed_curses < ~/.sfeed/feeds/xkcd

By default sfeed_curses marks the items of the last day as new/bold. To manage
read/unread items in a different way, a plain-text file with a list of the read
URLs can be used. To enable this behaviour the path to this file can be
specified by setting the environment variable $SFEED_URL_FILE to the URL file:

	export SFEED_URL_FILE="$HOME/.sfeed/urls"
	[ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
	sfeed_curses ~/.sfeed/feeds/*

It then uses the shellscript "sfeed_markread" to process the read and unread
items.

- - -
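
As mentioned in the Usage section above, updating and reformatting the feeds
can be automated by putting a small wrapper script in a cronjob. A minimal
sketch (the script name, output paths and schedule are just examples):

	#!/bin/sh
	# update the feeds and regenerate the formatted output.
	sfeed_update
	sfeed_plain "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.txt"
	sfeed_html "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.html"

An example crontab(5) entry to run it hourly:

	0 * * * *	/path/to/the/wrapper/script

- - -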

Example script to view feed items in a vertical list/menu in dmenu(1). It opens
the selected URL in the browser set in $BROWSER:

	#!/bin/sh
	url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
		sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
	test -n "${url}" && $BROWSER "${url}"

dmenu can be found at: https://git.suckless.org/dmenu/

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

	sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc

- - -

Export an OPML file of your feeds from a sfeedrc config file (the configfile
argument is optional):

	sfeed_opml_export configfile > myfeeds.opml

- - -

The filter function can be overridden in your sfeedrc file. This allows
filtering items per feed. It can be used to shorten URLs, filter away
advertisements, strip tracking parameters and more.

	# filter fields.
	# filter(name)
	filter() {
		case "$1" in
		"tweakers")
			awk -F '\t' 'BEGIN { OFS = "\t"; }
			# skip ads.
			$2 ~ /^ADV:/ {
				next;
			}
			# shorten link.
			{
				if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
					$3 = substr($3, RSTART, RLENGTH);
				}
				print $0;
			}';;
		"yt BSDNow")
			# filter only BSD Now from channel.
			awk -F '\t' '$2 ~ / \| BSD Now/';;
		*)
			cat;;
		esac | \
		# replace youtube links with embed links.
		sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \

		awk -F '\t' 'BEGIN { OFS = "\t"; }
		function filterlink(s) {
			# protocol must start with http, https or gopher.
			if (match(s, /^(http|https|gopher):\/\//) == 0) {
				return "";
			}

			# shorten feedburner links.
			if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
				s = substr(s, RSTART, RLENGTH);
			}

			# strip tracking parameters
			# urchin, facebook, piwik, webtrekk and generic.
			gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
			gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);

			gsub(/\?&/, "?", s);
			gsub(/[\?&]+$/, "", s);

			return s
		}
		{
			$3 = filterlink($3); # link
			$8 = filterlink($8); # enclosure

			# try to remove tracking pixels: <img/> tags with 1px width or height.
			gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"'"'"' ]?1[\"'"'"' ]?[^0-9>]+[^>]*>", "", $4);

			print $0;
		}'
	}

- - -

Aggregate feeds. This filters new entries (maximum one day old) and sorts them
by newest first. Prefix the feed name in the title. Convert the TSV output data
to an Atom XML feed (again):

	#!/bin/sh
	cd ~/.sfeed/feeds/ || exit 1

	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	BEGIN { OFS = "\t"; }
	int($1) >= old {
		$2 = "[" FILENAME "] " $2;
		print $0;
	}' * | \
	sort -k1,1rn | \
	sfeed_atom

- - -

To have a "tail(1) -f"-like FIFO stream filtering for new unique feed items and
showing them as plain-text per line similar to sfeed_plain(1):

Create a FIFO:

	fifo="/tmp/sfeed_fifo"
	mkfifo "$fifo"

On the reading side:

	# This keeps track of unique lines so it might consume a lot of memory.
	# It tries to reopen the $fifo after 1 second if it fails.
	while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'

On the writing side:

	feedsdir="$HOME/.sfeed/feeds/"
	cd "$feedsdir" || exit 1
	test -p "$fifo" || exit 1

	# 1 day is old news, don't write older items.
	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	BEGIN { OFS = "\t"; }
	int($1) >= old {
		$2 = "[" FILENAME "] " $2;
		print $0;
	}' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"

cut -b is used to trim the "N " prefix of sfeed_plain(1).

- - -

For some podcast feeds the following code can be used to filter the latest
enclosure URL (probably some audio file):

	awk -F '\t' 'BEGIN { latest = 0; }
	length($8) {
		ts = int($1);
		if (ts > latest) {
			url = $8;
			latest = ts;
		}
	}
	END { if (length(url)) { print url; } }'

... or on a file already sorted from newest to oldest:

	awk -F '\t' '$8 { print $8; exit }'

- - -

Over time your feeds file might become quite big. You can archive items of a
feed from (roughly) the last week by doing for example:

	awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
	mv feed feed.bak
	mv feed.new feed

This could also be run weekly in a crontab to archive the feeds, like throwing
away old newspapers. It keeps the feeds list tidy and the formatted output
small.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
the fdm program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"

	$cachepath = "%[home]/.sfeed/fdm.cache"
	cache "${cachepath}"
	$maildir = "%[home]/feeds/"

	# Check if message is in the cache by Message-ID.
	match case "^Message-ID: (.*)" in headers
	action {
		tag "msgid" value "%1"
	}
	continue

	# If it is in the cache, stop.
	match matched and in-cache "${cachepath}" key "%[msgid]"
	action {
		keep
	}

	# Not in the cache, process it and add to cache.
	match case "^X-Feedname: (.*)" in headers
	action {
		# Store to local maildir.
		maildir "${maildir}%1"

		add-to-cache "${cachepath}" key "%[msgid]"
		keep
	}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Read from the mbox, filter duplicate messages using the fdm program and deliver
them to an SMTP server. This works similarly to the rss2email program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"

	$cachepath = "%[home]/.sfeed/fdm.cache"
	cache "${cachepath}"

	# Check if message is in the cache by Message-ID.
	match case "^Message-ID: (.*)" in headers
	action {
		tag "msgid" value "%1"
	}
	continue

	# If it is in the cache, stop.
	match matched and in-cache "${cachepath}" key "%[msgid]"
	action {
		keep
	}

	# Not in the cache, process it and add to cache.
	match case "^X-Feedname: (.*)" in headers
	action {
		# Connect to a SMTP server and attempt to deliver the
		# mail to it.
		# Of course change the server and e-mail below.
		smtp server "codemadness.org" to "hiltjo@codemadness.org"

		add-to-cache "${cachepath}" key "%[msgid]"
		keep
	}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
procmail(1).

procmail_maildirs.sh file:

	maildir="$HOME/feeds"
	feedsdir="$HOME/.sfeed/feeds"
	procmailconfig="$HOME/.sfeed/procmailrc"

	# message-id cache to prevent duplicates.
	mkdir -p "${maildir}/.cache"

	if ! test -r "${procmailconfig}"; then
		echo "Procmail configuration file \"${procmailconfig}\" does not exist or is not readable." >&2
		echo "See procmailrc.example for an example." >&2
		exit 1
	fi

	find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
		name=$(basename "${d}")
		mkdir -p "${maildir}/${name}/cur"
		mkdir -p "${maildir}/${name}/new"
		mkdir -p "${maildir}/${name}/tmp"
		printf 'Mailbox %s\n' "${name}"
		sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
	done

Procmailrc(5) file:

	# Example for use with sfeed_mbox(1).
	# The header X-Feedname is used to split into separate maildirs. It is
	# assumed this name is sane.

	MAILDIR="$HOME/feeds/"

	:0
	* ^X-Feedname: \/.*
	{
		FEED="$MATCH"

		:0 Wh: "msgid_$FEED.lock"
		| formail -D 1024000 ".cache/msgid_$FEED.cache"

		:0
		"$FEED"/
	}

Now run:

	$ procmail_maildirs.sh

Now you can view feeds in mutt(1) for example.

- - -

The fetch function can be overridden in your sfeedrc file. This allows
replacing the default curl(1) used by sfeed_update with any other client to
fetch the RSS/Atom data, or changing the default curl options:

	# fetch a feed via HTTP/HTTPS etc.
	# fetch(name, url, feedfile)
	fetch() {
		hurl -m 1048576 -t 15 "$2" 2>/dev/null
	}

- - -

Caching, incremental data updates and bandwidth-saving

For servers that support it, incremental updates and bandwidth-saving can be
done by using the "ETag" HTTP header.

Create a directory for storing the ETags per feed:

	mkdir -p ~/.sfeed/etags/

The curl ETag options (--etag-save and --etag-compare) can be used to store and
send the previous ETag header value. curl version 7.73+ is recommended for it
to work properly.

The curl -z option can be used to send the modification date of a local file as
an HTTP "If-Modified-Since" request header. The server can then respond whether
the data is modified or not, or respond with only the incremental data.

The curl --compressed option can be used to indicate the client supports
decompression. Because RSS/Atom feeds are textual XML content this generally
compresses very well.

These options can be set by overriding the fetch() function in the sfeedrc
file:

	# fetch(name, url, feedfile)
	fetch() {
		etag="$HOME/.sfeed/etags/$(basename "$3")"
		curl \
			-L --max-redirs 0 -H "User-Agent:" -f -s -m 15 \
			--compressed \
			--etag-save "${etag}" --etag-compare "${etag}" \
			-z "${etag}" \
			"$2" 2>/dev/null
	}

These options can come at a cost of some privacy, because they expose
additional metadata from the previous request.

- - -

CDNs blocking requests due to a missing HTTP User-Agent request header

sfeed_update will not send the "User-Agent" header by default for privacy
reasons. Some CDNs like Cloudflare don't like this and will block such HTTP
requests.

A custom User-Agent can be set by using the curl -H option, like so:

	curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'

The above example string pretends to be a Windows 10 (x86-64) machine running
Firefox 78.

- - -

Page redirects

For security and efficiency reasons redirects are not allowed by default and
are treated as an error. This prevents, for example, hijacking of an
unencrypted http:// to https:// redirect, and avoids the extra round-trip of an
unnecessary page redirect on each fetch. It is encouraged to use the final
redirected URL in the sfeedrc config file.

If you want to ignore this advice you can override the fetch() function in the
sfeedrc file and change the curl options "-L --max-redirs 0".
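
For example, a fetch() override that follows a few redirects and also sends a
custom User-Agent could look like this (a sketch based on the default curl
options shown above; adjust the redirect limit and User-Agent string to taste):

	# fetch(name, url, feedfile)
	fetch() {
		curl -L --max-redirs 3 \
			-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0' \
			-f -s -m 15 "$2" 2>/dev/null
	}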

- - -

Shellscript to update feeds in parallel more efficiently using xargs -P.

It creates a queue of the feeds with their settings, then uses xargs to process
them in parallel using the common, but non-POSIX, -P option. This is more
efficient than the more portable solution in sfeed_update, which can stall a
batch of $maxjobs in the queue if one item is slow.

sfeed_update_xargs shellscript:

	#!/bin/sh
	# update feeds, merge with old feeds using xargs in parallel mode (non-POSIX).

	# include script and reuse its functions, but do not start main().
	SFEED_UPDATE_INCLUDE="1" . sfeed_update
	# load config file, sets $config.
	loadconfig "$1"

	# process a single feed.
	# args are: config, tmpdir, name, feedurl, basesiteurl, encoding
	if [ "${SFEED_UPDATE_CHILD}" = "1" ]; then
		sfeedtmpdir="$2"
		_feed "$3" "$4" "$5" "$6"
		exit $?
	fi

	# ...else parent mode:

	# feed(name, feedurl, basesiteurl, encoding)
	feed() {
		# workaround: *BSD xargs doesn't handle empty fields in the middle.
		name="${1:-$$}"
		feedurl="${2:-http://}"
		basesiteurl="${3:-${feedurl}}"
		encoding="$4"

		printf '%s\0%s\0%s\0%s\0%s\0%s\0' "${config}" "${sfeedtmpdir}" \
			"${name}" "${feedurl}" "${basesiteurl}" "${encoding}"
	}

	# fetch feeds and store in temporary directory.
	sfeedtmpdir="$(mktemp -d '/tmp/sfeed_XXXXXX')"
	# make sure path exists.
	mkdir -p "${sfeedpath}"
	# print feeds for parallel processing with xargs.
	feeds | SFEED_UPDATE_CHILD="1" xargs -r -0 -P "${maxjobs}" -L 6 "$(readlink -f "$0")"
	# cleanup temporary files etc.
	cleanup

- - -

Shellscript to handle URLs and enclosures in parallel using xargs -P.

This can be used to download and process URLs for downloading podcasts,
webcomics, download and convert webpages, mirror videos, etc. It uses a
plain-text cache file for remembering processed URLs. The match patterns are
defined in the shellscript fetch() function and in the awk script and can be
modified to handle items differently depending on their context.

The arguments for the script are files in the sfeed(5) format. If no file
arguments are specified then the data is read from stdin.

	#!/bin/sh
	# sfeed_download: downloader for URLs and enclosures in sfeed(5) files.
	# Dependencies: awk, curl, flock, xargs (-P), youtube-dl.

	cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}"
	jobs="${SFEED_JOBS:-4}"
	lockfile="${HOME}/.sfeed/sfeed_download.lock"

	# log(feedname, s, status)
	log() {
		if [ "$1" != "-" ]; then
			s="[$1] $2"
		else
			s="$2"
		fi
		printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3" >&2
	}

	# fetch(url, feedname)
	fetch() {
		case "$1" in
		*youtube.com*)
			youtube-dl "$1";;
		*.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm)
			# allow 2 redirects, hide User-Agent, connect timeout is 15 seconds.
			curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";;
		esac
	}

	# downloader(url, title, feedname)
	downloader() {
		url="$1"
		title="$2"
		feedname="${3##*/}"

		msg="${title}: ${url}"

		# download directory.
		if [ "${feedname}" != "-" ]; then
			mkdir -p "${feedname}"
			if ! cd "${feedname}"; then
				log "${feedname}" "${msg}: ${feedname}" "DIR FAIL"
				exit 1
			fi
		fi

		log "${feedname}" "${msg}" "START"
		fetch "${url}" "${feedname}"
		if [ $? = 0 ]; then
			log "${feedname}" "${msg}" "OK"

			# append it safely in parallel to the cachefile on a
			# successful download.
			(flock 9 || exit 1
			printf '%s\n' "${url}" >> "${cachefile}"
			) 9>"${lockfile}"
		else
			log "${feedname}" "${msg}" "FAIL"
		fi
	}

	if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then
		# Downloader helper for parallel downloading.
		# Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-".
		# It should write the URI to the cachefile if it is successful.
		downloader "$1" "$2" "$3"
		exit $?
	fi

	# ...else parent mode:

	tmp=$(mktemp)
	trap "rm -f ${tmp}" EXIT

	[ -f "${cachefile}" ] || touch "${cachefile}"
	cat "${cachefile}" > "${tmp}"
	echo >> "${tmp}" # force it to have one line for awk.

	LC_ALL=C awk -F '\t' '
	# fast prefilter what to download or not.
	function filter(url, field, feedname) {
		u = tolower(url);
		return (match(u, "youtube\\.com") ||
			match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$"));
	}
	function download(url, field, title, filename) {
		if (!length(url) || urls[url] || !filter(url, field, filename))
			return;
		# NUL-separated for xargs -0.
		printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0);
		urls[url] = 1; # print once
	}
	{
		FILENR += (FNR == 1);
	}
	# lookup table from cachefile which contains downloaded URLs.
	FILENR == 1 {
		urls[$0] = 1;
	}
	# feed file(s).
	FILENR != 1 {
		download($3, 3, $2, FILENAME); # link
		download($8, 8, $2, FILENAME); # enclosure
	}
	' "${tmp}" "${@:--}" | \
	SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")"
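
The script could be saved as for example "sfeed_download", made executable and
run on one or more feed files (a hypothetical invocation; the filename and the
amount of jobs are just examples):

	chmod +x sfeed_download
	SFEED_JOBS=8 ./sfeed_download ~/.sfeed/feeds/*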

- - -

Shellscript to export existing newsboat cached items from sqlite3 to the sfeed
TSV format.

	#!/bin/sh
	# Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format.
	# The data is split per file per feed with the name of the newsboat title/url.
	# It writes the URLs of the read items line by line to a "urls" file.
	#
	# Dependencies: sqlite3, awk.
	#
	# Usage: create some directory to store the feeds then run this script.

	# newsboat cache.db file.
	cachefile="$HOME/.newsboat/cache.db"
	test -n "$1" && cachefile="$1"

	# dump data.
	# .mode ascii: Columns/rows delimited by 0x1F and 0x1E.
	# get the first fields in the order of the sfeed(5) format.
	sqlite3 "$cachefile" <<!EOF |
	.headers off
	.mode ascii
	.output
	SELECT
		i.pubDate, i.title, i.url, i.content, i.content_mime_type,
		i.guid, i.author, i.enclosure_url,
		f.rssurl AS rssurl, f.title AS feedtitle, i.unread
		-- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted, i.base
	FROM rss_feed f
	INNER JOIN rss_item i ON i.feedurl = f.rssurl
	ORDER BY
		i.feedurl ASC, i.pubDate DESC;
	.quit
	!EOF
	# convert to sfeed(5) TSV format.
	LC_ALL=C awk '
	BEGIN {
		FS = "\x1f";
		RS = "\x1e";
	}
	# normal non-content fields.
	function field(s) {
		gsub("^[[:space:]]*", "", s);
		gsub("[[:space:]]*$", "", s);
		gsub("[[:space:]]", " ", s);
		gsub("[[:cntrl:]]", "", s);
		return s;
	}
	# content field.
	function content(s) {
		gsub("^[[:space:]]*", "", s);
		gsub("[[:space:]]*$", "", s);
		# escape chars in content field.
		gsub("\\\\", "\\\\", s);
		gsub("\n", "\\n", s);
		gsub("\t", "\\t", s);
		return s;
	}
	function feedname(feedurl, feedtitle) {
		if (feedtitle == "") {
			gsub("/", "_", feedurl);
			return feedurl;
		}
		gsub("/", "_", feedtitle);
		return feedtitle;
	}
	{
		fname = feedname($9, $10);
		if (!feed[fname]++) {
			print "Writing file: \"" fname "\" (title: " $10 ", url: " $9 ")" > "/dev/stderr";
		}

		contenttype = field($5);
		if (contenttype == "")
			contenttype = "html";
		else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
			contenttype = "html";
		else
			contenttype = "plain";

		print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
			contenttype "\t" field($6) "\t" field($7) "\t" field($8) "\t" \
			> fname;

		# write URLs of the read items to a file line by line.
		if ($11 == "0") {
			print $3 > "urls";
		}
	}'

- - -

Running custom commands inside the program
------------------------------------------

Running commands inside the sfeed_curses program can be useful for example to
sync items or mark all items across all feeds as read. It can be comfortable to
have a keybind for this inside the program to perform a scripted action and
then reload the feeds by sending the signal SIGHUP.

In the input handling code you can then add a case:

	case 'M':
		forkexec((char *[]) { "markallread.sh", NULL }, 0);
		break;

or

	case 'S':
		forkexec((char *[]) { "syncnews.sh", NULL }, 1);
		break;

The specified script should be in $PATH or an absolute path.

Example of a `markallread.sh` shellscript to mark all URLs as read:

	#!/bin/sh
	# mark all items/URLs as read.

	tmp=$(mktemp)
	(cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
	awk '!x[$0]++' > "$tmp" &&
	mv "$tmp" ~/.sfeed/urls &&
	pkill -SIGHUP sfeed_curses # reload feeds.

Example of a `syncnews.sh` shellscript to update the feeds and reload them:

	#!/bin/sh
	sfeed_update && pkill -SIGHUP sfeed_curses


Open a URL directly in the same terminal
----------------------------------------

To open a URL directly in the same terminal using the text-mode lynx browser:

	SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*


Yank to tmux buffer
-------------------

This changes the yank command to set the tmux buffer, instead of X11 xclip:

	SFEED_YANKER="tmux set-buffer \`cat\`"


Known terminal issues
---------------------

Below is a list of some bugs or missing features in terminals found while
testing sfeed_curses. Some of them might already be fixed upstream:

- cygwin + mintty: the xterm mouse-encoding of the mouse position is broken for
  scrolling.
- HaikuOS terminal: the xterm mouse-encoding of the mouse button number of the
  middle-button, right-button is incorrect / reversed.
- putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
  window title.


License
-------

ISC, see LICENSE file.


Author
------

Hiltjo Posthuma <hiltjo@codemadness.org>