Age  Commit message  Author
2020-11-01sfeed_xmlenc: be more paranoid in printing encoding namesHiltjo Posthuma
sfeed_xmlenc is used automatically in sfeed_update for detecting the encoding. In particular, slashes are no longer allowed either. This rules out suffixes such as "//IGNORE" and "//TRANSLIT", which are normally allowed. Some iconv implementations might allow other funky names or even pathnames too, so disallow those as well. See also the notes about the "frommap" for the "-f" option: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/iconv.html

Also includes some minor parsing improvements.
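The check described above can be sketched as an allowlist over the characters of the encoding name. This is a minimal illustration, not the code in sfeed_xmlenc: the function name `is_safe_encoding_name` and the exact set of accepted characters (alphanumerics, '-' and '_') are assumptions.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the name only contains characters
 * from a small allowlist, 0 otherwise. A '/' is never accepted, so
 * suffixes such as "//IGNORE" and pathnames are rejected. */
int
is_safe_encoding_name(const char *s)
{
	size_t i;

	if (s[0] == '\0')
		return 0; /* an empty name is not useful, reject it */

	for (i = 0; s[i]; i++) {
		unsigned char c = (unsigned char)s[i];

		if (!isalnum(c) && c != '-' && c != '_')
			return 0;
	}
	return 1;
}
```

With this sketch, "UTF-8" and "ISO-8859-1" pass, while "UTF-8//IGNORE" is rejected before it could ever reach iconv.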
2020-10-31sfeed_web: improve parsing a <link> if it has no type attributeHiltjo Posthuma
This happens because the previous link type is not reset when a <link> tag starts again, but it is reset when a type attribute starts. Found on the Spanish newspaper site elpais.com.

Input:

<link rel="alternate" href="https://feeds.elpais.com/mrss-s/pages/ep/site/elpais.com/portada" type="application/rss+xml" title="RSS de la portada de El País"/>
<link rel="canonical" href="https://elpais.com"/>

Would print (second line is incorrect):

https://feeds.elpais.com/mrss-s/pages/ep/site/elpais.com/portada application/rss+xml
https://elpais.com/ application/rss+xml

Now prints:

https://feeds.elpais.com/mrss-s/pages/ep/site/elpais.com/portada application/rss+xml

Fix: also reset it at the start of a <link> tag in this case (for <base href /> it is still not wanted).
2020-10-24bump version to 0.9.19Hiltjo Posthuma
2020-10-22sfeed_web: whoops, fix bug mentioned in the previous commitHiltjo Posthuma
(ascii.jp)
2020-10-22sfeed_web: attribute parsing improvements, improve man pageHiltjo Posthuma
Fix attribute parsing and now decode entities. The following now works (from helsinkitimes.fi):

<base href="https://www.helsinkitimes.fi/" />
<link href="/?format=feed&amp;type=rss" rel="alternate" type="application/rss+xml" title="RSS 2.0" />
<link href="/?format=feed&amp;type=atom" rel="alternate" type="application/atom+xml" title="Atom 1.0" />

Properly associate attributes with the actual tag, this now parses properly (from ascii.jp):

<link rel="apple-touch-icon-precomposed" href="/img/apple-touch-icon.png" />
<link rel="alternate" type="application/rss+xml" />
2020-10-22Do not change the referenced matched tag data (from gettag()).Hiltjo Posthuma
Fixes a regression introduced in the refactor in commit e43b7a48b08a6bbcb4e730e80395b3257681b33e

Now copy the data by value. This structure is small and no performance regression has been seen.

This was because the tag ID was modified which made subsequent parsed tags of this type behave strangely:

ctx.tag->id = RSSTagGuidPermalinkTrue;

Input data to reproduce:

<rss>
<channel>
<item>
<guid isPermaLink="false">https://def/</guid>
</item>
<item>
<guid>https://abc/</guid>
</item>
</channel>
</rss>
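The difference between modifying the matched entry and modifying a copy can be sketched as follows. Only the identifier RSSTagGuidPermalinkTrue comes from the commit message; the table layout, `struct tagname`, `gettag_sketch` and `permalink_id_for_item` are hypothetical names used for illustration.

```c
#include <stddef.h>
#include <string.h>

enum { TagUnknown = 0, RSSTagGuid, RSSTagGuidPermalinkTrue };

/* Hypothetical lookup table entry: tag name and its ID. */
struct tagname {
	const char *name;
	int id;
};

/* Shared table: entries must stay unchanged between parsed tags. */
static struct tagname tags[] = {
	{ "guid", RSSTagGuid },
};

/* Return a pointer into the shared table, or NULL if not found. */
struct tagname *
gettag_sketch(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(tags) / sizeof(tags[0]); i++)
		if (!strcmp(tags[i].name, name))
			return &tags[i];
	return NULL;
}

/* Copy the matched entry by value before changing its ID, so the
 * change only affects the current tag and not the shared table. */
int
permalink_id_for_item(void)
{
	struct tagname *found = gettag_sketch("guid");
	struct tagname copy;

	if (!found)
		return TagUnknown;
	copy = *found;                     /* copy by value */
	copy.id = RSSTagGuidPermalinkTrue; /* table entry is untouched */
	return copy.id;
}
```

Writing through `found->id` instead would change the entry for every later <guid>, which is the kind of behaviour the input above reproduces.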
2020-10-21README: filter example, filter Google Analytics utm_* parametersHiltjo Posthuma
https://support.google.com/analytics/answer/1033867?hl=nl
2020-10-21sfeed_web: reset feedlink bufferHiltjo Posthuma
Noticed strange output on the site ascii.jp. The site HTML contained:

<link rel="apple-touch-icon-precomposed" href="/img/apple-touch-icon.png" />
<link rel="alternate" type="application/rss+xml" />

This would print: "/img/apple-touch-icon.png application/rss+xml"

Now it prints: " application/rss+xml"
2020-10-18README: improve etag example with escaping of the filenameHiltjo Posthuma
Use the same base filename as the feed file, because sfeed_update replaces '/' in names with '_': filename="$(printf '%s' "$1" | tr '/' '_')" This fixes the example for fetching feeds with names containing '/'. Reported by __20h__, thanks!
2020-10-18README: add example to support ETag cachingHiltjo Posthuma
2020-10-18xml.c: initialize i = 0Hiltjo Posthuma
Forgot it in the cleanup commit 37afcf334fa1ba0b668bde08e8fcaaa9fd7dfa0d
2020-10-16README.xml: reference examples, ANSI compatible, mention original parserHiltjo Posthuma
2020-10-16README: fix unescaped character in regex in awk in filter exampleHiltjo Posthuma
Found by testing using mawk.
2020-10-12add a comment about the intended date priorityHiltjo Posthuma
2020-10-12Revert "RSS: give Dublin Core <dc:date> higher priority over <pubDate>"Hiltjo Posthuma
This reverts commit a1516cb7869a0dd99ebaacf846ad4161f2b9b9a2.
2020-10-12README: filter example: strip Facebook fbclid parameterHiltjo Posthuma
2020-10-12simplify time parsingHiltjo Posthuma
2020-10-12remove unneeded check for NUL terminatorHiltjo Posthuma
2020-10-12RSS: give Dublin Core <dc:date> higher priority over <pubDate>Hiltjo Posthuma
This way dc:date could be the updated time of the item. For Atom there is <published> and <updated> with the same logic.
2020-10-12parse categories, add multiple field values support (for categories)Hiltjo Posthuma
Fields with multiple values are separated by '|'. In the future multiple enclosure support might be added.

The category tags are now parsed. This feature is useful for filtering and categorizing.

Parsing of nested tags such as <author><name> has been improved. This code has been refactored.

RSS <guid> isPermaLink is now also handled differently and will prefer a permalink with "true" (link) over the ID. In practice multiple <guid> tags in an item do not happen.
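A consumer of such a multi-value field could test for one value as sketched below. The function name `field_has_value` is hypothetical, and the sketch assumes the values themselves contain no '|' character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if `value` is one of the '|'-separated
 * values in `field`, 0 otherwise. Matches whole values only. */
int
field_has_value(const char *field, const char *value)
{
	size_t len = strlen(value);
	const char *p = field, *end;

	for (;;) {
		end = strchr(p, '|');
		if (!end)
			end = p + strlen(p); /* last value: ends at the NUL */
		if ((size_t)(end - p) == len && !strncmp(p, value, len))
			return 1;
		if (!*end)
			return 0; /* no more values */
		p = end + 1; /* skip the separator */
	}
}
```

For a category field "linux|c|rss" this matches "c" and "rss", but not the partial value "li".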
2020-10-09xml: remove unused code for sfeedHiltjo Posthuma
2020-10-09fix counting due to uninitialized variable when the time could not be parsedHiltjo Posthuma
Since commit 276d5789fd91d1cbe84b7baee736dea28b1e04c0 if the time is empty or could not be parsed then it is shown/aligned as a blank space instead of being skipped. An oversight in this change was that items should be counted and set in `isnew`. This commit fixes the uninitialized variable and possible miscounting.
2020-10-09xml.h: minor comment rewordingHiltjo Posthuma
2020-10-09sfeed: parse day with max 2 digits (instead of 4)Hiltjo Posthuma
2020-10-09sfeed: support the ISO8601 time format without separatorsHiltjo Posthuma
For example "19720229T132245Z" is now supported.
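Parsing this compact form can be sketched with fixed offsets, since every field has a known width. This is an illustration under stated assumptions, not the sfeed parser: `struct basic_time`, `read_digits` and `parse_iso8601_basic` are hypothetical names, and only the exact shape "YYYYMMDDTHHMMSSZ" is accepted.

```c
#include <string.h>

/* Hypothetical container for the parsed fields. */
struct basic_time {
	int year, mon, day, hour, min, sec;
};

/* Read exactly n decimal digits; return the value or -1 on a non-digit. */
static int
read_digits(const char *s, int n)
{
	int v = 0;

	while (n-- > 0) {
		if (*s < '0' || *s > '9')
			return -1;
		v = v * 10 + (*s++ - '0');
	}
	return v;
}

/* Parse "YYYYMMDDTHHMMSSZ". Return 1 on success, 0 on failure.
 * Ranges are checked loosely: the day is not validated per month. */
int
parse_iso8601_basic(const char *s, struct basic_time *t)
{
	if (strlen(s) != 16 || s[8] != 'T' || s[15] != 'Z')
		return 0;

	t->year = read_digits(s, 4);
	t->mon = read_digits(s + 4, 2);
	t->day = read_digits(s + 6, 2);
	t->hour = read_digits(s + 9, 2);
	t->min = read_digits(s + 11, 2);
	t->sec = read_digits(s + 13, 2);

	if (t->year < 0 || t->mon < 1 || t->mon > 12 ||
	    t->day < 1 || t->day > 31 ||
	    t->hour < 0 || t->hour > 23 ||
	    t->min < 0 || t->min > 59 ||
	    t->sec < 0 || t->sec > 60)
		return 0;
	return 1;
}
```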
2020-10-09README: tested with cproc and sdcc on Z80 emulator, for funHiltjo Posthuma
cproc:
  cproc: https://github.com/michaelforney/cproc
  qbe: https://c9x.me/compile/

z80 (sfeed base program)
  fuzix: http://www.fuzix.org/
  RC2014 emulator: https://github.com/EtchedPixels/RC2014
  sdcc: http://sdcc.sourceforge.net/
2020-10-09man pages: tweak alignment of listsHiltjo Posthuma
2020-10-09xml.c: remove buffering of comment data, which is unused anywayHiltjo Posthuma
2020-10-09xml.h: add underscore for #ifdef guardHiltjo Posthuma
This is the common style.
2020-10-09XML cdata callback: handle CDATA as dataHiltjo Posthuma
This improves handling CDATA for example in Atom feeds with: <author><email><![CDATA[abc]]><name><![CDATA[[person]]></name></author>
2020-07-06bump version to 0.9.18Hiltjo Posthuma
2020-07-05sfeed_atom: minor simplification, gmtime_r is not needed hereHiltjo Posthuma
2020-07-05README: reference sfeed_cursesHiltjo Posthuma
2020-07-05README: improvementsHiltjo Posthuma
- Add an example to optimize bandwidth use with the curl -z option.
- Add a note about CDNs blocking based on the User-Agent (based on a question mailed to me).
- Add a script to convert existing newsboat items to the sfeed(5) TSV format.
2020-07-05format tools: don't skip items with a missing/invalid timestamp fieldHiltjo Posthuma
Handle it appropriately in the context of each format tool. Output the item but keep it blanked. NOTE: maybe in sfeed_twtxt it should use the current time instead?
2020-07-05sfeed_mbox: don't ignore items with a missing/invalid timestampHiltjo Posthuma
The Date header is mandatory. Use the current time if it is missing/invalid.
2020-07-05sfeed_atom: the updated field is mandatory: use the current time...Hiltjo Posthuma
... if it is missing/invalid.
2020-07-05sfeed_atom: fix timezone, output if timestamp is setHiltjo Posthuma
Timezone should be GMT (as intended), do not convert to localtime.
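Formatting a timestamp in GMT rather than local time can be sketched with the standard gmtime() and strftime() functions. The function name `format_utc` and the exact output format string are assumptions for illustration, not necessarily what sfeed_atom prints.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: write `t` as an Atom-style UTC timestamp into
 * buf. Return 1 on success, 0 on failure. gmtime() is used so the
 * result does not depend on the local timezone. */
int
format_utc(time_t t, char *buf, size_t bufsiz)
{
	struct tm *tm = gmtime(&t);

	if (!tm)
		return 0;
	return strftime(buf, bufsiz, "%Y-%m-%dT%H:%M:%SZ", tm) != 0;
}
```

Using localtime() here instead would shift the printed hour by the local UTC offset while still labelling it with a "Z" suffix.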
2020-06-25README: small tweaks and a filter example improvementHiltjo Posthuma
This is a "quick&dirty" regex to block some of the typical 1px width or height tracking pixels.
2020-06-21sfeed_html/sfeed_frames: simplify struct feed allocationHiltjo Posthuma
There's no need for a dynamic struct feed **. The required size is known (argc). Just allocate it in one go.
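Allocating the array in one go can be sketched as below. The fields of `struct feed` and the function name `alloc_feeds` are hypothetical; the point is only that the element count is known up front, so a single calloc() is enough.

```c
#include <stdlib.h>

/* Hypothetical per-feed state; the real struct may differ. */
struct feed {
	const char *name;
	unsigned long total;
	unsigned long totalnew;
};

/* Allocate `n` zero-initialized feeds in a single call. Returns NULL
 * if n is not positive or the allocation fails. The caller frees the
 * array with free(). */
struct feed *
alloc_feeds(int n)
{
	if (n <= 0)
		return NULL;
	return calloc((size_t)n, sizeof(struct feed));
}
```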
2020-06-21Makefile: tiny compatibility improvement for tar -cfHiltjo Posthuma
2020-06-10Makefile: pedantic change: use ar -rc instead of ar rcHiltjo Posthuma
2020-06-04sfeed.{1,5}: clarify the timestamp field a bitHiltjo Posthuma
In particular for RSS feeds where a pubDate is optional.
2020-06-04sfeed_atom: make the output more conformantHiltjo Posthuma
- Set mandatory entry tags: id, updated.
- Change entry published (optional tag) to updated (mandatory).
- Add <feed> tags: author name, id, updated, title.

Thanks lich for the feedback and testing.
2020-06-01fix typoHiltjo Posthuma
2020-05-28sfeed: simplify/optimize checking end tags while inside a RSS/Atom tagHiltjo Posthuma
Instead of doing a binary search, set a pointer to the assigned expected end tag. This makes more sense and is also a minor optimization. No behavioural change intended.
2020-05-27util: encodeuri: simplify conditionHiltjo Posthuma
iscntrl is: c < ' ' || c == 127. I also want to encode a space and everything above 127, so the condition can be simplified accordingly.
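Combining "control character", "space" and "everything above 127" gives a single range test. The sketch below derives that condition from the description and is not taken from util.c: the names `needs_percent_encoding` and `encode_uri_sketch` are hypothetical, and it does not handle any other reserved characters.

```c
#include <stddef.h>

/* c < ' ' || c == 127 (iscntrl), plus space, plus c > 127,
 * collapses to: c <= 0x20 || c >= 0x7f. */
int
needs_percent_encoding(unsigned char c)
{
	return c <= 0x20 || c >= 0x7f;
}

/* Percent-encode src into dst (dstsiz bytes, NUL-terminated).
 * Return the number of bytes written excluding the NUL, or -1 if
 * dst is too small. */
int
encode_uri_sketch(const char *src, char *dst, size_t dstsiz)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t i, o = 0;

	for (i = 0; src[i]; i++) {
		unsigned char c = (unsigned char)src[i];

		if (needs_percent_encoding(c)) {
			if (o + 3 >= dstsiz)
				return -1; /* no room for "%XX" and the NUL */
			dst[o++] = '%';
			dst[o++] = hex[c >> 4];
			dst[o++] = hex[c & 0x0f];
		} else {
			if (o + 1 >= dstsiz)
				return -1; /* no room for the byte and the NUL */
			dst[o++] = (char)c;
		}
	}
	if (o >= dstsiz)
		return -1;
	dst[o] = '\0';
	return (int)o;
}
```

For example, encoding "a b" with this sketch yields "a%20b".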
2020-05-15README: fix indentation for fdm.conf examplesHiltjo Posthuma
No functional difference, but it should improve readability.
2020-05-13sfeed_gopher: if a gopher url cannot be parsed then show it anyway as a "URL:"Hiltjo Posthuma
In practice this should never happen though, because sfeed parses the URI as well.
2020-05-13sfeed_gopher: do not use URL: prefix for gopher:// urls.Hiltjo Posthuma
Support the Gopher protocol directly and use the specified Gopher type. Idea by adc, thanks!