Age | Commit message | Author |
|
|
|
|
|
Feeds should contain absolute URLs, but when they do not, this makes it more
convenient to configure such feeds.
|
|
|
|
sfeed_xmlenc is used automatically in sfeed_update for detecting the encoding.
In particular, slashes are no longer allowed either, for example "//IGNORE" and
"//TRANSLIT", which are normally allowed.
Some iconv implementations might allow other funky names or even pathnames too,
so disallow that.
See also the notes about the "frommap" for the "-f" option.
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/iconv.html
+ some minor parsing handling improvements.
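A minimal sketch of such a check, assuming a whitelist of characters. The function name encoding_name_ok and the exact set of accepted characters are assumptions for illustration, not the actual sfeed_xmlenc code:

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if `name` looks like a plain encoding name, otherwise 0.
 * Assumption: only ASCII letters, digits and a few punctuation characters
 * are accepted. The real check may use a different set. */
int
encoding_name_ok(const char *name)
{
	size_t i;

	if (name == NULL || name[0] == '\0')
		return 0;
	for (i = 0; name[i] != '\0'; i++) {
		unsigned char c = (unsigned char)name[i];

		if (isalnum(c) || c == '-' || c == '_' || c == '.' ||
		    c == ':' || c == '+')
			continue;
		return 0; /* rejects '/', whitespace, control characters, ... */
	}
	return 1;
}
```

With this sketch "UTF-8" is accepted, while "UTF-8//IGNORE" and any pathname containing '/' are rejected.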
|
|
This happens because the previous link type is not reset when a <link> tag
starts again, but it is reset when a type attribute starts.
Found on the Spanish newspaper site: elpais.com
Input:
<link rel="alternate" href="https://feeds.elpais.com/mrss-s/pages/ep/site/elpais.com/portada" type="application/rss+xml" title="RSS de la portada de El País"/>
<link rel="canonical" href="https://elpais.com"/>
Would print (second line is incorrect):
https://feeds.elpais.com/mrss-s/pages/ep/site/elpais.com/portada application/rss+xml
https://elpais.com/ application/rss+xml
Now prints:
https://feeds.elpais.com/mrss-s/pages/ep/site/elpais.com/portada application/rss+xml
Fix: reset it also at the start of a <link> tag in this case (for <base href />
it is still not wanted).
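The reset can be sketched as follows. The struct and the three callbacks are hypothetical names used for illustration and do not mirror the real sfeed_web code. The <base href /> case mentioned above is left out of this sketch:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-tag state, not the actual sfeed_web structures. */
struct linkstate {
	char href[256];
	char type[64];
};

/* Called when a <link> tag starts: clear state left over from a previous tag. */
void
link_start(struct linkstate *s)
{
	s->href[0] = '\0';
	s->type[0] = '\0';
}

/* Called for each attribute of the current <link> tag. */
void
link_attr(struct linkstate *s, const char *name, const char *value)
{
	if (strcmp(name, "href") == 0)
		snprintf(s->href, sizeof(s->href), "%s", value);
	else if (strcmp(name, "type") == 0)
		snprintf(s->type, sizeof(s->type), "%s", value);
}

/* Called when the tag ends: returns 1 if the tag has a feed type. */
int
link_is_feed(const struct linkstate *s)
{
	return strcmp(s->type, "application/rss+xml") == 0 ||
	       strcmp(s->type, "application/atom+xml") == 0;
}
```

Without the reset in link_start() the canonical link from the example input would inherit the "application/rss+xml" type of the previous tag.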
|
|
|
|
(ascii.jp)
|
|
Fix attribute parsing and now decode entities. The following now works (from
helsinkitimes.fi):
<base href="https://www.helsinkitimes.fi/" />
<link href="/?format=feed&type=rss" rel="alternate" type="application/rss+xml" title="RSS 2.0" />
<link href="/?format=feed&type=atom" rel="alternate" type="application/atom+xml" title="Atom 1.0" />
Properly associate attributes with the actual tag; this now parses properly
(from ascii.jp):
<link rel="apple-touch-icon-precomposed" href="/img/apple-touch-icon.png" />
<link rel="alternate" type="application/rss+xml" />
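A rough sketch of how a relative href could be combined with the <base href> value. The function make_absolute is a hypothetical helper, not the sfeed implementation. It does not implement full URI reference resolution: protocol-relative links ("//host/path"), "../" segments and other cases are not handled:

```c
#include <stdio.h>
#include <string.h>

/* Combine `link` with `base` into `buf`.
 * Returns 0 on success, -1 on truncation or when `base` has no scheme.
 * Simplified on purpose, see the limitations in the text above. */
int
make_absolute(char *buf, size_t bufsz, const char *link, const char *base)
{
	const char *p, *end;
	size_t n;
	int r;

	if (strstr(link, "://") != NULL) {
		/* already absolute: copy as-is */
		r = snprintf(buf, bufsz, "%s", link);
	} else if (link[0] == '/') {
		/* absolute path: keep only scheme and host of the base */
		if ((p = strstr(base, "://")) == NULL)
			return -1;
		if ((end = strchr(p + 3, '/')) == NULL)
			end = base + strlen(base);
		r = snprintf(buf, bufsz, "%.*s%s", (int)(end - base), base, link);
	} else {
		/* relative path: append to the base, adding '/' if needed */
		n = strlen(base);
		r = snprintf(buf, bufsz, "%s%s%s", base,
		             (n > 0 && base[n - 1] == '/') ? "" : "/", link);
	}
	return (r < 0 || (size_t)r >= bufsz) ? -1 : 0;
}
```

For the helsinkitimes.fi example the base "https://www.helsinkitimes.fi/" and the href "/?format=feed&type=rss" are joined without doubling the slash.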
|
|
Fixes a regression introduced in the refactor in commit
e43b7a48b08a6bbcb4e730e80395b3257681b33e
Now copy the data by value. This structure is small and no performance
regression has been seen.
This was because the tag ID was modified which made subsequent parsed tags of
this type behave strangely:
ctx.tag->id = RSSTagGuidPermalinkTrue;
Input data to reproduce:
<rss>
<channel>
<item>
<guid isPermaLink="false">https://def/</guid>
</item>
<item>
<guid>https://abc/</guid>
</item>
</channel>
</rss>
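The difference between keeping a pointer into a shared tag table and copying the entry can be sketched like this. The enum, struct and function names are hypothetical and simplified, only the idea of copying by value is taken from the message above:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tag IDs for illustration. */
enum tagid { TagUnknown = 0, TagGuid, TagGuidPermalinkFalse, TagGuidPermalinkTrue };

struct tag {
	const char *name;
	enum tagid id;
};

/* Shared lookup table: entries must stay unchanged while parsing. */
static const struct tag tags[] = {
	{ "guid", TagGuid },
};

struct ctx {
	struct tag tag; /* a copy, not a pointer into tags[] */
};

/* Look up a tag by name and copy the entry into the context.
 * Returns 1 if found, otherwise 0. */
int
ctx_settag(struct ctx *c, const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(tags) / sizeof(tags[0]); i++) {
		if (strcmp(tags[i].name, name) == 0) {
			c->tag = tags[i]; /* struct assignment copies by value */
			return 1;
		}
	}
	c->tag.name = NULL;
	c->tag.id = TagUnknown;
	return 0;
}
```

Changing c.tag.id for one <guid> now only affects the copy, so the next <guid> in the reproduction input starts from the unmodified table entry again.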
|
|
https://support.google.com/analytics/answer/1033867?hl=nl
|
|
Noticed strange output on the site ascii.jp:
The site HTML contained:
<link rel="apple-touch-icon-precomposed" href="/img/apple-touch-icon.png" />
<link rel="alternate" type="application/rss+xml" />
This would print:
"/img/apple-touch-icon.png application/rss+xml"
Now it prints:
" application/rss+xml"
|
|
Use the same base filename as the feed file, because sfeed_update replaces '/'
in names with '_':
filename="$(printf '%s' "$1" | tr '/' '_')"
This fixes the example for fetching feeds with names containing '/'.
Reported by __20h__, thanks!
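For tools written in C the same mapping as the tr(1) call above could look like the sketch below. The function name feedname_to_filename is an assumption for illustration:

```c
#include <string.h>

/* Replace every '/' in `name` with '_', in place.
 * Same effect as: tr '/' '_' */
void
feedname_to_filename(char *name)
{
	char *p;

	for (p = name; (p = strchr(p, '/')) != NULL; p++)
		*p = '_';
}
```

A feed named "news/tech" would then map to the file "news_tech".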
|
|
|
|
Forgot it in the cleanup commit 37afcf334fa1ba0b668bde08e8fcaaa9fd7dfa0d
|
|
|
|
Found by testing using mawk.
|
|
|
|
This reverts commit a1516cb7869a0dd99ebaacf846ad4161f2b9b9a2.
|
|
|
|
|
|
|
|
This way dc:date could be the updated time of the item. For Atom there is
<published> and <updated> with the same logic.
|
|
Fields with multiple values are separated by '|'. In the future multiple
enclosure support might be added.
The categories tags are now parsed. This feature is useful for filtering and
categorizing.
Parsing of nested tags such as <author><name> has been improved. This code has
been refactored.
RSS <guid> isPermaLink is now also handled differently and will now prefer a
permalink with "true" (link) over the ID. In practice, multiple <guid> tags in
one item do not occur.
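How values could be joined into one '|'-separated field, as a minimal sketch. The helper field_append is hypothetical and does not handle values that themselves contain '|':

```c
#include <stdio.h>
#include <string.h>

/* Append `value` to the NUL-terminated string in `field`, separated by '|'.
 * Returns 0 on success, -1 if the result does not fit (field is left as-is). */
int
field_append(char *field, size_t fieldsz, const char *value)
{
	size_t len = strlen(field);
	int r;

	r = snprintf(field + len, fieldsz - len, "%s%s", len ? "|" : "", value);
	if (r < 0 || (size_t)r >= fieldsz - len) {
		field[len] = '\0'; /* undo the partial write */
		return -1;
	}
	return 0;
}
```

Appending "news" and then "tech" to an empty field gives "news|tech", which a filter can split on '|' again.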
|
|
|
|
Since commit 276d5789fd91d1cbe84b7baee736dea28b1e04c0 if the time is empty or
could not be parsed then it is shown/aligned as a blank space instead of being
skipped.
An oversight in this change was that items should be counted and set in
`isnew`.
This commit fixes the uninitialized variable and possible miscounting.
|
|
|
|
|
|
For example "19720229T132245Z" is now supported.
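A small sketch of parsing this compact form, assuming the fixed layout "YYYYMMDDThhmmssZ". The struct and function names are made up for illustration and the real sfeed time parser is more general:

```c
#include <stdio.h>

/* Hypothetical broken-down time, fields as written in the input. */
struct datetime {
	int year, mon, day, hour, min, sec;
};

/* Parse "YYYYMMDDThhmmssZ" into `dt`.
 * Returns 1 on success, 0 on a format or range error. */
int
parse_basic_datetime(const char *s, struct datetime *dt)
{
	char z;
	int n = 0;

	if (sscanf(s, "%4d%2d%2dT%2d%2d%2d%c%n", &dt->year, &dt->mon, &dt->day,
	    &dt->hour, &dt->min, &dt->sec, &z, &n) != 7)
		return 0;
	if (z != 'Z' || s[n] != '\0')
		return 0; /* only UTC ("Z") and no trailing data in this sketch */
	if (dt->mon < 1 || dt->mon > 12 || dt->day < 1 || dt->day > 31 ||
	    dt->hour < 0 || dt->hour > 23 || dt->min < 0 || dt->min > 59 ||
	    dt->sec < 0 || dt->sec > 60)
		return 0;
	return 1;
}
```

The example "19720229T132245Z" parses to 1972-02-29 13:22:45 in UTC. The sketch only checks coarse ranges, not the number of days per month.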
|
|
cproc:
cproc: https://github.com/michaelforney/cproc
qbe: https://c9x.me/compile/
z80 (sfeed base program)
fuzix: http://www.fuzix.org/
RC2014 emulator: https://github.com/EtchedPixels/RC2014
sdcc: http://sdcc.sourceforge.net/
|
|
|
|
|
|
This is the common style.
|
|
This improves handling CDATA for example in Atom feeds with:
<author><email><![CDATA[abc]]><name><![CDATA[[person]]></name></author>
|
|
|
|
|
|
|
|
- Add an example to optimize bandwidth use with the curl -z option.
- Add a note about CDNs blocking based on the User-Agent (based on a question
mailed to me).
- Add a script to convert existing newsboat items to the sfeed(5) TSV format.
|
|
Handle it appropriately in the context of each format tool. Output the item but
keep it blanked.
NOTE: maybe in sfeed_twtxt it should use the current time instead?
|
|
The Date header is mandatory. Use the current time if it is missing/invalid.
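The fallback can be sketched as follows, assuming the caller knows whether the item time was parsed. The function name and the have_time parameter are assumptions. The sketch formats the time in GMT with a fixed "+0000" offset:

```c
#include <stdio.h>
#include <time.h>

/* Write an RFC 5322 style date in GMT into `buf`.
 * If `have_time` is 0 (missing/invalid item time) the current time is used.
 * Returns 0 on success, -1 on failure. */
int
format_date_header(char *buf, size_t bufsz, int have_time, time_t t)
{
	struct tm *tm;

	if (!have_time)
		t = time(NULL);
	if ((tm = gmtime(&t)) == NULL)
		return -1;
	/* assumes the default "C" locale for the day and month names */
	if (strftime(buf, bufsz, "%a, %d %b %Y %H:%M:%S +0000", tm) == 0)
		return -1;
	return 0;
}
```

The result can be printed after "Date: " so that every message has the mandatory header.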
|
|
... if it is missing/invalid.
|
|
Timezone should be GMT (as intended), do not convert to localtime.
|
|
This is a "quick&dirty" regex to block some of the typical 1px width or height
tracking pixels.
|
|
There's no need for a dynamic struct feed **. The required size is known
(argc). Just allocate it in one go.
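Allocating the array once could look like the sketch below. The fields of struct feed and the function name feeds_alloc are placeholders, only the idea of sizing the allocation by argc comes from the message above:

```c
#include <stdlib.h>

/* Placeholder for the per-feed data, the real struct has other fields. */
struct feed {
	const char *name;
	unsigned long total;
};

/* Allocate zero-initialized storage for `argc` feeds in one call.
 * Returns NULL on failure or when argc < 1. The caller frees the result. */
struct feed *
feeds_alloc(int argc)
{
	if (argc < 1)
		return NULL;
	return calloc((size_t)argc, sizeof(struct feed));
}
```

Because the upper bound is known up front there is no need to grow the array while reading the arguments.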
|
|
|
|
|
|
In particular for RSS feeds where a pubDate is optional.
|
|
- Set mandatory entry tags: id, updated.
- Change entry published (optional tag) to updated (mandatory).
- Add <feed> tags: author name, id, updated, title.
Thanks lich for the feedback and testing.
|
|
|
|
Instead of doing a binary search, set a pointer to the expected end tag.
This makes more sense and is also a minor optimization.
No behavioural change intended.
|