Age | Commit message | Author |
|
This happens because the previous link type is not reset when a new <link> tag
starts; it is only reset when a type attribute starts.
Found on the Spanish newspaper site elpais.com.
Input:
<link rel="alternate" href="https://feeds.elpais.com/mrss-s/pages/ep/site/elpais.com/portada" type="application/rss+xml" title="RSS de la portada de El País"/>
<link rel="canonical" href="https://elpais.com"/>
Previously printed (the second line is incorrect):
https://feeds.elpais.com/mrss-s/pages/ep/site/elpais.com/portada application/rss+xml
https://elpais.com/ application/rss+xml
Now prints:
https://feeds.elpais.com/mrss-s/pages/ep/site/elpais.com/portada application/rss+xml
Fix: also reset the link type at the start of a <link> tag (for <base href />
a reset is still not wanted).
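The reset described above can be sketched as follows. This is a minimal illustration, not the actual sfeed_web code: the struct linkstate and the tagstart, attr and tagend callbacks are hypothetical names, and the sketch only handles <link> tags (no <base href /> handling).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-parser state; names are illustrative, not sfeed's. */
struct linkstate {
	int islink;     /* non-zero while inside a <link> tag */
	char href[256]; /* href attribute of the current <link> */
	char type[64];  /* type attribute of the current <link> */
};

/* Tag start: forget what was collected for a previous <link>. */
static void
tagstart(struct linkstate *s, const char *tag)
{
	s->islink = !strcmp(tag, "link");
	if (s->islink) {
		s->href[0] = '\0';
		s->type[0] = '\0'; /* the reset this kind of fix adds */
	}
}

/* Attribute: only remember href and type of the current <link>. */
static void
attr(struct linkstate *s, const char *name, const char *value)
{
	if (!s->islink)
		return;
	if (!strcmp(name, "href"))
		snprintf(s->href, sizeof(s->href), "%s", value);
	else if (!strcmp(name, "type"))
		snprintf(s->type, sizeof(s->type), "%s", value);
}

/* Tag end: return 1 if the current <link> has a feed type, else 0. */
static int
tagend(const struct linkstate *s)
{
	return s->islink &&
	    (!strcmp(s->type, "application/rss+xml") ||
	     !strcmp(s->type, "application/atom+xml"));
}
```

With this state handling, feeding the two elpais.com tags in order reports the first as a feed and the second (rel="canonical", no type attribute) as not a feed, because the type is empty again when the second tag starts.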
|
|
(ascii.jp)
|
|
Fix attribute parsing and decode entities. The following now works (from
helsinkitimes.fi):
<base href="https://www.helsinkitimes.fi/" />
<link href="/?format=feed&type=rss" rel="alternate" type="application/rss+xml" title="RSS 2.0" />
<link href="/?format=feed&type=atom" rel="alternate" type="application/atom+xml" title="Atom 1.0" />
Properly associate attributes with their actual tag; the following now parses
properly (from ascii.jp):
<link rel="apple-touch-icon-precomposed" href="/img/apple-touch-icon.png" />
<link rel="alternate" type="application/rss+xml" />
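Entity decoding matters for href values that carry a query string, since "&" is normally written as "&amp;" inside an attribute. A minimal sketch of such decoding, assuming only the five predefined XML named entities need handling; decode_entities is a hypothetical helper, not the decoder in sfeed's xml.c:

```c
#include <stddef.h>
#include <string.h>

/*
 * Decode the five predefined XML named entities from `in` into `out`.
 * Unknown entities are copied through unchanged.
 * Returns the decoded length, or -1 if `out` is too small.
 */
static int
decode_entities(const char *in, char *out, size_t outsiz)
{
	static const struct { const char *name; char c; } ents[] = {
		{ "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
		{ "&quot;", '"' }, { "&apos;", '\'' },
	};
	size_t i, n, adv, o = 0;
	char c;

	if (outsiz == 0)
		return -1;
	while (*in) {
		c = *in;
		adv = 1;
		for (i = 0; i < sizeof(ents) / sizeof(ents[0]); i++) {
			n = strlen(ents[i].name);
			if (!strncmp(in, ents[i].name, n)) {
				c = ents[i].c;
				adv = n;
				break;
			}
		}
		if (o + 1 >= outsiz)
			return -1; /* no room for this byte plus the NUL */
		out[o++] = c;
		in += adv;
	}
	out[o] = '\0';
	return (int)o;
}
```

For example, decoding "/?format=feed&amp;type=rss" yields "/?format=feed&type=rss", which can then be resolved against the <base href> value.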
|
|
Noticed strange output on the site ascii.jp.
The site's HTML contained:
<link rel="apple-touch-icon-precomposed" href="/img/apple-touch-icon.png" />
<link rel="alternate" type="application/rss+xml" />
This would print:
"/img/apple-touch-icon.png application/rss+xml"
Now it prints:
" application/rss+xml"
|
|
- Fix a theoretical issue where "found" can overflow and return a zero exit
status when there are many feeds found.
- When there are no RSS/Atom feeds this is not an error, so return 0.
- Style: change unsigned int to int.
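One way to keep the exit status independent of how many feeds are seen is to track a flag instead of a counter, and to let only real errors produce a non-zero status. The sketch below is an assumption about how this could look, not the code from the commit; struct result, note_feed and exit_status are hypothetical names.

```c
/* Hypothetical bookkeeping for one run. */
struct result {
	int found; /* 1 once any feed was seen; cannot wrap around */
	int error; /* 1 if a real error (e.g. a read error) occurred */
};

/* Record a feed: set a flag instead of incrementing a counter. */
static void
note_feed(struct result *r)
{
	r->found = 1;
}

/* Finding no feeds is not an error, so only `error` affects the status. */
static int
exit_status(const struct result *r)
{
	return r->error ? 1 : 0;
}
```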
|
|
This greatly reduces function call overhead. getnext is defined in xml.h so it
can be inlined. sfeed uses only one XML parser context per program, which also
allows the compiler to optimize further.
On OpenBSD the function call overhead was noticeable because of retpoline etc.
Using clang and a 500MB test XML file, processing time drops from roughly 12s
to 5s.
Tested using some crazy optimization flags:
SFEED_CFLAGS = -O3 -std=c99 -DGETNEXT=getchar_unlocked -fno-ret-protector \
-mno-retpoline -static
A GETNEXT macro is also nice for programs which mmap(2) some big XML file. Then
you can simply define:
#define GETNEXT() (off >= len ? EOF : reg[off++])
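A minimal sketch of how such a macro can drive a read loop over an in-memory region. The variables reg, len and off follow the define above; count_tags is a hypothetical consumer used only for illustration, and in a real program reg would point at the mmap(2)ed file instead of a string.

```c
#include <stddef.h>
#include <stdio.h> /* EOF */

/* Region to read from; with mmap(2) this would be the mapped file. */
static const unsigned char *reg;
static size_t len, off;

/* Return the next byte as an int, or EOF at the end of the region. */
#define GETNEXT() (off >= len ? EOF : reg[off++])

/* Hypothetical consumer: count '<' bytes in the buffer using GETNEXT(). */
static size_t
count_tags(const char *buf, size_t n)
{
	size_t ntags = 0;
	int c;

	reg = (const unsigned char *)buf;
	len = n;
	off = 0;
	while ((c = GETNEXT()) != EOF)
		if (c == '<')
			ntags++;
	return ntags;
}
```

Because reg is unsigned char, bytes above 0x7f are returned as positive values and cannot be confused with EOF.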
|
|
- cast all ctype(3) function arguments to (unsigned char) to avoid UB
POSIX says:
"The c argument is an int, the value of which the application shall ensure is a
character representable as an unsigned char or equal to the value of the macro
EOF. If the argument has any other value, the behavior is undefined."
Many libc implementations cast the value implicitly, but NetBSD does not, which
is probably the correct interpretation.
- no need to cast for putchar + rename some fputc(..., stdout) to putchar
POSIX says:
"The fputc() function shall write the byte specified by c (converted to an
unsigned char) to the output stream pointed to by stream [...]"
Major thanks to Leonardo Taccari <iamleot@gmail.com> for reporting and testing
it on NetBSD!
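The cast can be illustrated with a small helper. count_spaces is a hypothetical function, not part of sfeed; it only shows where the (unsigned char) cast goes when a plain char (which may be signed) is passed to a ctype(3) function.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Count whitespace bytes in a string. Without the cast, a byte >= 0x80
 * in a signed char becomes a negative int, which is undefined behaviour
 * for isspace() unless the value happens to equal EOF.
 */
static size_t
count_spaces(const char *s)
{
	size_t n = 0;

	for (; *s; s++)
		if (isspace((unsigned char)*s))
			n++;
	return n;
}
```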
|
|
the uint* types in XML are not exposed anymore.
|
|
This makes sure xml.c in particular can be compiled without further
feature macros.
|
|
- pledge tools and add define to enable it on platforms that support it, currently
only OpenBSD 5.9+
- separate getline and parseline functionality.
- use murmur3 hash instead of jenkins1: faster and fewer collisions.
- make some error messages clearer, for example with path truncation.
- some small cleanups, move printutf8pad to util.
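The pledge item above could be wired up roughly as follows. This is a sketch under assumptions: the USE_PLEDGE define name and the restrict_process helper are illustrative, and "stdio" is just an example promise set. pledge(2) itself is the OpenBSD system call declared in <unistd.h>.

```c
#include <stdio.h>

#ifdef USE_PLEDGE
#include <unistd.h> /* pledge(2) on OpenBSD 5.9+ */
#else
/* Platforms without pledge(2): make the call a no-op that succeeds. */
#define pledge(promises, execpromises) 0
#endif

/* Restrict the process to stdio-style operations; returns 0 on success. */
static int
restrict_process(void)
{
	if (pledge("stdio", NULL) == -1) {
		perror("pledge");
		return -1;
	}
	return 0;
}
```

A tool would call restrict_process() early in main(), before parsing untrusted input, and build with -DUSE_PLEDGE only on platforms that provide pledge(2).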
|
|
also:
- rename xmlparser_ prefix to xml_.
- make xml_parse public, this allows a custom reader like a direct mmap,
see: XMLParser.getnext and (optionally) XMLParser.getnext_data.
- improve the README text.
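A custom reader of this kind can be sketched with a callback plus an opaque data pointer. The struct reader below only mirrors the getnext and getnext_data field names mentioned above; it is a hypothetical stand-in, not the XMLParser definition from xml.h, and struct membuf and reader_count are illustrative helpers.

```c
#include <stddef.h>
#include <stdio.h> /* EOF */

/* Hypothetical stand-in for the parser's reader hooks. */
struct reader {
	int (*getnext)(void *); /* returns the next byte or EOF */
	void *getnext_data;     /* passed to getnext on every call */
};

/* In-memory source, e.g. a region obtained from mmap(2). */
struct membuf {
	const unsigned char *data;
	size_t len, off;
};

/* getnext implementation that reads from a struct membuf. */
static int
membuf_getnext(void *p)
{
	struct membuf *m = p;

	return m->off >= m->len ? EOF : m->data[m->off++];
}

/* Example consumer: count the bytes delivered by the reader. */
static size_t
reader_count(struct reader *r)
{
	size_t n = 0;

	while (r->getnext(r->getnext_data) != EOF)
		n++;
	return n;
}
```

The parser only ever calls getnext, so swapping stdin for a memory buffer means changing the two fields and nothing else.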
|
|
- don't free at end (not needed in our case).
- use 0 and 1 instead of EXIT_SUCCESS, EXIT_FAILURE.
- use err (from err.h) instead of custom die().
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
reorder static -> public xml functions.
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Lots of things changed, but cleanup is still to do. A changelog and a consistent stream of small updates will come in the future.
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|