also reduce size of return type (32-bit+ should be enough).
|
see RFC2822 4.3 page 32:
"
[...]
However, because of the error in
[RFC822], they SHOULD all be considered equivalent to "-0000" unless
there is out-of-band information confirming their meaning.
"
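A minimal sketch of the rule quoted above, not sfeed's actual code: the helper name obs_zone_offset and its table are assumptions made for illustration. Single-letter (military) zones are treated as "-0000", and the named North American zones use the offsets listed in RFC 2822 section 4.3.

```c
#include <string.h>

/* Hypothetical helper: map an obsolete RFC 2822 zone name to a UTC offset
 * in seconds. Unknown names fall back to 0 in this sketch. */
static long
obs_zone_offset(const char *zone)
{
	static const struct {
		const char *name;
		long off;
	} zones[] = {
		{ "UT",  0 },         { "GMT", 0 },
		{ "EST", -5 * 3600 }, { "EDT", -4 * 3600 },
		{ "CST", -6 * 3600 }, { "CDT", -5 * 3600 },
		{ "MST", -7 * 3600 }, { "MDT", -6 * 3600 },
		{ "PST", -8 * 3600 }, { "PDT", -7 * 3600 }
	};
	size_t i;

	/* single-letter military zones: considered equivalent to "-0000" */
	if (zone[0] != '\0' && zone[1] == '\0')
		return 0;

	for (i = 0; i < sizeof(zones) / sizeof(zones[0]); i++) {
		if (!strcmp(zone, zones[i].name))
			return zones[i].off;
	}
	return 0;
}
```

The caller would subtract the returned offset from the parsed local time to get UTC.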
|
- handle type attribute for MRSS media:description,
media:description type="plain" is now parsed properly.
- handle default content-types per tag now.
- when multiple content-like fields are specified use the proper content-type.
- be flexible about type attribute handling.
- minor code tweaks.
|
This is useful for example for podcasts (audio attachment), newsposts (usually
some image) or comic strips (link to page, image as enclosure).
thanks leot for the feedback!
|
This greatly reduces function call overhead. getnext is defined in xml.h so it
can be inlined. sfeed only uses one XML parser context per program, which also
allows further compiler optimizations.
On OpenBSD it was noticeable because of retpoline and similar function call
overhead. Using clang and a 500MB test XML file, processing time drops from
about 12s to 5s.
Tested using some crazy optimization flags:
SFEED_CFLAGS = -O3 -std=c99 -DGETNEXT=getchar_unlocked -fno-ret-protector \
-mno-retpoline -static
A GETNEXT macro is also nice for programs which mmap(2) some big XML file. Then
you can simply define:
#define GETNEXT() (off >= len ? EOF : reg[off++])
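A small sketch of how such a GETNEXT definition could be consumed, assuming reg, len and off are file-scope variables describing the buffer. The count_tags function is hypothetical and only stands in for a parser loop; a plain in-memory buffer is used here instead of an actual mmap(2) region. Declaring reg as unsigned char keeps high bytes from being confused with EOF.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed names from the define above: reg is the start of the region,
 * len its size and off the current read offset. */
static const unsigned char *reg;
static size_t len, off;

#define GETNEXT() (off >= len ? EOF : reg[off++])

/* Hypothetical consumer standing in for the parser loop: count '<' bytes. */
static size_t
count_tags(const unsigned char *buf, size_t buflen)
{
	size_t n = 0;
	int c;

	reg = buf;
	len = buflen;
	off = 0;

	while ((c = GETNEXT()) != EOF) {
		if (c == '<')
			n++;
	}
	return n;
}
```

With mmap(2) the only difference would be that buf and buflen come from the mapping instead of a string.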
|
this style change is useful for my local coverage profile.
|
In RSS2 (but not RSS0.9), a <link> is optional and it can also be specified by
<guid isPermaLink="true"> (permalink is "true" by default).
When a <link> is also present this will be used instead of the GUID permalink.
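The precedence described above can be sketched as follows. This is an illustration under assumptions, not the parser's real code: rss_item_link is a hypothetical name, a NULL ispermalink means the attribute was absent, and the attribute value is compared case-sensitively.

```c
#include <string.h>

/* Hypothetical helper: pick the item link for an RSS2 item.
 * link:        text of <link>, or NULL/"" when missing
 * guid:        text of <guid>, or NULL/"" when missing
 * ispermalink: value of the isPermaLink attribute, NULL when absent */
static const char *
rss_item_link(const char *link, const char *guid, const char *ispermalink)
{
	/* an explicit <link> always wins */
	if (link && link[0])
		return link;

	/* otherwise use the GUID, but only if it is a permalink;
	 * isPermaLink is "true" by default */
	if (guid && guid[0] &&
	    (ispermalink == NULL || !strcmp(ispermalink, "true")))
		return guid;

	return NULL; /* no usable link */
}
```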
|
the Atom link parsing is more strict now and checks the rel attribute. When the
rel attribute is empty it is handled as a normal link ("alternate").
This makes sure that when a link with another type (such as "enclosure",
"related", "self" or "via") is specified before a normal link, it is not used.
sfeed does not handle enclosures, but the code is reworked so it is very simple
to add this. Enclosures are often used for example to attach some image to a
newspost or an audio file to a podcast.
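A rough sketch of the rel check, assuming the links of one entry have already been collected into an array. The struct atom_link type and the first_alternate function are made up for this example; the real parser works on a stream and is structured differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of one <link> element. */
struct atom_link {
	const char *rel;  /* NULL or "" when the attribute is missing/empty */
	const char *href;
};

/* An empty or missing rel is handled as a normal link ("alternate"). */
static int
is_alternate(const char *rel)
{
	return rel == NULL || rel[0] == '\0' || !strcmp(rel, "alternate");
}

/* Return the href of the first normal link, skipping "enclosure",
 * "related", "self", "via" and any other rel type. */
static const char *
first_alternate(const struct atom_link *links, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (is_alternate(links[i].rel))
			return links[i].href;
	}
	return NULL;
}
```

Supporting enclosures would then be a second check on rel being "enclosure" in the same loop.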
|
Noticed in the webcomic "amphibian":
http://amphibian.com/feeds/atom
|
... and abstract printing timestamp and uri to string_print_{timestamp,uri}
similar to string_print_trimmed (normal string) and string_print_encoded
(content).
Noticed with whitespace around the field in the webcomic "amphibian":
http://amphibian.com/feeds/atom
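To illustrate the whitespace handling, here is a minimal trimming sketch. The name trim_span is an assumption and this is not the string_print_* code itself: it only finds the span without leading and trailing whitespace and leaves printing to the caller.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the first non-whitespace byte
 * of s and store the length of the trimmed span in *len. */
static const char *
trim_span(const char *s, size_t *len)
{
	const char *e;

	/* skip leading whitespace */
	while (*s && isspace((unsigned char)*s))
		s++;
	/* walk back over trailing whitespace */
	for (e = s + strlen(s); e > s && isspace((unsigned char)e[-1]); e--)
		;
	*len = (size_t)(e - s);
	return s;
}
```

The span can then be written with fwrite(p, 1, len, stdout).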
|
- reorder and remove a goto.
- no need for a separate variable "end".
- don't use s[0] style because the pointer was changed.
|
noticed in "RMS notes" RSS.
|
- cast all ctype(3) function arguments to (unsigned char) to avoid UB
POSIX says:
"The c argument is an int, the value of which the application shall ensure is a
character representable as an unsigned char or equal to the value of the macro
EOF. If the argument has any other value, the behavior is undefined."
Many libc implementations cast the value implicitly, but NetBSD does not, which
is probably the correct interpretation.
- no need to cast for putchar + rename some fputc(..., stdout) to putchar
POSIX says:
"The fputc() function shall write the byte specified by c (converted to an
unsigned char) to the output stream pointed to by stream [...]"
Major thanks to Leonardo Taccari <iamleot@gmail.com> for reporting and testing
it on NetBSD!
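A short example of the cast, using a hypothetical lower_ascii function. With a signed char type, bytes above 0x7f (for example UTF-8 sequences in feed data) would otherwise be passed to tolower(3) as negative values.

```c
#include <ctype.h>

/* Hypothetical example: lowercase a string in place. The cast makes sure
 * tolower(3) only ever sees values representable as unsigned char. */
static void
lower_ascii(char *s)
{
	for (; *s; s++)
		*s = (char)tolower((unsigned char)*s);
}
```

In the default "C" locale the non-ASCII bytes are passed through unchanged.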
|
the uint* types in XML are not exposed anymore.
|
This makes sure xml.c in particular can be compiled without further
feature macros.
|
Remove type of feed per item, it is not that interesting. sfeed(1) can parse
both RSS and Atom feeds.
|
use long long: it is at least 64-bit in C99, so a (real) time_t of either
32-bit or 64-bit is supported. long long is C99 though, but that is fine.
check errno, it can have ERANGE.
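The errno check could look roughly like this. parse_ll is a hypothetical wrapper written for this note, not a function from the source tree; it also rejects empty input and trailing garbage, which is an extra assumption.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper around strtoll(3): returns 0 on success and -1 on
 * overflow/underflow (ERANGE), empty input or trailing characters. */
static int
parse_ll(const char *s, long long *out)
{
	char *end;
	long long v;

	errno = 0; /* strtoll only sets errno on failure, so reset it first */
	v = strtoll(s, &end, 10);
	if (errno == ERANGE || end == s || *end != '\0')
		return -1;
	*out = v;
	return 0;
}
```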
|
- less overhead (we only need GMT time) so no setenv("TZ", ...) tzset() crap.
- timezone format (for example %z in strptime) is non-standard.
this adds some lines of code and some complexity to our code, but the
trade-off is worth it imho.
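As an illustration of working in GMT without touching TZ, the sketch below converts already-parsed date fields plus a zone offset to a UNIX timestamp with plain integer arithmetic. The function name and parameter layout are assumptions, and the day count uses the well-known days-from-civil formula rather than whatever the actual code does.

```c
/* Hypothetical helper: convert a broken-down date (proleptic Gregorian,
 * mon in 1..12, day in 1..31) with a zone offset in seconds east of UTC
 * to seconds since 1970-01-01 00:00:00 UTC. No setenv()/tzset() needed. */
static long long
date_to_epoch(int year, int mon, int day, int hour, int min, int sec,
              long tzoff)
{
	/* treat January and February as months 11 and 12 of the previous year */
	long long y = year - (mon <= 2);
	long long era = (y >= 0 ? y : y - 399) / 400;
	long long yoe = y - era * 400;                        /* [0, 399] */
	long long doy = (153 * (mon + (mon > 2 ? -3 : 9)) + 2) / 5 + day - 1;
	long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; /* [0, 146096] */
	long long days = era * 146097 + doe - 719468;          /* since epoch */

	return days * 86400 + hour * 3600LL + min * 60 + sec - tzoff;
}
```

A date given as 03:46:40 +0200 and the same instant given as 01:46:40 +0000 produce the same timestamp.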
|
add some detail to the comments
|
- pledge tools and add define to enable it on platforms that support it, currently
only OpenBSD 5.9+
- separate getline and parseline functionality.
- use murmur3 hash instead of jenkins1: faster and fewer collisions.
- make some error messages a bit more clear, for example with path truncation.
- some small cleanups, move printutf8pad to util.
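For reference, a compact sketch of the 32-bit MurmurHash3 variant. It follows the public-domain reference algorithm; the version in the tree may differ in details such as the seed, the function name or how the input bytes are read.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t
rotl32(uint32_t x, int r)
{
	return (x << r) | (x >> (32 - r));
}

/* MurmurHash3 (x86, 32-bit). Blocks are read as little-endian so the
 * result does not depend on the host byte order. */
static uint32_t
murmur3_32(const void *data, size_t len, uint32_t seed)
{
	const unsigned char *p = data;
	uint32_t h = seed, k;
	size_t i, nblocks = len / 4;

	/* body: mix in 4 bytes at a time */
	for (i = 0; i < nblocks; i++, p += 4) {
		k = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
		    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
		k *= 0xcc9e2d51;
		k = rotl32(k, 15);
		k *= 0x1b873593;
		h ^= k;
		h = rotl32(h, 13);
		h = h * 5 + 0xe6546b64;
	}

	/* tail: the remaining 0-3 bytes */
	k = 0;
	switch (len & 3) {
	case 3: k ^= (uint32_t)p[2] << 16; /* fallthrough */
	case 2: k ^= (uint32_t)p[1] << 8;  /* fallthrough */
	case 1: k ^= p[0];
		k *= 0xcc9e2d51;
		k = rotl32(k, 15);
		k *= 0x1b873593;
		h ^= k;
	}

	/* finalization: force all bits to avalanche */
	h ^= (uint32_t)len;
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}
```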
|
This reverts commit 5e43bd658e578ced54f6065e95f6efb4892e114c.
It is a neat bit trick, but it doesn't matter much in this case and it's
less readable and possibly less portable.
|