Age | Commit message (Collapse) | Author |
|
|
|
A numeric entity could be 5 bytes, so use a round number of 8 bytes.
|
|
This is not clearly defined by the C99 standard.
Define ctype-like macros to force it to be ASCII / UTF-8 (not extended ASCII or
something like noticed on OpenBSD 3.8).
(In practise modern libc libraries are all ASCII and UTF-8-compatible. Otherwise
this would break many programs)
|
|
It is unspecified if the C locale iscntrl is compatible with ASCII or not.
Noticed when testing on OpenBSD 3.8 which uses extended ASCII and also uses the
C1 range for control-characters. This breaks support with UTF-8.
Reference:
https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C1_control_codes_for_general_use
C1 table.
Force an own definition of an ASCII-compatible control-character range since
sfeed expects input to be UTF-8 (or converted from iconv) and so output to be
UTF-8 aswell.
|
|
This also makes the programs exit with a non-zero status when a read or write
error occurs.
This makes checking the exit status more reliable in scripts.
A simple example to simulate a disk with no space left:
curl -s 'https://codemadness.org/atom.xml' | sfeed > f
/mnt/test: write failed, file system is full
echo $?
0
Which now produces:
curl -s 'https://codemadness.org/atom.xml' | sfeed > f
/mnt/test: write failed, file system is full
write error: <stdout>
echo $?
1
Tested with a small mfs on OpenBSD, fstab entry:
swap /mnt/test mfs rw,nodev,nosuid,-s=1M 0 0
|
|
These are BSD functions.
- HaikuOS now compiles without having to use libbsd.
- Tested on SerenityOS (for fun), which doesn't have these functions (yet).
With a small change to support wcwidth() sfeed works on SerenityOS.
|
|
|
|
|
|
|
|
This nested format is used by Mozilla Thunderbird:
<outline title="">
<outline type="rss" title="" text="" xmlUrl=""/>
</outline>
other tests:
- newsboat -e uses the title attribute, not text.
- rss2email 3.x, r2e opmlexport uses type, text and xmlUrl (correct).
|
|
Use the same XML handlers names as the fields to improve readability a bit.
This is also consistent across all the programs using xml.c. Shorten the
variable names aswell.
|
|
This reduces much function call overhead. getnext is defined in xml.h for
inline optimization. sfeed only uses one XML parser context per program, this
allows further optimizations of the compiler also.
On OpenBSD it was noticable because of retpoline etc function call overhead.
Using clang and a 500MB test XML file reduces processing time from +- 12s to
5s.
Tested using some crazy optimization flags:
SFEED_CFLAGS = -O3 -std=c99 -DGETNEXT=getchar_unlocked -fno-ret-protector \
-mno-retpoline -static
A GETNEXT macro is also nice for programs which mmap(2) some big XML file. Then
you can simply define:
#define GETNEXT() (off >= len ? EOF : reg[off++])
|
|
- cast all ctype(3) function argument to (unsigned char) to avoid UB
POSIX says:
"The c argument is an int, the value of which the application shall ensure is a
character representable as an unsigned char or equal to the value of the macro
EOF. If the argument has any other value, the behavior is undefined."
Many libc cast implicitly the value, but NetBSD does not, which is probably the
correct thing to interpret it.
- no need to cast for putchar + rename some fputc(..., stdout) to putchar
POSIX says:
"The fputc() function shall write the byte specified by c (converted to an
unsigned char) to the output stream pointed to by stream [...]"
Major thanks to Leonardo Taccari <iamleot@gmail.com> for reporting and testing
it on NetBSD!
|
|
the uint* types in XML are not exposed anymore.
|
|
the shell escape \' was a mistake.
|
|
This makes sure xml.c in particular can be compiled without further
feature macros.
|
|
make sure to escape them.
|
|
|
|
|
|
|
|
- pledge tools and add define to enable it on platforms that support it, currently
only OpenBSD 5.9+
- separate getline and parseline functionality.
- use murmur3 hash instead of jenkins1: faster and less collisions.
- make some error messages a bit more clear, for example with path truncation.
- some small cleanups, move printutf8pad to util.
|
|
|
|
|
|
|
|
|
|
- support attribute entities better.
- escape some characters (it's still recommended to check the output ofcourse).
|
|
|
|
also:
- rename xmlparser_ prefix to xml_.
- make xml_parse public, this allows a custom reader like a direct mmap,
see: XMLParser.getnext and (optionall) XMLParser.getnext_data.
- improve the README text.
|
|
|
|
|
|
|
|
|
|
|
|
- dont free at end (not needed in our case).
- use 0 and 1 instead of EXIT_SUCCESS, EXIT_FAILURE.
- use err (from err.h) instead of custom die().
|
|
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
lots of things changed, but cleanup todo. changelog and consistent stream of small updates will come in the future.
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
just use a static buffer
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|
|
Because I'm adding a sfeed_opml_export script.
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|