Age | Commit message | Author | |
---|---|---|---|
2020-10-09 | xml: remove unused code for sfeed | Hiltjo Posthuma | |
2020-10-09 | xml.c: remove buffering of comment data, which is unused anyway | Hiltjo Posthuma | |
2020-06-01 | fix typo | Hiltjo Posthuma | |
2020-01-24 | cleanup some includes | Hiltjo Posthuma | |
2020-01-18 | improve XML entity conversion | Hiltjo Posthuma | |
- return -1 for invalid XML entities. - separate between NUL (&#0;) and invalid entities, although both are unwanted in sfeed. - validate the number range more strictly and don't wrap to unsigned. Entities like "&#-1;" are handled as invalid now. "&#;" is also invalid instead of being treated the same as "&#0;". | |||
2019-11-22 | xml.c: upper-case named-entities are invalid in XML | Hiltjo Posthuma | |
Named entities are case-sensitive and in XML lower-case. (In HTML some of these are valid, although &apos; is invalid there too.) References: 4.6 Predefined entities: https://www.w3.org/TR/xml/#sec-predefined-ent In the definition of "match": https://www.w3.org/TR/xml/#dt-match "No case folding is performed." | |||
2019-06-11 | xml: improve cdata and comment callback logic | Hiltjo Posthuma | |
it used to call both handlers twice at the end for "-->" (comment) or "]]>" (CDATA) with the data "" and length 0. Now it is only called when non-empty. The start and end handlers can still be used. | |||
2019-03-16 | xml: write x->getnext to a default GETNEXT macro | Hiltjo Posthuma | |
this allows overriding x->getnext to expand to global context parsing and allows the compiler to optimize this inline. also remove the check whether the x->getnext function exists (just crash hard). | |||
2019-01-08 | xml: remove unnecessary checks | Hiltjo Posthuma | |
- reduce amount of data to check. - remove unnecessary checks from (now) internal functions. | |||
2018-12-02 | XML tag parse improvements for PI and end tags | Hiltjo Posthuma | |
- Stricter parsing of tags, no whitespace stripping after <. - For end tags the "internal" context x->tag would be "/sometag". Make sure this matches exactly with the parameter tag. - Reset tagname after parsing an end tag. - Make end tag handling more consistent. - Remove temporary variable taglen. | |||
2018-08-26 | xml: use ANSI types and struct initialization | Hiltjo Posthuma | |
long is at least 32 bits, codepointtoutf8() works with >= 32-bit types. Valid codepoint ranges are not larger than this. unsigned char is not needed because converted unicode bytes don't use this range. tested all valid codepoints and output on amd64, i386 and SPARC64. | |||
2018-08-23 | xml: remove TODO comments and add a note | Hiltjo Posthuma | |
2018-08-22 | xml: improve parsing of invalid attribute values separated by whitespace | Hiltjo Posthuma | |
It is invalid XML, but this allows parsing old HTML pages as well. For example: <input id=cb checked type="checkbox" title='checkbox' /> or <FONT FACE=wingdings SIZE=12><BLINK>oh hai</BLINK></FONT> | |||
2018-08-22 | xml: improve handling of invalid long data entities | Hiltjo Posthuma | |
this also fixes an issue with truncating and missing data on invalid input. | |||
2018-08-21 | xml: rewrite codepointtoutf8 function | Hiltjo Posthuma | |
No more converting to a uint32_t type. Just convert to a byte buffer. Tested on little- and big-endian. The code should be more clear too hopefully. | |||
2018-08-21 | xml: don't reset internal tagname when parsing non-tag types like CDATA | Hiltjo Posthuma | |
... this affects "tags" starting with < such as CDATA and processing instructions. | |||
2018-08-21 | xml: fix missing first byte when parsing a long incorrect attribute entity | Hiltjo Posthuma | |
... the entity had to be invalid (start with &) and longer than the buffer size. + tiny style fix. | |||
2018-08-21 | xml: interface change: make some functions private | Hiltjo Posthuma | |
... this does not expose the uint* types either. | |||
2018-08-21 | xml: increase allowed size of attribute names | Hiltjo Posthuma | |
2018-08-16 | XML parser: numeric entity: check unicode codepoint range | Hiltjo Posthuma | |
2018-03-11 | include <sys/types.h> for types size_t, ssize_t etc | Hiltjo Posthuma | |
This makes sure xml.c in particular can be compiled without further feature macros. | |||
2018-03-11 | xml: improve comment parsing | Hiltjo Posthuma | |
note that ---> is officially invalid XML, but we allow it anyway. | |||
2018-03-11 | xml: fix parsing of cdata when a handler is unset | Hiltjo Posthuma | |
2018-03-11 | xml: improve CDATA parsing | Hiltjo Posthuma | |
thanks Svyatoslav Mishyn for the feedback! | |||
2017-12-24 | xml: make name entities static, minor clarifications | Hiltjo Posthuma | |
2016-04-10 | xml: stricter check of entity: must end with ';', ... | Hiltjo Posthuma | |
... zero output buffer if codepoint length is 0 | |||
2015-08-22 | xml: fix includes | Hiltjo Posthuma | |
2015-08-22 | xml: simplify XML reader | Hiltjo Posthuma | |
2015-08-16 | xml: change xml_parse_string to xml_parse_buf | Hiltjo Posthuma | |
In the parser itself allow reading '\0' in the XML itself. Add a length parameter to specify the buffer size. | |||
2015-08-14 | minor code-style improvements | Hiltjo Posthuma | |
2015-08-14 | xml: whoops, remove leftover xml_getnext_stdin | Hiltjo Posthuma | |
2015-08-14 | xml: separate reader context from parser | Hiltjo Posthuma | |
also: - rename xmlparser_ prefix to xml_. - make xml_parse public, this allows a custom reader like a direct mmap, see: XMLParser.getnext and (optionally) XMLParser.getnext_data. - improve the README text. | |||
2015-08-08 | xml: move entity to namedentitystr() | Hiltjo Posthuma | |
2015-08-06 | xml: remove forced __inline__ attribute | Hiltjo Posthuma | |
2015-08-06 | general cleanups | Hiltjo Posthuma | |
2015-08-01 | xml: only allow full uppercase or full lowercase for entities | Hiltjo Posthuma | |
2015-07-31 | xml: fix xml_namedentitytostr loop | Hiltjo Posthuma | |
2015-07-31 | xml: fix missing include strings.h, for strncasecmp | Hiltjo Posthuma | |
2015-07-29 | improve includes (dont include headers in .h), fix build on Linux | Hiltjo Posthuma | |
2015-07-28 | improve code-style and consistency | Hiltjo Posthuma | |
2015-06-23 | xml: fix comment issue, improve cdata and comment while encountering separator | Hiltjo Posthuma | |
2015-06-22 | xml: fix cdata issue | Hiltjo Posthuma | |
2015-06-21 | separate xml specific code into xml.c | Hiltjo Posthuma | |
2015-06-21 | xml.c: fix empty cdata callback | Hiltjo Posthuma | |
2015-05-16 | xml: only call data handler if set | Hiltjo Posthuma | |
2015-05-16 | xml: call parse | Hiltjo Posthuma | |
2015-05-16 | xml: attrentity handler will be called if set | Hiltjo Posthuma | |
it used to be if attrentity is NULL it would call attrdata. | |||
2015-05-16 | xml: allow to read from fd or string buffer | Hiltjo Posthuma | |
+ minor code style. | |||
2014-11-17 | code-style, ugly test-code (remove later) | Hiltjo Posthuma | |
2014-11-11 | comment style | Hiltjo Posthuma | |
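
Several entries above (2020-01-18, 2018-08-21, 2018-08-16, 2016-04-10) describe strict handling of numeric entities: the entity must end with ';', the value must be a valid Unicode codepoint, and inputs such as "&#-1;" or "&#;" are rejected with -1. The following is a minimal sketch of that behaviour, not the code from xml.c. The function name `numeric_entity_to_utf8`, its signature, and the rejection of surrogate codepoints are assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not the xml.c implementation): convert a numeric
 * entity such as "&#65;" or "&#x20AC;" to UTF-8.
 * buf must hold at least 5 bytes (4 UTF-8 bytes plus a terminating NUL).
 * Returns the number of bytes written, or -1 for an invalid entity.
 */
int
numeric_entity_to_utf8(const char *s, char *buf, size_t bufsiz)
{
	unsigned char *out = (unsigned char *)buf;
	long cp = 0; /* long is at least 32 bits, enough for 0x10FFFF */
	int base = 10, digit, ndigits = 0, len;

	if (bufsiz < 5 || s[0] != '&' || s[1] != '#')
		return -1;
	s += 2;
	if (*s == 'x') {
		base = 16;
		s++;
	}

	/* Parse digits by hand: no sign, no whitespace, no wrapping. */
	for (; *s && *s != ';'; s++, ndigits++) {
		if (*s >= '0' && *s <= '9')
			digit = *s - '0';
		else if (base == 16 && *s >= 'a' && *s <= 'f')
			digit = *s - 'a' + 10;
		else if (base == 16 && *s >= 'A' && *s <= 'F')
			digit = *s - 'A' + 10;
		else
			return -1;
		cp = cp * base + digit;
		if (cp > 0x10FFFFL) /* out of range, stop before overflow */
			return -1;
	}
	/* Must contain at least one digit and end with ';'. */
	if (ndigits == 0 || *s != ';')
		return -1;
	/* Assumption: surrogate codepoints are treated as invalid. */
	if (cp >= 0xD800L && cp <= 0xDFFFL)
		return -1;

	if (cp < 0x80L) {
		out[0] = (unsigned char)cp;
		len = 1;
	} else if (cp < 0x800L) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		len = 2;
	} else if (cp < 0x10000L) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		len = 3;
	} else {
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		len = 4;
	}
	out[len] = '\0';
	return len;
}
```

With this sketch, "&#65;" yields the single byte "A", while "&#-1;", "&#;" and an entity without a trailing ';' all return -1. "&#0;" is kept distinct from the invalid cases: it returns 1 with a NUL byte in the buffer, so the caller can decide how to treat it.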