summaryrefslogtreecommitdiff
path: root/xml.c
AgeCommit message (Collapse)Author
2020-01-18improve XML entity conversionHiltjo Posthuma
- return -1 for invalid XML entities. - separate between NUL (�) and invalid entities: although both are unwanted in sfeed. - validate the number range more strictly and don't wrap to unsigned. entities lik: "&#-1;" are handled as invalid now. "&#;" is also invalid instead of the same as "�".
2019-11-22xml.c: upper-case named-entities are invalid in XMLHiltjo Posthuma
Named entities are case-sensitive and in XML lower-case. (In HTML some of these are valid. Although ' is invalid there too). References: 4.6 Predefined entities: https://www.w3.org/TR/xml/#sec-predefined-ent In the definition of "match": https://www.w3.org/TR/xml/#dt-match "No case folding is performed."
2019-06-11xml: improve cdata and comment callback logicHiltjo Posthuma
it used to call both handlers twice at the end for "-->" (comment) or "]]>" (CDATA) with the data "" and length 0. Now it is only called when non-empty. The start and end handlers can still be used.
2019-03-16xml: write x->getnext to a default GETNEXT macroHiltjo Posthuma
this allows to override x->getnext to expand to global context parsing and allows the compiler to optimize this inline. also remove checking if the x->getnext function exists (just crash hard).
2019-01-08xml: remove unnecesary checksHiltjo Posthuma
- reduce amount of data to check. - remove unnecesary checks from (now) internal functions.
2018-12-02XML tag parse improvements for PI and end tagsHiltjo Posthuma
- Stricter parsing of tags, no whitespace stripping after <. - For end tags the "internal" context x->tag would be "/sometag". Make sure this matches exactly with the parameter tag. - Reset tagname after parsing an end tag. - Make end tag handling more consistent. - Remove temporary variable taglen.
2018-08-26xml: use ANSI types and struct initializationHiltjo Posthuma
long is atleast 32-bits, codepointtoutf8() works with >= 32-bit types. Valid codepoint ranges are not larger than this. unsigned char is not needed because converted unicode bytes don't use this range. tested all valid codepoints and output on amd64, i386 and SPARC64.
2018-08-23xml: remove TODO comments and add a noteHiltjo Posthuma
2018-08-22xml: improve parsing of invalid attribute values separated by whitespaceHiltjo Posthuma
It is invalid XML, but this allows parsing old HTML pages aswell. For example: <input id=cb checked type="checkbox" title='checkbox' /> or <FONT FACE=wingdings SIZE=12><BLINK>oh hai</BLINK></FONT>
2018-08-22xml: improve handling of invalid long data entitiesHiltjo Posthuma
this also fixes an issue with truncating and missing data on invalid input.
2018-08-21xml: rewrite codepointtoutf8 functionHiltjo Posthuma
No more converting to a uint32_t type. Just convert to a byte buffer. Tested on little- and big-endian. The code should be more clear too hopefully.
2018-08-21xml: don't reset internal tagname when parsing non-tag types like CDATAHiltjo Posthuma
... this affects "tags" starting with < such as CDATA and processing instructions.
2018-08-21xml: fix missing first byte when parsing a long incorrect attribute entityHiltjo Posthuma
... the entity had to be invalid (start with &) and longer than the buffer size. + tiny style fix.
2018-08-21xml: interface change: make some functions privateHiltjo Posthuma
... this does not expose the uint* types either.
2018-08-21xml: increase allowed size of attribute namesHiltjo Posthuma
2018-08-16XML parser: numeric entity: check unicode codepoint rangeHiltjo Posthuma
2018-03-11include <sys/types.h> for types size_t, ssize_t etcHiltjo Posthuma
This makes sure xml.c in particular can be compiled without further feature macros.
2018-03-11xml: improve comment parsingHiltjo Posthuma
note that ---> is officially invalid XML, but we allow it anyway.
2018-03-11xml: fix parsing of cdata when a handler is unsetHiltjo Posthuma
2018-03-11xml: improve CDATA parsingHiltjo Posthuma
thanks Svyatoslav Mishyn for the feedback!
2017-12-24xml: make name entities static, minor clarificationsHiltjo Posthuma
2016-04-10xml: stricter check of entity: must end with ';', ...Hiltjo Posthuma
... zero output buffer if codepoint length is 0
2015-08-22xml: fix includesHiltjo Posthuma
2015-08-22xml: simplify XML readerHiltjo Posthuma
2015-08-16xml: change xml_parse_string to xml_parse_bufHiltjo Posthuma
In the parser itself allow reading '\0' in the XML itself. Add a length parameter to specify the buffer size.
2015-08-14minor code-style improvementsHiltjo Posthuma
2015-08-14xml: whoops, remove leftover xml_getnext_stdinHiltjo Posthuma
2015-08-14xml: separate reader context from parserHiltjo Posthuma
also: - rename xmlparser_ prefix to xml_. - make xml_parse public, this allows a custom reader like a direct mmap, see: XMLParser.getnext and (optionall) XMLParser.getnext_data. - improve the README text.
2015-08-08xml: move entity to namedentitystr()Hiltjo Posthuma
2015-08-06xml: remove forced __inline__ attributeHiltjo Posthuma
2015-08-06general cleanupsHiltjo Posthuma
2015-08-01xml: only allow full uppercase or full lowercase for entitiesHiltjo Posthuma
2015-07-31xml: fix xml_namedentitytostr loopHiltjo Posthuma
2015-07-31xml: fix missing include strings.h, for strncasecmpHiltjo Posthuma
2015-07-29improve includes (dont include headers in .h), fix build on LinuxHiltjo Posthuma
2015-07-28improve code-style and consistencyHiltjo Posthuma
2015-06-23xml: fix comment issue, improve cdata and comment while encountering separatorHiltjo Posthuma
2015-06-22xml: fix cdata issueHiltjo Posthuma
2015-06-21separate xml specific code into xml.cHiltjo Posthuma
2015-06-21xml.c: fix empty cdata callbackHiltjo Posthuma
2015-05-16xml: only call data handler if setHiltjo Posthuma
2015-05-16xml: call parseHiltjo Posthuma
2015-05-16xml: attrentity handler will be called if setHiltjo Posthuma
it used to be if attrentity is NULL it would call attrdata.
2015-05-16xml: allow to read from fd or string bufferHiltjo Posthuma
+ minor code style.
2014-11-17code-style, ugly test-code (remove later)Hiltjo Posthuma
2014-11-11comment styleHiltjo Posthuma
2014-11-11fix typo in man pageHiltjo Posthuma
2014-06-28xml: fix attribute without data:Hiltjo Posthuma
<input checked /> Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-06-27small fixesHiltjo Posthuma
reorder static -> public xml functions. Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-05-08style: linewrap, etcHiltjo Posthuma
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>