path: root/xml.c
Age  Commit message  Author
2018-12-02  XML tag parse improvements for PI and end tags  Hiltjo Posthuma
- Stricter parsing of tags, no whitespace stripping after <.
- For end tags the "internal" context x->tag would be "/sometag". Make sure this matches exactly with the parameter tag.
- Reset tagname after parsing an end tag.
- Make end tag handling more consistent.
- Remove temporary variable taglen.
2018-08-26  xml: use ANSI types and struct initialization  Hiltjo Posthuma
long is at least 32 bits; codepointtoutf8() works with >= 32-bit types. Valid codepoint ranges are not larger than this. unsigned char is not needed because converted unicode bytes don't use this range. Tested all valid codepoints and output on amd64, i386 and SPARC64.
2018-08-23  xml: remove TODO comments and add a note  Hiltjo Posthuma
2018-08-22  xml: improve parsing of invalid attribute values separated by whitespace  Hiltjo Posthuma
It is invalid XML, but this allows parsing old HTML pages as well. For example:
<input id=cb checked type="checkbox" title='checkbox' />
or
<FONT FACE=wingdings SIZE=12><BLINK>oh hai</BLINK></FONT>
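The lenient attribute handling described in this entry can be illustrated with a small sketch. It is not the code from xml.c: the function name attr_value and its signature are hypothetical, and it only shows one plausible way to accept both quoted values and unquoted values that end at whitespace or '>'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of xml.c): copy the attribute value that
 * starts at s into out (outsiz must be > 0) and return a pointer just past
 * the value. Quoted values end at the matching quote; unquoted values end
 * at whitespace or '>'. Values longer than the buffer are truncated. */
static const char *
attr_value(const char *s, char *out, size_t outsiz)
{
	size_t n = 0;
	char q = 0;

	if (*s == '"' || *s == '\'')
		q = *s++; /* remember which quote opened the value */
	for (; *s; s++) {
		if (q ? *s == q : (*s == '>' || strchr(" \t\r\n", *s) != NULL))
			break;
		if (n + 1 < outsiz)
			out[n++] = *s;
	}
	out[n] = '\0';
	if (q && *s == q)
		s++; /* skip the closing quote */
	return s;
}
```

Called on the text after `id=` in the first example above, this sketch would yield `cb` and leave the caller positioned at the whitespace before `checked`. Note that an unquoted value directly followed by `/>` would include the `/` in this simplified version.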
2018-08-22  xml: improve handling of invalid long data entities  Hiltjo Posthuma
This also fixes an issue with truncating and missing data on invalid input.
2018-08-21  xml: rewrite codepointtoutf8 function  Hiltjo Posthuma
No more converting to a uint32_t type. Just convert to a byte buffer. Tested on little- and big-endian. Hopefully the code is clearer too.
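For readers unfamiliar with the technique, writing a codepoint straight into a byte buffer can be sketched as follows. This is an illustration only, not the codepointtoutf8() from xml.c: the name utf8_encode, its signature, and the decision to reject surrogates are assumptions. Because each byte is computed with shifts and masks and stored individually, the result does not depend on host endianness.

```c
#include <stddef.h>

/* Hypothetical sketch (not the xml.c implementation): encode codepoint cp
 * as UTF-8 into buf, which must hold at least 4 bytes. Returns the number
 * of bytes written, or 0 if cp is outside the valid Unicode range or is a
 * surrogate. long is used because it is at least 32 bits wide. */
static size_t
utf8_encode(long cp, char *buf)
{
	if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		return 0;
	if (cp < 0x80) { /* 1 byte: 0xxxxxxx */
		buf[0] = (char)cp;
		return 1;
	}
	if (cp < 0x800) { /* 2 bytes: 110xxxxx 10xxxxxx */
		buf[0] = (char)(0xC0 | (cp >> 6));
		buf[1] = (char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) { /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
		buf[0] = (char)(0xE0 | (cp >> 12));
		buf[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
		buf[2] = (char)(0x80 | (cp & 0x3F));
		return 3;
	}
	/* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
	buf[0] = (char)(0xF0 | (cp >> 18));
	buf[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
	buf[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
	buf[3] = (char)(0x80 | (cp & 0x3F));
	return 4;
}
```

A caller handling a numeric entity such as `&#8364;` would pass 0x20AC and get back a length of 3; a return value of 0 signals that the codepoint should not be emitted.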
2018-08-21  xml: don't reset internal tagname when parsing non-tag types like CDATA  Hiltjo Posthuma
... this affects "tags" starting with < such as CDATA and processing instructions.
2018-08-21  xml: fix missing first byte when parsing a long incorrect attribute entity  Hiltjo Posthuma
... the entity had to be invalid (start with &) and longer than the buffer size. + tiny style fix.
2018-08-21  xml: interface change: make some functions private  Hiltjo Posthuma
... this does not expose the uint* types either.
2018-08-21  xml: increase allowed size of attribute names  Hiltjo Posthuma
2018-08-16  XML parser: numeric entity: check unicode codepoint range  Hiltjo Posthuma
2018-03-11  include <sys/types.h> for types size_t, ssize_t etc  Hiltjo Posthuma
This makes sure xml.c in particular can be compiled without further feature macros.
2018-03-11  xml: improve comment parsing  Hiltjo Posthuma
note that ---> is officially invalid XML, but we allow it anyway.
2018-03-11  xml: fix parsing of cdata when a handler is unset  Hiltjo Posthuma
2018-03-11  xml: improve CDATA parsing  Hiltjo Posthuma
thanks Svyatoslav Mishyn for the feedback!
2017-12-24  xml: make name entities static, minor clarifications  Hiltjo Posthuma
2016-04-10  xml: stricter check of entity: must end with ';', ...  Hiltjo Posthuma
... zero output buffer if codepoint length is 0
2015-08-22  xml: fix includes  Hiltjo Posthuma
2015-08-22  xml: simplify XML reader  Hiltjo Posthuma
2015-08-16  xml: change xml_parse_string to xml_parse_buf  Hiltjo Posthuma
In the parser itself allow reading '\0' in the XML itself. Add a length parameter to specify the buffer size.
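Why a length parameter matters can be seen in a minimal sketch of a buffer reader. The struct and function names here (bufreader, bufreader_getnext) are hypothetical and are not taken from xml.c; the point is only that the end of input is decided by the length, so a '\0' byte is returned as ordinary data rather than ending the parse.

```c
#include <stddef.h>
#include <stdio.h> /* EOF */

/* Hypothetical reader context (not from xml.c): a buffer plus an explicit
 * length and the current read offset. */
struct bufreader {
	const char *buf;
	size_t len;
	size_t off;
};

/* Return the next byte as an unsigned value, or EOF once len bytes have
 * been consumed. A '\0' inside the buffer is returned like any other byte. */
static int
bufreader_getnext(struct bufreader *r)
{
	if (r->off >= r->len)
		return EOF;
	return (unsigned char)r->buf[r->off++];
}
```

With a three-byte buffer holding 'a', '\0', 'b', successive calls return 'a', 0, 'b', and then EOF. A string-based interface relying on the terminator would have stopped after the first byte.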
2015-08-14  minor code-style improvements  Hiltjo Posthuma
2015-08-14  xml: whoops, remove leftover xml_getnext_stdin  Hiltjo Posthuma
2015-08-14  xml: separate reader context from parser  Hiltjo Posthuma
also:
- rename xmlparser_ prefix to xml_.
- make xml_parse public, this allows a custom reader like a direct mmap, see: XMLParser.getnext and (optionally) XMLParser.getnext_data.
- improve the README text.
2015-08-08  xml: move entity to namedentitystr()  Hiltjo Posthuma
2015-08-06  xml: remove forced __inline__ attribute  Hiltjo Posthuma
2015-08-06  general cleanups  Hiltjo Posthuma
2015-08-01  xml: only allow full uppercase or full lowercase for entities  Hiltjo Posthuma
2015-07-31  xml: fix xml_namedentitytostr loop  Hiltjo Posthuma
2015-07-31  xml: fix missing include strings.h, for strncasecmp  Hiltjo Posthuma
2015-07-29  improve includes (don't include headers in .h), fix build on Linux  Hiltjo Posthuma
2015-07-28  improve code-style and consistency  Hiltjo Posthuma
2015-06-23  xml: fix comment issue, improve cdata and comment while encountering separator  Hiltjo Posthuma
2015-06-22  xml: fix cdata issue  Hiltjo Posthuma
2015-06-21  separate xml specific code into xml.c  Hiltjo Posthuma
2015-06-21  xml.c: fix empty cdata callback  Hiltjo Posthuma
2015-05-16  xml: only call data handler if set  Hiltjo Posthuma
2015-05-16  xml: call parse  Hiltjo Posthuma
2015-05-16  xml: attrentity handler will be called if set  Hiltjo Posthuma
Previously, if attrentity was NULL, attrdata was called instead.
2015-05-16  xml: allow to read from fd or string buffer  Hiltjo Posthuma
+ minor code style.
2014-11-17  code-style, ugly test-code (remove later)  Hiltjo Posthuma
2014-11-11  comment style  Hiltjo Posthuma
2014-11-11  fix typo in man page  Hiltjo Posthuma
2014-06-28  xml: fix attribute without data:  Hiltjo Posthuma
<input checked />
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-06-27  small fixes  Hiltjo Posthuma
reorder static -> public xml functions.
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-05-08  style: linewrap, etc  Hiltjo Posthuma
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-04-02  xml: fix cdata parsing, disable markup declaration parsing for now  Hiltjo Posthuma
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-04-02  Makefile: add sfeed_web, use c99 for build  Hiltjo Posthuma
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-04-01  fix crlf newlines, add fp arg to xmlparser_init  Hiltjo Posthuma
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-03-31  new version  Hiltjo Posthuma
lots of things changed, but cleanup todo. changelog and consistent stream of small updates will come in the future.
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2013-05-20  update xml parser, many optimizations and dos to unix newlines, much cleanup todo though  Hiltjo Posthuma
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>