path: root/sfeed_web.c
Age | Commit message | Author
2022-03-28 | compatibility: replace iscntrl with own ISCNTRL macro | Hiltjo Posthuma
It is unspecified whether iscntrl() in the C locale is compatible with ASCII. Noticed when testing on OpenBSD 3.8, which uses extended ASCII and also treats the C1 range as control characters. This breaks UTF-8 support.

Reference: https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C1_control_codes_for_general_use (C1 table).

Force our own definition of an ASCII-compatible control-character range, since sfeed expects its input to be UTF-8 (or converted with iconv) and so its output to be UTF-8 as well.
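A minimal sketch of what a locale-independent ASCII control-character macro can look like, assuming the control range is C0 (0x00-0x1f) plus DEL (0x7f). The print_stripped() helper is hypothetical and only shows the macro in use; it is not taken from sfeed.

```c
#include <stdio.h>

/* ASCII control characters: C0 range 0x00-0x1f plus DEL (0x7f).
 * Bytes >= 0x80 (UTF-8 lead/continuation bytes, and what some locales
 * treat as the C1 range) are deliberately NOT control characters. */
#define ISCNTRL(c) ((c) < ' ' || (c) == 0x7f)

/* Hypothetical helper: copy s to fp, skipping control characters.
 * The cast to unsigned char matters: where char is signed, a UTF-8 byte
 * such as 0xc3 would otherwise be negative and compare as "< ' '". */
static void
print_stripped(FILE *fp, const char *s)
{
	for (; *s; s++)
		if (!ISCNTRL((unsigned char)*s))
			fputc(*s, fp);
}
```

Because the macro only compares against fixed ASCII values, its result does not depend on setlocale() or on how a particular libc classifies bytes above 0x7f.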
2022-03-15 | stricter error checking in file streams (input, output) | Hiltjo Posthuma
This also makes the programs exit with a non-zero status when a read or write error occurs, which makes checking the exit status in scripts more reliable.

A simple example to simulate a disk with no space left:

	curl -s 'https://codemadness.org/atom.xml' | sfeed > f
	/mnt/test: write failed, file system is full
	echo $?
	0

Which now produces:

	curl -s 'https://codemadness.org/atom.xml' | sfeed > f
	/mnt/test: write failed, file system is full
	write error: <stdout>
	echo $?
	1

Tested with a small mfs on OpenBSD, fstab entry:

	swap /mnt/test mfs rw,nodev,nosuid,-s=1M 0 0
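A minimal sketch of the kind of check that produces the "write error: <stdout>" message and a non-zero exit status. The helper name stream_error() and its signature are hypothetical, not sfeed's own API; the point is that fflush() surfaces errors still sitting in the buffer (such as a full disk) and ferror() catches errors from earlier reads or writes.

```c
#include <stdio.h>

/* Hypothetical helper: report whether a read or write error occurred on fp.
 * For output streams, flush first so buffered data is actually written and
 * any error (e.g. ENOSPC) is detected now rather than silently at exit.
 * Returns 1 on error (after printing a message), 0 otherwise. */
static int
stream_error(FILE *fp, const char *name, int writing)
{
	if ((writing && fflush(fp) == EOF) || ferror(fp)) {
		fprintf(stderr, "%s error: %s\n", writing ? "write" : "read", name);
		return 1;
	}
	return 0;
}
```

A program could then end with something like `return stream_error(stdin, "<stdin>", 0) | stream_error(stdout, "<stdout>", 1);` so that the exit status reflects any I/O failure.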
2021-06-01 | portability and standards: add BSD-like err() and errx() functions | Hiltjo Posthuma
These are BSD functions.

- HaikuOS now compiles without having to use libbsd.
- Tested on SerenityOS (for fun), which doesn't have these functions (yet). With a small change to support wcwidth(), sfeed works on SerenityOS.
2021-03-01 | util: improve/refactor URI parsing and formatting | Hiltjo Posthuma
Removed/rewrote the functions absuri, parseuri and encodeuri() (for percent-encoding).

The functionality is now split into separate functions with the following purposes:

- uri_format: format a struct uri into a string.
- uri_hasscheme: quick check whether a string is absolute or not.
- uri_makeabs: make a URI absolute using a base URI and the original URI.
- uri_parse: parse a string into a struct uri.

The following URLs are now parsed better:

- URLs with extra "/"s prepended in the path are kept as is; no "/" is added for empty paths either.
- URLs like "http://codemadness.org" are not changed to "http://codemadness.org/" anymore (paths are kept as is, unless they are non-empty and do not start with "/").
- Paths are not percent-encoded anymore.
- URLs with a userinfo field (username, password) are parsed, like:
  ftp://user:password@[2001:db8::7]:2121/rfc/rfc1808.txt
- Non-authoritative URLs like mailto:some@email.org, magnet URIs, and ISBN URIs/URNs like urn:isbn:0-395-36341-1 are allowed and parsed correctly.
- Both local (file:///) and non-local (file://) are supported.
- A base URL with a port is now only used when the relative URL has no host and port set, following RFC 3986 section 5.2.2 more closely.
- Numeric port parsing: parse as a signed long and check for values <= 0; an empty port is allowed.
- URIs containing a query or fragment but no path separator (/) now have these components parsed properly.

For sfeed:

- Parse the base URI only once (no need to do it every time when making absolute URIs).
- If a link/enclosure is already absolute, or if no base URL is specified, just print the link directly.

There have also been other small performance improvements related to handling URIs.

References:

- https://tools.ietf.org/html/rfc3986
- Section "5.2.2. Transform References" has also been helpful.
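A minimal sketch of a quick "is this URI absolute?" test of the sort uri_hasscheme is described as doing, assuming it follows the RFC 3986 scheme grammar: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":". The function has_scheme() and the IS_ALPHA/IS_DIGIT macros are hypothetical; this is not sfeed's actual implementation.

```c
/* ASCII-only character classes, independent of the current locale. */
#define IS_ALPHA(c) (((c) >= 'a' && (c) <= 'z') || ((c) >= 'A' && (c) <= 'Z'))
#define IS_DIGIT(c) ((c) >= '0' && (c) <= '9')

/* Hypothetical helper: return 1 if s starts with an RFC 3986 scheme
 * followed by ':', meaning it is absolute and needs no base URL;
 * return 0 otherwise (relative paths, "//host" references, queries). */
static int
has_scheme(const char *s)
{
	const char *p = s;

	if (!IS_ALPHA(*p))
		return 0;
	for (p++; IS_ALPHA(*p) || IS_DIGIT(*p) || *p == '+' || *p == '-' || *p == '.'; p++)
		;
	return *p == ':';
}
```

A check like this is cheap enough to run on every link: only when it returns 0 and a base URL is known does the slower parse-and-resolve path (uri_makeabs in the commit's terms) need to run.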
2020-10-31 | sfeed_web: improve parsing a <link> if it has no type attribute | Hiltjo Posthuma
This happened because the previous link type was not reset when a <link> tag started again; it was only reset when a type attribute started.

Found on the Spanish newspaper site elpais.com.

Input:

<link rel="alternate" href="https://feeds.elpais.com/mrss-s/pages/ep/site/elpais.com/portada" type="application/rss+xml" title="RSS de la portada de El País"/>
<link rel="canonical" href="https://elpais.com"/>

Would print (the second line is incorrect):

https://feeds.elpais.com/mrss-s/pages/ep/site/elpais.com/portada application/rss+xml
https://elpais.com/ application/rss+xml

Now prints:

https://feeds.elpais.com/mrss-s/pages/ep/site/elpais.com/portada application/rss+xml

Fix: also reset it at the start of a <link> tag in this case (for <base href /> this is still not wanted).
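A minimal sketch of the reset described in the 2020-10-31 entry, assuming a callback-style parser that reports tag starts, attributes and tag ends separately. struct linkstate, tag_start(), tag_attr() and tag_end_is_feed() are hypothetical names, and only application/rss+xml counts as a feed type here to keep it short.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-tag state; not sfeed_web's actual variables. */
struct linkstate {
	char href[256];
	char type[64];
	int is_link; /* current tag is <link> */
};

/* Called when a tag opens. The fix: clear the previous href and type when
 * a new <link> starts, so a <link> without a type attribute cannot inherit
 * the type of the previous one. */
static void
tag_start(struct linkstate *st, const char *tag)
{
	st->is_link = strcmp(tag, "link") == 0;
	if (st->is_link) {
		st->href[0] = '\0';
		st->type[0] = '\0';
	}
}

/* Called for each attribute; values are truncated safely by snprintf(). */
static void
tag_attr(struct linkstate *st, const char *name, const char *value)
{
	if (!st->is_link)
		return;
	if (strcmp(name, "href") == 0)
		snprintf(st->href, sizeof(st->href), "%s", value);
	else if (strcmp(name, "type") == 0)
		snprintf(st->type, sizeof(st->type), "%s", value);
}

/* Called when the tag ends: return 1 if this <link> looks like a feed. */
static int
tag_end_is_feed(const struct linkstate *st)
{
	return st->is_link && strcmp(st->type, "application/rss+xml") == 0;
}
```

With the reset in tag_start(), the second (canonical) link from the elpais.com input no longer reports application/rss+xml.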
2020-10-22 | sfeed_web: whoops, fix bug mentioned in the previous commit | Hiltjo Posthuma
(ascii.jp)
2020-10-22 | sfeed_web: attribute parsing improvements, improve man page | Hiltjo Posthuma
Fix attribute parsing and now decode entities. The following now works (from helsinkitimes.fi):

<base href="https://www.helsinkitimes.fi/" />
<link href="/?format=feed&amp;type=rss" rel="alternate" type="application/rss+xml" title="RSS 2.0" />
<link href="/?format=feed&amp;type=atom" rel="alternate" type="application/atom+xml" title="Atom 1.0" />

Properly associate attributes with the actual tag; this now parses properly (from ascii.jp):

<link rel="apple-touch-icon-precomposed" href="/img/apple-touch-icon.png" />
<link rel="alternate" type="application/rss+xml" />
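A minimal sketch of entity decoding for attribute values like the helsinkitimes.fi href, assuming only the five predefined XML entities need handling; unknown entities are copied through unchanged. decode_entities() is a hypothetical helper, not sfeed's xml.c, which may handle more (for example numeric character references).

```c
#include <string.h>

/* Hypothetical helper: decode &amp; &lt; &gt; &quot; &apos; in place.
 * Decoding never makes the string longer, so writing through "out"
 * behind the read position "s" is safe. */
static void
decode_entities(char *s)
{
	static const struct { const char *name; char c; } ents[] = {
		{ "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
		{ "&quot;", '"' }, { "&apos;", '\'' },
	};
	const size_t nents = sizeof(ents) / sizeof(ents[0]);
	char *out = s;
	size_t i;

	while (*s) {
		if (*s == '&') {
			for (i = 0; i < nents; i++) {
				size_t len = strlen(ents[i].name);
				if (strncmp(s, ents[i].name, len) == 0) {
					*out++ = ents[i].c;
					s += len;
					break;
				}
			}
			if (i < nents)
				continue; /* an entity was decoded */
		}
		*out++ = *s++; /* ordinary byte or unknown entity */
	}
	*out = '\0';
}
```

Decoding before URI handling matters here: without it, "/?format=feed&amp;type=rss" would be resolved against the base URL with the literal "&amp;" left in the query.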
2020-10-21 | sfeed_web: reset feedlink buffer | Hiltjo Posthuma
Noticed strange output on the site ascii.jp. The site HTML contained:

<link rel="apple-touch-icon-precomposed" href="/img/apple-touch-icon.png" />
<link rel="alternate" type="application/rss+xml" />

This would print:

"/img/apple-touch-icon.png application/rss+xml"

Now it prints:

" application/rss+xml"
2020-03-11 | sfeed_web: fix exit status code | Hiltjo Posthuma
- Fix a theoretical issue where "found" can overflow and cause a zero exit status to be returned when many feeds are found.
- Finding no RSS/Atom feeds is not an error, so return 0.
- Style: change unsigned int to int.
2020-01-24 | cleanup some includes | Hiltjo Posthuma
2020-01-18 | sfeed_web: remove unneeded optimization | Hiltjo Posthuma
2019-04-06 | optimization: define GETNEXT as an inline macro | Hiltjo Posthuma
This removes much of the function-call overhead. getnext is defined in xml.h for inline optimization. sfeed only uses one XML parser context per program, which also allows further compiler optimizations.

On OpenBSD the overhead was noticeable because of retpoline and similar function-call costs. Using clang and a 500MB test XML file, processing time dropped from roughly 12s to 5s.

Tested using some crazy optimization flags:

	SFEED_CFLAGS = -O3 -std=c99 -DGETNEXT=getchar_unlocked -fno-ret-protector \
		-mno-retpoline -static

A GETNEXT macro is also nice for programs which mmap(2) some big XML file. Then you can simply define:

	#define GETNEXT() (off >= len ? EOF : reg[off++])
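A minimal sketch of the mmap-style use of GETNEXT from the 2019-04-06 entry. Assumptions: reg, len and off are hypothetical globals describing an in-memory buffer (which could come from mmap(2)), and reg is declared as unsigned char so that a 0xFF byte is returned as 255 rather than a negative value that could compare equal to EOF. count_tags() is a toy consumer standing in for a real parser loop.

```c
#include <stdio.h>  /* EOF */
#include <stddef.h>

/* Hypothetical buffer state for the macro below. */
static const unsigned char *reg;
static size_t len, off;

/* The macro from the commit message: one byte per call, no function call. */
#define GETNEXT() (off >= len ? EOF : reg[off++])

/* Toy consumer: count '<' characters, reading only through GETNEXT(). */
static size_t
count_tags(const unsigned char *buf, size_t buflen)
{
	size_t n = 0;
	int c;

	reg = buf;
	len = buflen;
	off = 0;
	while ((c = GETNEXT()) != EOF)
		if (c == '<')
			n++;
	return n;
}
```

Because the macro expands inline, the compiler can keep off and len in registers across the loop, which is where the saving over an indirect getnext() call comes from.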
2019-02-08 | shorten some callback variable names, change "name" to "t" (tag) | Hiltjo Posthuma
2018-11-09 | minor white-space style fix | Hiltjo Posthuma
2018-09-07 | fix many cases of undefined behaviour in usage of ctype functions | Hiltjo Posthuma
- Cast all ctype(3) function arguments to (unsigned char) to avoid UB.
  POSIX says:
  "The c argument is an int, the value of which the application shall ensure is a character representable as an unsigned char or equal to the value of the macro EOF. If the argument has any other value, the behavior is undefined."
  Many libcs cast the value implicitly, but NetBSD does not, which is probably the correct interpretation.
- No need to cast for putchar; rename some fputc(..., stdout) calls to putchar.
  POSIX says:
  "The fputc() function shall write the byte specified by c (converted to an unsigned char) to the output stream pointed to by stream [...]"

Major thanks to Leonardo Taccari <iamleot@gmail.com> for reporting and testing it on NetBSD!
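A minimal sketch of the cast pattern from the 2018-09-07 entry. The trim() function is hypothetical (it is not sfeed's code); it exists only to show where the (unsigned char) cast goes when passing a plain char to isspace().

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: trim leading and trailing whitespace in place and
 * return a pointer to the first non-space character. Each char is cast to
 * unsigned char before isspace(): where char is signed, bytes >= 0x80
 * (e.g. UTF-8 text) would otherwise be negative, which is undefined
 * behaviour for the ctype(3) functions. */
static char *
trim(char *s)
{
	char *end;

	while (isspace((unsigned char)*s))
		s++;
	end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1]))
		end--;
	*end = '\0';
	return s;
}
```

No such cast is needed for putchar() or fputc(), since, as the POSIX text quoted above says, those functions convert their int argument to unsigned char themselves.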
2018-08-22 | remove stdint.h include | Hiltjo Posthuma
The uint* types in XML are not exposed anymore.
2018-03-11 | include <sys/types.h> for types size_t, ssize_t etc | Hiltjo Posthuma
This makes sure xml.c in particular can be compiled without further feature macros.
2017-12-24 | sfeed_web: print relative url now directly if no base url specified | Hiltjo Posthuma
2017-04-27 | simplify pledge stub | Hiltjo Posthuma
2016-08-06 | add USE_PLEDGE, remove pledge dummy function | Hiltjo Posthuma
2016-04-10 | absuri, encodeuri: make encodeuri static, change argument order | Hiltjo Posthuma
2016-03-10 | remove cast of unused variables | Hiltjo Posthuma
2016-02-27 | various improvements | Hiltjo Posthuma
- pledge tools and add a define to enable it on platforms that support it, currently only OpenBSD 5.9+.
- separate getline and parseline functionality.
- use murmur3 hash instead of jenkins1: faster and fewer collisions.
- make some error messages a bit clearer, for example with path truncation.
- some small cleanups, move printutf8pad to util.
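For reference, a sketch of the standard 32-bit MurmurHash3 (x86_32 variant, Austin Appleby's public-domain algorithm) mentioned in the 2016-02-27 entry. This is the generic algorithm, not code taken from sfeed; which variant, seed and inputs sfeed actually uses are not stated here.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t
rotl32(uint32_t x, int r)
{
	return (x << r) | (x >> (32 - r));
}

/* MurmurHash3 x86_32. Blocks are read byte by byte in little-endian order,
 * so the result does not depend on host endianness or pointer alignment. */
static uint32_t
murmur3_32(const void *data, size_t len, uint32_t seed)
{
	const unsigned char *p = data;
	const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;
	uint32_t h = seed, k;
	size_t i, nblocks = len / 4;

	for (i = 0; i < nblocks; i++, p += 4) {
		k = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
		    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
		k *= c1;
		k = rotl32(k, 15);
		k *= c2;
		h ^= k;
		h = rotl32(h, 13);
		h = h * 5 + 0xe6546b64;
	}

	/* tail: the remaining 0-3 bytes (p now points at them) */
	k = 0;
	switch (len & 3) {
	case 3: k ^= (uint32_t)p[2] << 16; /* fallthrough */
	case 2: k ^= (uint32_t)p[1] << 8;  /* fallthrough */
	case 1: k ^= (uint32_t)p[0];
		k *= c1;
		k = rotl32(k, 15);
		k *= c2;
		h ^= k;
	}

	/* finalization mix (fmix32) */
	h ^= (uint32_t)len;
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}
```

Compared with Jenkins one-at-a-time, which processes a single byte per round, this consumes four bytes per round and ends with a strong avalanche step, which is consistent with the "faster and fewer collisions" note.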
2015-10-04 | portability: don't use HOST_NAME_MAX, just use 256 as maximum | Hiltjo Posthuma
2015-08-22 | xml: simplify XML reader | Hiltjo Posthuma
2015-08-22 | use HOST_NAME_MAX for hostname | Hiltjo Posthuma
2015-08-16 | code-style + no need to zero static variables | Hiltjo Posthuma
2015-08-16 | code-style, wrap some lines, etc | Hiltjo Posthuma
2015-08-14 | xml: separate reader context from parser | Hiltjo Posthuma
Also:
- rename xmlparser_ prefix to xml_.
- make xml_parse public; this allows a custom reader like a direct mmap, see: XMLParser.getnext and (optionally) XMLParser.getnext_data.
- improve the README text.
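A minimal sketch of the "custom reader" idea from the 2015-08-14 entry: the parser pulls bytes through a callback plus an opaque data pointer, so the same loop can read from a FILE * or from memory. struct reader, struct membuf and the functions below are hypothetical names that only mirror the getnext/getnext_data fields named in that entry; they are not the real xml.h definitions.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical reader context: callback plus opaque data. */
struct reader {
	int (*getnext)(void *data);
	void *getnext_data;
};

/* In-memory source, e.g. a region mapped with mmap(2). */
struct membuf {
	const unsigned char *buf;
	size_t len, off;
};

static int
membuf_getnext(void *data)
{
	struct membuf *m = data;

	return m->off >= m->len ? EOF : m->buf[m->off++];
}

static int
file_getnext(void *data)
{
	return fgetc((FILE *)data);
}

/* Toy "parser": count newlines, reading only through the reader. */
static size_t
count_lines(struct reader *r)
{
	size_t n = 0;
	int c;

	while ((c = r->getnext(r->getnext_data)) != EOF)
		if (c == '\n')
			n++;
	return n;
}
```

The trade-off is one indirect call per byte; the later GETNEXT macro (2019-04-06 entry) removes that call when a program only needs a single input source.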
2015-08-05 | sfeed_web: separate by tab, url<tab>contenttype, simplify | Hiltjo Posthuma
2015-07-29 | improve includes (don't include headers in .h), fix build on Linux | Hiltjo Posthuma
2015-07-28 | improve code-style and consistency | Hiltjo Posthuma
2015-07-28 | use new uri parser | Hiltjo Posthuma
2015-06-21 | improvements | Hiltjo Posthuma
2015-05-16 | xml: adjust for API change: read from fd | Hiltjo Posthuma
2015-01-02 | trim string | Hiltjo Posthuma
2015-01-02 | cleanup | Hiltjo Posthuma
- don't free at the end (not needed in our case).
- use 0 and 1 instead of EXIT_SUCCESS, EXIT_FAILURE.
- use err (from err.h) instead of custom die().
2014-11-11 | sfeed_web: just assume rss/atom for application/xml too | Hiltjo Posthuma
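A minimal sketch combining two sfeed_web entries: accepting application/xml alongside the RSS and Atom types (2014-11-11), and printing each result as url<TAB>contenttype (2015-08-05). is_feed_type() and print_feed() are hypothetical helpers, and the exact, case-sensitive string matching is an assumption; sfeed_web's real rules may differ.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical check: RSS and Atom types count as feeds, and plain
 * application/xml is simply assumed to be RSS/Atom too. */
static int
is_feed_type(const char *type)
{
	return strcmp(type, "application/rss+xml") == 0 ||
	       strcmp(type, "application/atom+xml") == 0 ||
	       strcmp(type, "application/xml") == 0;
}

/* Print one result as url<TAB>contenttype<NEWLINE>.
 * Returns 1 if a line was printed, 0 if the type was not a feed type. */
static int
print_feed(FILE *fp, const char *url, const char *type)
{
	if (!is_feed_type(type))
		return 0;
	fprintf(fp, "%s\t%s\n", url, type);
	return 1;
}
```

A tab separator keeps the output easy to split with cut(1) or awk -F '\t', since URLs and content types do not normally contain tabs.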
2014-11-11 | code style, use actual column width of char | Hiltjo Posthuma
2014-06-28 | compile with -Wextra, ignore unused parameters | Hiltjo Posthuma
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-06-27 | small fixes | Hiltjo Posthuma
Reorder static -> public xml functions.

Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-04-09 | sfeed_web: show type of feed | Hiltjo Posthuma
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-04-09 | cleanup, remove javascript hotkey | Hiltjo Posthuma
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-04-01 | fix crlf newlines, add fp arg to xmlparser_init | Hiltjo Posthuma
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-03-31 | small cleanup | Hiltjo Posthuma
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-03-31 | new version | Hiltjo Posthuma
Lots of things changed, but cleanup is still to do. A changelog and a consistent stream of small updates will come in the future.

Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>