Age | Commit message | Author |
|
This was a copy-paste error.
|
|
This is not clearly defined by the C99 standard.
Define ctype-like macros to force the behaviour to be ASCII / UTF-8 (not
extended ASCII or something similar, as noticed on OpenBSD 3.8).
(In practice modern libc implementations are all ASCII and UTF-8-compatible.
Otherwise this would break many programs.)
|
|
It is unspecified whether iscntrl in the C locale is compatible with ASCII.
Noticed when testing on OpenBSD 3.8, which uses extended ASCII and also treats
the C1 range as control-characters. This breaks UTF-8 support.
Reference:
https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C1_control_codes_for_general_use
C1 table.
Force our own definition of an ASCII-compatible control-character range, since
sfeed expects input to be UTF-8 (or converted by iconv) and so the output is
UTF-8 as well.
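A minimal sketch of such ASCII-only, locale-independent macros. The macro
names are illustrative and not necessarily the ones used in sfeed; the
argument is assumed to be an unsigned char value.

```c
/* ASCII-only classification macros (illustrative names).
 * Assumption: the argument is an unsigned char value (0..255).
 * Bytes >= 0x80, which includes the C1 range 0x80-0x9f and all UTF-8
 * lead/continuation bytes, are never classified as control characters.
 * Note: the argument is evaluated more than once, so avoid side effects. */
#define ISCNTRL(c) ((c) < ' ' || (c) == 0x7f)
#define ISDIGIT(c) ((c) >= '0' && (c) <= '9')
#define ISALPHA(c) (((c) >= 'a' && (c) <= 'z') || ((c) >= 'A' && (c) <= 'Z'))
#define ISSPACE(c) ((c) == ' ' || ((c) >= '\t' && (c) <= '\r'))
```

With these definitions a byte such as 0x85 (part of a UTF-8 sequence) is
passed through unchanged, while '\n' or 0x7f are still treated as control
characters.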
|
|
Tested with the scc compiler, which is a pure C99 compiler.
sys/types.h is not needed here anymore (it was used for ssize_t).
Side-note: scc can now compile the sfeed parser program!
It requires these changes at the time of writing: Add a strcasecmp and
strncasecmp function and use getchar instead of getchar_unlocked.
|
|
This also makes the programs exit with a non-zero status when a read or write
error occurs.
This makes checking the exit status more reliable in scripts.
A simple example to simulate a disk with no space left. Previously:
    curl -s 'https://codemadness.org/atom.xml' | sfeed > f
    /mnt/test: write failed, file system is full
    echo $?
    0
Which now produces:
    curl -s 'https://codemadness.org/atom.xml' | sfeed > f
    /mnt/test: write failed, file system is full
    write error: <stdout>
    echo $?
    1
Tested with a small mfs on OpenBSD, fstab entry:
    swap /mnt/test mfs rw,nodev,nosuid,-s=1M 0 0
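The kind of check described above could look roughly like the following
sketch. The helper name write_failed is hypothetical and the actual sfeed
code may be structured differently.

```c
#include <stdio.h>

/* Hypothetical helper: flush an output stream and report whether a write
 * error was recorded on it. Returns 0 on success, 1 on error. */
int
write_failed(FILE *fp, const char *name)
{
	/* fflush forces buffered data out, so an error such as a full
	 * file system is detected here instead of being lost at exit. */
	if (fflush(fp) == EOF || ferror(fp)) {
		fprintf(stderr, "write error: %s\n", name);
		return 1;
	}
	return 0;
}
```

A program would call this once before exiting, for example
"return write_failed(stdout, "<stdout>");" at the end of main, so the exit
status reflects the error.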
|
|
This makes at least feeds with simple ASCII work.
|
|
These are BSD functions.
- HaikuOS now compiles without having to use libbsd.
- Tested on SerenityOS (for fun), which doesn't have these functions (yet).
With a small change to support wcwidth() sfeed works on SerenityOS.
|
|
Removed/rewritten the functions:
absuri, parseuri, and encodeuri() for percent-encoding.
The functions are now split separately with the following purpose:
- uri_format: format struct uri into a string.
- uri_hasscheme: quick check if a string is absolute or not.
- uri_makeabs: make a URI absolute using a base uri and the original URI.
- uri_parse: parse a string into a struct uri.
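As an illustration of the quick scheme check, here is a sketch based on the
scheme grammar in RFC 3986 section 3.1. The function name has_scheme is
hypothetical and this is not necessarily how uri_hasscheme is implemented.

```c
#include <stddef.h>

/* Sketch of a scheme check following RFC 3986 section 3.1:
 *   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
 * followed by ":". Returns 1 if s starts with a scheme, otherwise 0. */
int
has_scheme(const char *s)
{
	const char *p = s;

	/* the first character must be a letter */
	if (!((*p >= 'a' && *p <= 'z') || (*p >= 'A' && *p <= 'Z')))
		return 0;
	for (p++; (*p >= 'a' && *p <= 'z') || (*p >= 'A' && *p <= 'Z') ||
	    (*p >= '0' && *p <= '9') || *p == '+' || *p == '-' || *p == '.';
	    p++)
		;
	/* the scheme must be terminated by a colon */
	return *p == ':';
}
```

With this check "http://codemadness.org" and "urn:isbn:0-395-36341-1" count
as absolute, while "/rfc/rfc1808.txt" and "//codemadness.org/" do not.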
The following URLs are better parsed:
- URLs with extra "/"'s in the path prepended are kept as is, no "/" is added
either for empty paths.
- URLs like "http://codemadness.org" are not changed to
"http://codemadness.org/" anymore (paths are kept as is, unless they are
non-empty and do not start with "/").
- Paths are not percent-encoded anymore.
- URLs with userinfo field (username, password) are parsed.
like: ftp://user:password@[2001:db8::7]:2121/rfc/rfc1808.txt
- Non-authoritative URLs like mailto:some@email.org, magnet URIs, ISBN URIs/urn,
like: urn:isbn:0-395-36341-1 are allowed and parsed correctly.
- Both local (file:///) and non-local (file://) are supported.
- Specifying a base URL with a port will now only use it when the relative URL
has no host and port set and follows RFC3986 5.2.2 more closely.
- Parsing numeric port: parse as a signed long and reject values <= 0; an empty
port is allowed.
- Parsing URIs containing query, fragment, but no path separator (/) will now
parse the component properly.
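The port handling mentioned above might be sketched as follows. The helper
name parse_port, the use of -1 for an empty port and the upper bound of
65535 are assumptions made for this example.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: validate a numeric port string.
 * On success returns 0 and stores the port in *out (-1 for an empty
 * port, which is allowed). Returns -1 if the port is invalid. */
int
parse_port(const char *s, long *out)
{
	char *end;
	long l;

	if (*s == '\0') {
		*out = -1; /* empty port: allowed, no port set */
		return 0;
	}
	/* strtol would accept leading whitespace and signs, so require
	 * the string to start with a digit */
	if (*s < '0' || *s > '9')
		return -1;
	errno = 0;
	l = strtol(s, &end, 10);
	/* assumption: ports above 65535 are also rejected */
	if (errno || *end != '\0' || l <= 0 || l > 65535)
		return -1;
	*out = l;
	return 0;
}
```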
For sfeed:
- Parse the baseURI only once (no need to do it every time for making absolute
URIs).
- If a link/enclosure is absolute already or if there is no base URL specified
then just print the link directly. There have also been other small performance
improvements related to handling URIs.
References:
- https://tools.ietf.org/html/rfc3986
- Section "5.2.2. Transform References" has also been helpful.
|
|
sfeed_gopher must be able to write in the current directory, but does not need
write permissions outside it. It could read from any place in the filesystem
(to read feed files).
Prompted by a suggestion from vejetaryenvampir, thanks!
|
|
Fields with multiple values are separated by '|'. In the future multiple
enclosure support might be added.
Category tags are now parsed. This feature is useful for filtering and
categorizing.
Parsing of nested tags such as <author><name> has been improved. This code has
been refactored.
RSS <guid> isPermaLink is now handled differently as well and will now prefer a
permalink with "true" (link) over the ID. In practice multiple <guid> elements
in an item do not occur.
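A consumer of such a multi-value field could split it on '|' roughly like
this. The function name field_value is hypothetical and the sketch assumes
the values themselves do not contain a '|'.

```c
#include <string.h>

/* Hypothetical helper: copy the n-th (0-based) '|'-separated value of
 * field into buf. Returns 1 if the value exists and fits into buf,
 * otherwise 0. */
int
field_value(const char *field, size_t n, char *buf, size_t bufsiz)
{
	const char *end;
	size_t len;

	for (;; field = end + 1) {
		end = strchr(field, '|');
		len = end ? (size_t)(end - field) : strlen(field);
		if (n == 0)
			break;
		if (!end)
			return 0; /* fewer values than requested */
		n--;
	}
	if (len + 1 > bufsiz)
		return 0; /* value does not fit, do not truncate */
	memcpy(buf, field, len);
	buf[len] = '\0';
	return 1;
}
```

For a category field such as "news|tech|c", index 1 yields "tech" and
index 3 reports that no such value exists.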
|
|
- remove a check that has no use/can never happen.
- remove the return value as it's unused and the input size is known.
- fix an old comment that doesn't reflect what the function does anymore.
|
|
This is useful for example for podcasts (audio attachment), newsposts (usually
some image) or comic strips (link to page, image as enclosure).
Thanks leot for the feedback!
|
|
keep sfeed_tail until sfeed is reworked to support tail -f (eventually)
|
|
Remove type of feed per item, it is not that interesting. sfeed(1) can parse
both RSS and Atom feeds.
|
|
- pledge tools and add define to enable it on platforms that support it, currently
only OpenBSD 5.9+
- separate getline and parseline functionality.
- use murmur3 hash instead of jenkins1: faster and fewer collisions.
- make some error messages a bit more clear, for example with path truncation.
- some small cleanups, move printutf8pad to util.
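One way to make pledge optional at compile time is sketched below. The
define name USE_PLEDGE and the helper restrict_process are assumptions for
this example; on platforms without pledge(2) the call is replaced by a no-op
that always succeeds.

```c
#include <stdio.h>

#ifdef USE_PLEDGE
#include <unistd.h> /* pledge(2), OpenBSD 5.9+ */
#else
/* Fallback for platforms without pledge(2): always "succeeds". */
#define pledge(promises, execpromises) 0
#endif

/* Hypothetical helper: restrict the process to stdio operations.
 * Returns 0 on success and -1 on failure. */
int
restrict_process(void)
{
	if (pledge("stdio", NULL) == -1) {
		perror("pledge");
		return -1;
	}
	return 0;
}
```

The define would be passed by the build, for example with -DUSE_PLEDGE in
CFLAGS on OpenBSD, and left out elsewhere.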
|
|
The overhead for OpenBSD is minimal. I will periodically sync from
OpenBSD libc.
|
|
also:
- parse tag media:description for RSS.
- be more strict about using the order of fields, this is more consistent now.
- remove buffer_init: don't allocate buffers on start.
- realloc: be slightly more aggressive when allocating memory: initial buffer size changed from 16 to 64 bytes.
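A sketch of a buffer that is not allocated on start and begins at 64 bytes
on first use. The struct and function names are hypothetical, and doubling
the capacity on growth is an assumption; the text above only states the
initial size.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer. A zero-initialized struct is a
 * valid empty buffer: nothing is allocated until the first append. */
struct strbuf {
	char *data;
	size_t len; /* bytes used, excluding the NUL terminator */
	size_t cap; /* bytes allocated */
};

/* Append n bytes of s. Returns 0 on success, -1 if allocation fails.
 * Note: size overflow checks are omitted to keep the sketch short. */
int
strbuf_append(struct strbuf *b, const char *s, size_t n)
{
	size_t need, cap;
	char *p;

	need = b->len + n + 1; /* + 1 for the NUL terminator */
	if (need > b->cap) {
		cap = b->cap ? b->cap : 64; /* initial size: 64 bytes */
		while (cap < need)
			cap *= 2; /* assumption: grow by doubling */
		if (!(p = realloc(b->data, cap)))
			return -1; /* the old buffer stays valid */
		b->data = p;
		b->cap = cap;
	}
	memcpy(b->data + b->len, s, n);
	b->len += n;
	b->data[b->len] = '\0';
	return 0;
}
```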
|
|
... put specific formatting-logic per program (printcontent()).
|
|
- Only escape characters in "content" field, these can contain newlines.
- Trim newlines and tabs, etc from the title, id and author fields.
- Make decodefield, xmlencode functions easier to "chain" without allocating
new buffers.
- Move printutf8pad from util (only used by sfeed_plain) to sfeed_plain.
- Update README, still need to update the man-page and improve the documentation
in general.
- Code cleanup.
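Escaping the "content" field so it stays on a single line could be sketched
as below. The function name escape_content and the exact set of escaped
characters (newline, tab and backslash) are assumptions for this example.
The caller supplies the output buffer, so no new buffer is allocated.

```c
#include <stddef.h>

/* Hypothetical helper: escape newline, tab and backslash in s and write
 * the NUL-terminated result to buf. Returns 0 on success, -1 if buf is
 * too small. */
int
escape_content(const char *s, char *buf, size_t bufsiz)
{
	size_t i = 0;
	char e;

	for (; *s; s++) {
		switch (*s) {
		case '\n': e = 'n'; break;
		case '\t': e = 't'; break;
		case '\\': e = '\\'; break;
		default:   e = '\0'; break; /* no escaping needed */
		}
		/* keep room for this character and the NUL terminator */
		if (i + (e ? 2 : 1) >= bufsiz)
			return -1;
		if (e)
			buf[i++] = '\\';
		buf[i++] = e ? e : *s;
	}
	if (i >= bufsiz)
		return -1;
	buf[i] = '\0';
	return 0;
}
```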
|
|
- don't print directly but use an internal buffer (also better for testing).
- encode uri when printing (security).
- add some comments.
|
|
- remove xerr and xerrx, assume the OS flushes and closes file descriptors
on process exit.
- move esnprintf, printcontent to util.
|
|
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
|