author | Hiltjo Posthuma <hiltjo@codemadness.org> | 2020-03-09 19:16:52 +0100 |
---|---|---|
committer | Hiltjo Posthuma <hiltjo@codemadness.org> | 2020-03-10 23:40:17 +0100 |
commit | 991008dc460854b5f2f978a87946f9d90e3e5ee5 (patch) | |
tree | 45a2295bb25d5f13834f924523b127180db81cf4 | |
parent | e3461920b100b12eaaca9664ce161519966b58a9 (diff) |
sfeed_plain: optimize utf8-decoding and column position calculation
Optimize for the common case: ASCII input.
The input is assumed to be valid UTF-8 input (output of sfeed).
This saves 2 function calls for determining the width of a single ASCII
character, which of course is 1.
Ranges:
< 32 are control characters and are skipped.
32 to 126 is printable ASCII and is 1 column wide.
>= 127 takes the normal path (DEL control character and multi-byte UTF-8).
Tested on OpenBSD and Linux with various compilers (clang, gcc, pcc and tcc).
Large improvement on OpenBSD and on Linux with glibc. (Almost) no change on Linux with musl.
In a common case up to 40% performance improvement.
In the worst case negligible performance degradation (<1%).
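The byte ranges described above can be shown in isolation with a minimal sketch. It is not code from sfeed: `utf8_columns` is a hypothetical helper that only counts columns, using the same fast path as the patch, and leaves out the padding and truncation done by `printutf8pad`. It assumes valid UTF-8 input, and that the caller has already selected a UTF-8 locale (for example with `setlocale(LC_CTYPE, "")`) before passing non-ASCII text.

```c
#define _XOPEN_SOURCE 700 /* for wcwidth() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of sfeed): return the number of terminal
 * columns needed to display s, skipping non-printable characters.
 * Assumes s is valid UTF-8 and the locale is already set by the caller.
 */
size_t
utf8_columns(const char *s)
{
	wchar_t wc;
	size_t col = 0, i, slen;
	int rl, w;

	assert(s != NULL);
	mbtowc(NULL, NULL, 0); /* reset the conversion state */
	slen = strlen(s);
	for (i = 0; i < slen; i += rl) {
		rl = w = 1; /* ASCII default: 1 byte, 1 column */
		if ((unsigned char)s[i] < 32)
			continue; /* control character: takes no columns */
		if ((unsigned char)s[i] >= 127) {
			/* normal path: DEL or a multi-byte sequence */
			rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4);
			if (rl <= 0)
				break; /* invalid or incomplete sequence */
			if ((w = wcwidth(wc)) == -1)
				continue; /* non-printable character */
		}
		col += w;
	}
	return col;
}
```

For plain ASCII input such as `"hello"` the loop never calls `mbtowc` or `wcwidth`, which are the two calls per character the commit message says it saves. Only bytes >= 127 pay for the locale-aware calls.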
-rw-r--r-- | util.c | 11 |
1 files changed, 8 insertions, 3 deletions
@@ -247,10 +247,15 @@ printutf8pad(FILE *fp, const char *s, size_t len, int pad)
 	slen = strlen(s);
 	for (i = 0; i < slen; i += rl) {
-		if ((rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4)) <= 0)
-			break;
-		if ((w = wcwidth(wc)) == -1)
+		rl = w = 1;
+		if ((unsigned char)s[i] < 32)
 			continue;
+		if ((unsigned char)s[i] >= 127) {
+			if ((rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4)) <= 0)
+				break;
+			if ((w = wcwidth(wc)) == -1)
+				continue;
+		}
 		if (col + w > len || (col + w == len && s[i + rl])) {
 			fputs("\xe2\x80\xa6", fp);
 			col++;