This sounds interesting. Could you give an example where you rewrote a pipeline ...

ketanmaheshwari · 2025-11-14T20:32:53 1763152373

This pipeline may be significantly reduced by replacing the cut invocations with awk, folding the grep into awk, and using awk's gsub in place of tr.
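A minimal sketch of that kind of reduction. The input records and the function names are made up for illustration and are not the pipeline under discussion:

```shell
# Hypothetical input: colon-separated records whose second field is comma-separated.
input='token:a,b,c:def
other:x,y,z:ghi'

# Pipeline style: grep selects the line, cut extracts the field, tr rewrites it.
with_pipeline() {
  printf '%s\n' "$input" | grep '^token' | cut -d: -f2 | tr ',' ' '
}

# Single awk: the /^token/ pattern stands in for grep, -F: and $2 for cut,
# and gsub for tr (here limited to the second field).
with_awk() {
  printf '%s\n' "$input" | awk -F: '/^token/ { gsub(/,/, " ", $2); print $2 }'
}

with_pipeline   # prints: a b c
with_awk        # prints: a b c
```

Note that gsub replaces regex matches, so it only covers the simple one-character substitutions tr is often used for; range translations like tr a-z A-Z would need toupper or similar instead.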

rbonvall · 2025-11-14T20:44:51 1763153091

Example of replacing grep+cut with a single awk invocation:

    $ echo token:abc:def | grep -E ^token | cut -d: -f2
    abc
    
    $ echo token:abc:def | awk -F: '/^token/ { print $2 }'
    abc

Conditions don't have to be regular expressions. For example:

    $ echo "$CSV"
    foo:24
    bar:15
    baz:49
    
    $ echo "$CSV" | awk -F: '$2 > 20 { print $1 }'
    foo
    baz

dietrichepp · 2025-11-14T20:56:28 1763153788

Somebody wanted to set breakpoints in their C code by marking them with a comment (note “d” for “debugger”):

  //d

You can get a list of them with a single Awk line.

  awk -F'//d[[:space:]]*' 'NF > 1 {print FILENAME ":" FNR " " $2}' source/*.c

You can even create a GDB script, pretty easily.
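A rough sketch of what that could look like, reusing the same field separator. The demo file, the make_gdb_script name, and the breakpoints.gdb file name are all made up for illustration:

```shell
# Create a small hypothetical C file with two //d markers.
dir=$(mktemp -d)
cat > "$dir/demo.c" <<'EOF'
int main(void) {
    int x = 1; //d check x
    x += 2;
    return x; //d before return
}
EOF

# Emit one GDB "break FILE:LINE" command per line that contains a //d marker.
make_gdb_script() {
  awk -F'//d[[:space:]]*' 'NF > 1 { print "break " FILENAME ":" FNR }' "$@"
}

make_gdb_script "$dir"/*.c > "$dir/breakpoints.gdb"
cat "$dir/breakpoints.gdb"
```

Something like gdb -x breakpoints.gdb ./program should then load the breakpoints at startup, assuming the binary was built with debug info.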

(IMO, easier still to configure your editor to support breakpoints, but I’m not the one who chose to do it this way.)

kazinator · 2025-11-14T21:00:30 1763154030

Why are you using the locale-specific [:space:] on source code? In your C source code, are you using spaces other than ASCII 0x20?

Would you have //d<0xA0>rest of comment?

Or some fancy Unicode space made using several UTF-8 bytes?

wtallis · 2025-11-14T21:43:54 1763156634

Tab characters can also be found in source code.

kazinator · 2025-11-14T22:12:30 1763158350

Since you control the //d format, why would you allow or support anything but a space as a separator? The separator is only there to distinguish the marker from a comment like "//delete empty nodes" that is not the //d debug notation.

If tabs are supported,

  [ \t]

is still shorter than

  [[:space:]]

and if we include all the "isspace" characters from ASCII (vertical tab, form feed, embedded carriage return) except for the line feed that would never occur due to separating lines, we just break even on pure character count:

  [ \t\v\f\r]

TVFR all fall under the left hand, backslash under the right, and nothing requires Shift.

The resulting character class does exactly the same thing under any locale.
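A sketch of that stricter matching with an explicit class: //d counts as a marker only when a space or tab follows. Also accepting a bare //d at end of line is my own assumption, and the sample file and the list_markers name are made up:

```shell
dir=$(mktemp -d)
# Line 1: marker + space. Line 2: ordinary comment starting with "d".
# Line 3: marker + tab.   Line 4: bare marker at end of line.
printf 'a(); //d first\nb(); //delete empty nodes\nc(); //d\tthird\nd(); //d\n' > "$dir/demo.c"

# Print the line numbers of real //d markers; the explicit [ \t] class
# does not depend on the locale.
list_markers() {
  awk '/\/\/d([ \t]|$)/ { print FNR }' "$@"
}

list_markers "$dir/demo.c"   # prints 1, 3 and 4, one per line
```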

nerdponx · 2025-11-15T02:30:28 1763173828

There's also [:blank:], which is just space and tab. Both I think are perfectly readable and reasonable options that communicate intent nicely.

kazinator · 2025-11-15T03:20:36 1763176836

ISO C99 says, of the isblank function (to which [:blank:] is related):

> The isblank function tests for any character that is a standard blank character or is one of a locale-specific set of characters for which isspace is true and that is used to separate words within a line of text. The standard blank characters are the following: space (' '), and horizontal tab ('\t'). In the "C" locale, isblank returns true only for the standard blank characters.

[:blank:] is only the same thing as [\t ] (tab space) if you run your scripts and Awk and everything in the "C" locale.

nerdponx · 2025-11-15T19:40:56 1763235656

Interesting, the GNU Grep manual describes both character classes as behaving as if you are in the C locale. I shouldn't have assumed it was the same as in the C standard!

dietrichepp · 2025-11-15T17:10:19 1763226619

> Why are you using the locale-specific [:space:] on source code?

Because it’s the one I remembered first, it worked, and I didn’t think that it needed any improvement. In fact, I still don’t think it needs any improvement.