Sed - 3：高级功能 | Cloud's Blog

GNU sed独有的替换字符串标记

\l : sed将\l标记之后的单个字符输出为小写
\L : sed将\L标记之后所有的字符串作输出为小写
\u : 对应\l标记，sed将\u之后的单个字符输出为大写
\U : 对应\L标记，sed将\U标记之后所有的字符串作输出为大写

这些字符串标记一般配合正则表达式分组使用。

$ cat employee.txt
101,John Doe,CEO
102,Jason Smith,IT Manager
103,Raj Reddy,Sysadmin
104,Anand Ram,Developer
105,Jane Miller,Sales Manager

# 将第1和第2分组交换位置，将第2分组内容改为大写，第3分组内容改为小写
$ sed 's/\([^,]*\),\([^,]*\),\(.*\).*/\U\2\E,\1,\L\3/g' employee.txt
JOHN DOE,101,ceo
JASON SMITH,102,it manager
RAJ REDDY,103,sysadmin
ANAND RAM,104,developer
JANE MILLER,105,sales manage

多行命令

sed编辑器命令默认都是针对单行数据进行操作的，sed每次在一个文本行上运行定义好的命令，然后移到下一行重复同样的操作，直到完成所有输入的文本行。但是有时候针对单个数据行可能并不能很好的定义操作的逻辑，比如用户需要查找的字符串中有换行符，这时就需要跨多行进行数据操作。因此sed包含了可以用来处理多行文本的特殊命令：

N: 将下一行文本添加到模式空间
D: 删除多行模式空间中的第一行
P: 打印多样模式空间中的第一行

$ sed -e '=' employee.txt  # '='命令打印行号
1
101,John Doe,CEO
2
102,Jason Smith,IT Manager
3
103,Raj Reddy,Sysadmin
4
104,Anand Ram,Developer
5

# 'N' 将行号和内容同时读入模式空间，替换换行符\n未空格，合并为一行
$ sed -e '=' employee.txt | sed -e '{N;s/\n/ /}'
1 101,John Doe,CEO
2 102,Jason Smith,IT Manager
3 103,Raj Reddy,Sysadmin
4 104,Anand Ram,Developer
5 105,Jane Miller,Sales Manager

单行删除‘d’和打印‘p’命令的作用都是针对模式空间中当前内容，如果和N一起使用就会把模式空间中的两行都删除或打印出来，有时可能并不是用户需要的行为，因此sed总有与之对应的D和P命令，只删除或打印模式空间中\n为止。

保持空间

sed中除了模式空间（pattern space），还提供了用来临时保存文本行的保持空间（hold space）。

保持空间命令：

h : 将模式空间复制到保持空间
H : 将模式空间附加到保持空间
g : 将保持空间复制到模式空间
G : 将保持空间附加到模式空间
x : 交换模式空间和保持空间的内容

$ cat lines.txt 
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.

$ sed -n '{1!G; h; $p}' lines.txt
This is line number 4.
This is line number 3.
This is line number 2.
This is line number 1.

上面的操作实现了将倒序打印lines.txt文件的所有行。其中涉及了多个命令调用：保持空间命令G、h，打印相关命令-n、p，寻址（第一行、最后一行），排除命令!。下面的python代码模拟了这个sed命令，执行逻辑和输出结果都是一样的，注释中标明了对应的sed命令，相信对码农来说比任何自然语言和流程图都描述的清楚（当然这段代码并不是用python完成这个任务的最好方法）。

#!/usr/bin/python2
hold_space = []
lines = open('./lines.txt').readlines()
for num, line in enumerate(lines):
    pattern_space = line               
    if num != 0:                        # 1!  ：除了第一行
        pattern_space += hold_space     # G   ：保持空间附加到模式空间
    hold_space = pattern_space          # h   ：模式空间复制到保持空间
    #print pattern_space,               # -n  ：不默认打印模式空间
    if num == len(lines) - 1:           # $   ：如果是最后一行
        print pattern_space,            # p   ：打印模式空间

流程控制

跳转命令

跳转（branch）命令b提供了基于寻址（地址，地址模式，地址空间）排除一整块命令的方法，格式如下：

1	[address]b[:label]

address决定了哪行或哪些数据会触发跳转命令
:label 定义了要跳转的位置
b命令将脚本跳转到label位置
如果未指定label，b命令将直接跳转到命令脚本末尾

$ echo "This, is, a, test, to, remove, commas," | sed -n '{
> :start
> s/,//1p
> /,/b start
> }'
This is, a, test, to, remove, commas,
This is a, test, to, remove, commas,
This is a test, to, remove, commas,
This is a test to, remove, commas,
This is a test to remove, commas,
This is a test to remove commas,
This is a test to remove commas

测试命令

1	[address]t[:label]

测试（test）命令提供了类似条件分支if-then的功能。如果替换命令成功匹配并替换了一个模式，测试命令就会跳转到指定的标签。如果未能匹配指定模式，测试命令就不会跳转。如果没有指定:label，如果测试成功，sed会跳转到脚本的结尾。

$ echo "This, is, a, test, to, remove, commas," | sed -n '{
> :start
> s/,//1p
> t start
> }'
This is, a, test, to, remove, commas,
This is a, test, to, remove, commas,
This is a test, to, remove, commas,
This is a test to, remove, commas,
This is a test to remove, commas,
This is a test to remove commas,
This is a test to remove commas