Linux Text-Processing Trio (grep, sed, awk): grep (text filtering)

1. Introduction

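grep: short for "Global search Regular Expression and Print out the line". It searches its input globally for a regular expression and prints the matching lines.
Function: grep checks every line of its input files against the pattern. Quoting the pattern in single or double quotes is recommended (single quotes stop the shell from interpreting $, * and \).
Note: grep/egrep strip the trailing newline from each line before applying the regular expression, so a line never ends in a \n that the pattern could match.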
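# The grep family consists of the commands grep, egrep and fgrep
  **grep supports: BREs, EREs, PREs        **egrep supports: BREs, EREs
    grep without -E uses "BREs"            egrep without options uses "EREs"
    grep -E uses "EREs" (same as egrep)    egrep -P uses "PREs"
    grep -P uses "PREs"                    egrep -G uses "EREs"
    grep -G uses "BREs"
    grep -F uses no regex metacharacters at all (same as fgrep: metacharacters are not treated specially and only match themselves)
  (BREs = basic, EREs = extended, PREs = Perl-compatible regular expressions. In current GNU grep, egrep and fgrep are thin wrappers for grep -E and grep -F, newer releases warn that they are obsolescent, and combining -E with -P or -G may be rejected as "conflicting matchers specified".)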
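A quick way to see the difference between the matchers: in a BRE a bare + is an ordinary character, in an ERE it means "one or more", with -F nothing is special, and -P accepts Perl syntax such as {2}. A minimal sketch, assuming GNU grep built with -P (PCRE) support; the two input lines are made up.

```shell
# Made-up input: one line "aa", one line "a+"
input='aa
a+'

printf '%s\n' "$input" | grep 'a+'          # BRE: "+" is literal         -> a+
printf '%s\n' "$input" | grep -E 'a+'       # ERE: "+" means one or more  -> aa, a+
printf '%s\n' "$input" | grep -F 'a+'       # fixed string, no regex      -> a+
printf '%s\n' "$input" | grep -P '^a{2}$'   # PCRE: exactly two a's       -> aa
```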

2. grep exit status

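1. A matching line was found: grep exits with status 0
2. No matching line was found: exit status 1
3. An error occurred, e.g. an input file does not exist: exit status 2 (with -q, a match still gives 0 even if an error also occurred)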
[root@ /cdly/grep]# grep '^root' /etc/passwd     # Output: root:x:0:0:root:/root:/bin/bash
[root@ /cdly/grep]# echo $?                      # Output: 0 (a match was found)
[root@ /cdly/grep]# grep '^roooot' /etc/passwd   # Output: nothing
[root@ /cdly/grep]# echo $?                      # Output: 1 (no match was found)
[root@ /cdly/grep]# grep '^roooot' /etc/passwda  # Output: grep: /etc/passwda: No such file or directory (error message)
[root@ /cdly/grep]# echo $?                      # Output: 2 (the file does not exist)
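In scripts the three exit codes are usually told apart with a case statement. A minimal sketch; the function name check_pattern is hypothetical, -q and 2>/dev/null keep grep itself silent, and the "|| status=$?" form keeps the script alive even under set -e.

```shell
# Hypothetical helper: report whether PATTERN ($1) occurs in FILE ($2)
check_pattern() {
    local status=0
    grep -q -- "$1" "$2" 2>/dev/null || status=$?
    case $status in
        0) echo "match" ;;      # at least one line matched
        1) echo "no match" ;;   # the file was read, nothing matched
        *) echo "error" ;;      # 2: e.g. the file does not exist
    esac
}

check_pattern '^root' /etc/passwd
check_pattern '^roooot' /etc/passwd
check_pattern '^roooot' /etc/passwda
```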

3. Basic usage

grep [-acinv] [--color=auto] [-A n] [-B n] [-C n] 'pattern' filename

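4. Options

Options starting with '--' are GNU-style long options; the single-letter forms (-a, -c, ...) are the POSIX-style short options.

Commonly used options: -a -c -E -h -i -n -o -q -v -w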

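-a, --binary-files=text          Process a binary file as if it were text
-b, --byte-offset                Print the byte offset (0-based) before each output line; with -o, the offset of the match itself. Useful for locating data such as disk blocks; usually paired with -o
-c, --count                      Print only a count of matching lines per file (a line with several matches counts once)
-A num, --after-context=num      (A = After) Print num lines of trailing context after each matching line
-B num, --before-context=num     (B = Before) Print num lines of leading context before each matching line
-C num, --context=num            Print num lines of context before and after each matching line
-D action, --devices=action      If an input file is a device, FIFO or socket, use action to process it. By default action is read, meaning devices are read just like ordinary files; if action is skip, devices are silently skipped
-d action, --directories=action  If an input file is a directory, use action to process it. By default action is read, meaning directories are read just like ordinary files; if action is skip, directories are silently skipped; if action is recurse, grep reads all files under each directory recursively, which is equivalent to -r
-e PATTERN, --regexp=PATTERN     Use PATTERN as a pattern; can be given several times to search for several patterns at once (a line matching any of them is selected). Also protects patterns that begin with '-'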
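-E, --extended-regexp            Interpret the pattern as an extended regular expression (equivalent: grep -E <==> egrep). EREs have no lazy quantifiers, so .*? is still greedy
-F, --fixed-strings              Interpret the pattern as a list of fixed strings separated by newlines, any of which may match
-f file, --file=file             Read the patterns from file, one per line
-G, --basic-regexp               Interpret the pattern as a basic regular expression; this is the default
-h, --no-filename                Do not prefix output lines with file names (the default when only one file is searched)
-H, --with-filename              Print the file name for each match (the default when more than one file is searched)
-i, --ignore-case                Ignore case distinctions
-I, --binary-files=without-match Process a binary file as if it did not contain matching data
-l, --files-with-matches         Print only the names of files that contain matches, each name once (note: when -c or -n is combined with -l, only -l takes effect)
-L, --files-without-match        Print only the names of files that contain no match (scanning of each file stops at the first match)
-m num, --max-count=num          Stop reading a file after num matching lines; e.g. if 10 lines would match and num is 3, only the first three are printed
-n, --line-number                Prefix each output line with its line number in the file
-o, --only-matching              Print only the matched parts of a line, each on its own output line
-P, --perl-regexp                Interpret the pattern as a Perl-compatible regular expression (capital P); supports lazy quantifiers such as .*? and zero-width assertions (lookahead/lookbehind)
-q, --quiet, --silent            Print nothing; only the exit status reports the result (check it with echo $?): 0 if a matching line was found, non-zero otherwise. grep stops at the first match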
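-r, --recursive                  Search directories recursively; symbolic links met during the traversal are not followed, only those named on the command line (equivalent: -d recurse)
-R, --dereference-recursive      Search directories recursively, following all symbolic links
-s, --no-messages                Suppress error messages about nonexistent or unreadable files
-u, --unix-byte-offsets          Report Unix-style byte offsets; only has an effect together with -b, and only on MS-DOS and MS-Windows
-U, --binary                     Treat files as binary; only supported on MS-DOS and MS-Windows
-v, --invert-match               Invert the match: print only the lines that do not match
-V, --version                    Print version information (capital V)
-w, --word-regexp                Select only lines where the match forms a whole word, similar to \<...\> (word characters are letters, digits and underscore)
-x, --line-regexp                Select only matches that span the whole line
-y                               Obsolete synonym for -i
-z, --null-data                  Treat input and output as lines terminated by a zero byte (NUL) instead of a newline. A normal text file contains no NUL, so the whole file becomes one "line": a match prints the file name once, followed by the entire file
-Z, --null                       Output a zero byte (NUL) instead of the character that normally follows a file name (e.g. ':'); usually combined with -l so that file names can be processed safely (pairs with find -print0, perl -0, sort -z, xargs -0)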
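--mmap                            Use the mmap(2) system call instead of read(2) to read input where possible; in some cases this gave better performance, but if an input file changed size while grep was reading it, or an I/O error occurred, the behavior was undefined (including core dumps). Obsolete: modern GNU grep ignores or no longer accepts this option
--help                            Print a brief help message
--label=label                     Display input that actually comes from standard input as if it came from file label; useful for tools like zgrep (e.g. gzip -cd file.gz | grep -H --label=file test)
--color[=WHEN]                    Highlight the matched text in color; WHEN is never, always or auto (alias grep='grep --color=auto', best placed in .bashrc or .bash_profile). Colors come from the GREP_COLORS environment variable, e.g. export GREP_COLORS='mt=01;32' makes matches green (the older GREP_COLOR variable is deprecated)
--line-buffered                   Use line buffering on output
--include=FILE_PATTERN            (directories) Search only files whose base name matches the pattern
--exclude=FILE_PATTERN            (directories) Skip files whose base name matches the pattern
--exclude-from=FILE               (files) Skip files whose base name matches any pattern read from FILE
--exclude-dir=PATTERN             (directories) When recursing, skip directories whose base name matches the pattern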
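Because -c counts matching lines rather than matches, a line with several hits is counted once. A minimal sketch of the difference, assuming GNU grep (-o is a GNU option); the two input lines are made up.

```shell
# Made-up input: the first line contains "ab" twice, the second once
text='ab ab
ab'

printf '%s\n' "$text" | grep -c 'ab'          # matching lines:     2
printf '%s\n' "$text" | grep -o 'ab' | wc -l  # individual matches: 3
```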

Tip: to make testing grep easier, define aliases so that matches are highlighted in color

[root@ /cdly/grep]# cat >> ~/.bashrc <<'EOF'
alias egrep='egrep --color=auto'
alias grep='grep --color=auto'
EOF
[root@ /cdly/grep]# source ~/.bashrc    # reload so the aliases take effect in the current shell

5. Examples

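[root@ /cdly/grep]# grep -c '^$' file                # count empty lines
[root@ /cdly/grep]# grep -c '^ *$' file              # count lines that are empty or contain only spaces
[root@ /cdly/grep]# echo cdly|grep 'cd\B'            # "cd" followed by more word characters (not at a word boundary); result: cdly (grep '\Bcd' gives no result, because cd sits at the start of the word)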
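[root@ /cdly/grep]# grep 'cdly' file                 # print all lines containing the pattern cdly
[root@ /cdly/grep]# grep 'cdly' d*                   # print the lines containing cdly in all files whose names start with d
[root@ /cdly/grep]# grep '^c' file                   # print all lines starting with c (^ anchors the start of the line)
[root@ /cdly/grep]# grep 'y$' file                   # print all lines ending with y ($ anchors the end of the line)
[root@ /cdly/grep]# grep '5\..' file                 # a 5, followed by a literal dot, followed by any single character
[root@ /cdly/grep]# grep '\.5' file                  # print all lines containing the string ".5"
[root@ /cdly/grep]# grep '^[we]' file                # print all lines starting with w or e
[root@ /cdly/grep]# grep '[^0-9] ' file              # a character that is not a digit, followed by a space (^ inside brackets negates the set)
[root@ /cdly/grep]# grep '[A-Z][A-Z] [A-Z]' file     # print lines containing two uppercase letters, a space, and another uppercase letter
[root@ /cdly/grep]# grep 'ss* ' file                 # print lines containing one or more "s" followed by a space
[root@ /cdly/grep]# grep '[a-z]\{9\}' file           # print lines containing at least 9 consecutive lowercase letters
[root@ /cdly/grep]# grep '\(3\)\.[0-9].*\1 *\1' file # a 3, a dot, one digit, any characters, a 3, any number of spaces, and another 3 (the 3 is in a group, so "\1" refers back to it; matches e.g. "3.2cdly3  3" or "3.5aaa3 3")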
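[root@ /cdly/grep]# grep '\<north\>' file            # lines containing the whole word north
[root@ /cdly/grep]# grep '\bnorth\b' file            # same as above: \b marks a word boundary
[root@ /cdly/grep]# grep '^n\w*\W' file              # an n at the start of the line, any number of word characters (letters, digits, underscore), then one non-word character; \w and \W are GNU extensions
[root@ /cdly/grep]# grep '\<[a-z].*n\>' file         # a word starting with a lowercase letter, then any characters, then a word ending in n; note that .* matches any characters, spaces included
[root@ /cdly/grep]# ls -l|grep '^[^d]'               # drop lines starting with d, i.e. remove directories from the listing
[root@ /cdly/grep]# echo gnu is not unix | grep -b -o "not" # print the byte offset of the match: (7:not)
[root@ /cdly/grep]# echo cdly |grep -w 'cd'          # no output: cd is not a whole word in cdly
[root@ /cdly/grep]# echo cdly|grep -v cd             # print lines not containing cd (no output here, since cdly contains cd)
[root@ /cdly/grep]# grep -V                          # print grep version information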
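The word-boundary forms above differ in which side of the match they pin down. A small sketch comparing them on three made-up words, assuming GNU grep (\<, \> and \B are GNU extensions to BREs).

```shell
# Made-up input words
printf '%s\n' 'cd' 'cdly' 'abcd' | grep -w 'cd'    # whole word only              -> cd
printf '%s\n' 'cd' 'cdly' 'abcd' | grep '\<cd'     # word starting with cd        -> cd, cdly
printf '%s\n' 'cd' 'cdly' 'abcd' | grep 'cd\>'     # word ending with cd          -> cd, abcd
printf '%s\n' 'cd' 'cdly' 'abcd' | grep 'cd\B'     # cd followed by a word char   -> cdly
```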

# -A num, -B num, -C num (with -n, matching lines are shown as N: and context lines as N-)
[root@ /cdly/grep]# seq -w 10 | grep -n -A 2 '03'  # -A 2 prints the matching line and the 2 lines after it
3:03
4-04
5-05
[root@ /cdly/grep]# seq -w 10 | grep -n -B 2 '03'  # -B 2 prints the matching line and the 2 lines before it
1-01
2-02
3:03
[root@ /cdly/grep]# seq -w 10 | grep -n -C 2 '03'  # -C 2 prints the matching line and 2 lines before and after it
1-01
2-02
3:03
4-04
5-05
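When the context groups of two matches do not touch, GNU grep prints a "--" line between them. A small sketch; --no-group-separator is a GNU option and is assumed to be available in the installed (recent) GNU grep.

```shell
seq -w 10 | grep -A 1 -e '02' -e '07'
# 02
# 03
# --
# 07
# 08
seq -w 10 | grep -A 1 --no-group-separator -e '02' -e '07'   # same lines, without the "--"
```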

# -e
[root@ /cdly/grep]# grep -e '^root' -e 'sshd' /etc/passwd       
root:x:0:0:root:/root:/bin/bash
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
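-e is also the way to search for a pattern that itself starts with '-'. A minimal sketch with made-up input lines.

```shell
# Without -e, grep would read "-v" as its own option (invert match)
printf '%s\n' 'use -v to invert' 'plain line' | grep -e '-v'   # -> use -v to invert
# "--" (end of options) works as well
printf '%s\n' 'use -v to invert' 'plain line' | grep -- '-v'   # -> use -v to invert
```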

# -l, -L
[root@ /cdly/grep]# echo 1111 > a1.txt
[root@ /cdly/grep]# echo -e "aaa\n222" > a2.txt
[root@ /cdly/grep]# grep -l '^[a-z]' *      # Output: a2.txt (-l prints the names of files with matching lines)
[root@ /cdly/grep]# grep -L '^[a-z]' *      # Output: a1.txt (-L prints the names of files without any match)

# -q
[root@ /cdly/grep]# seq -w 10 | grep -q '08';echo $?  # Output: 0 (a match was found, so the status is 0)
[root@ /cdly/grep]# seq -w 10 | grep -q '088';echo $? # Output: 1 (no match was found, so the status is non-zero)
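-q is typically used directly as an if condition. A common idiom is appending a line to a config file only if it is not already there, which would keep the alias setup from section 4 from adding duplicates when run twice. A minimal sketch; a temporary file stands in for ~/.bashrc.

```shell
# Stand-in for ~/.bashrc: a temporary file, so the sketch is safe to run
rc=$(mktemp)
line="alias grep='grep --color=auto'"

# Run the "append if missing" step twice; only the first run should write
for i in 1 2; do
    # -q: silent, -x: whole line, -F: fixed string (quotes, = and - are literal)
    if ! grep -qxF -- "$line" "$rc"; then
        printf '%s\n' "$line" >> "$rc"
    fi
done

count=$(grep -c -xF -- "$line" "$rc")
echo "$count"    # -> 1: the line was appended only once
rm -f "$rc"
```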

# -n
[root@ /cdly/grep]# grep -n '^root' /etc/passwd       # Output: 1:root:x:0:0:root:/root:/bin/bash

# -r
[root@ /cdly/grep]# grep '5' -r *                     # search file contents recursively
a/b/file.txt:5
file:5

# -s
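[root@ /cdly/grep]# grep 'q' file12     # Output: grep: file12: No such file or directory
[root@ /cdly/grep]# grep -s 'q' file12  # Output: nothing (the error message is suppressed)
[root@ /cdly/grep]# echo $?             # Output: 2 (the file does not exist, so the status is still 2)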

# -f file
[root@ /cdly/grep]# cat file      # patterns stored in a file, one per line; a line matching any of them is printed
root
^mysql
[root@ /cdly/grep]# grep -f file /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/bash
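Combined with -v, -x and -F, a pattern file turns grep into a "remove these lines" filter, i.e. a set difference of two lists. A minimal sketch; the two user lists and their file names are hypothetical and live in a temporary directory.

```shell
# Hypothetical lists in a temporary directory: all users, and users to drop
dir=$(mktemp -d)
printf '%s\n' root daemon mysql nobody > "$dir/all.txt"
printf '%s\n' root nobody > "$dir/exclude.txt"

# -f: read patterns from exclude.txt, -F: plain strings, -x: whole line, -v: invert
grep -vxF -f "$dir/exclude.txt" "$dir/all.txt"    # -> daemon, mysql
rm -r "$dir"
```

Watch out for a blank line in the pattern file: without -x an empty pattern matches every line, so -v would then print nothing at all.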

# -h 、-H
[root@ /cdly/grep]# grep '.' file*
file1:1
file2:1
[root@ /cdly/grep]# grep -h '.' file*  # ( -h )
1
1
[root@ /cdly/grep]# grep -H '.' file*  # ( -H )
file1:1
file2:1

# -m num
[root@ /cdly/grep]# seq 15|grep '1'     
1
10
11
12
13
14
15
[root@ /cdly/grep]# seq 20|grep -m 3 '1'  # ( -m )
1
10
11

# -i
[root@ /cdly/grep]# echo CDly|grep 'cd'     # Output: nothing
[root@ /cdly/grep]# echo CDly|grep -i 'cd'  # Output: CDly

# -x
[root@ /cdly/grep]# echo chen dong|grep 'chen'         # Output: chen dong
[root@ /cdly/grep]# echo chen dong|grep -x 'chen'      # Output: nothing (the pattern does not cover the whole line)
[root@ /cdly/grep]# echo chen dong|grep -x 'chen dong' # Output: chen dong (the whole line matches)

# -l -z -Z
[root@ /cdly/grep]# grep 'a' file*
file1:aaa
file1:abc
file2:a1
file2:aa
[root@ /cdly/grep]# grep -l 'a' file* # print only the names of matching files
file1
file2
[root@ /cdly/grep]# grep -z 'a' file* # -z: lines end in NUL, so each whole file is one record; the file name appears once, followed by the entire file
file1:aaa
abc
bbb
ccc
file2:a1
b1
aa
cc
[root@ /cdly/grep]# grep -Z 'a' file* # the name is printed for every matching line, followed by a NUL byte instead of ":" (the NUL is invisible in a terminal)
file1aaa
file1abc
file2a1
file2aa
[root@ /cdly/grep]# grep -Z 'a' file* |xargs -0
file1 aaa
file1 abc
file2 a1
file2 aa
[root@ /cdly/grep]# grep -lZ 'a' file*             # Output: file1file2 (the names are separated by invisible NUL bytes)
[root@ /cdly/grep]# grep -lZ 'a' file* |xargs -0   # Output: file1 file2
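The point of -lZ | xargs -0 is that file names with spaces survive the pipe intact; with plain -l, xargs would split "with space.txt" into two arguments. A minimal sketch using made-up files in a temporary directory.

```shell
# Temporary demo files; one name contains a space
dir=$(mktemp -d)
printf 'aaa\n' > "$dir/with space.txt"
printf 'bbb\n' > "$dir/plain.txt"

# -l: file names only, -Z: end each name with NUL; xargs -0 splits on NUL only,
# so "with space.txt" reaches wc as a single argument
grep -lZ 'a' "$dir"/* | xargs -0 wc -l
rm -r "$dir"
```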

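# -r, --include, --exclude (quote the glob so the shell does not expand it; \. matches a literal dot)
[root@ /cdly/grep]# grep -r '127\.0\.0\.1' /etc --include='*.conf' # search /etc recursively for the IP, only in files ending in .conf
[root@ /cdly/grep]# grep -r '127\.0\.0\.1' /etc --exclude='*.conf' # search /etc recursively for the IP, skipping files ending in .conf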
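The same filters can be tried safely on a small throwaway tree instead of /etc, together with --exclude-dir. A minimal sketch; the directory layout and file names are hypothetical, and -F makes the dots in the IP literal.

```shell
# Temporary demo tree (hypothetical file names)
dir=$(mktemp -d)
mkdir -p "$dir/conf.d" "$dir/.git"
printf 'listen 127.0.0.1\n' > "$dir/conf.d/app.conf"
printf 'host 127.0.0.1\n'   > "$dir/notes.txt"
printf '127.0.0.1\n'        > "$dir/.git/config"

# Quote each glob so the shell passes it to grep unexpanded
grep -rlF '127.0.0.1' "$dir" --include='*.conf'     # only conf.d/app.conf
grep -rlF '127.0.0.1' "$dir" --exclude-dir='.git'   # app.conf and notes.txt, not .git/config
rm -r "$dir"
```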

6. Special handling of "\t"

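# grep's BRE/ERE syntax has no \t escape for Tab. Inside a bracket expression a backslash is an ordinary character, so [^ \t] means "not a space, not a backslash, not a t", which is why the first command stops at the first t of "abctest". To match a real Tab, let the shell insert one (ANSI-C quoting $'\t' in bash) or use grep -P, where \t does mean Tab:
[root@ /cdly/grep]# echo "abctest file"|egrep -o '^[^ \t]+'      # Output: abc (stops at the first t)
[root@ /cdly/grep]# echo "abctest file"|egrep -o $'^[^ \t]+'     # Output: abctest ($'...' turns \t into a real Tab)
[root@ /cdly/grep]# echo "abctest file"|egrep -o '^[^ '$'\t'']+' # Output: abctest (same, with only the \t part inside $'...')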
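Two more ways to exclude a real Tab, tested on made-up input that actually contains one. A minimal sketch, assuming GNU grep with -P support for the first command; the second only relies on printf and works in any POSIX shell.

```shell
# Made-up input with a real Tab between the two words
printf 'abctest\tfile\n' | grep -oP '^[^ \t]+'    # -> abctest  (PCRE: \t is a Tab)

# Portable alternative: put a real Tab into a shell variable first
tab=$(printf '\t')
printf 'abctest\tfile\n' | grep -oE "^[^ $tab]+"  # -> abctest
```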

From: https://blog.51cto.com/cdly/5896046