Chapter 13: Strings

mac 2026-02-02 2

13.6 Regular Expressions

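Regular expressions were integrated into the standard Unix tool set, in tools such as sed and awk, a long time ago. For simple string manipulation, Java's conventional tools are String, StringBuffer, and StringTokenizer (which splits a string into tokens; for usage see https://www.runoob.com/w3cnote/java-stringtokenizer-intro.html).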
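The JDK documentation describes StringTokenizer as a legacy class and recommends String.split() or the java.util.regex package for new code. The sketch below contrasts the two on an input with two adjacent spaces; the class name TokenizerVsSplit and the helper tokenize() are made up for illustration. StringTokenizer skips runs of delimiters, split(" ") keeps an empty string between the two spaces, and the regex split("\\s+") behaves like the tokenizer here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerVsSplit {
    // Collect the tokens produced by StringTokenizer
    // (default delimiters: space, \t, \n, \r, \f)
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "a b  c"; // two spaces between b and c
        System.out.println(tokenize(input));                      // [a, b, c]
        System.out.println(Arrays.toString(input.split(" ")));    // [a, b, , c]
        System.out.println(Arrays.toString(input.split("\\s+"))); // [a, b, c]
    }
}
```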

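The simplest way to use regular expressions is through the methods built into String, such as split(), replaceFirst(), and replaceAll(). For more power you need the Pattern and Matcher classes. First, a quick look at String's regex-related methods: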

(1) String.split() has an overload that limits how many times the string is split. From the documentation:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded. (In short: if n > 0, the pattern is applied at most n - 1 times; if n == 0, it is applied as many times as possible and trailing empty strings are dropped; if n < 0, it is applied as many times as possible and all empty strings, including trailing ones, are kept.)

The string "boo:and:foo", for example, yields the following results with these parameters:

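| Regex | Limit | Result |
|-------|-------|--------|
| :     | 2     | { "boo", "and:foo" } |
| :     | 5     | { "boo", "and", "foo" } |
| :     | -2    | { "boo", "and", "foo" } |
| o     | 5     | { "b", "", ":and:f", "", "" } |
| o     | -2    | { "b", "", ":and:f", "", "" } |
| o     | 0     | { "b", "", ":and:f" } |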
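The rows of the table can be reproduced directly. In this sketch, SplitLimitDemo and its helper show() are illustrative names; each call prints one row using Arrays.toString(). A limit of 1 is added at the end: it applies the pattern 0 times, so the whole input comes back as a single element.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Print one row: the split() call and the resulting array
    static void show(String input, String regex, int limit) {
        String[] parts = input.split(regex, limit);
        System.out.println("split(\"" + regex + "\", " + limit + ") = " + Arrays.toString(parts));
    }

    public static void main(String[] args) {
        String s = "boo:and:foo";
        show(s, ":", 2);  // split(":", 2) = [boo, and:foo]
        show(s, ":", 5);  // split(":", 5) = [boo, and, foo]
        show(s, ":", -2); // split(":", -2) = [boo, and, foo]
        show(s, "o", 5);  // split("o", 5) = [b, , :and:f, , ]
        show(s, "o", -2); // split("o", -2) = [b, , :and:f, , ]
        show(s, "o", 0);  // split("o", 0) = [b, , :and:f]
        show(s, "o", 1);  // split("o", 1) = [boo:and:foo]
    }
}
```

The one-argument split(regex) behaves like a limit of 0, so "boo:and:foo".split("o") also drops the two trailing empty strings.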

(2) String.replaceFirst() and String.replaceAll()

```java
public class Main {
    public static void main(String[] args) {
        String str = "Java now has regular expression";
        // Replace every vowel with an underscore
        System.out.println(str.replaceAll("[aeiou]", "_"));   // J_v_ n_w h_s r_g_l_r _xpr_ss__n
        // Replace only the first vowel with an underscore
        System.out.println(str.replaceFirst("[aeiou]", "_")); // J_va now has regular expression
    }
}
```
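One detail about replaceFirst()/replaceAll(): the replacement is not a plain literal. As the String.replaceAll() documentation notes, $ and \ in the replacement have special meaning ($0 is the whole match, $1, $2, ... are capture groups), and Matcher.quoteReplacement() turns them back into literal characters. A minimal sketch; the class name ReplacementDemo is just for illustration:

```java
import java.util.regex.Matcher;

class ReplacementDemo {
    public static void main(String[] args) {
        String str = "Java now";
        // $0 refers to the whole match
        System.out.println(str.replaceAll("[aeiou]", "<$0>")); // J<a>v<a> n<o>w
        // $1, $2, $3 refer to the capture groups, so they can be reordered
        System.out.println("2026-02-02".replaceAll("(\\d+)-(\\d+)-(\\d+)", "$3/$2/$1")); // 02/02/2026
        // A bare "$" as replacement would throw an exception (no group number follows it);
        // quoteReplacement() escapes it so it is inserted literally
        System.out.println(str.replaceAll("[aeiou]", Matcher.quoteReplacement("$"))); // J$v$ n$w
    }
}
```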

Some basics of regular expressions in Java (see also https://www.cnblogs.com/zery/p/3438845.html):

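(1) In a Java string literal, a regex backslash must be written twice for the following character to get its special meaning: "\\d" is the regex \d, which matches one digit. A literal backslash must therefore be written "\\\\" (the regex \\). Newline and tab are the exception: a single backslash, "\n" and "\t", is enough, because the string literal then already contains the actual newline or tab character, which the regex matches literally.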

(2) ? means the preceding character occurs zero or one time; for example, -? means an optional minus sign.
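The two rules above in action. This is a small sketch: the class EscapeDemo and the helper isInteger() are hypothetical names, and -?\d+ (an optional minus sign followed by one or more digits, + meaning "one or more") is just an example integer check.

```java
import java.util.Arrays;

class EscapeDemo {
    // The Java literal "-?\\d+" is the regex -?\d+
    static boolean isInteger(String s) {
        return s.matches("-?\\d+");
    }

    public static void main(String[] args) {
        System.out.println(isInteger("1234"));  // true
        System.out.println(isInteger("-1234")); // true
        System.out.println(isInteger("12a"));   // false

        // "a\\b" is the 3-character string a\b; the regex \\ (one literal backslash) is written "\\\\"
        System.out.println(Arrays.toString("a\\b".split("\\\\"))); // [a, b]

        // "\n" puts a real newline into the regex; "\\n" is the regex escape \n; both split here
        String twoLines = "line1\nline2";
        System.out.println(twoLines.split("\n").length);  // 2
        System.out.println(twoLines.split("\\n").length); // 2
    }
}
```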

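Now CharSequence. It is an interface in the java.lang package that abstracts a general definition of a character sequence from the classes CharBuffer, String, StringBuffer, and StringBuilder, all of which implement it. Both CharSequence and String can be used to refer to text, but the CharSequence interface itself only provides read access (methods such as length(), charAt(), subSequence(), and toString()); whether the text can change depends on the implementing class: a String is immutable, while a StringBuilder or StringBuffer can be modified through its own methods. Like an abstract class, an interface cannot be instantiated with new, but a CharSequence variable can be assigned any implementing object directly, for example a string literal. (For the String methods that take a CharSequence parameter, see https://www.cnblogs.com/dqsBK/p/5342298.html.)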
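A sketch of the points above: a string literal assigned straight to a CharSequence variable, a StringBuilder modified through its own append() while a CharSequence reference only reads it, and a method that accepts any CharSequence, which is also the parameter type of Pattern.matcher(). CharSequenceDemo and countVowels() are names made up for this example.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharSequenceDemo {
    // Works for String, StringBuilder, StringBuffer, CharBuffer, ...
    static int countVowels(CharSequence cs) {
        Matcher m = Pattern.compile("[aeiou]").matcher(cs); // matcher() takes a CharSequence
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        CharSequence fromLiteral = "Java now"; // no new: just assign a String
        StringBuilder sb = new StringBuilder("Java");
        CharSequence view = sb;                // interface reference to a mutable object
        sb.append(" now");                     // the change goes through StringBuilder
        System.out.println(view);              // Java now
        System.out.println(countVowels(fromLiteral));                 // 3
        System.out.println(countVowels(view));                        // 3
        System.out.println(countVowels(CharBuffer.wrap("Java now"))); // 3
    }
}
```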

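Next, Pattern and Matcher. A Pattern object represents a compiled regular expression; calling its matcher() method on an input produces a Matcher that does the actual matching. Their common methods: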

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "Java now";
        Pattern p = Pattern.compile("[aeiou]");
        Matcher m = p.matcher(str);

        // find(int start): resets the matcher, then searches starting at index start
        int i = 0;
        while (m.find(i)) {
            System.out.println(m.group()); // a a a a o o o
            i++;
        }

        // find(): finds the next match, continuing after the previous one
        // group(): the whole text matched by the last successful match
        // start(): index of the first character of the last match
        // end(): index of the last character of the last match, plus one
        m.reset(); // start again from the beginning of the input
        while (m.find()) {
            System.out.println(m.group()); // a a o
            System.out.println(m.start()); // 1 3 6
            System.out.println(m.end());   // 2 4 7
        }

        // lookingAt(): does the input START with a match of the regex?
        System.out.println(m.lookingAt()); // false ("J" is not a vowel); with regex "Ja" it would be true
        // matches(): does the WHOLE input match the regex?
        System.out.println(m.matches());   // false; with regex "^J.*w$" it would be true

        Matcher matcherGroup = Pattern.compile("I (.*) You").matcher("I Love You");
        // group() is group 0, the whole match;
        // group(int i) returns group i. For A(B(C))D: group 0 is ABCD, group 1 is BC, group 2 is C
        // groupCount() returns the number of groups, not counting group 0
        if (matcherGroup.find()) {
            System.out.println(matcherGroup.groupCount()); // 1
            System.out.println(matcherGroup.group());      // I Love You
            System.out.println(matcherGroup.group(1));     // Love
        }

        // Pattern.compile(String regex, int flags)
        Matcher matcher = Pattern.compile("e{1,}", Pattern.CASE_INSENSITIVE).matcher("J2EE");
        while (matcher.find()) {
            System.out.println(matcher.group()); // EE
        }

        String multilineString = "java is good\njava is nice\n" + "JAVA is excellent";
        // Pattern.MULTILINE: ^ and $ match at the start/end of every line,
        // not only of the whole input; | combines several flags
        matcher = Pattern.compile("^java", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE)
                .matcher(multilineString);
        while (matcher.find()) {
            System.out.println(matcher.group()); // java java JAVA
        }
    }
}
```
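The comment about A(B(C))D can be checked directly: groups are numbered by the position of their opening parenthesis, from left to right. The helper groups() and the class NestedGroupsDemo are hypothetical names; groups() just collects group(0) through group(groupCount()) of the first match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedGroupsDemo {
    // Groups 0..groupCount() of the first match, or an empty array if nothing matches
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int g = 0; g <= m.groupCount(); g++) {
            result[g] = m.group(g);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("A(B(C))D", "ABCD");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + ": " + g[i]);
        }
        // group 0: ABCD
        // group 1: BC
        // group 2: C
    }
}
```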

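Besides checking whether a field has the required format (for example, requiring Chinese characters, digits, upper- or lower-case letters, or a certain length) and splitting strings, regular expressions can also replace text. Note that in Java, by default . does NOT match line terminators such as \n; it matches them only when the Pattern.DOTALL flag (inline form (?s)) is set.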
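A minimal sketch of both points in the paragraph above: the default behavior of . versus Pattern.DOTALL (also written as the inline flag (?s)), and a simple format check with matches(). The class name DotAllDemo and the "6 to 12 letters or digits" rule are hypothetical examples, not rules from the original text.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    public static void main(String[] args) {
        String input = "a\nb";
        // By default . does not match a line terminator
        System.out.println(Pattern.compile("a.b").matcher(input).matches());                 // false
        // DOTALL (or the inline flag (?s)) lets . match \n as well
        System.out.println(Pattern.compile("a.b", Pattern.DOTALL).matcher(input).matches()); // true
        System.out.println(Pattern.matches("(?s)a.b", input));                               // true

        // Format check with a hypothetical rule: 6 to 12 ASCII letters or digits
        Pattern account = Pattern.compile("[A-Za-z0-9]{6,12}");
        System.out.println(account.matcher("Java2026").matches()); // true
        System.out.println(account.matcher("abc").matches());      // false (too short)
    }
}
```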

A concrete replacement example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "Java now";
        Pattern p = Pattern.compile("[aeiou]");
        Matcher m = p.matcher(str);
        // String.replaceAll() is in fact implemented by calling Matcher.replaceAll()
        System.out.println(m.replaceFirst("m")); // Jmva now
        System.out.println(m.replaceAll("m"));   // Jmvm nmw

        // appendReplacement(StringBuffer sbuf, String replacement) replaces step by step:
        // each match can be processed before it (and the text before it) is appended to sbuf
        // reset() moves the Matcher back to the start of its current input;
        // reset("abc") applies the same Matcher to the new input "abc"
        m.reset();
        StringBuffer sbuf = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sbuf, m.group().toUpperCase()); // upper-case each match
        }
        // Append the rest of the input after the last match;
        // without this call the result would be "JAvA nO", stopping at the last match
        m.appendTail(sbuf);
        System.out.println(sbuf); // JAvA nOw
    }
}
```
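The reset() comments above can be tried out with a small helper. ResetDemo and findAll() are illustrative names; the point is that one compiled Pattern, and even one Matcher, can be reused: reset() rewinds to the start of the same input, and reset(CharSequence) switches to a new input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Collect every remaining match of m, starting from its current position
    static List<String> findAll(Matcher m) {
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("[aeiou]").matcher("Java now");
        System.out.println(findAll(m)); // [a, a, o]
        m.reset();                      // rewind to the beginning of "Java now"
        System.out.println(findAll(m)); // [a, a, o]
        m.reset("abc");                 // same pattern, new input
        System.out.println(findAll(m)); // [a]
    }
}
```

Reusing a Matcher this way avoids recompiling the pattern. Keep in mind that the JDK documents Matcher instances as not safe for use by multiple concurrent threads, whereas a Pattern is immutable and can be shared.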

For a deeper study of regular expressions, see Jeffrey E. F. Friedl's Mastering Regular Expressions (2nd edition).
