String processing in Java with Guava **CharMatcher**

Removing special characters from a string

@Test
public void whenRemoveSpecialCharacters_thenRemoved(){
String input = "H*el.lo,}12";
// Guava 19+ factory method; older releases expose the constant CharMatcher.JAVA_LETTER_OR_DIGIT
CharMatcher matcher = CharMatcher.javaLetterOrDigit();
String result = matcher.retainFrom(input);

assertEquals("Hello12", result);
}

Removing non-ASCII characters from a string

@Test
public void whenRemoveNonASCIIChars_thenRemoved() {
String input = "あhello₤";

// Guava 19+ factory method; older releases expose the constant CharMatcher.ASCII
String result = CharMatcher.ascii().retainFrom(input);
assertEquals("hello", result);

result = CharMatcher.inRange('0', 'z').retainFrom(input);
assertEquals("hello", result);
}

Removing characters that are not in a given charset

@Test
public void whenRemoveCharsNotInCharset_thenRemoved() {
Charset charset = Charset.forName("cp437");
CharsetEncoder encoder = charset.newEncoder();

Predicate<Character> inRange = new Predicate<Character>() {
@Override
public boolean apply(Character c) {
return encoder.canEncode(c);
}
};

String result = CharMatcher.forPredicate(inRange)
.retainFrom("helloは");
assertEquals("hello", result);
}

Validating a string

@Test
public void whenValidateString_thenValid(){
String input = "hello";

// Guava 19+ factory methods; older releases expose constants such as CharMatcher.JAVA_LOWER_CASE
boolean result = CharMatcher.javaLowerCase().matchesAllOf(input);
assertTrue(result);

result = CharMatcher.is('e').matchesAnyOf(input);
assertTrue(result);

result = CharMatcher.javaDigit().matchesNoneOf(input);
assertTrue(result);
}

Trimming characters from a string

@Test
public void whenTrimString_thenTrimmed() {
String input = "---hello,,,";

String result = CharMatcher.is('-').trimLeadingFrom(input);
assertEquals("hello,,,", result);

result = CharMatcher.is(',').trimTrailingFrom(input);
assertEquals("---hello", result);

result = CharMatcher.anyOf("-,").trimFrom(input);
assertEquals("hello", result);
}

Collapsing a string

@Test
public void whenCollapseFromString_thenCollapsed() {
String input = " hel lo ";

String result = CharMatcher.is(' ').collapseFrom(input, '-');
assertEquals("-hel-lo-", result);

result = CharMatcher.is(' ').trimAndCollapseFrom(input, '-');
assertEquals("hel-lo", result);
}

Replacing characters in a string

@Test
public void whenReplaceFromString_thenReplaced() {
String input = "apple-banana.";

String result = CharMatcher.anyOf("-.").replaceFrom(input, '!');
assertEquals("apple!banana!", result);

result = CharMatcher.is('-').replaceFrom(input, " and ");
assertEquals("apple and banana.", result);
}

Counting character occurrences in a string

@Test
public void whenCountCharInString_thenCorrect() {
String input = "a, c, z, 1, 2";

int result = CharMatcher.is(',').countIn(input);
assertEquals(4, result);

result = CharMatcher.inRange('a', 'h').countIn(input);
assertEquals(2, result);
}

Summary: CharMatcher makes common string processing tasks simple and convenient.
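
For readers who cannot add Guava to a project, the filtering idea behind retainFrom and countIn can be sketched with the JDK alone. The class and method names below are hypothetical, and this is only an illustration of the technique, not how Guava implements CharMatcher.

```java
import java.util.function.IntPredicate;

// JDK-only sketch of two CharMatcher-style operations (hypothetical helper, not part of Guava).
final class CharFilterSketch {

    // Keeps only the chars accepted by the predicate, similar in spirit to retainFrom.
    static String retainFrom(String input, IntPredicate matcher) {
        StringBuilder kept = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (matcher.test(c)) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    // Counts the chars accepted by the predicate, similar in spirit to countIn.
    static int countIn(String input, IntPredicate matcher) {
        return (int) input.chars().filter(matcher).count();
    }

    public static void main(String[] args) {
        System.out.println(retainFrom("H*el.lo,}12", Character::isLetterOrDigit)); // Hello12
        System.out.println(countIn("a, c, z, 1, 2", c -> c == ','));               // 4
    }
}
```

Passing the matching rule as a predicate keeps the helper reusable, which is roughly the role a CharMatcher instance plays in the tests above.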


Converting an exception stack trace to a string in Java

1. Using plain Java

StringWriter sw = new StringWriter();
PrintWriter pw = new PrintWriter(sw);
e.printStackTrace(pw);
String stackTrace = sw.toString(); // the writer now holds the full trace

2. Using the Apache Commons Lang utility class

String stacktrace = ExceptionUtils.getStackTrace(e);

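Summary: by default a stack trace is not available as a String, so it has to be printed into a writer first. Java 9 added the Stack-Walking API (StackWalker) for traversing stack frames.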
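
The plain-Java approach is easy to wrap in a small helper. The sketch below assumes a hypothetical class name and method name; it only uses StringWriter and PrintWriter from the JDK.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper that wraps the StringWriter/PrintWriter approach.
final class StackTraceSketch {

    // Renders the full stack trace of a throwable into a String.
    static String stackTraceToString(Throwable t) {
        StringWriter sw = new StringWriter();
        // Closing the PrintWriter flushes everything into the StringWriter.
        try (PrintWriter pw = new PrintWriter(sw)) {
            t.printStackTrace(pw);
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        String trace = stackTraceToString(new IllegalStateException("boom"));
        System.out.println(trace);
    }
}
```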


Converting a list to a string in Java

1. Printing directly

@Test
public void whenListToString_thenPrintDefault() {
List<Integer> intList = Arrays.asList(1, 2, 3);

System.out.println(intList); // this works well for lists of simple types
}

2. Converting with a stream

@Test
public void whenCollectorsJoining_thenPrintCustom() {
List<Integer> intList = Arrays.asList(1, 2, 3);
String result = intList.stream()
.map(n -> String.valueOf(n))
.collect(Collectors.joining("-", "{", "}"));

System.out.println(result);
}

3. Using Apache Commons Lang

@Test
public void whenStringUtilsJoin_thenPrintCustom() {
    List<Integer> intList = Arrays.asList(1, 2, 3);
  
    System.out.println(StringUtils.join(intList, "|"));
}

Summary: lists of simple types can be printed directly, but for complex custom objects the stream approach is recommended.
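
As a rough illustration of the recommendation for custom objects, the sketch below formats each element explicitly before joining. The User class and the describe method are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class JoinObjectsSketch {

    // Hypothetical domain class used only for this illustration.
    static final class User {
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Formats each user explicitly instead of relying on Object.toString().
    static String describe(List<User> users) {
        return users.stream()
                .map(u -> u.name + "(" + u.age + ")")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(new User("Ann", 30), new User("Bob", 25));
        System.out.println(describe(users)); // [Ann(30), Bob(25)]
    }
}
```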


String processing in Java with Apache Commons Text

Capitalizing the first letter of each word

@Test
public void whenCapitalized_thenCorrect() {
String toBeCapitalized = "to be capitalized!";
String result = WordUtils.capitalize(toBeCapitalized);

assertEquals("To Be Capitalized!", result);
}

Checking whether a string contains the given words

@Test
public void whenContainsWords_thenCorrect() {
boolean containsWords = WordUtils
.containsAllWords("String to search", "to", "search");

assertTrue(containsWords);
}

Building string templates with StrSubstitutor

@Test
public void whenSubstituted_thenCorrect() {
Map<String, String> substitutes = new HashMap<>();
substitutes.put("name", "John");
substitutes.put("college", "University of Stanford");
String templateString = "My name is ${name} and I am a student at the ${college}.";
StrSubstitutor sub = new StrSubstitutor(substitutes);
String result = sub.replace(templateString);

assertEquals("My name is John and I am a student at the University of Stanford.", result);
}
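
To make the ${key} substitution idea concrete without the library, here is a minimal JDK-only sketch. The class name, method name, and the choice to leave unknown keys untouched are assumptions for illustration; it does not reproduce StrSubstitutor's full feature set (defaults, escaping, nested variables).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical JDK-only sketch of ${key} template substitution.
final class TemplateSketch {

    // Matches placeholders of the form ${key}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${key} with its value; unknown keys are left untouched.
    static String substitute(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("name", "John");
        values.put("college", "University of Stanford");
        System.out.println(substitute(
                "My name is ${name} and I am a student at the ${college}.", values));
    }
}
```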

Using StrBuilder instead of the native StringBuilder to make replacing string content easier

@Test
public void whenReplaced_thenCorrect() {
StrBuilder strBuilder = new StrBuilder("example StrBuilder!");
strBuilder.replaceAll("example", "new");

assertEquals(new StrBuilder("new StrBuilder!"), strBuilder);
// clear the StrBuilder
strBuilder.clear();
}

Counting the differences between two strings

@Test
public void whenEditScript_thenCorrect() {
StringsComparator cmp = new StringsComparator("ABCFGH", "BCDEFG");
EditScript<Character> script = cmp.getScript();
int mod = script.getModifications();

assertEquals(4, mod);
}

Using the text.similarity package to measure how different two strings are

// length of the longest common subsequence
@Test
public void whenCompare_thenCorrect() {
LongestCommonSubsequence lcs = new LongestCommonSubsequence();
int countLcs = lcs.apply("New York", "New Hampshire");

assertEquals(5, countLcs);
}
// longest-common-subsequence distance
@Test
public void whenCalculateDistance_thenCorrect() {
    LongestCommonSubsequenceDistance lcsd = new LongestCommonSubsequenceDistance();
    int countLcsd = lcsd.apply("New York", "New Hampshire");
     
    assertEquals(11, countLcsd);
}
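
The two values asserted above can be reproduced with the classic dynamic-programming formulation of the longest common subsequence. This is a JDK-only sketch with hypothetical names; the distance formula (sum of both lengths minus twice the LCS length) is the one that matches the numbers in the tests, and the library's internal implementation may differ.

```java
// Hypothetical JDK-only sketch of LCS length and LCS distance.
final class LcsSketch {

    // dp[i][j] holds the LCS length of the first i chars of a and the first j chars of b.
    static int lcsLength(CharSequence a, CharSequence b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.length()][b.length()];
    }

    // Counts the characters of both inputs that are not part of the LCS.
    static int lcsDistance(CharSequence a, CharSequence b) {
        return a.length() + b.length() - 2 * lcsLength(a, b);
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("New York", "New Hampshire"));   // 5
        System.out.println(lcsDistance("New York", "New Hampshire")); // 11
    }
}
```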

Custom string translation

@Test
public void whenTranslate_thenCorrect() {
UnicodeEscaper ue = UnicodeEscaper.above(0);
String result = ue.translate("ABCD");

assertEquals("\\u0041\\u0042\\u0043\\u0044", result);
}

Converting a String to a stream of characters in Java

1. Using chars(), which returns an IntStream

String testString = "String";
IntStream intStream = testString.chars(); // yields a stream of int values, so a conversion is needed

Stream<Character> characterStream = testString.chars()
    .mapToObj(c -> (char) c); // cast each int to char with mapToObj

2. Using codePoints() to obtain the code points, then casting

Stream<Character> characterStream2  = testString.codePoints()
.mapToObj(c -> (char) c);

// or convert to a stream of single-character strings
Stream<String> stringStream = testString.codePoints()
.mapToObj(c -> String.valueOf((char) c));

Summary: a String can be converted to an IntStream, to a stream of characters, or to a stream of single-character strings.
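
One caveat worth illustrating: casting a code point to char only works for characters in the Basic Multilingual Plane. The sketch below, with hypothetical names, builds one String per code point so that surrogate pairs stay intact.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: one String per Unicode code point.
final class CodePointStreamSketch {

    // Character.toChars returns one or two chars, so supplementary characters survive.
    static List<String> toCodePointStrings(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE00b"; // 'a', U+1F600 (a surrogate pair), 'b'
        System.out.println(text.chars().count());            // 4 UTF-16 code units
        System.out.println(toCodePointStrings(text).size()); // 3 code points
    }
}
```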


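Converting between hexadecimal and ASCII in Java

Converting a string to hexadecimal

  1. Convert the string to a char array
  2. Cast each char in the array to an int
  3. Convert each int to hexadecimal with Integer.toHexString()
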
private static String asciiToHex(String asciiStr) {
char[] chars = asciiStr.toCharArray();
StringBuilder hex = new StringBuilder();
for (char ch : chars) {
hex.append(Integer.toHexString((int) ch));
}

return hex.toString();
}
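
Converting hexadecimal back to a string

  1. Split the hex string into groups of two characters
  2. Parse each group with Integer.parseInt(hex, 16) and cast the result to char
  3. Append each char to a StringBuilder
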
private static String hexToAscii(String hexStr) {
StringBuilder output = new StringBuilder("");

for (int i = 0; i < hexStr.length(); i += 2) {
String str = hexStr.substring(i, i + 2);
output.append((char) Integer.parseInt(str, 16));
}

return output.toString();
}

Tests

@Test
public void whenHexToAscii() {
    String asciiString = "www.matosiki.com";
    // 'w' is 0x77, '.' is 0x2e, and so on: two hex digits per character
    String hexEquivalent = "7777772e6d61746f73696b692e636f6d";

    assertEquals(asciiString, hexToAscii(hexEquivalent));
}

@Test
public void whenAsciiToHex() {
    String asciiString = "www.matosiki.com";
    String hexEquivalent = "7777772e6d61746f73696b692e636f6d";

    assertEquals(hexEquivalent, asciiToHex(asciiString));
}

Summary: the methods above convert between hexadecimal and ASCII in both directions.
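
One detail the two-characters-per-group decoding depends on is that every character is encoded as exactly two hex digits. Integer.toHexString does not pad, so a character such as '\n' would become a single digit. The sketch below, with hypothetical names, pads with String.format and rejects input it cannot round-trip; it assumes characters no larger than 0xFF.

```java
// Hypothetical sketch of a padded, round-trip-safe hex conversion.
final class HexSketch {

    // Encodes each char as exactly two lowercase hex digits (assumes chars <= 0xFF).
    static String asciiToHex(String ascii) {
        StringBuilder hex = new StringBuilder(ascii.length() * 2);
        for (char ch : ascii.toCharArray()) {
            if (ch > 0xFF) {
                throw new IllegalArgumentException("character out of range: " + (int) ch);
            }
            hex.append(String.format("%02x", (int) ch));
        }
        return hex.toString();
    }

    // Decodes two hex digits at a time back into chars.
    static String hexToAscii(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        StringBuilder out = new StringBuilder(hex.length() / 2);
        for (int i = 0; i < hex.length(); i += 2) {
            out.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String hex = asciiToHex("www.matosiki.com");
        System.out.println(hex);             // 7777772e6d61746f73696b692e636f6d
        System.out.println(hexToAscii(hex)); // www.matosiki.com
    }
}
```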


Converting a String to Integer or int in Java

1. Using Integer.parseInt()

@Test
public void givenString_whenParsingInt_shouldConvertToInt() {
String givenString = "42";

int result = Integer.parseInt(givenString);

assertThat(result).isEqualTo(42);
}

2. Using Integer.valueOf() (not recommended); it uses a cache internally

@Test
public void givenString_whenCallingIntegerValueOf_shouldConvertToInt() {
String givenString = "42";

Integer result = Integer.valueOf(givenString);

assertThat(result).isEqualTo(new Integer(42));
}

3. Using the Integer constructor (deprecated since Java 9)

@Test
public void givenString_whenCallingIntegerConstructor_shouldConvertToInt() {
String givenString = "42";

Integer result = new Integer(givenString);

assertThat(result).isEqualTo(new Integer(42));
}

4. Using Integer.decode()

@Test
public void givenString_whenCallingIntegerDecode_shouldConvertToInt() {
String givenString = "42";

int result = Integer.decode(givenString);

assertThat(result).isEqualTo(42);
}
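
What sets Integer.decode() apart from the other methods is that it understands radix prefixes. The sketch below, using a hypothetical helper class, decodes the same value written in decimal, hexadecimal, and octal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch showing the prefixes Integer.decode understands.
final class DecodeSketch {

    // Decodes each literal with Integer.decode, which accepts 0x, 0X, # and leading-0 prefixes.
    static List<Integer> decodeAll(String... literals) {
        List<Integer> values = new ArrayList<>();
        for (String literal : literals) {
            values.add(Integer.decode(literal));
        }
        return values;
    }

    public static void main(String[] args) {
        // decimal, hex, hex, octal, negative hex
        System.out.println(decodeAll("42", "0x2A", "#2A", "052", "-0x2A")); // [42, 42, 42, 42, -42]
        // parseInt treats the same text as plain decimal
        System.out.println(Integer.parseInt("052")); // 52
    }
}
```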

All of the methods above throw NumberFormatException if the input cannot be parsed.

@Test(expected = NumberFormatException.class)
public void givenInvalidInput_whenParsingInt_shouldThrow() {
String givenString = "nan";
Integer.parseInt(givenString);
}

5. Using Guava's Ints.tryParse(), which returns null instead of throwing when parsing fails

@Test
public void givenString_whenTryParse_shouldConvertToInt() {
String givenString = "42";

Integer result = Ints.tryParse(givenString);

assertThat(result).isEqualTo(42);
}

Summary: the native Java approaches are simple, but handling the parse exception on every call is tedious, so Guava's Ints.tryParse is recommended.
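
If Guava is not available, a similar "no exception" style can be approximated with the JDK. The helper below is a hypothetical sketch that returns an OptionalInt rather than null; it is not equivalent to Ints.tryParse in every edge case, since it simply delegates to Integer.parseInt.

```java
import java.util.OptionalInt;

// Hypothetical JDK-only helper in the spirit of a non-throwing parse.
final class TryParseSketch {

    // Returns an empty OptionalInt instead of throwing when the input is not a valid int.
    static OptionalInt tryParseInt(String s) {
        if (s == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParseInt("42").getAsInt());  // 42
        System.out.println(tryParseInt("nan").isPresent()); // false
    }
}
```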


Using StringTokenizer in Java

StringTokenizer is commonly used to split a String on specific delimiters.

1. Splitting on commas and iterating over the tokens

public List<String> getTokens(String str) {
List<String> tokens = new ArrayList<>();
StringTokenizer tokenizer = new StringTokenizer(str, ",");
while (tokenizer.hasMoreElements()) {
tokens.add(tokenizer.nextToken());
}
return tokens;
}

2. Using Java 8 streams

public List<String> getTokensWithCollection(String str) {
return Collections.list(new StringTokenizer(str, ",")).stream()
.map(token -> (String) token) // the elements are typed as Object, so a cast is needed
.collect(Collectors.toList());
}

3. Custom delimiter

tokens.add(tokenizer.nextToken("e"));
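
Passing a delimiter to nextToken changes the tokenizer's delimiter set for that call and for every call after it. The sketch below, with hypothetical names and input, shows the effect: once ";" becomes the delimiter, the comma is treated as ordinary text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch of switching delimiters in the middle of tokenizing.
final class DelimiterSwitchSketch {

    static List<String> demo() {
        StringTokenizer tokenizer = new StringTokenizer("a,b;c;d", ",");
        List<String> tokens = new ArrayList<>();
        tokens.add(tokenizer.nextToken());    // "a"  (delimiter is ",")
        tokens.add(tokenizer.nextToken(";")); // ",b" (delimiter is now ";", the comma is plain text)
        tokens.add(tokenizer.nextToken());    // "c"  (";" stays in effect)
        tokens.add(tokenizer.nextToken());    // "d"
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a, ,b, c, d]
    }
}
```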

Reading a CSV file

public List<String> getTokensFromFile( String path , String delim ) {
List<String> tokens = new ArrayList<>();
String currLine = "";
StringTokenizer tokenizer;
try (BufferedReader br = new BufferedReader(
new InputStreamReader(Application.class.getResourceAsStream(
"/" + path )))) {
while (( currLine = br.readLine()) != null ) {
tokenizer = new StringTokenizer( currLine , delim );
while (tokenizer.hasMoreElements()) {
tokens.add(tokenizer.nextToken());
}
}
} catch (IOException e) {
e.printStackTrace();
}
return tokens;
}

Tests

public class TokenizerTest {

private MyTokenizer myTokenizer = new MyTokenizer();
private List<String> expectedTokensForString = Arrays.asList(
"Welcome" , "to" , "baeldung.com" );
private List<String> expectedTokensForFile = Arrays.asList(
"1" , "IND" , "India" ,
"2" , "MY" , "Malaysia" ,
"3", "AU" , "Australia" );

@Test
public void givenString_thenGetListOfString() {
String str = "Welcome,to,baeldung.com";
List<String> actualTokens = myTokenizer.getTokens( str );

assertEquals( expectedTokensForString, actualTokens );
}

@Test
public void givenFile_thenGetListOfString() {
List<String> actualTokens = myTokenizer.getTokensFromFile(
"data.csv", "|" );

assertEquals( expectedTokensForFile , actualTokens );
}
}

Using StringJoiner, added in Java 8

Java 8's StringJoiner joins strings with a delimiter and an optional prefix and suffix.

1. Adding elements

@Test
public void whenAddingElements_thenJoinedElements() {
StringJoiner joiner = new StringJoiner(",", PREFIX, SUFFIX);
joiner.add("Red")
.add("Green")
.add("Blue");

assertEquals("[Red,Green,Blue]", joiner.toString());
}

2. Adding elements in a for loop

@Test
public void whenAddingListElements_thenJoinedListElements() {
List<String> rgbList = new ArrayList<>();
rgbList.add("Red");
rgbList.add("Green");
rgbList.add("Blue");

StringJoiner rgbJoiner = new StringJoiner(
",", PREFIX, SUFFIX);

for (String color : rgbList) {
rgbJoiner.add(color);
}

assertEquals("[Red,Green,Blue]", rgbJoiner.toString());
}

Using the constructors

private String PREFIX = "[";
private String SUFFIX = "]";

@Test
public void whenEmptyJoinerWithoutPrefixSuffix_thenEmptyString() {
StringJoiner joiner = new StringJoiner(",");

assertEquals(0, joiner.toString().length());
}

@Test
public void whenEmptyJoinerJoinerWithPrefixSuffix_thenPrefixSuffix() {
StringJoiner joiner = new StringJoiner(
",", PREFIX, SUFFIX);

assertEquals(PREFIX + SUFFIX, joiner.toString());
}

Merging joiners

@Test
public void whenMergingJoiners_thenReturnMerged() {
StringJoiner rgbJoiner = new StringJoiner(
",", PREFIX, SUFFIX);
StringJoiner cmybJoiner = new StringJoiner(
"-", PREFIX, SUFFIX);

rgbJoiner.add("Red")
.add("Green")
.add("Blue");
cmybJoiner.add("Cyan")
.add("Magenta")
.add("Yellow")
.add("Black");

rgbJoiner.merge(cmybJoiner);

assertEquals(
"[Red,Green,Blue,Cyan-Magenta-Yellow-Black]",
rgbJoiner.toString());
}

Using streams

@Test
public void whenUsedWithinCollectors_thenJoined() {
List<String> rgbList = Arrays.asList("Red", "Green", "Blue");
String commaSeparatedRGB = rgbList.stream()
.map(color -> color.toString())
.collect(Collectors.joining(","));

assertEquals("Red,Green,Blue", commaSeparatedRGB);
}

Summary: StringJoiner works well for building a simple delimited string, and streams are an alternative.
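
To show that the two approaches are interchangeable for the prefix and suffix case as well, the sketch below builds the same string with a StringJoiner and with Collectors.joining. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collectors;

// Hypothetical sketch comparing StringJoiner with the three-argument Collectors.joining.
final class JoiningSketch {

    // Explicit StringJoiner with delimiter, prefix, and suffix.
    static String withJoiner(List<String> items) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        items.forEach(joiner::add);
        return joiner.toString();
    }

    // The same result expressed as a stream collector.
    static String withCollector(List<String> items) {
        return items.stream().collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> rgb = Arrays.asList("Red", "Green", "Blue");
        System.out.println(withJoiner(rgb));    // [Red,Green,Blue]
        System.out.println(withCollector(rgb)); // [Red,Green,Blue]
    }
}
```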


Pattern name

  1. what
  2. who
  3. when
  4. where
  5. why
  6. how to do
  7. how much

Proxy pattern

what:
