Splitting Strings on Spaces, Preserving Quoted Substrings in Java
Java's String.split is versatile, but splitting on whitespace alone breaks a quoted substring into separate pieces. To split a string on spaces while treating each quoted substring as a single token, consider the following method:
The solution uses a regular expression whose pattern matches each token as either an unquoted run of non-whitespace characters or a quoted sequence of one or more characters, matched lazily up to the next closing quote. The result is a list of tokens in which each quoted substring stays intact.
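    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // str is the input string to tokenize.
    // Group 1 captures either an unquoted run of non-whitespace or a quoted run;
    // the trailing \\s* consumes the whitespace that separates tokens.
    Pattern pattern = Pattern.compile("([^\"]\\S*|\".+?\")\\s*");
    Matcher matcher = pattern.matcher(str);
    List<String> tokens = new ArrayList<>();
    while (matcher.find()) {
        String token = matcher.group(1);
        // Remove the quote characters if they are not wanted in the result
        tokens.add(token.replace("\"", ""));
    }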
For example, the input string Location "Welcome to india" Bangalore Channai "IT city" Mysore is tokenized as:
    Location
    Welcome to india
    Bangalore
    Channai
    IT city
    Mysore
This method keeps quoted substrings intact, so phrases like "Welcome to india" or "IT city" come back as single tokens. Note that the pattern does not handle escaped quotes inside a quoted substring.
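To try the approach end to end, here is a minimal, self-contained sketch that wraps the same pattern in a helper method and compares it with a plain whitespace split. The class name QuotedSplitter and the method name tokenize are hypothetical names chosen for illustration. Unlike the snippet above, this version trims the input first and strips only a surrounding pair of quotes rather than every quote character; both are assumptions about the desired behavior, not part of the original method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
final class QuotedSplitter {

    // Same pattern as in the article: an unquoted run of non-whitespace
    // or a quoted run, followed by optional separating whitespace.
    private static final Pattern TOKEN =
            Pattern.compile("([^\"]\\S*|\".+?\")\\s*");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        // Assumption: leading and trailing whitespace is not significant.
        Matcher matcher = TOKEN.matcher(input.trim());
        while (matcher.find()) {
            String token = matcher.group(1);
            // Strip only a surrounding pair of quotes, leaving any
            // quote characters inside an unquoted token untouched.
            if (token.length() >= 2
                    && token.startsWith("\"")
                    && token.endsWith("\"")) {
                token = token.substring(1, token.length() - 1);
            }
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String str =
                "Location \"Welcome to india\" Bangalore Channai \"IT city\" Mysore";

        // A plain whitespace split cuts the quoted phrases apart.
        System.out.println(str.split("\\s+").length); // prints 9

        // The regex-based tokenizer keeps each quoted phrase together.
        System.out.println(tokenize(str));
        // prints [Location, Welcome to india, Bangalore, Channai, IT city, Mysore]
    }
}
```

Compiling the pattern once as a static constant avoids recompiling it on every call, which matters if the helper runs in a loop.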