String segmentation in Python
In Python, you can split a string into words either with the str.split() method, which splits on a delimiter string, or with re.split(), which splits on a regular expression. By default, str.split() splits on runs of whitespace characters (including spaces, tabs, and newlines) and ignores leading and trailing whitespace.
Use the default delimiter
The following code demonstrates how to split a string into a list of words using the default delimiter:
    text = "many fancy word \nhello \thi"
    words = text.split()  # no argument: split on any run of whitespace
    print(words)
    # Output: ['many', 'fancy', 'word', 'hello', 'hi']
In this example, the string text is split into the following word list: ['many', 'fancy', 'word', 'hello', 'hi'].
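As a small illustrative sketch (the sample string here is made up for the demonstration, not taken from the article), it may help to compare the default behavior with an explicit single-space delimiter. With no argument, runs of whitespace are collapsed and the edges are trimmed; with an explicit " " delimiter, every single space counts, so empty strings can appear in the result:

```python
# Hypothetical sample input with leading, repeated, and trailing whitespace.
text = "  many  fancy word \n"

# No argument: any run of whitespace is one delimiter, edges are ignored.
default_words = text.split()

# Explicit single-space delimiter: each space splits, so empty strings
# show up, and the trailing newline is kept as its own item.
space_words = text.split(" ")

print(default_words)  # ['many', 'fancy', 'word']
print(space_words)    # ['', '', 'many', '', 'fancy', 'word', '\n']
```

For plain word splitting, calling split() without arguments is usually the simpler choice.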
Use a regular expression delimiter
You can also use a regular expression as the delimiter by calling re.split() from the re module (str.split() itself does not accept regular expressions). This lets you tokenize strings based on more complex patterns.
The following code demonstrates how to use a regular expression to split a string into a list of words, where any run of one or more consecutive whitespace characters is treated as a single delimiter:
    import re

    text = "many fancy word \nhello \thi"
    white_space_regex = r"\s+"  # one or more whitespace characters
    words = re.split(white_space_regex, text)
    print(words)
    # Output: ['many', 'fancy', 'word', 'hello', 'hi']
In this case, the regular expression r"\s+" matches one or more consecutive whitespace characters, so every run of whitespace between two words acts as a single delimiter and the string is split into a list of words.
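Two details of re.split() are worth a quick sketch. Unlike str.split() with no argument, re.split() returns empty strings when the text starts or ends with a match, so stripping the text first is one way to avoid them. A character class can also extend the delimiter beyond whitespace. The sample strings and the pattern r"[\s,;]+" below are illustrative assumptions, not part of the original example:

```python
import re

# Hypothetical input with leading and trailing whitespace.
text = "  many fancy word \n"

# Matches at the edges produce empty strings in the result.
raw = re.split(r"\s+", text)

# Stripping first removes the edge matches.
trimmed = re.split(r"\s+", text.strip())

# A more complex delimiter: any run of whitespace, commas, or semicolons.
mixed = "many,fancy; word\thello"
tokens = re.split(r"[\s,;]+", mixed)

print(raw)      # ['', 'many', 'fancy', 'word', '']
print(trimmed)  # ['many', 'fancy', 'word']
print(tokens)   # ['many', 'fancy', 'word', 'hello']
```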