For Chinese text, especially in this format, I don't recommend regular expressions, although they can be made to work with some effort:
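# coding: utf8
import re

filename = '2.txt'
# Match the leading number, then capture the next two whitespace-delimited fields.
pattern = re.compile(r'^\d+ (\S+).*?(\S+)')
with open(filename, encoding='utf-8') as f:  # assumes a UTF-8 encoded file
    for line in f:
        match = pattern.match(line.rstrip('\n'))
        if match:
            print(match.group(1), match.group(2))
# Output: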
男 北京
女 河北
男 山东
Alternatively, use the split method (recommended):
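# coding: utf8
filename = '2.txt'
with open(filename, encoding='utf-8') as f:  # assumes a UTF-8 encoded file
    for line in f:
        fields = line.split()  # split on any run of whitespace
        if len(fields) >= 2:   # skip blank or malformed lines
            print(fields[1], fields[-1])
# Output: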
男 北京
女 河北
男 山东
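To illustrate why split is the easier choice here, the rough sketch below runs both approaches on a few in-memory lines. The sample lines and the helper names extract_with_regex and extract_with_split are made up for this comparison; the real layout of 2.txt is not shown, so treat the data as an assumption.

```python
import re

# Same pattern as in the regex version: a number, one ASCII space, then two fields.
pattern = re.compile(r'^\d+ (\S+).*?(\S+)')

def extract_with_regex(line):
    """Return the two captured fields, or None if the line does not match."""
    m = pattern.match(line)
    return m.groups() if m else None

def extract_with_split(line):
    """Return (second field, last field), or None if there are fewer than two fields."""
    fields = line.split()  # no argument: split on any run of whitespace
    return (fields[1], fields[-1]) if len(fields) >= 2 else None

# Hypothetical sample lines (assumed, not taken from the original file).
samples = [
    '1 男 北京',      # single ASCII spaces
    '2\t女\t河北',    # tab-separated
    '3  男   山东',   # repeated spaces
]

for line in samples:
    print(extract_with_regex(line), extract_with_split(line))
```

With these samples, the pattern only matches the first line because it expects exactly one ASCII space after the number, while split handles all three. The regex could be loosened to cope with that, but split needs no tuning.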