json - Removing colons from a string with a regular expression in Python
黄舟 2017-04-18 10:30:40

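I'm new to Python and have recently been trying to scrape some data. The values in the JSON string contain colons, and I need to remove them. My code is below.
Both a and b are strings whose values contain colons.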

import re
a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"
result = re.sub('^(?:Title|cmp|cmpesc):.+(\:)','', a)

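Running this leaves only " Customer Experience + Innovation (CX+I) Intern Brands'"; everything before it is deleted. What I want is to remove only the colon after Intern (the colon after Title must stay).
How should I change the code?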
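One way to see why the prefix disappears is to look at what the pattern actually matches: `re.sub` replaces the whole match, not just the captured group. The sketch below reuses the string `a` from the question; the names `pattern`, `m`, and `fixed` are only illustrative, and it assumes the pattern does match.

```python
import re

a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"

# Same pattern as in the question, written as a raw string.
pattern = r'^(?:Title|cmp|cmpesc):.+(:)'

m = re.search(pattern, a)
print(m.group(0))  # Title:'Intern:  <- everything re.sub would replace
print(m.group(1))  # :               <- the one character that should go

# Cut out only the captured colon, using its position in the string.
fixed = a[:m.start(1)] + a[m.end(1):]
print(fixed)  # Title:'Intern Customer Experience + Innovation (CX+I) Intern Brands'
```

Because `.+` is greedy, the captured colon is always the last one in the string, so this only handles one value per string.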


All replies (4)
大家讲道理
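import re
# The key and its colon are captured together; the greedy (.+) then runs up to
# the last colon in the string, which is the only one dropped.
result = re.sub(r'^((?:Title|cmp|cmpesc):)(.+):(.*)',
                r'\1\2\3',
                "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'")

print(result) # Title:'Intern Customer Experience + Innovation (CX+I) Intern Brands'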
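The pattern above removes only the last colon in the string, so for `b` it would leave the first value untouched. A possible variant, assuming every value is wrapped in single quotes and contains no embedded single quote, is to let `re.sub` call a function on each quoted value. The helper name `strip_value_colons` is made up for this sketch.

```python
import re

def strip_value_colons(text):
    # Match each single-quoted value and drop the colons inside it only;
    # colons outside the quotes (after the keys) are never touched.
    return re.sub(r"'[^']*'", lambda m: m.group(0).replace(":", ""), text)

a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"

print(strip_value_colons(a))  # Title:'Intern Customer Experience + Innovation (CX+I) Intern Brands'
print(strip_value_colons(b))  # cmp:'Adecco USA',cmpesc:'Adecco USA'
```

This does not need a list of key names, which may help if the scraped data has more keys than the three shown.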
PHPzhong

In that case, this should do it:

''.join(re.split(r'(?<!Title)(?<!cmpesc)(?<!cmp):', a))
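A note on the lookbehind: square brackets as in `(?<![Title|cmp|cmpesc])` form a character class, so that pattern only checks whether the single character before the colon is one of those letters. Python's `re` also requires lookbehinds to have a fixed width, so alternatives of different lengths cannot go into one lookbehind; chaining one lookbehind per key is a common workaround. The sample string `s` below is made up to show the difference.

```python
import re

# Hypothetical sample: the value ends in "s", a letter that is in the class.
s = "Title:'Sales: Manager'"

# Character class: the colon after "Sales" is kept because "s" is in the set.
with_class = ''.join(re.split(r'(?<![Title|cmp|cmpesc]):', s))
print(with_class)  # Title:'Sales: Manager'

# Chained fixed-width lookbehinds: only colons right after a key name are kept.
with_chain = ''.join(re.split(r'(?<!Title)(?<!cmpesc)(?<!cmp):', s))
print(with_chain)  # Title:'Sales Manager'
```

Even the chained form would keep a colon that follows a value ending in "Title", "cmp", or "cmpesc", so it is a sketch under the assumption that values do not end in a key name.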

巴扎黑

Turns out I misread the question after all....

小葫芦

There's no need to remove the colons; just turn the string into a dict directly~

>>> a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
>>> b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"
>>> dict([s.split(':',1) for s in a.split(',')])
{'Title': "'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"}
>>> dict([s.split(':',1) for s in b.split(',')])
{'cmpesc': "'Adecco: USA'", 'cmp': "'Adecco: USA'"}

Written as a function:

a = "Title:'Intern: Customer Experience + Innovation (CX+I) Intern Brands'"
b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"

def fn(x):
    return dict((s.split(':',1) for s in x.replace("'","").split(',')))

print(fn(a))
print(fn(b))

# {'Title': 'Intern: Customer Experience + Innovation (CX+I) Intern Brands'}
# {'cmp': 'Adecco: USA', 'cmpesc': 'Adecco: USA'}
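Splitting on `,` works for the two samples, but a value that itself contains a comma would be cut in the wrong place. A possible alternative, assuming every pair has the form key:'value' with a word-like key and no single quote inside the value, is to pull the pairs out with `re.findall`. The function name `parse_pairs` and the sample `c` are made up for this sketch.

```python
import re

def parse_pairs(text):
    # Each pair looks like key:'value'; the value may hold colons or commas.
    return dict(re.findall(r"(\w+):'([^']*)'", text))

b = "cmp:'Adecco: USA',cmpesc:'Adecco: USA'"
c = "cmp:'Adecco, Inc: USA',cmpesc:'Adecco'"  # hypothetical value with a comma

print(parse_pairs(b))  # {'cmp': 'Adecco: USA', 'cmpesc': 'Adecco: USA'}
print(parse_pairs(c))  # {'cmp': 'Adecco, Inc: USA', 'cmpesc': 'Adecco'}
```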