Fast Punctuation Removal with Pandas
Punctuation removal is a common text cleaning task. While pandas str.replace is a widely used method, it may not be sufficiently performant for large datasets.
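For reference, the str.replace baseline can be sketched as follows. The sample DataFrame and the variable name baseline are assumptions made for illustration; only the column name text comes from the article's code.

```python
import pandas as pd

# Hypothetical sample data; the column name 'text' matches the article's code.
df = pd.DataFrame({'text': ['Hello, world!', 'a.b,c', 'no punctuation']})

# Baseline: regex replacement through the .str accessor.
# regex=True is passed explicitly because the default became False in pandas 2.0.
baseline = df['text'].str.replace(r'[^\w\s]+', '', regex=True)

print(baseline.tolist())  # ['Hello world', 'abc', 'no punctuation']
```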
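Alternatives to str.replace:
re.sub applied to each string in a list comprehension, and str.translate applied once to the whole column joined into a single string (both are shown under Code).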
Benchmarks:
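Relative timings depend on the data, the machine and the pandas version, so it is worth measuring on your own data. The following is a minimal timing sketch, not a definitive benchmark: the test data, the function names and the use of string.punctuation are all assumptions made for illustration.

```python
import re
import string
import timeit

import pandas as pd

# Hypothetical test data: 50,000 short strings containing punctuation.
df = pd.DataFrame({'text': ['Hello, world! (test) a.b;c'] * 50_000})

pattern = re.compile(r'[^\w\s]+')                      # compiled once, reused
transtab = str.maketrans('', '', string.punctuation)   # maps punctuation to None

def with_str_replace():
    return df['text'].str.replace(r'[^\w\s]+', '', regex=True).tolist()

def with_re_sub():
    return [pattern.sub('', x) for x in df['text'].tolist()]

def with_translate():
    return [x.translate(transtab) for x in df['text'].tolist()]

# Average wall-clock time over three runs of each approach.
for fn in (with_str_replace, with_re_sub, with_translate):
    seconds = timeit.timeit(fn, number=3)
    print(f'{fn.__name__}: {seconds / 3:.3f} s per run')
```

Note that the regex keeps underscores (they match \w) while string.punctuation removes them, so the approaches agree only on data without underscores, as is the case for this sample.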
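Considerations:
Both alternatives expect every value in the column to be a string; a NaN raises a TypeError in re.sub and in str.join. The str.translate approach also joins the column on '|', so that character must not occur in the data.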
Code:
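import re
import pandas as pd

# df is assumed to be an existing DataFrame with a string column 'text'.

# Option 1: re.sub in a list comprehension (pattern compiled once, outside the loop)
pattern = re.compile(r'[^\w\s]+')
df['text'] = [pattern.sub('', x) for x in df['text'].tolist()]

# Option 2: str.translate on the joined column
# '|' is left out of punct on purpose: it is the join separator, so it must
# survive translate() for split('|') to recover the original rows.
punct = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{}~'
transtab = str.maketrans(dict.fromkeys(punct, ''))
df['text'] = '|'.join(df['text'].tolist()).translate(transtab).split('|')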
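When the data may contain the separator character or missing values, a per-element variant of str.translate avoids the join and split step entirely. This is a sketch under assumptions rather than the article's method: the sample Series and the names cleaned and result are hypothetical, and it gives up some of the speed of translating one large string.

```python
import pandas as pd

# Hypothetical data containing a literal '|' and a missing value.
s = pd.Series(['a|b, c!', None, 'plain'])

# No join separator is needed here, so '|' can be removed like any other character.
punct = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
transtab = str.maketrans('', '', punct)

# Translate each string on its own; leave non-strings (None/NaN) untouched.
cleaned = [x.translate(transtab) if isinstance(x, str) else x for x in s.tolist()]
result = pd.Series(cleaned, index=s.index)

print(result.tolist())
```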