A Detailed Explanation of the Approach to Counting Words in Python - Python Tutorial - php.cn

不言

Published: 2018-05-08 16:18:49

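This article walks through an approach to counting words in Python, with solutions built on defaultdict and Counter from the collections module. It also suggests trying the problem without importing extra modules. Interested readers are welcome to follow along.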

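Problem description:

Using Python, implement a function count_words() that takes a string s and a number n and returns the n most frequently occurring words in s. The return value is a list of tuples, each holding a word and its number of occurrences, i.e. [(<word>, <count>), (<word>, <count>), ...], sorted by count in descending order.

You may assume that all input is lowercase and contains no punctuation or other characters (only letters and single spaces). Words with the same count are sorted alphabetically.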

Example:

print(count_words("betty bought a bit of butter but the butter was bitter", 3))


Output:

[('butter', 2), ('a', 1), ('betty', 1)]

Approach:

1. Split the string s on whitespace to get split_s: ['betty', 'bought', 'a', 'bit', 'of', 'butter', 'but', 'the', 'butter', 'was', 'bitter']

2. Build map_list by converting split_s into a list of tuples: [('betty', 1), ('bought', 1), ('a', 1), ('bit', 1), ('of', 1), ('butter', 1), ('but', 1), ('the', 1), ('butter', 1), ('was', 1), ('bitter', 1)]

3. Merge the elements of map_list: when two tuples have the same first item, add their second items together.

// Note: defaultdict is a good fit here. The resulting data is: {'betty': 1, 'bought': 1, 'a': 1, 'bit': 1, 'of': 1, 'butter': 2, 'but': 1, 'the': 1, 'was': 1, 'bitter': 1}

4. Sort alphabetically by key, which gives: [('a', 1), ('betty', 1), ('bit', 1), ('bitter', 1), ('bought', 1), ('but', 1), ('butter', 2), ('of', 1), ('the', 1), ('was', 1)]

5. Sort a second time, by value in descending order, which gives: [('butter', 2), ('a', 1), ('betty', 1), ('bit', 1), ('bitter', 1), ('bought', 1), ('but', 1), ('of', 1), ('the', 1), ('was', 1)]

6. Use slicing to take the n entries with the highest frequency.
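Steps 4 and 5 depend on the second sort leaving equal-count words in the order produced by the first sort, which Python's stable sorted() guarantees. As a minimal sketch of an equivalent alternative (not the method used in this article), the two passes can be folded into a single sort on a composite key. The names counts, two_pass and one_pass are illustrative only.

```python
# Word counts from the example sentence (the result of step 3).
counts = {'betty': 1, 'bought': 1, 'a': 1, 'bit': 1, 'of': 1,
          'butter': 2, 'but': 1, 'the': 1, 'was': 1, 'bitter': 1}

# Steps 4 and 5: alphabetical sort, then a stable sort by count (descending).
two_pass = sorted(counts.items(), key=lambda pair: pair[0])
two_pass = sorted(two_pass, key=lambda pair: pair[1], reverse=True)

# Alternative: one sort on (-count, word). Negating the count puts
# higher counts first while the words still sort in ascending order.
one_pass = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

print(two_pass == one_pass)  # True
print(one_pass[:3])          # [('butter', 2), ('a', 1), ('betty', 1)]
```

The two-pass form mirrors the steps one by one, while the composite key states the whole ordering rule in a single place.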

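Summary: without an explicit sort, the result taken straight from the defaultdict may happen to look correct on Python 3, but it is not correct on Python 2. A defaultdict does not provide the ordering this problem requires, so the items must be sorted to produce the final list.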
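To illustrate why the explicit sort matters, here is a small sketch. On Python 3.7 and later, a defaultdict (like any dict) iterates in insertion order, which is the order in which words first appear and not the order the problem asks for. The input string is a made-up example, not one from the article.

```python
from collections import defaultdict

counts = defaultdict(int)
for word in "mat bat cat bat cat cat".split():
    counts[word] += 1

# Insertion order on Python 3.7+: the order in which words first appeared.
unsorted_items = list(counts.items())
print(unsorted_items)  # [('mat', 1), ('bat', 2), ('cat', 3)]

# Required order: alphabetical first, then a stable sort by count (descending).
ranked = sorted(counts.items(), key=lambda pair: pair[0])
ranked = sorted(ranked, key=lambda pair: pair[1], reverse=True)
print(ranked)  # [('cat', 3), ('bat', 2), ('mat', 1)]
```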

You can also try writing it yourself without importing any modules.
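One way to do that, as a minimal sketch that uses only a plain dict and built-in functions. The name count_words_plain is chosen here for illustration and does not come from the original article.

```python
def count_words_plain(s, n):
    """Return the n most frequent words in s using only built-ins."""
    counts = {}
    for word in s.split():
        # dict.get supplies 0 the first time a word is seen.
        counts[word] = counts.get(word, 0) + 1
    # Alphabetical first, then a stable sort by count (descending).
    ranked = sorted(counts.items(), key=lambda pair: pair[0])
    ranked = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


print(count_words_plain("betty bought a bit of butter but the butter was bitter", 3))
# [('butter', 2), ('a', 1), ('betty', 1)]
```

The only difference from a defaultdict-based version is the counting step: dict.get(word, 0) plays the role that defaultdict(int) plays for missing keys.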

Solution 1 (using defaultdict):

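"""Count words."""
from collections import defaultdict


def count_words(s, n):
    """Return the n most frequently occurring words in s."""
    # Step 1: split on whitespace.
    split_s = s.split()
    # Step 2: pair every word with a count of 1.
    map_list = [(k, 1) for k in split_s]
    # Step 3: merge pairs that share a word; missing keys start at 0.
    output = defaultdict(int)
    for word, count in map_list:
        output[word] += count
    # Steps 4 and 5: sort alphabetically, then by count (descending).
    # sorted() is stable, so equal counts stay in alphabetical order.
    top_n = sorted(output.items(), key=lambda pair: pair[0])
    top_n = sorted(top_n, key=lambda pair: pair[1], reverse=True)
    # Step 6: keep the first n entries.
    return top_n[:n]


def test_run():
    """Test count_words() with some inputs."""
    print(count_words("cat bat mat cat bat cat", 3))
    # [('cat', 3), ('bat', 2), ('mat', 1)]
    print(count_words("betty bought a bit of butter but the butter was bitter", 4))
    # [('butter', 2), ('a', 1), ('betty', 1), ('bit', 1)]


if __name__ == '__main__':
    test_run()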


Solution 2 (using Counter):

"""Count words."""
from collections import Counter


def count_words(s, n):
    """Return the n most frequently occurring words in s."""
    # Counter tallies the words directly from the split list.
    counts = Counter(s.split())
    # Sort alphabetically, then by count (descending); the second sort
    # is stable, so equal counts stay in alphabetical order.
    top_n = sorted(counts.items(), key=lambda pair: pair[0])
    top_n = sorted(top_n, key=lambda pair: pair[1], reverse=True)
    return top_n[:n]


def test_run():
    """Test count_words() with some inputs."""
    print(count_words("cat bat mat cat bat cat", 3))
    # [('cat', 3), ('bat', 2), ('mat', 1)]
    print(count_words("betty bought a bit of butter but the butter was bitter", 4))
    # [('butter', 2), ('a', 1), ('betty', 1), ('bit', 1)]


if __name__ == '__main__':
    test_run()
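Counter also has a built-in most_common(n) method, which looks like a shortcut here. As a hedged sketch of why the solution sorts manually instead: on Python 3.7 and later, most_common() returns words with equal counts in the order they were first encountered, not alphabetically, so on its own it does not satisfy the tie-breaking rule in the problem statement.

```python
from collections import Counter

text = "betty bought a bit of butter but the butter was bitter"
counts = Counter(text.split())

# Ties come back in first-seen order, so 'a' is not ranked second.
shortcut = counts.most_common(3)
print(shortcut)  # [('butter', 2), ('betty', 1), ('bought', 1)]

# The two-pass sort applies the alphabetical tie-break.
ranked = sorted(counts.items(), key=lambda pair: pair[0])
ranked = sorted(ranked, key=lambda pair: pair[1], reverse=True)[:3]
print(ranked)    # [('butter', 2), ('a', 1), ('betty', 1)]
```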
