python - Read a txt file and find the words that match a jumbled word

Question

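I have a dictionary-like txt file. After reading it, the program should take a string of scrambled letters and find the words that match it (Word Jumble).
It should also print how many of all those words matched.
The output should look like this: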

ringa_lee · Answer

First, string.split() splits a string into a list, while list.append() adds its entire argument to the list as a single element. So this line in your code

wordlist.append(line.split(','))

turns wordlist into a list of lists: every element of wordlist is itself a list, not the individual word you expect. You should use:

wordlist.extend(line.split(','))

or

wordlist += line.split(',')
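A quick check of the difference between the three forms. The sample string "abbe,bbae" is made up for illustration; it is not from the asker's dictionary file.

```python
# Hypothetical sample line holding two comma-separated words.
line = "abbe,bbae"

nested = []
nested.append(line.split(','))   # append adds the whole list as ONE element
flat = []
flat.extend(line.split(','))     # extend adds each element of the list
also_flat = []
also_flat += line.split(',')     # += on a list behaves like extend

print(nested)     # [['abbe', 'bbae']]
print(flat)       # ['abbe', 'bbae']
print(also_flat)  # ['abbe', 'bbae']
```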

Second, each string returned by readlines() still ends with the newline character, as shown below:

>>> 'abbe\n'.split()
['abbe']
>>> 'abbe\n'.split(',')
['abbe\n']

So line.split(',') should be changed to line.split().
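This newline is why every comparison fails: sorted() treats '\n' as just another character, so the sorted letters of the dictionary word never equal those of the jumble. A small illustration (the jumble "bbae" is a hypothetical input):

```python
jumble = "bbae"
raw = "abbe\n"   # a line as returned by readlines(), newline included

# With split(','), the newline stays attached to the word.
with_newline = raw.split(',')[0]
print(sorted(jumble) == sorted(with_newline))  # False: '\n' is an extra "letter"

# split() with no argument splits on whitespace and drops the newline.
clean = raw.split()[0]
print(sorted(jumble) == sorted(clean))         # True
```

For a file with exactly one word per line, line.strip() gives the same clean word.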

Reference code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import argparse

def jumbler(jumble, dict_file_name):
    """
    Print every word in dict_file_name that is an anagram of jumble,
    then print a summary line.
    inputs:
        jumble: the scrambled letters to solve
        dict_file_name: path to a text file with one word per line
    effects:
        prints each matching word on its own line, then the match count
    """

    # first you must open the file

    # second you must read each word from the file and perform an
    # appropriate comparison of each with 'jumble'; you need to count the
    # number of lines read from the file

    # if a word matches 'jumble', you are to print the word on a line by itself

    # after you have read each word from the file and compared, you need to
    # close the file

    # assume that there were MATCHES words that matched, and NLINES in the file
    # if there was a single match, you need to print
    # "1 match in NLINES words", where NLINES is replaced by the value of NLINES
    # if there were two or more matches, you need to print
    # "MATCHES matches in NLINES words"
    # if there were no matches, you need to print
    # "No matches"
    line_count = 0
    match_count = 0
    key = sorted(jumble)  # sort the jumble's letters once, not per word
    # "with" closes the file automatically when the block ends
    with open(dict_file_name, "r") as dictionary:
        # iterating the file object reads one line at a time
        for line in dictionary:
            line_count += 1
            # split() without an argument also drops the trailing newline
            for word in line.split():
                if sorted(word) == key:
                    match_count += 1
                    print(word)
    if match_count == 0:
        print("No matches")
    elif match_count == 1:
        print("1 match in %d words" % line_count)
    else:
        print("%d matches in %d words" % (match_count, line_count))



def main():
    """
    collect command arguments and invoke jumbler()
    inputs:
        none, fetches arguments using argparse
    effects:
        calls jumbler()
    """
    parser = argparse.ArgumentParser(description="Solve a jumble (anagram)")
    parser.add_argument("jumble", type=str, help="Jumbled word (anagram)")
    parser.add_argument('wordlist', type=str,
                        help="A text file containing dictionary words, one word per line.")
    args = parser.parse_args()  # gets arguments from command line
    jumble = args.jumble
    wordlist = args.wordlist
    jumbler(jumble, wordlist)

if __name__ == "__main__":
    main()
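As an alternative to comparing sorted() lists, collections.Counter can compare letter counts directly. This is a swapped-in technique, not what the answer above uses; the helper name is_anagram and the in-memory word list are hypothetical, standing in for the dictionary file.

```python
from collections import Counter

def is_anagram(jumble, word):
    """Return True if word uses exactly the same letters as jumble."""
    # Counter maps each character to its number of occurrences; equal
    # Counters mean the two strings are permutations of each other.
    return Counter(jumble) == Counter(word)

# Hypothetical dictionary contents, already stripped of newlines.
words = ["abbe", "babe", "bead", "ebba"]
matches = [w for w in words if is_anagram("bbae", w)]
print(matches)  # ['abbe', 'babe', 'ebba']
```

Both approaches give the same answer; sorted() is O(n log n) per word, Counter is O(n), which only matters for long words.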

天蓬老师 · Answer

The cause is simple; the problem is here:

for line in dictornary.readlines():
    wordlist.append(line.split(','))    # line.split() returns a list, so wordlist ends up as a two-dimensional list

A small change to your code should be enough:

wordlist.extend(line.split(','))

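However, storing every word in a list uses a lot of memory, especially if the dictionary file is very large.
Personally, I'd suggest storing only the matching words in the list; there is no need to keep the non-matching ones at all.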
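A minimal sketch of that suggestion, assuming one word per line: iterate the file object directly (readlines() builds a list of every line at once) and keep only the matches. The function name matching_words and the sample data are hypothetical; io.StringIO stands in for the real dictionary file.

```python
import io

def matching_words(jumble, lines):
    """Yield the words in lines that are anagrams of jumble.

    lines may be an open file or any iterable of strings; only the
    current line is held in memory while scanning.
    """
    key = sorted(jumble)
    for line in lines:
        word = line.strip()  # drop the trailing newline
        if word and sorted(word) == key:
            yield word

# In real use: with open("words.txt") as f: matches = list(matching_words(..., f))
fake_file = io.StringIO("abbe\nbead\nebba\n")
matches = list(matching_words("bbae", fake_file))  # only matches are stored
print(matches)  # ['abbe', 'ebba']
```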

I changed it to list.extend as both of you suggested, but the output is still nothing but "No matches"... I don't really understand why.