Summarizing Text Using Hugging Face's BART Model

DDD
Published: 2025-01-07 07:28:40

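In today's fast-paced world, condensing long content into a concise summary is essential, whether you are skimming an article or pulling the key points out of a research paper. Hugging Face makes a powerful summarization model readily available: BART. This article shows how to use a pretrained model from Hugging Face, specifically facebook/bart-large-cnn, to summarize long articles and other text.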

Getting Started with Hugging Face's BART Model

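Hugging Face hosts a wide range of models for NLP tasks such as text classification, translation, and summarization. One of the most popular summarization models is BART (Bidirectional and Auto-Regressive Transformers), which is trained to generate coherent summaries of long documents.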

Step 1: Install the Hugging Face Transformers Library

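To use Hugging Face models, you need to install the Transformers library, along with a deep learning backend such as PyTorch to run the model. You can install both with pip:

pip install transformers torch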
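Before moving on, it can help to confirm that the packages are actually importable in the current environment. The following is a minimal sketch that uses only the standard library; the helper name `missing_packages` is hypothetical (it is not part of Transformers), and checking for `torch` assumes PyTorch is the backend you chose.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "torch" is an assumption: swap it for whichever backend you installed.
    missing = missing_packages(["transformers", "torch"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, rerun the pip command in the same environment that runs your script.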

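Step 2: Import the Summarization Pipeline

Once the library is installed, you can easily load a pretrained model for summarization. Hugging Face's pipeline API provides a high-level interface to models such as facebook/bart-large-cnn, a BART checkpoint fine-tuned for summarization on the CNN/Daily Mail news dataset.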

from transformers import pipeline

# Load the summarization model
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
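By default the pipeline runs on the CPU, which can be slow for a model of this size. As a hedged sketch, the loader below passes the pipeline's `device` argument so that a CUDA GPU is used when one is available. The helper names `pick_device` and `load_summarizer` are hypothetical, and the code assumes PyTorch is the installed backend.

```python
def pick_device():
    """Return 0 (first CUDA GPU) when one is usable, otherwise -1 (CPU)."""
    try:
        import torch  # assumption: PyTorch is the installed backend
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


def load_summarizer(model_name="facebook/bart-large-cnn"):
    """Build a summarization pipeline on the best available device."""
    # Imported lazily so the module loads even before the model is needed.
    from transformers import pipeline

    return pipeline("summarization", model=model_name, device=pick_device())
```

Calling `load_summarizer()` returns an object that can be used exactly like the `summarizer` created above; the first call downloads the model weights, so it needs network access.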

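Step 3: Run the Summarizer

With the summarizer ready, you can pass in any long text to generate a summary. The example below uses a sample article about Dame Maggie Smith, the renowned British actress.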

ARTICLE = """ Dame Margaret Natalie Smith (28 December 1934 – 27 September 2024) was a British actress. Known for her wit in both comedic and dramatic roles, she had an extensive career on stage and screen for over seven decades and was one of Britain's most recognisable and prolific actresses. She received numerous accolades, including two Academy Awards, five BAFTA Awards, four Emmy Awards, three Golden Globe Awards and a Tony Award, as well as nominations for six Olivier Awards. Smith is one of the few performers to earn the Triple Crown of Acting.
Smith began her stage career as a student, performing at the Oxford Playhouse in 1952, and made her professional debut on Broadway in New Faces of '56. Over the following decades Smith established herself alongside Judi Dench as one of the most significant British theatre performers, working for the National Theatre and the Royal Shakespeare Company. On Broadway, she received the Tony Award for Best Actress in a Play for Lettice and Lovage (1990). She was Tony-nominated for Noël Coward's Private Lives (1975) and Tom Stoppard's Night and Day (1979).
Smith won Academy Awards for Best Actress for The Prime of Miss Jean Brodie (1969) and Best Supporting Actress for California Suite (1978). She was Oscar-nominated for Othello (1965), Travels with My Aunt (1972), A Room with a View (1985) and Gosford Park (2001). She portrayed Professor Minerva McGonagall in the Harry Potter film series (2001–2011). She also acted in Death on the Nile (1978), Hook (1991), Sister Act (1992), The Secret Garden (1993), The Best Exotic Marigold Hotel (2012), Quartet (2012) and The Lady in the Van (2015).
Smith received newfound attention and international fame for her role as Violet Crawley in the British period drama Downton Abbey (2010–2015). The role earned her three Primetime Emmy Awards; she had previously won one for the HBO film My House in Umbria (2003). Over the course of her career she was the recipient of numerous honorary awards, including the British Film Institute Fellowship in 1993, the BAFTA Fellowship in 1996 and the Society of London Theatre Special Award in 2010. Smith was made a dame by Queen Elizabeth II in 1990.
"""

# Generate the summary
summary = summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)

# Print the summary
print(summary)

Output:

[{'summary_text': 'Dame Margaret Natalie Smith (28 December 1934 – 27 September 2024) was a British actress. Known for her wit in both comedic and dramatic roles, she had an extensive career on stage and screen for over seven decades. She received numerous accolades, including two Academy Awards, five BAFTA Awards, four Emmy Awards, three Golden Globe Awards and a Tony Award.'}]
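
As the output shows, the summarizer condenses the article's main points into a short, readable form, highlighting key facts such as the length of her career and her accolades.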
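Note that the pipeline returns a list with one dictionary per input, not a plain string. A small sketch of pulling out just the text is shown below; `summary_texts` is a hypothetical helper name, and `sample_output` is a shortened stand-in for the real output above.

```python
def summary_texts(results):
    """Collect the 'summary_text' string from each result dictionary."""
    return [item["summary_text"].strip() for item in results]


# Shortened stand-in with the same shape as the pipeline output shown above.
sample_output = [
    {"summary_text": " Dame Margaret Natalie Smith was a British actress. "}
]

for text in summary_texts(sample_output):
    print(text)
```

With the real pipeline, `summary_texts(summary)[0]` would give the summary of the first (and here only) input as a string.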

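Another Approach: Summarizing Text from a File

In some use cases, you may want to read the text from a file rather than from a hard-coded string. Below is an updated Python script that reads the article from a text file and generates a summary.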

from transformers import pipeline

# Load the summarizer pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Function to read the article from a text file
def read_article_from_file(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()

# Path to the text file containing the article
file_path = 'article.txt'  # Change this to your file path

# Read the article from the file
ARTICLE = read_article_from_file(file_path)

# Get the summary
summary = summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)

# Print the summary
print(summary)

File input:

In this case, save the article to a text file (article.txt in the example); the script reads its contents and summarizes them.
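Files can be much longer than the sample article, and facebook/bart-large-cnn accepts at most 1024 input tokens. One possible workaround, sketched below, is to split the text into chunks, summarize each chunk, and join the results. This is an illustration under stated assumptions, not the library's own method: the helpers `chunk_words` and `summarize_long` are hypothetical, and the 400-word chunk size is a conservative guess because words only approximate tokens.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long(text, summarizer, **kwargs):
    """Summarize each chunk separately and join the partial summaries.

    `summarizer` is any callable that behaves like the Transformers
    summarization pipeline: it takes a string and returns a list
    containing one dict with a 'summary_text' key.
    """
    parts = []
    for chunk in chunk_words(text):
        result = summarizer(chunk, **kwargs)
        parts.append(result[0]["summary_text"].strip())
    return " ".join(parts)


# Usage with the pipeline loaded earlier (requires the model download):
# print(summarize_long(ARTICLE, summarizer,
#                      max_length=130, min_length=30, do_sample=False))
```

Splitting on a fixed word count can cut a sentence in half, and a very short final chunk may trigger a length warning from the pipeline, so splitting on paragraph or sentence boundaries may give cleaner results.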

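Conclusion

Hugging Face's BART model is an excellent tool for automatic text summarization. Whether you are working with long articles, research papers, or any other large body of text, the model can help you distill the information into a concise summary.

This article showed how to integrate Hugging Face's pretrained summarization model into your project, using both hard-coded text and file input. With just a few lines of code, you can have an efficient summarization pipeline up and running in your Python project.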

Source: dev.to