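Problem:
You have a text field where users can enter arbitrary text, and you need to extract all YouTube video URLs and their corresponding IDs.
Solution:
To extract YouTube video IDs from a string with a regular expression, follow these steps:
Define the regular expression pattern: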
https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*
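Explanation:
The pattern matches http:// or https://, an optional subdomain, and then either youtu.be/ or a youtube.com / youtube-nocookie.com URL. The 11-character video ID is captured in group 1. The negative lookahead rejects URLs that are followed by a closing quote and ">" or by "</a>", which appears intended to skip URLs that are already part of an HTML link. The final character class consumes the rest of the query string. Because the subdomain class is written as [0-9A-Z-], the pattern must be applied case-insensitively, otherwise a lowercase "www." will not match.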
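The same pattern can be written with re.VERBOSE so that each piece carries its own comment. This is only a sketch for readability: the name YOUTUBE_ID_VERBOSE is made up for this example, and the comments describe how the pieces appear to work rather than quoting any official specification.

```python
import re

# Hypothetical name; the pattern is the one above, split across lines.
YOUTUBE_ID_VERBOSE = re.compile(
    r"""
    https?:\/\/                    # scheme: http:// or https://
    (?:[0-9A-Z-]+\.)?              # optional subdomain such as www.
    (?:
        youtu\.be\/                # short-link host, or
      | youtube(?:-nocookie)?\.com # youtube.com or youtube-nocookie.com
        \S*?                       # shortest run of path/query characters
        [^\w\s-]                   # one separator right before the ID
    )
    ([\w-]{11})                    # group 1: the 11-character video ID
    (?=[^\w-]|$)                   # the ID must end here
    (?!                            # reject URLs that sit inside markup:
        [?=&+%\w.-]*               #   rest of the URL
        (?:
            ['"][^<>]*>            #   then a closing quote and a tag end
          | </a>                   #   or the end of an anchor element
        )
    )
    [?=&+%\w.-]*                   # consume the remaining query string
    """,
    re.IGNORECASE | re.VERBOSE,
)

print(YOUTUBE_ID_VERBOSE.findall("https://www.youtube.com/watch?v=DUQi_R4SgWo"))
```

With re.VERBOSE, whitespace and # comments outside character classes are ignored, so the compiled pattern behaves like the one-line version.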
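Parse the text with the regular expression:
Use the re.findall function to search text for all YouTube video URLs. Two details matter in Python: the pattern contains both ' and ", so it is written as a triple-quoted raw string, and it must be matched with re.IGNORECASE because the subdomain class [0-9A-Z-] would otherwise reject a lowercase "www".

    import re

    def find_video_ids(text):
        # Triple-quoted raw string: the pattern contains both ' and ".
        pattern = r"""https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*"""
        # IGNORECASE lets [0-9A-Z-] match lowercase subdomains such as "www".
        return re.findall(pattern, text, re.IGNORECASE)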
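Extract the video IDs:
Because the pattern contains exactly one capturing group, re.findall returns the text of that group for each match, that is, the 11-character video IDs rather than the full URLs. No slicing is needed (applying [:11] to an 11-character ID returns it unchanged), so get_video_ids can simply forward the result.

    def get_video_ids(text):
        # find_video_ids already returns the captured 11-character IDs.
        return find_video_ids(text)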
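The problem statement asks for the URLs as well as the IDs. One way to get both, sketched here under the assumption that a list of (url, id) tuples is a convenient shape, is to use re.finditer and read group 0 (the whole match) and group 1 (the ID). The names YOUTUBE_URL and find_video_urls_and_ids are illustrative only.

```python
import re

# Hypothetical helper names; the pattern is the one defined above.
YOUTUBE_URL = re.compile(
    r"""https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*""",
    re.IGNORECASE,
)

def find_video_urls_and_ids(text):
    """Return a list of (full_url, video_id) tuples found in text."""
    # group(0) is the entire matched URL, group(1) is the captured ID.
    return [(m.group(0), m.group(1)) for m in YOUTUBE_URL.finditer(text)]

sample = "see https://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu now"
for url, video_id in find_video_urls_and_ids(sample):
    print(video_id, url)
```

Compiling the pattern once at module level avoids recompiling it on every call, which may matter if many text fields are processed.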
Example:

    text = """
    Lorem Ipsum is simply dummy text. https://www.youtube.com/watch?v=DUQi_R4SgWo
    of the printing and typesetting industry. Lorem Ipsum has been the industry's
    standard dummy text ever since the 1500s, when an unknown printer took a galley
    of type and scrambled it to make a type specimen book. It has survived not only
    five centuries, but also the leap into electronic typesetting, remaining
    essentially unchanged. https://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu
    It was popularised in the 1960s with the release of Letraset sheets containing
    Lorem Ipsum passages, and more recently with desktop publishing software like
    Aldus PageMaker including versions of Lorem Ipsum."""

    video_ids = get_video_ids(text)
    print(video_ids)
    # Output: ['DUQi_R4SgWo', 'A_6gNZCkajU']
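The example above only uses watch?v= links. As a rough check of the other URL shapes the pattern is written for, the sketch below runs it against a short link, a youtube-nocookie.com embed link, and a URL that is already inside an HTML anchor. The helper name extract_ids and the sample strings are made up for illustration.

```python
import re

PATTERN = r"""https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*"""

def extract_ids(text):
    """Hypothetical helper: return every captured video ID in text."""
    return re.findall(PATTERN, text, re.IGNORECASE)

# Short link handled by the youtu.be branch.
print(extract_ids("https://youtu.be/DUQi_R4SgWo"))

# Embed link on the privacy-enhanced host.
print(extract_ids("https://www.youtube-nocookie.com/embed/A_6gNZCkajU?rel=0"))

# URL inside an href attribute: the negative lookahead rejects it.
print(extract_ids('<a href="https://www.youtube.com/watch?v=DUQi_R4SgWo">video</a>'))
```

The last case returns an empty list, which is worth knowing if the input may contain HTML: URLs that are already linked are deliberately left out.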