Problem:
You have a text field where users can input arbitrary text, and you need to extract all YouTube video URLs and their corresponding IDs.
Solution:
To extract YouTube video IDs from strings using a regular expression, follow these steps:
Define the Regex Pattern:
https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*
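Explanation:
- https?:\/\/ matches the http:// or https:// scheme.
- (?:[0-9A-Z-]+\.)? optionally matches one subdomain label such as www. or m. The class lists only uppercase letters, so the pattern must be applied case-insensitively.
- (?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-]) matches either the youtu.be/ short domain, or youtube.com / youtube-nocookie.com followed by any non-space characters up to a character that cannot appear in an ID, such as the = in watch?v= or the / in /embed/.
- ([\w-]{11}) captures the 11-character video ID (letters, digits, underscore and hyphen).
- (?=[^\w-]|$) requires the ID to be followed by a non-ID character or the end of the text, so a longer token is not cut down to 11 characters.
- (?![?=&+%\w.-]*(?:['"][^<>]*>|</a>)) skips URLs that sit inside an HTML attribute or are already the text of an <a> element.
- [?=&+%\w.-]* consumes any remaining query parameters, so the overall match covers the whole URL.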
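One way to check what each part of the pattern accepts is to run it against a few sample URLs. This is a minimal sketch, assuming Python 3.7+; the name YOUTUBE_ID_RE and the sample strings are hypothetical and only reuse an ID from this article's example.

```python
import re

# The pattern from this article, compiled once. It is written as a
# triple-quoted raw string because it contains both ' and " characters.
YOUTUBE_ID_RE = re.compile(
    r'''https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*''',
    re.IGNORECASE,  # lets [0-9A-Z-] match lowercase subdomains such as "www"
)

# Hypothetical inputs, one per feature of the pattern.
samples = {
    "short link": "https://youtu.be/DUQi_R4SgWo",
    "embed URL": "https://www.youtube.com/embed/DUQi_R4SgWo",
    "privacy-enhanced embed": "https://www.youtube-nocookie.com/embed/DUQi_R4SgWo",
    "mobile subdomain": "https://m.youtube.com/watch?v=DUQi_R4SgWo",
    "already linked": '<a href="https://www.youtube.com/watch?v=DUQi_R4SgWo">video</a>',
    "12-character token": "https://www.youtube.com/watch?v=DUQi_R4SgWoX",
}

for label, sample in samples.items():
    # findall returns the captured ID group; [] means no match
    print(f"{label}: {YOUTUBE_ID_RE.findall(sample)}")
```

Run as written, the first four samples should each print ['DUQi_R4SgWo'], while the anchor-tag sample and the 12-character token print []: the first is rejected by the negative lookahead, the second by the (?=[^\w-]|$) check.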
Use the Regex to Parse the Text:
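Pass the pattern to re.findall, together with the re.IGNORECASE flag, to search the text for every YouTube video URL. The flag is required because the subdomain class [0-9A-Z-] lists only uppercase letters; without it, URLs such as https://www.youtube.com/... are not matched at all.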
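```python
import re

def find_video_ids(text):
    # Triple-quoted raw string: the pattern contains both ' and " characters,
    # so a single-quoted r'...' literal would end early.
    pattern = r'''https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*'''
    # re.IGNORECASE lets the subdomain group match lowercase labels such as "www."
    return re.findall(pattern, text, re.IGNORECASE)
```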
Extract the Video IDs:
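Because the pattern contains exactly one capturing group, ([\w-]{11}), re.findall returns the text of that group for each match rather than the whole URL. The result is therefore already a list of 11-character video IDs, and slicing each item with [:11] is unnecessary.

```python
def get_video_ids(text):
    # find_video_ids() already returns only the captured 11-character IDs,
    # so no further processing is needed.
    return find_video_ids(text)
```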
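The problem also asks for the URLs themselves, which re.findall does not return once the pattern has a capturing group. A hedged option is re.finditer: each match object exposes the whole match as group(0) and the ID as group(1). This is a sketch; YOUTUBE_URL_RE and find_video_urls_and_ids are hypothetical names, and the sample text is made up.

```python
import re

# Same pattern as in find_video_ids(), compiled with re.IGNORECASE.
YOUTUBE_URL_RE = re.compile(
    r'''https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*''',
    re.IGNORECASE,
)

def find_video_urls_and_ids(text):
    """Return (full_url, video_id) pairs in the order they appear in text.

    group(0) is the whole match, i.e. the URL including any trailing query
    parameters; group(1) is the captured 11-character video ID.
    """
    return [(m.group(0), m.group(1)) for m in YOUTUBE_URL_RE.finditer(text)]

sample = ("Watch https://youtu.be/DUQi_R4SgWo or "
          "https://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu today.")
for url, video_id in find_video_urls_and_ids(sample):
    print(video_id, url)
```

Because the trailing [?=&+%\w.-]* also accepts ".", a URL that ends a sentence (for example "...watch?v=A_6gNZCkajU.") keeps that final period in group(0); strip it afterwards if that matters. The ID in group(1) is unaffected.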
Example:
```python
text = """
Lorem Ipsum is simply dummy text. https://www.youtube.com/watch?v=DUQi_R4SgWo of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type
and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic
typesetting, remaining essentially unchanged. https://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu It was popularised
in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing
software like Aldus PageMaker including versions of Lorem Ipsum.
"""

video_ids = get_video_ids(text)
print(video_ids)  # Output: ['DUQi_R4SgWo', 'A_6gNZCkajU']
```
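re.findall reports one entry per occurrence, so a video linked twice appears twice in the result. If each ID should be reported only once, one possible approach is to deduplicate with dict.fromkeys, which keeps first-seen order on Python 3.7+. This is a sketch; get_unique_video_ids and PATTERN are hypothetical names, and the sample text is made up.

```python
import re

# Same pattern as above, compiled with re.IGNORECASE.
PATTERN = re.compile(
    r'''https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*''',
    re.IGNORECASE,
)

def get_unique_video_ids(text):
    """Return each video ID once, in order of first appearance."""
    # dict keys are unique and keep insertion order, so this drops repeats
    return list(dict.fromkeys(PATTERN.findall(text)))

text = ("https://youtu.be/DUQi_R4SgWo and again "
        "https://www.youtube.com/watch?v=DUQi_R4SgWo, then "
        "https://www.youtube.com/watch?v=A_6gNZCkajU")
print(get_unique_video_ids(text))  # ['DUQi_R4SgWo', 'A_6gNZCkajU']
```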