How to extract domain name using python regular expression
淡淡烟草味
淡淡烟草味 2017-06-22 11:51:53
0
2
980
<script type="application/ld+json">{
    "@context": "http://schema.org",
    "@type": "SaleEvent",
    "name": "10% Off First Orders",
    "url": "https://www.myvouchercodes.co.uk/coggles",
    "image": "https://mvp.tribesgds.com/dyn/oh/Ow/ohOwXIWglMg/_/mQR5xLX5go8/m0Ys/coggles-logo.png",
    "startDate": "2017-02-17",
    "endDate": "2017-12-31",
    "location": {
        "@type": "Place",
        "name": "Coggles",
        "url": "coggles.co.uk",
        "address": "Coggles"
    },
    "description": "Get the top branded fashion items from Coggles at discounted prices. Apply this code and enjoy savings on your purchase.",
    "eventStatus": "EventScheduled"
}</script>

How to use python regular expression to extract the coggles.co.uk domain name from this script? I hope experts from all walks of life can show me their skills...

淡淡烟草味
淡淡烟草味

reply all(2)
ringa_lee

When implementing regularization, just make sure that your calibration/features are unique. But the symbol "url" is not the only one. At this time @prolifes' method is very good.

If you must implement regular implementation, you need to use zero-width assertions. Of course, the translation of this word is relatively straightforward, which leads to many misunderstandings. It actually means matching at the specified position, and the width of the position is 0.

Here we can see the "url" we need in "location", which can be used as location information.

The code is as follows:

re.search('(?<=location).+?"url": "([^"]+)"', string, re.DOTALL).group(1)

Let me explain a little bit,
(?<=location)This place means that there must be a location in front. If there is any later, write it like this: (?=location)
re.DOTALLThis is necessary because these strings have crossed lines. Its function is to expand the string matching range of . to include newlines.
"([^"]+)"This place is my habit, [^"] means all characters that are not ", which matches all strings in double quotes.

世界只因有你

This is a pretty standard json, if you want to be more rough, convert it directly into json

import json

str = '''
<script type="application/ld+json">{
    "@context": "http://schema.org",
    "@type": "SaleEvent",
    "name": "10% Off First Orders",
    "url": "https://www.myvouchercodes.co.uk/coggles",
    "image": "https://mvp.tribesgds.com/dyn/oh/Ow/ohOwXIWglMg/_/mQR5xLX5go8/m0Ys/coggles-logo.png",
    "startDate": "2017-02-17",
    "endDate": "2017-12-31",
    "location": {
        "@type": "Place",
        "name": "Coggles",
        "url": "coggles.co.uk",
        "address": "Coggles"
    },
    "description": "Get the top branded fashion items from Coggles at discounted prices. Apply this code and enjoy savings on your purchase.",
    "eventStatus": "EventScheduled"
}</script>
'''

d = json.loads(re.search('({[\s\S]*})', str).group(1))
print d['location']['url']
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template