
Use lexical analysis to extract domain names and IPs

Dec 25, 2019 pm 01:08 PM


Background

While analyzing logs, I found that some request parameters contained other URLs, for example:

[Image: sample log entry whose request parameter contains another URL]

Extract the URL (xss.ha.ckers.org) from the request parameters, then compare it against the threat intelligence database. If it matches the blacklist, blacklist the request. If it is in neither the blacklist nor the company's whitelist, mark it first and analyze it in detail later.
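The triage rule above can be sketched as follows. This is only an illustration: the `Verdict` names, the `triage` function, and the in-memory sets standing in for the threat intelligence database and the company whitelist are all assumptions, not part of the original tool.

```python
from enum import Enum


class Verdict(Enum):
    BLOCK = "block"    # host is on the blacklist
    ALLOW = "allow"    # host is on the company whitelist
    REVIEW = "review"  # unknown host: mark it and analyze later


def triage(host, blacklist, whitelist):
    """Classify an extracted host against two hypothetical lookup sets."""
    # Normalize case and drop a trailing dot so lookups are consistent.
    host = host.lower().rstrip(".")
    if host in blacklist:
        return Verdict.BLOCK
    if host in whitelist:
        return Verdict.ALLOW
    return Verdict.REVIEW


if __name__ == "__main__":
    bad = {"xss.ha.ckers.org"}
    good = {"www.baidu.com"}
    for h in ("xss.ha.ckers.org", "www.baidu.com", "unknown.example.net"):
        print(h, triage(h, bad, good).value)
```

In practice the two sets would be replaced by queries to whatever intelligence source is in use.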

Extract URL

Many articles online cover URL extraction, and most of them rely on regular expressions. That approach is simple but not very accurate. Here I offer an alternative: use lexical analysis to extract domain names and IPs. The idea is borrowed from this article:

https://blog.csdn.net/breaksoftware/article/details/7009209. Take a look if you are interested. Learning from an expert really does improve your skills.

The original is written in C; I wrote a similar version in Python for reference.

Common URL classification

[Image: table of common URL forms]

Observation shows that the IP form has the simplest structure: four numbers, each no greater than 255, separated by dots. The domain name form is more complex, but every such URL has one thing in common: it ends in a top-level domain such as .com.
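The rule for the IP form can be written down directly. The helper below is a minimal sketch (the name `is_ipv4` is an assumption) that checks a whole string rather than scanning inside a longer one:

```python
def is_ipv4(text):
    """Return True if text is four dot-separated decimal numbers, each 0..255."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # isascii() guards against non-ASCII digit characters that int() rejects.
        if not (part.isascii() and part.isdigit()):
            return False
        if len(part) > 3 or int(part) > 255:
            return False
    return True


if __name__ == "__main__":
    for candidate in ("192.168.1.1", "256.1.1.1", "www.baidu.com"):
        print(candidate, is_ipv4(candidate))
```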

Define legal characters:

[Image: definition of the legal characters]
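The exact character set used by the original code is not reproduced in the text, so the following is an assumed definition: ASCII letters, digits, the hyphen, and the dot, which are the characters that normally appear in host names and IPv4 literals.

```python
import string

# Assumed set of characters that may appear inside a domain name or IP.
DOMAIN_CHARS = frozenset(string.ascii_letters + string.digits + "-.")


def is_domain_char(ch):
    """Return True if the single character ch may be part of a host token."""
    return ch in DOMAIN_CHARS


if __name__ == "__main__":
    print([c for c in "a-1./?=" if is_domain_char(c)])
```

A set lookup keeps the per-character test constant-time, which matters when the scanner visits every character of every log line.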

Top-level domain name list:

[Image: the top-level domain name list]
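As an illustration, a top-level domain list can be kept as a set and consulted for the last label of a candidate host. The entries below are a small sample chosen for this sketch, not the list used by the original tool; a real deployment would load a complete list.

```python
# Illustrative sample only; a real list would be far longer.
TOP_LEVEL_DOMAINS = frozenset({"com", "net", "org", "edu", "gov", "cn", "io"})


def has_known_tld(host):
    """Return True if host has at least two labels and ends in a listed TLD."""
    labels = host.lower().rstrip(".").split(".")
    return len(labels) >= 2 and labels[-1] in TOP_LEVEL_DOMAINS


if __name__ == "__main__":
    for h in ("xss.ha.ckers.org", "192.168.1.1", "a.js"):
        print(h, has_known_tld(h))
```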

Domain name form extraction, e.g. www.baidu.com:

[Images: code for domain name extraction]
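Since the extraction code itself is not reproduced in the text, here is a rough sketch of the general idea rather than the author's implementation: walk the input once, collect maximal runs of legal characters, and keep a run only if it ends in a known top-level domain. The names `LEGAL`, `TLDS`, and `extract_domains`, and the sample TLD set, are assumptions.

```python
import string

LEGAL = frozenset(string.ascii_letters + string.digits + "-.")
TLDS = frozenset({"com", "net", "org", "cn"})  # illustrative sample only


def extract_domains(text):
    """Return domain-like tokens found in text, in order of appearance."""
    found = []
    i, n = 0, len(text)
    while i < n:
        if text[i] not in LEGAL:
            i += 1
            continue
        # Consume one maximal run of legal characters.
        start = i
        while i < n and text[i] in LEGAL:
            i += 1
        token = text[start:i].strip(".-").lower()
        labels = token.split(".")
        # Require at least two non-empty labels and a known TLD at the end.
        if len(labels) >= 2 and all(labels) and labels[-1] in TLDS:
            found.append(token)
    return found


if __name__ == "__main__":
    print(extract_domains("GET /a?u=http://www.baidu.com/x&b=1"))
```

Unlike a single regular expression, each rejection rule here is an explicit branch, so it is easy to see why a given token was dropped.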

IP format extraction: such as 192.168.1.1.


            # First octet: consume digits, then require a dot.
            while i < len(z) and z[i].isdigit():
                i = i + 1
                ip_v1 = True
                reti = i
            if i < len(z) and z[i] == '.':
                i = i + 1
                reti = i
            else:
                tokenType = TK_OTHER
                reti = 1

            # Second octet.
            while i < len(z) and z[i].isdigit():
                i = i + 1
                ip_v2 = True
            if i < len(z) and z[i] == '.':
                i = i + 1
            elif tokenType != TK_DOMAIN:
                tokenType = TK_OTHER
                reti = 1

            # Third octet.
            while i < len(z) and z[i].isdigit():
                i = i + 1
                ip_v3 = True
            if i < len(z) and z[i] == '.':
                i = i + 1
            elif tokenType != TK_DOMAIN:
                tokenType = TK_OTHER
                reti = 1

            # Fourth octet: no trailing dot is expected.
            while i < len(z) and z[i].isdigit():
                i = i + 1
                ip_v4 = True

            # Optional port, e.g. ":8080".
            if i < len(z) and z[i] == ':':
                i = i + 1
            while i < len(z) and z[i].isdigit():
                i = i + 1

            # Accept the token only if all four octets were seen.
            if ip_v1 and ip_v2 and ip_v3 and ip_v4:
                self.urls.append(z[0:i])
                return reti, tokenType
            elif tokenType != TK_DOMAIN:
                tokenType = TK_OTHER
                reti = 1
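For comparison, the same four-octet scan can be expressed as a loop in a standalone function. This is a sketch under stated assumptions, not the code from the repository: the name `scan_ip` and its return convention (end index, or -1 when there is no match) are invented here, and it additionally rejects octets above 255, which the fragment above does not check.

```python
def scan_ip(z, start=0):
    """Return the end index of an IPv4 literal (optionally with :port)
    beginning at z[start], or -1 if none is found there."""
    i = start
    for octet in range(4):
        begin = i
        while i < len(z) and "0" <= z[i] <= "9":
            i += 1
        # Each octet needs 1 to 3 digits and a value no greater than 255.
        if i == begin or i - begin > 3 or int(z[begin:i]) > 255:
            return -1
        if octet < 3:
            # The first three octets must be followed by a dot.
            if i < len(z) and z[i] == ".":
                i += 1
            else:
                return -1
    # Optional port: a colon followed by at least one digit.
    if i + 1 < len(z) and z[i] == ":" and "0" <= z[i + 1] <= "9":
        i += 1
        while i < len(z) and "0" <= z[i] <= "9":
            i += 1
    return i


if __name__ == "__main__":
    line = "192.168.1.1:8080/index"
    print(line[:scan_ip(line)])
```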

Mixed form extraction: such as 1234.com.

Scanning the leading 1234 matches the characteristics of the IP form, but the code then raises an exception. The IP-handling segment therefore needs an extra check for whether the suffix is a top-level domain:

[Image: code that checks whether the suffix is a top-level domain]
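One way to handle the mixed form, sketched here under assumptions rather than taken from the original code, is to classify a complete token: try the strict IP rule first, then fall back to the top-level domain check, so that 1234.com is reported as a domain. The name `classify`, its string return values, and the sample TLD set are hypothetical.

```python
TLDS = frozenset({"com", "net", "org", "cn"})  # illustrative sample only


def classify(token):
    """Return 'ip', 'domain', or 'other' for a dot-separated token."""
    labels = token.lower().split(".")
    if len(labels) < 2 or not all(labels):
        return "other"
    # Strict IP form: exactly four numeric labels, each 0..255.
    if len(labels) == 4 and all(
        lab.isascii() and lab.isdigit() and int(lab) <= 255 for lab in labels
    ):
        return "ip"
    # Digit-leading tokens such as 1234.com still count as domains
    # when the final label is a known top-level domain.
    if labels[-1] in TLDS:
        return "domain"
    return "other"


if __name__ == "__main__":
    for t in ("1234.com", "192.168.1.1", "1234.5678"):
        print(t, classify(t))
```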

Result test

Test data:

[Image: test data]

Running result:

[Image: running result]

This is only a preliminary version; corrections are welcome if you find any bugs.

Conclusion

In the past I kept my head down writing code and neglected to reflect and summarize afterwards. I am now trying to change that by refining and summarizing as I work. When I build something that seems useful, I try to turn it into a tool and open-source it to share with everyone.

Source code:

https://github.com/skskevin/UrlDetect/blob/master/tool/domainExtract/domainExtract.py
