Table of Contents
给定a、b两个文件,各存放50亿个url,每个url各占用64字节,内存限制是4G,如何找出a、b文件共同的url?,50亿4g
Home php教程 php手册 给定a、b两个文件,各存放50亿个url,每个url各占用64字节,内存限制是4G,如何找出a、b文件共同的url?,50亿4g

给定a、b两个文件,各存放50亿个url,每个url各占用64字节,内存限制是4G,如何找出a、b文件共同的url?,50亿4g

Jun 13, 2016 am 09:19 AM
url common Memory occupy how store find out document yes of given limit

给定a、b两个文件,各存放50亿个url,每个url各占用64字节,内存限制是4G,如何找出a、b文件共同的url?,50亿4g

可以估计每个文件的大小为5G*64=300G,远大于4G。所以不可能将其完全加载到内存中处理。考虑采取分而治之的方法。 
遍历文件a,对每个url求取hash(url)%1000,然后根据所得值将url分别存储到1000个小文件(设为a0,a1,...a999)当中。这样每个小文件的大小约为300M。遍历文件b,采取和a相同的方法将url分别存储到1000个小文件(b0,b1....b999)中。这样处理后,所有可能相同的url都在对应的小文件(a0 vs b0, a1 vs b1....a999 vs b999)当中,不对应的小文件(比如a0 vs b99)不可能有相同的url。然后我们只要求出1000对小文件中相同的url即可。 
比如对于a0 vs b0,我们可以遍历a0,将其中的url存储到hash_map当中。然后遍历b0,如果url在hash_map中,则说明此url在a和b中同时存在,保存到文件中即可。 
如果分成的小文件不均匀,导致有些小文件太大(比如大于2G),可以考虑将这些太大的小文件再按类似的方法分成小小文件即可

 

昨天百度面试官问道的 今天研究一下

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. How to Fix Audio if You Can't Hear Anyone
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
WWE 2K25: How To Unlock Everything In MyRise
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Large memory optimization, what should I do if the computer upgrades to 16g/32g memory speed and there is no change? Large memory optimization, what should I do if the computer upgrades to 16g/32g memory speed and there is no change? Jun 18, 2024 pm 06:51 PM

For mechanical hard drives or SATA solid-state drives, you will feel the increase in software running speed. If it is an NVME hard drive, you may not feel it. 1. Import the registry into the desktop and create a new text document, copy and paste the following content, save it as 1.reg, then right-click to merge and restart the computer. WindowsRegistryEditorVersion5.00[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SessionManager\MemoryManagement]"DisablePagingExecutive"=d

What to do if the 0x80004005 error code appears. The editor will teach you how to solve the 0x80004005 error code. What to do if the 0x80004005 error code appears. The editor will teach you how to solve the 0x80004005 error code. Mar 21, 2024 pm 09:17 PM

When deleting or decompressing a folder on your computer, sometimes a prompt dialog box "Error 0x80004005: Unspecified Error" will pop up. How should you solve this situation? There are actually many reasons why the error code 0x80004005 is prompted, but most of them are caused by viruses. We can re-register the dll to solve the problem. Below, the editor will explain to you the experience of handling the 0x80004005 error code. Some users are prompted with error code 0X80004005 when using their computers. The 0x80004005 error is mainly caused by the computer not correctly registering certain dynamic link library files, or by a firewall that does not allow HTTPS connections between the computer and the Internet. So how about

Sources say Samsung Electronics and SK Hynix will commercialize stacked mobile memory after 2026 Sources say Samsung Electronics and SK Hynix will commercialize stacked mobile memory after 2026 Sep 03, 2024 pm 02:15 PM

According to news from this website on September 3, Korean media etnews reported yesterday (local time) that Samsung Electronics and SK Hynix’s “HBM-like” stacked structure mobile memory products will be commercialized after 2026. Sources said that the two Korean memory giants regard stacked mobile memory as an important source of future revenue and plan to expand "HBM-like memory" to smartphones, tablets and laptops to provide power for end-side AI. According to previous reports on this site, Samsung Electronics’ product is called LPWide I/O memory, and SK Hynix calls this technology VFO. The two companies have used roughly the same technical route, which is to combine fan-out packaging and vertical channels. Samsung Electronics’ LPWide I/O memory has a bit width of 512

Samsung announced the completion of 16-layer hybrid bonding stacking process technology verification, which is expected to be widely used in HBM4 memory Samsung announced the completion of 16-layer hybrid bonding stacking process technology verification, which is expected to be widely used in HBM4 memory Apr 07, 2024 pm 09:19 PM

According to the report, Samsung Electronics executive Dae Woo Kim said that at the 2024 Korean Microelectronics and Packaging Society Annual Meeting, Samsung Electronics will complete the verification of the 16-layer hybrid bonding HBM memory technology. It is reported that this technology has passed technical verification. The report also stated that this technical verification will lay the foundation for the development of the memory market in the next few years. DaeWooKim said that Samsung Electronics has successfully manufactured a 16-layer stacked HBM3 memory based on hybrid bonding technology. The memory sample works normally. In the future, the 16-layer stacked hybrid bonding technology will be used for mass production of HBM4 memory. ▲Image source TheElec, same as below. Compared with the existing bonding process, hybrid bonding does not need to add bumps between DRAM memory layers, but directly connects the upper and lower layers copper to copper.

Micron: HBM memory consumes 3 times the wafer volume, and production capacity is basically booked for next year Micron: HBM memory consumes 3 times the wafer volume, and production capacity is basically booked for next year Mar 22, 2024 pm 08:16 PM

This site reported on March 21 that Micron held a conference call after releasing its quarterly financial report. At the conference, Micron CEO Sanjay Mehrotra said that compared to traditional memory, HBM consumes significantly more wafers. Micron said that when producing the same capacity at the same node, the current most advanced HBM3E memory consumes three times more wafers than standard DDR5, and it is expected that as performance improves and packaging complexity intensifies, in the future HBM4 This ratio will further increase. Referring to previous reports on this site, this high ratio is partly due to HBM’s low yield rate. HBM memory is stacked with multi-layer DRAM memory TSV connections. A problem with one layer means that the entire

Lexar launches Ares Wings of War DDR5 7600 16GB x2 memory kit: Hynix A-die particles, 1,299 yuan Lexar launches Ares Wings of War DDR5 7600 16GB x2 memory kit: Hynix A-die particles, 1,299 yuan May 07, 2024 am 08:13 AM

According to news from this website on May 6, Lexar launched the Ares Wings of War series DDR57600CL36 overclocking memory. The 16GBx2 set will be available for pre-sale at 0:00 on May 7 with a deposit of 50 yuan, and the price is 1,299 yuan. Lexar Wings of War memory uses Hynix A-die memory chips, supports Intel XMP3.0, and provides the following two overclocking presets: 7600MT/s: CL36-46-46-961.4V8000MT/s: CL38-48-49 -1001.45V In terms of heat dissipation, this memory set is equipped with a 1.8mm thick all-aluminum heat dissipation vest and is equipped with PMIC's exclusive thermal conductive silicone grease pad. The memory uses 8 high-brightness LED beads and supports 13 RGB lighting modes.

How to remove comment restrictions on video accounts? What is the word limit for comments on a video account? How to remove comment restrictions on video accounts? What is the word limit for comments on a video account? Mar 22, 2024 pm 02:11 PM

With the popularity of video accounts on social media, more and more people are beginning to use video accounts to share their daily lives, insights and stories. However, some users may experience comments being restricted, which can leave them confused and dissatisfied. 1. How to remove comment restrictions on video accounts? To lift the restriction on commenting on a video account, you must first ensure that the account has been properly registered and real-name authentication has been completed. Video accounts have requirements for comments. Only accounts that have completed real-name authentication can lift comment restrictions. If there are any abnormalities in the account, these issues need to be resolved before comment restrictions can be lifted. 2. Comply with the community standards of the video account. Video accounts have certain standards for comment content. If the comment involves illegal content, you will be restricted from speaking. To lift comment restrictions, you need to abide by the community of the video account

Kingbang launches new DDR5 8600 memory, offering CAMM2, LPCAMM2 and regular models to choose from Kingbang launches new DDR5 8600 memory, offering CAMM2, LPCAMM2 and regular models to choose from Jun 08, 2024 pm 01:35 PM

According to news from this site on June 7, GEIL launched its latest DDR5 solution at the 2024 Taipei International Computer Show, and provided SO-DIMM, CUDIMM, CSODIMM, CAMM2 and LPCAMM2 versions to choose from. ▲Picture source: Wccftech As shown in the picture, the CAMM2/LPCAMM2 memory exhibited by Jinbang adopts a very compact design, can provide a maximum capacity of 128GB, and a speed of up to 8533MT/s. Some of these products can even be stable on the AMDAM5 platform Overclocked to 9000MT/s without any auxiliary cooling. According to reports, Jinbang’s 2024 Polaris RGBDDR5 series memory can provide up to 8400

See all articles