Table of Contents
Importing the Data
Preparing the Dataset for PageRank
Attempt #1: Patience, Young Grasshoppers
Attempt #2: Further Optimizing URL search
Attempt #3: Migrating to Hadoop/Pig
PageRank in Elastic MapReduce on the dataset
NO_DEAD_ENDS
Preformat
PageRank
ORIGINAL
RESULTS
SubDomains
Domains
Lessons Learned
Bulk Insertion Speeds
PageRank != Relevance
Indexing URLs

Twitter Memes Dataset Overview with PageRank

Jun 07, 2016 04:32 PM

This is the last of three blog posts from this summer internship project showcasing how to answer questions concerning big datasets stored in MongoDB using MongoDB’s frameworks and connectors.

Once we’d familiarized ourselves with a sizable amount of data from the Flights dataset, the next step was to move towards really BIG data: Twitter Memes. SNAP (Stanford Network Analysis Project) provides free large network data sets. In particular, we were interested in the Twitter Memes dataset from the 2008 presidential election generated by MemeTracker, a combined project between Stanford and Cornell which created news cycle maps from news articles and blog posts. This dataset is a compilation of blogs, news articles, and other media postings that pertain to the presidential election. In particular, the project focused on the time lapse between the first mention of a quote and the time it took for the quote to be picked up by mass media. However, our analysis focuses again on the PageRank and importance of both the individual URLs and websites.

The entire dataset contains over 96 million documents and over 418 million links between them. To begin, we focused solely on the memes during the month of April in 2009, which had over 15 million documents. The goal was to run PageRank on this dataset (where all web pages form a graph). The higher the PageRank of a web page, the more important the page. Web pages with a relatively high PageRank usually have a high ratio of incoming links to outgoing links compared to all other pages in the graph.

Disclaimer: As we’d learn through this project, the pages with the most PageRank do not necessarily have to be related to the 2008 presidential election.

Importing the Data

The source file quotes_2009-04.txt was 11GB. It came in this continuous format:

P       http://blogs.abcnews.com/politicalpunch/2008/09/obama-says-mc-1.html
T       2008-09-09 22:35:24
Q       that's not change
Q       you know you can put lipstick on a pig
Q       what's the difference between a hockey mom and a pit bull lipstick
Q       you can wrap an old fish in a piece of paper called change
L       http://reuters.com/article/politicsnews/idusn2944356420080901?pagenumber=1&virtualbrandchannel=10112
L       http://cbn.com/cbnnews/436448.aspx
L       http://voices.washingtonpost.com/thefix/2008/09/bristol_palin_is_pregnant.html?hpid=topnews
  • P denotes the URL of the document.
  • T represents the time of the post.
  • Q is a quote found in the post.
  • L is a link that exists in the post.

This was not an ideal schema for MongoDB. Using inputMongo.py, we converted the above input into documents resembling the following:

{
    "_id" : ObjectId("51c8a3f200a7f40aae706e86"),
    "url" : "http://blogs.abcnews.com/politicalpunch/2008/09/obama-says-mc-1.html",
    "quotes" : [
        "that's not change",
        "you know you can put lipstick on a pig",
        "what's the difference between a hockey mom and a pit bull lipstick",
        "you can wrap an old fish in a piece of paper called change"
    ],
    "links" : [
        "http://reuters.com/article/politicsnews/idusn2944356420080901?pagenumber=1&virtualbrandchannel=10112",
        "http://cbn.com/cbnnews/436448.aspx",
        "http://voices.washingtonpost.com/thefix/2008/09/bristol_palin_is_pregnant.html?hpid=topnews"
    ],
    "time" : ISODate("2008-09-09T22:35:24Z")
}

This resulted in 15,312,738 documents. We also utilized bulk insertion instead of individual document insertions. It took about 1 hour and 48 minutes to insert all of these documents into a collection in the database.

Notice that we still generated an ObjectId as _id. We’d later realize that this is superfluous as the url is unique per document.
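For reference, here is a minimal sketch of what inputMongo.py does, assuming pymongo and a batch size of 1000 (matching the default batch size discussed later); the parsing details and database/collection names are reconstructed from the format above rather than taken from the actual script:

from datetime import datetime
from pymongo import MongoClient

collection = MongoClient().memes.memes           # database/collection names are illustrative
batch, doc = [], None

with open("quotes_2009-04.txt") as f:
    for line in f:
        parts = line.split(None, 1)
        if not parts:
            continue                              # blank separator between posts
        tag = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if tag == "P":                            # a new post starts
            if doc is not None:
                batch.append(doc)
            doc = {"url": value, "quotes": [], "links": []}
        elif doc is None:
            continue
        elif tag == "T":
            doc["time"] = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        elif tag == "Q":
            doc["quotes"].append(value)
        elif tag == "L":
            doc["links"].append(value)
        if len(batch) == 1000:                    # bulk insert in batches of 1000
            collection.insert_many(batch)
            batch = []
    if doc is not None:
        batch.append(doc)
    if batch:
        collection.insert_many(batch)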

Preparing the Dataset for PageRank

Theoretically, for PageRank to produce more accurate and reflective results, the graph must have no dead ends and must be strongly connected (every node must be able to reach every other node and be reachable from it). Dead ends are nodes that have incoming links but no outgoing links. Dead ends leak PageRank out of the graph, so the sum of the PageRank of all nodes slowly converges to zero.

There are 2 ways to fix this problem:

  1. Use a taxation parameter, also called the random surfer, to pass a portion of every node's PageRank to every other node. This fix addresses graphs that are not strongly connected more than graphs that have dead ends.

  2. Recursively remove dead ends.

We decided to try method (2) first because the resulting graph would be much cleaner and would, in theory, give more reasonable results. To make the graph even more strongly connected, we also applied a taxation parameter after erasing all dead ends.
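For reference, this is the standard taxation update we rely on later (β is the taxation parameter, set to 0.9 in our runs, and N is the total number of nodes):

PR(p) = β · Σ_{q → p} PR(q) / out(q) + (1 − β) / N

Each node keeps a β share of the rank arriving along real links and receives an equal slice of the remaining 1 − β that the random surfer redistributes.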

Removing dead ends from the dataset proved to be more involved than we had initially thought. Here’s what we tried:

Attempt #1: Patience, Young Grasshoppers

The first solution iteratively created a new collection holding only the nodes that were not dead ends in the previous collection: if a doc/node’s links array was empty, the doc was not carried into the next iteration; otherwise, we walked its links array and removed every link pointing to a doc that did not exist in the previous iteration’s collection. The only hard constraint here was MongoDB’s 16MB document limit.
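A condensed pymongo sketch of that first approach; the collection names and batching are illustrative, not the original script’s:

from pymongo import MongoClient

db = MongoClient().memes

def remove_dead_ends_once(src_name, dst_name):
    """One pass: copy every non-dead-end doc, dropping links whose target is missing."""
    src, dst = db[src_name], db[dst_name]
    batch = []
    for doc in src.find():
        if not doc.get("links"):
            continue                                   # dead end: don't carry it forward
        # keep only links whose target URL exists in the source collection
        doc["links"] = [l for l in doc["links"]
                        if src.find_one({"url": l}, {"_id": 1}) is not None]
        batch.append(doc)
        if len(batch) == 1000:
            dst.insert_many(batch)
            batch = []
    if batch:
        dst.insert_many(batch)

# Repeat with e.g. ("memes", "memes_1"), ("memes_1", "memes_2"), ... until nothing changes.

The per-link find_one is exactly what made this approach so slow, even with the url index.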

Although we had an index on the url field, after 5 hours of running the script, we realized that the projected elapsed time would be over 20 days. This prompted us to explore other options.

Attempt #2: Further Optimizing URL Search

We originally thought the lag was due to the script searching for each link across the entire collection, even though the url field was indexed. So we built a more optimized solution: a second collection named char8, whose _id is the first 8 characters of a URL and which stores an array of all URLs that start with those 8 characters. Instead of searching for each link against every url in the original memes collection, we only had to search the short array of the matching char8 document, looked up by its indexed 8-character _id.
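A pymongo sketch of how the char8 lookup table might be built and queried; the field and collection names are our guesses at the original script’s:

from collections import defaultdict
from pymongo import MongoClient

db = MongoClient().memes

# Build the char8 lookup: _id = first 8 characters of a URL,
# "urls" = every URL in the memes collection starting with those characters.
buckets = defaultdict(list)
for doc in db.memes.find({}, {"url": 1}):
    buckets[doc["url"][:8]].append(doc["url"])

db.char8.insert_many(
    [{"_id": prefix, "urls": urls} for prefix, urls in buckets.items()]
)

def url_exists(link):
    # One indexed lookup on the 8-character prefix, then a scan of a short list.
    bucket = db.char8.find_one({"_id": link[:8]})
    return bucket is not None and link in bucket["urls"]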

This was now projected to finish in 5 days, still too inefficient.

Attempt #3: Migrating to Hadoop/Pig

However, we were undeterred; we still wanted to compute PageRank over the entire dataset with no dead ends. The single script approach was extremely inefficient and slow. It wasn’t fit for BIG DATA. Instead, we turned to Hadoop and Amazon Elastic MapReduce.

Hadoop is an open-source MapReduce framework that handles intensive data computations; it usually runs on large clusters, although it can also be used locally. It supports many languages and tools to ease usage. Amazon Elastic MapReduce is an Amazon service that hosts the Hadoop framework on EC2 instances spun up solely for each MapReduce task and connects to S3 (Simple Storage Service) for data input and output. Mongo-Hadoop is the connector library that let us use MongoDB (and BSON dumps) as the input/output source for these MapReduce jobs.

We submitted jobs to Elastic MapReduce through both the UI and the command line tool elastic-mapreduce, with our access ID and secret key.

First, we wrote a Pig script, explained below, to eliminate dead ends.

-- Registers the mongo and hadoop jars from our bucket
REGISTER s3://memes-bson/mongo-2.10.1.jar
REGISTER s3://memes-bson/mongo-hadoop-pig_1.1.2-1.1.0.jar
REGISTER s3://memes-bson/mongo-hadoop-core_1.1.2-1.1.0.jar
REGISTER s3://memes-bson/mongo-hadoop_1.1.2-1.1.0.jar
-- There's a User Defined Function in myudf.MYBAG which needs to be used
REGISTER s3://memes-bson/myudf.jar
original =  LOAD '$INPUT'
            USING com.mongodb.hadoop.pig.BSONLoader;
outs =      FOREACH original GENERATE $0#'_id' AS url, $0#'value'#'links' AS links;
-- Dead ends will not appear in showing because their links bag is empty. url is the origin of the
-- link and single is the destination. myudf.MYBAG turns the tuple of links into a bag of single-link tuples.
showing =   FOREACH outs GENERATE url, FLATTEN(myudf.MYBAG(links)) AS single;
-- JOIN eliminates links whose destination (single) doesn't exist as a url in our dataset
joined =    JOIN outs BY url, showing BY single;
project =   FOREACH joined GENERATE showing::url AS url, showing::single AS single;
-- Group the urls back together to restore the original schema of a url and an array of links
together =  GROUP project BY url;
result =    FOREACH together GENERATE $0 AS url, $1.single AS links;
STORE result INTO '$OUTPUT'
    USING com.mongodb.hadoop.pig.BSONStorage;

The Pig script above removes all dead ends from the current dataset. The problem is that removing dead ends can create new ones. For example, in the simple graph of

A -> B -> C -> D

D is the only dead end in the graph. But when D is removed, we have

A -> B -> C

So we would keep removing the new dead ends until none remained. In this particular “pathological” graph, the entire graph would be removed because every node eventually becomes a dead end. Fortunately, most datasets aren’t linear graphs.
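A toy Python illustration of the repeated-removal loop on the linear graph above (not the Pig job itself):

# Toy illustration of repeated dead-end removal.
graph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}

removed = True
while removed:
    dead_ends = [node for node, links in graph.items() if not links]
    removed = bool(dead_ends)
    for node in dead_ends:
        del graph[node]
    # Dropping links to removed nodes can create brand-new dead ends.
    for node in graph:
        graph[node] = [t for t in graph[node] if t in graph]

print(graph)   # {} -- the whole linear chain eventually disappears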

We had to figure out a way to run the Pig script above repeatedly until no dead ends remained. Elastic MapReduce only allows for a single Pig script execution, so we wrote a Bash script, removeAllDeadEnds.sh, that kept resubmitting the Pig job until the output file size stopped decreasing. The script used s3cmd to check file sizes on S3 and elastic-mapreduce-ruby to submit jobs. In theory, the output file size decreases if and only if some dead ends have been removed.
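The Bash script itself isn’t reproduced here, but its control flow was roughly the following, sketched in Python; the job-submission call and the S3 output paths are placeholders, not the real script:

import subprocess

def output_size_bytes(s3_path):
    # The first field of `s3cmd du <path>` output is the total size in bytes.
    out = subprocess.check_output(["s3cmd", "du", s3_path]).decode()
    return int(out.split()[0])

def submit_dead_end_job(iteration):
    # Placeholder: submit the Pig script above as an Elastic MapReduce job for
    # this iteration and block until it completes (we used elastic-mapreduce-ruby).
    raise NotImplementedError

prev_size, iteration = float("inf"), 0
while True:
    submit_dead_end_job(iteration)
    size = output_size_bytes(f"s3://memes-bson/output-{iteration}/")   # hypothetical output path
    if size >= prev_size:            # nothing shrank, so no dead ends were removed
        break
    prev_size, iteration = size, iteration + 1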

After 10 or so iterations, the script was only erasing two or three dead ends per iteration. We let it continue until we stopped it at 70 iterations, which took over 9 hours; at 8 normalized instance hours on m1.xlarge, the iterative jobs finished in about 8 minutes on average. The initial BSON size of the dataset with dead ends was 606.9MB; the final BSON, with only 2 dead ends remaining, was 448.2MB. We decided that further iterations would make only a trivial difference and moved ahead. We ended up with 1,113,524 total nodes.


PageRank in Elastic MapReduce on the dataset

At this point, we had two collections:

  1. NO_DEAD_ENDS which had 1,113,524 nodes, 2 of which were dead ends.

  2. ORIGINAL which had 36,814,086 nodes, around 75% of which were dead ends.

Whereas MongoDB’s built-in MapReduce converged quickly enough for the Flights dataset, the sheer size of these collections drove us to use Amazon’s Elastic MapReduce to compute the PageRank of the nodes in the graph.

NO_DEAD_ENDS
Preformat

First, we had to preformat the graph to suit the PageRank program. This involved changing the schema, using a Hadoop Job written in Java, to:

{
    "_id" : "001rian.blogspot.com/2008_07_01_archive.html",
    "ptr" : 0.2,
    "pg" : 8.98049795e-7,
    "links" : [
        "001rian.blogspot.com/2008/07/sanra-dewi-ikutan-bisnis-interne.html",
        "001rian.blogspot.com/2008/07/peduli-dengan-pagerank-dan-traffic.html",
        "001rian.blogspot.com/2008/08/88808-08-2008.html",
        "001rian.blogspot.com/2008/07/jadwal-puasa-dan-imsak-ramadhan-1429-h.html",
        "001rian.blogspot.com/2008/07/buku-elektronik-untuk-sekolah.html"
    ]
}
  • "pg" : 6.081907480807019e-8 was the initial PageRank of all nodes in a graph, which corresponds to the reciprocal of the total number of nodes in the graph.
  • ptr was the probability of this node going to any other node it links to, which corresponds to the reciprocal of the length of the links array.
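The actual preformatting was a Hadoop job written in Java; as a rough illustration only, the same transformation sketched in Python, assuming the schema shown above:

# Rough Python illustration of the preformat step (the real job was Hadoop/Java).
def preformat(docs, total_nodes):
    initial_pg = 1.0 / total_nodes          # every node starts with equal PageRank
    for doc in docs:
        links = doc.get("links", [])
        yield {
            "_id": doc["url"],
            "pg": initial_pg,
            "ptr": 1.0 / len(links) if links else 0.0,   # chance of following any one outgoing link
            "links": links,
        }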
PageRank

The structure of the Elastic MapReduce Java program for PageRank was similar to the MongoDB MapReduce program written for the Flights dataset.

However, instead of setting all of the arguments for the Hadoop job in a Bash script, each iteration (one Map and one Reduce) was its own job, so jobs were created continuously and their variables had to be set dynamically. For example, the output of the previous job was set as the input of the job for the current iteration.

last = iteration - 1;
FileInputFormat.addInputPath(job, new Path(outputString + last));
// mapred.output.dir is the current configuration variable for the output path in Mongo-Hadoop. 
job.getConfiguration().set("mapred.output.dir", outputString + iteration);
Again, the stopping criterion for PageRank is when the average percentage change of a node's value, the `residual` or `diff`, drops below 0.1%. Instead of outputting the individual node diffs in Reduce and aggregating over the entire output to sum them up, as in the MongoDB MapReduce for Flights, we used a Hadoop counter to accumulate the residuals per Reduce call. A Hadoop counter is an atomic variable accessible by all Mappers and Reducers, and it tracks any statistic or variable to be read after the job is completed.

// Need the 10E6 factor because a long counter only takes whole numbers
context.getCounter(PageRank.RanCounters.RESIDUAL).increment((long) (residual * 10E6));

Therefore, after each job completed, we read the total residual to determine whether it was under the threshold. Our stopping criterion is again that the residual converges below .001 * n, where n is the number of nodes. In this case, PageRank finished after 16 iterations in 2 hours and 30 minutes with 7 m1.xlarge instances.

ORIGINAL

Another implementation of the PageRank algorithm didn’t delete dead ends from the graph, but instead connected every dead end to every other node in the graph. This makes the graph strongly connected, but it also increases the noise. Since erasing dead ends left only about 1.1 million nodes out of the original 15,312,738, we wanted to see how the PageRank results would change if all of the dead ends were included.

Notice that the original collection from the text file only has 15,312,738 nodes, whereas we accounted for 36,814,086 nodes in the ORIGINAL collection. The extra 21,501,348 nodes appear as links inside the original 15,312,738 documents but were never documents themselves in the imported collection. So rather than shrinking the graph, as erasing dead ends did, making the graph strongly connected grew it by 21,501,348 extra dead-end nodes.

However, there’s no need to actually create edges between every dead end and every other node (with 27 million dead ends, that would mean roughly 27 million * 36 million ≈ 972 trillion links). Instead, we simply distributed the sum of the PageRank of all dead ends evenly across every other node. Here are our implementation ideas:

  1. The first idea we tried was to add up all of the PageRank from dead-end nodes in the Mapper, and then distribute this PageRank among all nodes in the Reducer while summing the incoming PageRank. This was not feasible, however, because Hadoop counters incremented in the Mapper still read as zero in the Reducer of the same job: Mappers and Reducers execute simultaneously, so counter values are only finalized after the job is done.

  2. To solve (1), we waited until the end of a job to determine the total PageRank held by dead ends. The final PageRank and residual were then calculated in the Mapper of the next iteration.

The main function, which submits jobs and waits for completion, retrieves the job’s total PageRank from dead ends and passes it as a configuration variable to the next job.

long prevDeadEndsPG = prevJob.getCounters().findCounter(PageRank.RanCounters.DEAD_END_PG).getValue();
currentJob.getConfiguration().setLong("deadEndsPG", prevDeadEndsPG);

Then, in the Mapper step, we add this deadEndsPG divided by the total number of nodes (the probability of any dead end jumping to this node). We compute the residual using the previous PageRank value and add it to the residual counter. This way, the final PageRank value of a node for that iteration is determined in the Mapper instead of the Reducer.

// fetch the dead ends PG (stored as a long, scaled by 10E10) from the configuration
long deadEndsPG = context.getConfiguration().getLong("deadEndsPG", 0L);
// undo the 10E10 scaling used to keep the value as a whole number
double doubleDeadEndsPG = ((double) deadEndsPG) / 10E10;
// share of the dead ends' PageRank that every node receives
double distributeDeadEndsPG = doubleDeadEndsPG / PageRank.totalNodes;
// taxation with beta = 0.9: keep beta of the incoming rank (plus the dead-end share)
// and add the random-surfer share (PageRank.distributedBeta)
currentPG = PageRank.beta * (currentPG + distributeDeadEndsPG) + PageRank.distributedBeta;
double residual = Math.abs(prevpg - currentPG) / prevpg;
context.getCounter(PageRank.RanCounters.RESIDUAL).increment((long) (residual * 10E6));

This PageRank run over ORIGINAL took 17 iterations over the span of 7 hours and 12 minutes with 8 m1.xlarge instances.

RESULTS

The following is an interpretation of the results obtained after running the PageRank algorithm over the 2 collections above.

NO_DEAD_ENDS

For the dataset with no dead ends, the 10 web pages with the most PageRank are:

1. {"pg":0.020741859913578454
   ,"url":"http://slideshare.net/youtube-in-slideshare"
   ,"quotes":["insert and publish","before slide 1","after slide 1","after slide 2","after slide 3"]}
2. {"pg":0.01490199050574318
   ,"url":"http://slideshare.com/youtube-in-slideshare"
   ,"quotes":["insert and publish","before slide 1","after slide 1","after slide 2","after slide 3"]}
3. {"pg":0.00542114032291505
   ,"url":"http://london.kijiji.ca/f-buy-and-sell-w0qqcatidz10"
   ,"quotes":[]}
4. {"pg":0.005381224128322537
   ,"url":"http://badgerandblade.com/index.php?page=terms"
   ,"quotes":["badger and blade","b amp b"]}
5. {"pg":0.00328534940037117
   ,"url":"http://saintjohn.kijiji.ca/f-buy-and-sell-w0qqcatidz10"
   ,"quotes":[]}
6. {"pg":0.00301961022829243
   ,"url":"http://london.kijiji.ca/c-buy-and-sell-business-industrial-salon-equipment-w0qqadidz115815478"
   ,"quotes":[]}
7. {"pg":0.0028168240288373365
   ,"url":"http://dealsofamerica.com/terms.php"
   ,"quotes":[]}
8. {"pg":0.0025406641926389753
   ,"url":"http://london.kijiji.ca/c-buy-and-sell-cds-dvds-vhs-house-seasons-1-4-w0qqadidz123632361"
   ,"quotes":[]}
9. {"pg":0.0024984791525017504
   ,"url":"http://answerbag.com/c_view/3602"
   ,"quotes":[]}
10. {"pg":0.0021795435717848356
    ,"url":"http://chacha.com/entertainment-arts/music"
    ,"quotes":["up where they play all day in the sun","stay all day in the sun","stay or leave i want you not to go but you did","sometimes goodbye is a second chance"]}

It’s not surprising that http://slideshare.net/youtube-in-slideshare and http://slideshare.com/youtube-in-slideshare have the most PageRank. Around the beginning of 2009, SlideShare released a new feature to enable users to embed youtube videos in their presentations. At the time, this feature was in Beta. FAQs, examples, and other details were posted on both http://www.slideshare.net/youtube-in-slideshare and http://www.slideshare.com/youtube-in-slideshare. Since this was a new feature (that a lot of people were excited about!), lots of reputable bloggers posted a link to these FAQ pages to showcase the power of the new SlideShare feature. As a result, these FAQ pages accumulated most of the PageRank.

Another interesting observation is that 4 of the top PageRank pages come from kijiji, a centralized network of online urban communities for posting local classified advertisements. Kijiji accumulated a lot of PageRank because there’s a lot of intra-domain (but inter-subdomain) linking: lots of pages on london.kijiji.ca link to vacation.kijiji.ca, which links back to london.kijiji.ca. Such linking creates a web structure called a spider trap, which accumulates a large portion of the PageRank available to the whole system of pages. Furthermore, about 5% of the entire NO_DEAD_ENDS collection consists of kijiji web pages.

ORIGINAL

For the original dataset, which retains its dead ends, the 10 web pages with the most PageRank are:

1. {"_id" : "craigslist.org/about/scams.html"
   , "pg" : 0.018243114523103326
   , "links" : []}
2. {"_id" : "slideshare.net/youtube-in-slideshare"
   , "pg" : 0.003038463243965542
   , "links" : ["slideshare.net/youtube-in-slideshare"]
   , "quotes" : ["insert and publish","before slide 1","after slide 1","after slide 2","after slide 3"]}
3. {"_id" : "slideshare.com/youtube-in-slideshare"
   , "pg" : 0.002161141838313388
   , "links" : ["slideshare.com/youtube-in-slideshare"]
   , "quotes" : ["insert and publish","before slide 1","after slide 1","after slide 2","after slide 3"]}
4. {"_id" : "ad.doubleclick.net/clk;212944738;23941492;n?goto.canon-asia.com/oic"
   , "pg" : 0.0015214745797758247
   , "links" : []
   , "quotes" : []}
5. {"_id" : "mx.answers.yahoo.com/info/disclaimer"
   , "pg" : 0.0013631525163117727
   , "links" : []
   , "quotes" : []}
6. {"_id" : "ar.answers.yahoo.com/info/disclaimer"
   , "pg" : 0.0013542983079855681
   , "links" : []
   , "quotes" : []}
7. {"_id" : "it.answers.yahoo.com/info/disclaimer"
   , "pg" : 0.0011670409020562926
   , "links" : []
   , "quotes" : []}
8. {"_id" : "fr.answers.yahoo.com/info/disclaimer"
   , "pg" : 0.001083113456512683
   , "links" : []
   , "quotes" : []}
9. {"_id" : "seroundtable.com"
   , "pg" : 0.0009033963316740201
   , "links" : []
   , "quotes" : []}
10. {"_id" : "de.answers.yahoo.com/info/disclaimer"
    , "pg" : 0.0006914069352292967
    , "links" : []
    , "quotes" : []}

These results differed significantly from the results of the NO_DEAD_ENDS but for good reasons.

  • The PageRank values of these top pages are significantly lower than those of the top pages in NO_DEAD_ENDS because this graph has roughly 30x more pages.
  • The PageRank of the Craigslist scams information page is 6x as much as the next highest PageRank value. This is because every Craigslist page links to the scam page, creating a spider trap. Similarly, the Yahoo! Answers disclaimer pages accumulated a lot of PageRank because most Yahoo! forum pages link to one of the disclaimer pages.
  • The curious entry above is the ad.doubleclick.net URL. This URL is no longer available. But since doubleclick was an online ad platform, it’s possible that this particular ad was either served the most or gathered the most attention.
SubDomains

Looking at subdomains, we see a lot of entries for websites that have no URLs in the top-10 list. Sites like Twitter, Blogspot, and English Wikipedia rank among the top 25. It’s reasonable to assume that links to those websites aren’t purely spam or disclaimers. The D3 bubble chart below shows the subdomains with the top 25 PageRank totals.

[D3 bubble chart: subdomains with the top 25 PageRank totals]

Domains

Summing PageRank by domain instead of by URL, Yahoo! surpasses SlideShare. This makes sense, as Yahoo!’s PageRank is spread out across many subdomains.

[D3 bubble chart: domains ranked by total PageRank]
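As an illustration of this roll-up, here is a pymongo sketch that sums PageRank by domain; the results collection and its field names are assumptions, not the pipeline we actually ran:

from collections import defaultdict
from pymongo import MongoClient

# Hypothetical names: assume the final PageRank results were loaded into
# a "pagerank_results" collection with _id = URL and pg = PageRank.
coll = MongoClient().memes.pagerank_results

def domain_of(url):
    # URLs in the results are stored without a scheme, e.g. "de.answers.yahoo.com/info/disclaimer"
    host = url.split("/", 1)[0]
    return ".".join(host.split(".")[-2:])   # crude "domain" = last two labels

totals = defaultdict(float)
for doc in coll.find({}, {"pg": 1}):
    totals[domain_of(doc["_id"])] += doc["pg"]

for domain, pg in sorted(totals.items(), key=lambda kv: -kv[1])[:10]:
    print(domain, pg)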

Lessons Learned

Bulk Insertion Speeds

The Flights dataset was inserted a single document at a time, as we didn’t know about bulk insertion yet. However, pymongo supports bulk insertion, which is a much preferred and faster method for building large collections. We used bulk insertion on the Twitter Memes dataset with a default batch size of 1000 docs. Here is how long inputMongo.py takes to completely insert the original 15,312,738 docs at various batch sizes:

| Bulk Insert Array Size | Elapsed Time (seconds) | Elapsed Time (hr:min:sec) |
|---|---|---|
| 1 | 15007.014883 | 4:10:07 |
| 10 | 7781.75180 | 2:09:41 |
| 100 | 7332.346791 | 1:52:12 |
| 1000 | 6493.35044885 | 1:48:13 |
| 10000 | Error: Size too big for insert | |

The elapsed time dropped significantly between single insertions and insertions of 10 at a time, but interestingly, the gains tapered off as the batch size grew. The maxMessageSizeBytes value is 48 million bytes, and a bulk insertion of 10,000 exceeded this limit. This occurred in the Python driver; some of the other drivers split the array into 16MB chunks, which avoids this error.

PageRank != Relevance

The results above show that the pages with the most PageRank are often disclaimer and information pages, which probably isn’t what interests people most of the time. It turns out that Google’s search algorithm is far more complex than our simplistic version of PageRank. For instance, Google takes into account which links people click on for a certain query, boosting those URLs’ relevance for that query. Google also filters out spam links and ad sites. In addition, websites built with many internal links to intentionally boost their PageRank have their values docked, as part of Google’s effort to disincentivize intentional manipulation of PageRank.

Indexing URLs

Since each document has a unique URL, we created a unique index on this field. Fortunately, all the URLs in this dataset are well within the 1024-byte limit for index entries. An alternative, more compact way to index URLs is to hash them; MongoDB supports hashed indexes natively.
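A small pymongo illustration of both options (the collection name is assumed):

import pymongo
from pymongo import MongoClient

memes = MongoClient().memes.memes                 # assumed database/collection names

# What we actually used: a unique index on the full URL string.
memes.create_index("url", unique=True)

# Alternative: a hashed index keeps index entries small for long keys,
# but note that hashed indexes cannot enforce uniqueness.
memes.create_index([("url", pymongo.HASHED)])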
