
MongoDB and Hadoop: A Step-by-Step Tutorial Using

WBOY

Jun 07, 2016, 04:29 PM



The following is a guest post from Jeremy Karn. This article is excerpted from ‘MongoDB + Hadoop: A Step-by-Step Tutorial’. Jeremy is a cofounder at Mortar Data, a Hadoop-as-a-service provider, and creator of mortar, an open source framework for data processing.

People who are worried about scalability often find themselves looking at two tools: MongoDB for storing large amounts of data easily and Hadoop for processing that data. But a common question is: “How do I combine these two to really get the most out of my data?”

Here’s a step-by-step tutorial that will get you up and running with MongoDB and Hadoop in a matter of minutes. And the best part about this tutorial is that at the end you’ll be ready to jump right into using your own MongoDB data with Hadoop.

For this tutorial you’ll be using Apache Pig, a high-level data flow language that compiles down into Hadoop MapReduce jobs. It was designed to be easy to learn and simple to write. If you’ve written SQL, Pig will feel familiar: it is like a procedural version of SQL.

To run your Hadoop jobs, you’re going to use a free Mortar account. Mortar provides Hadoop as a service, which means you can run your jobs without worrying about how to set up and manage a multi-node Hadoop cluster.

To get started, we’ve already set up a small MongoDB instance on MongoLab, populated it with a random sampling of Twitter data from a single day (around 120,000 tweets), and created a read-only user for you.

We’ve also set up a public GitHub repo with a Mortar project that has three Pig scripts ready to run. Here’s what you need to do:

1. If you don’t already have a free GitHub account, create one. You’ll need a GitHub username in step 4.
2. Sign into (or create) your free Mortar account. After you receive the confirmation email, log into Mortar at https://app.mortardata.com.
3. Install the Mortar Development Framework:
   ```
   gem install mortar
   ```
4. Clone the example git project and register it as a Mortar project:
   ```
   git clone git@github.com:mortardata/mongo-pig-examples.git
   cd mongo-pig-examples
   mortar register mongo-pig-examples
   ```

Script 1 - Characterize Collection

If you’re like most MongoDB users, you may not have a great sense of the different fields, data types, or values in your collection. We built characterize_collection.pig to inspect your collection in depth and extract that information.

From the base directory of the mongo-pig-examples project you just cloned, take a look at pigscripts/characterize_collection.pig. It loads each document in the collection as a map, sends the map to a Python UDF (udfs/python/mongo_util.py) to gather metadata about its fields, calculates some basic statistics about the collection, and then writes the results to an S3 bucket.
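To make the metadata step more concrete, here is a minimal Python sketch of one way a UDF could prepare a document for counting. It is not the code in udfs/python/mongo_util.py: the helper name `flatten_document` is hypothetical, and it assumes each document arrives as a nested dict. It turns sub-documents into dotted field paths such as `user.lang` and records each leaf value together with its Python type name. Lists are treated as leaf values to keep the sketch short.

```python
from typing import Any, Iterator, Tuple


def flatten_document(doc: dict, prefix: str = "") -> Iterator[Tuple[str, Any, str]]:
    """Yield (dotted_field_path, value, type_name) for every leaf value in doc.

    Hypothetical helper: {"user": {"lang": "en"}} yields ("user.lang", "en", "str").
    Lists are not descended into; they are reported as single leaf values.
    """
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-documents, extending the dotted path.
            yield from flatten_document(value, path)
        else:
            yield path, value, type(value).__name__


# Hypothetical tweet-like document for illustration.
tweet = {
    "text": "Need coffee",
    "user": {"lang": "en", "listed_count": 0, "is_translator": False},
}
for row in flatten_document(tweet):
    print(row)
```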

To see this script in action, let’s run it on a 4-node Hadoop cluster. In your terminal (from the base directory of your mongo-pig-examples project), run:

```
mortar run characterize_collection --clustersize 4
```

This job will take about 10 minutes to finish. You can monitor the job’s status on the command line or by going to https://app.mortardata.com/jobs.

Once the job has finished, you’ll receive an email with a link to your job results. Clicking this link takes you to the Mortar web app, where you can download the results from S3. The output format is described at the top of the characterize_collection script, but as an example, if you scroll through the output you will find:

```
…
user.is_translator	2	false	unicode	118806
user.is_translator	2	true	unicode	31
user.lang	26	en	unicode	114108
user.lang	26	es	unicode	3462
user.lang	26	fr	unicode	532
user.lang	26	pt	unicode	281
user.lang	26	ja	unicode	79
user.listed_count	398	0	int	73757
user.listed_count	398	1	int	18518
```

Looking at the values for user.lang, we see that the field has 26 unique values in our dataset. The most common is “en” with 114108 occurrences, followed by “es” with 3462 occurrences, and so on. To see the full results without running the job, you can view the output file here.
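The counting behind rows like these can also be sketched in Python. This is a hypothetical reconstruction, not the script’s actual logic: it assumes the five columns are field, number of distinct values, value, type name, and occurrence count (consistent with the user.lang reading above), and it takes already-flattened `(field, value, type_name)` observations as input. Values must be hashable, and each value is keyed together with its type name so that, for example, `1` and `True` are not merged (Python treats them as equal). The function name `characterize` and the `top_n` cutoff are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# (field, distinct_value_count, value, type_name, occurrences)
Row = Tuple[str, int, object, str, int]


def characterize(values: Iterable[Tuple[str, object, str]], top_n: int = 5) -> List[Row]:
    """Summarize (field, value, type_name) observations per field.

    Keeps the top_n most common values for each field, ordered by count.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for field, value, type_name in values:
        # Key on (value, type) so that 1 and True are counted separately.
        counts[field][(value, type_name)] += 1

    rows: List[Row] = []
    for field in sorted(counts):
        counter = counts[field]
        distinct = len(counter)
        for (value, type_name), n in counter.most_common(top_n):
            rows.append((field, distinct, value, type_name, n))
    return rows


# Hypothetical observations, shaped like the output of a flattening step.
observations = [
    ("user.lang", "en", "str"),
    ("user.lang", "en", "str"),
    ("user.lang", "es", "str"),
    ("user.listed_count", 0, "int"),
]
for row in characterize(observations):
    print("\t".join(str(col) for col in row))
```

Under Python 3 strings are reported as `str`; the `unicode` label in the sample output suggests the original UDF ran under Python 2.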

Script 2 - MongoDB Schema Generator

It can be tricky to properly declare MongoDB’s highly nested schemas in Pig. Pig is forgiving: it can run without a schema, or with an inconsistent or incorrect one. But your Pig code is easier to read and write when you have a schema, because it lets you (and the Pig optimizer) focus on just the relevant data.

So this next script automatically generates a Pig schema by examining your MongoDB collection. If you don’t need the whole schema, you can easily edit it to keep just the fields you want.
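A rough Python sketch of how this kind of schema inference could work is shown here. It is not how mongo_schema_generator is implemented; the function names are hypothetical, and it assumes that sub-documents map to Pig tuples, arrays to bags, and Python scalars to Pig scalar types (int to `long`, since values such as tweet IDs exceed Pig’s 32-bit `int`). It first merges the field sets of a few sample documents, then renders a schema string.

```python
from typing import Any, Dict, Iterable

# Assumed mapping from Python scalar types to Pig scalar types.
# int maps to long because values such as tweet IDs overflow Pig's 32-bit int.
PIG_SCALARS = {bool: "boolean", int: "long", float: "double", str: "chararray"}


def merge_samples(docs: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Union the fields of several sample documents, recursing into sub-documents.

    The first non-None value seen for a field is kept as its type example.
    """
    merged: Dict[str, Any] = {}
    for doc in docs:
        for key, value in doc.items():
            if key not in merged or merged[key] is None:
                merged[key] = value
            elif isinstance(merged[key], dict) and isinstance(value, dict):
                merged[key] = merge_samples([merged[key], value])
    return merged


def pig_type(value: Any) -> str:
    """Return a Pig type expression describing one sample value."""
    if isinstance(value, dict):
        # Sub-documents become Pig tuples with named fields.
        return f"tuple({pig_fields(value)})"
    if isinstance(value, list):
        # Arrays become bags; the element type comes from the first element only.
        inner = pig_type(value[0]) if value else "bytearray"
        if inner.startswith("tuple("):
            return f"bag{{t:{inner}}}"
        return f"bag{{t:tuple(v:{inner})}}"
    # Unknown types (including None) fall back to Pig's untyped bytearray.
    return PIG_SCALARS.get(type(value), "bytearray")


def pig_fields(doc: Dict[str, Any]) -> str:
    """Render a document's fields as a comma-separated Pig schema."""
    return ", ".join(f"{name}:{pig_type(value)}" for name, value in doc.items())


# Hypothetical sample documents.
docs = [
    {"text": "coffee time", "user": {"lang": "en", "listed_count": 0}},
    {
        "text": "hola",
        "user": {"lang": "es", "is_translator": False},
        "entities": {"hashtags": [{"text": "coffee"}]},
    },
]
print(pig_fields(merge_samples(docs)))
```

Arrays are typed from their first element only, and conflicting types across documents are resolved by whichever sample is seen first; a real generator would need to reconcile those cases.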

Running this script is similar to running the previous one. If you ran the Characterize Collection script in the past hour, the same cluster you used for that job should still be running. In that case, you can just run:

```
mortar run mongo_schema_generator
```

If you don’t have a cluster that’s still running, just run the job on a new 4-node cluster like this:

```
mortar run mongo_schema_generator --clustersize 4
```

Script 3 - Twitter Hourly Coffee Tweets

Using a Twitter coffee tweets script (pigscripts/hourly_coffee_tweets.pig), we’ll demonstrate how to work with just a small subset of the fields in our MongoDB collection. For our example, we’ll look at how often the word “coffee” is tweeted throughout the day. As with the MongoDB Schema Generator script, you can run this job on an existing cluster or start up a new one.
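The core of the hourly count can be sketched outside Pig in a few lines of Python. This is an illustration, not the contents of hourly_coffee_tweets.pig: it assumes each tweet carries the Twitter API’s `text` and `created_at` fields (with `created_at` formatted like `Wed Aug 27 13:08:45 +0000 2008`), uses a simple case-insensitive substring test for “coffee”, and buckets matches by UTC hour of day. The function name `hourly_keyword_counts` is hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Any, Dict, Iterable, Mapping

# Assumed Twitter API timestamp format, e.g. "Wed Aug 27 13:08:45 +0000 2008".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def hourly_keyword_counts(
    tweets: Iterable[Mapping[str, Any]], keyword: str = "coffee"
) -> Dict[int, int]:
    """Count tweets per hour of day (0-23) whose text contains keyword.

    Only two fields are read from each tweet: text and created_at.
    """
    needle = keyword.lower()
    counts: Counter = Counter()
    for tweet in tweets:
        text = tweet.get("text") or ""
        # Case-insensitive substring test, so "COFFEE" and "coffeehouse" both match.
        if needle not in text.lower():
            continue
        # Twitter timestamps carry a +0000 offset, so .hour is the UTC hour.
        created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
        counts[created.hour] += 1
    # Return hours in ascending order for readability.
    return dict(sorted(counts.items()))


# Hypothetical tweets for illustration.
tweets = [
    {"created_at": "Tue Jun 05 08:15:00 +0000 2012", "text": "First coffee of the day"},
    {"created_at": "Tue Jun 05 08:45:10 +0000 2012", "text": "COFFEE please"},
    {"created_at": "Tue Jun 05 14:02:33 +0000 2012", "text": "afternoon coffee break"},
    {"created_at": "Tue Jun 05 14:30:00 +0000 2012", "text": "tea instead"},
]
print(hourly_keyword_counts(tweets))  # {8: 2, 14: 1}
```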

Next Steps

If you already have a MongoDB instance or cluster hosted in the US-East region of EC2, the first two example scripts should run on one of your collections with only minor modifications. You’ll just need to:

- Update the MongoLoader connection strings in the Pig scripts to connect to your MongoDB collections with one of your own users. If your MongoDB instance is on a non-standard port (any port other than 27017), email us at support@mortardata.com so we can allow your Mortar account to access that port.
- If you’d like your jobs to write to one of your own S3 buckets, you can update the AWS keys associated with your Mortar account by following these instructions to enable S3 access.
- If you run out of free cluster hours with Mortar, you can upgrade your account to get additional free hours each month.
- You can find more resources for learning Pig here.
- If you have any questions or feedback, please contact us at support@mortardata.com or ping us via in-app chat at app.mortardata.com.
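Updating a connection string mostly means substituting your own credentials, host, database, and collection. One detail worth handling: in the MongoDB connection-string format, characters such as `@`, `:`, and `/` in a username or password must be percent-encoded. The Python sketch below builds such a URI, assuming the scripts use the `mongodb://user:password@host:port/database.collection` form accepted by the mongo-hadoop connector’s MongoLoader; the function name and all example values are hypothetical.

```python
from urllib.parse import quote_plus


def mongo_uri(
    user: str,
    password: str,
    host: str,
    database: str,
    collection: str,
    port: int = 27017,
) -> str:
    """Build a mongodb://user:password@host:port/database.collection URI.

    Credentials are percent-encoded so characters such as '@', ':' or '/'
    in a password cannot be mistaken for URI delimiters.
    """
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}.{collection}"
    )


# Hypothetical credentials, host, database, and collection names.
print(mongo_uri("reader", "p@ss:word", "db.example.com", "twitter", "tweets"))
# mongodb://reader:p%40ss%3Aword@db.example.com:27017/twitter.tweets
```

The default port matches the standard 27017 mentioned above; pass `port=` explicitly for a non-standard port.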

Original article: MongoDB and Hadoop: A Step-by-Step Tutorial Using. Thanks to the original author for sharing.
