Spark as a Service: A First Test of JobServer

WBOY

Jun 07, 2016, 04:39 PM

spark-jobserver provides a RESTful interface for submitting and managing Apache Spark jobs, jar files, and job contexts (SparkContext). The project is hosted on GitHub (https://github.com/ooyala/spark-jobserver); the current version is 0.4.

Features

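"Spark as a Service": a simple REST interface for managing jobs and contexts
Sub-second, low-latency jobs through long-running job contexts
A running job can be stopped by killing its context
Jar upload is a separate step, which speeds up job startup
Asynchronous and synchronous job APIs; the synchronous API works well for low-latency jobs
Works with Standalone Spark and Mesos
Job and jar information is persisted through a pluggable DAO interface
RDDs can be named and cached, then retrieved by name, which makes it easier to share and reuse RDDs across jobs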

Installing and starting jobServer

jobServer depends on sbt, so sbt must be installed first.

rpm -ivh https://dl.bintray.com/sbt/rpm/sbt-0.13.6.rpm
yum install git
# Clone the project
SHELL$ git clone https://github.com/ooyala/spark-jobserver.git
# From the project root, start sbt
SHELL$ cd spark-jobserver
SHELL$ sbt
......
[info] Set current project to spark-jobserver-master (in build file:/D:/Projects/spark-jobserver-master/)
>
# Start jobServer locally (developer mode)
> re-start --- -Xmx4g
......
# At this point sbt downloads spark-core, jetty, liftweb, and other required modules.
job-server Starting spark.jobserver.JobServer.main()
[success] Total time: 545 s, completed 2014-10-21 19:19:48

Then open http://localhost:8090 to see the Web UI.
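Because the first start can take several minutes while sbt downloads dependencies, a script may want to wait until the server accepts connections before sending requests. The following is a minimal sketch, assuming the server listens on localhost:8090 as above; the helper name `wait_for_port` and its defaults are illustrative and not part of spark-jobserver.

```python
import socket
import time


def wait_for_port(host="localhost", port=8090, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that something is listening,
    not that the job server has finished initializing.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Calling `wait_for_port()` right after `re-start` would block until port 8090 opens or 30 seconds pass; a longer timeout is probably needed on a first run.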

Testing job execution

Here we test directly with job-server's test package.

SHELL$ sbt job-server-tests/package
......
[info] Compiling 5 Scala sources to /root/spark-jobserver/job-server-tests/target/classes...
[info] Packaging /root/spark-jobserver/job-server-tests/target/job-server-tests-0.4.0.jar ...
[info] Done packaging.

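After compilation, upload the packaged jar file through the REST interface.
The REST API for jobs is as follows:
GET /jobs  list all jobs
POST /jobs  submit a new job
GET /jobs/<jobId>  get the result and status of a job
GET /jobs/<jobId>/config  get the configuration of a job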
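The job endpoints above can be turned into URLs programmatically. The sketch below assumes the server address used in this article (localhost:8090) and the query parameters that appear in its curl commands (`appName`, `classPath`, `context`, `sync`); the helper names `jobs_url` and `job_url` are made up for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8090"  # address used throughout this article


def jobs_url(app_name, class_path, context=None, sync=False, base_url=BASE_URL):
    """Build the URL for POST /jobs (hypothetical helper)."""
    params = [("appName", app_name), ("classPath", class_path)]
    if context is not None:
        params.append(("context", context))
    if sync:
        params.append(("sync", "true"))
    return f"{base_url}/jobs?{urlencode(params)}"


def job_url(job_id, config=False, base_url=BASE_URL):
    """Build the URL for GET /jobs/<jobId> or GET /jobs/<jobId>/config."""
    suffix = "/config" if config else ""
    return f"{base_url}/jobs/{job_id}{suffix}"


print(jobs_url("test", "spark.jobserver.WordCountExample", sync=True))
```

Building the query string with `urlencode` keeps every parameter inside the URL, which avoids shell quoting mistakes when the parameters are assembled by hand.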

SHELL$ curl --data-binary @job-server-tests/target/job-server-tests-0.4.0.jar localhost:8090/jars/test
OK
# List the uploaded jars
SHELL$ curl localhost:8090/jars/
{
  "test": "2014-10-22T15:15:04.826+08:00"
}
# Submit a job: appName is test, class is spark.jobserver.WordCountExample
SHELL$ curl -d "input.string = hello job server" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample'
{
  "status": "STARTED",
  "result": {
    "jobId": "34ce0666-0148-46f7-8bcf-a7a19b5608b2",
    "context": "eba36388-spark.jobserver.WordCountExample"
  }
}
# Look up the result and configuration by job ID
SHELL$ curl localhost:8090/jobs/34ce0666-0148-46f7-8bcf-a7a19b5608b2
{
  "status": "OK",
  "result": {
    "job": 1,
    "hello": 1,
    "server": 1
  }
}
SHELL$ curl localhost:8090/jobs/34ce0666-0148-46f7-8bcf-a7a19b5608b2/config
{
    "input" : {
        "string" : "hello job server"
    }
}
# Submit a synchronous job; the terminal blocks until the job finishes.
SHELL$ curl -d "input.string = hello job server" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&sync=true'
{
  "status": "OK",
  "result": {
    "job": 1,
    "hello": 1,
    "server": 1
  }
}
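For the asynchronous path, a client has to poll GET /jobs/<jobId> until the result is ready. The following is a rough sketch of that loop. It assumes only what the transcript shows, namely that replies are JSON objects with `status` and `result` fields and that a finished job reports `"OK"`; other status values are not covered in this article, so the loop simply keeps polling on anything else. The names `parse_reply` and `wait_for_ok` are hypothetical, and `fetch` is any callable that returns the response text (for example a thin wrapper around an HTTP client).

```python
import json
import time


def parse_reply(body):
    """Split a job-server JSON reply into (status, result)."""
    reply = json.loads(body)
    return reply["status"], reply.get("result")


def wait_for_ok(job_id, fetch, attempts=10, delay=0.5):
    """Poll fetch(job_id) until the status is "OK", then return the result.

    fetch is supplied by the caller so this sketch runs without a server.
    """
    for _ in range(attempts):
        status, result = parse_reply(fetch(job_id))
        if status == "OK":
            return result
        time.sleep(delay)
    raise TimeoutError(f"job {job_id} not finished after {attempts} attempts")


# Canned replies standing in for the server, for demonstration only.
canned = iter([
    '{"status": "STARTED", "result": {"jobId": "demo"}}',
    '{"status": "OK", "result": {"job": 1, "hello": 1, "server": 1}}',
])
print(wait_for_ok("demo", lambda job_id: next(canned), delay=0))
```

With `sync=true` the server does this waiting itself, which is why the synchronous API suits short, low-latency jobs.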

The corresponding entries also appear under Completed Jobs in the Web UI.

Pre-starting a Context

Context-related API:
GET /contexts  list all pre-created contexts
POST /contexts/<name>  create a new context
DELETE /contexts/<name>  delete this context and stop all jobs running on it
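The three context operations map naturally onto HTTP request objects. This sketch uses Python's standard `urllib.request.Request`; the helper name `context_request` is hypothetical, the server address is the one used in this article, and the setting names in the usage line are copied from the curl command in this section.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8090"  # address used throughout this article


def context_request(method, name=None, settings=None, base_url=BASE_URL):
    """Build (but do not send) a request for the /contexts endpoints."""
    url = f"{base_url}/contexts"
    if name is not None:
        url += "/" + quote(name, safe="")
    if settings:
        url += "?" + urlencode(settings)
    # curl -d "" sends an empty POST body; mirror that here.
    data = b"" if method == "POST" else None
    return Request(url, data=data, method=method)


req = context_request("POST", "test-context",
                      {"num-cpu-cores": 4, "mem-per-node": "512m"})
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually send it; the sketch stops short of that so it can be run without a server.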

SHELL$ curl -d "" 'localhost:8090/contexts/test-context?num-cpu-cores=4&mem-per-node=512m'
OK
# List the existing contexts
SHELL$ curl localhost:8090/contexts
["test-context", "feceedc3-spark.jobserver.WordCountExample"]
# Next, run a job on this context
SHELL$ curl -d "input.string = a b c a b see" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=test-context&sync=true'
{
  "status": "OK",
  "result": {
    "a": 2,
    "b": 2,
    "c": 1,
    "see": 1
  }
}
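The result above matches a plain word count over `input.string`. As a sanity check, the same counts can be reproduced locally; this sketch assumes WordCountExample splits the input on whitespace, which is consistent with the outputs shown here but is not taken from the job's source code.

```python
from collections import Counter


def word_count(input_string):
    """Count whitespace-separated words, like the job's output suggests."""
    return dict(Counter(input_string.split()))


print(word_count("a b c a b see"))  # {'a': 2, 'b': 2, 'c': 1, 'see': 1}
```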

Configuration file

Opening the configuration file shows that master is set to local[4]; change it to the address of your own cluster.

vim spark-jobserver/config/local.conf.template
master = "local[4]"

The file also sets the storage method and path for data objects:

jobdao = spark.jobserver.io.JobFileDAO
    filedao {
      rootdir = /tmp/spark-job-server/filedao/data
    }

Default context settings. These can be overridden by the parameters passed through the REST interface after the server is started again in sbt.

# universal context configuration.  These settings can be overridden, see README.md
  context-settings {
    num-cpu-cores = 2           # Number of cores to allocate.  Required.
    memory-per-node = 512m         # Executor memory per node, -Xmx style eg 512m, #1G, etc.
    # in case spark distribution should be accessed from HDFS (as opposed to being installed on every mesos slave)
    # spark.executor.uri = "hdfs://namenode:8020/apps/spark/spark.tgz"
    # uris of jars to be loaded into the classpath for this context
    # dependent-jar-uris = ["file:///some/path/present/in/each/mesos/slave/somepackage.jar"]
  }
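One way to picture the override is as a key-by-key merge, where a value supplied with the request replaces the default of the same name. The sketch below is only an illustration under that assumption; job-server's actual merge logic is not shown in this article, and `effective_settings` is a hypothetical name.

```python
# Defaults copied from the context-settings block above.
DEFAULTS = {"num-cpu-cores": 2, "memory-per-node": "512m"}


def effective_settings(defaults, overrides):
    """Return the defaults with any same-named override applied."""
    merged = dict(defaults)  # copy so the defaults stay untouched
    merged.update(overrides)
    return merged


print(effective_settings(DEFAULTS, {"num-cpu-cores": 4}))
# {'num-cpu-cores': 4, 'memory-per-node': '512m'}
```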

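That covers basic usage; deploying jobServer and using it in a project will be covered later. Incidentally, the SQL Window feature planned for the next version is something to look forward to.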

Original article: Spark as a Service: A First Test of JobServer. Thanks to the original author for sharing.
