Spark as a Service: A First Look at JobServer
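spark-jobserver provides a RESTful interface for submitting and managing Apache Spark jobs, jar files, and job contexts (SparkContext). The project is hosted on GitHub (https://github.com/ooyala/spark-jobserver), and the current version is 0.4.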
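Features
"Spark as a Service": a simple REST interface for managing jobs and contexts
Sub-second, low-latency jobs through long-running job contexts
A running job can be stopped by deleting its context
Jar upload is a separate step, which speeds up job startup
Asynchronous and synchronous job APIs; the synchronous API works well for low-latency jobs
Works with Standalone Spark and Mesos
Job and jar information is persisted through a pluggable DAO interface
RDDs can be named and cached, then retrieved by name, which makes it easier to share and reuse RDDs across jobs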
Installing and starting JobServer
JobServer depends on sbt, so sbt must be installed first.
    rpm -ivh https://dl.bintray.com/sbt/rpm/sbt-0.13.6.rpm
    yum install git
    # Clone the project
    SHELL$ git clone https://github.com/ooyala/spark-jobserver.git
    # From the project root, start sbt
    SHELL$ sbt
    ......
    [info] Set current project to spark-jobserver-master (in build file:/D:/Projects/spark-jobserver-master/)
    >
    # Start JobServer locally (developer mode)
    > re-start --- -Xmx4g
    ......
    # This step downloads spark-core, jetty, liftweb and other related modules.
    job-server Starting spark.jobserver.JobServer.main()
    [success] Total time: 545 s, completed 2014-10-21 19:19:48
Then open http://localhost:8090 to see the Web UI.
Testing job execution
Here we test directly with the job-server test package.
    SHELL$ sbt job-server-tests/package
    ......
    [info] Compiling 5 Scala sources to /root/spark-jobserver/job-server-tests/target/classes...
    [info] Packaging /root/spark-jobserver/job-server-tests/target/job-server-tests-0.4.0.jar ...
    [info] Done packaging.
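Once the build finishes, upload the packaged jar through the REST interface.
The REST API for jobs is as follows:
GET /jobs
List all jobs
POST /jobs
Submit a new job
GET /jobs/<jobid>
Query the result and status of a job
GET /jobs/<jobid>/config
Query the configuration of a job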
    SHELL$ curl --data-binary @job-server-tests/target/job-server-tests-0.4.0.jar localhost:8090/jars/test
    OK
    # List the uploaded jars
    SHELL$ curl localhost:8090/jars/
    {
      "test": "2014-10-22T15:15:04.826+08:00"
    }
    # Submit a job with appName test and class spark.jobserver.WordCountExample
    SHELL$ curl -d "input.string = hello job server" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample'
    {
      "status": "STARTED",
      "result": {
        "jobId": "34ce0666-0148-46f7-8bcf-a7a19b5608b2",
        "context": "eba36388-spark.jobserver.WordCountExample"
      }
    }
    # Look up the result and the configuration by job id
    SHELL$ curl localhost:8090/jobs/34ce0666-0148-46f7-8bcf-a7a19b5608b2
    {
      "status": "OK",
      "result": {
        "job": 1,
        "hello": 1,
        "server": 1
      }
    }
    SHELL$ curl localhost:8090/jobs/34ce0666-0148-46f7-8bcf-a7a19b5608b2/config
    {
      "input" : {
        "string" : "hello job server"
      }
    }
    # Submit a synchronous job; the terminal blocks until the job finishes.
    # Note that sync=true must be inside the quotes, otherwise the shell treats & as "run in background".
    SHELL$ curl -d "input.string = hello job server" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&sync=true'
    {
      "status": "OK",
      "result": {
        "job": 1,
        "hello": 1,
        "server": 1
      }
    }
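The asynchronous flow above (submit, receive a jobId, then query it) can also be driven from a script. The following is a minimal Python sketch, not part of spark-jobserver itself: the helper names job_submit_url, http_json and wait_for_job are hypothetical, and treating STARTED and RUNNING as the "still in progress" statuses is an assumption based on the responses shown above.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8090"  # server address used in this article


def job_submit_url(app_name, class_path, context=None, sync=False, base_url=BASE_URL):
    """Build the POST /jobs URL with the query parameters used in the curl calls."""
    params = {"appName": app_name, "classPath": class_path}
    if context is not None:
        params["context"] = context
    if sync:
        params["sync"] = "true"
    return f"{base_url}/jobs?{urlencode(params)}"


def http_json(url, data=None):
    """Send a GET (data is None) or a POST (data is a string) and decode the JSON reply."""
    body = data.encode("utf-8") if data is not None else None
    with urlopen(Request(url, data=body)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_job(job_id, fetch=http_json, interval=0.5, max_polls=20, base_url=BASE_URL):
    """Poll GET /jobs/<jobId> until the job leaves the in-progress states.

    Assumption: STARTED and RUNNING mean the job has not finished yet.
    """
    for _ in range(max_polls):
        reply = fetch(f"{base_url}/jobs/{job_id}")
        if reply.get("status") not in ("STARTED", "RUNNING"):
            return reply
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

With a server running, a submission would look like http_json(job_submit_url("test", "spark.jobserver.WordCountExample"), data="input.string = hello job server"), followed by wait_for_job on the jobId taken from the reply. The fetch parameter is injectable so the polling logic can be exercised without a live server.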
The corresponding entries also appear under Completed Jobs in the Web UI.
Pre-starting a context
Context-related API:
GET /contexts
List all existing contexts
POST /contexts/<name>
Create a new context
DELETE /contexts/<name>
Delete this context and stop all jobs running on it
    SHELL$ curl -d "" 'localhost:8090/contexts/test-context?num-cpu-cores=4&mem-per-node=512m'
    OK
    # List the existing contexts
    SHELL$ curl localhost:8090/contexts
    ["test-context", "feceedc3-spark.jobserver.WordCountExample"]
    # Now run a job on this context
    SHELL$ curl -d "input.string = a b c a b see" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=test-context&sync=true'
    {
      "status": "OK",
      "result": {
        "a": 2,
        "b": 2,
        "c": 1,
        "see": 1
      }
    }
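The context calls above follow a simple URL pattern, which a small helper can capture. This is a hedged Python sketch: context_url and context_request are hypothetical names, and the only parameters used are the ones that appear in the curl example (num-cpu-cores and mem-per-node). The requests are built but not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8090"  # server address used in this article


def context_url(name, base_url=BASE_URL, **settings):
    """Build /contexts/<name>, turning keyword settings into query parameters.

    Underscores in keyword names become hyphens, so num_cpu_cores=4
    is sent as num-cpu-cores=4.
    """
    url = f"{base_url}/contexts/{quote(name, safe='')}"
    if settings:
        query = urlencode({key.replace("_", "-"): value for key, value in settings.items()})
        url = f"{url}?{query}"
    return url


def context_request(method, name, **settings):
    """Prepare (but do not send) a POST or DELETE request for a context."""
    data = b"" if method == "POST" else None  # curl -d "" sends an empty body
    return Request(context_url(name, **settings), data=data, method=method)
```

For example, context_request("POST", "test-context", num_cpu_cores=4, mem_per_node="512m") targets the same URL as the first curl call, and context_request("DELETE", "test-context") corresponds to deleting the context. Passing either object to urllib.request.urlopen would send it.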
Configuration file
Opening the configuration file shows that master is set to local[4]; change it to the address of your cluster to run against it.
    vim spark-jobserver/config/local.conf.template
    master = "local[4]"
The configuration also sets how and where job data is stored:
    jobdao = spark.jobserver.io.JobFileDAO
    filedao {
      rootdir = /tmp/spark-job-server/filedao/data
    }
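Below are the default context settings, which can be overridden by parameters passed through the REST interface after the server is started again from sbt.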
    # universal context configuration. These settings can be overridden, see README.md
    context-settings {
      num-cpu-cores = 2        # Number of cores to allocate. Required.
      memory-per-node = 512m   # Executor memory per node, -Xmx style eg 512m, #1G, etc.
      # in case spark distribution should be accessed from HDFS (as opposed to being installed on every mesos slave)
      # spark.executor.uri = "hdfs://namenode:8020/apps/spark/spark.tgz"
      # uris of jars to be loaded into the classpath for this context
      # dependent-jar-uris = ["file:///some/path/present/in/each/mesos/slave/somepackage.jar"]
    }
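The idea of defaults being overridden per context can be pictured as a simple merge. The Python sketch below only illustrates that idea and is not how spark-jobserver implements it: effective_settings is a hypothetical name, and it assumes the override uses the same key as the default it replaces.

```python
# Defaults copied from the context-settings block above.
DEFAULT_CONTEXT_SETTINGS = {"num-cpu-cores": 2, "memory-per-node": "512m"}


def effective_settings(defaults, overrides):
    """Return the defaults with any supplied overrides applied on top.

    Overrides set to None are ignored, and the defaults are left unmodified.
    """
    merged = dict(defaults)
    merged.update({key: value for key, value in overrides.items() if value is not None})
    return merged
```

Under these assumptions, effective_settings(DEFAULT_CONTEXT_SETTINGS, {"num-cpu-cores": 4}) keeps the default memory-per-node and raises the core count to 4.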
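That covers basic usage; deploying JobServer and using it in a project will be covered in a later post. Meanwhile, I am looking forward to the SQL window feature in the next version.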
Original article: "Spark as a Service: A First Look at JobServer". Thanks to the original author for sharing.
