Elasticsearch is an advanced, high-performance, scalable, open-source search engine that provides full-text search and real-time analytics over structured and unstructured data.
Because it exposes a RESTful API over HTTP, it integrates easily into existing web architectures. Under high concurrency, therefore, we can put an nginx reverse proxy in front of it to load-balance across multiple Elasticsearch servers. Architecture diagram: (image omitted)
So what are the benefits of using nginx?
1. Log every API access request. (Elasticsearch itself does not support access logging; it only provides slow logs and service logs.)
2. Support a large number of client connections. The official Elasticsearch blog recommends enabling keep-alive so that nginx and ES communicate over persistent connections. My understanding: ES usually sits at the bottom of the architecture and is accessed by a fixed set of upper-layer services, a situation well suited to keep-alive. (In fact, nginx can support a larger number of client connections whether or not keep-alive is used.)
3. Load-balance requests across the Elasticsearch servers.
4. Cache responses, so identical requests do not have to hit the Elasticsearch servers again.
5. Active health checks (NGINX Plus only): continuously probe whether the back-end Elasticsearch servers are healthy and fail over automatically. (When an ES node goes down, nginx stops routing requests to it; when the node recovers, it automatically rejoins the pool.)
6. Rich monitoring metrics (NGINX Plus only), providing monitoring and management.
7. Security verification: only clients with a username and password are allowed to access the ES cluster.
8. Restricted access to special endpoints such as "_shutdown". (This feature is quite practical.)
9. Role-based access control (for example, a user role with data access rights and an admin role with cluster management and control rights).
====I am the dividing line of the configuration example====
A simple nginx configuration is as follows:
# Note: zone, match, status_zone, health_check, and status are NGINX Plus directives.
# proxy_cache requires a cache zone to be defined first (the path here is illustrative):
proxy_cache_path /var/cache/nginx keys_zone=elasticsearch:10m;

upstream elasticsearch_servers {
    zone elasticsearch_servers 64k;
    server 192.168.187.132:9200;
    server 192.168.187.133:9200;
    keepalive 40;
}

match statusok {
    status 200;
    header Content-Type ~ "application/json";
    body ~ '"status" : 200';
}

server {
    listen 9200;
    status_zone elasticsearch;

    location / {
        proxy_pass http://elasticsearch_servers;
        proxy_http_version 1.1;
        proxy_set_header Connection "";   # required for upstream keepalive
        proxy_cache elasticsearch;
        proxy_cache_valid 200 302 10m;
        proxy_cache_valid 404 1m;
        proxy_connect_timeout 5s;
        proxy_read_timeout 10s;
        health_check interval=5s fails=1 passes=1 uri=/ match=statusok;
    }

    # Redirect server error pages to the static page /50x.html.
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root /usr/share/nginx/html;
    }

    access_log logs/es_access.log combined;
}

server {
    listen 8080;
    root /usr/share/nginx/html;

    location / {
        index status.html;
    }

    location = /status {
        status;
    }
}
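The health_check, match, status_zone, and status directives above are NGINX Plus features. On open-source nginx, a rough substitute is passive health checking via the max_fails and fail_timeout server parameters — a minimal sketch using the same example IPs as above:

    upstream elasticsearch_servers {
        # After 2 failed attempts, mark the server down for 30s, then retry it.
        server 192.168.187.132:9200 max_fails=2 fail_timeout=30s;
        server 192.168.187.133:9200 max_fails=2 fail_timeout=30s;
        keepalive 40;
    }

Passive checks only react to real client traffic failing, so a dead node is noticed later than with active probes, but they require no paid features.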
====I am the dividing line of the security verification configuration====
A configuration with security verification is as follows:
events {
    worker_connections 1024;
}

http {
    upstream elasticsearch {
        server 127.0.0.1:9200;
    }

    server {
        listen 8080;

        auth_basic "Protected Elasticsearch";
        auth_basic_user_file passwords;

        location / {
            proxy_pass http://elasticsearch;
            proxy_redirect off;
        }
    }
}
$ printf "john:$(openssl passwd -crypt s3cr3t)\n" > passwords   # note: -crypt hashes are weak; "openssl passwd -apr1" is preferable
$ curl -i localhost:8080
# HTTP/1.1 401 Unauthorized
# ...

$ curl -i john:s3cr3t@localhost:8080
# HTTP/1.1 200 OK
# ...
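For reference, the Basic auth that curl performs above is nothing more than base64-encoding "user:password" into an Authorization header. We can reproduce what curl sends by hand (credentials are the illustrative ones from above):

```shell
# Build the Authorization header value curl sends for john:s3cr3t.
token=$(printf 'john:s3cr3t' | base64)
echo "Authorization: Basic $token"
# → Authorization: Basic am9objpzM2NyM3Q=
```

This makes clear that Basic auth is encoding, not encryption — on untrusted networks it should always be combined with TLS.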
====I am the dividing line of access restriction configuration====
location / {
    if ($request_filename ~ _shutdown) {
        return 403;
    }
    proxy_pass http://elasticsearch;
    proxy_redirect off;
}
After doing this configuration, direct access to _shutdown will be denied:
$ curl -i -X POST john:s3cr3t@localhost:8080/_cluster/nodes/_shutdown
# HTTP/1.1 403 Forbidden
# ...
For my current project, upper-layer applications only need to access data in ES, so cluster and node management APIs should be denied to them. At the same time, DELETE requests against resources that should never be deleted should also be blocked. This is a basic safeguard for the ES cluster; otherwise its configuration can easily be modified, or large amounts of data deleted.
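One way to express that policy in nginx (a sketch, not the project's actual configuration) is to reject DELETE requests and block the administration APIs outright:

    server {
        listen 8080;

        # Upper-layer apps may read/write data, but never DELETE.
        location / {
            if ($request_method = DELETE) {
                return 403;
            }
            proxy_pass http://elasticsearch;
            proxy_redirect off;
        }

        # Deny cluster/node administration endpoints entirely.
        location ~* ^/(_cluster|_nodes|_shutdown) {
            return 403;
        }
    }

The exact endpoint list depends on the Elasticsearch version in use; the regex above is illustrative.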
====I am the dividing line for multi-role configuration====
events {
    worker_connections 1024;
}

http {
    upstream elasticsearch {
        server 127.0.0.1:9200;
    }

    # Allow access to /_search and /_analyze for authenticated "users".
    server {
        listen 8081;

        auth_basic "Elasticsearch Users";
        auth_basic_user_file users;

        location / {
            return 403;
        }

        location ~* ^(/_search|/_analyze) {
            proxy_pass http://elasticsearch;
            proxy_redirect off;
        }
    }

    # Allow access to anything for authenticated "admins".
    server {
        listen 8082;

        auth_basic "Elasticsearch Admins";
        auth_basic_user_file admins;

        location / {
            proxy_pass http://elasticsearch;
            proxy_redirect off;
        }
    }
}
The cost of multi-role access restriction is that each role accesses the cluster through a different port number. Architecturally this is reasonable: a client holds exactly one role and uses the corresponding port.
With Lua, much finer-grained URL permission control is possible, and nginx supports embedding Lua cleanly and concisely. We will not explore this in more depth here; look into it if you are interested.
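As a taste of what the Lua route looks like, here is a minimal sketch using the third-party lua-nginx-module (bundled with OpenResty); the whitelisted endpoints are illustrative:

    location / {
        access_by_lua_block {
            -- Allow only search and analyze endpoints; deny everything else.
            local uri = ngx.var.uri
            if not (uri:find("/_search", 1, true) or uri:find("/_analyze", 1, true)) then
                return ngx.exit(ngx.HTTP_FORBIDDEN)
            end
        }
        proxy_pass http://elasticsearch;
    }

Because the check runs as arbitrary Lua code, it can consult request method, headers, or even an external auth service, which plain location blocks cannot.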
Reference documents:
http://www.ttlsa.com/nginx/nginx-elasticsearch/
https://www.elastic.co/blog/playing-http-tricks-nginx
Copyright statement: this article is an original article by the blogger and may not be reproduced without the blogger's permission.