Building an ES Search Engine with Flask, Step by Step (Hands-On)

Go语言进阶学习

Published: 2023-07-25 17:24:52

Reposted

Let's start building an ES search with Flask.

Configuration file

Config.py

# coding:utf-8
import os
DB_USERNAME = 'root'
DB_PASSWORD = ''  # empty string if there is no password (None would end up in the URI as the text "None")
DB_HOST = '127.0.0.1'
DB_PORT = '3306'
DB_NAME = 'flask_es'

class Config:
    SECRET_KEY = "random string"  # replace with your own random SECRET_KEY
    SQLALCHEMY_COMMIT_ON_TEARDOWN = True  # commit automatically on teardown
    SQLALCHEMY_TRACK_MODIFICATIONS = True  # track object modifications
    DEBUG = True  # debug mode
    SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://%s:%s@%s:%s/%s' % (DB_USERNAME, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME)  # database URL

    MAIL_SERVER = 'smtp.qq.com'
    MAIL_PORT = 465  # Flask-Mail reads MAIL_PORT (the original had the typo MAIL_POST)
    MAIL_USERNAME = '3417947630@qq.com'
    MAIL_PASSWORD = 'mailbox authorization code'
    FLASK_MAIL_SUBJECT_PREFIX = 'M_KEPLER'
    FLASK_MAIL_SENDER = MAIL_USERNAME  # default sender
    # MAIL_USE_SSL = True
    MAIL_USE_TLS = False
    MAIL_DEBUG = False
    ENABLE_THREADS = True

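This is a fairly simple Flask Config file. The current project does not actually need a database connection; I only use MySQL as an auxiliary store. You can skip the database entirely, since ES alone is enough. Email notification is optional and depends on your own needs.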
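One detail in the config above is easy to get wrong: formatting a missing or special-character password straight into the database URL. The following is a minimal sketch, not part of the original project, of one way to assemble the URL. The helper name `build_db_uri` and the `DB_PASSWORD` environment variable are assumptions made for illustration.

```python
import os
from urllib.parse import quote_plus


def build_db_uri(user, password, host, port, name):
    """Assemble a mysql+pymysql URL; leave out the password part when it is empty."""
    auth = quote_plus(user)
    if password:
        # Percent-encode characters such as "@" or "/" so they cannot break the URL.
        auth += ":" + quote_plus(password)
    return "mysql+pymysql://%s@%s:%s/%s" % (auth, host, port, name)


# Hypothetical: read the password from an environment variable instead of hard-coding it.
uri = build_db_uri("root", os.environ.get("DB_PASSWORD"), "127.0.0.1", "3306", "flask_es")
```

The result could then be assigned to `SQLALCHEMY_DATABASE_URI` in `Config`.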

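Logging

Logger.py

The logging module is an important part of any engineering application, and writing log files suited to each runtime environment is essential. As the saying goes: "Without log files, you won't even know how you died..."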

# coding=utf-8
import os
import logging
import logging.config as log_conf
import datetime
import coloredlogs

coloredlogs.DEFAULT_FIELD_STYLES = {'asctime': {'color': 'green'}, 'hostname': {'color': 'magenta'}, 'levelname': {'color': 'magenta', 'bold': False}, 'name': {'color': 'green'}}

# Log directory: "logs" next to the package that contains this file
log_dir = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), 'logs')
os.makedirs(log_dir, exist_ok=True)
today = datetime.datetime.now().strftime("%Y-%m-%d")

# One log file per day, e.g. logs/2023-07-25.log
log_path = os.path.join(log_dir, today + ".log")

log_config = {
    'version': 1,

    # Output formats
    'formatters': {
        'colored_console': {
            'format': "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
            'datefmt': '%H:%M:%S'
        },
        'detail': {
            'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
            'datefmt': "%Y-%m-%d %H:%M:%S"  # timestamp format
        },
    },

    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'level': 'DEBUG',
            'formatter': 'colored_console'
        },
        'file': {
            'class': 'logging.handlers.RotatingFileHandler',
            'maxBytes': 1024 * 1024 * 1024,  # rotate after 1 GiB
            'backupCount': 1,
            'filename': log_path,
            'level': 'INFO',
            'formatter': 'detail',
            'encoding': 'utf-8',  # UTF-8 to avoid encoding errors
        },
    },

    'loggers': {
        # The key must match the name passed to getLogger() below.
        # The original used 'logger' here and never attached the file handler.
        'log': {
            'handlers': ['console', 'file'],
            'level': 'DEBUG',
        },

    }
}

log_conf.dictConfig(log_config)
log_v = logging.getLogger('log')

# Install colored console output on this logger
coloredlogs.install(level='DEBUG', logger=log_v)


# # Some examples.
# log_v.debug("this is a debugging message")
# log_v.info("this is an informational message")
# log_v.warning("this is a warning message")
# log_v.error("this is an error message")
# log_v.critical("this is a critical message")

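This is the log configuration file I use most often, and it works as a general-purpose logging setup. Import it directly and it writes to the terminal or to a .log file depending on the level. Feel free to take it, no thanks needed.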
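To make the "terminal or .log file depending on the level" behavior concrete, here is a minimal sketch that uses only the standard library (no `coloredlogs`). The logger name `demo`, the helper `make_logger`, and the temporary log path are assumptions chosen for illustration and are not part of the project.

```python
import logging
import logging.config
import os
import tempfile


def make_logger(log_path, name="demo"):
    """Configure a logger that prints DEBUG+ to the console and writes INFO+ to a file."""
    config = {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "detail": {"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"},
        },
        "handlers": {
            "console": {"class": "logging.StreamHandler", "level": "DEBUG", "formatter": "detail"},
            "file": {
                "class": "logging.handlers.RotatingFileHandler",
                "filename": log_path,
                "maxBytes": 1024 * 1024,
                "backupCount": 1,
                "level": "INFO",
                "formatter": "detail",
                "encoding": "utf-8",
            },
        },
        # The key here must be the same name that is passed to getLogger().
        "loggers": {name: {"handlers": ["console", "file"], "level": "DEBUG", "propagate": False}},
    }
    logging.config.dictConfig(config)
    return logging.getLogger(name)


log_path = os.path.join(tempfile.mkdtemp(), "demo.log")
log = make_logger(log_path)
log.debug("console only")      # below the file handler's INFO threshold
log.info("console and file")   # passes both handlers
```

Each handler applies its own level, so the file stays free of debug noise while the console shows everything.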

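Routing

For a Flask project, blueprints and routes make the whole project more pleasant to look at (in the sense that the code is easier to read, of course).

Here I use two branches as data sources: a Math entry point and a Baike entry point. The data comes from the Baidu Baike crawler in the previous article, which crawled depth-first and stored the results in ES.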

# coding:utf8
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from app.config.config import Config
from flask_mail import Mail
from flask_wtf.csrf import CSRFProtect

app = Flask(__name__, template_folder='templates', static_folder='static')
app.config.from_object(Config)

# SQLAlchemy(app) already binds the extension to the app,
# so the extra db.init_app(app) call from the original is dropped.
db = SQLAlchemy(app)

csrf = CSRFProtect(app)
mail = Mail(app)
# Do not import and register the blueprints before db has been created.
from app.home.baike import baike as baike_blueprint
from app.home.math import math as math_blueprint
from app.home.home import home as home_blueprint

app.register_blueprint(home_blueprint)
app.register_blueprint(math_blueprint, url_prefix="/math")
app.register_blueprint(baike_blueprint, url_prefix="/baike")

# -*- coding:utf-8 -*-
from flask import Blueprint
baike = Blueprint("baike", __name__)

from app.home.baike import views

# -*- coding:utf-8 -*-
from flask import Blueprint
math = Blueprint("math", __name__)

from app.home.math import views

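Each blueprint is declared and initialized in its package's __init__ file.

Now let's look at how the routes are implemented (using Baike as the example).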

# -*- coding:utf-8 -*-
import os
from flask_paginate import Pagination, get_page_parameter
from app.Logger.logger import log_v
from app.elasticsearchClass import elasticSearch

from app.home.forms import SearchForm

from app.home.baike import baike
from flask import request, jsonify, render_template, redirect, url_for

baike_es = elasticSearch(index_type="baike_data", index_name="baike")

@baike.route("/")
def index():
    searchForm = SearchForm()
    return render_template('baike/index.html', searchForm=searchForm)

@baike.route("/search", methods=['GET', 'POST'])
def baikeSearch():
    search_key = request.args.get("b", default=None)
    if search_key:
        searchForm = SearchForm()
        log_v.info("[+] Search Keyword: " + search_key)
        match_data = baike_es.search(search_key, count=30)
        hits = match_data["hits"]["hits"]

        # Pagination
        PER_PAGE = 10
        page = request.args.get(get_page_parameter(), type=int, default=1)
        start = (page - 1) * PER_PAGE
        end = start + PER_PAGE
        total = len(hits)  # at most 30, the count requested above
        log_v.debug("Total number of results: %d", total)
        pagination = Pagination(page=page, per_page=PER_PAGE, start=start, end=end, total=total)
        context = {
            'match_data': hits[start:end],
            'pagination': pagination,
            'uid_link': "/baike/"
        }
        return render_template('data.html', q=search_key, searchForm=searchForm, **context)
    # redirect() expects a URL, so resolve the endpoint name with url_for()
    return redirect(url_for('home.index'))


@baike.route('/<uid>')
def baikeSd(uid):
    base_path = os.path.abspath('app/templates/s_d/')
    old_file = os.listdir(base_path)[0]
    old_path = os.path.join(base_path, old_file)
    file_path = os.path.abspath('app/templates/s_d/{}.html'.format(uid))
    if not os.path.exists(file_path):
        log_v.debug("[-] File does not exist, renaming !!!")
        os.rename(old_path, file_path)
    match_data = baike_es.id_get_doc(uid=uid)
    return render_template('s_d/{}.html'.format(uid), match_data=match_data)

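As you can see, we successfully initialize the elasticSearch class and run a search against the data.

We use Flask's pagination plugin to paginate the results and limit the number of items per page, and we jump to the detail page based on the Uid.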
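The slicing arithmetic behind the per-page limit can be shown in isolation. This is a minimal sketch in plain Python, independent of flask_paginate; the function name `page_slice` and the sample data are assumptions for illustration.

```python
def page_slice(items, page, per_page=10):
    """Return the items shown on a 1-based page, plus the total number of pages."""
    page = max(page, 1)                      # treat page 0 or negative values as page 1
    start = (page - 1) * per_page            # index of the first item on this page
    end = start + per_page                   # slicing clamps this to len(items)
    total_pages = (len(items) + per_page - 1) // per_page  # ceiling division
    return items[start:end], total_pages


hits = ["doc-%d" % i for i in range(30)]     # stand-in for the 30 search hits
page_2, pages = page_slice(hits, page=2)
```

Requesting a page past the end simply yields an empty list, so the view does not need a special case for it.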

Attentive readers will notice that I used a small trick here:

@baike.route('/<uid>')
def baikeSd(uid):
    base_path = os.path.abspath('app/templates/s_d/')
    old_file = os.listdir(base_path)[0]
    old_path = os.path.join(base_path, old_file)
    file_path = os.path.abspath('app/templates/s_d/{}.html'.format(uid))
    if not os.path.exists(file_path):
        log_v.debug("[-] File does not exist, renaming !!!")
        os.rename(old_path, file_path)
    match_data = baike_es.id_get_doc(uid=uid)
    return render_template('s_d/{}.html'.format(uid), match_data=match_data)

This ensures that the template directory holding the detail pages always contains exactly one HTML file.
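The rename trick can be tried outside Flask. Below is a minimal sketch that reproduces it in a temporary directory; the helper name `ensure_detail_template`, the seed file name, and the explicit empty-directory check are assumptions added for illustration.

```python
import os
import tempfile


def ensure_detail_template(template_dir, uid):
    """Rename the single existing template to <uid>.html if needed and return its path."""
    target = os.path.join(template_dir, "{}.html".format(uid))
    if not os.path.exists(target):
        existing = os.listdir(template_dir)
        if not existing:
            # The route shown earlier would fail with an IndexError in this case.
            raise FileNotFoundError("no template file in " + template_dir)
        os.rename(os.path.join(template_dir, existing[0]), target)
    return target


template_dir = tempfile.mkdtemp()
with open(os.path.join(template_dir, "seed.html"), "w") as f:
    f.write("<h1>detail</h1>")

ensure_detail_template(template_dir, "abc")   # seed.html becomes abc.html
ensure_detail_template(template_dir, "xyz")   # abc.html becomes xyz.html
```

Because every request renames the same shared file, two simultaneous requests for different Uids could interfere with each other, so this approach is probably best suited to a single-user demo.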

Starting the project

As usual, I use flask_script to launch the project; it really is convenient.

# coding:utf8
from app import app
from flask_script import Manager, Server

manage = Manager(app)

# Launch command
manage.add_command("runserver", Server(use_debugger=True))


if __name__ == "__main__":
    manage.run()

In a terminal, type

python manage.py runserver

to start the project. The default port is 5000, so open http://127.0.0.1:5000

Starting with gunicorn

pip install gunicorn

# encoding:utf-8
import multiprocessing

from gevent import monkey
monkey.patch_all()

# Number of parallel worker processes
workers = multiprocessing.cpu_count() * 2 + 1

debug = True

reload = True  # reload automatically when the code changes

loglevel = 'debug'

# Number of threads per worker
threads = 2

# Listen on port 5001 on all interfaces
bind = '0.0.0.0:5001'

# Do not daemonize; leave process management to supervisor
daemon = False

# Worker mode: coroutines (gevent)
worker_class = 'gevent'

# Maximum number of simultaneous connections per worker
worker_connections = 2000

# PID file and log file locations
pidfile = 'log/gunicorn.pid'
logfile = 'log/debug.log'

# Access log and error log paths
accesslog = 'log/gunicorn_access.log'
errorlog = 'log/gunicorn_error.log'

Use this configuration file to start the gunicorn server.