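DocuTranslator is a document translation system built on AWS and developed with the Streamlit application framework. It lets end users upload a document and have it translated into their preferred language. Because it supports many target languages, users can read the content in whichever language they are most comfortable with.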
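The goal of this project is to provide a simple, user-friendly interface for a straightforward translation workflow. With this system, nobody has to open the Amazon Translate service to translate a document: end users simply open the application endpoint and get what they need.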
The architecture diagram above illustrates the following points:
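An EFS shared path is used to share the same application files between the two underlying EC2 instances. We created a mount point, /streamlit_appfiles, on each EC2 instance and mounted the EFS share there, so both servers see identical content. Next, the same application content has to appear in the container's working directory, /streamlit. For this we used a bind mount, so any change made to the application code at the EC2 level is immediately reflected inside the container. The sharing must not work in the other direction: if someone mistakenly edits the code from inside the container, the change must not reach the EC2 host. The bind mount is therefore mounted read-only inside the container.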
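To confirm the one-way behavior from inside a running container, a quick check like the following can be used. This is a minimal sketch that is not part of the original project; it relies on access(2) reporting EROFS for write checks on a read-only mount, so os.access() returns False for /streamlit even when the process runs as root. A path that does not exist also reports False.

```python
import os
import tempfile


def mount_is_writable(path: str) -> bool:
    """Return True if the current process may write to `path`.

    On a read-only bind mount, access(2) fails with EROFS for W_OK,
    so this returns False even if the file mode bits allow writing.
    Nonexistent paths also return False.
    """
    return os.access(path, os.W_OK)


if __name__ == "__main__":
    # Inside the container, the bind-mounted app directory should be read-only.
    print("/streamlit writable:", mount_is_writable("/streamlit"))
    # A scratch directory (the app writes uploads under /tmp) should stay writable.
    with tempfile.TemporaryDirectory() as scratch:
        print("scratch writable:", mount_is_writable(scratch))
```

Run it with `docker exec` (or ECS Exec) inside the task; a result of False for /streamlit and True for the scratch directory matches the intended setup.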
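Underlying EC2 configuration:
Instance type: t2.medium
Network type: private subnet
Container configuration:
Image: <account-id>.dkr.ecr.us-east-1.amazonaws.com/anirban:latest
Network mode: default (bridge on Linux EC2 hosts)
Host port: 16347
Container port: 8501
Task CPU: 2 vCPU (2048 CPU units)
Task memory: 2.5 GiB (2560 MiB)
Volume configuration:
Volume name: streamlit-volume
Source path: /streamlit_appfiles
Container path: /streamlit
Read-only file system: yes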
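The configuration listed above can also be registered programmatically. The sketch below builds the keyword arguments for boto3's ECS `register_task_definition` call so that they mirror the task definition JSON; the account ID is a hypothetical placeholder, and boto3 is imported only inside `register()` so the builder can be inspected without AWS credentials.

```python
import json

REGION = "us-east-1"


def build_task_definition(account_id: str) -> dict:
    """Keyword arguments for ECS RegisterTaskDefinition mirroring the listed configuration."""
    role = f"arn:aws:iam::{account_id}:role/ecsTaskExecutionRole"
    return {
        "family": "Streamlit_TDF-1",
        "requiresCompatibilities": ["EC2"],
        "cpu": "2048",     # 2 vCPU
        "memory": "2560",  # MiB
        "taskRoleArn": role,
        "executionRoleArn": role,
        # Host volume backed by the EFS mount point on each EC2 instance
        "volumes": [{"name": "streamlit-volume", "host": {"sourcePath": "/streamlit_appfiles"}}],
        "containerDefinitions": [{
            "name": "streamlit",
            "image": f"{account_id}.dkr.ecr.{REGION}.amazonaws.com/anirban:latest",
            "essential": True,
            # Static host port 16347 forwards to Streamlit's port 8501
            "portMappings": [{"containerPort": 8501, "hostPort": 16347, "protocol": "tcp"}],
            # Read-only bind mount of the shared application code
            "mountPoints": [{"sourceVolume": "streamlit-volume",
                             "containerPath": "/streamlit", "readOnly": True}],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/Streamlit_TDF-1",
                    "mode": "non-blocking",
                    "max-buffer-size": "25m",
                    "awslogs-create-group": "true",
                    "awslogs-region": REGION,
                    "awslogs-stream-prefix": "ecs",
                },
            },
        }],
    }


def register(account_id: str) -> str:
    """Register the task definition and return its ARN (requires AWS credentials)."""
    import boto3  # imported lazily so build_task_definition() works without boto3

    ecs = boto3.client("ecs", region_name=REGION)
    response = ecs.register_task_definition(**build_task_definition(account_id))
    return response["taskDefinition"]["taskDefinitionArn"]


if __name__ == "__main__":
    # "123456789012" is a hypothetical account ID used only for printing.
    print(json.dumps(build_task_definition("123456789012"), indent=2))
```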
Task definition reference:
```json
{
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:<account-id>:task-definition/Streamlit_TDF-1:5",
  "containerDefinitions": [
    {
      "name": "streamlit",
      "image": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/anirban:latest",
      "cpu": 0,
      "portMappings": [
        {
          "name": "streamlit-8501-tcp",
          "containerPort": 8501,
          "hostPort": 16347,
          "protocol": "tcp",
          "appProtocol": "http"
        }
      ],
      "essential": true,
      "environment": [],
      "environmentFiles": [],
      "mountPoints": [
        {
          "sourceVolume": "streamlit-volume",
          "containerPath": "/streamlit",
          "readOnly": true
        }
      ],
      "volumesFrom": [],
      "ulimits": [],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/Streamlit_TDF-1",
          "mode": "non-blocking",
          "awslogs-create-group": "true",
          "max-buffer-size": "25m",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        },
        "secretOptions": []
      },
      "systemControls": []
    }
  ],
  "family": "Streamlit_TDF-1",
  "taskRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
  "executionRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
  "revision": 5,
  "volumes": [
    {
      "name": "streamlit-volume",
      "host": {
        "sourcePath": "/streamlit_appfiles"
      }
    }
  ],
  "status": "ACTIVE",
  "requiresAttributes": [
    { "name": "com.amazonaws.ecs.capability.logging-driver.awslogs" },
    { "name": "ecs.capability.execution-role-awslogs" },
    { "name": "com.amazonaws.ecs.capability.ecr-auth" },
    { "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19" },
    { "name": "com.amazonaws.ecs.capability.docker-remote-api.1.28" },
    { "name": "com.amazonaws.ecs.capability.task-iam-role" },
    { "name": "ecs.capability.execution-role-ecr-pull" },
    { "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18" },
    { "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29" }
  ],
  "placementConstraints": [],
  "compatibilities": ["EC2"],
  "requiresCompatibilities": ["EC2"],
  "cpu": "2048",
  "memory": "2560",
  "runtimePlatform": {
    "cpuArchitecture": "X86_64",
    "operatingSystemFamily": "LINUX"
  },
  "registeredAt": "2024-11-09T05:59:47.534Z",
  "registeredBy": "arn:aws:iam::<account-id>:root",
  "tags": []
}
```
app.py
```python
import os
import time

import boto3
import streamlit as st
from boto3.exceptions import S3UploadFailedError
from botocore.exceptions import ClientError

s3 = boto3.client('s3', region_name='us-east-1')
tran = boto3.client('translate', region_name='us-east-1')
lam = boto3.client('lambda', region_name='us-east-1')


# Function to list S3 buckets
def listbuckets():
    list_bucket = s3.list_buckets()
    return tuple(it["Name"] for it in list_bucket["Buckets"])


# Upload object to S3 bucket
def upload_to_s3bucket(file_path, selected_bucket, file_name):
    s3.upload_file(file_path, selected_bucket, file_name)


# List the language names supported by Amazon Translate
def list_language():
    response = tran.list_languages()
    return [i["LanguageName"] for i in response["Languages"]]


# Poll the destination bucket every 3 seconds until the translated file exists
def wait_for_s3obj(dest_selected_bucket, file_name):
    while True:
        try:
            s3.head_object(Bucket=dest_selected_bucket, Key=f'Translated-{file_name}.txt')
            return True
        except ClientError as e:
            # HeadObject reports a missing key as "404"; re-raise anything else
            if e.response['Error']['Code'] not in ("404", "NoSuchKey"):
                raise
            print(f"File '{file_name}' not found. Checking again in 3 seconds...")
            time.sleep(3)


# Download the translated file and offer it through a download button
def download(dest_selected_bucket, file_name, file_path):
    local_file = f'{file_path}/download/Translated-{file_name}.txt'
    os.makedirs(f'{file_path}/download', exist_ok=True)
    s3.download_file(dest_selected_bucket, f'Translated-{file_name}.txt', local_file)
    with open(local_file, "r", encoding="utf-8") as file:
        st.download_button(label="Download", data=file.read(), file_name=f"{file_name}.txt")


def streamlit_application():
    # Give a header
    st.header("Document Translator", divider=True)
    # Widget to upload a file (only the first uploaded file is processed)
    uploaded_files = st.file_uploader("Choose a PDF file", accept_multiple_files=True, type="pdf")
    file_name = uploaded_files[0].name.replace(' ', '_') if uploaded_files else None
    # Folder path
    file_path = '/tmp'
    # Select the buckets and the target language from drop-downs
    buckets = listbuckets()
    selected_bucket = st.selectbox("Choose the S3 Bucket to upload file :", buckets)
    dest_selected_bucket = st.selectbox("Choose the S3 Bucket to download file :", buckets)
    selected_language = st.selectbox("Choose the Language :", list_language())
    # Create a button
    click = st.button("Upload", type="primary")
    if click and file_name:
        with open(f'{file_path}/{file_name}', mode='wb') as w:
            w.write(uploaded_files[0].getvalue())
        # Pass the user's choices to the Lambda function as environment variables
        lam.update_function_configuration(
            FunctionName='TriggerFunctionFromS3',
            Environment={'Variables': {'UserInputLanguage': selected_language,
                                       'DestinationBucket': dest_selected_bucket,
                                       'TranslatedFileName': file_name}})
        # Wait for the update to finish so the upload event sees the new values
        lam.get_waiter('function_updated').wait(FunctionName='TriggerFunctionFromS3')
        # Upload the file to S3 bucket
        try:
            upload_to_s3bucket(f'{file_path}/{file_name}', selected_bucket, file_name)
        except S3UploadFailedError:
            st.error("File upload failed")
        else:
            st.success("File uploaded successfully", icon="✅")
            wait_for_s3obj(dest_selected_bucket, file_name)
            download(dest_selected_bucket, file_name, file_path)


streamlit_application()
```
about.py
```python
import streamlit as st

## Write the description of application
st.header("About")
about = '''
Welcome to the File Uploader Application! This application is designed to make uploading PDF documents simple and efficient. With just a few clicks, users can upload their documents securely to an Amazon S3 bucket for storage.

Here’s a quick overview of what this app does:

**Key Features:**
- **Easy Upload:** Users can quickly upload PDF documents by selecting the file and clicking the 'Upload' button.
- **Seamless Integration with AWS S3:** Once the document is uploaded, it is stored securely in a designated S3 bucket, ensuring reliable and scalable cloud storage.
- **User-Friendly Interface:** Built using Streamlit, the interface is clean, intuitive, and accessible to all users, making the uploading process straightforward.

**How it Works:**
1. **Select a PDF Document:** Users can browse and select any PDF document from their local system.
2. **Upload the Document:** Clicking the ‘Upload’ button triggers the process of securely uploading the selected document to an AWS S3 bucket.
3. **Success Notification:** After a successful upload, users will receive a confirmation message that their document has been stored in the cloud.

This application offers a streamlined way to store documents on the cloud, reducing the hassle of manual file management. Whether you're an individual or a business, this tool helps you organize and store your files with ease and security.

You can further customize this page by adding technical details, usage guidelines, or security measures as per your application's specifications.
'''
st.markdown(about)
```
navigation.py
```python
import streamlit as st

# Register the two pages and show them in the sidebar
pg = st.navigation([
    st.Page("app.py", title="DocuTranslator"),
    st.Page("about.py", title="About"),
], position="sidebar")
pg.run()
```
Dockerfile:
```dockerfile
FROM python:3.9-slim
WORKDIR /streamlit
COPY requirements.txt /streamlit/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
RUN mkdir /tmp/download
COPY . /streamlit
EXPOSE 8501
CMD ["streamlit", "run", "navigation.py", "--server.port=8501", "--server.headless=true"]
```
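The Dockerfile builds an image that packages all of the application files above, and the image is then pushed to an Amazon ECR repository. Docker Hub can also be used to store the image.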
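A minimal sketch of the build-and-push step, assuming Docker and a recent AWS CLI (with `aws ecr get-login-password`) are installed and the `anirban` repository already exists in ECR. The function only assembles the shell commands, so they can be reviewed before running; the account ID shown is a hypothetical placeholder.

```python
def ecr_push_commands(account_id: str, region: str = "us-east-1",
                      repo: str = "anirban", tag: str = "latest") -> list[str]:
    """Shell commands that build the image and push it to a private ECR repository."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    image = f"{registry}/{repo}:{tag}"
    return [
        # Authenticate Docker against the private ECR registry
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # Build from the Dockerfile in the current directory
        f"docker build -t {repo}:{tag} .",
        # Tag the local image with the full ECR repository URI and push it
        f"docker tag {repo}:{tag} {image}",
        f"docker push {image}",
    ]


if __name__ == "__main__":
    for cmd in ecr_push_commands("123456789012"):
        print(cmd)
```

The resulting image URI is the one referenced in the task definition's `image` field.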
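In this architecture, the application instances are created in private subnets, and an Application Load Balancer is placed in front of them to distribute incoming traffic across the private EC2 instances.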
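Because two EC2 hosts are available to run the container, the load balancer is configured to spread incoming traffic between them. Two separate target groups are created for the two EC2 instances, and each target group is given a weight of 50%.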
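The load balancer accepts incoming traffic on port 80 and forwards it to the backend EC2 instances on port 16347, where the host port mapping passes it to the ECS container on port 8501.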
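The listener and target-group setup described above can be expressed as arguments for the boto3 `elbv2` client. This is a sketch under stated assumptions: the target group names, VPC ID, instance IDs and ARNs are hypothetical, and only the ports (80 and 16347) and the 50/50 weighting come from the article.

```python
def target_group_kwargs(name: str, vpc_id: str) -> dict:
    """CreateTargetGroup arguments: HTTP targets on the ECS host port 16347."""
    return {
        "Name": name,
        "Protocol": "HTTP",
        "Port": 16347,
        "VpcId": vpc_id,
        "TargetType": "instance",
    }


def register_targets_kwargs(target_group_arn: str, instance_id: str) -> dict:
    """RegisterTargets arguments: one EC2 instance receiving traffic on port 16347."""
    return {"TargetGroupArn": target_group_arn,
            "Targets": [{"Id": instance_id, "Port": 16347}]}


def listener_kwargs(load_balancer_arn: str, target_group_arns: list[str]) -> dict:
    """CreateListener arguments: HTTP on port 80, traffic split evenly across target groups."""
    weight = 100 // len(target_group_arns)  # 50 each for two target groups
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Protocol": "HTTP",
        "Port": 80,
        "DefaultActions": [{
            "Type": "forward",
            "ForwardConfig": {
                "TargetGroups": [{"TargetGroupArn": arn, "Weight": weight}
                                 for arn in target_group_arns],
            },
        }],
    }
```

With real resources, these dictionaries would be passed as `elbv2.create_target_group(**...)`, `elbv2.register_targets(**...)` and `elbv2.create_listener(**...)` on `boto3.client("elbv2")`.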
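A Lambda function is configured with the source bucket as its trigger. It downloads the uploaded PDF file, extracts its content, translates the content from its current language into the target language chosen by the user, and writes the result as a text file to the destination S3 bucket.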
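The article does not include the Lambda code itself. The following is a hypothetical sketch of such a handler, not the author's implementation. It assumes pypdf is packaged with the function for text extraction, that the target language name arrives in the `UserInputLanguage` environment variable set by app.py, and that the output key follows the `Translated-<name>.txt` pattern app.py polls for. The author's function hard-codes the destination bucket; this sketch reads the `DestinationBucket` variable instead. The 10,000-byte chunk size is an assumed per-request TranslateText limit; check the current Amazon Translate quotas.

```python
import io
import os
import urllib.parse

# Assumed per-request limit for TranslateText; verify against current quotas.
MAX_BYTES = 10_000


def chunk_text(text: str, max_bytes: int = MAX_BYTES) -> list[str]:
    """Group whitespace-separated words into chunks of at most max_bytes UTF-8 bytes.

    Whitespace is collapsed to single spaces. A single word longer than
    max_bytes ends up in its own (oversized) chunk.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if not current or len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def language_code_for(name: str, languages: list[dict]) -> str:
    """Map a LanguageName (as shown in the app's drop-down) to its LanguageCode."""
    for lang in languages:
        if lang["LanguageName"] == name:
            return lang["LanguageCode"]
    raise ValueError(f"Unsupported language: {name}")


def translated_key(source_key: str) -> str:
    """Object key that app.py polls for in the destination bucket."""
    return f"Translated-{source_key}.txt"


def lambda_handler(event, context):
    # Imported here so the helpers above can be exercised without AWS libraries.
    import boto3
    from pypdf import PdfReader  # assumption: pypdf is bundled with the function

    s3 = boto3.client("s3")
    translate = boto3.client("translate")

    # The S3 event notification carries the bucket and the URL-encoded object key.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    # Download the PDF and extract its text page by page.
    pdf_bytes = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    reader = PdfReader(io.BytesIO(pdf_bytes))
    text = "\n".join(page.extract_text() or "" for page in reader.pages)

    # Resolve the target language chosen in the app.
    languages = translate.list_languages()["Languages"]
    target = language_code_for(os.environ["UserInputLanguage"], languages)

    # Translate chunk by chunk, letting Amazon Translate detect the source language.
    translated = " ".join(
        translate.translate_text(
            Text=chunk, SourceLanguageCode="auto", TargetLanguageCode=target
        )["TranslatedText"]
        for chunk in chunk_text(text)
    )

    s3.put_object(
        Bucket=os.environ["DestinationBucket"],
        Key=translated_key(key),
        Body=translated.encode("utf-8"),
    )
    return {"statusCode": 200, "key": translated_key(key)}
```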
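Open the Application Load Balancer URL "ALB-747339710.us-east-1.elb.amazonaws.com" to launch the web application. Browse for any PDF file and leave the source bucket "fileuploadbucket-hwirio984092jjs" and the destination bucket "translatedfileuploadbucket-kh939809kjkfjsekfl" unchanged, because the destination bucket is hard-coded to the one above in the Lambda code. Choose the language you want the document translated into and click Upload. The application then starts polling the destination S3 bucket to check whether the translated file has been uploaded. Once the exact file is found, a new Download button appears for downloading the file from the destination S3 bucket.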
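The polling loop in app.py waits indefinitely. A bounded variant is sketched below as a possible refinement, not as the app's current behavior: `poll_until` gives up after a timeout so the page can show an error instead of hanging, and `s3_object_exists` treats only a missing key as "not yet". The function names, the 3-second interval and the 300-second default timeout are illustrative choices.

```python
import time
from typing import Callable


def poll_until(check: Callable[[], bool], interval: float = 3.0, timeout: float = 300.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Call check() every `interval` seconds until it returns True or `timeout` passes.

    Returns True if check() succeeded, False on timeout. `sleep` and `clock`
    are injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def s3_object_exists(s3, bucket: str, key: str) -> bool:
    """True if the object exists; HeadObject reports a missing key as "404"."""
    from botocore.exceptions import ClientError  # lazy import: only needed with real S3

    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] in ("404", "NoSuchKey"):
            return False
        raise  # permission or throttling errors should surface, not be retried silently
```

In the app this could be used as `poll_until(lambda: s3_object_exists(s3, dest_selected_bucket, f"Translated-{file_name}.txt"))`, calling `st.error(...)` when it returns False.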
Application link: http://alb-747339710.us-east-1.elb.amazonaws.com/
In the sample run, the original document contained the app.py source code listed above, and it was translated into Canadian French.
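This article shows that document translation can be as simple as we imagine: the end user clicks a few options to choose the required information and gets the desired output within seconds, without having to think about any configuration. For now, the application offers a single feature, translating PDF documents, but we plan to explore this further so that a single application provides multiple features, including some interesting ones.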