Home > Backend Development > Python Tutorial > Document Translation Service using Streamlit & AWS Translator

Document Translation Service using Streamlit & AWS Translator

Barbara Streisand
Release: 2025-01-01 02:35:09
Original
270 people have browsed it

Introduction:

DocuTranslator, a document translation system, built in AWS and developed by Streamlit application framework. This application allows end user to translate the documents in their preferred language which they want to upload. It provides feasibility to translate in multiple languages as user wants, which really helps users to understand the content in their comfortable way.

Background:

The intent of this project is to provide a user friendly, simple application interface to fulfill the translation process as simple as users expect. In this system, nobody has to translate documents by entering into AWS Translate service, rather end user can directly access the application endpoint and get the requirements fulfilled.

High Level Architecture Diagram:

Document Translation Service using Streamlit & AWS Translator

How Does This Work:

  • End user is allowed to access an application through an application load balancer.
  • Once application interface is opened, user will upload the required files to be translated and language to translate to.
  • After submitting these details, file will be uploaded to mentioned source S3 bucket which triggers a lambda function to connect with AWS Translator service.
  • Once translated document is ready, will be uploaded to destination S3 bucket.
  • After that, end user can download the translated document from Streamlit application portal.

Technical Architecture:

Document Translation Service using Streamlit & AWS Translator

Above architecture shows below key points -

  • Application code has been containerized and stored to ECR repository.
  • As per above design, an ECS cluster has been setup which instantiates two tasks that pulls application image from ECR repository.
  • Both the tasks are launched on top of EC2 as a launch type. Both EC2s are launched in private subnet in us-east-1a and us-east-1b availability zones.
  • A EFS file system is created to share application codes between two underlying EC2 instances. Two mountpoints are created in two availability zones (us-east-1a and us-east-1b).
  • Two public subnets are configured in front of private subnets and a NAT gateway is set up in the public subnet in us-east-1a availability zone.
  • An application load balancer has been configured in front of private subnets which distributes the traffic across two public subnets at port 80 of application load balancer security group(ALB SG).
  • Two EC2 instances are configured in two different target group with same EC2 security group(Streamlit_SG) which accepts traffic at 16347 port from application load balancer.
  • There is port mapping configured between port 16347 in EC2 instances and port 8501 at ECS container. Once traffic will hit at port 16347 of EC2 security group, will be redirected to 8501 port at ECS container level.

How Data is Getting Stored ?

Here, we have used EFS share path to share same application files between two underlying EC2 instances. We have created a mountpoint /streamlit_appfiles inside the EC2 instances and mounted with EFS share. This approach will help in sharing same content across two different servers. After that, our intent is to create a replicate same application content to container working directory which is /streamlit. For that we have used bind mounts so that whatever changes will be made on application code at EC2 level, will be replicated to container as well. We need to restrict bi-directional replication which says if anyone mistakenly changes code from inside the container, it should not replicate to EC2 host level, hence inside the container working directory has been created as a read only filesystem.

Document Translation Service using Streamlit & AWS Translator

ECS Container Configuration and Volume:

Underlying EC2 Configuration:
Instance Type: t2.medium
Network type: Private Subnet

Container Configuration:
Image:
Network Mode: Default
Host Port: 16347
Container Port: 8501
Task CPU: 2 vCPU (2048 units)
Task Memory: 2.5 GB (2560 MiB)

Document Translation Service using Streamlit & AWS Translator

Volume Configuration:
Volume Name: streamlit-volume
Source Path: /streamlit_appfiles
Container Path: /streamlit
Read Only Filesystem: YES

Document Translation Service using Streamlit & AWS Translator

Task Definition Reference:

{
    "taskDefinitionArn": "arn:aws:ecs:us-east-1:<account-id>:task-definition/Streamlit_TDF-1:5",
    "containerDefinitions": [
        {
            "name": "streamlit",
            "image": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/anirban:latest",
            "cpu": 0,
            "portMappings": [
                {
                    "name": "streamlit-8501-tcp",
                    "containerPort": 8501,
                    "hostPort": 16347,
                    "protocol": "tcp",
                    "appProtocol": "http"
                }
            ],
            "essential": true,
            "environment": [],
            "environmentFiles": [],
            "mountPoints": [
                {
                    "sourceVolume": "streamlit-volume",
                    "containerPath": "/streamlit",
                    "readOnly": true
                }
            ],
            "volumesFrom": [],
            "ulimits": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/Streamlit_TDF-1",
                    "mode": "non-blocking",
                    "awslogs-create-group": "true",
                    "max-buffer-size": "25m",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                },
                "secretOptions": []
            },
            "systemControls": []
        }
    ],
    "family": "Streamlit_TDF-1",
    "taskRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
    "executionRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
    "revision": 5,
    "volumes": [
        {
            "name": "streamlit-volume",
            "host": {
                "sourcePath": "/streamlit_appfiles"
            }
        }
    ],
    "status": "ACTIVE",
    "requiresAttributes": [
        {
            "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
        },
        {
            "name": "ecs.capability.execution-role-awslogs"
        },
        {
            "name": "com.amazonaws.ecs.capability.ecr-auth"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.28"
        },
        {
            "name": "com.amazonaws.ecs.capability.task-iam-role"
        },
        {
            "name": "ecs.capability.execution-role-ecr-pull"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29"
        }
    ],
    "placementConstraints": [],
    "compatibilities": [
        "EC2"
    ],
    "requiresCompatibilities": [
        "EC2"
    ],
    "cpu": "2048",
    "memory": "2560",
    "runtimePlatform": {
        "cpuArchitecture": "X86_64",
        "operatingSystemFamily": "LINUX"
    },
    "registeredAt": "2024-11-09T05:59:47.534Z",
    "registeredBy": "arn:aws:iam::<account-id>:root",
    "tags": []
}
Copy after login
Copy after login

Document Translation Service using Streamlit & AWS Translator

Developing Application Code and Creating Docker Image:

app.py

import streamlit as st
import boto3
import os
import time
from pathlib import Path

s3 = boto3.client('s3', region_name='us-east-1')
tran = boto3.client('translate', region_name='us-east-1')
lam = boto3.client('lambda', region_name='us-east-1')


# Function to list S3 buckets
def listbuckets():
    list_bucket = s3.list_buckets()
    bucket_name = tuple([it["Name"] for it in list_bucket["Buckets"]])
    return bucket_name

# Upload object to S3 bucket
def upload_to_s3bucket(file_path, selected_bucket, file_name):
    s3.upload_file(file_path, selected_bucket, file_name)

def list_language():
    response = tran.list_languages()
    list_of_langs = [i["LanguageName"] for i in response["Languages"]]
    return list_of_langs

def wait_for_s3obj(dest_selected_bucket, file_name):
    while True:
        try:
            get_obj = s3.get_object(Bucket=dest_selected_bucket, Key=f'Translated-{file_name}.txt')
            obj_exist = 'true' if get_obj['Body'] else 'false'
            return obj_exist
        except s3.exceptions.ClientError as e:
            if e.response['Error']['Code'] == "404":
                print(f"File '{file_name}' not found. Checking again in 3 seconds...")
                time.sleep(3)

def download(dest_selected_bucket, file_name, file_path):
     s3.download_file(dest_selected_bucket,f'Translated-{file_name}.txt', f'{file_path}/download/Translated-{file_name}.txt')
     with open(f"{file_path}/download/Translated-{file_name}.txt", "r") as file:
       st.download_button(
             label="Download",
             data=file,
             file_name=f"{file_name}.txt"
       )

def streamlit_application():
    # Give a header
    st.header("Document Translator", divider=True)
    # Widgets to upload a file
    uploaded_files = st.file_uploader("Choose a PDF file", accept_multiple_files=True, type="pdf")
    # # upload a file
    file_name = uploaded_files[0].name.replace(' ', '_') if uploaded_files else None
    # Folder path
    file_path = '/tmp'
    # Select the bucket from drop down
    selected_bucket = st.selectbox("Choose the S3 Bucket to upload file :", listbuckets())
    dest_selected_bucket = st.selectbox("Choose the S3 Bucket to download file :", listbuckets())
    selected_language = st.selectbox("Choose the Language :", list_language())
    # Create a button
    click = st.button("Upload", type="primary")
    if click == True:
        if file_name:
            with open(f'{file_path}/{file_name}', mode='wb') as w:
                w.write(uploaded_files[0].getvalue())
        # Set the selected language to the environment variable of lambda function
        lambda_env1 = lam.update_function_configuration(FunctionName='TriggerFunctionFromS3', Environment={'Variables': {'UserInputLanguage': selected_language, 'DestinationBucket': dest_selected_bucket, 'TranslatedFileName': file_name}})
        # Upload the file to S3 bucket:
        upload_to_s3bucket(f'{file_path}/{file_name}', selected_bucket, file_name)
        if s3.get_object(Bucket=selected_bucket, Key=file_name):
            st.success("File uploaded successfully", icon="✅")
            output = wait_for_s3obj(dest_selected_bucket, file_name)
            if output:
              download(dest_selected_bucket, file_name, file_path)
        else:
            st.error("File upload failed", icon="?")


streamlit_application()
Copy after login
Copy after login

about.py

import streamlit as st

## Write the description of application
st.header("About")
about = '''
Welcome to the File Uploader Application!

This application is designed to make uploading PDF documents simple and efficient. With just a few clicks, users can upload their documents securely to an Amazon S3 bucket for storage. Here’s a quick overview
of what this app does:

**Key Features:**
- **Easy Upload:** Users can quickly upload PDF documents by selecting the file and clicking the 'Upload' button.
- **Seamless Integration with AWS S3:** Once the document is uploaded, it is stored securely in a designated S3 bucket, ensuring reliable and scalable cloud storage.
- **User-Friendly Interface:** Built using Streamlit, the interface is clean, intuitive, and accessible to all users, making the uploading process straightforward.

**How it Works:**
1. **Select a PDF Document:** Users can browse and select any PDF document from their local system.
2. **Upload the Document:** Clicking the ‘Upload’ button triggers the process of securely uploading the selected document to an AWS S3 bucket.
3. **Success Notification:** After a successful upload, users will receive a confirmation message that their document has been stored in the cloud.
This application offers a streamlined way to store documents on the cloud, reducing the hassle of manual file management. Whether you're an individual or a business, this tool helps you organize and store your
 files with ease and security.
You can further customize this page by adding technical details, usage guidelines, or security measures as per your application's specifications.'''

st.markdown(about)
Copy after login
Copy after login

navigation.py

import streamlit as st

pg = st.navigation([
    st.Page("app.py", title="DocuTranslator", icon="?"),
    st.Page("about.py", title="About", icon="?")
], position="sidebar")

pg.run()
Copy after login

Dockerfile:

FROM python:3.9-slim
WORKDIR /streamlit
COPY requirements.txt /streamlit/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
RUN mkdir /tmp/download
COPY . /streamlit
EXPOSE 8501
CMD ["streamlit", "run", "navigation.py", "--server.port=8501", "--server.headless=true"]
Copy after login

Docker file will create an image by packaging all above application configuration files and then it was pushed to ECR repository. Docker Hub can also be used to store the image.

Load Balancing

In the architecture, application instances are supposed to be created in private subnet and load balancer is supposed to create to reduce incoming traffic load to private EC2 instances.
As there are two underlying EC2 hosts available to host containers, so load balancing is configured across two EC2 hosts to distribute incoming traffic. Two different target groups are created to place two EC2 instances in each with 50% weightage.

Load balancer accepts incoming traffic at port 80 and then passes to backend EC2 instances at port 16347 and that also passed to corresponding ECS container.

Document Translation Service using Streamlit & AWS Translator

Document Translation Service using Streamlit & AWS Translator

Lambda Function:

There is a lambda function configured to take source bucket as an input to download pdf file from there and extract the contents, then it translates the contents from current language to user provided target language and creates a text file to upload to destination S3 bucket.

{
    "taskDefinitionArn": "arn:aws:ecs:us-east-1:<account-id>:task-definition/Streamlit_TDF-1:5",
    "containerDefinitions": [
        {
            "name": "streamlit",
            "image": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/anirban:latest",
            "cpu": 0,
            "portMappings": [
                {
                    "name": "streamlit-8501-tcp",
                    "containerPort": 8501,
                    "hostPort": 16347,
                    "protocol": "tcp",
                    "appProtocol": "http"
                }
            ],
            "essential": true,
            "environment": [],
            "environmentFiles": [],
            "mountPoints": [
                {
                    "sourceVolume": "streamlit-volume",
                    "containerPath": "/streamlit",
                    "readOnly": true
                }
            ],
            "volumesFrom": [],
            "ulimits": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/Streamlit_TDF-1",
                    "mode": "non-blocking",
                    "awslogs-create-group": "true",
                    "max-buffer-size": "25m",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                },
                "secretOptions": []
            },
            "systemControls": []
        }
    ],
    "family": "Streamlit_TDF-1",
    "taskRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
    "executionRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
    "revision": 5,
    "volumes": [
        {
            "name": "streamlit-volume",
            "host": {
                "sourcePath": "/streamlit_appfiles"
            }
        }
    ],
    "status": "ACTIVE",
    "requiresAttributes": [
        {
            "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
        },
        {
            "name": "ecs.capability.execution-role-awslogs"
        },
        {
            "name": "com.amazonaws.ecs.capability.ecr-auth"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.28"
        },
        {
            "name": "com.amazonaws.ecs.capability.task-iam-role"
        },
        {
            "name": "ecs.capability.execution-role-ecr-pull"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29"
        }
    ],
    "placementConstraints": [],
    "compatibilities": [
        "EC2"
    ],
    "requiresCompatibilities": [
        "EC2"
    ],
    "cpu": "2048",
    "memory": "2560",
    "runtimePlatform": {
        "cpuArchitecture": "X86_64",
        "operatingSystemFamily": "LINUX"
    },
    "registeredAt": "2024-11-09T05:59:47.534Z",
    "registeredBy": "arn:aws:iam::<account-id>:root",
    "tags": []
}
Copy after login
Copy after login

Application Testing:

Open the application load balancer url "ALB-747339710.us-east-1.elb.amazonaws.com" to open the web application. Browse any pdf file, keep both source "fileuploadbucket-hwirio984092jjs" and destination bucket "translatedfileuploadbucket-kh939809kjkfjsekfl" as it is, because in the lambda code, it has been hard coded the target bucket is as mentioned above. Choose the language you want the document to be translated and click on upload. Once it's clicked, application program will start polling the destination S3 bucket to find out if the translated file is uploaded. If it find the exact file, then a new option "Download" will be visible to down load the file from destination S3 bucket.

Application Link: http://alb-747339710.us-east-1.elb.amazonaws.com/

Document Translation Service using Streamlit & AWS Translator

Actual Content:

import streamlit as st
import boto3
import os
import time
from pathlib import Path

s3 = boto3.client('s3', region_name='us-east-1')
tran = boto3.client('translate', region_name='us-east-1')
lam = boto3.client('lambda', region_name='us-east-1')


# Function to list S3 buckets
def listbuckets():
    list_bucket = s3.list_buckets()
    bucket_name = tuple([it["Name"] for it in list_bucket["Buckets"]])
    return bucket_name

# Upload object to S3 bucket
def upload_to_s3bucket(file_path, selected_bucket, file_name):
    s3.upload_file(file_path, selected_bucket, file_name)

def list_language():
    response = tran.list_languages()
    list_of_langs = [i["LanguageName"] for i in response["Languages"]]
    return list_of_langs

def wait_for_s3obj(dest_selected_bucket, file_name):
    while True:
        try:
            get_obj = s3.get_object(Bucket=dest_selected_bucket, Key=f'Translated-{file_name}.txt')
            obj_exist = 'true' if get_obj['Body'] else 'false'
            return obj_exist
        except s3.exceptions.ClientError as e:
            if e.response['Error']['Code'] == "404":
                print(f"File '{file_name}' not found. Checking again in 3 seconds...")
                time.sleep(3)

def download(dest_selected_bucket, file_name, file_path):
     s3.download_file(dest_selected_bucket,f'Translated-{file_name}.txt', f'{file_path}/download/Translated-{file_name}.txt')
     with open(f"{file_path}/download/Translated-{file_name}.txt", "r") as file:
       st.download_button(
             label="Download",
             data=file,
             file_name=f"{file_name}.txt"
       )

def streamlit_application():
    # Give a header
    st.header("Document Translator", divider=True)
    # Widgets to upload a file
    uploaded_files = st.file_uploader("Choose a PDF file", accept_multiple_files=True, type="pdf")
    # # upload a file
    file_name = uploaded_files[0].name.replace(' ', '_') if uploaded_files else None
    # Folder path
    file_path = '/tmp'
    # Select the bucket from drop down
    selected_bucket = st.selectbox("Choose the S3 Bucket to upload file :", listbuckets())
    dest_selected_bucket = st.selectbox("Choose the S3 Bucket to download file :", listbuckets())
    selected_language = st.selectbox("Choose the Language :", list_language())
    # Create a button
    click = st.button("Upload", type="primary")
    if click == True:
        if file_name:
            with open(f'{file_path}/{file_name}', mode='wb') as w:
                w.write(uploaded_files[0].getvalue())
        # Set the selected language to the environment variable of lambda function
        lambda_env1 = lam.update_function_configuration(FunctionName='TriggerFunctionFromS3', Environment={'Variables': {'UserInputLanguage': selected_language, 'DestinationBucket': dest_selected_bucket, 'TranslatedFileName': file_name}})
        # Upload the file to S3 bucket:
        upload_to_s3bucket(f'{file_path}/{file_name}', selected_bucket, file_name)
        if s3.get_object(Bucket=selected_bucket, Key=file_name):
            st.success("File uploaded successfully", icon="✅")
            output = wait_for_s3obj(dest_selected_bucket, file_name)
            if output:
              download(dest_selected_bucket, file_name, file_path)
        else:
            st.error("File upload failed", icon="?")


streamlit_application()
Copy after login
Copy after login

Translated Content (in Canadian French)

import streamlit as st

## Write the description of application
st.header("About")
about = '''
Welcome to the File Uploader Application!

This application is designed to make uploading PDF documents simple and efficient. With just a few clicks, users can upload their documents securely to an Amazon S3 bucket for storage. Here’s a quick overview
of what this app does:

**Key Features:**
- **Easy Upload:** Users can quickly upload PDF documents by selecting the file and clicking the 'Upload' button.
- **Seamless Integration with AWS S3:** Once the document is uploaded, it is stored securely in a designated S3 bucket, ensuring reliable and scalable cloud storage.
- **User-Friendly Interface:** Built using Streamlit, the interface is clean, intuitive, and accessible to all users, making the uploading process straightforward.

**How it Works:**
1. **Select a PDF Document:** Users can browse and select any PDF document from their local system.
2. **Upload the Document:** Clicking the ‘Upload’ button triggers the process of securely uploading the selected document to an AWS S3 bucket.
3. **Success Notification:** After a successful upload, users will receive a confirmation message that their document has been stored in the cloud.
This application offers a streamlined way to store documents on the cloud, reducing the hassle of manual file management. Whether you're an individual or a business, this tool helps you organize and store your
 files with ease and security.
You can further customize this page by adding technical details, usage guidelines, or security measures as per your application's specifications.'''

st.markdown(about)
Copy after login
Copy after login

Conclusion:

This article has shown us how document translation process can be as easy as we imagine where an end user has to click on some options to choose required information and get the desired output within few seconds without thinking about configuration. For now, we have included single feature to translate a pdf document, but later on, we will research more on this to have multi functionality in a single application with having some interesting features.

The above is the detailed content of Document Translation Service using Streamlit & AWS Translator. For more information, please follow other related articles on the PHP Chinese website!

source:dev.to
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Latest Articles by Author
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template