Microsoft recently open-sourced MarkItDown, a program that converts Office files to Markdown format. The project quickly climbed to GitHub's trending list upon release.
However, since MarkItDown is a Python program, it might be challenging for non-technical users to use. To address this issue, I thought of using WebAssembly technology to run Python code directly in the browser.
Pyodide is an open-source program that runs Python in the browser, using WebAssembly to port CPython, so it supports all Python syntax. Cloudflare's Python Workers also use Pyodide.
Pyodide is a port of CPython to WebAssembly/Emscripten.
Pyodide makes it possible to install and run Python packages in the browser using micropip. Any pure Python package with wheels available on PyPI is supported.
Many packages with C extensions have also been ported for use with Pyodide. These include common packages like regex, PyYAML, lxml, and scientific Python packages including NumPy, pandas, SciPy, Matplotlib, and scikit-learn. Pyodide comes with a robust JavaScript ⟺ Python foreign function interface that allows you to freely mix these languages in your code with minimal friction. This includes comprehensive support for error handling, async/await, and more.
When used in the browser, Python has full access to the Web APIs.
Trying to run MarkItDown was surprisingly smooth, proving that WebAssembly is truly the future of browsers.
The main challenges faced and solutions:
File Transfer Issue: How to pass user-selected files to the Python runtime in the Worker?
Dependency Installation Issue: Limited access to PyPI in mainland China.
Eventually, we successfully implemented a MarkItDown tool that runs entirely in the browser. Feel free to try it out at Office File to Markdown.
Here's the core code for running Python in the Worker:
// eslint-disable-next-line no-undef importScripts('https://testingcf.jsdelivr.net/pyodide/v0.26.4/full/pyodide.js') async function loadPyodideAndPackages() { // eslint-disable-next-line no-undef const pyodide = await loadPyodide() globalThis.pyodide = pyodide await pyodide.loadPackage('micropip') const micropip = pyodide.pyimport('micropip') // micropip.set_index_urls([ // 'https://pypi.your.domains/pypi/simple', // ]) await micropip.install('markitdown==0.0.1a2') } const pyodideReadyPromise = loadPyodideAndPackages() globalThis.onmessage = async (event) => { await pyodideReadyPromise const file = event.data try { console.log('file', file) const startTime = Date.now() globalThis.pyodide.FS.writeFile(`/${file.filename}`, file.buffer) await globalThis.pyodide.runPythonAsync(` from markitdown import MarkItDown markitdown = MarkItDown() result = markitdown.convert("/${file.filename}") print(result.text_content) with open("/${file.filename}.md", "w") as file: file.write(result.text_content) `) globalThis.postMessage({ filename: `${file.filename}.md`, content: globalThis.pyodide.FS.readFile(`/${file.filename}.md`, { encoding: 'utf8' }), time: Date.now() - startTime, }) } catch (error) { globalThis.postMessage({ error: error.message || 'convert error', filename: file.filename }) } }
The above is the detailed content of Run Python programs easily in the browser.. For more information, please follow other related articles on the PHP Chinese website!