How to build a new Harlequin adapter with Poetry
Welcome to the first post in LETSQL's tutorial series!
In this blog post, we take a detour from our usual theme of data pipelines to demonstrate how to create and publish a Python package with Poetry, using DataFusion as an example.
Introduction
Harlequin is a TUI client for SQL databases known for its light-weight extensive support for SQL databases. It is a versatile tool for data exploration and analysis workflows. Harlequin provides an interactive SQL editor with features like autocomplete, syntax highlighting, and query history. It also has a results viewer that can display large result sets. However, Harlequin did not have a DataFusion adapter before. Thankfully, it was really easy to add one.
In this post, We'll demonstrate these concepts by building a Harlequin adapter for DataFusion. And, by way of doing so, we will also cover Poetry's essential features, project setup, and the steps to publish your package on PyPI.
To get the most out of this guide, you should have a basic understanding of virtual environments, Python packages and modules, and pip.
Our objectives are to:
- Introduce Poetry and its advantages
- Set up a project using Poetry
- Develop a Harlequin adapter for DataFusion
- Prepare and publish the package to PyPI
By the end, you'll have practical experience with Poetry and an understanding of modern Python package management.
The code implemented in this post is available on GitHub and available in PyPI.
Harlequin
Harlequin is a SQL IDE that runs in the terminal. It provides a powerful and feature-rich alternative to traditional command-line database tools, making it versatile for data exploration and analysis workflows.
Some key things to know about Harlequin:
- Harlequin supports multiple database adapters, connecting you to DuckDB, SQLite, PostgreSQL, MySQL, and more.
- Harlequin provides an interactive SQL editor with features like autocomplete, syntax highlighting, and query history. It also has a results viewer that can display large result sets.
- Harlequin replaces traditional terminal-based database tools with a more powerful and user-friendly interface.
- Harlequin uses adapter plug-ins as a generic interface to any database.
DataFusion
DataFusion is a fast, extensible query engine for building high-quality data-centric systems in Rust, using the Apache Arrow in-memory format.
DataFusion offers SQL and Dataframe APIs, excellent performance, built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community.
It ships with it its own CLI, more information can be found here.
Poetry
Poetry is a modern, feature-rich tool that streamlines dependency management and packaging for Python projects, making development more deterministic and efficient.
From the documentation:
Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on, and it will manage (install/update) them for you.
Poetry offers a lockfile to ensure repeatable installs and can build your project for distribution.
Creating New Adapters for Harlequin
A Harlequin adapter is a Python package that allows Harlequin to work with a database system.
An adapter is a Python package that declares an entry point in the harlequin.adapters group. That entry point should reference a subclass of the HarlequinAdapter abstract base class.
This allows Harlequin to discover installed adapters and instantiate a selected adapter at run-time
In addition to the HarlequinAdapter class, the package must also provide implementations for HarlequinConnection, and HarlequinCursor. A more detailed description can be found on this
guide.
The Harlequin Adapter Template
The first step for developing a Harlequin adapter is to generate a new repo from the existing harlequin-adapter-template
GitHub templates are repositories that serve as starting points for new projects. They provide pre-configured files, structures, and settings that are copied to new repositories, allowing for quick project setup without the overhead of forking.
This feature streamlines the process of creating consistent, well-structured projects based on established patterns.
The harlequin-adapter-template comes with a poetry.lock file and a pyproject.toml file, in addition to some boilerplate code for defining the required classes.
Coding the Adapter
Let's explore the essential files needed for package distribution before we get into the specifics of coding.
Package configuration
The pyproject.toml file is now the standard for configuring Python packages for publication and other tools. Introduced in PEP 518 and PEP 621, this TOML-formatted file consolidates multiple configuration files into one. It enhances dependency management by making it more robust and standardized.
Poetry, utilizes pyproject.toml to handle the project's virtual environment, resolve dependencies, and create packages.
The pyproject.toml of the template is as follows:
[tool.poetry] name = "harlequin-myadapter" version = "0.1.0" description = "A Harlequin adapter for <my favorite database>." authors = ["Ted Conbeer <tconbeer@users.noreply.github.com>"] license = "MIT" readme = "README.md" packages = [ { include = "harlequin_myadapter", from = "src" }, ] [tool.poetry.plugins."harlequin.adapter"] my-adapter = "harlequin_myadapter:MyAdapter" [tool.poetry.dependencies] python = ">=3.8.1,<4.0" harlequin = "^1.7" [tool.poetry.group.dev.dependencies] ruff = "^0.1.6" pytest = "^7.4.3" mypy = "^1.7.0" pre-commit = "^3.5.0" importlib_metadata = { version = ">=4.6.0", python = "<3.10.0" } [build-system] requires = ["poetry-core"] build-backend = "poetry.core.masonry.api"
As it can be seen:
The [tool.poetry] section of the pyproject.toml file is where you define the metadata for your Python package, such as the name, version, description, authors, etc.
The [tool.poetry.dependencies] subsection is where you declare the runtime dependencies your project requires. Running poetry add
will automatically update this section. The [tool.poetry.dev-dependencies] subsection is where you declare development-only dependencies, like testing frameworks, linters, etc.
The [build-system] section is used to store build-related data. In this case, it specifies the build-backend as "poetry.core.masonry.api". In a narrow sense, the core responsibility of a
build-backend is to build wheels and sdist.The repository also includes a poetry.lock file, a Poetry-specific component generated by running poetry install or poetry update. This lock file specifies the exact versions of all dependencies and sub-dependencies for your project, ensuring reproducible installations across different environments.
It's crucial to avoid manual edits to the poetry.lock file, as this can cause inconsistencies and installation issues. Instead, make changes to your pyproject.toml file and allow Poetry to automatically update the lock file by running poetry lock.
Getting Poetry
Per Poetry's installation warning
::: {.warning}
Poetry should always be installed in a dedicated virtual environment to isolate it from the rest of your system. It should in no case be installed in the environment of the project that is to be managed by Poetry.
:::Here we will presume you have access to Poetry by running pipx install poetry
Developing in the virtual environment
With our file structure clarified, let's begin the development process by setting up our environment. Since our project already includes pyproject.toml and poetry.lock files, we can initiate our environment using the poetry shell command.
This command activates the virtual environment linked to the current Poetry project, ensuring all subsequent operations occur within the project's dependency context. If no virtual environment exists, poetry shell automatically creates and activates one.
poetry shell detects your current shell and launches a new instance within the virtual environment. As Poetry centralizes virtual environments by default, this command eliminates the need to locate or recall the specific path to the activate script.
To verify which Python environment is currently in use with Poetry, you can use the following commands:
poetry env list --full-path
Copy after loginThis will show all the virtual environments associated with your project and indicate which one is currently active.
As an alternative, you can get the full path of only the current environment:
poetry env info -p
Copy after loginWith the environment activated, use poetry install to install the required dependencies. The command works as follows
- If a poetry.lock file is present, poetry install will use the exact versions specified in that file rather than resolving the dependencies dynamically. This ensures consistent, repeatable installations across different environments. i. If you run poetry install and it doesn't seem to be progressing, you may need to run export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring in the shell you're installing in
- Otherwise, it reads the pyproject.toml file in the current project, resolves the dependencies listed there, and installs them.
- If no poetry.lock file exists, poetry install will create one after resolving the dependencies, otherwise it will update the existing one.
To complete the environment setup, we need to add the datafusion library to our dependencies. Execute the following command:
poetry add datafusion
Copy after loginThis command updates your pyproject.toml file with the datafusion package and installs it. If you don't specify a version, Poetry will automatically select an appropriate one based on available package versions.
Implementing the Interfaces
To create a Harlequin Adapter, you need to implement three interfaces defined as abstract classes in the harlequin.adapter module.
The first one is the HarlequinAdapter.
#| eval: false #| code-fold: false #| code-summary: implementation of HarlequinAdapter class DataFusionAdapter(HarlequinAdapter): def __init__(self, conn_str: Sequence[str], **options: Any) -> None: self.conn_str = conn_str self.options = options def connect(self) -> DataFusionConnection: conn = DataFusionConnection(self.conn_str, self.options) return conn
Copy after loginThe second one is the HarlequinConnection, particularly the methods execute and get_catalog.
#| eval: false #| code-fold: false #| code-summary: implementation of execution of HarlequinConnection def execute(self, query: str) -> HarlequinCursor | None: try: cur = self.conn.sql(query) # type: ignore if str(cur.logical_plan()) == "EmptyRelation": return None except Exception as e: raise HarlequinQueryError( msg=str(e), title="Harlequin encountered an error while executing your query.", ) from e else: if cur is not None: return DataFusionCursor(cur) else: return None
Copy after loginFor brevity, we've omitted the implementation of the get_catalog function. You can find the full code in the adapter.py file within our GitHub repository.
Finally, a HarlequinCursor implementation must be provided as well:
#| eval: false #| code-fold: false #| code-summary: implementation of HarlequinCursor class DataFusionCursor(HarlequinCursor): def __init__(self, *args: Any, **kwargs: Any) -> None: self.cur = args[0] self._limit: int | None = None def columns(self) -> list[tuple[str, str]]: return [ (field.name, _mapping.get(field.type, "?")) for field in self.cur.schema() ] def set_limit(self, limit: int) -> DataFusionCursor: self._limit = limit return self def fetchall(self) -> AutoBackendType: try: if self._limit is None: return self.cur.to_arrow_table() else: return self.cur.limit(self._limit).to_arrow_table() except Exception as e: raise HarlequinQueryError( msg=str(e), title="Harlequin encountered an error while executing your query.", ) from e
Copy after loginMaking the plugin discoverable
Your adapter must register an entry point in the harlequin.adapters group using the packaging software you use to build your project.
If you use Poetry, you can define the entry point in your pyproject.toml file:
[tool.poetry.plugins."harlequin.adapter"] datafusion = "harlequin_datafusion:DataFusionAdapter"
Copy after loginAn entry point is a mechanism for code to advertise components it provides to be discovered and used by other code.
Notice that registering a plugin with Poetry is equivalent to the following pyproject.toml specification for entry points:
[project.entry-points."harlequin.adapter"] datafusion = "harlequin_datafusion:DataFusionAdapter"
Copy after loginTesting
The template provides a set of pre-configured tests, some of which are applicable to DataFusion while others may not be relevant. One test that's pretty cool checks if the plugin can be discovered, which is crucial for ensuring proper integration:
#| eval: false #| code-fold: false if sys.version_info < (3, 10): from importlib_metadata import entry_points else: from importlib.metadata import entry_points def test_plugin_discovery() -> None: PLUGIN_NAME = "datafusion" eps = entry_points(group="harlequin.adapter") assert eps[PLUGIN_NAME] adapter_cls = eps[PLUGIN_NAME].load() assert issubclass(adapter_cls, HarlequinAdapter) assert adapter_cls == DataFusionAdapter
Copy after loginTo make sure the tests are passing, run:
poetry run pytest
Copy after loginThe run command executes the given command inside the project’s virtualenv.
Building and Publishing to PyPI
With the tests passing, we're nearly ready to publish our project. Let's enhance our pyproject.toml file to make our package more discoverable and appealing on PyPI. We'll add key metadata including:
- A link to the GitHub repository
- A path to the README file
- A list of relevant classifiers
These additions will help potential users find and understand our package more easily.
classifiers = [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Topic :: Software Development :: User Interfaces", "Topic :: Database :: Database Engines/Servers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: Implementation :: CPython" ] readme = "README.md" repository = "https://github.com/mesejo/datafusion-adapter"
Copy after loginFor reference:
- The complete list of classifiers is available on PyPI's website.
- For a detailed guide on writing pyproject.toml, check out this resource.
- The formal, technical specification for pyproject.toml can be found on packaging.python.org.
Building
We're now ready to build our library and verify its functionality by installing it in a clean virtual environment. Let's start with the build process:
poetry build
Copy after loginThis command will create distribution packages (both source and wheel) in the dist directory.
The wheel file should have a name like harlequin_datafusion-0.1.1-py3-none-any.whl. This follows the standard naming convention:
- harlequin_datafusion is the package (or distribution) name
- 0.1.1 is the version number
- py3 indicates it's compatible with Python 3
- none compatible with any CPU architecture
- any with no ABI (pure Python)
To test the installation, create a new directory called test_install. Then, set up a fresh virtual environment with the following command:
python -m venv .venv
Copy after loginTo activate the virtual environment on MacOS or Linux:
source .venv/bin/activate
Copy after loginAfter running this command, you should see the name of your virtual environment (.venv) prepended to your command prompt, indicating that the virtual environment is now active.
To install the wheel file we just built, use pip as follows:
pip install /path/to/harlequin_datafusion-0.1.1-py3-none-any.whl
Copy after loginReplace /path/to/harlequin_datafusion-0.1.1-py3-none-any.whl with the actual path to the wheel file you want to install.
If everything works fined, you should see some dependencies installed, and you should be able to do:
harlequin -a datafusion
Copy after loginCongrats! You have built a Python library. Now it is time to share it with the world.
Publishing to PyPI
The best practice before publishing to PyPI is to actually publish to the Test Python Package Index (TestPyPI)
To publish a package to TestPyPI using Poetry, follow these steps:
Create an account at TestPyPI if you haven't already.
Generate an API token on your TestPyPI account page.
-
Register the TestPyPI repository with Poetry by running:
poetry config repositories.test-pypi https://test.pypi.org/legacy/
Copy after login -
To publish your package, run:
poetry publish -r testpypi --username __token__ --password <token>
Copy after login
Replace
with the actual token value you generated in step 2. To verify the publishing process, use the following command:
python -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple <package-name>
Copy after loginThis command uses two key arguments:
- --index-url: Directs pip to find your package on TestPyPI.
- --extra-index-url: Allows pip to fetch any dependencies from the main PyPI repository.
Replace
with your specific package name (e.g., harlequin-datafusion if following this post). For additional details, consult the information provided in this post. To publish to the actual Python Package Index (PyPI) instead:
Create an account at https://pypi.org/ if you haven't already.
Generate an API token on your PyPI account page.
-
Run:
poetry publish --username __token__ --password <token>
Copy after login
The default repository is PyPI, so there's no need to specify it.
Is worth noting that Poetry only supports the Legacy Upload API when publishing your project.
Automated Publishing on GitHub release
Manually publishing each time is repetitive and error-prone, so to fix this problem, let us create a GitHub Action to
publish each time we create a release.Here are the key steps to publish a Python package to PyPI using GitHub Actions and Poetry:
Set up PyPI authentication: You must provide your PyPI credentials (the API token) as GitHub secrets so the GitHub Actions workflow can access them. Name these secrets something like PYPI_TOKEN.
Create a GitHub Actions workflow file: In your project's .github/workflows directory, create a new file like publish.yml with the following content:
name: Build and publish python package on: release: types: [ published ] jobs: publish-package: runs-on: ubuntu-latest permissions: contents: write steps: - uses: actions/checkout@v3 - uses: actions/setup-python@v4 with: python-version: '3.10' - name: Install Poetry uses: snok/install-poetry@v1 - run: poetry config pypi-token.pypi "${{ secrets.PYPI_TOKEN }}" - name: Publish package run: poetry publish --build --username __token__
Copy after loginThe key is to leverage GitHub Actions to automate the publishing process and use Poetry to manage your package's dependencies and metadata.
Conclusion
Poetry is a user-friendly Python package management tool that simplifies project setup and publication. Its intuitive command-line interface streamlines environment management and dependency installation. It supports plugin development, integrates with other tools, and emphasizes testing for robust code. With straightforward commands for building and publishing packages, Poetry makes it easier for developers to share their work with the Python community.
At LETSQL, we're committed to contributing to the developer community. We hope this blog post serves as a straightforward guide to developing and publishing Python packages, emphasizing best practices and providing valuable resources.
To subscribe to our newsletter, visit letsql.com.Future Work
As we continue to refine the adapter, we would like to provide better autocompletion and direct reading from files (parquet, csv) as in the DataFusion-cli. This requires a tighter integration with the Rust library without going through the Python bindings.
Your thoughts and feedback are invaluable as we navigate this journey. Share your experiences, questions, or suggestions in the comments below or on our community forum. Let's redefine the boundaries of data science and machine learning integration.
Acknowledgements
Thanks to Dan Lovell and Hussain Sultan for the comments and the thorough review.
The above is the detailed content of How to build a new Harlequin adapter with Poetry. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

Solution to permission issues when viewing Python version in Linux terminal When you try to view Python version in Linux terminal, enter python...

How to avoid being detected when using FiddlerEverywhere for man-in-the-middle readings When you use FiddlerEverywhere...

How to teach computer novice programming basics within 10 hours? If you only have 10 hours to teach computer novice some programming knowledge, what would you choose to teach...

When using Python's pandas library, how to copy whole columns between two DataFrames with different structures is a common problem. Suppose we have two Dats...

How does Uvicorn continuously listen for HTTP requests? Uvicorn is a lightweight web server based on ASGI. One of its core functions is to listen for HTTP requests and proceed...

Fastapi ...

Using python in Linux terminal...

Understanding the anti-crawling strategy of Investing.com Many people often try to crawl news data from Investing.com (https://cn.investing.com/news/latest-news)...
