infiniDB在linux下完成倒库
infiniDB列式数据库,查询速度快。但维护速度慢,增删改 特别慢,不适合大数据量操作。 在mysql,大数据量查询慢,但维度快(增删改快)。 方案,mysql下,入库每天的数据,按天处理和维护数据。导入infiniDB 一,连接数据库:Navicat连接。建立mysql和infin
infiniDB列式数据库,查询速度快。但维护速度慢,增删改 特别慢,不适合大数据量操作。
在mysql,大数据量查询慢,但维度快(增删改快)。
方案,mysql下,入库每天的数据,按天处理和维护数据。导入infiniDB
一,连接数据库:Navicat连接。建立mysql和infiniDB的数据表结构。
mysql数据由kettle导入,并完成相关逻辑处理,如去重,去空等(复杂按经验分析)。
二,本机mysql倒出表
用kettle设计一个自动化部署,将mysql数据表导出成tbl格式文件。网上有Smoodo @ freenode.net ##pentaho做的,infiniDB export.kjb, 完成相关设置即可。
三,infiniDB倒入数据库,运行infiniDB的提供的开源shell脚本,自动导入。
1,windows下编写的shell脚本,放到linux下不能运行,要先dos2unxi转化。判断vim下,:%!xxd 看十六进制是否出现0a0d,对应为"."。
2,注意,kette导出的tbl表,默认是用|作为delimiter。如果你的数据中包含|字符,就会出现错行。先go through 文件内容,确定不包含该delimiter,否则就要更换,配合hive首选"\t"。
3,infiniDB倒库的shell脚本,放在/usr/local/Calpont/data/bulk/data/import路径下。基本内容如下:
cd /usr/local/Calpont/data/bulk/data/import;
/usr/local/Calpont/bin/colxml aso1 -t dimAppNameNew -d "\t" -j 1
/usr/local/Calpont/bin/cpimport -j 1
(1)区分大小写,没-l,默认找dimAppNameNew.tbl。-d "\t"是delimiter改成tab。
(2)一个shell脚本重复执行,会重复插入。只能执行一次。
(3)双引号无影响:/usr/local/Calpont/bin/colxml aso1 -t "dimAppNameNew" -l "dimappnamenew.tbl" -d "\t" -j 1 ,也能正确执行。
(4)shell脚本在不同路径下也能执行。前面的cd是为了让当前路径固定在指定路径下,修改后无影响,所以目前作用不清楚。(好像colxml会默认去找import路径下的文件,没时间去认证了。估计该也是该colxml文件里的配置参数)
(5)shell脚本按-j的配置生成job文件,在/usr/local/Calpont/data/bulk/job下。-j是设置对应的job数,cpimport会完成指定的job,导入。
(6)colxml或者cpimport 加-h可以查看参数信息:
/usr/local/Calpont/bin/colxml -h
显示如下
Usage: colxml [options] dbName
Options:
-d delimiter (default '|')
-e max error rows (numeric)
-h Print this message
-j Job id (numeric)
-l load file name
-n "name in quotes"
-p path for XML job description file that is generated
-s "description in quotes"
-t table name
-u user
-r Number of read buffers (numeric)
-c Read buffer size (numeric)
-w Write buffer size (numeric)
-x Extension of file name (default ".tbl")
-E EnclosedByChar (if data has enclosed values)
-C EscapeChar
-b debug level (1-3)
dbName - Required parm specifying the name of the database;
all others are optional
Example:
colxml -t lineitem -j 123 tpch
如具体的例子:
/usr/local/Calpont/bin/colxml ssp_bi_cloud_saiku -x tbl -d "\t" -l "dimAd.tbl" -j 1
/usr/local/Calpont/bin/colxml ssp_bi_cloud_saiku -t dimad -x tbl -d "\t" -l "dimAd.tbl" -j 1
/usr/local/Calpont/bin/colxml ssp_bi_cloud_saiku -d "\t" dimad -j 1
最后,saiku连接:xml文件已建好,在linux下,放到saiku的安装目录下,看是否连接成功,能否使用。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



Linux is suitable for servers, development environments, and embedded systems. 1. As a server operating system, Linux is stable and efficient, and is often used to deploy high-concurrency applications. 2. As a development environment, Linux provides efficient command line tools and package management systems to improve development efficiency. 3. In embedded systems, Linux is lightweight and customizable, suitable for environments with limited resources.

The steps to start Apache are as follows: Install Apache (command: sudo apt-get install apache2 or download it from the official website) Start Apache (Linux: sudo systemctl start apache2; Windows: Right-click the "Apache2.4" service and select "Start") Check whether it has been started (Linux: sudo systemctl status apache2; Windows: Check the status of the "Apache2.4" service in the service manager) Enable boot automatically (optional, Linux: sudo systemctl

When the Apache 80 port is occupied, the solution is as follows: find out the process that occupies the port and close it. Check the firewall settings to make sure Apache is not blocked. If the above method does not work, please reconfigure Apache to use a different port. Restart the Apache service.

This article describes how to effectively monitor the SSL performance of Nginx servers on Debian systems. We will use NginxExporter to export Nginx status data to Prometheus and then visually display it through Grafana. Step 1: Configuring Nginx First, we need to enable the stub_status module in the Nginx configuration file to obtain the status information of Nginx. Add the following snippet in your Nginx configuration file (usually located in /etc/nginx/nginx.conf or its include file): location/nginx_status{stub_status

The steps to start an Oracle listener are as follows: Check the listener status (using the lsnrctl status command) For Windows, start the "TNS Listener" service in Oracle Services Manager For Linux and Unix, use the lsnrctl start command to start the listener run the lsnrctl status command to verify that the listener is started

This article introduces two methods of configuring a recycling bin in a Debian system: a graphical interface and a command line. Method 1: Use the Nautilus graphical interface to open the file manager: Find and start the Nautilus file manager (usually called "File") in the desktop or application menu. Find the Recycle Bin: Look for the Recycle Bin folder in the left navigation bar. If it is not found, try clicking "Other Location" or "Computer" to search. Configure Recycle Bin properties: Right-click "Recycle Bin" and select "Properties". In the Properties window, you can adjust the following settings: Maximum Size: Limit the disk space available in the Recycle Bin. Retention time: Set the preservation before the file is automatically deleted in the recycling bin

To restart the Apache server, follow these steps: Linux/macOS: Run sudo systemctl restart apache2. Windows: Run net stop Apache2.4 and then net start Apache2.4. Run netstat -a | findstr 80 to check the server status.

In Debian systems, readdir system calls are used to read directory contents. If its performance is not good, try the following optimization strategy: Simplify the number of directory files: Split large directories into multiple small directories as much as possible, reducing the number of items processed per readdir call. Enable directory content caching: build a cache mechanism, update the cache regularly or when directory content changes, and reduce frequent calls to readdir. Memory caches (such as Memcached or Redis) or local caches (such as files or databases) can be considered. Adopt efficient data structure: If you implement directory traversal by yourself, select more efficient data structures (such as hash tables instead of linear search) to store and access directory information
