etl工作中的设计问题
背景1 : 随着接入数据和处理数据的增加,生产脚本也越来越多,脚本由于前期的开发人员没有做到规范管理,导致脚本很乱。 解决方案: 1) 在lunix上规范目录,按平台,业务模块分目录存放。 2) 做好版本管理,提交到生产的脚本必须要commit到svn服务器。 3
背景1 : 随着接入数据和处理数据的增加,生产脚本也越来越多,脚本由于前期的开发人员没有做到规范管理,导致脚本很乱。
解决方案:
1) 在lunix上规范目录,按平台,业务模块分目录存放。
2) 做好版本管理,提交到生产的脚本必须要commit到svn服务器。
3) lunix上的目录是反应到svn的目录映射。
背景2 :脚本中很多地方有范围,指标,参数值,怎么把这些做的更灵活,而不是写死?
解决方案:
1)尽量把中文或英文映射为数字,不仅节省存储资源,还使程序更灵活。
比如:平台有 pc电脑,手机等,那就可以分别用1代表pc电脑 2 代表手机 3 代表其它。同时做一个码表来解释对应的关系。
主页 0 a
电台 1 b
语种 2 c
华语 3 d
多级目录的解析
/主页/电台/语种/华语
/0/1/2/3
/1/2/3
/xxx/xxx/xxx/xxx
如果多级目录有变化,怎么自动适应目录变化或者 回归二进制本质,用二进制表示。
2) 灵活应用参数列表,做一个的参数码表,动态生成hql语句。
比如:现在要分时段统计,用户出现次数,时段如下:
早上 6:00 -8:00
上午 8:00-12:00
中午 12:00-14:00
下午 14:00-18:00
晚上 18:00-23:00
深夜 23:00-00:00
凌晨 00:00-6:00
做一个码表 id comment time_region val par1 par2
1 早上 6:00 -8:00 1 6 8
2 上午 8:00-12:00 2 8 12
3 中午 12:00-14:00 3 12 14
4 下午 14:00-18:00 4 14 18
5 晚上 18:00-23:00 5 18 23
6 深夜 23:00-24:00 6 23 24
7 凌晨 00:00-6:00 7 0 6
读取该表,动态拼hql ,用后面的val分别代表这些时段。不管以后时段怎么修改,只要修改码表就行了。
比如要写如下的hive_sql
insert overwrite table common.order select userid ,case when hour_time>=0 and hour_time<=2 then '00_03' when hour_time>=3 and hour_time<=5 then '03_06' when hour_time>=6 and hour_time<=7 then '06_08' when hour_time>=8 and hour_time<=11 then '08_12' when hour_time>=12 and hour_time<=13 then '12_14' when hour_time>=14 and hour_time<=17 then '14_18' when hour_time>=18 and hour_time<=23 then '18_24' else '0' end as hour_time ,count(distinct dt) hour_dt_num where dt >='2014-10-01' and dt<='2014-10-30' group by userid, case when hour_time>=0 and hour_time<=2 then '00_03' when hour_time>=3 and hour_time<=5 then '03_06' when hour_time>=6 and hour_time<=7 then '06_08' when hour_time>=8 and hour_time<=11 then '08_12' when hour_time>=12 and hour_time<=13 then '12_14' when hour_time>=14 and hour_time<=17 then '14_18' when hour_time>=18 and hour_time<=23 then '18_24' else '0' end
可以写成这样子:
#!/bin/bash # # add by lishc 2014-11-25 mysql=`which mysql` user="root" password="123" database="test" table="parm" command="select par1,par2,id from test.$table " $mysql -u${user} -p${password} -e "${command}" >m.txt ###初始化 echo " insert overwrite table common.order">mysql.sql echo " select ">>mysql.sql echo " userId " >>mysql.sql echo " case ">>mysql.sql sed -i -e '1d' m.txt cat m.txt |while read line do par1=$(echo "${line}"|awk -F ' ' '{print $1}') par2=$(echo "${line}"|awk -F ' ' '{print $2}') id=$(echo "${line}"|awk -F ' ' '{print $3}') echo "par1: ${par1}" echo "par2: ${par2}" echo " when hour_time >=${par1} and hour_time<=${par2} then '${id}' ">>mysql.sql Done
3) 所有的脚本存放在数据库中,用程序解析参数,并调用执行。
可参考kettle设计:
每个步骤组件化:输入,输出,执行脚本,执行sql,管理执行顺序。
由于ETL过程或数据分析模型,都是一些有序的sql或脚本操作。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



0.What does this article do? We propose DepthFM: a versatile and fast state-of-the-art generative monocular depth estimation model. In addition to traditional depth estimation tasks, DepthFM also demonstrates state-of-the-art capabilities in downstream tasks such as depth inpainting. DepthFM is efficient and can synthesize depth maps within a few inference steps. Let’s read about this work together ~ 1. Paper information title: DepthFM: FastMonocularDepthEstimationwithFlowMatching Author: MingGui, JohannesS.Fischer, UlrichPrestel, PingchuanMa, Dmytr

1. First, we right-click the blank space of the taskbar and select the [Task Manager] option, or right-click the start logo, and then select the [Task Manager] option. 2. In the opened Task Manager interface, we click the [Services] tab on the far right. 3. In the opened [Service] tab, click the [Open Service] option below. 4. In the [Services] window that opens, right-click the [InternetConnectionSharing(ICS)] service, and then select the [Properties] option. 5. In the properties window that opens, change [Open with] to [Disabled], click [Apply] and then click [OK]. 6. Click the start logo, then click the shutdown button, select [Restart], and complete the computer restart.

The performance of JAX, promoted by Google, has surpassed that of Pytorch and TensorFlow in recent benchmark tests, ranking first in 7 indicators. And the test was not done on the TPU with the best JAX performance. Although among developers, Pytorch is still more popular than Tensorflow. But in the future, perhaps more large models will be trained and run based on the JAX platform. Models Recently, the Keras team benchmarked three backends (TensorFlow, JAX, PyTorch) with the native PyTorch implementation and Keras2 with TensorFlow. First, they select a set of mainstream

Facing lag, slow mobile data connection on iPhone? Typically, the strength of cellular internet on your phone depends on several factors such as region, cellular network type, roaming type, etc. There are some things you can do to get a faster, more reliable cellular Internet connection. Fix 1 – Force Restart iPhone Sometimes, force restarting your device just resets a lot of things, including the cellular connection. Step 1 – Just press the volume up key once and release. Next, press the Volume Down key and release it again. Step 2 – The next part of the process is to hold the button on the right side. Let the iPhone finish restarting. Enable cellular data and check network speed. Check again Fix 2 – Change data mode While 5G offers better network speeds, it works better when the signal is weaker

According to news on April 17, HMD teamed up with the well-known beer brand Heineken and the creative company Bodega to launch a unique flip phone - The Boring Phone. This phone is not only full of innovation in design, but also returns to nature in terms of functionality, aiming to lead people back to real interpersonal interactions and enjoy the pure time of drinking with friends. Boring mobile phone adopts a unique transparent flip design, showing a simple yet elegant aesthetic. It is equipped with a 2.8-inch QVGA display inside and a 1.77-inch display outside, providing users with a basic visual interaction experience. In terms of photography, although it is only equipped with a 30-megapixel camera, it is enough to handle simple daily tasks.

According to news on April 26, ZTE’s 5G portable Wi-Fi U50S is now officially on sale, starting at 899 yuan. In terms of appearance design, ZTE U50S Portable Wi-Fi is simple and stylish, easy to hold and pack. Its size is 159/73/18mm and is easy to carry, allowing you to enjoy 5G high-speed network anytime and anywhere, achieving an unimpeded mobile office and entertainment experience. ZTE 5G portable Wi-Fi U50S supports the advanced Wi-Fi 6 protocol with a peak rate of up to 1800Mbps. It relies on the Snapdragon X55 high-performance 5G platform to provide users with an extremely fast network experience. Not only does it support the 5G dual-mode SA+NSA network environment and Sub-6GHz frequency band, the measured network speed can even reach an astonishing 500Mbps, which is easily satisfactory.

I cry to death. The world is madly building big models. The data on the Internet is not enough. It is not enough at all. The training model looks like "The Hunger Games", and AI researchers around the world are worrying about how to feed these data voracious eaters. This problem is particularly prominent in multi-modal tasks. At a time when nothing could be done, a start-up team from the Department of Renmin University of China used its own new model to become the first in China to make "model-generated data feed itself" a reality. Moreover, it is a two-pronged approach on the understanding side and the generation side. Both sides can generate high-quality, multi-modal new data and provide data feedback to the model itself. What is a model? Awaker 1.0, a large multi-modal model that just appeared on the Zhongguancun Forum. Who is the team? Sophon engine. Founded by Gao Yizhao, a doctoral student at Renmin University’s Hillhouse School of Artificial Intelligence.

Recently, the military circle has been overwhelmed by the news: US military fighter jets can now complete fully automatic air combat using AI. Yes, just recently, the US military’s AI fighter jet was made public for the first time and the mystery was unveiled. The full name of this fighter is the Variable Stability Simulator Test Aircraft (VISTA). It was personally flown by the Secretary of the US Air Force to simulate a one-on-one air battle. On May 2, U.S. Air Force Secretary Frank Kendall took off in an X-62AVISTA at Edwards Air Force Base. Note that during the one-hour flight, all flight actions were completed autonomously by AI! Kendall said - "For the past few decades, we have been thinking about the unlimited potential of autonomous air-to-air combat, but it has always seemed out of reach." However now,
