ORACLE 报表数据库开发设想-Mysql Tutorial-php.cn

Home

Database

Mysql Tutorial

ORACLE 报表数据库开发设想

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 07, 2016 pm 03:50 PM

oracle develop Report database shark

OLAP 称为在线分析,其实就是报表系统,和BI系统. BI系统是套产品在这里不谈. 分析和报表其实都是用存储过程开发出来的,一个是在线提供给用户使用,另一个是离线提供给同事使用的. 在线分析目前来看应用不广,所涉及到的数据量相对比较小,只是用户量比较大 1 用

OLAP 称为在线分析,其实就是报表系统,和BI系统. BI系统是套产品在这里不谈. 分析和报表其实都是用存储过程开发出来的,一个是在线提供给用户使用,另一个是离线提供给同事使用的.

在线分析目前来看应用不广,所涉及到的数据量相对比较小,只是用户量比较大

1 用户只关心自己的. 比如购买次数,购买总额,等用户所关心的数据

2 产品关联,比如说购买该产品的用户还购买了其他什么产品!

3 产品火红度;

而报表涉及到所有的数据,包含历性数据. 每个部门有不同的报表要求,每个同事,每个部门领导都会提些自己关心的报表.

ORACLE 数据库是从交易型数据库发展过来的,处理分析型数据时候总有点力不从心!

1 开始安装数据库时候选择OLAP 它会自动调整下必要的参数

2 设置64-128KB的数据块而不是默认的8KB

3 分层设计, 因为报表众多,如果直接从原始表获取必然造成性能大阻塞. 因此要把基础的,共同的做成数据表,其他报表直接从这些基数表里获取数据. 这样就极大减少了数量.

a 抽取源表层 b 基础表层 C 共同层 D 部门层

如何分? 哪些数据做在哪里,是需要多业务了解和熟悉,对公司和各个部门的报表了解,方能有大概的想法, 这些不一定一开始就能搞定的,需要不断地优化中.因为短时间内无法对业务的彻底熟悉.

4 任务调度:

采用储存过程和软件包来做每个报表,每个表的数据产生. 那么这些任务之间必然产生了依赖.

采用ORACLE 本身的JOB来调度,采用存储过程里面包含存储过程,也就是说JOB调度启动存储过程,启动存储过程把相关的存储过程包含在一起.

该方法不太灵活,扩展性比较差,维护比较难!

应该采用crontab 方式的调度. 比如说写个轮休的JOB 该JOB每隔5-10分钟运行一次. 该JOB只调用一个存储过程. 存储过程启动任务,任务是软件包或者是存储过程.

该存储过程读取任务信息表, 任务依赖表,何时启动该任务, 并监督任务运行状况和报警.

5 软件包里一般包含 a 抽取存储过程; b 清单存储过程;c 日数据存储过程; d 周数据存储过程; e 月存储过程;f 移动到结果表的存储过程;g 回滚的存储过程;h清理过期数据的过程

a 抽取存储过程把源表的数据抽取到临时表中,这里指任务所需数据的表; 这里的临时表是物理的以_TMP命名的.

之所以采用临时表法,因为ORACLE 对表连接成本很高, 尤其是多表的LEFT JOIN +LEFT JOIN . 采用临时表可以把必要的字段,必要的行形成较小的数据块.

b 清单存储过程

清单的意思是这部分数据要临时存上1-3个月,主要的是去重的要求, 求一个月的人数不能从每天的人数SUM过来. 以_LST命名这个清单要做成分区表月,日或者小时的分区.

C 日数据过程是从清单里获取数据进行统计,当然如果没有清单直接从抽取的临时表中获得

D 周过程, 周这个时间很麻烦的事情尤其涉及到跨年的周. 如果不去重可以直接从日数据中提取

E 月过程同上.

F 过程: 是避免结果表的更新影响到领导的查询, 所以先把所有的数据整合在一个临时汇总表中,再移动到结果表

G过程:是个重要的过程,它主要功能是实现回滚UNDO操作,因为依靠ORACLE自身的UNDO机制是很慢的.

处理月报表每天都累加一次的情况,或者是清单过于庞大,保留一个月太多了,或者说扫描一个月的数据太久了.那么采取每天跑一次,每天加一次.

类似是 update table set value=value+new_value;

这样的场景,如果运算过程中发生了故障,就会发生前后数据不一致,只更新了30%的数据就故障了. 所以更新前,把新的值存储在回滚表中.每次运行前调用回滚过程,检查回滚标志

如果非正常结束,那么提取相应的数据对数据进行 UPDATE TABLE SET VALUE=VALUE-NEW_VALUE 操作;

H 清理过程: 这里主要是清理暂时保留一段时间的清单表.

每个过程运行前都要做 TRUNCATE TABLE XXXXX_TMP 的清空表的操作. 如果涉及到清单和目的表,那么要DELETE TABLE WHERE YYYY= XXXX 因为避免得到重复的数据.

6 游标批处理

因为数据量很大成百上千万行, 不可能一次性地提交上去. 比如 insert into table_name (xx,yyy,zz,hhh,) select xx,yy,zz,hh from table_tmp left jion table_tmp2; 会很慢滴

采用游标和批提取方式

cursor cur_day_result is --计算月登录人数和次数

select provcode from table_b group by 1;

type type_provcode is table of oss_openplat_truslogin_day_lst.provcode%type index by binary_integer;

l_ary_provcode type_provcode;

begin

    open cur_day_result;
    loop
      fetch cur_day_result bulk collect into

l_ary_provcode

limit g_batch_size_n; --- 这里可以控制提取行数

      forall i in 1..l_ary_provcode.count
          insert into login_day_lst
          ( provcode)

values(l_ary_provcode )

commit; -- 这里把一部分数据提交到数据上

end loop

7 复杂的要求:

经常有连续三个月的购买用户人数, 日增加额和增加率, 当天与上个月当天的比即同比; 月累加值.

采用MERG INTO和 UPDATE 的方式会比较慢. 直接采用INSERT 和DELETE

比如日期, 分类1,分类2,分类3,统计值,统计值月累加;

通过日数据过程和月数据过程分别生成了数据

日期, 分类1,分类2,分类3,统计值;

日期, 分类1,分类2,分类3,统计值月累加;

分别insert into 到汇总表 (日期, 分类1,分类2,分类3,统计值,统计值月累加)

insert into 汇总表 (日期, 分类1,分类2,分类3,统计值,统计值月累加) select 日期, 分类1,分类2,分类3,统计值,0 from table_day_tmp;

把不属自己的字段值0

最后汇总表在移动结果表时

select 日期, 分类1,分类2,分类3, sum(统计值),sum(统计值月累加) from 汇总表 group by 日期, 分类1,分类2,分类3

8 宽表行转列

思想是通过增加列的数量来减少行的数量. 比如解决连续三个月的购买用户人数的报表需求

我们有用户表,用户购买记录表; 如果我们的用户相对比较少有1百万吧如果这1百万人中 12个月购买记录行数达到2亿行.平均每个月有1千6百万行;

从3个月的记录中大约5千4百万统计连续3个月的用户,应该会比较慢的.

假如做个宽表用户 1月购买次数,2月购买次数.......12月购买次数, 第一次购买时间,最后次购买时间

那么这个表只有1百行的记录

select 用户

from table

where 1月购买次数 > 0 and 2月购买次数>0 and 3月购买次数>0

9 报表分等级

如果说所有的报表要在早上上班9前跑出来,这是个比较难以完成的任务. 在数据量非常少的情况下比如20G 用 1台机器 32G内存 8个CPU 多个硬盘的RAID

确实可以达到要求. 如果数据量达到500GB级以上就会出现麻烦事了.

因此觉得要把报表分级别实现优先级处理

A 级报表在9:00前跑出这一般都是公司业务核心报表高层和老板 CTO CEO 这类人要看的

B级报表在中午12:00前跑出这个各部门领导关心的

C级报表在下午下班6:00前这个就是普通员工

D级报表在晚上跑出来的; 比如监控之类的

10 RAC集群

RAC并不能提升性能使用RAC关键是把任务分在不同节点上

A节点做主要的管理节点;

B节点做数据抽取同步节点,一当数据大的话必须24小时全天候时时抽取,时时同步;

C节点报表节点 ; 主要跑各个报表的任务过程

D节点页面节点报表如果以HTML方式展现来,那么页面服务器访问的数据库必须单独的节点,避免其他操作影响到该节点.

E节点随机查询节点: 这个节点基本上做自己人查询数据,核对数据,更改数据的节点.

A 节点是RAC的管理节点负责整个集群块的管理和锁的处理. 所以为了不影响性能必须单独用一个节点来负责整个集群的通讯

B 节点要做24小时数据插入工作也要单独使用一个

C 节点重量级节点该节点使用的机器比其他节点性能高出数倍. 内存达要更大才能内存进行大量数据块的操作,而不是被LINUX交换分区掉了

D节点面子节点领导老板同事访问页面的快慢体验就在这个节点上,如果跟其他节点合并在一起,容易被其他节点的任务把内存给占了.

7 分区表

一般分区达到2层就是双分区.当有的情况下要达3层物理月表月表下日分区日分区下是LIST分区. 物理月表是人工给表起名字 "TABLE_201206 "

这样要不断地人工建新表, 而存储过程访问时候需要从数据字典里获得该表名, 要不采用时间拼接法然后采用动态语句.编写起来比较繁琐.

分区表 ORACLE建议大于2G的表进行分区. 那么最小的分区应该是容量多大? 这要涉及到机器性能和IO吞吐量,以及一个分区全表扫描时间的忍受程度.

如果分区1个G 而全扫一次要10分钟,那么自然不可接受. 那么一个分区应该在1分钟内完成全扫描

11 索引

基本上不建议在表里建索引,采用多层分区表,实现全表扫描. 因为索引会导致反而比全扫描慢,索引在大规模数据更新的时候维护成本高. 会极大影响各个报表的运行时间.

索引大部分用在结果表上,因为结果表插入的数据量最少,更新的频率最低,维护成本最小.查询效率最高.

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

3 weeks ago By DDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

2 weeks ago By DDD

Where to find the Crane Control Keycard in Atomfall

3 weeks ago By DDD

Assassin's Creed Shadows - How To Find The Blacksmith And Unlock Weapon And Armour Customisation

1 months ago By DDD

Roblox: Dead Rails - How To Complete Every Challenge

3 weeks ago By DDD

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7627

CakePHP Tutorial

1389

What is the format of the account name of steam

win11 activation key permanent

nyt connections hints and answers

140

Related knowledge

MySQL: An Introduction to the World's Most Popular Database Apr 12, 2025 am 12:18 AM

MySQL is an open source relational database management system, mainly used to store and retrieve data quickly and reliably. Its working principle includes client requests, query resolution, execution of queries and return results. Examples of usage include creating tables, inserting and querying data, and advanced features such as JOIN operations. Common errors involve SQL syntax, data types, and permissions, and optimization suggestions include the use of indexes, optimized queries, and partitioning of tables.

Why Use MySQL? Benefits and Advantages Apr 12, 2025 am 12:17 AM

MySQL is chosen for its performance, reliability, ease of use, and community support. 1.MySQL provides efficient data storage and retrieval functions, supporting multiple data types and advanced query operations. 2. Adopt client-server architecture and multiple storage engines to support transaction and query optimization. 3. Easy to use, supports a variety of operating systems and programming languages. 4. Have strong community support and provide rich resources and solutions.

What to do if the oracle can't be opened Apr 11, 2025 pm 10:06 PM

Solutions to Oracle cannot be opened include: 1. Start the database service; 2. Start the listener; 3. Check port conflicts; 4. Set environment variables correctly; 5. Make sure the firewall or antivirus software does not block the connection; 6. Check whether the server is closed; 7. Use RMAN to recover corrupt files; 8. Check whether the TNS service name is correct; 9. Check network connection; 10. Reinstall Oracle software.

How to solve the problem of closing oracle cursor Apr 11, 2025 pm 10:18 PM

The method to solve the Oracle cursor closure problem includes: explicitly closing the cursor using the CLOSE statement. Declare the cursor in the FOR UPDATE clause so that it automatically closes after the scope is ended. Declare the cursor in the USING clause so that it automatically closes when the associated PL/SQL variable is closed. Use exception handling to ensure that the cursor is closed in any exception situation. Use the connection pool to automatically close the cursor. Disable automatic submission and delay cursor closing.

How to create cursors in oracle loop Apr 12, 2025 am 06:18 AM

In Oracle, the FOR LOOP loop can create cursors dynamically. The steps are: 1. Define the cursor type; 2. Create the loop; 3. Create the cursor dynamically; 4. Execute the cursor; 5. Close the cursor. Example: A cursor can be created cycle-by-circuit to display the names and salaries of the top 10 employees.

How to create oracle dynamic sql Apr 12, 2025 am 06:06 AM

SQL statements can be created and executed based on runtime input by using Oracle's dynamic SQL. The steps include: preparing an empty string variable to store dynamically generated SQL statements. Use the EXECUTE IMMEDIATE or PREPARE statement to compile and execute dynamic SQL statements. Use bind variable to pass user input or other dynamic values to dynamic SQL. Use EXECUTE IMMEDIATE or EXECUTE to execute dynamic SQL statements.

How to stop oracle database Apr 12, 2025 am 06:12 AM

To stop an Oracle database, perform the following steps: 1. Connect to the database; 2. Shutdown immediately; 3. Shutdown abort completely.

How to read the oracle awr report Apr 11, 2025 pm 09:45 PM

An AWR report is a report that displays database performance and activity snapshots. The interpretation steps include: identifying the date and time of the activity snapshot. View an overview of activities and resource consumption. Analyze session activities to find session types, resource consumption, and waiting events. Find potential performance bottlenecks such as slow SQL statements, resource contention, and I/O issues. View waiting events, identify and resolve them for performance. Analyze latch and memory usage patterns to identify memory issues that are causing performance issues.

See all articles