Data Warehouse Architecture

WBOY

Jun 07, 2016, 03:28 PM

Introduction

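Data warehouse architecture is a branch of IT architecture, and it grows in importance as data takes on a more central role in the enterprise. Because the range of technology choices is so wide, data warehouse architecture looks complicated, but a fairly stable line of thinking sits behind it. This is a key point of data warehouse architecture design: there is change within stability, and stability within change.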
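Overall, data warehouse architecture divides into two parts: hardware architecture and software architecture. Both can be further classified as closed or open. Teradata is the representative vendor of closed hardware architecture: its hardware is proprietary, and the product runs only on that special hardware. Oracle represents open hardware architecture and can run on a wide variety of hardware. The boundary between open and closed is gradually blurring, however: Oracle has begun bundling dedicated HP hardware to promote its DW solution, while Teradata has begun offering its DW product on hardware that can run a SUSE-based OS. The advantage of closed hardware is that it works out of the box and has been rigorously tested by the vendor, so reliability is relatively high. Open hardware requires the enterprise to have strong technical capabilities: a team with combined knowledge of hardware, storage, and operating systems that can assemble a base platform capable of running the DW software, and that can quickly locate and resolve the cause of a problem when one appears.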
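The choice of software architecture for a data warehouse is even richer. Database software, ETL software, presentation software, and data mining software each offer many options. Selecting this software is part of architecture design, but the core of the design is the line of thinking that ties these components together. Under one coherent DW architecture design, the individual software products can be chosen quite flexibly.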
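The main distinguishing feature of the software's physical architecture is row storage versus column storage, a point many vendors once liked to dwell on. Either approach can be adopted depending on requirements. Most database software uses row storage. Column storage is characterized by efficient compression of the values within a single column; when a query selects only a few columns, it needs very little I/O and runs fast. Row-store databases are also improving rapidly in compression efficiency, though, and most requirements still look at data row by row. Row storage also makes it easier to split a table by record for parallel processing.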
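The contrast between the two layouts can be illustrated with a small Python sketch. The table, the column names, and the use of run-length encoding as the compression scheme are assumptions chosen for illustration; real database engines use their own on-disk formats and codecs.

```python
from itertools import groupby

# Hypothetical sample table: (order_id, region, amount).
rows = [
    (1, "east", 10.0),
    (2, "east", 12.5),
    (3, "east", 7.0),
    (4, "west", 3.0),
    (5, "west", 8.0),
]

# Row store: the fields of each record are kept together.
row_store = list(rows)

# Column store: the values of each column are kept together.
column_names = ("order_id", "region", "amount")
column_store = {
    name: [record[i] for record in rows]
    for i, name in enumerate(column_names)
}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def sum_amount_row_store(store):
    # Every record has to be visited even though only one field is needed.
    return sum(record[2] for record in store)


def sum_amount_column_store(store):
    # Only the "amount" column is read; the other columns are never touched.
    return sum(store["amount"])


# Values in one column tend to repeat, so they compress well together.
encoded_region = run_length_encode(column_store["region"])
print(encoded_region)
print(sum_amount_row_store(row_store), sum_amount_column_store(column_store))
```

Both layouts return the same total. The difference is how much data must be read to compute it, and how well adjacent values compress.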

Yahoo Data Warehouse

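At the infrastructure level, the Yahoo data warehouse consists of a Hadoop cluster and an Oracle cluster. The Hadoop cluster is the compute platform and carries out all ETL data processing; the Oracle cluster serves only as a query environment.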
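Data is loaded from the source systems into the warehouse's ODS layer through the Data Highway, and the ODS layer keeps the same data structures as the source systems. The EDW data layer has no strict logical subdivision into levels; it may involve several stages of ETL processing and several levels of data storage. This layer mainly uses dimensional modeling, with data models built according to application requirements. Data is stored in a columnar format. Once processing is complete, the data is synchronized to the Oracle cluster for querying.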
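As a rough illustration of the step from source-shaped ODS records to a dimensional model, the sketch below splits flat records into dimension tables with surrogate keys and a fact table that references them. The record layout, field names, and helper functions are all hypothetical and are not taken from Yahoo's actual schema.

```python
# Hypothetical ODS records that mirror a flat source-system structure.
ods_page_views = [
    {"view_date": "2016-06-01", "country": "US", "page": "/mail", "views": 120},
    {"view_date": "2016-06-01", "country": "CN", "page": "/news", "views": 80},
    {"view_date": "2016-06-02", "country": "US", "page": "/news", "views": 95},
]


def build_dimension(records, attribute):
    """Assign a surrogate key to each distinct value, in first-seen order."""
    dimension = {}
    for record in records:
        dimension.setdefault(record[attribute], len(dimension) + 1)
    return dimension


dim_country = build_dimension(ods_page_views, "country")
dim_page = build_dimension(ods_page_views, "page")

# The fact table keeps the measure plus keys pointing at the dimensions.
fact_page_views = [
    {
        "view_date": record["view_date"],
        "country_key": dim_country[record["country"]],
        "page_key": dim_page[record["page"]],
        "views": record["views"],
    }
    for record in ods_page_views
]


def views_by_country(facts, dimension):
    """Aggregate the measure by a dimension attribute via its surrogate key."""
    key_to_name = {key: name for name, key in dimension.items()}
    totals = {}
    for fact in facts:
        name = key_to_name[fact["country_key"]]
        totals[name] = totals.get(name, 0) + fact["views"]
    return totals


print(views_by_country(fact_page_views, dim_country))
```

Which attributes become dimensions depends on the application's reporting needs, which matches the point above that the models are built according to application requirements.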
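Yahoo uses Oracle as its query environment and makes heavy use of time-based RANGE partitioning with HASH subpartitioning to improve query response times (similar to the Greenplum approach). The data is compressed, and Oracle made some custom improvements for them in how data is compressed and read, giving better read I/O and compression.
The MSTR (MicroStrategy) reporting tool connects to Oracle for most report queries. When the most detailed data is needed, the tool connects to the Hadoop cluster and creates temporary tables to satisfy the query.
The Yahoo warehouse is also equipped with a powerful metadata management system. Its metadata is obtained by parsing SQL, so that the ETL mapping metadata is loaded directly into the metadata repository, achieving field-level mapping. In addition, their PMs maintain the latest business metadata (business rules, metric definitions) in the metadata repository.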
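The effect of time-based range partitioning combined with hash subpartitioning can be sketched in Python. This is a simplified model and not Oracle's implementation: the monthly granularity, the partition labels, the use of `zlib.crc32` as the hash, and the number of subpartitions are all assumptions.

```python
import zlib
from datetime import date

NUM_HASH_SUBPARTITIONS = 4  # assumed value, for illustration only


def range_partition(event_date):
    """Return a monthly range-partition label such as 'p2016_06'."""
    return f"p{event_date.year}_{event_date.month:02d}"


def hash_subpartition(user_id):
    """Map a key to a stable bucket (crc32 gives the same result on every run)."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_HASH_SUBPARTITIONS


def build_table(rows):
    """Place every row into its (range partition, hash subpartition) bucket."""
    table = {}
    for row in rows:
        location = (range_partition(row["event_date"]),
                    hash_subpartition(row["user_id"]))
        table.setdefault(location, []).append(row)
    return table


def query_month(table, year, month):
    """Partition pruning: read only the buckets that belong to one month.

    Only the bucket labels are inspected to decide what to read; rows in
    other months are never scanned.
    """
    wanted = range_partition(date(year, month, 1))
    return [
        row
        for (partition, _subpartition), bucket in table.items()
        if partition == wanted
        for row in bucket
    ]


# Hypothetical sample rows.
rows = [
    {"event_date": date(2016, 5, 30), "user_id": "u1", "clicks": 3},
    {"event_date": date(2016, 6, 1), "user_id": "u2", "clicks": 5},
    {"event_date": date(2016, 6, 7), "user_id": "u1", "clicks": 2},
]
table = build_table(rows)
june_rows = query_month(table, 2016, 6)
print(len(june_rows), sum(row["clicks"] for row in june_rows))
```

The range level lets a date-bounded query skip whole partitions, while the hash level spreads the rows of one time range across buckets that can be scanned in parallel.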
