Best Practices for Loading in an Oracle Data Warehouse

作者: Maclean Liu , post on November 13th, 2010 , English Version
【本站文章除注明转载外,均为本站原创编译】
转载请注明:文章转载自: Oracle Clinic – Maclean Liu的个人技术博客 [http://www.oracledatabase12g.com/]
本文标题: Best Practices for Loading in an Oracle Data Warehouse
本文永久地址: http://www.oracledatabase12g.com/archives/best-practice-for-loading-in-oracle-data-warehouse.html

Agenda

  • The Three Layers of ETL
  • Think about Hardware
    • Database
    • Staging area
  • Use and Understand the Database features
    • SQL*Loader or External Tables
    • Direct path
    • Parallelism
    • Partition exchange load
    • Compression
  • Q&A

In this session will not cover ETL tools. The main focus will be on the database
And how you can do use a key set of features to ensure your data loads happen
As quickly and efficiently as possible. I will also briefly touch on some hardware
Considerations you should keep in mind when designing your loading strategy.

We have talked about having a balanced configuration for the
Database but there is another key hardware configuration
That will greatly impact the load performance and that is the staging area. In every
Tuning engagement I did while part of the Real World Performance group
I invariably found that the performance of the data load was in fact throttled
Or constricted by the throughput of the staging area. Given the fact the data
Simply lands in the staging a couple of minutes before its loaded tends to make
People feel its not important. BUT it is actually critically import to load performance
So basic guidelines are
Spread the staging area across as many physical spindles as you can. Just with any non Exadata storage
Each physical disk will only be able to read ~20 MB/s. Stuff all your flat files on a single physical disk and its
Maximum throughtput will by the throughput for the entire load.
Use as much network bandwidth as you can. Don’t forget the HBA to CPU ratio you learnt for the database
The same should be applied to staging. In the DBM you have spare IB connections seriously consider using them
For your load.
Internal files systems on the database node are a bad idea for a lot of reason,
Few physical disks
Not visible across all nodes
In the case of the DBM the final bullet point is not feasible but that is not ture of a custom installation.

Oracle offer a varitey of data loading options depending on
The source system. If the data is coming from an other Oracle
Environment both Data Pump or Transportable tablespaces are
Good options, as both allow you to move large amount of data
Quickly and in parallel.

I would avoid using Dblinks as they can’t do parallel query.

However, you are much more likely to see flat files being used
As every other database vender (other tha Oracle)offers a fast unload
Feature that puts data into tha flat file. For the rest of this session I
Will be assuming we are loading from falt files.

If you are loading from flat files, you have two options with
Oracle SQL*Loader and External tables

© 2010, www.oracledatabase12g.com. 版权所有.文章允许转载,但必须以链接方式注明源地址,否则追究法律责任.

相关文章 | Related posts:

  1. Designing a 300TB Data Warehouse with ASM
  2. HP-UX: Dynamic (runtime) Loading of Shared Libraries
  3. How to Make Use of Oracle's Data Compression with Materialized Views.
  4. Expert Oracle Practices Oracle Database Administration from the Oak Table PDF
  5. Data gathering for troubleshooting Oracle Real Application Cluster issues
  6. Oracle Database 11g Unstructured Data
  7. 【书籍推荐】Expert Oracle Practices
  8. Oracle 11gR1 Streams Practices and Solutions
  9. Oracle Data Integrator 11g Product Overview and New Features
  10. Oracle Real Application Clusters Installation and Configuration Best Practices

Leave a Reply

  

  

  

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>