CS614 - Data Warehousing
         
    Question # 1 of 10 ( Start time: 10:29:52 PM ) Total Marks: 1
  Data mining uses _________ algorithms to discover patterns and regularities in data.
  Select correct option:
  Mathematical
  Computational
  Statistical
  None of these
  
  Question # 2 of 10 ( Start time: 10:31:13 PM )  Total Marks: 1
  The goal of ___________ is to look at as few blocks as possible to find the matching record(s).
  Select correct option:
  Indexing
  Partitioning
  Joining
  None of these
  
  Question # 3 of 10 ( Start time: 10:32:34 PM )  Total Marks: 1
  An optimized structure that is built primarily for retrieval, with update being only a secondary consideration, is
  Select correct option:
  OLTP
  OLAP
  DSS
  Inverted Index
  
  Question # 4 of 10 ( Start time: 10:33:23 PM )  Total Marks: 1
  If every key in the data file is represented in the index file, then the index is
  Select correct option:
  Dense Index
  Sparse Index
  Inverted Index
  None of these
  
  Question # 5 of 10 ( Start time: 10:34:47 PM )  Total Marks: 1
  There are many variants of the traditional nested-loop join. If the index is built as part of the query plan and subsequently dropped, it is called
  Select correct option:
  Naive nested-loop join
  Index nested-loop join
  Temporary index nested-loop join
  None of these
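A minimal Python sketch of the join variants in this question, assuming rows are plain dicts and using a Python dict as a stand-in for the index (a real DBMS would build its own B-tree or hash index). The function names are hypothetical. The temporary index is built as a step of the plan, probed once per outer row, and then dropped.

```python
from collections import defaultdict


def naive_nested_loop_join(outer_rows, inner_rows, key):
    """Compare every outer row with every inner row: |outer| * |inner| comparisons."""
    return [(o, i) for o in outer_rows for i in inner_rows if o[key] == i[key]]


def temporary_index_nested_loop_join(outer_rows, inner_rows, key):
    """Build an index on the inner join column, use it for this query, then drop it."""
    # 1. Build the index as part of the query plan (a dict stands in for a real index).
    temp_index = defaultdict(list)
    for i in inner_rows:
        temp_index[i[key]].append(i)
    # 2. Probe the index once per outer row instead of scanning the whole inner table.
    result = [(o, i) for o in outer_rows for i in temp_index.get(o[key], [])]
    # 3. Drop the index: it does not outlive the query.
    del temp_index
    return result


orders = [{"cust": 1, "amt": 50}, {"cust": 2, "amt": 75}, {"cust": 1, "amt": 20}]
customers = [{"cust": 1, "name": "Ali"}, {"cust": 2, "name": "Sara"}]
print(len(temporary_index_nested_loop_join(orders, customers, "cust")))  # 3
```

If the index already existed on the inner table and were simply reused, the same probing loop would be an index nested-loop join rather than a temporary one.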
  
  Question # 6 of 10 ( Start time: 10:36:08 PM )  Total Marks: 1
  Data mining evolved as a mechanism to address the limitations of ________ systems in dealing with massive data sets with high dimensionality, new data types, multiple heterogeneous data sources, etc.
  Select correct option:
  OLTP
  OLAP
  DSS
  DWH
  
  Question # 7 of 10 ( Start time: 10:37:30 PM )  Total Marks: 1
  A dense index, if it fits into memory, costs only ______ disk I/O access to locate a record by a given key.
  Select correct option:
  One
  Two
  Linear
  Quadratic
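A toy sketch, assuming a data file modelled as a Python list of blocks and an in-memory dict as the dense index (the class and function names are hypothetical). Because the index holds every key and lives in memory, locating a record costs exactly one block read.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    """Toy data file: a list of blocks, each holding (key, record) pairs."""
    blocks: list
    reads: int = 0  # number of block reads (disk I/Os) performed by lookups

    def read_block(self, block_no):
        self.reads += 1
        return self.blocks[block_no]


def build_dense_index(disk):
    """Dense index: one entry for EVERY key in the data file (key -> block number)."""
    return {key: no for no, block in enumerate(disk.blocks) for key, _ in block}


def lookup(disk, index, key):
    """With the index in memory, finding a record costs a single block read."""
    block_no = index.get(key)  # in-memory search: no disk I/O
    if block_no is None:
        return None  # key not in the file; the data file is never touched
    for k, record in disk.read_block(block_no):  # the one disk I/O
        if k == key:
            return record
    return None


disk = Disk(blocks=[[(1, "a"), (2, "b")], [(5, "c"), (9, "d")]])
index = build_dense_index(disk)
print(lookup(disk, index, 9), disk.reads)  # d 1
```

A sparse index would keep only one key per block (1 and 5 in this data), which makes it smaller but requires the data file to be sorted on the key.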
  
  Question # 8 of 10 ( Start time: 10:38:29 PM )  Total Marks: 1
  Data mining derives its name from the similarities between searching for valuable business information in a large database, for example, finding linked products in gigabytes of store scanner data, and mining a mountain for a _________ of valuable ore.
  Select correct option:
  Furrow
  Streak
  Trough
  Vein
  
  Question # 9 of 10 ( Start time: 10:39:49 PM )  Total Marks: 1
  If 'M' rows from table-A match the conditions in the query, then table-B is accessed 'M' times. Suppose table-B has an index on the join column. If 'a' I/Os are required to read the data block for each scan plus 'b' I/Os for each data block, then the total cost of accessing table-B is approximately _____________ logical I/Os.
  Select correct option:
  (a + b)M
  (a - b)M
  (a + b + M)
  (a * b * M)
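The arithmetic behind the answer, as a small Python function. The figures in the example are hypothetical; only the (a + b)M structure comes from the question.

```python
def inner_table_cost(m: int, a: int, b: int) -> int:
    """Approximate logical I/Os spent on table-B in an index nested-loop join.

    m -- number of table-A rows that satisfy the query (table-B is probed m times)
    a -- the question's 'a' (I/Os per scan)
    b -- the question's 'b' (I/Os per data block)
    """
    # Every one of the m probes pays a + b I/Os, so the total is (a + b) * m.
    return (a + b) * m


# Hypothetical figures: 1,000 matching rows, 3 I/Os per scan, 1 per data block.
print(inner_table_cost(m=1000, a=3, b=1))  # 4000
```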
  
  Question # 10 of 10 ( Start time: 10:41:16 PM )  Total Marks: 1
  ________ is the technique in which existing heterogeneous segments are reshuffled and relocated into homogeneous segments.
  Select correct option:
  Clustering
  Aggregation
  Segmentation
  Partitioning
     
     
    The goal of ideal parallel execution is to completely parallelize those parts of a computation that are not constrained by data dependencies. The ______ the portion of the program that must be executed sequentially, the greater the scalability of the computation.
    Larger
    Smaller
    Unambiguous
    Superior
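This question describes what is usually formalized as Amdahl's law (the name does not appear in the question text): if a fraction s of the work must run sequentially, the speedup on p processors is 1 / (s + (1 - s) / p). A short Python sketch with illustrative fractions:

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup on `processors` CPUs when `serial_fraction` of the
    work must run sequentially and the rest parallelizes perfectly."""
    if not 0.0 <= serial_fraction <= 1.0 or processors < 1:
        raise ValueError("serial_fraction must be in [0, 1] and processors >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


for s in (0.5, 0.1, 0.01):
    # Smaller sequential portion -> much higher speedup on the same 100 CPUs.
    print(f"serial={s:.2f}  speedup on 100 CPUs = {amdahl_speedup(s, 100):.1f}")
```

However many processors are added, the speedup can never exceed 1 / s, which is why shrinking the sequential portion matters more than adding hardware.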
     
    _______________, if it fits into memory, costs only one disk I/O access to locate a record by a given key.
    An Inverted Index
    A Sparse Index
    A Dense Index
    None of these
     
    If someone told you that he had a good model to predict customer usage, the first thing you might try would be to ask him to apply his model to your customer _______, where you already knew the answer.
    Base
    Drive 
    File 
    Log 
     
    The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by _____________ tools typical of decision support systems.
    Introspective
    Intuitive
    Reminiscent
    Retrospective
     
    With data mining, the best way to accomplish this is by setting aside some of your data in a vault to isolate it from the mining process; once the mining is complete, the results can be tested against the isolated data to confirm the model's _______.
    Validity            
    Security
    Integrity
    None of these
     
    _______________, if it is too big and does not fit into memory, will be expensive when used to find a record by a given key.
    An Inverted Index
    A Sparse Index
    A Dense Index
    None of these
     
     
     
     
    With data mining, the best way to accomplish this is by setting aside some of your data in a ________ to isolate it from the mining process; once the mining is complete, the results can be tested against the isolated data to confirm the model's validity.
    Cell
    Disk
    Folder
    Vault
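A minimal sketch of the vault idea in Python, assuming a toy labelled data set and a one-parameter threshold model (both hypothetical, chosen only to keep the example short). The model is built from the mining set alone and then checked against the held-out records.

```python
import random


def holdout_split(records, holdout_fraction=0.3, seed=42):
    """Shuffle and set aside `holdout_fraction` of the records in a 'vault'."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_vault = round(len(shuffled) * holdout_fraction)
    return shuffled[n_vault:], shuffled[:n_vault]  # (mining set, vault)


def accuracy(predict, labelled):
    """Fraction of (x, label) pairs the model gets right."""
    return sum(predict(x) == label for x, label in labelled) / len(labelled)


# Toy data: label is True when x >= 50 (hypothetical rule for illustration).
data = [(x, x >= 50) for x in range(100)]
mining, vault = holdout_split(data)

# "Mine" a threshold using only the mining set; the vault is never looked at.
threshold = min(x for x, label in mining if label)


def model(x):
    return x >= threshold


print(f"accuracy on vault: {accuracy(model, vault):.2f}")
```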
     
    The goal of ideal parallel execution is to completely parallelize those parts of a computation that are not constrained by data dependencies. The smaller the portion of the program that must be executed __________, the greater the scalability of the computation.
    In Parallel
    Distributed
    Sequentially
    None of these
     
     
    Data mining is a/an __________ approach, where browsing through data using data mining techniques may reveal something that might be of interest to the user as information that was unknown previously.
    Non-Exploratory
    Exploratory
    Computer Science
    None of these
     
    To identify the __________________ required, we need to perform data profiling.
  Degree of Transformation
  Complexity
  Cost
  Time
    
  Execution can complete successfully, or it may be stopped due to an error. If an error occurs, execution will be terminated abnormally and all transactions will be ___________
  Committed to the database
  Rolled back
  
  
    Companies collect and record their own operational data, but at the same time they also use reference data obtained from _______ sources, such as codes, prices, etc.
  Operational
  None of these
  Internal
  External
     
    
  Ad-hoc access means running queries that are known in advance.
  True
  False
     
    
  ____________ in agriculture extension is that pest population beyond which the benefit of spraying outweighs its cost.
  Profit Threshold Level
  Economic Threshold Level
  Medicine Threshold Level
  None of these
    
  People who design and build the data warehouse must be capable of working across the organization at all levels.
  True
  False
  
  
    The _________ is only a small part in realizing the true business value buried within the mountain of data collected and stored within an organization's business systems and operational databases.
  Independence from technology
  Dependence on technology
  None of these
  
  
    Many data warehouse project teams waste enormous amounts of time searching in vain for a ___________________.
  Silver Bullet
  Golden Bullet
  Suitable Hardware
  Compatible Product
     
    Multidimensional databases typically use proprietary __________ format to store pre-summarized cube structures.
  File
  Application
  Aggregate
  Database
  
  
    A dense index, if it fits into memory, costs only ______ disk I/O access to locate a record by a given key.
  One
  Two 
  lg (n)
  n
  
  
    All data is ______________ of something real.
  I. An abstraction
  II. A representation
  Which of the following option is true?
  I Only
  II Only
  Both I & II
  None of I & II
  
  
    The key idea behind ___________ is to take a big task and break it into subtasks that can be processed concurrently on a stream of data inputs in multiple, overlapping stages of execution.
  Pipeline Parallelism
  Overlapped Parallelism
  Massive Parallelism
  Distributed Parallelism
  
  
    Non-uniform distribution of data across the processors is called ______.
  Skew in Partition
  Pipeline Distribution
  Distributed Distribution
  Uncontrolled Distribution
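A sketch of how skew arises, assuming hash partitioning with the simplest possible hash (key modulo the number of processors) and made-up key values. When one key value repeats heavily, every copy lands on the same processor.

```python
from collections import Counter


def partition_sizes(keys, n_processors):
    """Hash-partition rows by key (here simply key % n) and count rows per processor."""
    counts = Counter(key % n_processors for key in keys)
    return [counts.get(p, 0) for p in range(n_processors)]


def skew_ratio(sizes):
    """Largest partition / average partition: 1.0 means perfectly even."""
    return max(sizes) / (sum(sizes) / len(sizes))


uniform_keys = list(range(100))
skewed_keys = [7] * 60 + list(range(40))  # one hot key value repeated 60 times

print(partition_sizes(uniform_keys, 4), skew_ratio(partition_sizes(uniform_keys, 4)))
# [25, 25, 25, 25] 1.0
print(partition_sizes(skewed_keys, 4), skew_ratio(partition_sizes(skewed_keys, 4)))
# [10, 10, 10, 70] 2.8
```

The processor holding 70 rows finishes last, so with these assumed numbers the parallel step takes roughly 2.8 times as long as a perfectly balanced one.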
  
  
    Data mining is a/an __________ approach, where browsing through data using data mining techniques may reveal something that might be of interest to the user as information that was unknown previously.
  Exploratory
  Non-Exploratory
  Computer Science
  
  
  
  
    To measure or quantify similarity or dissimilarity, different techniques are available. Which of the following options names the available techniques?
  Pearson correlation is the only technique
  Euclidean distance is the only technique
  Both Pearson correlation and Euclidean distance
  None of these
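Both measures written out in Python from their standard formulas (the customer figures are hypothetical). The example shows why they are complementary: two profiles with the same shape but a different scale have a Pearson correlation of 1 yet a large Euclidean distance.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance: 0 means identical; larger means more dissimilar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Linear correlation in [-1, 1]: 1 means the two profiles rise and fall together."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical monthly purchases of two customers: same pattern, different scale.
c1 = [1, 2, 3, 4]
c2 = [10, 20, 30, 40]
print(round(pearson_correlation(c1, c2), 3), round(euclidean_distance(c1, c2), 3))
```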
     
    For a DWH project, the key requirements are ________ and product experience.
  Tools
  Industry
  Software
  None of these
  
  
    Pipeline parallelism focuses on increasing the throughput of task execution, NOT on __________ sub-task execution time.
  Increasing
  Decreasing
  Maintaining
  None of these
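A back-of-the-envelope sketch in Python, assuming k equal-length stages and no start-up overhead (the stage count and timings are hypothetical). Pipelining cuts the total time for many items, but each individual item still needs the same time to pass through all stages.

```python
def sequential_time(n_items, n_stages, stage_time):
    """Each item runs all stages before the next item starts."""
    return n_items * n_stages * stage_time


def pipelined_time(n_items, n_stages, stage_time):
    """Stages overlap: once the pipeline is full, one item finishes every stage_time."""
    return (n_stages + n_items - 1) * stage_time


def item_latency(n_stages, stage_time):
    """Time for ONE item to pass through all stages: unchanged by pipelining."""
    return n_stages * stage_time


# Hypothetical: 100 rows, 4 equal stages (e.g. scan, filter, join, aggregate), 1 time unit each.
n, k, t = 100, 4, 1
print(sequential_time(n, k, t), pipelined_time(n, k, t), item_latency(k, t))  # 400 103 4
```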
  
  
    Focusing only on data warehouse delivery often ends up in _________.
  Rebuilding
  Success
  Good Stable Product
  None of these
  
  
    Pakistan is one of the five major ________ countries in the world.
  Cotton-growing
  Rice-growing
  Weapon Producing
  
  
    _____________ is a process that involves gathering information about a column through the execution of certain queries, with the intention of identifying erroneous records.
  Data profiling
  Data Anomaly Detection
  Record Duplicate Detection
  None of these
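A minimal data profiling sketch using Python's built-in sqlite3 module. The table, the column and the valid-age rule (0 to 120) are hypothetical; the point is that a handful of queries per column (row count, nulls, distinct values, min and max) quickly exposes erroneous records.

```python
import sqlite3


def profile_column(conn, table, column):
    """Run a few profiling queries against one column.
    Table/column names are interpolated into SQL, so pass only trusted identifiers."""
    total, non_null, distinct, low, high = conn.execute(
        f"SELECT COUNT(*), COUNT({column}), COUNT(DISTINCT {column}), "
        f"MIN({column}), MAX({column}) FROM {table}"
    ).fetchone()
    return {"rows": total, "nulls": total - non_null, "distinct": distinct,
            "min": low, "max": high}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (age INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO customer (age) VALUES (?)",
                 [(25,), (31,), (None,), (-4,), (212,)])

print(profile_column(conn, "customer", "age"))
# An assumed business rule (valid ages 0-120) turns the profile into a list of suspect records.
suspects = conn.execute(
    "SELECT rowid, age FROM customer WHERE age < 0 OR age > 120 ORDER BY rowid"
).fetchall()
print(suspects)  # [(4, -4), (5, 212)]
```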
  
  
    Relational databases allow you to navigate the data in ____________ that is appropriate, using the primary/foreign key structure within the data model.
  Only One Direction
  Any Direction
  Two Directions
  None of these
  
  
    DSS queries do not involve a primary key.
  True
  False
  
  
    __________________ contributes to an under-utilization of valuable and expensive historical data, and inevitably results in a limited capability to provide decision support and analysis.
  The lack of data integration and standardization
  Missing Data
  Data Stored in Heterogeneous Sources
  
  
     
     
    DTS allows us to connect through any data source or destination that is supported by ____________.
  OLE DB
  OLAP
  OLTP
  Data Warehouse
  
  
    Data Transformation Services (DTS) provides a set of _____ that let you extract, transform, and consolidate data from disparate sources into single or multiple destinations supported by DTS connectivity.
  Tools
  Documentations
  Guidelines
  
  
    If some error occurs, execution will be terminated abnormally and all transactions will be rolled back. In this case, when we access the database, we will find it in the state it was in before the ____________.
  Execution of package
  Creation of package
  Connection of package
  
  
    To judge effectiveness, we perform data profiling twice:
  One before Extraction and the other after Extraction
  One before Transformation and the other after Transformation
  One before Loading and the other after Loading
  
  
    The need to synchronize data upon update is called
  Data Manipulation
  Data Replication
  Data Coherency
  Data Imitation
  
  
    Taken jointly, the extract programs or naturally evolving systems formed a spider web, also known as
  Distributed Systems Architecture
  Legacy Systems Architecture
  Online Systems Architecture
  Intranet Systems Architecture
     
    Each node of a B-Tree is stored in a memory block, and traversing a B-Tree involves ______ page faults.
  O (n)
  O (n^2)
  O (n lg n)
  O (lg n)
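A rough sketch of why the answer is logarithmic, assuming every node is full with the same fan-out (a simplification; real B-tree nodes are between half full and full). Each level visited on the way from root to leaf is one node, that is, one page, so the number of page faults tracks the tree height, about log base fan-out of n.

```python
def btree_levels(n_keys: int, fanout: int) -> int:
    """Levels on a root-to-leaf path of a B-tree whose nodes all have `fanout`
    children (a simplifying assumption). Each level visited is one node = one
    page, so this approximates the page faults of a lookup: about log_fanout(n)."""
    levels, reachable = 1, fanout
    while reachable < n_keys:
        reachable *= fanout
        levels += 1
    return levels


for n in (10**3, 10**6, 10**9):
    print(n, btree_levels(n, 100))  # 2, 3 and 5 levels respectively
```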

  Which statement is true for De-Normalization?
  Redundant data is a performance liability at query time, but is a performance benefit at update time.
  Redundant data is a performance benefit at both query time and update time.
  Redundant data is a performance liability at both query time and update time.
  Redundant data is a performance benefit at query time, but is a performance liability at update time.
  
  
    It is observed that, every year, the amount of data recorded in an organization ______.
    Doubles
    Triples
    Quadruples
    Remains the same as the previous year
     
    Pre-computed _______ can solve performance problems.
    Aggregates   
    Facts
    Dimensions
     
    The degree of similarity between two records, often measured by a numerical value between _______, usually depends on application characteristics.
    0 and 1   
    0 and 10
    0 and 100
    0 and 99
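One way to get a record similarity in the 0 to 1 range, sketched with Python's standard difflib.SequenceMatcher. Averaging per-field scores is an assumption made for the example, and the 0.85 duplicate cut-off is a hypothetical choice; as the question says, the right measure and threshold depend on the application.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] using difflib's ratio (1.0 = identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1: dict, r2: dict, fields) -> float:
    """Average field similarity: still a number between 0 and 1."""
    return sum(field_similarity(str(r1[f]), str(r2[f])) for f in fields) / len(fields)


a = {"name": "Muhammad Ali", "city": "Lahore"}
b = {"name": "Mohammad Ali", "city": "lahore"}
score = record_similarity(a, b, ["name", "city"])
# The 0.85 cut-off is a hypothetical, application-specific choice.
print(round(score, 2), "duplicate" if score >= 0.85 else "different")
```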
  
  
    The purpose of the House of Quality technique is to reduce ______ types of risk.
    Two   
    Three
    Four
    All
  
  
    NUMA stands for __________
    Non-uniform Memory Access
    Non-updateable Memory Architecture
    New Universal Memory Architecture
     

    Kimball's iterative data warehouse development approach drew on decades of experience to develop the _____________.
    Business Dimensional Lifecycle
    Data Warehouse Dimension
    Business Definition Lifecycle
    OLAP Dimension
     
    For a smooth DWH implementation, we must be a technologist.
    True
    False   
  
  
    During the application specification activity, we also must give consideration to the organization of the applications.
    True   
    False
     
    The most recent attack is the ________ attack on the cotton crop during 2003-04, resulting in a loss of nearly 0.5 million bales.
    Boll Worm   
    Purple Worm
    Blue Worm
    Cotton Worm
     
    The users of a data warehouse are knowledge workers; in other words, they are _________ in the organization.
    Decision maker 
    Manager
    Database Administrator
    DWH Analyst
     
    _________ breaks a table into multiple tables based upon common column values.
    Horizontal splitting
    Vertical splitting
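A Python sketch contrasting the two kinds of splitting, with hypothetical table and column names. Horizontal splitting divides the rows by a common column value and keeps every column; vertical splitting keeps every row but divides the columns.

```python
from collections import defaultdict


def horizontal_split(rows, column):
    """Horizontal splitting: one new table per distinct value of `column`.
    Every table keeps ALL columns; only the rows are divided."""
    tables = defaultdict(list)
    for row in rows:
        tables[row[column]].append(row)
    return dict(tables)


def vertical_split(rows, key, column_groups):
    """Vertical splitting, for contrast: every table keeps ALL rows but only
    the key plus one group of columns."""
    return [[{key: r[key], **{c: r[c] for c in group}} for r in rows]
            for group in column_groups]


sales = [  # hypothetical sales table
    {"id": 1, "region": "North", "amount": 100, "notes": "repeat buyer"},
    {"id": 2, "region": "South", "amount": 250, "notes": ""},
    {"id": 3, "region": "North", "amount": 75, "notes": "discount"},
]
by_region = horizontal_split(sales, "region")
print({region: [r["id"] for r in rows] for region, rows in by_region.items()})
# {'North': [1, 3], 'South': [2]}
```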
     
    As opposed to the outcome of classification, estimation deals with ____________-valued outcomes.
    Discrete 
    Isolated 
    Continuous  
    Distinct 
     
     
     
     
    The technique that is used to perform these feats in data mining is called modeling, and this act of model building is something that people have been doing for a long time, certainly before the _______ of computers or data mining technology.
    Access
    Advent
    Ascent
    Avowal
     
    A data warehouse may include
  Legacy systems
  Only internal data sources
  Privacy restrictions
  Small data mart
  
  
    De-Normalization normally speeds up
  Data Retrieval
  Data Modification
  Development Cycle
  Data Replication
  
  
    In horizontal splitting, we split a relation into multiple tables on the basis of
  Common Column Values
  Common Row Values
  Different Index Values
  Value resulted by ad-hoc query
     
    For a given data set, to get a global view in unsupervised learning, we use
  One-way Clustering
  Bi-clustering
  Pearson correlation
  Euclidean distance
  
  
    In a DWH project, it is ensured that the ___________ environment is similar to the production environment.
  Designing
  Development
  Analysis
  Implementation
  
  
    For good decision making, data should be integrated across the organization to cross the LoB (Line of Business). This gives a total view of the organization from the:
  Owner's Perspective
  Customer's Perspective
  Decision Maker's Perspective
  Employee's Perspective
  
  
    Which is the least appropriate join operation for pipeline parallelism?
    Hash Join
    Inner Join
    Outer Join
    Sort-Merge Join
     
  
  
    We must try to find the one access tool that will handle all the needs of their users.
    True
    False
  
  
    Investing years in architecture and forgetting the primary purpose of solving business problems results in an inefficient application. This is an example of the _________ mistake.
    Extreme Technology Design
    Extreme Architecture Design
     
    The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of ___________.
    OLTP
    OLAP
    Decision Support Systems
    None of these 

    There are many variants of the traditional nested-loop join. If an index is exploited, it is called ______.
    Naïve nested loop join index
    Nested loop join temporary index
    Index nested-loop joins
     
    
  A data warehouse implementation without an OLAP tool is always possible.
    True
    False
     
     
    The _____ modeling technique is more appropriate for data warehouses.
    entity-relationship
    dimensional
    physical
    None of the given
     
     
    The performance in a MOLAP cube comes from the O(1) look-up time for the array data structure.
     
    True
    False
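A toy Python sketch of why MOLAP look-ups are O(1), assuming a dense cube stored as one flat list in row-major order (the class name, dimensions and sales figure are hypothetical). A cell's address is computed arithmetically from the member positions, so no searching is needed, whatever the size of the cube.

```python
class DenseCube:
    """Toy MOLAP-style cube: one cell per combination of dimension members,
    stored in a single flat list in row-major order."""

    def __init__(self, dimensions):
        # dimensions: list of member lists, e.g. [["2003", "2004"], ["Lahore", "Karachi"]]
        self.sizes = [len(members) for members in dimensions]
        self.positions = [{m: i for i, m in enumerate(members)} for members in dimensions]
        n_cells = 1
        for size in self.sizes:
            n_cells *= size
        self.cells = [0] * n_cells

    def _offset(self, coords):
        # Row-major address arithmetic: a fixed amount of work per dimension,
        # independent of how many cells the cube holds.
        offset = 0
        for size, pos, member in zip(self.sizes, self.positions, coords):
            offset = offset * size + pos[member]
        return offset

    def put(self, coords, value):
        self.cells[self._offset(coords)] = value

    def get(self, coords):
        return self.cells[self._offset(coords)]  # direct array access, no search


cube = DenseCube([["2003", "2004"], ["Lahore", "Karachi", "Quetta"], ["Cotton", "Rice"]])
cube.put(("2004", "Karachi", "Rice"), 1200)  # hypothetical pre-summarized sales total
print(cube.get(("2004", "Karachi", "Rice")))  # 1200
```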
     
     
    Multi-dimensional databases (MDDs) typically use ___________ formats to store pre-summarized cube structures.
     
    SQL
    proprietary file
    Object oriented
    Non-proprietary file
     
    Slice and Dice is changing the view of the data.
    True
    False
     
     
     
    Data warehousing and online analytical processing (OLAP) are _______ elements of decision support systems.
     
    Unusual
    Essential
    Optional
    None of the given
     
     
    A virtual cube is used to query two similar cubes by creating a third "virtual" cube through a join between the two cubes.
     
    True
    False
     
    Analytical processing uses ____________ instead of record-level access.
    multi-level aggregates
    Single-level aggregates
    Single-level hierarchy
    None of the Given
     
     
    The divide-and-conquer cube partitioning approach helps alleviate the ____________ limitations of MOLAP implementation.
    Flexibility
    Maintainability
    Security
    Scalability
     
     
    In a traditional MIS system, there is an almost linear sequence of queries.
    True
    False
     
     
    Data Warehouse provides the best support for analysis while OLAP carries out the _________ task.
    Mandatory
    Whole
    Analysis
    Prediction
     
     
    DOLAP allows download of "cube" structures to a desktop platform with the need for shared relational or cube server.
     
    True 
    False
     
     
    The STAR schema used for data design is a __________ consisting of fact and dimension tables.
    Select correct option:
    Network model
    Relational model
    Hierarchical data model
    None of the given
     
     
-- 
In life, take great care of two people:
First, the one who lost a great deal so that you could win
  (Father)
Second, the one you called out to in every sorrow (Mother)
  
Regards, 
Umair Saulat Mc100403250