Saturday, September 1, 2007

Miniature data warehousing is now possible

.FLYINGHEAD TECHNOLOGY ANALYSIS
.TITLE Miniature data warehousing is now possible
.AUTHOR Harikrishna S. Aravapalli
.SUMMARY Data warehousing is always considered the province of big hardware. But in this fascinating article, Harikrishna S. Aravapalli explains how very small devices, even iPods and smartphones, can provide miniature data warehouses wherever you go.
.OTHER
The moment you hear the phrase "data warehouse", the first image that comes to mind is often a huge server infrastructure, with thousands of gigabytes (or even terabytes) of data being churned in, processed, and churned out. This also creates the illusion that unless the business intelligence reporting systems connect to these servers, and to the tons of data they hold, we cannot derive any meaningful operational or analytical information.

However, with the rising maturity of virtualization software and the easy availability of high-capacity mobile storage devices and feature-rich mobile phones, it is possible, with a little innovation, to create personalized business intelligence and data warehouse systems, leading to what I call the "miniaturization of data warehouse systems". This article discusses how that might happen.

.TEASER Tap here to read the full article.

.H1 Typical characteristics of large data warehouses
Let’s briefly discuss the typical characteristics of a large data warehouse (DW). A large DW accesses data from multiple source systems. It has an intermediate staging area. It has an ODS (operational data store) to hold short-term data. It has a large database that holds a large amount of historical data as well as live data. There are ETL (extract, transform, and load) tools to move the data from one stage to the next. Finally, there are business intelligence tools and technologies that connect to these large DW systems and present the data to business users in the form of reports.
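
To make that flow concrete, here is a minimal sketch in Python of the same stages at toy scale. It assumes the standard-library sqlite3 module as a stand-in for the real databases; the table names, sample rows, and function names are hypothetical and only illustrate the source, staging, load, and report steps described above.

```python
import sqlite3

# Hypothetical rows from two source systems; names and values are illustrative only.
SOURCE_A = [("2007-08-01", "north", "120.50"), ("2007-08-01", "south", "80.00")]
SOURCE_B = [("2007-08-02", "NORTH", "99.50"), ("2007-08-02", "east", None)]


def run_etl(conn, sources):
    """Extract rows into staging, transform them, and load them into the warehouse."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE staging (sale_date TEXT, region TEXT, amount TEXT)")
    cur.execute("CREATE TABLE warehouse_sales (sale_date TEXT, region TEXT, amount REAL)")

    # Extract: copy raw rows from every source system into the staging area.
    for rows in sources:
        cur.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)

    # Transform and load: normalize region names, cast amounts, drop incomplete rows.
    cur.execute(
        """
        INSERT INTO warehouse_sales
        SELECT sale_date, LOWER(region), CAST(amount AS REAL)
        FROM staging
        WHERE amount IS NOT NULL
        """
    )
    conn.commit()


def sales_by_region(conn):
    """A report query of the kind a BI tool would issue against the warehouse."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM warehouse_sales GROUP BY region ORDER BY region"
    ).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn, [SOURCE_A, SOURCE_B])
report = sales_by_region(conn)
print(report)
```

A production DW would use dedicated ETL tooling and a separate ODS; the sketch only shows how the stages hand data to one another.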

.H1 Maturity of mobile storage devices and virtualization technologies
Storage devices have matured to the point where small mobile devices such as USB flash drives hold a few gigabytes of data, and external USB hard drives reach the terabyte level. Even smaller USB hard drives (like the ones you’ll find in an iPod) can hold 160GB or more.

To further boost the usefulness of these small, mobile storage devices, there is OS virtualization software that can run on them. A device virtualized this way can be connected to any host PC and use that PC's hardware resources, turning the storage device into a mobile PC that runs in a plug-and-play model. Vendors already offer these kinds of systems.

.H1 Concept connectivity for creating a miniature data warehouse
Now that we have mobile storage devices of reasonably large capacity (greater than 100GB) and readily available virtualization software, the key question is how we can create a "miniature data warehouse" that is mobile and personalized, yet has all the features of regular business intelligence and DW applications.

The components that go into the making of a miniature data warehouse are:

.BEGIN_LIST
.BULLET Mobile storage device (preferably with a USB port/interface)
.BULLET OS virtualization software for the mobile storage device
.BULLET Database management system to be installed and run in the virtualized OS environment
.BULLET Business intelligence tools and technologies to be installed and run in the virtualized OS environment
.BULLET Host computer with a USB port/interface or a high-end mobile phone with a USB port/interface
.END_LIST

Shown below in Figure A is the concept connectivity between the various components of a miniaturized data warehouse system.

.FIGPAIR A Here’s how all the parts could work together.

Let’s briefly discuss the components in this concept connectivity. The miniaturized data warehouse system has three sub-groups of components:

.BEGIN_LIST
.BULLET The mobile storage device which mainly hosts the miniature data warehouse.
.BULLET The host PC or high-end mobile phone which contains the hardware resources that are used by the mobile storage device for processing and display.
.BULLET A USB interface or USB port, the interface that is used to communicate with the other devices.
.END_LIST

The mobile storage device can be any such device: a USB flash drive, a USB hard drive, or even an iPod or high-end mobile phone with enough storage capacity. On it, you’ll want the following elements:

.H2 OS virtualization layer
This is the virtual OS that runs on the mobile storage device. Such products are readily available on the market and mainly run Windows-compatible applications. The virtual OS can create a virtual workspace that sits side by side (SxS) with the host workspace in an isolated mode.

.H2 DBMS
This can be a lightweight database capable of handling up to 100GB of data (or more). This DBMS will have to be installed in the virtualized OS environment and will host the post-processed data warehouse data that is personalized for a specific user.
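
As an illustration only, the following Python sketch shows how one user's slice of a central warehouse might be copied into a database file on the device. SQLite is an assumption here, chosen because it ships with Python; the article does not prescribe a particular DBMS. The table names, the `build_personal_extract` function, the region filter, and the sample data are all hypothetical, and a temporary directory stands in for the device's mount point.

```python
import os
import sqlite3
import tempfile


def build_personal_extract(central, device_db_path, region):
    """Copy only one user's slice of the warehouse into a database file on the device."""
    rows = central.execute(
        "SELECT sale_date, region, amount FROM warehouse_sales WHERE region = ?",
        (region,),
    ).fetchall()
    device = sqlite3.connect(device_db_path)
    device.execute(
        "CREATE TABLE IF NOT EXISTS personal_sales (sale_date TEXT, region TEXT, amount REAL)"
    )
    device.execute("DELETE FROM personal_sales")  # full refresh on every sync
    device.executemany("INSERT INTO personal_sales VALUES (?, ?, ?)", rows)
    device.commit()
    return device


# Stand-in for the central warehouse (hypothetical data).
central = sqlite3.connect(":memory:")
central.execute("CREATE TABLE warehouse_sales (sale_date TEXT, region TEXT, amount REAL)")
central.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("2007-08-01", "north", 120.5),
        ("2007-08-01", "south", 80.0),
        ("2007-08-02", "north", 99.5),
    ],
)

# A temporary directory stands in for the mount point of the mobile storage device.
mount_point = tempfile.mkdtemp()
device_db_path = os.path.join(mount_point, "personal_dw.db")
device = build_personal_extract(central, device_db_path, "north")
row_count = device.execute("SELECT COUNT(*) FROM personal_sales").fetchone()[0]
print(row_count)
```

The full refresh keeps the sketch simple; an incremental sync would be the more likely choice once the personalized data approaches the capacities discussed above.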

.H2 Business intelligence applications
Business intelligence applications will have to be installed in the same virtualized OS environment as the DBMS. The business intelligence applications will host the business intelligence reports, which connect to the DBMS in the same virtualized environment.

.H2 Storage area
The DBMS and business intelligence applications will use this storage area on the mobile storage device for storing data, files, configuration details, registry details, and more.

.H2 Virtualized mobile storage device
The combination of the mobile storage device mentioned above, the virtualized OS, and the storage area provides a virtualized PC environment, transforming the device into a virtualized mobile storage device.

The core of the miniature data warehouse system resides in this virtualized PC environment and it is this virtualized PC environment that primarily makes the miniaturized data warehouse highly mobile and hardware agnostic.

Now, let’s take a look at what we need in the host PC or a high-end mobile phone-like device.

.H2 Host OS
This is the OS that originally resides on the host PC or the high-end mobile phone. The virtualized OS interacts with this host OS to access the hardware resources of the host PC or the high-end mobile phone.

.H2 Processing unit
This unit consists of the CPU, RAM, I/O, and other related hardware needed for processing the requests from the virtualized mobile storage device.

.H2 Display unit
This is the display screen or monitor which is used by the virtualized mobile storage device to display the responses to the requests from the miniaturized data warehouse which is hosted in the virtualized PC environment.

.H2 USB
The mobile storage device and the host PC or high-end mobile phone communicate with each other via a high-speed USB 2.0 port or USB interface.

It should be noted that USB is not strictly needed. Rather, there’s a need for some way of connecting the high-capacity storage to the processing environment. For example, this could well take the form of a Compact Flash card or SD card within, say, a Palm Treo smartphone.
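
Whatever the physical connection, the host ultimately sees the device as mounted storage, so one practical check is whether it has room for the personalized data. Below is a minimal Python sketch of that check using the standard-library `shutil.disk_usage` call. The `has_room_for` function and the sizes are hypothetical, and the system temporary directory is only a placeholder for wherever the host OS actually mounts the device.

```python
import shutil
import tempfile


def has_room_for(mount_point, required_bytes):
    """Return True if the storage mounted at mount_point has enough free space."""
    usage = shutil.disk_usage(mount_point)  # named tuple: total, used, free (in bytes)
    return usage.free >= required_bytes


# Placeholder path: substitute the real mount point of the storage device.
mount_point = tempfile.gettempdir()

fits_nothing = has_room_for(mount_point, 0)      # trivially true: zero bytes needed
fits_huge = has_room_for(mount_point, 10**18)    # about an exabyte, far beyond such devices
print(fits_nothing, fits_huge)
```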

.H1 Benefits of a miniature data warehouse
A miniature data warehouse can bring personalization to the business intelligence and data warehouse reporting space. Personalized, miniaturized data warehouses can be hardware agnostic and hence can be run from any PC at any location (with minimal constraints), removing the dependency on connectivity-related infrastructure.

The miniaturized data warehouse concept can herald the rise of personalized business intelligence tools and personalized miniature databases, creating a separate miniaturized data warehouse ecosystem. This can eventually lead to the creation of a host of analytical hand-held business intelligence gadgets, for use by consumers and enterprises alike.

.H1 Constraints of a miniature data warehouse
While there are no technology constraints with respect to high-capacity mobile storage devices and virtualized OS implementations for these storage devices, there can be some constraints with respect to the compatibility between the virtualized mobile storage devices and the host PC or high-end mobile phones.

.BEGIN_KEEP
There can also be some constraints with respect to using current DBMSs like MySQL (and others) in this virtualized PC environment. Similarly, there can be some constraints with respect to using current business intelligence applications in virtualized PC environments. However, most of the software constraints can be overcome by the respective vendors of the tools and technologies.

.H1 Summary
While there has been much emphasis on making data warehousing systems larger and larger, the same technologies can be used in combination with the mature mobile storage devices and OS virtualization technologies to take the data warehousing systems to the other end of the spectrum — miniaturization of data warehousing systems.

This article gives the concept connectivity between the various components that together form a miniaturized data warehousing system. These miniaturized data warehouses can be used for multiple applications like personalization, analytical hand-held business intelligence gadgets, mobile data access, and PC agnostic business intelligence in a virtualized PC environment. Together, I predict these will bring a paradigm shift in the traditional definitions of data warehousing systems, thus creating a separate miniaturized data warehouse ecosystem that is self-sustained and focused.

.BIO Harikrishna S. Aravapalli is a Senior Technical Architect at Infosys SETLabs and has 13 years of experience in databases, data warehouses, and business intelligence technologies. He worked for Wipro and Accenture prior to Infosys. He may be reached at harikrishna_sa@infosys.com.
.END_KEEP