Information Management: 2012

Thursday, November 1, 2012

Master Data Management (MDM) Landscape

yeah, yeah, reusing the same picture...it is MDM time again!
(and I don't mean Mobile Device Management)

MDM definition

"Master Data Management comprises a set of processes, governance, policies, standards and tools that consistently defines and manages master data which excludes transactional data but may include reference data." Wikipedia.

MDM trends for 2012-2013

Market Consolidation (see mega vendors below)
Tool Convergence (aka: multi-entity, multi-domain...yep called this 4 years ago)
BPM for MDM (more governance automation as expected)
Data Virtualization / MDM as a Service (have been writing about the MDM SOA intersection since 2007)

Data Quality as a Service

Address standardization / cleansing
Cleanse/Dedup/Merge across multiple Salesforce installations in the cloud

Data Profiling as a Service

IMDBs & Big Data (see the two prior posts on in-memory databases)
Forrester trends blog here: http://blogs.forrester.com/rob_karel/12-01-05-mdm_in_2012_what_was_what_will_be_and_what_wont_be
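To make the "Cleanse/Dedup/Merge" trend above a bit more concrete, here is a minimal Python sketch of address standardization followed by a match-and-merge across two sources. Everything in it is an assumption for illustration only: the abbreviation table, the field names, the exact-match key and the "first non-empty value wins" survivorship rule are hypothetical and are not taken from any vendor's product or service.

```python
import re

# Hypothetical abbreviation table; a real service would use postal reference data.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "suite": "ste"}


def standardize_address(raw):
    """Lowercase, strip punctuation, and abbreviate common street words."""
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def match_key(record):
    """Build a simple exact-match key from name and standardized address."""
    name = " ".join(record["name"].lower().split())
    return (name, standardize_address(record["address"]))


def merge_records(records):
    """Group records by match key; keep the first non-empty value per field."""
    golden = {}
    for rec in records:
        merged = golden.setdefault(match_key(rec), {})
        for field, value in rec.items():
            if value and not merged.get(field):
                merged[field] = value
    return list(golden.values())


# Two hypothetical source systems holding the same customer.
org_a = [{"name": "Acme Corp", "address": "12 Main Street, Suite 4", "phone": ""}]
org_b = [{"name": "ACME  Corp", "address": "12 Main St. Ste 4", "phone": "555-0100"}]
masters = merge_records(org_a + org_b)
```

Real MDM hubs generally use fuzzy or probabilistic matching and configurable survivorship rules rather than an exact key, so treat this only as the shape of the problem.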

MDM mega vendors (and their associated product offerings)

IBM

InfoSphere MDM Server (formerly DWL and Trigo)
Initiate Master Data Service (formerly Initiate)

Informatica

MDM (formerly Siperian)

Oracle

Customer, Product, Supplier, and Site Hubs (Oracle and Siebel)
Data Relationship Management (formerly Hyperion)

SAP

NetWeaver MDM

Gartner Magic Quadrant leadership correlates very closely with the mega vendors above, and not much changed from 2011 to 2012. Only Informatica and IBM continue to show as leaders in all three areas below:

2012 MDM (customer data solutions): IBM, Oracle, Informatica
2012 Data Integration: Informatica, IBM, SAP, Oracle, SAS
2012 Data Quality: Informatica, SAS, Trillium, IBM, SAP

2011 MDM (customer data solutions): IBM, Oracle, Informatica
2011 Data Integration: Informatica, IBM, SAP, Oracle, SAS
2011 Data Quality: SAS, Informatica, Trillium, IBM, SAP

According to Forrester, Informatica (Siperian) is the MDM leader, followed by IBM (Initiate Systems), and per the link below, the same two players are also leading on data virtualization.

I have no personal preference for a specific technology (I hold an IBM / Initiate Systems certification and have experience with Siperian and Oracle PIM), but it seems that Informatica is moving towards taking the gold here. (Somewhat) coincidentally, it will be the product of choice for my next CDI POC, which will concentrate on hierarchy management, data stewardship and data governance. Below are a few more Informatica reference links supporting leadership claims:

Nice to see how far Informatica has come from the old ETL days over a decade ago...
John K

Wednesday, October 10, 2012

Starting with SAP HANA

SAP has made it quite easy to get started with SAP HANA. You can get 30-day trial access to a hosted HANA test and evaluation environment here: http://scn.sap.com/docs/DOC-28191

At the same site, there is documentation, starting guides and a plethora of tutorial videos: http://scn.sap.com/community/developer-center/hana

Below are the high-level steps required to get started with a simple example in SAP HANA. First, use a data source and Data Services to pull master and transaction data into SAP HANA. Then create a model, and finally expose the model through Explorer. Ok...seems too simple.

And here is the link for developers "The Road to HANA" with great information to get you going: http://scn.sap.com/docs/DOC-31723

Happy in-memory computing!
John K.

Tuesday, October 9, 2012

IMDBs (In-Memory Databases) gaining popularity

As hardware and memory prices continue to decline, in-memory databases (IMDBs) are gaining traction for real-time analytics, applications, and platforms. There are good reasons for the accelerated adoption of IMDBs for big data, the most obvious being performance: very fast delivery of large data sets, and on-the-fly aggregations of multi-dimensional data delivered quickly to the end user.

From an architecture perspective there are a few key differences, looking at HANA as an example, that power such enhanced performance.

Memory vs disk as primary storage (much faster performance)
Column and row store vs tables (no indices, materialized views, or cubes - aggregations on the fly)
Data compression
Parallelism (both on hardware and processing)
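The column store, compression and "aggregations on the fly" points above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not how SAP HANA or any other IMDB is actually implemented: the sample rows, the dictionary_encode helper and the sum_by_region function are all hypothetical names made up for this example.

```python
from collections import defaultdict

# Row-oriented layout: each record is stored together.
rows = [
    ("EMEA", "widget", 120.0),
    ("APAC", "widget", 80.0),
    ("EMEA", "gadget", 200.0),
    ("EMEA", "widget", 50.0),
]


def dictionary_encode(values):
    """Replace repeated values with small integer codes plus a lookup list."""
    dictionary, codes, index = [], [], {}
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


# Column-oriented layout: one array per attribute, with the repetitive
# region column compressed via dictionary encoding.
region_dict, region_codes = dictionary_encode([r[0] for r in rows])
amounts = [r[2] for r in rows]


def sum_by_region():
    """Aggregate on the fly by scanning only the two columns involved."""
    totals = defaultdict(float)
    for code, amount in zip(region_codes, amounts):
        totals[region_dict[code]] += amount
    return dict(totals)


totals = sum_by_region()
```

The aggregation never touches the product column and needs no pre-built index, cube or materialized view, which is the intuition behind computing aggregates at query time over compressed in-memory columns.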

These changes are redefining the roles of data modelers, database developers and DBAs, which are far more blurred in IMDB environments than in traditional databases.

Use cases for IMDBs include: real-time analytics on operational databases, data warehousing, and predictive analysis on big data, as well as real-time applications and platforms (mobile, cloud).

Drawbacks of IMDBs include: the limited maturity of most product offerings, cost, and the availability of skills / expertise.

Enterprise IMDB Vendors and Products

SAP HANA (to be merged in the future with the Sybase IQ offering) is the leader in the enterprise space. I am involved with an SAP HANA POC/Pilot to be compared against a traditional DW, and I fully expect the IMDB to easily outperform the traditional system. It is noteworthy that SAP HANA comes as an appliance (with a choice of leading HW vendors) and only in configurations certified by SAP.

Oracle TimesTen / Exadata X3

Oracle TimesTen site

IBM solidDB (noteworthy that IBM is selling HW bundled with SAP HANA)

IBM solidDB site

Microsoft (updated per announcement on Oct 24th 2012)

Microsoft HDInsight Server for Windows
HDInsight Server for Windows connects to Microsoft SQL Server, Microsoft Excel, PowerPivot (for high-scale in-memory data exploration), and Power View (data visualization), and works with Microsoft's virtualization platform. Microsoft is the first to deliver Hadoop on virtualized infrastructure. Excel in Office 2013 includes PowerPivot and Power View.
http://www.informationweek.com/software/information-management/microsoft-releases-hadoop-on-windows/240009632

Open source

Project Serengeti is an open-source effort from VMware that is coming right behind Microsoft
MySQL, other smaller players here
Feedback welcome on open-source IMDB enterprise solutions you have had success with

I expect the adoption of IMDBs, as well as product maturity, to continue to accelerate and the vendor battle to heat up as large enterprise clients climb on board. At this time everyone seems to be chasing SAP HANA.

Tuesday, January 31, 2012

Hadoop and MapReduce

Hadoop and MapReduce continue to pick up steam and generate "noise" in periodicals and online publications. As an Information Management Architect, I get asked about the value of Hadoop and its impact on existing information and architecture roadmaps.

As such, here is a short entry to clarify the purpose of Hadoop. I view it as a complementary rather than a competing solution (assuming the majority of the data to be managed is structured): it addresses the need for highly distributed, parallel cloud computing over high volumes of unstructured and structured data where fast performance is required. Hence the success stories and implementations at the Yahoos, Facebooks, and Googles of the world.
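For readers new to the programming model, the MapReduce idea can be sketched in a few lines of plain Python. This is a single-process illustration only and does not use Hadoop or any of its APIs: the map_phase, shuffle and reduce_phase functions and the sample documents are hypothetical names chosen for this example, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


# Each string stands in for one input split.
documents = ["big data big clusters", "data in memory"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can be spread over many machines, which is what makes the model a fit for very large, loosely structured data sets.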

Some of the drawbacks as of the day of this post include the maturity and support of the open-source components (HDFS, HBase, MapReduce, Pig, Hive, Avro, Sqoop, Chukwa, and ZooKeeper), security, backup/recovery, integration with other systems, weak consistency, the cost of specialized skills, and the retraining of existing resources.

Based on the expanding market, many of the enterprise Business Intelligence vendors (Informatica, SAP, etc.) are building plug-ins to ensure support of Hadoop in order to protect and expand their client base.