Thursday, November 1, 2012

Master Data Management (MDM) Landscape

yeah, yeah, reusing the same picture...it is MDM time again!
(and I don't mean Mobile Device Management)

MDM definition 

"Master Data Management comprises a set of processes, governance, policies, standards and tools that consistently defines and manages master data which excludes transactional data but may include reference data." Wikipedia.


MDM trends for 2012-2013
  • Market Consolidation (see mega vendors below)
  • Tool Convergence (aka: multi-entity, multi-domain...yep called this 4 years ago)
  • BPM for MDM (more governance automation as expected)
  • Data Virtualization / MDM as a Service (have been writing about the MDM SOA intersection since 2007)
    • Data Quality as a Service
      • Address standardization / cleansing 
      • Cleanse/Dedup/Merge across multiple Salesforce installations in the cloud
    • Data Profiling as a Service
  • IMDBs & Big Data (see two prior posts on in-memory databases)
  • Forrester trends blog here: http://blogs.forrester.com/rob_karel/12-01-05-mdm_in_2012_what_was_what_will_be_and_what_wont_be 
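To make the "Data Quality as a Service" bullets above a bit more concrete, here is a minimal sketch of what address standardization followed by a cleanse/dedup/merge pass could look like. Everything in it is an assumption for illustration: the abbreviation table, the record layout, the exact-match key, and the "first non-empty value wins" survivorship rule are all hypothetical and not taken from any vendor product.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real cleansing service would rely on
# postal reference data rather than a hand-written dictionary.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "ste": "suite"}


def standardize_address(raw):
    """Lowercase, strip punctuation, and expand a few common abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def match_key(record):
    """Exact-match key on normalized name + standardized address.

    Assumption: no fuzzy or probabilistic matching, which real MDM hubs add.
    """
    name = " ".join(record["name"].lower().split())
    return (name, standardize_address(record["address"]))


def merge(records):
    """Assumed survivorship rule: the first non-empty value per field wins."""
    golden = {}
    for record in records:
        for field, value in record.items():
            if value and not golden.get(field):
                golden[field] = value
    return golden


def dedup(records):
    """Group records by match key and merge each group into one golden record."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [merge(group) for group in groups.values()]


customers = [
    {"name": "Acme Corp", "address": "12 Main St.", "phone": ""},
    {"name": "acme  corp", "address": "12 main street", "phone": "555-0100"},
    {"name": "Globex", "address": "9 Elm Ave", "phone": "555-0199"},
]
golden_records = dedup(customers)
```

The two Acme rows collapse into one golden record that picks up the phone number from the second row, while Globex passes through unchanged. Offering this "as a service" would simply mean putting functions like these behind a shared endpoint instead of re-implementing them in every source system.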


MDM mega vendors  (and their associated product offerings)
  • IBM 
    • InfoSphere MDM Server (formerly DWL and Trigo)
    • Initiate Master Data Service (formerly Initiate)
  • Informatica 
    • MDM (formerly Siperian)
  • Oracle
    • Customer, Product, Supplier, and Site Hubs (Oracle and Siebel)
    • Data Relationship Management (formerly Hyperion)
  • SAP
    • NetWeaver MDM

Gartner Magic Quadrant leadership correlates very closely with the mega vendors above, and not much changed from 2011 to 2012. Only Informatica and IBM continue to appear as leaders in all three areas below:
  • 2012 MDM (customer data solutions): IBM, Oracle, Informatica
  • 2012 Data Integration: Informatica, IBM, SAP, Oracle, SAS
  • 2012 Data Quality: Informatica, SAS, Trillium, IBM, SAP
  • 2011 MDM (customer data solutions): IBM, Oracle, Informatica
  • 2011 Data Integration: Informatica, IBM, SAP, Oracle, SAS
  • 2011 Data Quality: SAS, Informatica, Trillium, IBM, SAP

According to Forrester, Informatica (Siperian) is the MDM leader, followed by IBM (Initiate Systems), and per the link below, the same two players are leading on data virtualization.




Without having a personal preference for a specific technology (I have an IBM / Initiate Systems certification and experience with Siperian and Oracle PIM), it seems that Informatica is moving towards taking the gold here, and (somewhat) coincidentally it will be the product of choice for my next CDI POC, which will concentrate on hierarchy management, data stewardship and data governance. Below are a few more Informatica reference links supporting leadership claims:

Nice to see how far Informatica has come from the old ETL days over a decade ago...
John K

Wednesday, October 10, 2012

Starting with SAP HANA



SAP has made it quite easy to get started with SAP HANA. You can get 30-day trial access to a hosted HANA test and evaluation environment here: http://scn.sap.com/docs/DOC-28191 

At the same site, there is documentation, getting-started guides and a plethora of tutorial videos: http://scn.sap.com/community/developer-center/hana

Below are the high-level steps required to get started with a simple example in SAP HANA. First, use a data source and Data Services to pull master and transaction data into SAP HANA. Then create a model, and finally expose the model through Explorer. Ok...seems too simple.


And here is the link for developers, "The Road to HANA", with great information to get you going: http://scn.sap.com/docs/DOC-31723

Happy in-memory computing!
John K. 

Tuesday, October 9, 2012

IMDBs (In Memory Databases) gaining popularity



As hardware and memory prices continue to decline, in-memory databases (IMDBs) are gaining traction for real-time analytics, applications, and platforms. There are good reasons for the accelerated adoption of IMDBs for big data, the most obvious being performance: very fast delivery of large data sets, and on-the-fly aggregations of multi-dimensional data delivered quickly to the end user.

From an architecture perspective there are a few key differences, looking at HANA as an example, that power such enhanced performance. 
  • Memory vs disk as primary storage (much faster performance)
  • Column and row store vs tables (no indices, materialized views, or cubes - aggregations on the fly)
  • Data compression
  • Parallelism (both on hardware and processing)
These changes are redefining the roles of data modelers, database developers and DBAs, which are far more blurred in IMDB environments than in traditional databases.
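To illustrate two of the points above (column storage with on-the-fly aggregation, and data compression), here is a toy sketch in plain Python. It is not how HANA is implemented; the sample data, the dictionary-encoding scheme, and the function names are assumptions chosen only to show the idea that an aggregate can be computed by scanning one compressed column plus one measure column, with no index, materialized view, or cube prepared in advance.

```python
# Row-oriented input, as it might arrive from an operational system
# (made-up sample data).
rows = [
    ("EMEA", "widget", 120.0),
    ("EMEA", "gadget", 80.0),
    ("APAC", "widget", 200.0),
    ("EMEA", "widget", 50.0),
]

# Column store: one list per attribute instead of one tuple per row.
region, product, amount = (list(column) for column in zip(*rows))


def dictionary_encode(column):
    """Replace repeated values with small integer codes (dictionary compression)."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in column]


def sum_by(codes, dictionary, measure):
    """Aggregate on the fly by scanning only the two columns involved."""
    totals = [0.0] * len(dictionary)
    for code, value in zip(codes, measure):
        totals[code] += value
    return dict(zip(dictionary, totals))


region_dict, region_codes = dictionary_encode(region)
revenue_by_region = sum_by(region_codes, region_dict, amount)
```

Here the query never touches the product column, and the region column is stored as small integers rather than repeated strings. The parallelism bullet would correspond to splitting the column scan across cores, which this single-threaded sketch leaves out.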

Use cases for IMDBs include real-time analytics on operational databases, data warehousing, and predictive analysis on big data, as well as real-time applications and platforms (mobile, cloud).

Drawbacks for IMDBs include the maturity of most product offerings, and the cost and availability of skills / expertise.

Enterprise IMDB Vendors and Products


  • Microsoft (updated per announcement on Oct 24th 2012)
  • Open source
    • Project Serengeti, an open-source effort from VMware, is coming right behind Microsoft
    • MySQL and other smaller players
    • Feedback welcome on open-source enterprise IMDB solutions you have had success with

I expect the adoption of IMDBs, as well as product maturity, to continue to accelerate, and the vendor battle to heat up as large enterprise clients climb on board. At this time everyone seems to be chasing SAP HANA.


Tuesday, January 31, 2012

Hadoop and MapReduce

Hadoop and MapReduce continue to pick up steam and generate "noise" in periodicals and online publications. As an Information Management Architect, I get asked about the value of Hadoop and its impact on existing information and architecture roadmaps.

As such, here is a short entry to clarify the purpose of Hadoop. I view it as a complementary rather than a competing solution (assuming the majority of the data to be managed is structured): it addresses the need for highly distributed, parallel cloud computing over high volumes of unstructured and structured data where fast performance is required. Hence the success stories and implementations at the Yahoos, Facebooks, and Googles of the world.
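For readers who have not seen the MapReduce programming model itself, here is a minimal single-process sketch of the classic word count. It uses no Hadoop APIs at all; the function names and sample documents are made up for illustration. On a real cluster the map and reduce calls would run in parallel on many nodes, and the shuffle step would move data across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single count."""
    return key, sum(values)


documents = ["big data big clusters", "data in memory"]
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Because each map call only sees its own document and each reduce call only sees one key, the work spreads naturally over many machines, which is what makes the model attractive for very large volumes of loosely structured data.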

Some of the drawbacks as of the day of this post include the maturity and support of the open-source components (HDFS, HBase, MapReduce, Pig, Hive, Avro, Sqoop, Chukwa, and ZooKeeper), security, backup/recovery, integration with other systems, weak consistency, the cost of specialized skills, and the retraining of existing staff.

Based on the expanding market, many of the enterprise Business Intelligence vendors (Informatica, SAP, etc.) are building plug-ins to ensure support of Hadoop in order to protect and expand their client base.