As with data warehousing, web stores or any IT platform, an infrastructure for big data has unique requirements. In considering all the components of a big data platform, it is important to remember that the end goal is to easily integrate your big data with your enterprise data to allow you to conduct deep analytics on the combined data set.
Analytics and discovery is an evolving market for big data. Customers are bringing new business demands, and there is a great deal of re-marketing, re-messaging, and re-positioning by companies to make big data today sound different from yesterday's data analytics (smaller, structured data and multi-dimensional spreadsheets).
Along with internal unstructured and semi-structured content, many companies are starting to investigate external web data sources such as social media to provide new insight to the business. Again, these sources can be analyzed on Hadoop clusters, and the results and/or raw data can be piped back into an in-memory appliance or fed to a data warehouse to be combined with structured data.
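As a minimal sketch of this pattern, the snippet below aggregates hashtag mentions from a handful of posts; the posts and the tag counts stand in for what a Hadoop job over social media data would produce before the results are piped back to a warehouse. All names and data here are hypothetical.

```python
import re
from collections import Counter

# Hypothetical social media posts, standing in for raw data on a Hadoop cluster.
posts = [
    "Loving the new release #bigdata #hadoop",
    "Great keynote on #bigdata today",
    "#hadoop cluster upgrade complete",
]

def extract_hashtags(text):
    """Pull hashtags out of a post, lower-cased for aggregation."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", text)]

# Aggregate mentions; in production this step would run as a MapReduce
# or Hive job across the whole cluster rather than in one process.
counts = Counter(tag for post in posts for tag in extract_hashtags(post))

# The aggregated result is what gets fed to the warehouse,
# e.g. as rows of (hashtag, mention_count).
rows = sorted(counts.items(), key=lambda kv: -kv[1])
print(rows)  # [('bigdata', 2), ('hadoop', 2)]
```

The heavy lifting (scanning raw posts) happens on the cluster; only the small, structured aggregate moves into the warehouse for combination with enterprise data.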
Big Data Reference Architecture
Most Big Data projects use variations of a Big Data reference architecture. Understanding the high-level view of this reference architecture provides a good background for understanding Big Data and how it complements existing analytics, BI, databases and systems. This architecture is not a fixed, one-size-fits-all approach. Each component of the architecture has at least several alternatives, each with its own advantages and disadvantages for a particular workload. Companies often start with a subset of the patterns in this architecture and, as they realize value from gaining insight into key business outcomes, expand the breadth of use.
A reference architecture for big data must include a focus on governance and integration with an organization's existing infrastructure.
Big Data Reference Architecture - SAP
Analyze big data at the speed of thought and drive rapid innovation with SAP's market-leading in-memory platform, SAP HANA, which is optimized for both transactional and analytical processing.
Maximize performance with extreme transaction processing (XTP) and gain real-time insight, all delivered with anytime, anywhere access to mission-critical information. SAP database solutions leverage in-memory, cloud, and mobile technologies.
SAP has integrated HANA with Hadoop, enabling customers to move data between Hive or the Hadoop Distributed File System (HDFS) and SAP HANA or SAP Sybase IQ. It has also set up a "big data" partner council, which will work to provide products that make use of HANA and Hadoop; one of the key partners is Cloudera. SAP wants it to be easy to connect to data, whether it lives in SAP software or in software from another vendor.
Big Data Reference Architecture – IBM
IBM is unique in having developed an enterprise-class big data platform that addresses the full spectrum of big data business challenges. The platform blends traditional technologies that are well suited to structured, repeatable tasks with complementary new technologies that add speed and flexibility and are ideal for ad hoc data exploration, discovery and unstructured analysis.
IBM’s integrated big data platform has four core capabilities: Hadoop-based analytics, stream computing, data warehousing, and information integration and governance.
Big Data Reference Architecture – ORACLE
Making the most of big data means quickly analysing a high volume of data generated in many different formats. Oracle offers a range of products for acquiring all your data, including Oracle NoSQL Database and Oracle Database.
A big data platform needs to process massive quantities of data in batch and in parallel—filtering, transforming and sorting it before loading it into an enterprise data warehouse. Oracle offers a choice of products for organising big data, including:
Oracle Big Data Appliance
Oracle Data Integrator
Oracle Big Data Connectors
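Independent of any one vendor's tooling, the filter-transform-sort step described above can be sketched in a few lines. The record layout, validity flag, and field names below are hypothetical; a production job would run this same shape of pipeline in parallel over far larger batches.

```python
# A minimal, single-machine sketch of the batch pattern: filter out bad
# records, transform the rest, then sort before the warehouse load.
raw_records = [
    {"user": "alice", "amount": "42.50", "valid": "Y"},
    {"user": "bob",   "amount": "bad",   "valid": "Y"},
    {"user": "carol", "amount": "17.00", "valid": "N"},
    {"user": "dave",  "amount": "99.99", "valid": "Y"},
]

def parse_amount(rec):
    """Transform: convert the amount field to a float, or None if malformed."""
    try:
        return float(rec["amount"])
    except ValueError:
        return None

# Filter invalid or malformed records; transform the survivors.
cleaned = []
for rec in raw_records:
    amount = parse_amount(rec)
    if rec["valid"] == "Y" and amount is not None:
        cleaned.append({"user": rec["user"], "amount": amount})

# Sort before the bulk load so the warehouse can ingest efficiently.
load_batch = sorted(cleaned, key=lambda r: r["amount"], reverse=True)
print(load_batch)
# [{'user': 'dave', 'amount': 99.99}, {'user': 'alice', 'amount': 42.5}]
```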
Analysing big data within the context of all your other enterprise data can reveal new insights that can have a significant impact on your bottom line. Oracle offers a portfolio of tools for statistical and advanced analysis that complement Oracle Exadata, including:
Oracle Advanced Analytics
Oracle Exadata Database Machine
Oracle Data Warehousing
Oracle Exalytics In-Memory Machine
Big Data Reference Architecture - Informatica + EMC + SAS
EMC has centered its big-data offering on technology it acquired when it bought Greenplum in 2010. It offers a unified analytics platform that handles web, social, document, mobile, machine and multimedia data using Hadoop's MapReduce and HDFS, while ERP, CRM and POS data goes into SQL stores. Data mining, neural networks and statistical analysis are carried out on data from both sets, and the results are fed into dashboards.
SAS can help you manage the entire information continuum with unified technology solutions and strategy and implementation services that span data, analytics and decision management. SAS Information Management enables organizations to fully exploit and govern their information assets to provide competitive differentiation and sustained success.
The SAS high-performance analytics infrastructure forms the backbone of your ongoing analytic endeavors – no matter how big your big data demands get, nor how complex your analysis needs become. Several distributed processing options – in-memory, in-database and grid computing – let you take advantage of the latest technology advancements.
Big Data Reference Architecture - Open Source Technologies
Hadoop is considered one of the best frameworks for storing structured, semi-structured and unstructured data. Built as an open source software framework for data-intensive distributed applications, Apache Hadoop uses a series of nodes to store data; its creator, Doug Cutting, paired a MapReduce facility with a distributed file system to meet multi-processing requirements. The technology has become so popular that it is regarded as one of the leading open source projects.
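The MapReduce idea at the heart of Hadoop can be simulated on a single machine. The sketch below walks through the map, shuffle and reduce phases over two lines of text; on a real cluster each phase runs in parallel across many nodes over HDFS, but the logic is the same.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in an input line."""
    for word in line.lower().split():
        yield (word, 1)

def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data needs big storage", "hadoop stores big data"]

# Shuffle: group mapped pairs by key, as the framework does between phases.
grouped = defaultdict(list)
for line in lines:
    for word, count in map_phase(line):
        grouped[word].append(count)

word_counts = reduce_phase(grouped)
print(word_counts)
# {'big': 3, 'data': 2, 'needs': 1, 'storage': 1, 'hadoop': 1, 'stores': 1}
```

Word count is the canonical MapReduce example; the same map/shuffle/reduce shape underlies far more elaborate distributed jobs.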
R combines an open source programming language with a software environment to provide solutions for statistical computing and visualization. R is considered one of the leaders in statistical analysis, and reports have emerged that R now attracts commercial services and support on a model similar to Red Hat's support for Linux.
Apache HBase is one of a handful of open source technologies that support NoSQL data stores. It runs on the Hadoop Distributed File System (HDFS) and is designed as a non-relational, columnar distributed database. In 2010 Facebook adopted HBase because it provides fault-tolerant storage and access to large quantities of sparse data. Apache HBase is available under the Apache License 2.0.
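HBase's sparse, columnar data model can be illustrated without a running cluster. The sketch below mimics a table as nested dictionaries, where each row maps `family:qualifier` columns to values and absent columns are simply not stored, which is what makes sparse data cheap. The table, row keys and `info` column family are hypothetical.

```python
# Row key -> {b"family:qualifier": value}; a missing column costs nothing.
users_table = {
    b"row-001": {b"info:name": b"alice", b"info:email": b"alice@example.com"},
    b"row-002": {b"info:name": b"bob"},  # no email column stored at all
}

def get_cell(table, row_key, column):
    """Fetch a single cell, or None when the column is absent for that row."""
    return table.get(row_key, {}).get(column)

print(get_cell(users_table, b"row-001", b"info:email"))  # b'alice@example.com'
print(get_cell(users_table, b"row-002", b"info:email"))  # None: sparse row
```

A real HBase client (e.g. the Java API or a Thrift-based library) exposes the same row-key/column-family access pattern against a distributed, fault-tolerant store.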
Thanks for your time.