HBase is an open source, distributed, versioned, column-oriented NoSQL (non-relational) database management system that runs on top of Hadoop. It adds random, real-time read/write access to Hadoop, allowing users to update individual data records; atomicity is guaranteed per row rather than across multi-row transactions. Hadoop on its own is designed for batch processing of large datasets, but with HBase on top of Hadoop we can read and write data in real time.
HBase allows many attributes to be grouped together into column families, such that the elements of a column family are all stored together. This is different from a row-oriented relational database, where all the columns of a given row are stored together. HBase follows the same master/slave pattern as the rest of Hadoop: HDFS has a NameNode and slave nodes (DataNodes), and MapReduce has a JobTracker and TaskTracker slaves.
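To make the difference between the two layouts concrete, here is a minimal sketch in plain Python. It is not HBase code: the sample rows, the `family:qualifier` column names, and both function names are assumptions made up for illustration, and real HBase keeps each column family in its own files on disk rather than in Python lists.

```python
from collections import defaultdict

# Hypothetical sample data: row key -> {"family:qualifier": value}
rows = {
    "user1": {"info:name": "Ada", "info:email": "ada@example.com", "stats:logins": 12},
    "user2": {"info:name": "Bob", "stats:logins": 3},
}


def row_oriented_layout(rows):
    """Row-oriented storage: one record per row, all of its columns together."""
    return [(key, dict(cols)) for key, cols in sorted(rows.items())]


def column_family_layout(rows):
    """Column-family storage: one store per family, all of its cells together."""
    stores = defaultdict(list)
    for key, cols in sorted(rows.items()):
        for column, value in sorted(cols.items()):
            family, qualifier = column.split(":", 1)
            stores[family].append((key, qualifier, value))
    return dict(stores)


if __name__ == "__main__":
    for record in row_oriented_layout(rows):
        print("row store:", record)
    for family, cells in column_family_layout(rows).items():
        print("family store:", family, cells)
```

In the column-family layout, a query that only needs `stats` never has to touch the `info` store, which is the practical benefit of grouping attributes that are read together.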
In HBase, a master node manages the cluster, and region servers store portions of the tables and perform the work on the data. An HBase system comprises a set of tables. Each table contains rows and columns, much like a traditional database. Each table must have an element defined as a primary key, which HBase calls the row key, and all access to HBase tables must go through this key. An HBase column represents an attribute of an object.
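The access pattern described above, where every read and write goes through the table's key and each region server holds one portion of the table, can be sketched with a small in-memory model. This is a toy under stated assumptions, not the HBase client API: the `ToyTable` class, its methods, and the split keys are hypothetical, a simple counter stands in for cell timestamps, and regions are assumed to be contiguous key ranges whose start key is inclusive.

```python
import bisect
import itertools


class ToyTable:
    """In-memory stand-in for an HBase table (hypothetical, not the real API)."""

    def __init__(self, split_keys):
        # Sorted split keys divide the key space into len(split_keys) + 1 regions.
        self.split_keys = sorted(split_keys)
        # (row_key, column) -> list of (version, value), oldest first.
        self.cells = {}
        # Monotonic counter used in place of real timestamps.
        self._clock = itertools.count(1)

    def region_for(self, row_key):
        """Return the index of the region whose key range contains row_key."""
        return bisect.bisect_right(self.split_keys, row_key)

    def put(self, row_key, column, value):
        """Store a new version of a cell and return its version number."""
        version = next(self._clock)
        self.cells.setdefault((row_key, column), []).append((version, value))
        return version

    def get(self, row_key, column, version=None):
        """Return the newest value, or the newest one at or before `version`."""
        history = self.cells.get((row_key, column), [])
        for ts, value in reversed(history):
            if version is None or ts <= version:
                return value
        return None


if __name__ == "__main__":
    table = ToyTable(split_keys=["m"])
    first = table.put("apple", "info:color", "red")
    table.put("apple", "info:color", "green")
    print(table.get("apple", "info:color"))
    print(table.get("apple", "info:color", version=first))
    print(table.region_for("apple"), table.region_for("zebra"))
```

Because the key alone determines which region holds a row, a lookup can be routed to a single server without consulting the others, and older versions of a cell remain readable after an update.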