Apache Lens is a unified analytics platform. Lens aims to break down data analytics silos by providing a single view of data across multiple tiered data stores and an optimal execution environment for analytical queries. It is an OLAP virtualization platform that provides multi-dimensional queries in a unified way over datasets stored in multiple data stores. A single Cube in Lens can span multiple query engines, such as Apache Hive, Spark, Elasticsearch, and traditional data warehouses.

This allows users to make cost-performance trade-offs for different kinds of workloads on batch or real-time data. Real-time data can be fully materialized in a fast, mutable store such as Elasticsearch, while large, older data that is accessed less frequently can be stored in a batch store such as Hive, in a star or snowflake schema. Lens lets users stay agnostic of the underlying physical tables and work with the data cube through a SQL-like query language. It provides query life-cycle management and query performance statistics across multiple engines. It also comes with a user interface for browsing the schema and for running and tracking queries.
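
To make the tiering idea concrete, here is a minimal Python sketch of how one query's time range could be divided between a real-time store and a batch store. It is a hypothetical illustration, not Lens's implementation: the function `split_by_tier`, the fixed `realtime_days` window, and the use of half-open date ranges are all assumptions.

```python
from datetime import date, timedelta


def split_by_tier(start, end, today, realtime_days):
    """Assign sub-ranges of the half-open interval [start, end) to storage tiers.

    Hypothetical policy (an assumption, not Lens's): the last `realtime_days`
    days before `today` live in a fast real-time store such as Elasticsearch,
    and everything older lives in a batch store such as Hive.
    """
    boundary = today - timedelta(days=realtime_days)
    parts = []
    if start < boundary:
        # The older portion of the range is answered from the batch store.
        parts.append(("batch", start, min(end, boundary)))
    if end > boundary:
        # The recent portion is answered from the real-time store.
        parts.append(("realtime", max(start, boundary), end))
    return parts


# A ten-day query whose last three days fall inside the real-time window.
print(split_by_tier(date(2024, 1, 1), date(2024, 1, 11), today=date(2024, 1, 11), realtime_days=3))
```

Here the first seven days are routed to the batch store and the last three to the real-time store, so the user sees a single answer while each store serves only the data it is best suited for.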


Advantages of Apache Lens

Like traditional OLAP systems, Lens understands the primitives of facts and dimensions. Facts are immutable time-series data of actual events and comprise dim-attributes and measures. Dim-attributes are the domain attributes by which data is analysed, while measures are the quantities that are aggregated. Lens also separates the query interface for facts and dimensions from the actual storage. Queries are issued against the Lens ‘Cube’ and ‘Dimension’ objects, while the actual data is stored in abstractions called ‘fact tables’ and ‘dim tables’. Several ‘fact tables’ can be associated with a Cube, and several ‘dim tables’ can be associated with a Dimension. When a user queries a cube, Lens takes care of picking the right fact table to answer the query.
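
The fact-table choice can be pictured as a covering-and-ranking problem. The minimal Python sketch below assumes that each fact table declares the dim-attributes and measures it holds, plus a numeric priority for its store; it keeps only the tables that cover every requested column and returns the most preferred one. `FactTable`, `pick_fact_table`, the table and column names, and the priority scheme are hypothetical, and Lens's actual resolution logic may weigh other factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactTable:
    name: str
    columns: frozenset  # dim-attributes and measures this table can serve
    priority: int       # hypothetical: lower value = faster, preferred store


def pick_fact_table(fact_tables, requested):
    """Return the preferred fact table that covers every requested column."""
    candidates = [f for f in fact_tables if requested <= f.columns]
    if not candidates:
        raise LookupError(f"no fact table can answer {sorted(requested)}")
    # Among the tables that can answer the query, prefer the highest-priority store.
    return min(candidates, key=lambda f: f.priority)


facts = [
    FactTable("hive_daily_fact", frozenset({"country", "device", "clicks", "impressions"}), priority=2),
    FactTable("infobright_summary", frozenset({"country", "clicks"}), priority=1),
]
print(pick_fact_table(facts, {"country", "clicks"}).name)      # prints infobright_summary
print(pick_fact_table(facts, {"device", "impressions"}).name)  # prints hive_daily_fact
```

Ranking by priority lets a fast store win over Hive whenever both can answer a query, while Hive remains the fallback for columns that only it holds.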

Apart from the features mentioned above, Lens has the following advantages as a standalone platform:

  • Lens is a natural fit for the Hadoop ecosystem thanks to its tight integration with Hive.


  • Beyond Hive, Lens can use other storage platforms underneath, as long as they are JDBC compliant. This means support for most commonly used RDBMSs as well as columnar stores such as Redshift and Infobright. At InMobi, we use Infobright to store data that needs very fast response times, and Lens automatically prioritizes that store over Hive.


  • Lens has a metastore that makes it easy to explore and discover all the information it manages. Through a REST interface or the Lens shell, users can easily access the metadata of all cubes and dimensions and of the underlying fact and dimension tables.


  • Lens also supports query life-cycle management. Because Lens serves most requests by issuing queries to its underlying stores and aggregating the results as required, there can be a considerable delay between when a request is made and when its results are available. Lens helps manage this by accepting and tracking queries and by persisting and formatting their results. Lens also provides a mechanism to share results via email.


  • The Lens DSL also supports machine learning over its cubes.
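
The query life-cycle management described in the list above can be sketched as a small state machine. The Python example below is a hypothetical, in-memory illustration: the `QueryTracker` class, its methods, and the simplified set of states are assumptions and do not mirror Lens's actual API or status model. It shows a query being accepted with an opaque handle, moved through allowed states, and having its result kept in a formatted form for later retrieval.

```python
import enum
import uuid


class QueryState(enum.Enum):
    # Simplified, hypothetical states; Lens's own status model may differ.
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESSFUL = "successful"
    FAILED = "failed"
    CANCELED = "canceled"


# Allowed transitions; terminal states have no outgoing edges.
TRANSITIONS = {
    QueryState.QUEUED: {QueryState.RUNNING, QueryState.CANCELED},
    QueryState.RUNNING: {QueryState.SUCCESSFUL, QueryState.FAILED, QueryState.CANCELED},
}


class QueryTracker:
    """Accepts queries, tracks their state, and keeps formatted results in memory."""

    def __init__(self):
        self._queries = {}
        self._states = {}
        self._results = {}

    def submit(self, query):
        # Return an opaque handle immediately; the caller polls with it later.
        handle = str(uuid.uuid4())
        self._queries[handle] = query
        self._states[handle] = QueryState.QUEUED
        return handle

    def transition(self, handle, new_state):
        current = self._states[handle]
        if new_state not in TRANSITIONS.get(current, set()):
            raise ValueError(f"illegal transition {current.name} -> {new_state.name}")
        self._states[handle] = new_state

    def complete(self, handle, rows):
        self.transition(handle, QueryState.SUCCESSFUL)
        # Keep the rows as CSV-like text so the result can be fetched later.
        self._results[handle] = "\n".join(",".join(map(str, row)) for row in rows)

    def status(self, handle):
        return self._states[handle]

    def result(self, handle):
        return self._results[handle]


tracker = QueryTracker()
handle = tracker.submit("placeholder query text")
tracker.transition(handle, QueryState.RUNNING)
tracker.complete(handle, [("IN", 42), ("US", 7)])
print(tracker.status(handle).name)  # prints SUCCESSFUL
print(tracker.result(handle))
```

Returning a handle instead of blocking lets a client submit a long-running query, poll its status, and fetch the formatted result once it completes.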


