The Technologies We Love

The Technologies We Used to Build Smart Data


Apache Hadoop

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. The Hadoop 2 environment provides scalable services including HDFS, YARN, ZooKeeper, HBase, Pig, Sqoop and Apache Drill.

Hadoop powers Smart Data: its components provide scalability and standards support for large deployments, at limited software cost. Smart Data takes advantage of the following Hadoop components:

  • ZooKeeper: enables highly reliable distributed coordination between nodes
  • YARN/MapReduce: enables multi-process execution, plus scheduling and monitoring of processes
  • Solr/Cloud: the leading indexing engine, used to index documents and to search the indexed documents
  • HDFS: Vanilla Hub can store data (text files, XML documents) and documents on HDFS storage
  • HBase: data can be stored in and retrieved from the HBase database
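To illustrate the MapReduce model that YARN schedules, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to a word count. It does not use the Hadoop API: the function names are hypothetical, and in a real Hadoop job the map and reduce steps run as distributed tasks over HDFS blocks, with the framework performing the shuffle between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped values for each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases in one process (hypothetical helper)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Spark and Hadoop process data"]))
```

Because each mapper only needs its own input split and each reducer only needs the values for its keys, both phases can be spread across many machines; only the shuffle moves data between nodes.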


Apache Spark

Spark is a fast processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing and newer workloads such as streaming, interactive queries, and machine learning.

Spark is the de facto standard for in-memory computation and is paving the way for modern data analysis. Both Vanilla Hub and Vanilla Air can take advantage of Spark nodes inside a Data Science architecture, reading data held in Spark and processing it with Spark's in-memory analysis.
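Two ideas behind Spark's in-memory model are lazy transformations (nothing is computed until an action asks for a result) and caching (an intermediate result is kept in memory so later actions reuse it instead of recomputing it). The toy class below imitates that behaviour in plain Python; it is not the Spark API, and the class name `LazyDataset` and its `evaluations` counter are hypothetical, chosen only to make the effect of `cache()` visible.

```python
from typing import Callable, List, Optional


class LazyDataset:
    """Toy model of Spark-style lazy transformations plus an in-memory cache.

    NOT the Spark API: method names only mirror map/filter/cache/collect/count.
    """

    def __init__(self, compute: Callable[[], List]) -> None:
        self._compute = compute
        self._cache_enabled = False
        self._cached: Optional[List] = None
        self.evaluations = 0  # how many times this dataset was actually computed

    # Transformations: return a new dataset and compute nothing yet.
    def map(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(lambda: [fn(x) for x in self._materialize()])

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(lambda: [x for x in self._materialize() if pred(x)])

    def cache(self) -> "LazyDataset":
        """Keep the result in memory after its first evaluation."""
        self._cache_enabled = True
        return self

    def _materialize(self) -> List:
        if self._cached is not None:
            return self._cached  # served from memory, no recomputation
        self.evaluations += 1
        result = self._compute()
        if self._cache_enabled:
            self._cached = result
        return result

    # Actions: trigger evaluation of the whole chain of transformations.
    def collect(self) -> List:
        return list(self._materialize())

    def count(self) -> int:
        return len(self._materialize())


if __name__ == "__main__":
    source = LazyDataset(lambda: list(range(10)))
    evens = source.filter(lambda x: x % 2 == 0).cache()
    print(evens.map(lambda x: x * x).collect())
    print(evens.count(), "evens, computed", evens.evaluations, "time(s)")
```

Without the `cache()` call, every action on `evens` would re-run the filter over the source, which is the recomputation that in-memory caching avoids.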


Apache Software Foundation

The mission of the Apache Software Foundation (ASF) is to provide software for the public good. We do this by providing services and support for many like-minded software project communities of individuals who choose to join the ASF.

Apache is the heart of Open Source and the foundation of Hadoop. In addition to Hadoop, Smart Data integrates these Apache modules:

  • Nutch: Apache Nutch is a highly extensible and scalable open source web crawler. Nutch is available in Vanilla Hub to provide crawling.

  • Drill: Apache Drill enables analysts, business users, data scientists and developers to explore and analyze data in Hadoop and NoSQL datastores without sacrificing the flexibility and agility those datastores offer. Drill processes the data in situ, without requiring users to define schemas or transform data.

  • Phoenix: Apache Phoenix is a relational database layer over HBase, delivered as a client-embedded JDBC driver targeting low-latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets.

  • NiFi: Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.

  • Flink: Apache Flink is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.

  • Tika: The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF) … It is used together with Solr/Cloud to extract information from documents.
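The Phoenix entry above says Phoenix compiles SQL into HBase scans. The reason a key-range predicate is cheap in HBase is that rows are stored sorted by row key, so a scan can start at one key and stop before another without touching the rest of the table. The sketch below models only that idea in plain Python: `SortedTable` and `select_where_pk_between` are hypothetical names, and the real Phoenix planner does considerably more (for example, running scans in parallel).

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple

Row = Dict[str, str]


class SortedTable:
    """Toy stand-in for an HBase table: rows kept sorted by row key."""

    def __init__(self, rows: Dict[str, Row]) -> None:
        self._rows = dict(rows)
        self._keys = sorted(self._rows)

    def scan(self, start_row: str = "", stop_row: Optional[str] = None) -> Iterator[Tuple[str, Row]]:
        """Yield rows with start_row <= key < stop_row (stop row is exclusive, as in HBase)."""
        i = bisect.bisect_left(self._keys, start_row)  # jump straight to the start row
        while i < len(self._keys):
            key = self._keys[i]
            if stop_row is not None and key >= stop_row:
                break  # keys are sorted, so nothing further can match
            yield key, self._rows[key]
            i += 1


def select_where_pk_between(table: SortedTable, low: str, high: str) -> List[Tuple[str, Row]]:
    """Hypothetical plan for: SELECT * FROM t WHERE pk >= low AND pk < high.

    The predicate on the row key becomes one bounded range scan, not a full-table scan.
    """
    return list(table.scan(start_row=low, stop_row=high))


if __name__ == "__main__":
    customers = SortedTable({
        "cust-003": {"name": "Carol"},
        "cust-001": {"name": "Alice"},
        "cust-002": {"name": "Bob"},
        "cust-010": {"name": "Dan"},
    })
    for key, row in select_where_pk_between(customers, "cust-001", "cust-003"):
        print(key, row["name"])
```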



Apache Solr

Solr is an open source enterprise search platform, written in Java, from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features and rich document handling. Providing distributed search and index replication, Solr is designed for scalability and fault tolerance.
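Full-text search and faceted search both rest on an inverted index, which maps each term to the documents containing it. The toy in-memory version below is not Solr's or Lucene's implementation (no scoring, stemming, or persistence): it answers AND queries by intersecting posting sets and counts facet values over the matching documents. The class name `TinyIndex` and the document fields `text` and `type` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Set


class TinyIndex:
    """Toy inverted index with AND queries and facet counts."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # term -> document ids
        self.docs: Dict[int, Dict[str, str]] = {}

    def add(self, doc_id: int, doc: Dict[str, str]) -> None:
        """Store the document and index every lower-cased word of its text field."""
        self.docs[doc_id] = doc
        for term in doc["text"].lower().split():
            self.postings[term].add(doc_id)

    def search(self, query: str) -> Set[int]:
        """Return the ids of documents that contain every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))  # copy, never mutate postings
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    def facet(self, hits: Set[int], field: str) -> Counter:
        """Count the values of `field` among the matching documents."""
        return Counter(self.docs[i][field] for i in hits if field in self.docs[i])


if __name__ == "__main__":
    idx = TinyIndex()
    idx.add(1, {"text": "Hadoop cluster guide", "type": "pdf"})
    idx.add(2, {"text": "Spark on Hadoop", "type": "doc"})
    idx.add(3, {"text": "Hadoop cluster sizing", "type": "pdf"})
    hits = idx.search("hadoop")
    print(sorted(hits), dict(idx.facet(hits, "type")))
```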

Smart Data takes advantage of the Solr/Cloud indexing server, and can be deployed together with the leading search frameworks, Elasticsearch and Lucidworks.

  • Elastic, the company behind Elasticsearch, provides a growing platform of open source projects and commercial products designed to search, analyze, and visualize your data, allowing you to get actionable insight in real time. Its products are architected to work together seamlessly as a standalone solution or to integrate easily into your existing infrastructure.

  • Lucidworks Fusion is an advanced search and data analysis platform. It provides the enterprise-grade capabilities needed to design, develop, and deploy powerful search apps at any scale. Use it for enterprise search, e-commerce, real-time data analytics, and practically anything else that requires very fast data retrieval.

R Project

R is a free software environment for statistical computing and graphics. R and its libraries implement a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others. R is easily extensible through functions and extensions, and the R community is noted for its active contributions in terms of packages.
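As a concrete example of the linear modeling mentioned above, R fits a straight line with `lm(y ~ x)`. For a single predictor, the ordinary least squares estimates are slope = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and intercept = ȳ − slope · x̄. The sketch below computes exactly those two formulas in plain Python to show what such a fit produces; it is a hand-written illustration, not R and not how Vanilla Air invokes R, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_simple_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(intercept: float, slope: float, x_new: float) -> float:
    """Forecast a value for a new x using the fitted line."""
    return intercept + slope * x_new


if __name__ == "__main__":
    intercept, slope = fit_simple_linear([0, 1, 2, 3], [1, 3, 5, 7])
    print(intercept, slope, predict(intercept, slope, 4))
```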

R is the heart of Vanilla Air: it provides the engine that runs standard and custom R programs in a clustered environment. With Vanilla Air, you can build complex predictive and forecasting models to take control of your data strategy. R provides powerful visualizations such as 3D plots, maps … whatever you need, R can run it for you!

Vanilla BI

Vanilla is the only true open source BI platform. Vanilla provides all the components needed to handle business data processing, from the source applications through to the final reports.

Vanilla BI is the de facto open source BI standard, providing services such as:

  • Report: crosstab reports, aggregate reports, complex reports.
  • Dashboard: multi-folder dashboards, cube and report integration, map integration.
  • OLAP Cube: multidimensional visualization, with support for view and dashboard creation.
  • Maps: map visualization, with support for KPI display and heatmap visualization.
  • KPI: a consistent KPI platform, with axis and measure support and map visualization.
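Crosstab reports and OLAP cubes both aggregate a measure over the combinations of two or more dimensions. The sketch below shows that pivot on flat fact rows in plain Python; it is not Vanilla BI's engine, and the function names and the `region`, `year` and `sales` fields are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping

Table = Dict[str, Dict[str, float]]


def crosstab(facts: Iterable[Mapping], row_dim: str, col_dim: str, measure: str) -> Table:
    """Pivot fact rows into {row_value: {col_value: total}}, summing `measure`."""
    cells: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for fact in facts:
        cells[fact[row_dim]][fact[col_dim]] += fact[measure]
    # Convert to plain dicts so the result is easy to print or serialize.
    return {row: dict(cols) for row, cols in cells.items()}


def render(table: Table) -> str:
    """Render the crosstab as tab-separated text, filling empty cells with 0."""
    columns = sorted({col for cols in table.values() for col in cols})
    lines = ["\t".join([""] + columns)]
    for row in sorted(table):
        lines.append("\t".join([row] + [f"{table[row].get(col, 0):g}" for col in columns]))
    return "\n".join(lines)


if __name__ == "__main__":
    facts = [
        {"region": "North", "year": "2016", "sales": 100},
        {"region": "North", "year": "2016", "sales": 50},
        {"region": "South", "year": "2017", "sales": 70},
    ]
    print(render(crosstab(facts, "region", "year", "sales")))
```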

Vanilla ETL

The Vanilla ETL platform provides an integrated suite of components to help enterprises extract value from their data. It addresses some of the key challenges in the ETL value chain and processes.

For more complex transformations, such as standardizing column values against a Master Data repository, Vanilla ETL provides reliable components for writing them. Vanilla ETL provides the following components:

  • Vanilla ETL Management
  • Data Workflow Management
  • Master Data Management
  • Scheduler
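Standardizing a column against a Master Data repository usually means mapping every known variant of a value to one canonical code and setting aside values the repository does not recognize. The sketch below shows that pattern in plain Python; it is not a Vanilla ETL component, and the `MASTER_COUNTRIES` table, the `country` column and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Mapping, Set, Tuple

# Hypothetical master data: canonical code -> spellings seen in source systems.
MASTER_COUNTRIES: Dict[str, Set[str]] = {
    "FR": {"france", "fra"},
    "DE": {"germany", "deutschland", "deu"},
}


def build_lookup(master: Mapping[str, Set[str]]) -> Dict[str, str]:
    """Map every normalized variant (and the code itself) to its canonical code."""
    lookup: Dict[str, str] = {}
    for code, variants in master.items():
        for variant in set(variants) | {code}:
            lookup[variant.strip().lower()] = code
    return lookup


def standardize(rows: Iterable[dict], column: str,
                lookup: Mapping[str, str]) -> Tuple[List[dict], List[dict]]:
    """Return (rows with the column replaced by its canonical code, rejected rows)."""
    clean: List[dict] = []
    rejects: List[dict] = []
    for row in rows:
        key = str(row.get(column, "")).strip().lower()
        code = lookup.get(key)
        if code is None:
            rejects.append(row)  # unknown value: keep it aside for review
        else:
            clean.append({**row, column: code})  # copy, leave the input row untouched
    return clean, rejects


if __name__ == "__main__":
    lookup = build_lookup(MASTER_COUNTRIES)
    rows = [{"id": 1, "country": " France "}, {"id": 2, "country": "DEU"},
            {"id": 3, "country": "Atlantis"}]
    print(standardize(rows, "country", lookup))
```

Routing unmatched values to a separate output, rather than guessing, keeps the master data as the single authority for what a valid value is.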

Vanilla KPI

Vanilla KPI provides the infrastructure and services to deploy BSC (Balanced Scorecard) and KPI applications.

Vanilla KPI comes with a set of full-featured modules that handle the development and deployment of KPI applications, using data stored in various databases, including relational databases and NoSQL databases such as HBase.

Vanilla KPI has three major components:

  • Designer: web interface to design applications, axes, KPIs … stored in a KPI dictionary.
  • Loader: web interface to manage KPI values, using the KPI dictionary.
  • User: web interface to deploy KPIs, with support for various graph and map displays.

KPI values can be loaded through:

  • a Vanilla Hub process
  • a Vanilla ETL transformation
  • the Vanilla KPI Loader interface
  • calculation inside Vanilla Air
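Because the Loader manages values through the KPI dictionary, a natural check on any incoming value, whichever channel it comes from, is that it names a known KPI and places itself on valid members of that KPI's axes. The sketch below illustrates that validation step in plain Python under a hypothetical dictionary layout (`KpiDefinition`, `KpiDictionary`, `load_values` are illustrative names, not Vanilla KPI's actual data model or API).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, Dict[str, str], float]  # (kpi name, axis coordinates, value)


@dataclass(frozen=True)
class KpiDefinition:
    name: str
    axes: Tuple[str, ...]  # axes the KPI is broken down by, e.g. ("region",)


@dataclass
class KpiDictionary:
    kpis: Dict[str, KpiDefinition]
    axis_members: Dict[str, Set[str]]  # axis -> allowed members

    def validate(self, kpi: str, coordinates: Dict[str, str]) -> List[str]:
        """Return a list of problems; an empty list means the value is acceptable."""
        definition = self.kpis.get(kpi)
        if definition is None:
            return [f"unknown KPI {kpi!r}"]
        errors: List[str] = []
        if set(coordinates) != set(definition.axes):
            errors.append(f"{kpi}: expected axes {sorted(definition.axes)}, got {sorted(coordinates)}")
        for axis, member in coordinates.items():
            if member not in self.axis_members.get(axis, set()):
                errors.append(f"{kpi}: unknown member {member!r} on axis {axis!r}")
        return errors


def load_values(dictionary: KpiDictionary, records: Iterable[Record]):
    """Split incoming records into accepted ones and rejected ones (with reasons)."""
    accepted: List[Record] = []
    rejected: List[Tuple[str, Dict[str, str], float, List[str]]] = []
    for kpi, coordinates, value in records:
        errors = dictionary.validate(kpi, coordinates)
        if errors:
            rejected.append((kpi, coordinates, value, errors))
        else:
            accepted.append((kpi, coordinates, value))
    return accepted, rejected


if __name__ == "__main__":
    kpi_dict = KpiDictionary(
        kpis={"revenue": KpiDefinition("revenue", ("region",))},
        axis_members={"region": {"North", "South"}},
    )
    print(load_values(kpi_dict, [("revenue", {"region": "North"}, 120.0),
                                 ("revenue", {"region": "West"}, 5.0)]))
```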