Global Architecture
Global Schema
Network Architecture
Available Technologies ("Capsules")
Extraction & Processing
- Sqoop
- Java, Scala, Kotlin
- Apache Spark (versions 1.5, 1.6, 2.0, 2.1)
- R
- Python (branches 2.x and 3.x)
- Docker
- Notebooks:
  - Jupyter: Python
  - Jupyter: Python with PySpark 2.1
  - Jupyter: R
  - Jupyter: Julia
  - Jupyter: Haskell
  - Spark Notebook: 1.5
  - Spark Notebook: 1.6
Datalake
- HDFS: 2.6
- Hive
- Impala: 2.5
- Drill
- Kafka: 0.10
Datamart
- MongoDB
- MySQL
Dataviz
- Docker
Resource Management
Zoom on the hardware architecture
How jobs impact the available servers
Schema Full Saagie
Schema Saagie on Top of a Datalake
Rules
Node types | Resident Services | Scheduled or Streaming Jobs | Comments |
---|---|---|---|
Data Node | HDFS, YARN/MapReduce (also Hive), Impala, Drill | Docker, Spark, R, Python, Sqoop, Talend, Java-Scala, Datascience Notebook (depending on your settings) | |
Datamart | MongoDB, MySQL, PostgreSQL (1.5), Elasticsearch (1.5) | | |
Dataviz | Docker, Datascience Notebook (depending on your settings) | | |
Kafka Node | Kafka | | |
Compute Edge Node | Datascience Notebook (depending on your settings) | | |
GPU Edge Node | | | |
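
The rules table above can also be read as a simple lookup from node type to the services and jobs it hosts. The following is only an illustrative sketch: the dictionary is a hand transcription of the table, and the names `PLACEMENT_RULES` and `nodes_hosting` are hypothetical and not part of any Saagie API.

```python
# Hand transcription of the rules table; illustrative only, not a Saagie API.
# "resident" = Resident Services column, "jobs" = Scheduled or Streaming Jobs column.
PLACEMENT_RULES = {
    "Data Node": {
        "resident": ["HDFS", "YARN/MapReduce", "Hive", "Impala", "Drill"],
        "jobs": ["Docker", "Spark", "R", "Python", "Sqoop", "Talend",
                 "Java-Scala", "Datascience Notebook"],
    },
    "Datamart": {
        "resident": ["MongoDB", "MySQL", "PostgreSQL", "Elasticsearch"],
        "jobs": [],
    },
    "Dataviz": {"resident": ["Docker", "Datascience Notebook"], "jobs": []},
    "Kafka Node": {"resident": ["Kafka"], "jobs": []},
    "Compute Edge Node": {"resident": ["Datascience Notebook"], "jobs": []},
    # The GPU Edge Node row is empty in the table, so nothing is listed here.
    "GPU Edge Node": {"resident": [], "jobs": []},
}


def nodes_hosting(technology):
    """Return (node type, role) pairs for every node type that lists the technology."""
    matches = []
    for node_type, roles in PLACEMENT_RULES.items():
        for role in ("resident", "jobs"):
            if technology in roles[role]:
                matches.append((node_type, role))
    return matches


if __name__ == "__main__":
    for tech in ("Spark", "Docker", "Kafka"):
        print(tech, "->", nodes_hosting(tech))
```

For example, `nodes_hosting("Docker")` lists both the Data Node (as a job) and the Dataviz node (as a resident service), which mirrors the two places Docker appears in the table. Where a notebook actually runs depends on your settings, so the notebook entries are possibilities rather than guarantees.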