Description:
|
  • Proficient understanding of distributed computing principles
  • Management of a Hadoop cluster with all included services
  • Ability to resolve ongoing issues with operating the cluster and to identify performance bottlenecks
  • Proficiency with Hadoop v2, MapReduce, and HDFS
  • Experience building stream-processing systems using solutions such as Storm or Spark Streaming
  • Experience with Spark and SparkR
  • Experience integrating data from multiple data sources
  • Knowledge of various ETL techniques and frameworks, such as Flume
  • Experience with various messaging systems, such as Kafka
  • Experience with Big Data ML toolkits, such as Mahout, Spark MLlib, or H2O
  • Good understanding of the Lambda Architecture, along with its advantages and drawbacks
  • Experience with Cloudera, MapR, or Hortonworks distributions
  • Experience with Scala and Python