The Apache® Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
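As a rough illustration of what a "simple programming model" can look like, the sketch below mimics the map, shuffle, and reduce steps associated with Hadoop MapReduce, run in plain Python on a single machine. It does not use any Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen for illustration only.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (word, 1) pair for every word in a text record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a cluster would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run the three steps in sequence over an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In this sketch each record is mapped independently and each key is reduced independently, which is the property that would let a real framework spread the work across many machines and rerun a failed piece without restarting the whole job.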
A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the Hadoop PoweredBy wiki page.
Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health, such as heatmaps, and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.
Pig™: A high-level data-flow language and execution framework for parallel computation.
Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
Submarine: A unified AI platform that allows engineers and data scientists to run machine learning and deep learning workloads in a distributed cluster.
Tez™: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use cases. Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.
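To make the idea of executing "an arbitrary DAG of tasks" concrete, here is a minimal single-machine sketch in Python. It is not the Tez API: `run_dag`, the task names, and the dependency layout are all hypothetical, and the ordering comes from the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_dag(tasks, dependencies):
    """Run every task after the tasks it depends on.

    tasks: maps a task name to a callable that takes a dict of upstream results.
    dependencies: maps a task name to the set of task names it depends on.
    (Both structures are hypothetical, chosen only for this sketch.)
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    results = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = tasks[name](inputs)
    return results


# A small diamond-shaped DAG: one source, two independent branches, one sink.
example_tasks = {
    "load": lambda inputs: [3, 1, 2],
    "sort": lambda inputs: sorted(inputs["load"]),
    "total": lambda inputs: sum(inputs["load"]),
    "report": lambda inputs: (inputs["sort"], inputs["total"]),
}
example_dependencies = {
    "sort": {"load"},
    "total": {"load"},
    "report": {"sort", "total"},
}

example_results = run_dag(example_tasks, example_dependencies)
```

Here `sort` and `total` do not depend on each other, so a distributed engine would be free to run them in parallel. The sketch runs them sequentially and only demonstrates the dependency ordering.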