Cascading

Maintainers and Contributors

Chris K Wensel

About

Cascading is a collection of applications, languages, and APIs for developing data-intensive applications.

It was originally designed to provide a much more user-friendly API over Apache Hadoop MapReduce, but has evolved over the years to support other computing platforms like Apache Tez and to offer a stand-alone (local-mode) backend for local data streaming.

Other projects like Apache Flink and Hazelcast have provided backend implementations.

Cascading, embedded in a serverless fabric, is currently used at Salesforce to ingest 8 TB of data a day, powering multiple products for internal stakeholders and external customers.

Reach out to the authors to learn more.