Big Data/Pig
Apache Pig is a platform for writing data-processing programs for Hadoop MapReduce in a high-level data-flow language.
The language is called Pig Latin. A Pig Latin script specifies a sequence of steps, each of which defines a single high-level data transformation. When a script is executed, Pig first translates it into a logical plan that describes the execution, and then compiles this plan into one or more MapReduce jobs that run on the Hadoop cluster.
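The step-by-step style can be illustrated with a small in-memory sketch in Python. It is not Pig itself and involves no MapReduce; the `visits` data and all variable names are assumptions made up for illustration, and the comments show roughly which Pig Latin statement each step corresponds to.

```python
from collections import defaultdict

# Hypothetical input rows (user, url, time); in Pig these would come from LOAD.
visits = [
    ("alice", "a.com", 5),
    ("bob", "a.com", 12),
    ("alice", "b.com", 20),
    ("carol", "a.com", 30),
]

# Step 1, roughly: recent = FILTER visits BY time > 10;
recent = [row for row in visits if row[2] > 10]

# Step 2, roughly: by_url = GROUP recent BY url;
by_url = defaultdict(list)
for row in recent:
    by_url[row[1]].append(row)

# Step 3, roughly: counts = FOREACH by_url GENERATE group, COUNT(recent);
counts = [(url, len(rows)) for url, rows in by_url.items()]

# Step 4, roughly: ordered = ORDER counts BY $1 DESC;
ordered = sorted(counts, key=lambda pair: pair[1], reverse=True)

print(ordered)
```

Each assignment applies exactly one transformation to the result of an earlier step, which is the structure Pig exploits when it builds the logical plan.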
Additional features:
- user-defined functions (UDFs) as first-class citizens
- arbitrary input and output file formats
- nested data model
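To make the nested data model and user-defined functions more concrete, here is a minimal Python sketch. It models a Pig tuple as a Python tuple, a bag as a list of tuples, and a map as a dict. The records and the function `latest_visit` are hypothetical and only stand in for what a real UDF might do.

```python
# A record with three fields: an atom, a bag (list of tuples), and a map (dict).
records = [
    ("alice", [("a.com", 5), ("b.com", 20)], {"country": "DE"}),
    ("bob", [("a.com", 12)], {"country": "US"}),
]


def latest_visit(bag):
    """Hypothetical UDF: return the tuple with the largest timestamp in a bag."""
    return max(bag, key=lambda t: t[1])


# Roughly: result = FOREACH records GENERATE user, latest_visit(visits);
result = [(user, latest_visit(visits)) for user, visits, _info in records]

print(result)
```

Because a field may itself hold a bag, a function can take a whole nested collection as its argument, which is what allows user-defined functions to be used at the same places as built-in ones.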
Main operations:
- LOAD
- FOREACH
- FILTER
- COGROUP
- GROUP
- JOIN
- UNION
- CROSS
- ORDER
- DISTINCT
- STORE
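The difference between COGROUP and JOIN is easy to miss in a plain list, so the following Python sketch models both on in-memory lists. It is only an approximation of the semantics described in the Pig Latin paper (an inner equi-join behaves like a COGROUP whose bags are then flattened); the helper names `cogroup` and `join` and the sample data are assumptions for illustration.

```python
from collections import defaultdict
from itertools import product


def cogroup(left, right, key_left, key_right):
    """Return (key, bag_from_left, bag_from_right) for every key in either input."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[key_left(row)][0].append(row)
    for row in right:
        groups[key_right(row)][1].append(row)
    return [(key, l_bag, r_bag) for key, (l_bag, r_bag) in sorted(groups.items())]


def join(left, right, key_left, key_right):
    """Inner equi-join: flatten the cross product of the two bags of each group."""
    return [
        l_row + r_row
        for _key, l_bag, r_bag in cogroup(left, right, key_left, key_right)
        for l_row, r_row in product(l_bag, r_bag)
    ]


# Hypothetical sample relations.
users = [("alice", "DE"), ("bob", "US")]
visits = [("alice", "a.com"), ("alice", "b.com"), ("carol", "c.com")]


def first(row):
    return row[0]


grouped = cogroup(users, visits, first, first)
joined = join(users, visits, first, first)
print(grouped)
print(joined)
```

In this sketch COGROUP keeps the matching rows of each input in separate nested bags, including keys that occur in only one input, while JOIN produces flat rows and drops keys without a match on both sides.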
References
- Apache Pig - official web site
- Wikipedia Article - Apache Pig
- C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. "Pig Latin: A Not-so-foreign Language for Data Processing". Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08), pages 1099-1110. ACM, New York, NY, USA, 2008.