Here is a dictionary of the big data technologies I most commonly run into. It is by no means a complete list of all big data terms.

| Name | Type | Definition |
| --- | --- | --- |
| Heroku | Server | A cloud platform for hosting web applications that focuses on scalability and ease of deployment. |
| EC2 | Server | A cloud service from Amazon that allows users to rent virtual computers for running large-scale applications. Stands for Elastic Compute Cloud. |
| Google App Engine | Server | A cloud platform, owned by Google, for hosting web applications. |
| Azure | Server | A cloud platform, owned by Microsoft, for running large-scale applications. |
| Kudu | Service | A columnar storage engine for the Hadoop ecosystem. |
| Cloudera | Company | A company that provides a commercial distribution of Hadoop along with support, services, and training to customers. |
| MapR | Company | A company that provides a commercial distribution of Hadoop for enterprises. |
| Java | Language | A computer programming language often used for building and connecting big data technology. |
| Scala | Language | A computer programming language used with big data applications. |
| Hive | Service | A data warehouse interface, developed at Facebook, that supports a SQL-like query language. |
| Hadoop | Framework | A framework, written in Java, for distributed storage and processing of data across computer clusters. It consists of the Hadoop Distributed File System for storage and MapReduce for data processing. |
| Spark | Framework | A framework, written in Scala, for in-memory computing across computer clusters. |
| Pig | Service | A high-level platform, designed for Hadoop, whose data-flow scripts are written in a language called Pig Latin. |
| Lucene | Search | A Java library for indexing and searching documents. |
| Oozie | Service | A job control system for workflow scheduling. |
| Solr | Search | An enterprise search platform built on the Lucene Java library. |
| MapReduce | Programming Model | A programming model, and its implementation, for processing large data sets in parallel across a distributed computing cluster. |
| Hypertable | Database | A NoSQL database, based on BigTable and written in C++, that focuses on performance. |
| Voldemort | Database | A NoSQL database, modeled on Amazon's Dynamo and developed at LinkedIn. It focuses on fast lookups for large distributed clusters. |
| Accumulo | Database | A NoSQL database, developed and open sourced by the National Security Agency, that provides cell-level access labels. |
| Cassandra | Database | A NoSQL database, originally developed at Facebook, that works as a distributed key/value (wide-column) store. |
| Riak | Database | A NoSQL database, written in Erlang, that focuses on high availability and fault tolerance. |
| HBase | Database | A NoSQL database, written in Java and modeled after BigTable, that uses a data structure of keys, column families, and column names. |
| CouchDB | Database | A NoSQL, document-oriented database with a JavaScript interface that uses JSON to store data. |
| MongoDB | Database | A NoSQL, document-oriented database that stores JSON-like objects. |
| Redis | Database | A NoSQL, in-memory database that uses a key/value store. |
| JSON | Definition | A text notation for storing and exchanging data objects. Stands for JavaScript Object Notation. |
| Python | Language | A programming language used for analytical computation with big data technologies. |
| BigTable | Database | A proprietary datastore developed by Google, accessible through the Google App Engine. |
| DynamoDB | Database | A proprietary NoSQL database developed at Amazon. It is offered as part of the Amazon Web Services portfolio. |
| Hortonworks | Company | A public company that provides a commercial distribution of Hadoop. |
| Elasticsearch | Search | A search engine platform, built on Lucene, that focuses more on web applications. |
| Kafka | Service | A service for handling a large number of events related to real-time data feeds. |
| Flume | Service | A service that focuses on information gathering for log data. |
| Impala | Service | A SQL query engine, developed at Cloudera, that runs on Hadoop. |
| NoSQL | Definition | A term meaning "non-SQL" or "not only SQL" for databases whose data models differ from those of relational databases. |
| Hadoop Distributed File System | File System | Commonly called HDFS. A file system written in Java for the Hadoop framework. |
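
MapReduce is probably the hardest entry to picture from a one-line definition, so here is a minimal sketch of the idea as a word count in plain Python. It runs on a single machine and only imitates the three stages (map, shuffle, reduce) that a framework like Hadoop would spread across a cluster. The function names map_phase, shuffle, reduce_phase, and word_count are my own illustration and are not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    A real framework does this between the map and reduce stages,
    moving data across machines. Here it is just a dictionary.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of strings."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Spark processes data"]
    print(word_count(docs))
```

Each call to map_phase only looks at one document, and each call to reduce_phase only looks at one key. That independence is what allows the work to be processed in parallel.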

Something missing? Let me know.