Here is a dictionary of the big data technologies I most commonly run into. It is by no means a complete list of all big data terms.

| Name | Type | Definition |
| --- | --- | --- |
| MapR | Company | A company that provides a commercial distribution of Hadoop for enterprises. |
| Cloudera | Company | A company that provides a commercial distribution of Hadoop along with support, services, and training to customers. |
| Hortonworks | Company | A public company that provides a commercial distribution of Hadoop. |
| CouchDB | Database | A NoSQL, document-oriented database that stores data as JSON documents and uses JavaScript for queries. |
| DynamoDB | Database | A proprietary NoSQL database developed at Amazon. It is offered as part of the Amazon Web Services portfolio. |
| Bigtable | Database | A proprietary datastore developed by Google, accessible through the Google App Engine. |
| Cassandra | Database | A distributed NoSQL database, originally developed at Facebook, built around a partitioned key/value (wide-column) data model. |
| HBase | Database | A NoSQL database, written in Java and modeled after Bigtable, that uses a data structure of keys, column families, and column names. |
| Voldemort | Database | A NoSQL key/value database, modeled on Amazon's Dynamo and developed at LinkedIn. It focuses on fast lookups for large distributed clusters. |
| MongoDB | Database | A NoSQL, document-oriented database with JSON-like objects. |
| Redis | Database | A NoSQL, in-memory database that uses a key/value store. |
| Hypertable | Database | A NoSQL database, based on Bigtable and written in C++, that focuses on performance. |
| Riak | Database | A NoSQL database, written in Erlang, that focuses on high availability and fault tolerance. |
| Accumulo | Database | A NoSQL database, developed and open sourced by the National Security Agency, that provides cell-level access labels. |
| JSON | Definition | JavaScript Object Notation. A lightweight text notation for storing and exchanging data objects. |
| NoSQL | Definition | A term meaning “non SQL” or “not only SQL” for databases whose data models differ from those of relational databases. |
| Hadoop Distributed File System | File System | Commonly called HDFS. A file system written in Java for the Hadoop framework. |
| Hadoop | Framework | A framework, written in Java, for distributed storage and processing of data across computer clusters. It consists of the Hadoop Distributed File System for storage and MapReduce for data processing. |
| Spark | Framework | A framework, written in Scala, for in-memory computing across computer clusters. |
| Python | Language | A programming language used for analytical computation with big data technologies. |
| Scala | Language | A programming language used with big data applications. |
| Java | Language | A programming language often used for building and connecting big data technology. |
| MapReduce | Programming Model | A programming model, and its implementation, for processing large data sets in parallel across a distributed computing cluster. |
| Lucene | Search | A Java library for indexing and searching documents. |
| Elasticsearch | Search | A search engine platform, built on Lucene, that focuses more on web applications. |
| Solr | Search | An enterprise search platform built on the Lucene Java library. |
| Azure | Server | A cloud platform, owned by Microsoft, for running large-scale applications. |
| EC2 | Server | A cloud platform, owned by Amazon, that allows users to rent virtual computers for running large-scale applications. Stands for Elastic Compute Cloud. |
| Google App Engine | Server | A cloud platform, owned by Google, for hosting web applications. |
| Heroku | Server | A cloud platform for hosting web applications that focuses on scalability and ease of deployment. |
| Pig | Service | A high-level platform, designed for Hadoop, that supports data-flow queries written in a language called Pig Latin. |
| Oozie | Service | A workflow scheduling system for managing Hadoop jobs. |
| Kudu | Service | A columnar storage engine built for the Hadoop ecosystem. |
| Flume | Service | A service for collecting, aggregating, and moving log data. |
| Kafka | Service | A service for handling a large number of events from real-time data feeds. |
| Impala | Service | A SQL query engine, developed at Cloudera, that runs on Hadoop. |
| Hive | Service | A data warehouse interface, developed at Facebook, that supports a SQL-like query language. |
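
To make the MapReduce entry a little more concrete, here is a minimal word-count sketch in plain Python. It is only an illustration of the map, shuffle, and reduce steps running in a single process. It does not use Hadoop, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are my own, not part of any library. In a real cluster each phase would be spread across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in order over a list of text documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data about data"]
    print(word_count(docs))
```

Because each document is mapped independently and each key is reduced independently, both steps can run in parallel, which is the property that makes the model a good fit for a distributed cluster.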

Something missing? Let me know.