The evolution of the Internet has driven exponential growth in the volume of data to be stored and processed. This data is usually unstructured and very diverse (log files, online discussions, user traffic logs, banking, weather or satellite information, etc.). Its sheer volume and diversity often make it impractical to store and process with conventional technology based on relational (RDBMS) or object database systems. A new approach has emerged in the last few years that discards both relational concepts (decomposition into normal forms, relational algebra) and structured data manipulation languages like SQL (Structured Query Language), giving birth to the growing family of “big data” technologies, often referred to as “NoSQL.”
Most big data systems rely on distributed storage for structured data (such as Google's BigTable), on parallel and distributed processing methods, and on the MapReduce model that this document addresses. These technologies, now mature, have given rise to various open source and commercial products such as Hadoop (Apache), Cassandra (originally developed at Facebook), and MongoDB.