Sunday, 19 July 2015

The Animal Kingdom of Data Science



It was about 4.54 billion years ago that this beautiful planet, which we now call Earth, came into existence. Billions of years later came the living organisms that, in one way or another, we call animals. Some 200,000 years ago, we humans came along and evolved much faster than most living organisms. 74 years ago, some of us helped build a new world with 0s and 1s at its core. A few years ago, another set of animals came along to make our lives more comfortable. I would like to tell you about some of these animals from behind the screen.




Python-
Python is a high-level programming language whose origins date back to the late 1980s. Its design emphasizes code readability, and its syntax lets programmers express concepts in fewer lines of code than would be possible in languages such as C++ or Java. Paired with the Spyder IDE, Python becomes an even more user-friendly coding space. The following is an example of a program for the Fibonacci series:
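def fib(n):
    # Return the n-th Fibonacci number, counting from fib(1) = fib(2) = 1.
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

print(fib(5))  # Python 3 syntax; prints 5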



Anaconda - Anaconda is a free distribution of the Python programming language for large-scale data processing, predictive analytics, and scientific computing. It aims to simplify package management and deployment, and its package management system is conda.

 




Spyder- Spyder (formerly Pydee) is an open source cross-platform IDE for scientific programming in the Python language. Spyder integrates NumPy, SciPy, Matplotlib and IPython, as well as other open source software.






Hadoop – This tiny elephant is capable of working big wonders. Technically put, “Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.” What is interesting is how the name Hadoop came into existence: it is the name that the son of Doug Cutting (the creator of Hadoop) gave to his yellow toy baby elephant. Apparently, even the name Google has similar origins, being a play on "googol", a word coined by a mathematician's young nephew.
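The "simple programming models" in that definition most often mean MapReduce. To make the idea concrete, here is a minimal sketch of a word count in plain Python. It runs on a single machine and does not use Hadoop at all; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real cluster would spread each phase across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all the counts collected for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["the elephant", "the hive and the python"]))
```

The point of the split is that map_phase only ever sees one line and reduce_phase only ever sees one word, which is what lets a framework run many copies of each in parallel.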





Hive - The Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
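As a rough idea of what a "custom mapper" might look like, here is a small Python sketch. It assumes the streaming convention where rows arrive as tab-separated lines of text and leave the same way, which is how a script attached through HiveQL's TRANSFORM clause is normally fed. The two-column layout (name, kind) and the function names map_row and run_mapper are hypothetical, chosen only for this example.

```python
def map_row(line):
    """Turn one tab-separated input row into one tab-separated output row.

    Assumes exactly two columns (name, kind); this layout is made up.
    """
    name, kind = line.rstrip("\n").split("\t")
    return "\t".join([name.strip().lower(), kind.strip().upper()])


def run_mapper(lines):
    """Apply map_row to every non-empty line, as a streaming script would."""
    return [map_row(line) for line in lines if line.strip()]


# A real script would loop over sys.stdin and print each result;
# a plain list stands in for the input stream here.
for row in run_mapper(["Hadoop\telephant\n", "Python\tsnake\n"]):
    print(row)
```

Keeping the per-row logic in its own function makes it easy to test without a cluster, since the same function works on a list of strings or on standard input.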

So, we can safely conclude that data scientists are animal lovers! On a more serious note, the reason for this naming pattern can be understood in Doug Cutting's own words: "The rules of names for software is they're meaningless because sometimes the use of a particular piece of software drifts, and if your name is too closely associated with that, it could end up being wrong over time."


(Source : Wikipedia, hive.apache.org)
