Big Data analytics tools are so important these days the federal government is getting directly involved in their nurturing and advancement.
DARPA (the U.S. Defense Advanced Research Projects Agency) has awarded $3 million to software provider Continuum Analytics to help fund the development of Python's data processing and visualization capabilities for big data jobs.
The money will go toward developing new techniques for data analysis and for visually portraying large, multi-dimensional data sets. The work aims to extend beyond the capabilities offered by the NumPy and SciPy Python libraries are widely used by programmers for mathematical and scientific calculations. The work is part of DARPA's XData research program, a four-year, $100 million effort to give the Defense Department and other U.S. government agencies tools to work with large amounts of sensor data and other forms of big data."
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Its syntax is said to be clear and expressive. Python has a large and comprehensive standard library. Python supports multiple programming paradigms, including object-oriented, imperative and functional programming styles.
It features a fully dynamic type system and automatic memory management, similar to that of Scheme, Ruby, Perl, and Tcl. Like other dynamic languages, Python is often used as a scripting language, but is also used in a wide range of non-scripting contexts. Using third-party tools, Python code can be packaged into standalone executable programs. Python interpreters are available for many operating systems.
More mathematically centered languages such as the R Statistical language might seem better suited for big-data number crunching, but Python offers an advantage of being easy to learn.
"Python is a very easy language to learn for non-programmers," said Peter Wang, president of Continuum Analytics. That's important because most big-data analysts will probably not be programmers. If they can learn an easy language, they won't have to rely on an external software development group to complete their analysis, Wang said.
At the PyData 2012 conference in New York last November, Continuum engineer Stephen Diehl discussed how Blaze would operate, describing the library as a potential successor to NumPy.
NumPy has limitations that Blaze seeks to correct, Diehl said. Most notably, NumPy only offers the ability to store a series of numbers as one continuous string of data. "It is a single buffer, a continuous block of memory. That may be OK for some uses, but the real world is more heterogenous," he said in a presentation.
Blaze can "endow data with structure," Diehl said. It will also allow programmers to establish multidimensional arrays and store these arrays in a distributed architecture, across multiple machines.