It's likely that even after last week's highly publicized announcement of multiple "Big Data" initiatives set to launch in Massachusetts, big data knows more about you than you know about it.

"It does provide some imprint of an individual's behavior," said Dr. Bhabani Misra, a member of the team that runs the Center of Excellence for Big Data at the University of St. Thomas in Minnesota. "We all click on the Internet many, many times. As we click on many different sites, the clicks stream data."

Big data, the vast, complex collections of data generated by businesses, social networking sites and online transactions, refers to more than a person's digital fingerprint, such as what someone posts to Twitter or buys online. It can also include the tests doctors order for their patients, which are then reported to health insurance companies.

"We live in a world where there is a ton of data being collected about us," said Sam Madden, the director of MIT's new big data initiative at the school's Computer Science and Artificial Intelligence Laboratory. "It's when we talk about combining and correlating that data that it leads to applications that have value."

Madden said the MIT team will begin working on projects that try to make sense of big data collections and turn them into programs that are broadly useful to many people.

One of the main issues, however, is the privacy and security of that data.

Madden said that is partly a societal and policy problem, centered on how much information people are comfortable putting online about themselves.

"There is value in sharing data, but at [the] same time [we] don't want to compromise an individual's information," he said.

Data tsunami

A report released this year by the Mass Technology Leadership Council summed up the challenges of big data in three words: volume, velocity and variety.

The world will store about 1,800 exabytes of digital data this year, and that volume is growing at a compound annual rate of 60 percent, according to market research company International Data Corp. An exabyte equals one billion gigabytes.
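The arithmetic behind those figures can be sketched in a few lines of Python. This is a minimal illustration, not IDC's model: it assumes the 60 percent rate stays constant from year to year (real growth rates drift) and uses decimal units, in which an exabyte is 10^18 bytes and a gigabyte is 10^9 bytes. The function names are hypothetical.

```python
# Figures from the IDC estimate quoted above. The projection ASSUMES the
# 60 percent compound annual growth rate holds constant every year.
BASE_EXABYTES = 1_800
CAGR = 0.60
GB_PER_EB = 10**9  # decimal (SI) units: 1 EB = 10**18 bytes, 1 GB = 10**9 bytes


def projected_exabytes(years: int, base: float = BASE_EXABYTES, rate: float = CAGR) -> float:
    """Volume after `years` of compound growth: base * (1 + rate) ** years."""
    return base * (1 + rate) ** years


def exabytes_to_gigabytes(eb: float) -> float:
    """Convert exabytes to gigabytes using decimal prefixes."""
    return eb * GB_PER_EB


if __name__ == "__main__":
    # Print this year's volume and a naive three-year projection.
    for year in range(4):
        eb = projected_exabytes(year)
        print(f"year {year}: {eb:,.0f} EB = {exabytes_to_gigabytes(eb):,.0f} GB")
```

At a steady 60 percent a year, the volume would roughly double about every year and a half, since 1.6 raised to the 1.5 power is about 2. This year's 1,800 exabytes alone come to 1.8 trillion gigabytes.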

All of that data is generated at previously unfathomable speeds, making big data collection, processing and management a 24-hour, 7-days-per-week task, according to the report.

The data also comes in two broad forms: structured data, such as online financial transactions, and unstructured data, such as social media postings and audio and video files.
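The difference between the two forms can be illustrated with a short Python sketch. The record fields, sample values and hashtag pattern below are hypothetical, chosen only to show that structured data can be queried directly by field, while unstructured text has to be parsed before anything useful can be pulled out of it.

```python
import re
from dataclasses import dataclass


# HYPOTHETICAL record layout, for illustration only.
@dataclass
class Transaction:
    """Structured data: every record has the same named, typed fields."""
    account_id: str
    amount: float
    currency: str


def total_amount(transactions: list[Transaction]) -> float:
    # A fixed schema means a field can be read and summed directly.
    return sum(t.amount for t in transactions)


def extract_hashtags(post: str) -> list[str]:
    # Unstructured data: free text has no schema, so information must be
    # extracted by parsing (here, a simple regular expression for hashtags).
    return re.findall(r"#(\w+)", post)


transactions = [Transaction("A1", 20.0, "USD"), Transaction("B2", 5.5, "USD")]
post = "Loving the new #BigData lab at #MIT!"

print(total_amount(transactions))  # 25.5
print(extract_hashtags(post))      # ['BigData', 'MIT']
```

Real systems handle unstructured data such as audio and video with far heavier tools than a regular expression, but the basic contrast is the same: structure has to be imposed before the data can be analyzed.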