Written by Aravind H.U on Jul 9, 2014
If you are passionate about big data and want to learn and experiment with indexing and parsing it, your primary requirement is a large amount of freely available data.
When I started learning Hadoop, I found these free data sources. You can download the data and experiment with it, gathering insights with a statistical language like R, or feed it as input to a MapReduce job you have written for practice.
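If you have not written a practice MapReduce job yet, here is a minimal sketch of the idea in plain Python, a word count that mimics the map, shuffle, and reduce phases in memory. It does not use Hadoop itself; the function names `mapper`, `reducer`, and `run_word_count` and the sample lines are my own assumptions for illustration, and in practice you would replace the sample with lines read from a downloaded dataset.

```python
from itertools import groupby


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_word_count(lines):
    """Run map, shuffle (sort and group by key), and reduce in memory."""
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle: sort by key so that equal words sit next to each other.
    pairs.sort(key=lambda kv: kv[0])
    grouped = groupby(pairs, key=lambda kv: kv[0])
    return dict(
        reducer(word, (count for _, count in group)) for word, group in grouped
    )


# Hypothetical sample input standing in for a real dataset file.
sample = ["big data big insights", "data to experiment with"]
result = run_word_count(sample)
print(result)
```

A real Hadoop job would distribute these phases across machines, but the shape of the mapper and reducer stays much the same, so this is a handy way to test your logic on a small slice of data first.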
AWS public datasets: http://aws.amazon.com/datasets
Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications.
Infochimps free datasets: http://www.infochimps.com/datasets
You can find a lot of public datasets here, such as US census data, to learn and experiment with big data technologies.
Phew, I didn't know that the Wikipedia database is available for the public to download free of cost. It's massive, so please try it only if you have a good internet connection.
The list had a lot more links at the time of writing, but now that people are aware of the value of analytics, it has become very difficult to find any valuable data to process and experiment with.