What will you learn?
How to use Bash to quickly sort, search, match, replace, clean, and optimize various aspects of your data, small and big, without going through a tough learning curve! The book takes an example-based approach to learning.
This beginner-level book will help you become an expert in Bash and learn to explore large, real-world data sets. Bash may not be the best way to handle all kinds of data! But there often comes a time when you are given a pure Bash environment, such as the one you get on common Linux-based supercomputers, and you just want an early result or view of the data before you dive into real programming in Python, R, SQL, SPSS, and so on. Gaining expertise in these data-intensive languages, however, takes a lot of time.
In contrast, Bash scripting is simple, easy to learn, and perfect for mining textual data! We strongly believe that learning and using Bash shell scripting should be your first step if you want to say "Hello, Big Data!"
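As a small taste of the kind of one-liners involved, here is a minimal sketch that sorts, searches, replaces, and summarizes a CSV file using only standard tools (tail, sort, grep, sed, cut, uniq). The file name sample.csv and its contents are hypothetical and made up for illustration; they are not taken from the book's data.zip.

```shell
#!/usr/bin/env bash
# Hypothetical sample data, for illustration only (not from data.zip)
printf 'rank,university,country\n3,Uni C,UK\n1,Uni A,US\n2,Uni B,US\n' > sample.csv

# Sort: skip the header line, then sort rows numerically by column 1 (rank)
tail -n +2 sample.csv | sort -t, -k1,1n

# Search/match: print rows whose last column is exactly "US"
grep -E ',US$' sample.csv

# Replace/clean: expand the "UK" code with sed, writing to a new file
sed 's/,UK$/,United Kingdom/' sample.csv > cleaned.csv

# Summarize: count how many rows belong to each country
tail -n +2 sample.csv | cut -d, -f3 | sort | uniq -c
```

Each step reads from a file or a pipe and writes plain text, so the commands can be chained freely; this composability is what makes Bash convenient for a first look at a data set.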
This book starts with some practical Bash-based flat-file data mining projects involving:
- University ranking data
- Facebook data
- Crime data
- Shakespeare-era plays and poems data
(All data sets are provided in the data.zip file)
If you haven’t used Bash before, feel free to skip the projects and go straight to the tutorials. Read the tutorials, then come back to the projects. The tutorial section introduces you to Bash scripting, regular expressions, AWK, sed, grep, and so on.