How to Clean Data at the Command Line

Cleaning data is a familiar step that lets us explore data and see beyond its raw form. Many tools can do the job, but there is a recurring problem.

The data-driven problem we face

Whenever you want to import a CSV file, you go to Google out of habit to find the two lines you always forget (in Python, for example). Then you open your text editor, create a file, and paste in what you found.

Why the command line?

Even the simplest data cleaning tasks can feel frustrating or time-wasting. Maybe you use a higher-level library like Pandas, but I bet you still write more code than you would in the terminal, where several lines of code can be packed into a single one-liner.

This ebook makes dealing with CSV files, JSON, or in general any text file much easier.

What's in it for you?

In this ebook, I'm trying to save you time and the hassle of dealing with files at the system level. You may also enjoy the adventure of exploring command-line tools and programs that you may not have heard of. I encourage you to try these tools, as I do on my workdays.

While dealing with the command line may sound a bit geeky, this ebook is simple and easy to follow, and it's a lot of fun.

There are real examples from a scientific paper, COVID tracking project data, Reddit user data, and more. You can practice with them and try useful programs and tools from the comfort of your command line.


In this ebook, you'll clean data using the command-line tools tr, grep, sort, uniq, awk, sed, and csvlook, and practice cleaning a COVID-19 CSV file with the command-line programs csvkit and xsv, comparing the performance of each.
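
To give a feel for this style of work, here is a minimal sketch of a word-count pipeline built from a few of those tools. It is not an excerpt from the ebook: the sample sentence and the variable names `text` and `counts` are made up for illustration.

```shell
# Hypothetical sample text (an assumption, not data from the ebook).
text='The quick brown fox. The lazy dog! The fox?'

# Lowercase everything, turn every run of non-letters into a newline
# (one word per line), drop any empty lines, count duplicates,
# order by count (highest first), and print the result as word,count.
counts=$(printf '%s\n' "$text" \
  | tr '[:upper:]' '[:lower:]' \
  | tr -cs '[:alpha:]' '\n' \
  | grep -v '^$' \
  | sort \
  | uniq -c \
  | sort -rn \
  | awk '{print $2 "," $1}')

printf '%s\n' "$counts"
```

The most frequent word ends up on the first line. Note that `uniq -c` only counts adjacent duplicates, which is why the first `sort` comes before it.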

You'll also see how to sort and concatenate a large CSV file with csvkit and xsv, and measure their performance against Pandas.
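
For a rough idea of what sorting and concatenating means here, the sketch below does it with standard tools only (head, tail, and sort) rather than csvkit or xsv. The two CSV fragments and all variable names are hypothetical, and this naive approach assumes no field contains a quoted comma or newline.

```shell
# Two hypothetical CSV parts sharing the same header (made-up data).
part1='date,cases
2020-03-02,5
2020-03-01,3'
part2='date,cases
2020-03-03,8'

# Take the header once from the first part.
header=$(printf '%s\n' "$part1" | head -n 1)

# Concatenate the data rows of both parts (skipping each header),
# then sort them on the first comma-separated field.
rows=$( { printf '%s\n' "$part1" | tail -n +2
          printf '%s\n' "$part2" | tail -n +2; } | sort -t, -k1,1 )

# Put the header back on top of the sorted rows.
merged=$(printf '%s\n%s\n' "$header" "$rows")
printf '%s\n' "$merged"
```

Dedicated CSV tools handle quoting and headers for you, which is one reason to reach for them on real files.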

In the last chapter, you'll learn how to clean a JSON file using the command-line program jq.

Read this before you buy

The content is a curated set of blog posts I published on my personal site, distributed among the book's chapters:

Chapter 1: https://www.ezzeddinabdullah.com/posts/how-to-clean-text-data-at-the-command-line

Chapter 2: https://www.ezzeddinabdullah.com/posts/how-to-clean-csv-data-at-the-command-line

Chapter 3: https://www.ezzeddinabdullah.com/posts/how-to-clean-csv-data-at-the-command-line-part-2

Chapter 4: https://www.ezzeddinabdullah.com/posts/how-to-clean-json-data-at-the-command-line

What makes the ebook different from these blog posts?

I've fixed some benchmark results and some of the commands, added syntax highlighting to the code, and removed calls to action from inside the chapters. The format is also PDF, so you get all four chapters in one package that you can read on your laptop or phone.

Buying this ebook will also encourage me to publish more, whether ebooks or courses.


Here is a screenshot of how it looks (with a sample of the syntax-highlighted code snippets):

Who is this for?

If you are a data scientist, data engineer, data analyst, or software developer, or you work with data files a lot (such as TXT, CSV, or JSON), this ebook is for you.

  • Size: 654 KB
  • Length: 36 pages
You'll be charged US$8.