Learn Python Series (#29) - Handling CSV

Repository

https://github.com/python/cpython

What will I learn?

You will learn what a CSV file containing tabular data looks like, both in a spreadsheet program, inside a CSV file using a code editor, and using a Python interpreter.
how to use the csv module for reading, parsing and writing CSV files,
how to handle both CSV list and dictionary elements,
how to change the standard CSV field delimiter

Requirements

A working modern computer running macOS, Windows or Ubuntu;
An installed Python 3(.6) distribution, such as (for example) the Anaconda Distribution;
The ambition to learn Python programming.

Difficulty

Beginner

Curriculum (of the `Learn Python Series`):

Additional sample code files

The full - and working! - iPython tutorial sample code file is included for you to download and run for yourself right here:
https://github.com/realScipio/learn-python-series/blob/master/csv-tut01.ipynb

GitHub Account

https://github.com/realScipio

Learn Python Series (#29) - Handling CSV

Welcome to already episode #29 of the Learn Python Series! We've already discussed handling files in general, handling JSON, in the previous epispde we talked about pickling, so we must also briefly touch upon a well-known and widely used data format: CSV.
When working with data, then CSV (Comma Separated Values) is a very commonly used format to import and export tabular data to and from spreadsheets and also databases.

However, because a well-defined CSV standard is missing, a uniform way to implement field delimiters (such as commas) is absent. These subtle differences in CSV formats make self-parsing of CSV files a bit cumbersome. Luckily most Python distributions (such as Anaconda) come with a bundled csv module help a lot with reading and writing "out of the box".

Let's find out how the csv module works!

What does a CSV file look like?

When importing / opening a CSV-file in a spreadsheet program (such as MS Excel, OpenOffice Calc), the data contained in the CSV is displayed in a tabular (table) format, like this:

ID	Name	City
1	Jack	New York
2	Paula	Dublin
3	Marly	Melbourne
4	Tony	London
5	Andrea	Rome
6	Julie	Paris
7	Bernhard	Berlin
8	Frank	San Francisco
9	Johan	Amsterdam
10	Maria	Barcelona

The underlying data structure of the above displayed CSV file, is structured in a human-readable form. It's actually just like a regular text file (you could open a CSV file with your favorite code editor as well), and each field is simply separated by a comma (hence the name CSV), like so:

ID,Name,City
1,Jack,New York
2,Paula,Dublin
3,Marly,Melbourne
4,Tony,London
5,Andrea,Rome
6,Julie,Paris
7,Bernhard,Berlin
8,Frank,San Francisco
9,Johan,Amsterdam
10,Maria,Barcelona

Reading CSV files

Let's suppose we have a file named names.csv containing the exact data as displayed above. Using Python's csv module, and the with keyword we've been using in the previous episode(s), reading a .csv file is done like so:

First import the csv module:

import csv

Then we'll open our file names.csv in read-mode, using flag r, and we'll assign it to file object f. Then we need to use the csv.reader() method and pass in our file object. Let's assign it as data:

with open('names.csv', 'r') as f:
    data = csv.reader(f)
    print(type(data))

<class '_csv.reader'>

As you can see, the CSV file data is now extracted into variable data which is a reader object, which is an iterable. So let's now first create an empty list content, then iterate over the reader object, append each data row as a list to the content list and then print it:

import pprint

with open('names.csv', 'r') as f:
    data = csv.reader(f)
    content = []
    for data_row in data:
        content.append(data_row)
    pprint.pprint(content)

[['ID', 'Name', 'City'],
 ['1', 'Jack', 'New York'],
 ['2', 'Paula', 'Dublin'],
 ['3', 'Marly', 'Melbourne'],
 ['4', 'Tony', 'London'],
 ['5', 'Andrea', 'Rome'],
 ['6', 'Julie', 'Paris'],
 ['7', 'Bernhard', 'Berlin'],
 ['8', 'Frank', 'San Francisco'],
 ['9', 'Johan', 'Amsterdam'],
 ['10', 'Maria', 'Barcelona']]

As you can see, each returned data_row is a list of 3 items.

Writing CSV files

We can apply the same technique to write data to a persistent .csv file on disk, this time using the csv.writer() method.

Let's create a list of lists containing some data, again use the with keyword and this time open a new file in write-mode (using the w, for write, flag), againi assign f as a file object, and the apply the csv.writer() method to create a writer object.

Then we'll use that writer object's method writerows() to actually write each item in the cryptos list to file, by passing in cryptos as writerows()'s argument, like so:

cryptos = [
    ['ID', 'Name', 'Abbreviation'],
    ['1', 'Bitcoin','BTC'],
    ['2', 'Litecoin','LTC'],
    ['3', 'Steem','STEEM'],
    ['4', 'Steem Backed Dollar','SBD'],
    ['5', 'IoTeX','IOTX'],
    ['6', 'Stellar','XLM'],
    ['7', 'EOS','EOS'],
    ['8', 'Ethereum','ETH'],
    ['9', 'Cardano','ADA'],
    ['10', 'Dash','DASH']
]

with open('cryptos.csv', 'w') as f:
    writer = csv.writer(f)
    print(type(writer))
    writer.writerows(cryptos)

<class '_csv.writer'>

At this point, a valid CSV file named cryptos.csv is written to disk, and to check if it's contents are valid, we can read it back in like we did before:

import pprint

with open('cryptos.csv', 'r') as f:
    data = csv.reader(f)
    content = []
    for data_row in data:
        content.append(data_row)
    pprint.pprint(content)

[['ID', 'Name', 'Abbreviation'],
 ['1', 'Bitcoin', 'BTC'],
 ['2', 'Litecoin', 'LTC'],
 ['3', 'Steem', 'STEEM'],
 ['4', 'Steem Backed Dollar', 'SBD'],
 ['5', 'IoTeX', 'IOTX'],
 ['6', 'Stellar', 'XLM'],
 ['7', 'EOS', 'EOS'],
 ['8', 'Ethereum', 'ETH'],
 ['9', 'Cardano', 'ADA'],
 ['10', 'Dash', 'DASH']]

Works like a charm!

`csv`'s classes DictReader and DictWriter

Thus far we've been reading and writing lists with strings inside them to and from CSV files. But the csv module also contains the DictReader and DictWriter classes using Python dictionaries instead of lists containing string items.

Reading with `DictReader`

DictReader creates an object and maps it to a dictionary. The dictionary keys are either set using the optional fieldnames parameter, or - when not passed-in as an argument - read from the first line of the CSV file (and now you know why I have been consistently adding fieldnames in the first row of each CSV file ;-) ).

Let's now use the DictReader class to read in the names.csv file again, print a simple multiline F-String (see episode #27 on F-Strings) and please observe that I now use the same dictionary keys as contained in the first line of the names.csv file for reference.

import csv

content = []
with open('names.csv') as f:
    data = csv.DictReader(f)
    for data_row in data:
        content.append(data_row)
        print(
            f"Person #{data_row['ID']}, "
            f"{data_row['Name']}, "
            f"lives in {data_row['City']}"
        )

Person #1, Jack, lives in New York
Person #2, Paula, lives in Dublin
Person #3, Marly, lives in Melbourne
Person #4, Tony, lives in London
Person #5, Andrea, lives in Rome
Person #6, Julie, lives in Paris
Person #7, Bernhard, lives in Berlin
Person #8, Frank, lives in San Francisco
Person #9, Johan, lives in Amsterdam
Person #10, Maria, lives in Barcelona

Writing with `DictWriter`

Let's now try to write a CSV file using the DictWriter class, where we begin with creating a list of objects utopian_contributors (instead of a list of lists, or a list of strings). Because we're dealing with tabular data, every object inside the list has the same data structure.

We also create a list containing the fieldnames:
fieldnames = ['name', 'category']

When writing the CSV file, we first call the method writeheader(), in order to write the first row containing the fieldnames; as you may have noticed, this time I did not include the fieldnames as the first list item in the utopian_contributors data list.

And finally, we'll call the method writerows() and pass in the utopian_contributors data list.

Nota bene: it's also possible to use the method writerow(), for example in a for loop, to write each data row individually.

import csv

utopian_contributors = [
    {'name': 'scipio', 'category': 'tutorials'},
    {'name': 'rosatravels', 'category': 'tutorials'},
    {'name': 'holger80', 'category': 'development'},
    {'name': 'fabiyamada', 'category': 'graphics'}
]

with open('contributors.csv', 'w') as f:
    fieldnames = ['name', 'category']
    data = csv.DictWriter(f, fieldnames=fieldnames)    
    data.writeheader()
    data.writerows(utopian_contributors)

Changing delimiters (other than the default comma `,`)

It's also possible to set an alternative field delimiter prior to executing a reader or writer method. For example you don't want to use the standard comma field delimiter, but a semi-colon, or a space.

This works as follows:

with open('contributors_semicolon.csv', 'w') as f:
    fieldnames = ['name', 'category']
    data = csv.DictWriter(f, fieldnames=fieldnames, delimiter=';')    
    data.writeheader()
    data.writerows(utopian_contributors)

Please observe the ; delimiter being applied!

This technique applies to both reading and writing CSV data.

Learn Python Series (#29) - Handling CSV

Learn Python Series (#29) - Handling CSV

Repository

What will I learn?

Requirements

Difficulty

Curriculum (of the Learn Python Series):

Additional sample code files

GitHub Account

Learn Python Series (#29) - Handling CSV

What does a CSV file look like?

Reading CSV files

Writing CSV files

csv's classes DictReader and DictWriter

Reading with DictReader

Writing with DictWriter

Changing delimiters (other than the default comma ,)

Thank you for your time!

Curriculum (of the `Learn Python Series`):

`csv`'s classes DictReader and DictWriter

Reading with `DictReader`

Writing with `DictWriter`

Changing delimiters (other than the default comma `,`)