Project Software

1. Introduction

An overview of some of the tools used to work with the datasets covered on this website

2. Core Tools

2.1. Operating System

Most of the work was done using Cygwin which provides a Linux like environment on a Windows host. With Cygwin installed you can work in a shell command line environment writing scripts and taking full advantage of Unix utilities. Extensive use was made of the best text editor around 'vi' An excellent vi Reference is available. This website is largely written in vi using AsciiDoc mark up language having been converted from Wordpress. Other tools include awk which is a programming language in itself, ideal for processing text files and scripting. For processing CSV files into a UTF-8 character set iconv proved invaluable as did dos2unix for file preparation. Unix is ideal for one off manipulation of files, pattern matching and prototyping methods with utilities such as grep and sed. These can the be formalised and written into shell scripts that can chaned together to create data handling piplines that help automate processing.

2.2. Databases

When handling large amounts of data spreadsheets have their limits, for the heavy lifting PostgreSQL was used. I had some contact with PostgreSQL commercially in the past, I found installing and running the database to be very straight forward I ran it under Cygwin and it performed perfectly and fully endorse PostgreSQL’s own opening description

PostgreSQL is a powerful, open source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.

— From https://www.postgresql.org/ Website Viewed on 06/11/2021

Interaction with the data was largely through PLpgSQL SQL Procedural Language having used Transact-SQL extensively in the past I found PLpgSQL SQL easy to 'pick-up'.

For handling smaller datasets I found Google’s free offering Google Sheets useful, setting up a Google account gives you full access with 15GB cloud storage. Sheets handle Excel’s spreadsheet format well providing a means to ingest and export them to CSV . I found Liza Bolton’s Data Journalism Workshop Notes a useful primer to using Sheets for handling datasets.

2.3. Other Tools

  • Data Miner - A Google Chrome Extension and Edge Browser Extension that helps you crawl and scrape data from web pages and into a CSV file or Excel spreadsheet.

  • CSV Lint - A CSV validator

  • JSON to CSV Converter - Convert .json to .csv

  • SQL Formatter - Format your SQL code for presentation, PLSQL. T-SQL and more.

3. Data Visualisation Tools

  • Datawrapper - Upload data and create charts, maps and tables. Create image files or embed in a website.

  • Flourish - Upload data and create charts, maps and tables. Create image files or embed in a website

  • Google Sheet Charts - Create charts from Google spreadsheets

  • Google Data Studio - Build interactive reports and dashboards, similar to Microsoft’s Power BI

4. Documentation