Data Cleaning with OpenRefine

Self-learning course

Site Updated On: March 07, 2023
For More Info Email: rsginfo@soton.ac.uk
General Information

Requirements: Participants must have access to a computer running macOS, Linux, or Windows (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below).

Accessibility:

We are dedicated to providing a positive and accessible learning environment for all. Please get in touch if you require any accommodations or if there is anything we can do to make this lesson more accessible to you.

Contact: Please email rsginfo@soton.ac.uk for more information.


Surveys

Please be sure to complete this survey after the lesson.

When the survey asks for a date, enter the date you started the materials.

Post-Lesson Survey


Lesson Outline

Before you can analyze data you need to clean it. Data cleaning identifies errors and corrects formatting to create consistent data. This step requires care and attention, because without clean data the results of an analysis may be wrong and non-reproducible. OpenRefine is a powerful, free, open-source tool for working with messy data: cleaning it and transforming it from one format into another. This lesson will teach you to use OpenRefine to clean and format data effectively and to automatically track any changes that you make. Many people comment that this tool saves them months of work they would otherwise spend making these edits by hand.


Schedule

1. Introduction
   How is OpenRefine useful?
2. Opening and Exploring Data
   How can we import our data into OpenRefine?
   How can we summarise our data?
   How can we find errors in our data?
   How can we edit data to fix errors?
   How can we convert column data from one data type to another?
3. Transforming Data
   How can we transform our data to correct errors?
4. Filtering and Sorting Data
   How can we select only a subset of our data to work with?
   How can we sort our data?
5. Exporting Data Cleaning Steps
   How can we document the data-cleaning steps we’ve applied to our data?
   How can we apply these steps to additional data sets?
6. Exporting and Saving Data
   How can we save and export our cleaned data from OpenRefine?
7. Further Resources on OpenRefine
   What other resources are available for working with OpenRefine?
8. Survey
Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.


Setup

To participate in this lesson, you will need access to software as described below. In addition, you will need an up-to-date web browser.

The instructions for all the software can be found on the setup page.