Exporting and Saving Data
Overview
Teaching: 10 min
Exercises: 5 min
Questions
How can we save and export our cleaned data from OpenRefine?
Objectives
Save an OpenRefine project.
Export cleaned data from an OpenRefine project.
OpenRefine automatically saves your project as you work, so you don’t need to worry about saving. It saves projects in a slightly obscure and hidden location (check where OpenRefine stores project data for different operating systems), most likely to prevent users from accidentally tampering with them. You can, however, export the “OpenRefine project” to a location of your choice, which will package together the data and all the information about the cleaning and data transformation steps you’ve performed. You can then use the exported project to transfer your work to another computer, share the project with collaborators, archive it with your research, etc.
Exporting a project
- Click the Export button in the top right and select OpenRefine project archive to file. You might be moved to a new blank tab in your browser while this function is executing.
- A .tar.gz file will download to your default Download directory. The tar.gz extension tells you that this is a compressed archive, which means that it bundles multiple files into one (the most common version of this file type is .zip, which you may be aware of).
- You can share the .tar.gz file with collaborators, or copy it to a different computer and import the project back into OpenRefine (say, to do further work at home). Note: sharing of this type is best performed via version control software, such as Git.
If you wish to investigate the files, you can double-click on the tar.gz file and it will expand into a directory
(this process can be more complicated in Windows - see below for details). A folder icon will now appear. Investigate
the files in this folder. What files appear? What information do you think they contain?
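If you would rather look inside the archive without unpacking it, Python's standard tarfile module can list its entries. The following is a minimal sketch, not part of OpenRefine itself: the function name list_archive_members and the file name in the usage comment are hypothetical placeholders.

```python
import tarfile


def list_archive_members(archive_path):
    """Return the names of all entries stored in a .tar.gz archive."""
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()


# Example usage (hypothetical file name; replace with your exported project):
# for name in list_archive_members("my-project.openrefine.tar.gz"):
#     print(name)
```

Listing the names first is a low-risk way to answer the questions above before deciding whether to extract anything.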
Opening a .tar.gz file on Windows:
- You may require additional software such as 7-zip or WinZip. Download and run the installer of your choice.
- Double-click the exported tar.gz file. If Windows asks how you want to open the file, check the “Always use this app to open .gz files” box, then select “More apps”.
- If your chosen application is not listed, select ‘Look for another app on this PC’.
- In the file browser, navigate to C:\Program Files, find the application you installed, and double-click on its executable (7zFM, for example).
Once you open the .tar.gz OpenRefine project file, you should see:
- a history folder which contains three zip files. Each of these contains a change.txt file, which lists each of the individual transformations that you performed on your data.
- a data.zip file. When expanded, this zip file includes a file called data.txt which is a copy of your raw data.
You may also see other files.
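Because the project archive holds zip files inside a .tar.gz, reading something like data.txt means opening two layers. The sketch below shows one way to do that in memory with Python's standard library. It assumes the layout described above (a data.zip entry containing data.txt); the function name and the file name in the usage comment are hypothetical, and the content is returned as raw bytes because the internal format of data.txt is not covered here.

```python
import io
import tarfile
import zipfile


def read_member_from_nested_zip(archive_path, zip_name, inner_name):
    """Return the bytes of inner_name from a zip file stored inside a .tar.gz."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Compare only the last path component, since the zip file
            # may sit inside a subfolder of the archive.
            if member.isfile() and member.name.split("/")[-1] == zip_name:
                zip_bytes = archive.extractfile(member).read()
                with zipfile.ZipFile(io.BytesIO(zip_bytes)) as inner_zip:
                    return inner_zip.read(inner_name)
    raise FileNotFoundError(f"{zip_name} not found in {archive_path}")


# Example usage (hypothetical file name; replace with your exported project):
# raw = read_member_from_nested_zip(
#     "my-project.openrefine.tar.gz", "data.zip", "data.txt"
# )
# print(raw[:200])
```

Nothing is written to disk, so the exported archive stays untouched while you inspect it.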
Importing a new project
You can import an exported project into OpenRefine as follows:
- Click Open... in the top-right of the screen, which will take you back to OpenRefine’s main interface.
- Select Import Project from the left-hand panel, click Choose files, and navigate to the .tar.gz file in the window that opens. Click the file and select Open (or just double-click the file).
- The project will open. It includes all of the raw data and the cleaning steps that were part of the original project.
Opening an existing project
When you open OpenRefine (or navigate to http://localhost:3333/ from an already open project),
you will see a list of projects already saved on your machine that you have created or imported earlier.
You can click on any one of them to open it and continue working on it in OpenRefine.
Exporting Cleaned Data
You can also export just your cleaned data from an OpenRefine project (as opposed to the whole project) if you wish
to save it in a form better suited to further analysis and usable by other programs. For example, you might wish
to save the data into a .csv file so that you can conduct further analysis using Python or R.
- Click Export in the top right and select the file type you want to export. Comma-separated values (csv) is a good choice, because it’s a file type that can be read by most other data analysis programs. Tab-separated values (tsv) is also a popular format.
- The cleaned data will be exported to your default Download directory.
Using widely-supported, open, static and non-proprietary file formats like .csv or .tsv makes it easier for yourself and others
to use your data in the future.
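As an illustration of how portable these formats are, the sketch below reads either a .csv or a .tsv export using only Python's standard csv module. It assumes the file is UTF-8 encoded and that its first row holds the column names; the function name read_rows and the file names in the usage comment are hypothetical.

```python
import csv


def read_rows(path, delimiter=","):
    """Read a delimited text file into a list of dictionaries keyed by header."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


# Example usage (hypothetical file names; replace with your exported files):
# csv_rows = read_rows("cleaned-data.csv")
# tsv_rows = read_rows("cleaned-data.tsv", delimiter="\t")
```

The same function covers both formats because the only difference between them is the delimiter character.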
Key Points
Cleaned data or entire projects can be exported from OpenRefine.
Projects can be shared with collaborators, enabling them to see, reproduce and check all data cleaning steps you performed.