OpenRefine is a free and open-source tool for performing mass transformations on tabular data.
OpenRefine was originally developed as proprietary software by Metaweb under the name "Freebase Gridworks." It was later renamed "Google Refine" following Google's acquisition of Metaweb in 2010, at which point Google made its source code open under the "modified" BSD License. The software was renamed "OpenRefine" in 2012 when it transitioned to a community-supported project.
The source code for OpenRefine is available on GitHub, and the project continues to receive support from the Google News Initiative.
Help students explore complex data
While OpenRefine is not spreadsheet software, its interface resembles a spreadsheet, so students can investigate complex data in a familiar setting.
Teach common data-wrangling techniques and best practices
OpenRefine's features include the ability to perform common data-cleaning actions with one click, such as removing leading and trailing whitespace or splitting multiple values stored in one cell or column into multiple cells or columns. These, along with the software's more advanced features, give instructors an opportunity to introduce common data-wrangling practices and concepts such as tidy data.
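For instructors who want to show what these one-click actions do under the hood, the following is a minimal Python sketch of trimming whitespace and splitting a multi-valued cell into separate rows. It is not OpenRefine's implementation; the sample rows, the column names, and the helper names `trim_whitespace` and `split_multivalued` are assumptions made up for illustration.

```python
# Hypothetical sample data; the column names are invented for this sketch.
rows = [
    {"name": "  Ada Lovelace ", "languages": "English; French"},
    {"name": "Alan Turing", "languages": "English"},
]


def trim_whitespace(rows, column):
    """Return copies of the rows with leading/trailing whitespace removed from one column."""
    return [{**row, column: row[column].strip()} for row in rows]


def split_multivalued(rows, column, separator=";"):
    """Return one output row per value found in a multi-valued cell."""
    result = []
    for row in rows:
        for value in row[column].split(separator):
            result.append({**row, column: value.strip()})
    return result


cleaned = split_multivalued(trim_whitespace(rows, "name"), "languages")
for row in cleaned:
    print(row)
```

After both steps, each row holds a single observation per cell, which is one of the conditions usually associated with tidy data.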
Teach concepts of version control and reproducibility
OpenRefine tracks all changes made to data, allowing users to see and reproduce every step taken to transform the data from its original "raw" form. This feature of the software offers instructors the chance to introduce concepts of file versioning and reproducibility.
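The idea of a replayable change history can be illustrated with a short Python sketch. This is only a teaching aid under stated assumptions: the `history` list, the `replay` function, and the sample data are hypothetical and do not reflect the format OpenRefine itself uses to record operations.

```python
# Hypothetical raw data; it is never modified in place.
raw = [{"city": " boston "}, {"city": "CHICAGO"}]

# Each step pairs a human-readable description with a row-level transformation.
history = [
    ("trim whitespace in 'city'", lambda row: {**row, "city": row["city"].strip()}),
    ("title-case 'city'", lambda row: {**row, "city": row["city"].title()}),
]


def replay(raw_rows, steps):
    """Rebuild the cleaned data by applying the recorded steps, in order, to the raw rows."""
    rows = [dict(row) for row in raw_rows]
    for _description, transform in steps:
        rows = [transform(row) for row in rows]
    return rows


result = replay(raw, history)
after_first_step = replay(raw, history[:1])  # "undo" the last step by replaying fewer steps
print(result)
```

Because the raw rows are kept untouched and every change is recorded as a step, anyone holding the raw data and the history can reproduce the cleaned result, or stop at any intermediate state.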
How to use it
In addition to allowing users to perform simple tasks such as removing duplicate rows, fixing typos everywhere they occur, or transforming textual representations of currency or dates into appropriate numeric values or standard date formats, OpenRefine has a few more powerful capabilities:
1. Clustering: condense all variations and misspellings of a value into a single entity.
2. Filtering/Sorting: view only a subset of the data that matches particular criteria, make changes to matching rows, or perform custom sorts by multiple columns.
3. Faceting: view the distribution of values across a dataset by "faceting" on a column (counts unique values within a column or visualizes the distribution of values within a column).
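The faceting and clustering ideas described above can be sketched in a few lines of Python. This is a simplified illustration, not OpenRefine's code: the `fingerprint` function is a rough approximation of a key-collision approach (normalize case, strip punctuation, sort the unique tokens), and the sample values and function names are assumptions.

```python
import string
from collections import Counter, defaultdict

# Hypothetical column values containing inconsistent spellings.
values = ["New York", "new york ", "York, New", "Boston", "boston", "Chicago"]


def facet(values):
    """Count how often each distinct value occurs in a column."""
    return Counter(values)


def fingerprint(value):
    """Build a normalized key so that variants of the same value collide."""
    cleaned = value.strip().lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a fingerprint; keep only groups with more than one variant."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


counts = facet(values)
clusters = cluster(values)
print(counts)
print(clusters)
```

A facet on the raw values treats "Boston" and "boston" as different entries, which is often how students first notice that a column needs cleaning; clustering then proposes which variants could be merged into one value.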
- Cleaning Data with OpenRefine, by Seth van Hooland, Ruben Verborgh, and Max De Wilde on The Programming Historian
- OpenRefine for Social Science Data, on Data Carpentry
OpenRefine is an alternative to commercial software for tabular data transformation, such as Microsoft Excel, Apple Numbers, or Trifacta Wrangler. It also offers an interactive user interface in place of command-line or scripting approaches to data transformation.