Whilst at the Forum, we took the opportunity to meet with colleagues at the OECD to discuss various topics of mutual interest, including the new OECD Data Portal Application Programming Interface (with SDMX export functionality) and the opening up of OECD data. We were also able to attend the Statistical Information Systems Collaboration Community meeting face to face, where we discussed each Member organisation’s migration to a new version of DotStat (see the UKDS.Stat blog post for further info!).
Metadata issues
Big data also present metadata challenges. Variables may not be labelled, or the variables may not match the documentation (if there is any). Similarly, if categorical variables are included, they may not have value labels, and there may be no explanation of derived data. The existence of good metadata is paramount; it is needed to explain the content of data, its provenance and context. For example, the term ‘employment’ in one dataset may not mean the same as that term in another dataset. It is necessary to understand whether different employment coding frames have been used and whether they can be matched between the two datasets. Metadata also allow a researcher to ‘match’ multiple disparate sources. An example illustrating the importance of metadata for big data is the Energy Demand Research Project: Early Smart Meter Trials, 2007-2010 (study number 7591), which covers 14,621 households, 246m gas meter readings and 413m electricity meter readings, all collected at half-hourly intervals.
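The kinds of metadata gaps described above can be checked for mechanically. The following is a minimal sketch in Python, not a description of any tool used by the Service: the function name `audit_metadata`, the codebook layout and the example variables (`hh_id`, `employment`, `tariff`, `region`) are all assumptions made up for illustration.

```python
def audit_metadata(columns, observed_codes, codebook):
    """Return a list of human-readable metadata problems.

    columns        -- variable names found in the data file
    observed_codes -- {variable: set of category codes seen in the data}
    codebook       -- {variable: {"label": str, "values": {code: label}}}
    (All three structures are hypothetical, chosen for this sketch.)
    """
    problems = []
    for name in columns:
        entry = codebook.get(name)
        if entry is None:
            # Variable exists in the data but not in the documentation.
            problems.append(f"{name}: not in documentation")
            continue
        if not entry.get("label"):
            problems.append(f"{name}: no variable label")
        # Category codes that appear in the data but have no value label.
        unlabelled = observed_codes.get(name, set()) - set(entry.get("values", {}))
        if unlabelled:
            problems.append(f"{name}: no value labels for codes {sorted(unlabelled)}")
    for name in codebook:
        if name not in columns:
            # Variable is documented but absent from the data file.
            problems.append(f"{name}: documented but missing from data")
    return problems


# Made-up example data, for illustration only.
columns = ["hh_id", "employment", "tariff"]
observed_codes = {"employment": {1, 2, 9}}
codebook = {
    "hh_id": {"label": "Household identifier", "values": {}},
    "employment": {"label": "", "values": {1: "Employed", 2: "Unemployed"}},
    "region": {"label": "Region", "values": {}},
}

problems = audit_metadata(columns, observed_codes, codebook)
for problem in problems:
    print(problem)
```

A check like this only finds missing or mismatched labels. Deciding whether two ‘employment’ coding frames actually mean the same thing still needs a person to read the documentation.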