FAIR principle implementation

This page offers guidance and practical examples on how to make your data FAIR. The process begins with preparing and organizing your files, choosing preferred file formats, and describing the data set with essential metadata. For security and ethical reasons, researchers must assess which data can be published and which cannot. After publishing your data, it is advisable to link them to scientific articles and other research outputs: the data gain visibility and the articles gain credibility. You will also find useful links for improving your research data management knowledge and skills, along with good practice examples.

Preferred file formats

Choosing a preferred file format is crucial to ensure that your data remain readable in the future. Some file formats are more likely than others to allow long-term readability. Such formats are usually:

non-proprietary
open, with documented international standards
using standard character encoding, preferably Unicode (e.g. UTF-8)
uncompressed (space permitting)

Source
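As an illustration of the Unicode recommendation, here is a minimal Python sketch that re-saves a legacy text file as UTF-8. The function name and the example encoding are assumptions made for this sketch; the original encoding has to be known in advance, because it cannot be detected reliably from the file alone.

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Read a text file in a known legacy encoding and write it back as UTF-8."""
    text = src.read_text(encoding=source_encoding)
    dst.write_text(text, encoding="utf-8")
```

A call such as reencode_to_utf8(Path("old.txt"), Path("new.txt"), "latin-1") leaves the original file untouched, so the result can be compared with it before the old copy is discarded.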

Prepare your data

To increase the accessibility and reusability of spreadsheet data (i.e. large tables or raw data), they should adhere to the following best practices:

DO

Give each column a descriptive heading
Use a single header row
Ensure you have used the first cell, i.e. A1
Include a title and a legend to describe each spreadsheet
Save each data file with a name that appropriately reflects the content of that file
Deposit each table that is part of the dataset as a separate file
Deposit each worksheet as a separate file

DO NOT

Embed charts, comments or tables within a spreadsheet
Use color coding (machine-based data mining cannot interpret this)
Include special (i.e. non-alphanumeric) characters within the spreadsheet, including commas
Use merged cells
Deposit multiple worksheets within a spreadsheet (such as those used in Microsoft Excel), as these are not supported by CSV and TAB formats

Source

Spreadsheets should be deposited in CSV or TAB format, except when the spreadsheet contains variable labels, code labels, or defined missing values. Such files should be deposited in SAV, SAS or POR format, with the variables defined in English.
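Some of the DO and DO NOT rules above can be checked automatically before depositing a CSV file. The following Python sketch is one possible check, not an official validator: the function name and the wording of the messages are assumptions, and it covers only the header row and the row lengths.

```python
import csv
import io


def check_table(csv_text: str) -> list[str]:
    """Return a list of problems found in CSV text; an empty list means none found."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = [cell.strip() for cell in rows[0]]
    if not header or not header[0]:
        problems.append("cell A1 is empty")
    if any(not name for name in header):
        problems.append("blank column heading")
    if len(set(header)) != len(header):
        problems.append("duplicate column headings")

    # Rows that are longer or shorter than the header often point to
    # merged cells or stray commas in the original spreadsheet.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {number} has {len(row)} cells, expected {len(header)}"
            )
    return problems
```

The text of a file can be passed in with Path("table.csv").read_text(encoding="utf-8"), where table.csv is a placeholder name.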

More info

Clean data script for R

Describe and Document your data

To make your data and research replicable, you should document and describe your data and attach supporting documents alongside them. Data should be coded using variables (for example, numeric codes). Standard classifications commonly used by researchers in your field help here, because they make your data accessible and interoperable around the world. If existing classifications are not suitable for your data or could lead to inaccurate interpretations, describe your own approach in detail and add that description to the data set.

Documentation

To make your data easier to understand and replicate, it is essential to attach relevant documentation. The following types of documents are most frequently published alongside data (the list is not exhaustive):

methodology descriptions - a description of your research methodology and methodology towards data collection/production
codebooks - a technical description of the data that describes the variables and numbers used, the structure of the data sets and other contextual information, depending on the field
questionnaires - for survey data in particular, it is important to attach the questionnaire files
laboratory notebooks and experimental protocols - notes documenting your research
software-related documentation - for unusual or open source software, it is good practice to attach documentation and code
readme.txt - a file with the instructions to reproduce your analysis. See the sample Readme file
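A codebook can be started from the data file itself. The Python sketch below builds a skeleton with one entry per variable; the function name and the fields of each entry are assumptions chosen for illustration, and the description still has to be written by the researcher.

```python
import csv
import io


def codebook_skeleton(csv_text: str) -> list[dict]:
    """List each variable with its non-missing count and a few observed values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Empty strings and cells absent from short rows count as missing.
        values = [row[name] for row in rows if row[name] not in (None, "")]
        entries.append(
            {
                "variable": name,
                "non_missing": len(values),
                "distinct_values": sorted(set(values))[:5],
                "description": "",  # to be filled in by the researcher
            }
        )
    return entries
```

Listing the distinct values makes it easier to see which codes need an explanation, for example which number stands for which answer category.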

In large projects, researchers can also publish information on the taxonomies/ontologies used (if these are not yet publicly available), various kinds of mappings (when there are many data files), contextual information describing the project, policies related to the research topic, etc. Traditions for documentation and the additional materials required also differ between research fields.

Name and organize your files

The way you name and organise your files is key to keeping your research transparent and easy to manage, especially when you have several projects and a large amount of research data.

For file naming and organisation, follow these principles:

distinctive, human-readable title giving indication of the content
consistent pattern, preferably machine-friendly
organize files into directories with a consistent pattern
avoid repetition of semantic elements
file extension that matches the file format

Please ensure that all files are labelled clearly so readers will understand the contents of, and difference between, the files. For each file/group, we suggest you provide:

A single short title describing the content of the files
A more detailed legend describing each data set, so it is clear that the files are distinct and downloadable (including the explanation of any acronyms used in the dataset)
Collect all your data files and put them in order, numbering them from 01 onwards. Note that every number must take up two digits: 01, 02 ... 10, 12, 22, 34, etc.
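The numbering advice can be read as a request for zero-padded prefixes, which keep files in the intended order when they are sorted by name. The Python sketch below follows that reading; the function name and the underscore separator are assumptions.

```python
def numbered_names(titles: list[str]) -> list[str]:
    """Prefix each title with a zero-padded sequence number starting at 01."""
    # Use at least two digits, and more if there are 100 or more files.
    width = max(2, len(str(len(titles))))
    return [f"{index:0{width}d}_{title}" for index, title in enumerate(titles, start=1)]
```

Without the leading zero, a file numbered 10 would be listed before a file numbered 2 in most file browsers.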

Following good practice makes it much easier to find the right data file, not just for you, but also for your collaborators, and later on for other researchers who may re-use your data.

Some more tips for file naming:

Do not use spaces. Instead, use underscores (e.g. first_study), hyphens (e.g. first-study) or camel case (FirstStudy)
Avoid characters like \ / ? : * " > < | # % { } ^ [ ] ` ~ æÆ øØ åÅ äÄ öÖ …
Use the international date convention YYYY-MM-DD (e.g. 2017-10-25)
The name of a file in the original file format must be identical to the name of the corresponding file in the preferred file format
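The naming tips above can be applied consistently with a small helper. This Python sketch is one possible approach: the function name, the choice of underscores, and the placement of the date at the start of the name are assumptions, and accented letters such as å or ö are not handled.

```python
import re
from datetime import date

# Punctuation that the tips above advise against (assumed set for this sketch).
_UNSAFE = re.compile(r'[\\/?:*"“”<>|#%{}^\[\]`~]')


def safe_name(title: str, day: date, extension: str) -> str:
    """Build a file name of the form YYYY-MM-DD_title.ext without spaces."""
    stem = re.sub(r"\s+", "_", title.strip())  # spaces become underscores
    stem = _UNSAFE.sub("", stem)  # drop the discouraged characters
    return f"{day.isoformat()}_{stem}.{extension.lstrip('.')}"
```

For example, safe_name("first study: pilot?", date(2017, 10, 25), ".csv") returns "2017-10-25_first_study_pilot.csv".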

The way your files should be organized depends on the file type and the discipline. You should follow best practice recommendations within your field.

Metadata

Without dataset metadata, a catalogue of published data could not exist. Many open data portals include the necessary tools to create dataset metadata when publishing new data. Some open data portals automatically update the metadata when editing datasets. Each dataset you publish will include many of the following metadata elements.

RSU Dataverse also has minimum metadata requirements (see here).

Basic elements

Basic metadata elements provide the most important pieces of information to help visitors find data and determine if it is what they need. Many of these items will appear directly in catalogue navigation pages or search results.

Essential

Title: Human-readable name of the asset. Should be in plain English and include sufficient detail to facilitate search and discovery.

Description: Human-readable description (e.g., an abstract) with sufficient detail to enable a user to quickly understand whether the asset is of interest.

Author(s): main and other authors of the dataset, including their affiliation and ORCID number.

Contact: person(s) to be contacted in case of inquiries (e-mail).

Field of Science: main field of science according to the classification given.

Keywords: Tags (or keywords) help users discover your dataset; please include terms that would be used by technical and non-technical users. Controlled keyword vocabularies can also be included.

Publisher: For example, Riga Stradins University Dataverse

Unique identifier: DOI (This DOI will be assigned automatically when the dataset is published in the RSU Dataverse).

Public Access Level: The degree to which this dataset could be made publicly available, regardless of whether it has been made available. Choices in RSU Dataverse: open (anyone can access the data without restrictions); restricted, request access (access is restricted, but a request with a collaboration proposal can be submitted to the authors); restricted, no access (access is denied; files will be made available only through the contact person for the authors). If an embargo period is planned, it has to be noted and justified.

License: In RSU Dataverse, Public Domain status is attributed automatically.

Language: The language of the dataset.

Production date and production place: date on which and place where the dataset was produced (not published).

Date of collection: time period when the data were collected/generated.

Kind of data: for example, survey data or clinical data.

Version number: most recent date on which the dataset was changed, updated or modified, including information of main changes.

Optional

Grant information: funding organisation, project number/ID, as well as project title.

Time period covered: Period to which the data corresponds (especially for historical data)

Software: indication of the relevant software which should be used to open files.

Related materials and data sets: publications and other data sets which are related to this one (providing DOI or a link).
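Before depositing, a metadata record can be checked against the essential elements listed above. In the Python sketch below, the key names are hypothetical and are not the field names used by RSU Dataverse; the DOI and the license are left out because the text above says they are assigned automatically.

```python
# Hypothetical key names for the essential elements described above.
ESSENTIAL = [
    "title",
    "description",
    "authors",
    "contact",
    "field_of_science",
    "keywords",
    "publisher",
    "access_level",
    "language",
    "production_date",
    "production_place",
    "collection_period",
    "kind_of_data",
    "version",
]


def missing_elements(record: dict) -> list[str]:
    """Return the essential elements that are absent or empty in the record."""
    return [key for key in ESSENTIAL if not record.get(key)]
```

An empty list or empty string counts as missing, so a record with "keywords": [] is still reported as incomplete.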

Other

There are many other elements that can be addressed in metadata. To make your data set more findable and accessible, you can indicate the classifications, controlled vocabularies, taxonomies and ontologies you use, geographical data, etc., based on your research field and the metadata standards required. If you want to add more metadata for your data set in RSU Dataverse, write to dataversersu[pnkts]lv.

Data that cannot be shared

There are cases when it is impossible to publish your data. Always check with experienced colleagues or a data manager when your data contain personal data or sensitive information. Other restrictions may also apply, depending on your research topic, funding agreement or contracts with industry.

Data that cannot be shared:

personal data that can identify an individual
trade secrets
data covered by security rules
data protected by intellectual property rights
large data sets (not feasible to deposit)

In these cases researchers have to provide:

extensive metadata (not including confidential information)
justification of the restrictions
conditions to apply for access to the data

Source

If the data contain personal data, make sure to anonymize them properly. We suggest using the R anonymizer package. In special cases, sensitive information may also be coded and grouped to prevent identification of individuals.
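The coding and grouping mentioned above can be sketched in Python as follows. This is an illustration under stated assumptions, not a replacement for the R package: the function names, the 12-character code length and the 10-year bands are choices made for this sketch. Replacing identifiers with codes is pseudonymization rather than full anonymization, and the secret key must be stored separately from the data.

```python
import hashlib
import hmac


def pseudonym(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash code that is stable for one key."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def age_band(age: int, width: int = 10) -> str:
    """Group an exact age into a band such as 30-39."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"
```

The same identifier always maps to the same code under one key, so records belonging to one participant stay linked without exposing the original identifier.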

If you are unable to share your data for any reason not included here, or have additional questions about data sharing, please contact datukuratorirsu[pnkts]lv.

Link Your Datasets to Your Article

Sharing information about your data enables validation and transparency of scientific articles and their contents, and can lead to new cooperation initiatives.

When submitting your article to a journal, you are encouraged to link it to the data sets used to create it. Depending on the journal, this is usually done by indicating the unique identifier of the data set or other accession information. The most relevant journal databases support this as well. If your data cannot be made publicly accessible, you can still add the unique identifier and a statement explaining why access to your data is restricted.

Also, once your article is published, we strongly advise that you update your data set with the DOI for your article, which will be emailed to you upon article publication. Linking your data to your article will enable your data and article to be reciprocally connected, ensuring you receive credit for your work.
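When a DOI is copied between an article and a data set record, it helps to store it in one consistent form. The Python sketch below converts common spellings of a DOI into a resolvable https://doi.org/ link; the function name is an assumption, and the pattern is a loose check of the general DOI shape rather than a full validation.

```python
import re

# Loose pattern: "10.", a numeric registrant code, a slash, then a suffix.
_DOI = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_url(value: str) -> str:
    """Turn 'doi:10.x/y', a bare DOI, or a doi.org link into an https link."""
    doi = value.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not _DOI.match(doi):
        raise ValueError(f"not a DOI: {value!r}")
    return f"https://doi.org/{doi}"
```

The same helper can be used in both directions: for the data set DOI cited in the article, and for the article DOI added to the data set record.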

RSU researchers are invited to inform RSU Dataverse (dataversersu[pnkts]lv) about new articles or data sets related to their deposited data set, including after the data set has been deposited. The same applies to the Pure system, where new links to scientific articles can be added to already registered data sets.

Useful Links

General guidelines and training

Preparing data

Metadata