Get started

Find and use data for secondary research

Secondary analysis of existing data is common practice in many academic fields. Existing data can be used to conduct new research, to test hypotheses or replicate findings from previous studies.

Existing data can complement your own data collection and provide rich enhancements including demographic, temporal or geospatial layers. Using existing data can save time and effort in the data collection process.

Researchers can access existing clinical trials data for health research, including evidence synthesis, secondary analysis, reproducibility, replication and validation studies, or education and methods development.

Australia's Productivity Commission report Data Availability and Use highlights the benefits of existing data for research, policy, decision making and innovation. The Australian Research Data Commons (ARDC) provides case studies and examples of the many uses of existing data for research. The datasets available via this guide are good examples of FAIR data: they are findable, accessible, interoperable and reusable.

Use the discipline tabs to find existing data available publicly or to researchers by application and approval. Consider browsing outside your discipline as datasets from different scientific domains can be of value to diverse research projects.

For information on sources and tools for textual data, refer to the Text mining and analysis library guide.

For guidance on respectful and ethical access and use of data from research relating to First Nations peoples, refer to the Aboriginal and Torres Strait Islanders Research guide.

Evaluate data sources

Assessing secondary data is much like evaluating the quality of a research paper. Consider factors that relate to the reliability and validity of research results, such as whether:

the source is trusted
the sample characteristics, time of collection, and response rate (if relevant) of the data are appropriate
the methods of data collection are appropriate and acceptable in your discipline
the data were collected in a consistent way
any data coding or modification is appropriate and sufficient
the documentation of the original study in which the data were collected is detailed enough for you to assess its quality
there is enough information in the metadata or data to properly cite the original source.

A data reuser is often not familiar with the secondary data. Take some time to:

read user and technical manuals about how data collection was designed and carried out.
find out about any instruments used to collect the data.
read study protocols and interview/survey questions.
understand the characteristics of the sample from which the data were drawn.
find out if and how the data have been modified from their original form; e.g. have they been confidentialised, weighted, or treated for missing data?
find out what variables are included in the dataset and how these were constructed. (ARDC)
know the reuse restrictions for the data.
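
As a first pass at the checks above (which variables are present, and how much data is missing), a short script can profile a dataset before you read the documentation in detail. The following is a minimal sketch in Python using only the standard library. The sample data, the column names, the function name profile_columns and the missing-value markers ("" and "NA") are all hypothetical; the real codes for missing values should come from the dataset's own data dictionary.

```python
import csv
import io

# Hypothetical extract of a secondary dataset; "" and "NA" mark missing values.
SAMPLE_CSV = """respondent_id,age_group,state,weight
1,18-24,QLD,1.2
2,,NSW,0.8
3,25-34,QLD,NA
4,25-34,,1.1
"""

# Assumption: confirm the actual missing-value codes in the data dictionary.
MISSING_MARKERS = {"", "NA"}


def profile_columns(csv_text):
    """Return {column: {"missing": int, "distinct": sorted non-missing values}}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    profile = {name: {"missing": 0, "distinct": set()} for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            value = (value or "").strip()
            if value in MISSING_MARKERS:
                profile[name]["missing"] += 1
            else:
                profile[name]["distinct"].add(value)
    return {
        name: {"missing": info["missing"], "distinct": sorted(info["distinct"])}
        for name, info in profile.items()
    }


if __name__ == "__main__":
    for column, info in profile_columns(SAMPLE_CSV).items():
        print(f"{column}: {info['missing']} missing, {len(info['distinct'])} distinct values")
```

A profile like this only flags what to look into. Whether a gap reflects non-response, confidentialisation or a coding decision still has to be checked against the original study's documentation.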

More guidance on assessing existing data for secondary research is available from Rosinger & Ice (2019).

For advice on assessing data from research relating to First Nations peoples, including factors such as bias and diversity and complexity vs stereotypes and generalisation, refer to the Aboriginal and Torres Strait Islanders Research guide.

Use it Cite it

Published research data is cited in the same way as other scholarly outputs, with variation in styles and formats.

The Griffith University Library Referencing Guides illustrate how to cite data according to common styles.
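
If you keep dataset metadata in a structured form, a citation string can be assembled from it. The sketch below is illustrative only: it follows a generic "Creator (Year). Title. Version. Publisher. Identifier" pattern, not any specific referencing style, and the function name format_data_citation, the example dataset and the DOI are hypothetical placeholders. Always check the output against the style required by your discipline or publisher.

```python
def format_data_citation(creators, year, title, publisher, identifier, version=None):
    """Build a generic 'Creator (Year). Title. Version. Publisher. Identifier' string.

    This is an illustrative pattern, not an official referencing style.
    """
    parts = [f"{'; '.join(creators)} ({year})", title]
    if version:
        parts.append(f"Version {version}")
    parts.extend([publisher, identifier])
    # Strip trailing full stops so joining with ". " never doubles them.
    return ". ".join(part.rstrip(".") for part in parts)


if __name__ == "__main__":
    # Hypothetical metadata; the DOI below is a placeholder, not a real record.
    print(
        format_data_citation(
            creators=["Example, A.", "Sample, B."],
            year=2020,
            title="Hypothetical household survey",
            publisher="Example Data Archive",
            identifier="https://doi.org/10.0000/example",
            version="2",
        )
    )
```

Building the citation from metadata fields also makes gaps obvious: if a creator, year or persistent identifier is missing, the dataset may not have enough information to be cited properly.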

Ethics, copyright, licences, ownership

The same professional and ethical treatment is required for research that uses existing or new data. See Griffith University's Responsible Conduct of Research Policy for further details. In many cases, access to external data is by application and is reviewed by the custodian organisation, which may assess the validity of the research project proposal and ethics approvals. Explore The Data Science Ethos tool for ways to apply an ethics-centred approach to research projects.

Responsibilities for the Secondary Sharing of Clinical Trial Data in Australia (draft) guides researchers on legal and ethical responsibilities governing the secondary use of clinical trial data in Australia.

Most external datasets come with terms and conditions, which may include: use for research purposes only, statistical training or user competency, strict methods of de-identification, secure storage and destruction, attribution, and submission of findings or publications for review or approval.

Whilst raw data is not protected under Australian copyright law, the presentation of data, including images, tables or database structures, can still be protected by copyright and may require express permission before it is reproduced in a publication. Contact Griffith's Copyright and Information Policy Officer for advice on how to comply with copyright when reusing secondary data. See the Australian Bureau of Statistics (ABS) copyright example.

Some data may be available under Creative Commons or other open licences, which provide clear guidance on how the data can be used and attributed. Most data from data.qld.gov.au is available under an open licence.

Access to existing data can also be provided via contractual agreements. Seek advice from the Office for Research on appropriate or standard terms and conditions.

Data analytics, visualisation techniques and computational methods

Training, specialist advice and support

Griffith University digital skills self-paced tutorials
See the Collect and analyse tab under Workshops for self-paced tutorials on LimeSurvey, REDCap at Griffith, Introduction to data wrangling with OpenRefine, and Advanced data wrangling with OpenRefine.

Griffith University data workshops for researchers via RED
Training includes managing data, data cleaning and processing, data analysis, visualisation methods and tools, survey tools and more.

eResearch Services at Griffith University
Specialist IT services for researchers including high performance computing, research data storage, data collection tools, and programming workshops.

Hacky Hour
Get online help from eResearch & Library staff with OpenRefine, R, Python, SQL and Bash coding. Practice your new programming skills in a supportive environment. Learn about HPC or virtual machines, what is available and how to use them, and catch up with other researchers learning to wrangle their data. Access the online sessions each Thursday via the calendar.

External training
Programming Historian - novice-friendly, peer-reviewed online tutorials that help humanists learn a wide range of digital tools, techniques, and workflows to facilitate research and teaching. Available in English, Spanish and French.

Ebooks

Data Mining and Learning Analytics : applications in educational research by Samira ElAtia (Editor) et al.
Publication Date: 2016
Big data for qualitative research by Kathy Mills
Publication Date: 2019
Storytelling with data: let's practice! by Cole Nussbaumer Knaflic and Catherine Madden
Publication Date: 2020
Practical statistics for data scientists: 50 essential concepts by Peter Bruce and Andrew Bruce
Publication Date: 2017
Trends of data science and applications: theory and practices by Siddharth Swarup Rautaray, Phani Pemmaraju and Hrushikesha Mohanty
Publication Date: 2021
Encyclopedia of data science and machine learning by John Wang
Publication Date: 2022
Practical Python data wrangling and data quality : Getting started with reading, cleaning, and analyzing data by Susan McGregor
Publication Date: 2021
Why external data needs to be part of your data and analytics strategy by Joseph Stec
Publication Date: 2022
R cookbook : proven recipes for data analysis, statistics, and graphics by JD Long and Paul Teetor
Publication Date: 2019
Python for data analysis: data wrangling with Pandas, NumPy and Jupyter by Wes McKinney
Publication Date: 2022
Twitter data analytics by Axel Bruns, Katrin Weller, and Dirk Lewandowski
Publication Date: 2014