
Text mining and analysis

A curated list of licensed and open text mining resources and tools.

Tools

The following tools and platforms can assist you with analysing texts and creating visualisations. Identify the text analysis methods and the level of coding skill required for your research project when deciding which tool to use. A number of cloud-based platforms are available to researchers, where you can access data or upload your own, then process and analyse text using inbuilt tools and programming.


Commercially Licensed Tools and Software

Each entry below includes a description, training options and Griffith access information.
Digital Scholar Lab (Gale)

A one-stop virtual research environment to clean and text mine content from Griffith University’s Gale primary source holdings or from a user’s uploaded plain text files. Sign in with your Griffith Microsoft credentials to build, clean and analyse a content set. The tools available in the Digital Scholar Lab are plug and play, with analysis options for document clustering, named entity recognition, ngrams, parts of speech, sentiment analysis and topic modelling.

Try our self-paced Digital Scholar Lab and Gale Primary Sources tutorial.

Leximancer

Leximancer uses statistics-based algorithms to automatically analyse text and create concept maps, network clouds and concept thesauruses that visualise the analysis outputs. Concepts and their interrelationships can be identified without the need for manual intervention.

Available through Griffith’s software download service.

Researcher Education and Development offer Leximancer training.

NVivo

NVivo is a qualitative data analysis software program used to organise and analyse in depth qualitative data such as interviews, open-ended survey responses, journal articles, social media and web content. The researcher codes the data and develops themes and categories during the analysis process.

Available through Griffith’s software download service.

Researcher Education and Development offer access to NVivo training.


Free and Open Source Tools

Each entry below includes a description and training information.
Constellate

A free text and data analysis platform from JSTOR. While it does not require your Griffith login, signing in allows you to save and build larger datasets. Log in via the dashboard.

Cytoscape

An open-source bioinformatics and network visualisation tool. Primarily designed for scientists examining the statistical measures of networks, it is increasingly used by digital humanists to view and analyse networks in humanities data.

Try these Cytoscape tutorials.

Gephi

An open-source network data analysis and visualisation tool used to explore and visualise all kinds of graphs and networks.

Learn how to use Gephi.

InfraNodus

A text network analysis tool that can be used for topic modelling, detecting the main topics or influential terms in any text. Programming knowledge is needed to install the free open-source version.

Learn how to use InfraNodus.

Language Technology and Data Analysis Laboratory (LADAL)

A free, open-source, collaborative support infrastructure for digital and computational humanities established at the University of Queensland. Researchers can undertake data processing, visualisation and analysis with guidance on matters relating to language technology and digital research tools. LADAL offers introductions to topics and concepts related to digital and computational humanities, online tutorials, interactive Jupyter notebooks, and events including workshops and webinar series.

Natural Language Toolkit (NLTK)

A free, open-source platform for building Python programs that work with human language data. It is suitable for linguists, engineers and researchers who want to work in computational linguistics using Python, and contains text processing libraries for tokenisation, parsing, classification, stemming, tagging and semantic reasoning.

Python coding skills needed, see Support and Training.

Learn to use Python's NLTK.
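
As a rough illustration of the kind of script NLTK supports, the sketch below tokenises a short placeholder sentence, tags parts of speech and stems each word. The sample text is illustrative only, and the resource names passed to nltk.download may vary slightly between NLTK versions.

```python
# Minimal NLTK sketch: tokenise, part-of-speech tag and stem a sample sentence.
# Assumes NLTK is installed (pip install nltk); downloaded resource names may
# differ between NLTK versions.
import nltk
from nltk.stem import PorterStemmer

nltk.download("punkt")                       # tokeniser models
nltk.download("averaged_perceptron_tagger")  # part-of-speech tagger

text = "Researchers are mining large collections of digitised texts."

tokens = nltk.word_tokenize(text)                  # split the sentence into word tokens
tagged = nltk.pos_tag(tokens)                      # label each token with a part of speech
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]  # reduce each token to its stem

print(tagged)
print(stems)
```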

Orange

A free, open-source graphical user interface for data analysis and visualisation built on Python. The interface is intuitive and visual, allowing you to drag and drop widgets and connect them to create your data analysis workflows.

Download and get started.

scikit-learn

A Python library containing a wide range of machine learning algorithms. It also provides easy-to-use tools to perform tokenisation and feature extraction on your text data.

Python coding skills needed, see Support and Training.
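
As a rough sketch of how scikit-learn's feature extraction works, the example below turns a few placeholder documents into a bag-of-words matrix with CountVectorizer; the documents and settings are illustrative only.

```python
# Minimal scikit-learn sketch: tokenise a few sample documents and extract
# bag-of-words features. Assumes scikit-learn is installed (pip install scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "Text mining finds patterns in large collections of documents.",
    "Topic modelling groups documents by the themes they share.",
    "Machine learning algorithms can classify and cluster texts.",
]

vectorizer = CountVectorizer(stop_words="english")  # tokenise and drop common English stop words
matrix = vectorizer.fit_transform(documents)        # one row of term counts per document

print(vectorizer.get_feature_names_out())  # the extracted vocabulary
print(matrix.toarray())                    # term counts for each document
```

The resulting matrix can then be passed to the library's clustering or classification algorithms.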

spaCy

A free, open-source library for Natural Language Processing (NLP) in Python. Useful for cleaning text, that is, creating corpora, tokenising, lemmatising and removing stop words, and for analysing text data. It is built for production use and provides a concise and user-friendly API.

Python coding skills needed, see Support and Training.

Learn to use spaCy.
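
As a rough sketch of the cleaning steps described above, the example below uses spaCy's small English model (downloaded separately) to tokenise a placeholder sentence, lemmatise each token and drop stop words and punctuation.

```python
# Minimal spaCy sketch: tokenise, lemmatise and remove stop words and punctuation.
# Assumes spaCy is installed and the small English model has been downloaded:
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

text = "The researchers were analysing thousands of digitised newspaper articles."
doc = nlp(text)  # tokenisation, tagging and lemmatisation run in the pipeline

# Keep the lemma of each token that is not a stop word or punctuation.
cleaned = [token.lemma_ for token in doc if not token.is_stop and not token.is_punct]

print(cleaned)
```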

Text Analyser 

JSTOR Labs provides this text analyser tool in beta. Drag and drop files or copy and paste text directly to identify key topics and terms. Find similar content in JSTOR with terms the tool has prioritised from uploaded content.

Learn to use Text Analyser.

VOSviewer

Used to construct and visualise bibliometric networks based on citation, bibliographic coupling, co-citation or co-authorship relations. The text mining functionality can be used to construct and visualise co-occurrence networks of key terms from a body of scientific literature.

Get started.

Voyant

An open-source, web-based, plug-and-play environment for simple text analysis and visualisation. It accepts multiple file formats and URLs, or text can be copied and pasted directly into the text box. Choose from more than 20 tools to conduct analyses such as term frequencies, collocation, topic modelling and concordances, and produce visualisations such as word clouds, network and line graphs, or tables.

The source code is available through GitHub. Do not use personal, confidential or sensitive data in Voyant Tools.

Learn how to use Voyant tools.