Text Analysis Project (TAP)

The ITPS leads a multidisciplinary project with the Computer Science, English, and History departments that develops novel methodologies for (semi-)automated, software-based identification of the creator(s) of historical documents whose authorship is unknown or disputed. The project uses advanced natural language processing and machine learning techniques to identify and learn the writing styles of known eighteenth-century authors. It then compares the style of an unattributed document’s writer with the known authors’ styles to identify a potential match. The project has clarified much of the Paine Canon and contributed numerous new works to it, while also advancing the methodology of computational author attribution. The project recently began widening its scope beyond Thomas Paine to cover a broader corpus of late eighteenth-century writers, with particular attention to newspaper publications of the 1790s.
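For readers who want a concrete picture of what "learning a style and comparing it" can mean, here is a minimal Python sketch. It is not TAP's or JGAAP's actual method: the function-word list, the function names, and the choice of cosine distance are all assumptions made for illustration. Each known author is reduced to a vector of relative function-word frequencies, and the authors are then ranked by how close their vectors are to that of the unattributed text.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of common English function words.
# A real attribution system would use far richer features.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "is", "by", "which"]


def style_vector(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity (0 means identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # no usable features: treat as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def rank_authors(known_samples, unattributed_text):
    """Rank authors from closest to farthest style, as (author, distance) pairs."""
    target = style_vector(unattributed_text)
    scores = {
        author: cosine_distance(style_vector(sample), target)
        for author, sample in known_samples.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])
```

Calling `rank_authors({"Author A": sample_a, "Author B": sample_b}, mystery_text)` returns the candidate authors ordered by distance. The closest author is only a potential match, to be weighed alongside historical evidence.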

To learn more about the project and how to use the TAP software, watch the instructional videos below, produced by ITPS Coordinator Gary Berton, which explain the process and methodology of TAP in detail. Special thanks to Dr. Smiljana Petrovic and Dr. Lubomir Ivanov of Iona University and to Iona alumnus Sean Campbell for adapting the Java Graphical Authorship Attribution Program (JGAAP) to develop TAP and for maintaining the project’s source code.

Access the TAP files here: https://github.com/ionacollege/itps-tap/tree/master/jgaap-6.0.0

TAP Instructional Video 1: Creating Author Files

ITPS Coordinator Gary Berton Discusses TAP Methodology, Creating Author Files

TAP Instructional Video 2: Generating Author Packages

ITPS Coordinator Gary Berton Discusses TAP Methodology, Generating Author Packages

TAP Instructional Video 3: Testing Documents

ITPS Coordinator Gary Berton Discusses TAP Methodology, Testing Documents

TAP Instructional Video 4: Analyzing Graphs

ITPS Coordinator Gary Berton Discusses TAP Methodology, Analyzing Graphs