About Constellate

We’ll be using resources created by ITHAKA to build our dataset and run text analysis on it. If you’re unfamiliar with ITHAKA, it is the parent organization of JSTOR, which you have probably heard of.

ITHAKA has created a platform called Constellate that lets you build datasets from their materials, as well as from some other collections, and run text analysis on them. As of this writing, that’s more than 32 million documents! Constellate’s text analysis tools are still in a kind of beta, and we’re one of the groups that get to experiment with them.

What will I be searching?

The text mining platform includes all of JSTOR plus the content from those Portico publishers who choose to participate (currently, 30 publishers including John Wiley & Sons, Inc., Project Muse, Thieme Publishing Group, and Hindawi).
The platform includes additional content from Chronicling America (historic newspapers), DocSouth (materials from and about the American South), Reveal Digital (primary source materials from underrepresented voices), and others. Discussions with more content providers are underway.

Note: Northeastern has an agreement in place with ITHAKA, so we have permission to use these materials.

