Implement the TextRank algorithm in TypeScript
The TextRank algorithm was introduced by Rada Mihalcea and Paul Tarau in their 2004 paper “TextRank: Bringing Order into Texts”. It applies the same principle that Google’s PageRank uses to rank web pages by relevance.
The idea is to split a text into sentences and then score each sentence by its similarity to the other sentences. TextRank treats the words shared by two sentences as a link between them (like a hyperlink between web pages) and weights that link by how many words the sentences have in common. ts-textrank uses Sørensen-Dice similarity for this weight.
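As a rough illustration of the link weight, the Sørensen-Dice coefficient of two word sets A and B is 2 * |A ∩ B| / (|A| + |B|). The sketch below is not taken from ts-textrank; the names tokenize and diceSimilarity, and the simple letter-and-digit tokenizer, are assumptions made for this example.

```typescript
// Hypothetical helper (not part of ts-textrank): lowercase a sentence
// and collect its distinct words. Only ASCII letters and digits count
// as word characters in this simplified tokenizer.
function tokenize(sentence: string): Set<string> {
  const words = sentence.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return new Set(words);
}

// Sørensen-Dice coefficient over word sets: 2 * |A ∩ B| / (|A| + |B|).
// Returns a value from 0 (no shared words) to 1 (identical word sets).
function diceSimilarity(a: string, b: string): number {
  const wordsA = tokenize(a);
  const wordsB = tokenize(b);
  // Two empty sentences have nothing in common; avoid dividing by zero.
  if (wordsA.size + wordsB.size === 0) return 0;
  let shared = 0;
  wordsA.forEach((word) => {
    if (wordsB.has(word)) shared++;
  });
  return (2 * shared) / (wordsA.size + wordsB.size);
}
```

For example, "the cat sat" and "the cat ran" share two of their three distinct words each, so the weight of the link between them would be 2 * 2 / (3 + 3), roughly 0.67.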
The highest-scoring sentences are those that share the most words with the rest of the text, so they can serve as a summary of the whole text.
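The scoring step can be sketched as the weighted PageRank update from the TextRank paper, WS(i) = (1 - d) + d * Σ_j (w_ji / Σ_k w_jk) * WS(j), applied to a matrix of pairwise sentence similarities. This is a minimal sketch, not ts-textrank's actual code: the function names, the fixed iteration count, and the starting score of 1 are assumptions, while the damping factor d = 0.85 is the value suggested in the paper.

```typescript
// Sketch of the TextRank scoring loop over a precomputed similarity
// matrix, where weights[i][j] is the similarity of sentences i and j.
// Hypothetical function, not part of ts-textrank.
function rankSentences(
  weights: number[][],
  damping = 0.85, // damping factor suggested in the TextRank paper
  iterations = 50 // assumption: fixed count instead of a convergence test
): number[] {
  const n = weights.length;
  // Total outgoing link weight of each sentence, ignoring self-links.
  const outSum = weights.map((row, i) =>
    row.reduce((sum, w, j) => (i === j ? sum : sum + w), 0)
  );
  let scores: number[] = new Array<number>(n).fill(1);
  for (let step = 0; step < iterations; step++) {
    // Each sentence collects a share of its neighbours' previous scores,
    // proportional to the weight of the link that connects them.
    scores = scores.map((_, i) => {
      let incoming = 0;
      for (let j = 0; j < n; j++) {
        if (j !== i && outSum[j] > 0) {
          incoming += (weights[j][i] / outSum[j]) * scores[j];
        }
      }
      return 1 - damping + damping * incoming;
    });
  }
  return scores;
}

// Indices of the k highest-scoring sentences, returned in document order
// so that the summary reads in the same order as the original text.
function topSentences(scores: number[], k: number): number[] {
  return scores
    .map((score, index) => ({ score, index }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.index)
    .sort((a, b) => a - b);
}
```

With three sentences where the first shares words with both of the others and those two share nothing with each other, the first sentence ends up with the highest score and would be picked as a one-sentence summary.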
ts-textrank implements the TextRank algorithm in TypeScript and is published as an npm module.