Interface KeywordExtractor
-
- All Implemented Interfaces:
public interface KeywordExtractor
Implementations can extract keywords from text
-
-
Method Summary
Modifier and Type | Method | Description
abstract Set<String> | extractKeywords(String text) | Extract keywords from the given text
Double | matchCountToScore(Integer matchCount) | Converts a match count to a similarity score between 0 and 1.
abstract Set<String> | getKeywords() | All known keywords
-
Method Detail
-
extractKeywords
abstract Set<String> extractKeywords(String text)
Extract keywords from the given text
- Parameters:
text - the text to extract keywords from
- Returns:
the set of extracted keywords
-
matchCountToScore
Double matchCountToScore(Integer matchCount)
Converts a match count to a similarity score between 0 and 1.
This default implementation raises the match ratio to the power 0.4, a nonlinear scoring that is generous to partial matches while still giving diminishing returns as more keywords match. For example, matching 2 out of 15 keywords yields ~0.45 rather than the 0.13 a linear ratio would give, reflecting that even partial matches can be quite valuable.
The formula is: (matchCount / totalKeywords)^0.4
This approach aligns with information retrieval principles where early matches are most significant, but avoids being overly harsh on documents that match only a subset of keywords.
- Parameters:
matchCount - the number of keywords that matched
- Returns:
a similarity score from 0.0 to 1.0
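The formula above can be checked with a small standalone sketch. This is not the library's source: the class name MatchScoreSketch, the method name score, and the explicit totalKeywords parameter are hypothetical, and the handling of zero or out-of-range inputs is an assumption, since the documentation does not say how totalKeywords is obtained or how edge cases are treated.

```java
// Illustrative sketch of the documented formula: (matchCount / totalKeywords)^0.4.
// Assumption: totalKeywords is passed in explicitly; the real interface may
// derive it differently (for example from getKeywords().size()).
final class MatchScoreSketch {
    private static final double EXPONENT = 0.4;

    static double score(int matchCount, int totalKeywords) {
        // Assumption: no keywords or no matches yields a score of zero.
        if (totalKeywords <= 0 || matchCount <= 0) {
            return 0.0;
        }
        // Assumption: clamp the ratio so the score never exceeds 1.0.
        double ratio = Math.min(1.0, (double) matchCount / totalKeywords);
        return Math.pow(ratio, EXPONENT);
    }

    public static void main(String[] args) {
        System.out.println(score(2, 15));   // about 0.45, versus a linear ratio of about 0.13
        System.out.println(score(15, 15));  // 1.0
    }
}
```

Because the exponent is below 1, the curve rises steeply for the first few matches and flattens as the ratio approaches 1, which is the diminishing-returns behavior described above.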
-
getKeywords
abstract Set<String> getKeywords()
All known keywords
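To show how the two abstract methods might fit together, here is a minimal sketch of an implementation backed by a fixed vocabulary. The interface is redeclared in trimmed form (only the abstract methods, copied from the signatures above) so the example compiles on its own. FixedVocabularyExtractor, its constructor, and the tokenization rule are hypothetical and not part of the documented API.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Trimmed copy of the documented interface; matchCountToScore is omitted here.
interface KeywordExtractor {
    Set<String> extractKeywords(String text);
    Set<String> getKeywords();
}

// Hypothetical implementation: reports which known keywords occur in the text.
final class FixedVocabularyExtractor implements KeywordExtractor {
    private final Set<String> keywords;

    // Assumption: the keywords are supplied already lower-cased.
    FixedVocabularyExtractor(Set<String> keywords) {
        this.keywords = Set.copyOf(keywords);
    }

    @Override
    public Set<String> extractKeywords(String text) {
        Set<String> found = new LinkedHashSet<>();
        // Assumption: split on non-word characters and compare case-insensitively.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (keywords.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }

    @Override
    public Set<String> getKeywords() {
        return keywords;
    }

    public static void main(String[] args) {
        KeywordExtractor extractor =
                new FixedVocabularyExtractor(Set.of("java", "parser", "index"));
        // Prints [java, parser]
        System.out.println(extractor.extractKeywords("A Java parser, written in Java."));
    }
}
```

Returning a Set means repeated occurrences of a keyword collapse into one entry, so the size of the result can serve directly as a match count.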
-