New category property on clusters; the category is assigned by a supervised classifier.
Clusters are now created within their respective category: all tweets are categorized first, then clusters are created within each category that is big enough.
If a category is not big enough (fewer than 2 * MIN_CLUSTER_SIZE tweets), all of its tweets are added to the cluster with isUnassigned=True.
All tweets that were not assigned to a cluster during clustering are also added to the unassigned cluster.
All tweets are filtered to English only (required for the classifier).
Also contains a training script for a multilingual model; I have not had success with it yet (low accuracy).
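
For reviewers, the category-first flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR. The function name `cluster_by_category`, the `categorize` and `cluster_fn` callables, the dict-shaped tweets with a `lang` key, the `-1` label for tweets left out of every cluster, and keeping one unassigned cluster per category are all assumptions made for the example. The `MIN_CLUSTER_SIZE` value shown is a placeholder.

```python
from collections import defaultdict

MIN_CLUSTER_SIZE = 5  # placeholder value, the real constant lives in the project


def cluster_by_category(tweets, categorize, cluster_fn,
                        min_cluster_size=MIN_CLUSTER_SIZE):
    """Group tweets by category, then cluster inside each category.

    tweets:     list of dicts with at least a "lang" key (assumed shape)
    categorize: hypothetical callable, tweet -> category label
    cluster_fn: hypothetical callable, list of tweets -> one label per tweet,
                where -1 means "not assigned to any cluster" (assumed)
    """
    # Step 1: keep English tweets only, since the classifier requires it.
    english = [t for t in tweets if t.get("lang") == "en"]

    # Step 2: categorize every tweet before any clustering happens.
    by_category = defaultdict(list)
    for tweet in english:
        by_category[categorize(tweet)].append(tweet)

    clusters = []
    for category, members in by_category.items():
        unassigned = []
        if len(members) < 2 * min_cluster_size:
            # Category too small to cluster: everything goes to unassigned.
            unassigned = list(members)
        else:
            # Step 3: cluster only within this category.
            grouped = defaultdict(list)
            for tweet, label in zip(members, cluster_fn(members)):
                if label == -1:
                    unassigned.append(tweet)
                else:
                    grouped[label].append(tweet)
            for label in sorted(grouped):
                clusters.append({"category": category,
                                 "isUnassigned": False,
                                 "tweets": grouped[label]})
        if unassigned:
            # Assumption: one unassigned cluster per category.
            clusters.append({"category": category,
                             "isUnassigned": True,
                             "tweets": unassigned})
    return clusters
```

With this shape, a category needs at least 2 * MIN_CLUSTER_SIZE tweets before `cluster_fn` is called at all, so small categories never produce regular clusters.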