Nice work, thanks!
Using GloVe embeddings as indicated and increasing the number of epochs to 70 without touching anything else, I obtained an f-score of 89.16 averaged over 10 runs (to take into account the impact of random seeds); the best f-score is 89.7 and the worst is 88.5. So my results are well below the 90.9 reported in the readme (raising the number of epochs to 80 made no difference).
If we consider that the results of (Chiu & Nichols, 2016) are reproducible, they report a 90.88 f-score with a model without lexical features (Table 6, emb + char + caps). One difference with your implementation, I think, is that they report results for models trained with the validation set included, which normally increases the f-score by +0.3 to +0.4, while you are not using the validation set for training.
Another point is that they report the f-score averaged over 10 runs.
Trying to reproduce (Chiu & Nichols, 2016), so far I have never been able to get above a 90.0 f-score with their architecture and hyper-parameters, so I got f-scores similar to those of your implementation.
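For anyone who wants to compare numbers on the same footing, here is a minimal sketch of how the per-run scores can be summarized (mean, spread, best, worst) once each seed has been trained and evaluated. The function name `summarize_runs` and the example values are hypothetical, chosen only for illustration. They are not part of the repository and are not the scores from my runs.

```python
import statistics


def summarize_runs(f_scores):
    """Summarize test f-scores collected from several runs with different seeds."""
    if not f_scores:
        raise ValueError("need at least one run")
    return {
        "mean": statistics.mean(f_scores),
        # Sample standard deviation is undefined for a single run, so report 0.0.
        "stdev": statistics.stdev(f_scores) if len(f_scores) > 1 else 0.0,
        "best": max(f_scores),
        "worst": min(f_scores),
    }


# Placeholder values for illustration only, NOT the scores reported above.
example_scores = [89.0, 89.5, 88.5]
summary = summarize_runs(example_scores)
print(summary)
```

Reporting the spread next to the mean makes it easier to tell whether a gap such as 89.16 vs. 90.9 could come from seed variance alone.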