Actual results lower than 90.9 #8

Nice work, thanks!

Using the GloVe embeddings as indicated and increasing the number of epochs to 70 without changing anything else, I obtained an f-score of 89.16 averaged over 10 runs (to account for the effect of random seeds); the best f-score was 89.7 and the worst 88.5. My results are therefore well below the 90.9 reported in the README (raising the number of epochs to 80 made no difference).
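
For anyone who wants to compare numbers the same way, here is a minimal Python sketch of the seed-averaging protocol. `train_and_evaluate` is a hypothetical callback (not part of this repository) that is assumed to seed every RNG the model uses, train for a fixed number of epochs, and return the test f-score; the helper only aggregates the per-seed scores.

```python
import statistics
from typing import Callable, Dict, Iterable


def summarize_over_seeds(
    train_and_evaluate: Callable[[int], float],
    seeds: Iterable[int],
) -> Dict[str, float]:
    """Run one full train/evaluate cycle per seed and summarize the test f-scores.

    `train_and_evaluate` is a hypothetical callback: it is assumed to seed all
    RNGs involved (Python, NumPy, the deep-learning backend), train the model
    and return its test f-score for that seed.
    """
    scores = [train_and_evaluate(seed) for seed in seeds]
    if not scores:
        raise ValueError("need at least one seed")
    return {
        "runs": float(len(scores)),
        "mean": statistics.mean(scores),
        # Sample standard deviation; undefined for a single run, so report 0.0.
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "best": max(scores),
        "worst": min(scores),
    }
```

Calling it as `summarize_over_seeds(train_and_evaluate, range(10))` yields a 10-run mean, which is a fairer basis for comparison with a published figure than a single best run.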

If we assume that the results of (Chiu & Nichols, 2016) are reproducible, they report a 90.88 f-score for a model without lexical features (Table 6, emb + char + caps). One difference from your implementation, I think, is that they report results for models trained on the training and validation sets combined, which typically increases the f-score by +0.3 to +0.4, whereas your implementation does not use the validation set for training.
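
One common way to fold the validation set into training is sketched below under stated assumptions; it is not necessarily exactly what Chiu & Nichols did. The validation set is first used only to choose the number of epochs, then the model is retrained from scratch on train + validation for that fixed number of epochs, since the validation data can no longer drive early stopping. `fit`, `f1` and the `(tokens, tags)` sentence format are hypothetical placeholders.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")
# Hypothetical sentence format: (tokens, NER tags), e.g. read from CoNLL-2003 files.
Sentence = Tuple[List[str], List[str]]


def select_epochs_then_refit(
    fit: Callable[[List[Sentence], int], Model],
    f1: Callable[[Model, Sequence[Sentence]], float],
    train: Sequence[Sentence],
    dev: Sequence[Sentence],
    candidate_epochs: Sequence[int],
) -> Tuple[Model, int]:
    """Pick the epoch count on dev, then retrain on train + dev with that count."""
    # Phase 1: dev is held out and used only to score each candidate epoch count.
    best_epochs = max(candidate_epochs, key=lambda e: f1(fit(list(train), e), dev))
    # Phase 2: retrain from scratch on the union; dev scores are no longer
    # meaningful after this, so only the test set should be reported.
    final_model = fit(list(train) + list(dev), best_epochs)
    return final_model, best_epochs
```

Retraining once per candidate is expensive; in practice one would record the validation f-score after each epoch of a single run and take the best epoch from that curve instead.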

Another point is that they report the f-score averaged over 10 runs.

While trying to reproduce (Chiu & Nichols, 2016), I have so far never been able to get above a 90.0 f-score with their architecture and hyper-parameters, so my results are similar to those of your implementation.