Actual results lower than 90.9 #8

@kermitt2

Nice work, thanks!

Using GloVe embeddings as indicated and increasing the number of epochs to 70, without touching anything else, I obtained an f-score of 89.16 averaged over 10 runs (to take into account the impact of random seeds); the best f-score was 89.7 and the worst 88.5. So my results are well below the 90.9 reported in the README (raising the number of epochs to 80 did not change anything).
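
For reference, here is a minimal sketch of how such a seed-averaged score could be computed; `train_and_evaluate` is a hypothetical helper standing in for one complete training run plus test-set evaluation of the tagger, not a function of this repository:

```python
import random
import statistics

import numpy as np


def seed_averaged_f_score(train_and_evaluate, n_runs=10, epochs=70):
    """Train and evaluate the tagger n_runs times with different random
    seeds and report the mean, best, and worst test f-score."""
    scores = []
    for seed in range(n_runs):
        random.seed(seed)     # fix Python's RNG for this run
        np.random.seed(seed)  # fix NumPy's RNG for this run
        # hypothetical helper: one full training + evaluation cycle
        scores.append(train_and_evaluate(epochs=epochs, seed=seed))
    return statistics.mean(scores), max(scores), min(scores)
```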

If we consider that the results of (Chiu & Nichols, 2016) are reproducible, they report a 90.88 f-score with a model without lexical features (Table 6, emb + char + caps). One difference with your implementation, I think, is that they report results with models trained with the validation set included, which normally increases the f-score by +0.3 to +0.4, while you are not using the validation set for training.
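
For what it's worth, a sketch of what folding the validation set into the training data would look like with the standard CoNLL-2003 files (the file names eng.train / eng.testa / eng.testb are the usual ones, an assumption rather than this repository's exact setup):

```python
def load_conll(path):
    """Read a CoNLL-2003 style file into a list of (tokens, labels) sentences."""
    sentences, tokens, labels = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("-DOCSTART-"):
                if tokens:
                    sentences.append((tokens, labels))
                    tokens, labels = [], []
                continue
            cols = line.split()
            tokens.append(cols[0])   # token is the first column
            labels.append(cols[-1])  # NER tag is the last column
    if tokens:
        sentences.append((tokens, labels))
    return sentences


# Chiu & Nichols setting: train on train + validation, evaluate on the test set
train_sentences = load_conll("eng.train") + load_conll("eng.testa")
test_sentences = load_conll("eng.testb")
```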

Another point is that they report f-scores averaged over 10 runs.

Trying to reproduce (Chiu & Nichols, 2016), I have so far never been able to get above a 90.0 f-score with their architecture and hyper-parameters, so I obtain f-scores similar to those of your implementation.
