In the README, you say: “The model produces a test F1_score of 90.9 % with ~70 epochs. The results produced in the paper for the given architecture is 91.14”. However, the paper states that the 91.14 result was obtained under the condition "All other hyper-parameters and features remain the same as our best model in Table 5", which means the lex feature was used. Since your implementation does not use that feature, this architecture cannot be expected to reach 91.14.