YuRuM0/ML4NLP

ML4NLP

This repository contains the work for a ReMA course at Vrije Universiteit; more information can be found on cltl.

main.py

The main.py file contains the functions that create the classifier based on the given input, generate predictions for the file provided through the terminal, and evaluate those predictions by producing classification metrics.

Note:

The paths to new_train_file and trainfile are set in the main function. After downloading the files conll2003.train.conll and new.conll2003.train.conll, update these paths in the main function to match their location on your local computer. The training, development and test files are available in the Results folder.
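As a rough illustration of what updating the paths involves, the sketch below sets two placeholder paths and reads a CoNLL-style file. It is not the code from main.py: the placeholder paths, the helper name `read_conll`, and the assumed layout (tab-separated columns, token in the first column, label in the last column, blank lines between sentences) are all assumptions made for this example.

```python
from pathlib import Path

# Placeholder paths (assumption): replace with the locations on your computer.
trainfile = Path("path/to/conll2003.train.conll")
new_train_file = Path("path/to/new.conll2003.train.conll")


def read_conll(path, delimiter="\t"):
    """Read tokens and labels from a CoNLL-style file.

    Assumes one token per line, the token in the first column and the
    label in the last column. Blank lines (sentence breaks) are skipped.
    """
    tokens, labels = [], []
    with open(path, encoding="utf-8") as infile:
        for line in infile:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # sentence boundary
            columns = line.split(delimiter)
            tokens.append(columns[0])
            labels.append(columns[-1])
    return tokens, labels
```

Calling `read_conll(trainfile)` once the path points at a real file returns two parallel lists, one with tokens and one with labels.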

Inputs

The main function takes three required inputs and one optional input. The optional input is only needed when training a classifier without word embeddings as features.

1. model: This required input has three choices: 'LR', 'NB' and 'SVM'. These labels refer to Logistic Regression, Naive Bayes and Support Vector Machine, respectively. Specify it by writing the model type after --model on the terminal.

2. test_or_dev_file: This required input is the path to the development or test file that is used to predict the labels and to evaluate performance. It is recommended to use the new.conll2003.dev.conll file located in the Results folder.

3. use_word_embedding: This required input decides whether the classifier uses word embeddings and, if so, in what way. It has three choices: yes, no and mixed. Mixed means that the sparse and dense features are combined and vectorised when creating the classifier.

4. outputfile: This optional input specifies the path where the predictions are saved. It is only used when predictions are made with models without word embeddings.
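The inputs above could be declared with argparse roughly as follows. This is a minimal sketch, not the parser from main.py: only the --model flag is confirmed by this README, so the other flag spellings, the function names `build_parser` and `parse_cli`, and the rule that the output path must be given when use_word_embedding is no are assumptions for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the four inputs described above (illustrative)."""
    parser = argparse.ArgumentParser(
        description="Train a classifier, predict labels and evaluate them."
    )
    parser.add_argument("--model", required=True, choices=["LR", "NB", "SVM"],
                        help="Logistic Regression, Naive Bayes or SVM")
    # Flag spellings below are assumed; check main.py for the real ones.
    parser.add_argument("--test_or_dev_file", required=True,
                        help="path to the development or test file")
    parser.add_argument("--use_word_embedding", required=True,
                        choices=["yes", "no", "mixed"],
                        help="use embeddings, sparse features, or both")
    parser.add_argument("--outputfile", default=None,
                        help="where to save predictions (no-embedding models)")
    return parser


def parse_cli(argv=None):
    """Parse arguments and enforce the assumed outputfile rule."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.use_word_embedding == "no" and args.outputfile is None:
        parser.error("--outputfile is needed when --use_word_embedding is 'no'")
    return args


# Example: parse an explicit argument list instead of the real command line.
example_args = parse_cli([
    "--model", "LR",
    "--test_or_dev_file", "Results/new.conll2003.dev.conll",
    "--use_word_embedding", "no",
    "--outputfile", "predictions.conll",
])
```

Under these assumptions, the equivalent terminal call would pass the same four flags to main.py; `predictions.conll` is a made-up file name.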

Final_Assignment.ipynb

This Jupyter notebook covers the complete workflow: feature exploration, creating a baseline model, creating models with more elaborate features and embeddings, feature ablation, hyperparameter tuning and error analysis.
