Illinois Data Bank

Words_Selected_by_Information_Gain

File Name: WordsSelectedByInformationGain.csv
Data Preparation: Xiaoru Dong, Linh Hoang
Date of Preparation: 2018-12-12
Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang
Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks.
Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider.
Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews.

Description: The file contains a list of 1655 informative words selected using the information gain feature selection strategy.
Information gain is a commonly used feature selection method. It measures how many bits of information the presence of a word contributes to predicting the classes, and it can be computed with a specific formula [Jurafsky D, Martin JH. Speech and language processing. London: Pearson; 2014 Dec 30]. We ran information gain feature selection in Weka, a machine learning tool.
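
For readers unfamiliar with the measure, the following is a minimal Python sketch of information gain for a binary "word is present" feature, computed as the class entropy minus the class entropy conditioned on the word's presence or absence. It is an illustration only: it is not the Weka implementation used to produce this file, and the function names (entropy, information_gain), the toy documents, and the "include"/"exclude" labels are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, word):
    """Information gain of the binary feature 'word occurs in the document'.

    docs   : list of sets of words, one set per document (hypothetical input format)
    labels : list of class labels, aligned with docs
    """
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    total = len(labels)
    # Class entropy after splitting on the word, weighted by the size of each split.
    conditional = (
        (len(with_word) / total) * entropy(with_word)
        + (len(without_word) / total) * entropy(without_word)
    )
    return entropy(labels) - conditional


# Toy data, invented for illustration; not taken from the Cochrane reviews.
docs = [
    {"randomized", "trial"},
    {"randomized", "adults"},
    {"cohort", "adults"},
    {"cohort", "children"},
]
labels = ["include", "include", "exclude", "exclude"]

print(information_gain(docs, labels, "randomized"))  # 1.0: the word separates the classes perfectly
print(information_gain(docs, labels, "adults"))      # 0.0: the word says nothing about the class
```

Ranking every vocabulary word by this score and keeping the highest-scoring ones is the general idea behind selecting a list of informative words; the selection threshold actually used for this file is not stated here.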

Notes: To reproduce the data in this file, obtain the project code published on GitHub at: https://github.com/XiaoruDong/InclusionCriteria and run it following the instructions provided there.

Social Sciences
Inclusion criteria; Randomized controlled trials; Machine learning; Systematic reviews
CC0
U.S. National Institutes of Health (NIH)-Grant:R01LM010817
Linh Hoang
518 times
Version: 1
DOI: 10.13012/B2IDB-9837167_V1
Comment: (none)
Publication Date: 2018-12-20

15.2 KB File


Dataset update: {"version_comment"=>[nil, ""], "subject"=>[nil, "Social Sciences"]} 2019-01-02T16:30:07Z