Illinois Data Bank

Diversity - PubMed Dataset

A newer version of this dataset is available.

Diversity - PubMed dataset
Contact: Apratim Mishra (March 22, 2024)

This dataset presents article-level (pmid) and author-level (auid) diversity data for PubMed articles. The selection includes articles retrieved from Author-ity 2018 [1], a total of 228 040 papers and 440 310 authors. The sample of papers is drawn from the top 40 journals in the dataset, limited to papers with 2-10 authors published between 1990 and 2010, and stratified on paper count per year. Additionally, the dataset is limited to papers whose lead author is affiliated with one of four countries: the US, the UK, Canada, or Australia. Files are UTF-8 encoded.
################################################
File 1: auids_plos.csv (7 columns in total; important columns defined below)
• AUID: a unique ID for each author
• Ethnea: ethnicity prediction
• Genni: gender prediction
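As a minimal sketch of how the author-level file might be summarized, the snippet below counts the values in one prediction column using only the Python standard library. The sample rows and their category labels are hypothetical placeholders, not values taken from the dataset, and the file path in the closing comment is an assumption about where the download is stored.

```python
import csv
import io
from collections import Counter


def count_values(handle, column):
    """Count how often each value appears in one column of a CSV stream."""
    reader = csv.DictReader(handle)
    return Counter(row[column] for row in reader)


# Hypothetical rows for illustration only; real values come from auids_plos.csv.
sample = io.StringIO(
    "AUID,Ethnea,Genni\n"
    "a1,ENGLISH,F\n"
    "a2,HISPANIC,M\n"
    "a3,ENGLISH,M\n"
)
genni_counts = count_values(sample, "Genni")
print(genni_counts)

# With the real file (path is an assumption):
# with open("auids_plos.csv", encoding="utf-8", newline="") as f:
#     print(count_values(f, "Ethnea"))
```

Passing an open file handle rather than a path keeps the helper usable with both the downloaded CSV and small in-memory samples.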
#################################################
File 2: pmids_plos.csv (33 columns in total; important columns defined below)
• pmid: unique paper ID
• year: Year of paper publication
• no_authors: Author count
• journal: Journal name
• years: first year of publication for every author
• age_bin: Binned age for every author
• Country-temporal: Country of affiliation for every author
• h_index: Journal h-index
• TimeNovelty: Paper Time novelty [2]
• nih_funded: Binary variable indicating NIH funding for any author
• prior_cit_mean: Mean of all authors’ prior citation rate
• Insti_impact_all: All authors’ respective institutions’ citation count
• Insti_impact: Maximum of all institutions’ citation count
• mesh_vals: Top MeSH values for every author for that paper
• outer_mesh_vals: MeSH qualifiers for every author for that paper
• relative_citation_ratio: Relative Citation Ratio (RCR)
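The description states that the sample is limited to papers with 2-10 authors published between 1990 and 2010. A quick sanity check of the article-level file against those bounds could look like the sketch below. It assumes the `year` and `no_authors` columns hold plain numeric text; the sample rows are hypothetical and the file path in the closing comment is an assumption.

```python
import csv
import io


def rows_outside_sample(handle, min_authors=2, max_authors=10,
                        first_year=1990, last_year=2010):
    """Return pmids whose year or author count falls outside the given bounds."""
    outside = []
    for row in csv.DictReader(handle):
        # int(float(...)) tolerates values written as "2005" or "2005.0".
        year = int(float(row["year"]))
        n_authors = int(float(row["no_authors"]))
        in_years = first_year <= year <= last_year
        in_authors = min_authors <= n_authors <= max_authors
        if not (in_years and in_authors):
            outside.append(row["pmid"])
    return outside


# Hypothetical rows for illustration only; real values come from pmids_plos.csv.
sample = io.StringIO(
    "pmid,year,no_authors,journal\n"
    "101,1995,3,Journal A\n"
    "102,2012,4,Journal B\n"
    "103,2001,1,Journal A\n"
    "104,2010,10,Journal C\n"
)
outside = rows_outside_sample(sample)
print(outside)

# With the real file (path is an assumption):
# with open("pmids_plos.csv", encoding="utf-8", newline="") as f:
#     print(len(rows_outside_sample(f)))
```

An empty result on the real file would be consistent with the stated sampling limits; any returned pmids would be worth checking against the Readme.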

The ‘Readme’ includes a description for all columns.
[1] Torvik, Vetle; Smalheiser, Neil (2021): Author-ity 2018 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2273402_V1
[2] Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1

Subject: Social Sciences
Keywords: Diversity; PubMed; Citation
License: CC BY
Apratim Mishra
229 times
Version  DOI                        Comment                   Publication Date
3        10.13012/B2IDB-5259667_V3  updated data formulation  2024-10-10
2        10.13012/B2IDB-5259667_V2  expanded dataset          2024-08-19
1        10.13012/B2IDB-5259667_V1                            2024-03-25

3.28 KB File
30.2 MB File
347 MB File

Contact the Research Data Service for help interpreting this log.

RelatedMaterial create: {"material_type"=>"Dataset", "availability"=>nil, "link"=>"https://doi.org/10.13012/B2IDB-5259667_V2", "uri"=>"10.13012/B2IDB-5259667_V2", "uri_type"=>"DOI", "citation"=>" (2024): Diversity - PubMed Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5259667_V2", "dataset_id"=>2665, "selected_type"=>"Dataset", "datacite_list"=>"IsPreviousVersionOf", "note"=>nil, "feature"=>nil} 2024-08-16T14:43:01Z
Dataset update: {"version_comment"=>[nil, ""], "subject"=>[nil, "Social Sciences"]} 2024-03-25T18:36:50Z