Dataset Search

Illinois Data Bank Dataset Search Results

Results

published: 2024-11-19

Dataset for Reassessment of the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science

Salami, Malik Oyewale; McCumber, Corinne (2024)

This project investigates retraction indexing agreement among data sources: Crossref, Retraction Watch, Scopus, and Web of Science. As of July 2024, this project reassesses the April 2023 union list of Schneider et al. (2023): https://doi.org/10.55835/6441e5cae04dbe5586d06a5f. As of April 2023, over 1 in 5 DOIs had discrepancies in retraction indexing among the 49,924 DOIs indexed as retracted in at least one of Crossref, Retraction Watch, Scopus, and Web of Science (Schneider et al., 2023). Here, we determine what changed in 15 months. Pipeline code to get the results files can be found in the GitHub repository https://github.com/infoqualitylab/retraction-indexing-agreement in the iPython notebook 'MET-STI2024_Reassessment_of_retraction_indexing_agreement.ipynb'. Some files have been redacted to remove proprietary data, as noted in README.txt. Among our sources, data is openly available only for Crossref and Retraction Watch.
FILE FORMATS:
1) unionlist_completed_2023-09-03-crws-ressess.csv - UTF-8 CSV file
2) unionlist_completed-ria_2024-07-09-crws-ressess.csv - UTF-8 CSV file
3) unionlist-15months-period_sankey.png - Portable Network Graphics (PNG) file
4) unionlist_ria_proportion_comparison.png - Portable Network Graphics (PNG) file
5) README.txt - text file
FILE DESCRIPTION: Description of the files can be found in README.txt

keywords: retraction status; data quality; indexing; retraction indexing; metadata; meta-science; RISRS

published: 2025-10-10

Data for Hydrothermal Pretreatment for Valorization of Genetically Engineered Bioenergy Crop for Lipid and Cellulosic Sugar Recovery

Singh, Ramkrishna; Liu, Hui; Shanklin, John; Singh, Vijay (2025)

Lipids accumulated in the vegetative tissues of cellulosic feedstocks can be a potential raw material for biodiesel and bioethanol production. In this work, bagasse of genetically engineered sorghum was subjected to liquid hot-water pretreatment at 170, 180, and 190 °C for different reaction times. Under the optimal pretreatment condition (170 °C, 20 min), the residue was enriched in glucan (57.39 ± 2.63 % w/w) and xylan (13.38 ± 0.49 % w/w). The total lipid content of the pretreated residue was 6.81% w/w, similar to that observed in untreated bagasse (6.30% w/w). Pretreatment improved the enzymatic digestibility of bagasse, allowing a recovery of 79% w/w and 86% w/w of glucose and xylose, respectively. The pretreatment and enzymatic saccharification resulted in a 2-fold increase in total lipid in enzymatic residue compared to the original bagasse. Thus, pretreatment and enzymatic hydrolysis enabled high sugar recovery while concentrating triglycerides and free fatty acids in the residue.

keywords: Conversion;Feedstock Production;Feedstock Bioprocessing

published: 2018-11-21

Scripts for testing the error rate of polyRAD

Clark, Lindsay V.; Lipka, Alexander E.; Sacks, Erik J. (2018)

This set of scripts accompanies the manuscript describing the R package polyRAD, which uses DNA sequence read depth to estimate allele dosage in diploids and polyploids. Using several high-confidence SNP datasets from various species, allelic read depth from a typical RAD-seq dataset was simulated, then genotypes were estimated with polyRAD and other software and compared to the true genotypes, yielding error estimates.

keywords: R programming language; genotyping-by-sequencing (GBS); restriction site-associated DNA sequencing (RAD-seq); polyploidy; single nucleotide polymorphism (SNP); Bayesian genotype calling; simulation

published: 2018-12-20

Inclusion_Criteria_Annotation

Dong, Xiaoru; Xie, Jingyi; Linh, Hoang (2018)

File Name: Inclusion_Criteria_Annotation.csv
Data Preparation: Xiaoru Dong
Date of Preparation: 2018-12-14
Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang
Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks.
Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider.
Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews.
Description: The file contains lists of inclusion criteria of Cochrane Systematic Reviews and the manual annotation results. 5420 inclusion criteria were annotated, out of 7158 inclusion criteria available. Annotations are either "Only RCTs" or "Others". There are 2 columns in the file:
- "Inclusion Criteria": Content of inclusion criteria of Cochrane Systematic Reviews.
- "Only RCTs": Manual annotation results. Here, "x" means the inclusion criteria are classified as "Only RCTs". Blank means that the inclusion criteria are classified as "Others".
Notes:
1. "RCT" stands for Randomized Controlled Trial, which is defined as "a work that reports on a clinical trial that involves at least one test treatment and one control treatment, concurrent enrollment and follow-up of the test- and control-treated groups, and in which the treatments to be administered are selected by a random process, such as the use of a random-numbers table." [Randomized Controlled Trial publication type definition from https://www.nlm.nih.gov/mesh/pubtypes.html].
2. To reproduce the data relevant to this file, please get the project code published on GitHub at https://github.com/XiaoruDong/InclusionCriteria and run it following the instructions provided.
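The two-column layout described above can be read with a few lines of Python. This is a minimal sketch, not the authors' code: the column names come from the description, while the function name `label_rows` and the two sample rows are hypothetical and only illustrate the "x" versus blank convention.

```python
import csv
import io


def label_rows(csv_text):
    """Map each inclusion criterion to "Only RCTs" or "Others".

    Assumes the two columns named in the dataset description:
    "Inclusion Criteria" and "Only RCTs" ("x" = Only RCTs, blank = Others).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    labeled = []
    for row in reader:
        mark = (row.get("Only RCTs") or "").strip().lower()
        label = "Only RCTs" if mark == "x" else "Others"
        labeled.append((row["Inclusion Criteria"], label))
    return labeled


# Hypothetical sample rows, not taken from the dataset.
sample = (
    "Inclusion Criteria,Only RCTs\n"
    '"Randomised controlled trials only.",x\n'
    '"RCTs and quasi-randomised trials.",\n'
)
labels = label_rows(sample)
```

For the real file, the same function could be applied to the text of Inclusion_Criteria_Annotation.csv after reading it from disk.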

keywords: Inclusion criteria, Randomized controlled trials, Machine learning, Systematic reviews

published: 2020-06-02

NEXUS file for phylogenetic analysis of Eurymelinae (Hemiptera: Cicadellidae)

Xue, Qingquan; Dietrich, Christopher; Zhang, Yalin (2020)

The text file contains the original data used in the phylogenetic analyses of Xue et al. (2020: Systematic Entomology, in press). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first six lines of the file identify the file as NEXUS, indicate that the file contains data for 89 taxa (species) and 2676 characters, indicate that the first 2590 characters are DNA sequence and the last 86 are morphological, that gaps inserted into the DNA sequence alignment and inapplicable morphological characters are indicated by a dash, and that missing data are indicated by a question mark. The file contains aligned nucleotide sequence data for 5 gene regions and 86 morphological characters. The positions of data partitions are indicated in the mrbayes block of commands for the phylogenetic program MrBayes at the end of the file (Subset1 = 16S gene; Subset2 = 28S gene; Subset3 = COI gene; Subset 4 = Histone H3 and H2A genes). The mrbayes block also contains instructions for MrBayes on various non-default settings for that program. These are explained in the original publication. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf, also available from the journal website. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supplementary file.

keywords: phylogeny; DNA sequence; morphology; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; 16S rDNA; histone H3; histone H2A; cytochrome oxidase I; Bayesian analysis

published: 2024-03-21

TextTransfer: Datasets for Impact Detection

Becker, Maria; Han, Kanyao; Werthmann, Antonina; Rezapour, Rezvaneh; Lee, Haejin; Diesner, Jana; Witt, Andreas (2024)

Impact assessment is an evolving area of research that aims at measuring and predicting the potential effects of projects or programs. Measuring the impact of scientific research is a vibrant subdomain, closely intertwined with impact assessment. A recurring obstacle pertains to the absence of an efficient framework which can facilitate the analysis of lengthy reports and text labeling. To address this issue, we propose a framework for automatically assessing the impact of scientific research projects by identifying pertinent sections in project reports that indicate the potential impacts. We leverage a mixed-method approach, combining manual annotations with supervised machine learning, to extract these passages from project reports. This is a repository to save datasets and code related to this project. Please read and cite the following paper if you would like to use the data: Becker M., Han K., Werthmann A., Rezapour R., Lee H., Diesner J., and Witt A. (2024). Detecting Impact Relevant Sections in Scientific Research. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING).
This folder contains the following files:
- evaluation_20220927.ods: Annotated German passages (Artificial Intelligence, Linguistics, and Music) - training data
- annotated_data.big_set.corrected.txt: Annotated German passages (Mobility) - training data
- incl_translation_all.csv: Annotated English passages (Artificial Intelligence, Linguistics, and Music) - training data
- incl_translation_mobility.csv: Annotated German passages (Mobility) - training data
- ttparagraph_addmob.txt: German corpus (unannotated passages)
- model_result_extraction.csv: Extracted impact-relevant passages from the German corpus based on the model we trained
- rf_model.joblib: The random forest model we trained to extract impact-relevant passages
Data processing code can be found at: https://github.com/khan1792/texttransfer

keywords: impact detection; project reports; annotation; mixed-methods; machine learning

published: 2025-06-30

Desiccation rate of white-tailed deer (Odocoileus virginianus) retropharyngeal lymph nodes

Mori, Jameson; Skowron, Nicholas; Barr, Daniel; Johnson, Ben; Novakofski, Jan; Mateus-Pinilla, Nohra (2025)

This dataset contains measurements of water loss as white-tailed deer (Odocoileus virginianus) retropharyngeal lymph nodes air-dried in a refrigerator for 31 days. Weights for lymph nodes are recorded every 24 hours, as are the variables "firmness" and "surface wetness". "Firmness" is a categorical variable measuring how much the tissue deforms to the touch (soft, medium, or hard). "Surface wetness" is the amount of visible moisture on the outside of the lymph node (all, some, or none). Lymph node weights were measured until their weights stabilized for 3 consecutive days at two decimal places (e.g., 3.02, 3.02, 3.02) or until the weights fluctuated only by 0.01 (e.g., 3.02, 3.03, 3.02). Lymph nodes were from northern Illinois white-tailed deer collected as part of the Illinois Department of Natural Resources' ongoing chronic wasting disease (CWD) management efforts.
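The stopping rule for weighing can be expressed as a small check over a series of daily weights. The sketch below is one possible reading of the rule, not the authors' procedure: it assumes that "stabilized" means the last three readings, rounded to two decimals, differ by at most 0.01. The function name `stabilization_day` and the sample weights are hypothetical.

```python
def stabilization_day(weights, window=3, tolerance_hundredths=1):
    """Return the 1-based day on which weights first count as stable.

    Assumption: weights are stable once `window` consecutive readings,
    rounded to two decimals, span at most `tolerance_hundredths` (0.01).
    Returns None if the series never meets that condition.
    """
    # Work in integer hundredths to avoid floating-point comparison issues.
    hundredths = [round(w * 100) for w in weights]
    for end in range(window, len(hundredths) + 1):
        recent = hundredths[end - window:end]
        if max(recent) - min(recent) <= tolerance_hundredths:
            return end
    return None


# Hypothetical daily weights in grams, not taken from the dataset.
day = stabilization_day([3.40, 3.21, 3.10, 3.02, 3.03, 3.02])
```

With these made-up readings the rule is first met on day 6, when the last three weights are 3.02, 3.03, 3.02.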

keywords: cervid; lymph node; chronic wasting disease; cwd; diagnostic testing; dessication; drying; tissue

published: 2025-10-24

Data for MALDI-MS Screening of Microbial Colonies With Isomer Resolution to Select Fatty Acid Desaturase Variants

Choe, Kisurb; Jindra, Michael A.; Hubbard, Susan; Pfleger, Brian; Sweedler, Jonathan (2025)

Creating controlled lipid unsaturation locations in oleochemicals can be a key to many bioengineered products. However, evaluating the effects of modifications to the acyl-ACP desaturase on lipid unsaturation is not currently amenable to high-throughput assays, limiting the scale of redesign efforts to <200 variants. Here, we report a rapid mass spectrometry (MS) assay for profiling the positions of double bonds on membrane lipids produced by Escherichia coli colonies after treatment with ozone gas. By MS measurement of the ozonolysis products of Δ6 and Δ8 isomers of membrane lipids from colonies expressing recombinant Thunbergia alata desaturase, we screened a randomly mutagenized library of the desaturase gene at 5 s per sample. Two variants with altered regiospecificity were isolated, indicated by an increase in 16:1 Δ8 proportion. We also demonstrated the ability of these desaturase variants to influence the membrane composition and fatty acid distribution of E. coli strains deficient in the native acyl-ACP desaturase gene, fabA. Finally, we used the fabA deficient chassis to concomitantly express a non-native acyl-ACP desaturase and a medium-chain thioesterase from Umbellularia californica, demonstrating production of only saturated free fatty acids.

keywords: Conversion;Lipidomics;Mass Spectrometry

published: 2025-07-28

An attempt to identify 199 PubMed items excluded from "Analyzing the consistency of retraction indexing"

McCumber, Corinne; Salami, Malik Oyewale (2025)

This project investigates retraction indexing agreement in PubMed between 2024-07-03 and 2025-05-09 in order to address an API limitation that resulted in 199 items being excluded from analysis in "Analyzing the consistency of retraction indexing". PubMed was queried on 2024-07-03 and on 2025-05-09 using the search “Retracted Publication[PT]”. PubMed is only able to return 10,000 items when queried via the E-Utilities API. When the pipeline was run 2024-07-03, the search between 2020 and 2024 returned 10,199 items, meaning that an expected 199 items indexed as retracted in PubMed were excluded. This dataset uses and compares information from PubMed as of 2025-05-09 to attempt to identify those 199 items.
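One common way to work within a per-query cap such as the 10,000-item limit described above is to split the date range until every sub-range falls under the cap. The sketch below shows only that splitting logic and is not the pipeline used for this dataset. `count_fn` is a hypothetical stand-in for whatever call reports the number of hits for a date range, and `fake_count` is a made-up stub so the example runs without network access.

```python
from datetime import date, timedelta

LIMIT = 10_000  # per-query cap stated in the dataset description


def split_ranges(start, end, count_fn, limit=LIMIT):
    """Split [start, end] into date ranges whose hit counts fit the limit.

    `count_fn(start, end)` is a hypothetical callable returning the number
    of results for an inclusive date range.
    """
    if start == end or count_fn(start, end) <= limit:
        return [(start, end)]
    mid = start + timedelta(days=(end - start).days // 2)
    return (split_ranges(start, mid, count_fn, limit)
            + split_ranges(mid + timedelta(days=1), end, count_fn, limit))


def fake_count(start, end):
    """Made-up stub: pretends every day contributes six records."""
    return ((end - start).days + 1) * 6


ranges = split_ranges(date(2020, 1, 1), date(2024, 12, 31), fake_count)
```

Each resulting range can then be queried separately and the results merged, at the cost of one extra count request per split.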

keywords: retraction status; data quality; indexing; retraction indexing; metadata; meta-science; RISRS; PubMed

published: 2025-11-06

Data for Scale-up of Microbial Lipid and Bioethanol Production from Oilcane

Deshavath, Narendra Naik; Woodruff, William; Eller, Fred; Susanto, Vionna; Yang, Cindy; Rao, Christopher V.; Singh, Vijay (2025)

Microbial oils are a sustainable biomass-derived substitute for liquid fuels and vegetable oils. Oilcane, an engineered sugarcane with superior feedstock characteristics for biodiesel production, is a promising candidate for bioconversion. This study describes the processing of oilcane stems into juice and hydrothermally pretreated lignocellulosic hydrolysate and their valorization to ethanol and microbial oil using Saccharomyces cerevisiae and engineered Rhodosporidium toruloides strains, respectively. A bioethanol titer of 106 g/L was obtained from S. cerevisiae grown on oilcane juice in a 3 L fermenter, and a lipid titer of 8.8 g/L was obtained from R. toruloides grown on oilcane hydrolysate in a 75 L fermenter. Oil was extracted from the R. toruloides cells using supercritical CO2, and the observed fatty acid profile was consistent with previous studies on this strain. These results demonstrate the feasibility of pilot-scale lipid production from oilcane hydrolysate as part of an integrated bioconversion strategy.

keywords: Conversion;Bioproducts;Feedstock Bioprocessing;Hydrolysate

published: 2022-09-29

3DIFICE: A Synthetic Dataset for Training Computer Vision Algorithms to Recognize Earthquake Damage to Reinforced Concrete Structures

Levine, Nathaniel (2022)

3DIFICE: 3-dimensional Damage Imposed on Frame structures for Investigating Computer vision-based Evaluation methods. This dataset contains 1,396 synthetic images and label maps with various types of earthquake damage imposed on reinforced concrete frame structures. Damage includes: cracking, spalling, exposed transverse rebar, and exposed longitudinal rebar. Each image has an associated label map that can be used for training machine learning algorithms to recognize the various types of damage.

keywords: computer vision; earthquake engineering; structural health monitoring; civil engineering; structural engineering;

published: 2026-01-15

Data for "Adsorptive Separation and Recovery of Triacetic Acid Lactone from Fermentation Broth"

Singh, Ramkrishna; Bhagwat, Sarang; Viswanathan, Mothi Bharath; Cortes-Pena, Yoel; Eilts, Kristen; Mingfeng, Cao; Guest, Jeremy; Zhao, Huimin; Singh, Vijay (2026)

Triacetic acid lactone (TAL) can be microbially produced and further chemically upgraded to several high-value chemicals. In this work, several acidic and basic ion-exchange resins and activated charcoal were evaluated for their ability to adsorb microbially produced TAL. Activated charcoal and a weak base resin, Dowex 66, showed similar TAL adsorption capacity of 0.18 ± 0.002 g/g. At 15% w/v activated charcoal, about 98% of TAL present in fermentation broth could be adsorbed. Further, ethanol washing allowed recovery of 72% of adsorbed TAL. A biorefinery producing TAL from sucrose was designed, simulated, and evaluated (through technoeconomic analysis) under uncertainty, for an estimated TAL minimum product selling price (MPSP) of $4.27/kg [$3.71−4.94/kg; 5th-95th percentiles] for the current state of technology and $2.83/kg [$2.46–3.29/kg] following potential near-term improvements to fermentation. Thus, this work provides an adsorptive process to recover microbially produced TAL that can be chemically upgraded to several industrial products.

keywords: Bioproducts; Feedstock Bioprocessing

published: 2024-10-07

SoyFACE Fumigation Data Files

Kole Aspray, Elise; Ainsworth, Elizabeth; McGrath, Jesse; McGrath, Justin; Montes, Christopher; Whetten, Andrew; Ort, Donald; Long, Stephen; Puthuval, Kannan; Mies, Timothy; Bernacchi, Carl; DeLucia, Evan; Dalsing, Bradley; Leakey, Andrew; Li, Shuai; Herriott, Jelena; Miglietta, Franco (2024)

This data set is related to the SoyFACE experiments, which are open-air agricultural climate change experiments that have been conducted since 2001. The fumigation experiments take place at the SoyFACE farm and facility in Champaign County, Illinois during the growing season of each year, typically between June and October. This V4 contains new experimental data files, hourly fumigation files, and weather/ambient files for 2022 and 2023, since the original dataset only included files for 2001-2021. The MATLAB code has also been updated for efficiency, and explanatory files have been updated accordingly. Below are the new changes in V4:
- The "SoyFACE Plot Information 2001 to 2021" file is renamed to “SoyFACE ring information 2001 to 2023.xlsx”. Data for 2022 and 2023 were added. The file contains information about each year of the SoyFACE experiments, including the fumigation treatment type (CO2, O3, or a combination treatment), the crop species, the plots (also referred to as 'rings' and labeled with numbers between 2 and 31) used in each experiment, important experiment dates, and the target concentration levels or 'setpoints' for CO2 and O3 in each experiment.
- The "SoyFACE 1-Minute Fumigation Data Files" were updated to contain sub-folders for each year of the experiments (2001-2023), each of which contains sub-folders for each ring used in that year's experiments. This data set also includes hourly data files for the fumigation experiments ("SoyFACE Hourly Fumigation Data Files" folder) created from the 1-minute files, and hourly ambient/weather data files for each year of the experiments ("Hourly Weather and Ambient Data Files" folder, which has also been updated to include 2022 and 2023 data). The ambient CO2 and O3 data are collected at SoyFACE, and the weather data are collected from the SURFRAD and WARM weather stations located near the SoyFACE farm.
- “Rings.xlsx” is new in this version. This file lists the rings and treatments used in each year of the SoyFACE experiments between 2001 and 2023 and is used in several of the MATLAB codes.
- “CMI Weather Data Explanation.docx” is newly added. This file contains specific information about the processing of raw weather data, which is used in the hourly weather and ambient data files.
- Files that were in RAR format in V3 are now updated and saved in ZIP format, including: Hourly Weather and Ambient Data Files.zip, SoyFACE 1-Minute Fumigation Data Files.zip, SoyFACE Hourly Fumigation Data Files.zip, and Matlab Files.zip.
- The "Fumigation Target Percentages" file was updated to add data for 2022 and 2023. This file shows how much of the time the CO2 and O3 fumigation levels are within a 10 or 20 percent margin of the target levels when the fumigation system is turned on.
- The "Matlab Files" folder contains custom code (Aspray, E.K.) that was used to clean the "SoyFACE 1-Minute Fumigation Data" files and to generate the "SoyFACE Hourly Fumigation Data" and "Fumigation Target Percentages" files. Code information can be found in the various "Explanation" files. The Matlab code changes are as follows:
  1. “Data_Issues_Finder.m” code was changed to use the “Ring.xlsx” file to gather ring and treatment information based on the contents of the file rather than being hardcoded in the Matlab code itself.
  2. “Data_Issues_Finder_all.m” code is new. This code is the same as the “Data_Issues_Finder.m” code except that it identifies all CO2 and O3 repeats. In contrast, the “Data_Issues_Finder.m” code only identifies CO2 and O3 repeats that occur when the fumigation system is turned on.
  3. “Target_Yearly.m” code was changed to use the “Ring.xlsx” file to gather ring and treatment information based on the contents of the file rather than being hardcoded in the Matlab code itself.
  4. “HourlyFumCode.m” code is new. This code uses the “Rings.xlsx” file to gather ring and treatment information based on the contents of the file instead of the user needing to define these values explicitly. This code also defines a list of all ring folders for the year selected and runs the hourly code for each ring, instead of the user having to run the hourly code for each ring individually. Finally, the code generates two dialog boxes for the user, one which allows the user to specify whether they want the hourly code to be run for 1-minute fumigation files or 1-minute ambient files, and another which allows the user to specify whether they would like the hourly fumigation averages to be replaced with hourly ambient averages when the fumigation system is turned off.
  5. “HourlyDataFun.m” code was changed to run either “HourlyData.m” code or “HourlyDataAmb.m” code, depending on user input in the first dialog box.
  6. “HourlyData.m” code was changed to replace hourly fumigation averages with hourly ambient averages when the fumigation system is turned off, depending on user input in the second dialog box.
  7. “HourlyDataAmb.m” code is new. This code is similar to “HourlyData.m” code but is used to calculate hourly averages for 1-minute ambient files instead of 1-minute fumigation files.
  8. “batch.m” code was changed to account for new function input variables in “HourlyDataFun.m” code, along with adding header columns for “FumOutput.xlsx” and “AmbOutput.xlsx” output files generated by “HourlyData.m” and “HourlyDataAmb.m” code.
- Finally, the " * Explanation" files contain information about the column names, units of measurement, steps needed to use Matlab code, and other pertinent information for each data file. Some of them have been updated to reflect the current change of data.
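The "Fumigation Target Percentages" idea, the share of time that readings fall within a 10 or 20 percent margin of the setpoint while the system is on, can be illustrated with a short calculation. The dataset's own code is MATLAB; the Python sketch below is an independent illustration, not a port of it. The function name `within_target_fraction`, the setpoint of 600, and the sample readings are all hypothetical.

```python
def within_target_fraction(readings, setpoint, margin):
    """Fraction of fumigation-on readings within `margin` of `setpoint`.

    `readings` is a list of (concentration, system_on) pairs; readings
    taken while the system is off are ignored. `margin` is relative,
    e.g. 0.10 for a 10 percent band. Returns None if the system was
    never on.
    """
    on_values = [value for value, system_on in readings if system_on]
    if not on_values:
        return None
    allowed = margin * setpoint
    hits = sum(1 for value in on_values if abs(value - setpoint) <= allowed)
    return hits / len(on_values)


# Hypothetical 1-minute CO2 readings in ppm with an on/off flag.
sample = [(600, True), (650, True), (700, True), (400, False), (730, True)]
share_10 = within_target_fraction(sample, setpoint=600, margin=0.10)
share_20 = within_target_fraction(sample, setpoint=600, margin=0.20)
```

In this made-up series, half of the fumigation-on readings fall inside the 10 percent band and three quarters inside the 20 percent band.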

keywords: SoyFACE; agriculture; agricultural; climate; climate change; atmosphere; atmospheric change; CO2; carbon dioxide; O3; ozone; soybean; fumigation; treatment

published: 2019-03-06

Experimental data on bulk and unjacketed moduli of porous rocks

Makhnenko, Roman; Tarokh, Ali (2019)

This dataset is provided to support the statements in Tarokh, A., and R.Y. Makhnenko. 2019. Remarks on the solid and bulk responses of fluid-filled porous rock, Geophysics. The unjacketed bulk modulus is a poroelastic parameter that can be directly measured in a laboratory test under a loading that keeps the difference between the mean stress and pore pressure constant. For a monomineralic rock, the measurement of the unjacketed bulk modulus is ignored because it is assumed to be equal to the bulk modulus of the solid phase. To examine this assumption, we tested porous sandstones (Berea and Dunnville) and limestones (Apulian and Indiana) mainly composed of quartz and calcite, respectively, under the unjacketed condition. The presence of microscale inhomogeneities, in the form of non-connected (occluded) pores, was shown to cause a considerable difference between the unjacketed bulk modulus and the bulk modulus of the solid phase. Furthermore, we found the unjacketed bulk modulus to be independent of the unjacketed pressure and Terzaghi effective pressure and therefore a constant.

keywords: Poroelasticity; anisotropic solid skeleton; unjacketed bulk modulus; non-connected porosity

published: 2025-09-17

Data for "Decompartmentalization of the yeast mitochondrial metabolism to improve chemical production in Issatchenkia orientalis"

Zhao, Huimin; Rabinowitz, Joshua; Guest, Jeremy; Zhu, Zhixin; Bhagwat, Sarang; Li, Xi; Weilandt, Daniel; Xu, Hao; Tan, Shih-I; Tran, Vinh (2025)

Microbial production of chemicals may suffer from inadequate cofactor provision, a challenge further exacerbated in yeasts due to compartmentalized cofactor metabolism. Here, we perform cofactor engineering through the decompartmentalization of mitochondrial metabolism to improve succinic acid (SA) production in Issatchenkia orientalis. We localize the reducing equivalents of mitochondrial NADH to the cytosol through cytosolic expression of its pyruvate dehydrogenase (PDH) complex and couple a reductive tricarboxylic acid pathway with a glyoxylate shunt, partially bypassing an NADH-dependent malate dehydrogenase to conserve NADH. Cytosolic SA production reaches a titer of 104 g/L and a yield of 0.85 g/g glucose, surpassing the yield of 0.66 g/g glucose constrained by cytosolic NADH availability. Additionally, expressing cytosolic PDH, we expand our I. orientalis platform to enhance acetyl-CoA-derived citramalic acid and triacetic acid lactone production by 1.22- and 4.35-fold, respectively. Our work establishes I. orientalis as a versatile platform to produce markedly reduced and acetyl-CoA-derived chemicals.

keywords: bioproducts; metabolic engineering

published: 2022-04-19

Dataset for On the Importance of Firth Bias Reduction in Few-Shot Classification

Saleh, Ehsan; Ghaffari, Saba; Forsyth, David; Yu-Xiong, Wang (2022)

This data repository includes the features and the trained backbone parameters used in the ICLR 2022 Paper "On the Importance of Firth Bias Reduction in Few-Shot Classification". The code accompanying this data is open-source and available at https://github.com/ehsansaleh/firth_bias_reduction. The code and the data have three modules: 1. The "code_firth" module (10 files) relates to the basic ResNet backbones and logistic classifiers (e.g., Figures 2 and 3 in the main paper). 2. The "code_s2m2rf" module (2 files) relates to the S2M2R feature backbones and cosine classifiers (e.g., Figure 4 in the main paper). 3. The "code_dcf" module (3 files) relates to the few-shot Distribution Calibration (DC) method (e.g., Table 1 in the main paper). The relevant files for each module have the module name as a prefix in their name. 1. For instance, the "code_dcf_features.tar" file should be placed in the "features" directory of the "code_dcf" module. 2. As another example, "code_firth_features_cifarfs_novel.tar" should be placed in the "features" directory of the "code_firth" module, and it includes the features extracted from the novel split of mini-ImageNet dataset. Each tar-ball should be extracted in its relevant directory, and the md5 check-sums of the extracted files are also provided in the open-source code repository for verification. Please note that the actual datasets of images are not included here (since we do not own those datasets). However, helper scripts for automatically downloading the original datasets are also provided in every module and sub-directory of the GitHub code repository.
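Checking an extracted file against a published md5 check-sum takes only the Python standard library. This is a generic sketch, not the verification script from the repository: the helper names `md5_of_file` and `verify` are hypothetical, and the demonstration uses a throwaway temporary file rather than any file from this dataset.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True when the file's md5 matches the expected hex digest."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Demonstration on a temporary file containing the bytes b"hello".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
demo_ok = verify(demo_path, "5d41402abc4b2a76b9719d911017c592")
os.remove(demo_path)
```

Reading in chunks keeps memory use flat, which matters for large feature tar-balls.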

keywords: Computer Vision; Few-Shot Classification; Few-Shot Learning; Firth Bias Reduction

published: 2018-08-06

Comparison of data extraction on 6 clinical trial papers, extraction by RobotReviewer, by 3 novice data extractors, and from a published Cochrane review.

Hoang, Linh; Cao, Linh; Guan, Yingjun; Cheng, Yi-Yun; Schneider, Jodi (2018)

This annotation study compared RobotReviewer's data extraction to that of three novice data extractors, using six included articles synthesized in one Cochrane review: Bailey E, Worthington HV, van Wijk A, Yates JM, Coulthard P, Afzal Z. Ibuprofen and/or paracetamol (acetaminophen) for pain relief after surgical removal of lower wisdom teeth. Cochrane Database Syst Rev. 2013; CD004624; doi:10.1002/14651858.CD004624.pub2. The goal was to assess the relative advantage of RobotReviewer's data extraction with respect to quality.

keywords: RobotReviewer; annotation; information extraction; data extraction; systematic review automation; systematic reviewing;

published: 2019-03-06

Chronic contact with realistic soil concentrations of imidacloprid affects the mass, immature development speed, and adult longevity of solitary bees

Anderson, Nicholas L.; Harmon-Threatt, Alexandra N. (2019)

Chronic contact exposure to realistic soil concentrations (0, 7.5, 15, and 100 ppb) of the neonicotinoid pesticide imidacloprid had species- and sex-specific effects on bee adult longevity, immature development speed, and mass. This dataset contains a life table tracking the development, mass, and deaths of a single cohort of Osmia lignaria and Megachile rotundata over the course of two summers. Other data files include files created for multi-event survival analysis to analyze the effect on development speed. Detected effects included: decreased adult longevity for female O. lignaria at the highest concentration, a trend for a hormetic effect on female M. rotundata development speed and mass (longest development time and greatest mass in the 15 ppb treatment), and decreased adult longevity and increased development speed at high imidacloprid concentrations as well as a hormetic effect on mass (lowest in the 15 ppb treatment) on male M. rotundata.

keywords: neonicotinoid; imidacloprid; bee; habitat restoration;

published: 2021-02-24

Southeastern South America Soil Moisture Alteration Experiment Using CESM2

Bieri, Carolina A.; Dominguez, Francina (2021)

This dataset contains model output from the Community Earth System Model, Version 2 (CESM2; Danabasoglu et al. 2020). These data were used for analysis in Impacts of Large-Scale Soil Moisture Anomalies in Southeastern South America, published in the Journal of Hydrometeorology (DOI: 10.1175/JHM-D-20-0116.1). See this publication for details of the model simulations that created these data. Four NetCDF (.nc) files are included in this dataset. Two files correspond to the control simulation (FHIST_SP_control) and two files correspond to a simulation with a dry soil moisture anomaly imposed in southeastern South America (FHIST_SP_dry; see the publication mentioned in the preceding paragraph for details on the spatial extent of the imposed anomaly). For each simulation, one file corresponds to output from the atmospheric model (file names with "cam") of CESM2 and the other to the land model (file names with "clm2"). These files are raw CESM output concatenated into a single file for each simulation. All files include data from 1979-01-02 to 2003-12-31 at a daily resolution. The spatial resolution of all files is about 1 degree longitude x 1 degree latitude. Variables included in these files are listed or linked below. Variables in atmosphere model output: Vertical velocity (omega) Convective precipitation Large-scale precipitation Surface pressure Specific humidity Temperature (atmospheric profile) Reference temperature (temp. at reference height, 2 meters in this case) Zonal wind Meridional wind Geopotential height Variables in land model output: See https://www.cesm.ucar.edu/models/cesm1.2/clm/models/lnd/clm/doc/UsersGuide/history_fields_table_40.xhtml Note that not all of the variables listed at the above link are included in the land model output files in this dataset. This material is based upon work supported by the National Science Foundation under Grant No. 1454089. 
We acknowledge high-performance computing support from Cheyenne (doi:10.5065/D6RX99HX) provided by NCAR's Computational and Information Systems Laboratory, sponsored by the National Science Foundation. The CESM project is supported primarily by the National Science Foundation. We thank all the scientists, software engineers, and administrators who contributed to the development of CESM2. References Danabasoglu, G., and Coauthors, 2020: The Community Earth System Model Version 2 (CESM2). Journal of Advances in Modeling Earth Systems, 12, e2019MS001916, https://doi.org/10.1029/2019MS001916.
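As a quick sanity check on the stated daily coverage (1979-01-02 to 2003-12-31), the sketch below counts how many daily records that period implies. It is only an illustration: the function names are hypothetical, and the 365-day ("noleap") count assumes the files use a calendar without leap days, which the description does not state. Check the calendar attribute of the time coordinate in the files themselves before relying on either number.

```python
import calendar
from datetime import date


def daily_steps_gregorian(start: date, end: date) -> int:
    """Count daily records from start to end inclusive, keeping leap days."""
    return (end - start).days + 1


def daily_steps_noleap(start: date, end: date) -> int:
    """Count daily records when every Feb 29 is dropped (365-day calendar).

    Assumption: the model output uses a no-leap calendar. This is not
    stated in the dataset description.
    """
    leap_days = sum(
        1
        for year in range(start.year, end.year + 1)
        if calendar.isleap(year) and start <= date(year, 2, 29) <= end
    )
    return daily_steps_gregorian(start, end) - leap_days


if __name__ == "__main__":
    first, last = date(1979, 1, 2), date(2003, 12, 31)
    print("with leap days:   ", daily_steps_gregorian(first, last))
    print("without leap days:", daily_steps_noleap(first, last))
```

Comparing the length of the time dimension in each file against these two counts is one way to confirm that the concatenation covers the full period.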

keywords: Climate modeling; atmospheric science; hydrometeorology; hydroclimatology; soil moisture; land-atmosphere interactions

published: 2025-09-15

Data for "A 13-year record indicates differences in the duration and depth of soil carbon accrual among potential bioenergy crops"

Kantola, Ilsa; Masters, Michael; DeLucia, Evan (2025)

Data sets for material included in "A 13-year record indicates differences in the duration and depth of soil carbon accrual among potential bioenergy crops" by Kantola et al., 2025, in Global Change Biology Bioenergy. Data include soil organic carbon (SOC), carbon stable isotope ratios, annual belowground biomass, and annual post-harvest litter for four crops (maize/soybean, miscanthus, switchgrass, and prairie) between 2008 and 2021.

keywords: bioenergy crops; soil organic carbon; miscanthus; switchgrass; prairie

published: 2025-08-17

ME-MKM codes and associated files

Peters, Baron (2025)

These codes implement the master equation microkinetic modeling (ME-MKM) calculations of Adams et al. (J. Phys. Chem. C 2025, 129, 15, 7285–7294), as well as the automatic derivatives for activation energies and reaction orders in their follow-up work (in review).

keywords: Microkinetic model; master equation; periodic tiling; catalysis; adsorption

published: 2025-09-17

Data for "Anti-Pdc1p Nanobody as a Genetically Encoded Inhibitor of Ethanol Production Enables Dual Transcriptional and Post-translational Controls of Yeast Fermentations"

Avalos, Jose L; Mantri, Krishi (2025)

Microbial fermentation provides a sustainable method of producing valuable chemicals. Adding dynamic control to fermentations can significantly improve titers, but most systems rely on transcriptional controls of metabolic enzymes, leaving existing intracellular enzymes unregulated. This limits the ability of transcriptional controls to switch off metabolic pathways, especially when metabolic enzymes have long half-lives. We developed a two-layer transcriptional/post-translational control system for yeast fermentations. Specifically, the system uses blue light to transcriptionally activate the major pyruvate decarboxylase PDC1, required for cell growth and concomitant ethanol production. Switching to darkness transcriptionally inactivates PDC1 and instead activates the anti-Pdc1p nanobody, NbJRI, to act as a genetically encoded inhibitor of Pdc1p accumulated during the growth phase. This dual transcriptional/post-translational control improves the production of 2,3-BDO and citramalate by up to 100% and 92%, respectively, compared to using transcriptional controls alone in dynamic two-phase fermentations. This study establishes the NbJRI nanobody as an effective genetically encoded inhibitor of Pdc1p that can enhance the production of pyruvate-derived chemicals.

keywords: metabolic engineering

published: 2017-09-28

Biotic homogenization of regional wetland plant communities within short timescales in the presence of an aggressive invader

Price, Edward P. F.; Spyreas, Greg; Matthews, Jeffrey (2017)

This is the dataset used in the Journal of Ecology publication of the same name. The file BH.veg.data.csv contains a site by species matrix of species relative abundance (percent cover across all sampling quadrats within a site). Data under the heading Year refer to sampling periods. Year 1 refers to the first set of samples taken between 1997 and 2000, Year 2 refers to the second set taken between 2002 and 2005, Year 3 refers to the third set taken between 2007 and 2010, and Year 4 refers to the fourth set taken between 2012 and 2015. All sites met Critical Trends Assessment Program (CTAP) size criteria of being at least 2 ha in size with a minimum of 500 m² of suitable sampling area. The file BH.site.location.csv contains the Public Land Survey System ranges and townships in which specific sites were located. All sites were located within the U.S. state of Illinois. More information about this dataset: Interested parties can request data from the Critical Trends Assessment Program, which was the source for the data on the wetlands in this study. More information on the program and data requests can be obtained by visiting the program webpage. Critical Trends Assessment Program, Illinois Natural History Survey. http://wwx.inhs.illinois.edu/research/ctap/
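To illustrate how a site by species matrix of relative abundances can be turned into a between-site similarity, here is a minimal sketch using the Bray-Curtis similarity. This is an assumption for illustration only: the record does not say which similarity index the publication used, and the site values and the species other than Phalaris arundinacea are made up rather than taken from BH.veg.data.csv.

```python
def bray_curtis_similarity(site_a: dict, site_b: dict) -> float:
    """Bray-Curtis similarity between two sites.

    Each site is a mapping from species name to abundance (for example,
    percent cover). Returns 1.0 for identical composition and 0.0 when
    the sites share no species.
    """
    species = set(site_a) | set(site_b)
    # Abundance shared by both sites, summed over all species.
    shared = sum(min(site_a.get(s, 0.0), site_b.get(s, 0.0)) for s in species)
    total = sum(site_a.values()) + sum(site_b.values())
    if total == 0:
        raise ValueError("both sites are empty; similarity is undefined")
    return 2.0 * shared / total


if __name__ == "__main__":
    # Hypothetical rows of a site by species matrix (percent cover).
    wetland_1 = {"Phalaris arundinacea": 60.0, "Carex sp.": 40.0}
    wetland_2 = {"Phalaris arundinacea": 80.0, "Typha sp.": 20.0}
    print(round(bray_curtis_similarity(wetland_1, wetland_2), 3))
```

Under this kind of measure, biotic homogenization would show up as mean pairwise similarity among sites increasing from one sampling period to the next.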

keywords: biodiversity; biotic homogenization; invasive species; Phalaris arundinacea; plant population and community dynamics; similarity index; wetlands

published: 2020-07-15

Data from: Supertree-like methods for genome-scale species tree estimation

Molloy, Erin K. (2020)

This repository includes scripts and datasets for Chapter 6 of my PhD dissertation, "Supertree-like methods for genome-scale species tree estimation," that had not been published previously. This chapter is based on the article: Molloy, E.K. and Warnow, T. "FastMulRFS: Fast and accurate species tree estimation under generic gene duplication and loss models." Bioinformatics, In press. https://doi.org/10.1093/bioinformatics/btaa444. The results presented in my PhD dissertation differ from those in the Bioinformatics article, because I re-estimated species trees using FastMulRFS and MulRF on the same datasets in the original repository (https://doi.org/10.13012/B2IDB-5721322_V1). To re-estimate species trees, (1) a seed was specified when running MulRF, and (2) a different script (specifically preprocess_multrees_v3.py from https://github.com/ekmolloy/fastmulrfs/releases/tag/v1.2.0) was used for preprocessing gene trees (which were then given as input to MulRF and FastMulRFS). Note that this preprocessing script is a re-implementation of the original algorithm for improved speed (a bug fix was also implemented). Finally, it was brought to my attention that the simulation in the Bioinformatics article differs from prior studies, because I scaled the species tree by 10 generations per year (instead of 0.9 years per generation, which is ~1.1 generations per year). I re-simulated datasets (true-trees-with-one-gen-per-year-psize-10000000.tar.gz and true-trees-with-one-gen-per-year-psize-50000000.tar.gz) using 0.9 years per generation to quantify the impact of this parameter change (see my PhD dissertation or the supplementary materials of the Bioinformatics article for discussion).
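The unit conversion behind the generation-time remark can be checked with a few lines of arithmetic. The sketch below only reproduces the numbers quoted in the description (10 generations per year versus 0.9 years per generation); the function name is hypothetical and nothing here is taken from the simulation scripts.

```python
def generations_per_year(years_per_generation: float) -> float:
    """Convert a generation time in years into generations per year."""
    if years_per_generation <= 0:
        raise ValueError("generation time must be positive")
    return 1.0 / years_per_generation


if __name__ == "__main__":
    prior_rate = generations_per_year(0.9)  # setting used in prior studies
    article_rate = 10.0                     # setting used in the article
    print(f"0.9 years per generation = {prior_rate:.2f} generations per year")
    print(f"ratio of the two settings = {article_rate / prior_rate:.1f}")
```

The two settings therefore differ by a factor of about nine in the number of generations per year.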

keywords: Species tree estimation; gene duplication and loss; statistical consistency; MulRF; FastRFS

published: 2020-10-14

Multiple stem and environmental variables dataset

Dalling, James W.; Heineman, Katherine D. (2020)

Data on permanent plots at Fortuna and the Panama Canal Watershed, Republic of Panama, containing counts and percent of trees with one or more multiple stems > 10 cm diameter, with and without palms. Accompanying environmental data include elevation, precipitation, soil type, and soil chemical variables (pH, total N, NO3, NO4, resin P, Mehlich Ca, K and Mg).

keywords: multiple stems; resprouting; Panama Canal Watershed; Fortuna Forest Reserve