Illinois Data Bank Dataset Search Results
published:
2022-04-29
Wedell, Eleanor; Warnow, Tandy
(2022)
Thank you for using these datasets!
These files contain trees and reference alignments, as well as the selected query sequences for testing phylogenetic placement methods against and within the SCAMPP framework.
There are four datasets from three different sources, each containing their source alignment and "true" tree, any estimated trees that may have been generated, and any re-estimated branch lengths that were created to be used with their requisite phylogenetic placement method.
Three biological datasets (16S.B.ALL, PEWO/LTP_s128_SSU, and PEWO/green85) and one simulated dataset (nt78) are included. See README.txt in each file for more information.
keywords:
Phylogenetic Placement; Phylogenetics; Maximum Likelihood; pplacer; EPA-ng
published:
2024-03-01
Chen, Chu-Chun; Dominguez, Francina
(2024)
This dataset contains model output from the Community Earth System Model, Version 1 (CESM1; Hurrell et al., 2013) and variables from the European Centre for Medium-Range Weather Forecast (ECMWF) Reanalysis v5 (ERA5; Hersbach et al., 2020). These data were used for analysis in “The location of large-scale soil moisture anomalies affects moisture transport and precipitation over southeastern South America”, published in Geophysical Research Letters.
Acknowledgments:
This work was supported by NSF Award AGS-1852709. We acknowledge high-performance computing support from Cheyenne (doi:10.5065/D6RX99HX) provided by NCAR's Computational and Information Systems Laboratory, sponsored by the NSF. We thank Dr. Haiyan Teng for providing guidance on setting up the CESM experiments and offering valuable advice.
References:
Hersbach H, Bell B, Berrisford P, et al. The ERA5 global reanalysis. Q J R Meteorol Soc. 2020; 146: 1999–2049. https://doi.org/10.1002/qj.3803
Hurrell, J. W., and Coauthors, 2013: The Community Earth System Model: A Framework for Collaborative Research. Bull. Amer. Meteor. Soc., 94, 1339–1360, https://doi.org/10.1175/BAMS-D-12-00121.1
keywords:
atmospheric sciences; climate modeling; land-atmosphere interactions; soil moisture; regional atmospheric circulation; southeastern South America
published:
2025-08-27
Jang, Chunhwa; Namoi, Nictor; Lee, Jung Woo; Becker, Talon; Rooney, William; Lee, DoKyoung
(2025)
Data were collected from agronomy fields in Urbana and Ewing, IL, during the 2022 and 2023 growing seasons. The dataset includes dry biomass yield, nitrogen, phosphorus, and potassium concentrations and removals, and chemical composition elements (cellulose, hemicellulose, lignin, and soluble fractions) for 13 high-biomass sorghum hybrids.
data_sharing.xlsx contains 20 columns and 104 rows. Below is the explanation of all variables in the file:
Year: 2022; 2023
Location: Urbana, IL; Ewing, IL
N rate (kg-N/ha): 0; 112
Hybrid #: H1-H13
Pedigree: Pedigree for 13 hybrids
Dry biomass yield (Mg/ha): Aboveground dry biomass yield
N (g/kg): Nitrogen concentration in plant tissue
P (g/kg): Phosphorus concentration in plant tissue
K (g/kg): Potassium concentration in plant tissue
N (kg/ha): Nitrogen removal by aboveground biomass
P (kg/ha): Phosphorus removal by aboveground biomass
K (kg/ha): Potassium removal by aboveground biomass
Cellulose (g/kg): Cellulose concentration in plant tissue
Hemicellulose (g/kg): Hemicellulose concentration in plant tissue
Lignin (g/kg): Lignin concentration in plant tissue
Soluble (g/kg): Soluble concentration in plant tissue
Cellulose (Mg/ha): Cellulose content in aboveground biomass
Hemicellulose (Mg/ha): Hemicellulose content in aboveground biomass
Lignin (Mg/ha): Lignin content in aboveground biomass
Soluble (Mg/ha): Soluble content in aboveground biomass
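The removal and content columns listed above are linked to yield and concentration by simple unit arithmetic. The following is a minimal sketch, assuming that removal was computed as dry yield times tissue concentration (the description does not state this explicitly). The function names and sample values are hypothetical and are not taken from data_sharing.xlsx.

```python
def removal_kg_per_ha(yield_mg_per_ha, conc_g_per_kg):
    """Nutrient removal (kg/ha) from dry yield (Mg/ha) and concentration (g/kg)."""
    # 1 Mg of biomass is 1000 kg, and 1 g/kg is 1/1000 kg/kg,
    # so the two factors of 1000 cancel.
    return yield_mg_per_ha * conc_g_per_kg


def content_mg_per_ha(yield_mg_per_ha, conc_g_per_kg):
    """Component content (Mg/ha) from dry yield (Mg/ha) and concentration (g/kg)."""
    # g/kg is a mass fraction in parts per thousand.
    return yield_mg_per_ha * conc_g_per_kg / 1000.0


# Hypothetical values for illustration only.
example_yield = 20.0       # Mg/ha
example_n = 5.0            # g/kg
example_cellulose = 350.0  # g/kg

print(removal_kg_per_ha(example_yield, example_n))          # N removal, kg/ha
print(content_mg_per_ha(example_yield, example_cellulose))  # cellulose, Mg/ha
```

Comparing such recomputed values with the reported N (kg/ha) or Cellulose (Mg/ha) columns is one way to check that the units were read correctly.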
keywords:
high-biomass sorghum hybrids; yield potential; environmental adaptability; feedstock quality; nutrient removal; N fertilization
published:
2025-09-26
Arora, Amit; Singh, Vijay
(2025)
In this study, different process schemes were designed and evaluated for biodiesel production from engineered cane lipids with uncertain fatty acid compositions. Four different process schemes were compared under (i) thermal glycerolysis and (ii) enzymatic glycerolysis approaches. These schemes were evaluated based on the biodiesel yield and economic indicators such as the net present value (NPV) and the minimum selling price (MSP) of biodiesel. A scheme with polar lipid separation under thermal glycerolysis resulted in the maximum NPV ($96.5 million) and minimum MSP ($1107/ton biodiesel). Through local sensitivity analysis, it was concluded that the cane lipid percentage is the most significant factor influencing process economics. A conjoint analysis of the lipid procurement price and cane lipid percent suggested that 15% cane lipids with a low lipid procurement price ($0.536/kg) results in a positive NPV. When the cane lipid price is higher (>$0.80/kg), a 20% lipid content should be considered to achieve a positive NPV. At 20% cane lipids, the worst-case and best-case scenarios were evaluated by analyzing the interplay of the three most important parameters. The best-case scenario revealed that the minimum NPV under any process scheme could yield more than $100 million (or MSP: $0.80/L), and the worst-case analysis showed that losses incurred by the plant could be as high as $80 million (MSP: $1.36/L). A Monte Carlo simulation indicated that there is a 70% chance of the plant being profitable (NPV > 0).
keywords:
Conversion;Economics;Feedstock Bioprocessing;Modeling
published:
2025-10-21
Trieu, Anthony; Belaffif, Mohammad B.; Hirannaiah, Pradeepa; Manjunatha, Shilpa; Wood, Rebekah; Bathula, Yokshitha; Billingsley, Rebecca L.; Arpan, Anjali; Sacks, Erik; Clemente, Tom; Moose, Stephen; Reichert, Nancy A.; Swaminathan, Kankshita
(2025)
Miscanthus, a C4 member of the family Poaceae, is a promising perennial crop for bioenergy, renewable bioproducts, and carbon sequestration. Species of interest include nothospecies Miscanthus x giganteus and its parental species M. sacchariflorus and M. sinensis. To date, the use of biotechnology-based procedures to genetically improve miscanthus has only included plant transformation procedures for introduction of exogenous genes into the host genome at random, non-targeted sites.
keywords:
Feedstock Production;Biomass Analytics;Genomics
published:
2023-07-14
Punyasena, Surangi W.; Urban, Michael A.; Adaime, Marc-Elie; Romero, Ingrid; Jaramillo, Carlos
(2023)
This dataset includes a total of 300 images of 45 extant species of Podocarpus (Podocarpaceae) and nine images of fossil specimens of the morphogenus Podocarpidites. The goal of this dataset is to capture the diversity of morphology within the genus and create an image database for training machine learning models.
The images were taken using Airyscan confocal superresolution microscopy at 630x magnification (63x/NA 1.4 oil DIC). The images are in the CZI file format. They can be opened using Zeiss proprietary software (Zen, Zen lite) or open microscopy software, such as ImageJ. More information on how to open CZI files can be found here: [https://www.zeiss.com/microscopy/us/products/software/zeiss-zen/czi-image-file-format.html]
Please cite this dataset and listed publications when using these images.
keywords:
optical superresolution microscopy; Zeiss Airyscan; CZI images; conifer; saccate pollen; Podocarpus; Podocarpidites; Smithsonian Tropical Research Institute
published:
2019-05-31
The data are provided to illustrate methods in evaluating systematic transactional data reuse in machine learning. A library account-based recommender system was developed using machine learning processing over transactional data of 383,828 transactions (or check-outs) sourced from a large multi-unit research library. The machine learning process utilized the FP-growth algorithm over the subject metadata associated with physical items that were checked-out together in the library. The purpose of this research is to evaluate the results of systematic transactional data reuse in machine learning. The analysis herein contains a large-scale network visualization of 180,441 subject association rules and corresponding node metrics.
keywords:
evaluating machine learning; network science; FP-growth; WEKA; Gephi; personalization; recommender systems
published:
2020-06-26
Gasparik, Jessica T.; Ye, Qing; Curtis, Jeffrey H.; Presto, Albert A.; Donahue, Neil M.; Sullivan, Ryan C.; West, Matthew; Riemer, Nicole
(2020)
This dataset contains the PartMC-MOSAIC simulations used in the article "Quantifying Errors in the Aerosol Mixing-State Index Based on Limited Particle Sample Size". The output data from the 1000 simulations are organized into a series of archived folders, each containing 100 scenarios. Within each scenario directory are 25 NetCDF files, which are the hourly output of a PartMC-MOSAIC simulation containing all information regarding the environment, particle and gas state. This dataset was used to investigate the impact of sample size on determining aerosol mixing state. These data may also be useful as a test set for applying different types of estimators.
keywords:
Atmospheric aerosols; single-particle measurements; sampling uncertainty; NetCDF
published:
2022-07-19
Parmar, Dharmeshkumar; Jia, Jin; Shrout, Joshua; Sweedler, Jonathan; Bohn, Paul
(2022)
#### Details of Pseudomonas aeruginosa biofilm dataset ####
----------------*Folder Structure*-------------------------------------
This dataset contains peak intensity tables extracted from mass spectrometry imaging (MSI) data using the tools SCiLS and MSI reader. There are 2 folders in "MSI-Data-Paeruginosa-biofilms-UIUC-DP-JVS-July2022.zip"; each folder contains 3 sub-folders as listed below.
1. PellicleBiofilms-and-Supernatant [Pellicle biofilms collected from air-liquid interface and spent supernatant medium after 96 h incubation period]:
(1) Full-Scan-Data-96h; (2) MSMS-data-from-C7-Quinolones-96h; and (3) MSMS-data-from-C9-Quinolones-96h
2. StaticBiofilms [Static biofilms grown on mucin surface]:
(1) Full-Scan-Data; (2) MSMS-data-from-C7-Quinolones; and (3) MSMS-data-from-C9-Quinolones
----------------*File name*----------------------------------------------
Sample information is included in the file names for easy identification and processing. Attributes covered in file names are explained in the example below.
*Example file name "Rep1-Stat-FRD1-mPat-48-FS"*
~ Each unit of information is separated by "-"
~Unit 1 - "Rep1" - Biological replicate ( Rep1, Rep2, and Rep3)
~Unit 2 - "Stat" - Sample type (Stat = Static Biofilm, Pel = Pellicle biofilm, Sup = Supernatant)
~Unit 3 - "FRD1" - Strain (FRD1 = Mucoid strain, PAO1C = Non-mucoid strain)
~Unit 4 - "mPat" - Type of mucin surface used (mPat = patterned mucin surface, mUni = uniform mucin surface)
~Unit 5 - "48" - Sample time point (hours = 48, 72, 96)
~Unit 6 - "FS" - Scan type used in MSI (FS = high resolution full-scan, 260 = targeted MS/MS of C7 quinolones (m/z 260), 288 = targeted MS/MS of C9 quinolones (m/z 288))
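The naming scheme above can be decoded mechanically. The following is a minimal sketch, assuming every file name carries all six units in the order shown in the example. The function name and dictionary keys are hypothetical, and names that do not match the six-unit pattern raise an error rather than being guessed at.

```python
SAMPLE_TYPES = {"Stat": "static biofilm", "Pel": "pellicle biofilm", "Sup": "supernatant"}
STRAINS = {"FRD1": "mucoid", "PAO1C": "non-mucoid"}
SURFACES = {"mPat": "patterned mucin surface", "mUni": "uniform mucin surface"}
SCAN_TYPES = {
    "FS": "high resolution full-scan",
    "260": "targeted MS/MS of C7 quinolones (m/z 260)",
    "288": "targeted MS/MS of C9 quinolones (m/z 288)",
}


def parse_sample_name(name):
    """Split a name such as 'Rep1-Stat-FRD1-mPat-48-FS' into labeled fields."""
    # Drop a trailing .csv extension if one is present.
    if name.lower().endswith(".csv"):
        name = name[:-4]
    units = name.split("-")
    if len(units) != 6:
        raise ValueError("expected 6 units separated by '-', got %d" % len(units))
    replicate, sample, strain, surface, hours, scan = units
    return {
        "replicate": replicate,
        # Unknown codes are passed through unchanged.
        "sample_type": SAMPLE_TYPES.get(sample, sample),
        "strain": strain,
        "phenotype": STRAINS.get(strain, "unknown"),
        "surface": SURFACES.get(surface, surface),
        "hours": int(hours),
        "scan": SCAN_TYPES.get(scan, scan),
    }


info = parse_sample_name("Rep1-Stat-FRD1-mPat-48-FS")
print(info["sample_type"], info["hours"], info["scan"])
```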
----------------*File structure*------------------------------------------
All MSI data has been exported to CSV format. Each CSV file contains information about scan number, coordinates (x,y,z), m/z values, extraction window (absolute), and corresponding intensities in the form of a matrix.
----------------*End of Information*--------------------------------------
keywords:
mass spectrometry imaging (MSI); biofilm; antibiotic resistance; Pseudomonas aeruginosa; quorum sensing; rhamnolipids
published:
2024-08-19
Ward, Michael; Stewart, Sarah; Benson, Thomas
(2024)
Data on the nesting success and post-fledgling survival of Eastern Whip-poor-wills in central Illinois. Data was part of Sarah Stewart's MS project at the University of Illinois.
keywords:
bird nesting success; post-fledgling survival; eastern whip-poor-will
published:
2025-07-09
Kim, Ahyoung; Kim, Chansong; Waltmann, Tommy; Vo, Thi; Kim, Eun Mi; Kim, Junseok; Shao, Yu-Tsun; Michelson, Aaron; Crockett, John R.; Kalutantirige, Falon C.; Yang, Eric; Yao, Lehan; Hwang, Chu-Yun; Zhang, Yugang; Liu, Yu-Shen; An, Hyosung; Gao, Zirui; Kim, Jiyeon; Mandal, Sohini; Muller, David; Fichthorn, Kristen; Glotzer, Sharon; Chen, Qian
(2025)
This dataset contains the raw transmission electron microscopy (TEM) and scanning electron microscopy (SEM) images used to calculate the synthesis yield of patchy nanoparticles (NPs), as described in Supplementary Table 1 of the paper “Patchy Nanoparticles by Atomic ‘Stencilling’” (2025). All the images were taken at the Materials Research Laboratory, University of Illinois at Urbana-Champaign, by the Qian Chen group.
1. We have 21 subfolders, each with a name corresponding to one of the 21 patchy NPs listed in Supplementary Table 1 of the paper “Patchy Nanoparticles by Atomic ‘Stencilling’” (2025).
2. In TEM images, the bright and dark regions indicate the polymer patches and NP cores, respectively.
3. In SEM images, the bright and dark regions indicate the NP cores and polymer patches, respectively.
4. Each subfolder contains a “readme (subfolder name).txt” file with more detailed information about each sample.
keywords:
Patchy nanoparticle; polymer; synthesis; self-assembly
published:
2025-10-10
Field, John L.; Richard, Tom; Smithwick, Erica A. H.; Cai, Hao; Laser, Mark; LeBauer, David; Long, Stephen; Paustian, Keith; Qin, Zhangcai; Sheehan, John; Smith, Pete; Wang, Michael Q.; Lynd, Lee
(2025)
This zip file contains a UNIX-format DayCent model executable, input files, automation code, and associated directory structure necessary to reproduce the DayCent analysis underlying the manuscript. The main script “autodaycent.py” (written for Python 2.7) opens an interactive command line routine that facilitates: Calibrating the DayCent pine growth model; Initializing DayCent for a set of case study sites; Executing an ensemble of model runs representing case study site reforestation, grassland restoration, or conversion to switchgrass cultivation; and Results analysis & generation of manuscript Fig. 3. Note that the interactive analysis code requires all input files to be contained in the directory structure as uploaded, without modification. Executable versions of the DayCent model compatible with other operating systems are available upon request.
keywords:
Feedstock Production;Modeling
published:
2018-01-03
Sweet, Andrew; Bush, Sarah; Gustafsson, Daniel; Allen, Julie; DiBlasi, Emily; Skeen, Heather; Weckstein, Jason; Johnson, Kevin
(2018)
Concatenated sequence alignment, phylogenetic analysis files, and relevant software parameter files from a cophylogenetic study of Brueelia-complex lice and their avian hosts. The sequence alignment file includes a list of character blocks for each gene alignment and the parameters used for the MrBayes phylogenetic analysis.
1) Files from the MrBayes analyses:
a) a file with 100 random post-burnin trees (50% burnin) used in the cophylogenetic analysis - analysisrandom100_trees_brueelia.tre
b) a majority rule consensus tree - treeconsensus_tree_brueelia.tre
c) a maximum clade credibility tree - mcc_tree_brueelia.tre
The tree tips are labeled with louse voucher names, and can be referenced in Supplementary Table 1 of the associated publication.
2) Files related to a BEAST analysis with COI data:
a) the XML file used as input for the BEAST run, including model parameters, MCMC chain length, and priors - beast_parameters_coi_brueelia.xml
b) a file with 100 random post-burnin trees (10% burnin) from the BEAST posterior distribution of trees; used in OTU analysis - beast_100random_trees_brueelia.tre
c) an ultrametric maximum clade credibility tree - mcc_tree_beast_brueelia.tre
3) A maximum clade credibility tree of Brueelia-complex host species generated from a distribution of trees downloaded from https://birdtree.org/subsets/ - mcc_tree_brueelia_hosts.tre
4) Concatenated sequence alignment - concatenated_alignment_brueelia.nex
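The "random post-burnin trees" files above follow a common pattern: discard an initial fraction of the posterior sample as burn-in, then draw a random subset from what remains. The following is a minimal sketch of that pattern, assuming sampling without replacement (the description does not say which was used). The function name and the placeholder tree labels are hypothetical.

```python
import random


def sample_post_burnin(trees, burnin_fraction, n_samples, seed=None):
    """Drop the first burnin_fraction of trees, then sample n_samples at random."""
    if not 0.0 <= burnin_fraction < 1.0:
        raise ValueError("burnin_fraction must be in [0, 1)")
    start = int(len(trees) * burnin_fraction)
    retained = trees[start:]
    if n_samples > len(retained):
        raise ValueError("not enough post-burn-in trees to sample from")
    rng = random.Random(seed)  # seeded for a reproducible subset
    return rng.sample(retained, n_samples)


# Placeholder posterior of 1000 trees; real input would be Newick strings.
posterior = ["tree_%d" % i for i in range(1000)]
subset = sample_post_burnin(posterior, 0.5, 100, seed=1)
print(len(subset))
```

With a 50% burn-in, as in the MrBayes file, only trees from the second half of the chain can appear in the subset. A 10% burn-in, as in the BEAST file, would correspond to burnin_fraction=0.1.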
keywords:
bird lice; Brueelia-complex; passerines; multiple sequence alignment; phylogenetic tree; Bayesian phylogenetic analysis; MrBayes; BEAST
published:
2025-11-24
Dubinkina, Veronika; Bhogale, Shounak; Hsieh, Ping-Hung; Dibaeinia, Payam; Nambiar, Ananthan; Maslov, Sergei; Yoshikuni, Yasuo; Sinha, Saurabh
(2025)
Because of its natural stress tolerance to low pH, Issatchenkia orientalis (a.k.a. Pichia kudriavzevii) is a promising non-model yeast for bio-based production of organic acids. Yet, this organism is relatively unstudied, and specific mechanisms of its tolerance to low pH are poorly understood, limiting commercial use. In this study, we selected 12 I. orientalis strains with varying acid stress tolerance (six tolerant and six susceptible) and profiled their transcriptomes in different pH conditions to study potential mechanisms of pH tolerance in this species. We identified hundreds of genes whose expression response is shared by tolerant strains but not by susceptible strains, or vice versa, as well as genes whose responses are reversed between tolerant and susceptible strains. We mapped regulatory mechanisms of transcriptomic responses via motif analysis as well as differential network reconstruction, identifying several transcription factors, including Stb5, Mac1, and Rtg1/Rtg3, some of which are known for their roles in acid response in Saccharomyces cerevisiae. Functional genomics analysis of short-listed genes and transcription factors suggested significant roles for energy metabolism and translation-related processes, as well as the cell wall integrity pathway and RTG-dependent retrograde signaling pathway. Finally, we conducted additional experiments for two organic acids, 3-hydroxypropionate and citramalate, to eliminate acid-specific effects and found potential roles for glycolysis and trehalose biosynthesis specifically for response to low pH. In summary, our approach of comparative transcriptomics and phenotypic contrasting, along with a multi-pronged bioinformatics analysis, suggests specific mechanisms of tolerance to low pH in I. orientalis that merit further validation through experimental perturbation and engineering.
keywords:
Conversion;Transcriptomics
published:
2022-11-11
Hsiao, Haw-Wen; Zuo, Jian-Min
(2022)
This dataset is for characterizing chemical short-range-ordering in CrCoNi medium entropy alloys. It has three sub-folders: 1. code, 2. sample WQ, 3. sample HT. The software needed to run the files is Gatan Microscopy Suite® (GMS). Please follow the instructions on this page to install the DM3 GMS: https://www.gatan.com/installation-instructions#Step1
1. Code folder contains three DM scripts to be installed in Gatan DigitalMicrograph software to analyze scanning electron nanobeam diffraction (SEND) dataset:
Cepstrum.s: need [EF-SEND_sampleWQ_cropped_aligned.dm3] in Sample WQ and the average image from [EF-SEND_sampleWQ_cropped_aligned.dm3]. Same for Sample HT folder.
log_BraggRemoval.s: same as above.
Patterson.s: Need refined diffuse patterns in Sample HT folder.
2. Sample WQ and 3. Sample HT folders both contain the SEND data (.ser) and the binned SEND data (.dm3) as well as our calculated strain maps as the strain measurement reference. The Sample WQ folder additionally has atomic resolution STEM images; the Sample HT folder additionally has three refined diffuse patterns as references for diffraction data processing.
* Only the .ser file is needed to perform the strain measurement using imToolBox, as listed in the manuscript. The .emi file contains the microscope metadata and can be opened together with the .ser file using FEI TIA software.
keywords:
Medium entropy alloy; CrCoNi; chemical short-range-ordering; CSRO; TEM
published:
2024-02-21
Hartman, Jordan H; Corush, Joel B; Larson, Eric R; Tiemann, Jeremy S; Willink, Philip; Davis, Mark A
(2024)
Data associated with the manuscript "Niche conservatism and spread explain hybridization and introgression between native and invasive fish" by Jordan H. Hartman, Joel B. Corush, Eric R. Larson, Jeremy S. Tiemann, Philip Willink, and Mark A. Davis. For this project, we combined results of ecological niche models (ENMs) and next-generation restriction site-associated DNA sequencing (RADseq) to test theories of niche conservatism and biotic resistance on the success of invasion, hybridization, and extent of introgression between native Western Banded Killifish and non-native Eastern Banded Killifish. This dataset provides the sampling locations and number of Banded Killifish in each population, accession numbers for RADseq from the National Center for Biotechnology Information Sequence Read Archive and the assignment of each Banded Killifish, the habitat associations of each population from the ENMs, and the occurrence points used to build the ENMs.
keywords:
Banded Killifish; ecological niche model; Fundulus diaphanus; hybrid swarm; invasive species; Laurentian Great Lakes
published:
2024-03-28
Zhang, Yue; Zhao, Helin; Huang, Siyuan; Hossain, Mohhamad Abir; van der Zande, Arend
(2024)
Read me file for the data repository
*******************************************************************************
This repository has raw data for the publication "Enhancing Carrier Mobility In Monolayer MoS2 Transistors With Process Induced Strain". We arrange the data following the figure in which it first appeared. For all electrical transfer measurements, we provide the up-sweep and down-sweep data, with voltage in units of V and conductance in units of S. All Raman modes are in units of cm^-1.
*******************************************************************************
How to use this dataset
All data in this dataset are stored in binary NumPy array format as .npy files.
To read a .npy file, use the NumPy module of the Python language and its np.load() command.
Example: suppose the file name is example_data.npy. To load it, run the following in a Jupyter notebook or a Python program:
import numpy as np
data = np.load("example_data.npy")
Then the example file is stored in the data object.
*******************************************************************************
published:
2025-11-06
Harrison, Wesley; Jiang, Guangde; Zhang, Zhengyi; Li, Maolin; Chen, Haoyu; Zhao, Huimin
(2025)
Chiral alkyl amines are common structural motifs in pharmaceuticals, natural products, synthetic intermediates, and bioactive molecules. An attractive method to prepare these molecules is the asymmetric radical hydroamination; however, this approach has not been explored with dialkyl amine-derived nitrogen-centered radicals since designing a catalytic system to generate the aminium radical cation, to suppress deleterious side reactions such as α-deprotonation and H atom abstraction, and to facilitate enantioselective hydrogen atom transfer is a formidable task. Herein, we describe the application of photoenzymatic catalysis to generate and harness the aminium radical cation for asymmetric intermolecular hydroamination. In this reaction, the flavin-dependent ene-reductase photocatalytically generates the aminium radical cation from the corresponding hydroxylamine and catalyzes the asymmetric intermolecular hydroamination to furnish the enantioenriched tertiary amine, whereby enantioinduction occurs through enzyme-mediated hydrogen atom transfer. This work highlights the use of photoenzymatic catalysis to generate and control highly reactive radical intermediates for asymmetric synthesis, addressing a long-standing challenge in chemical synthesis.
keywords:
Conversion;Bioproducts;Catalysis
published:
2025-12-01
Park, Minhyuk; Yi, Haotian; Warnow, Tandy; Chacko, George
(2025)
This dataset principally consists of four synthetic citation networks that were generated during the preparation of the manuscript Park M, Yi H, Warnow T, and Chacko G (2025). Modeling the Global Citation Network using the Scalable Agent-based Simulator for Citation Analysis with Recency-emphasized Sampling (SASCA-ReS). A preprint is available on Zenodo (below) and the manuscript has been submitted to the MetaRoR platform for review and feedback.
@misc{park_2025_17789558,
author = {Park, Minhyuk and
Yi, Haotian and
Warnow, Tandy and
Chacko, George},
title = {Modeling the Global Citation Network using the Scalable Agent-based Simulator for Citation Analysis with Recency-emphasized Sampling (SASCA-ReS)},
month = dec,
year = 2025,
publisher = {Zenodo},
doi = {10.5281/zenodo.17789558},
url = {https://doi.org/10.5281/zenodo.17789558},
}
The networks are roughly 14, 76, 161, and 218 million nodes each. Both nodelists with attributes and edge lists are provided as gzipped parquet files along with the configuration file that was passed to the SASCA-ReS software, which can be accessed at: https://github.com/illinois-or-research-analytics/SASCA-ReS. A copy of the configuration file that was used to generate the network with SASCA-ReS is also provided. For example: abm14_config.ini; abm14_edgelist.parquet.gz; and abm14_nodelist.parquet.gz. The column headers in the edgelists and nodelists and the fields in the configuration file are explained in the GitHub repository for SASCA-ReS.
In addition, we provide sj_reccount, a table of real-world citation frequencies that is an input to the SASCA-ReS software. The first column (diff) of sj_reccount lists the difference between the publication year of a citing document and the publication year of a cited document. The second column (count) reports the frequency of such citations across the dataset of 77,879,427 observations, which is derived from the biomedical literature. Finally, we share data, composite_maverick_disruption.csv, from the mavericks (unconventional citing strategies) experiment reported in the Park et al. (2025) manuscript available at https://zenodo.org/records/17772113. The columns in the composite_maverick_disruption.csv file are:
node_id -> of agents in the various simulations
n_i, n_j, n_k -> terms used to compute disruption per "Wu, L., Wang, D. & Evans, J.A. Large teams develop and small teams disrupt science and technology. Nature 566, 378–382 (2019)." https://doi.org/10.1038/s41586-019-0941-9
disruption -> the disruption metric of Wu, Wang, and Evans (2019)
type -> maverick type (maximizer, randomnik, or minimizer)
year -> virtual year in the simulation when the maverick was created
alpha -> the alpha parameter of the control agent
pa_weight -> the preferential attachment weight of the control agent phenotype
fit_peak_value -> the fitness value assigned to the control agent
in_degree -> the count of citations accumulated by the maverick or control agent at the end of the simulation
out_degree -> the count of references made by the maverick
tag -> a label for the experiment, e.g. od249_f1 indicates that the mavericks in this experiment made 249 citations and were assigned a fitness value of 1.
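The disruption column can be recomputed from the n_i, n_j, and n_k columns described above. The following is a minimal sketch using the index of Wu, Wang, and Evans (2019), D = (n_i - n_j) / (n_i + n_j + n_k), where n_i counts later papers citing only the focal paper, n_j counts those citing both the focal paper and its references, and n_k counts those citing only its references. The function name is hypothetical, and returning None for an all-zero row is an assumption, since the description does not say how such rows are encoded in the CSV.

```python
def disruption_index(n_i, n_j, n_k):
    """Disruption D = (n_i - n_j) / (n_i + n_j + n_k)."""
    total = n_i + n_j + n_k
    if total == 0:
        # Undefined when nothing cites the focal paper or its references.
        return None
    return (n_i - n_j) / total


# Toy counts for illustration only.
print(disruption_index(3, 1, 4))
```

Values range from -1 (every citing paper also cites the focal paper's references) to 1 (no citing paper does).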
keywords:
synthetic networks; agent based models; SASCA-ReS; citation networks
published:
2018-10-17
Price, Edward; Spyreas, Greg; Matthews, Jeffrey
(2018)
This is the dataset used in the Ecological Applications publication of the same name. This dataset consists of the following files:
Internal.Community.Data.txt
Regional.Community.Data.txt
Site.Attributes.txt
Year.Of.Final.Bio.Monitoring.txt
Internal.Community.Data.txt is a site and plot by species matrix. Column labeled SITE consists of site IDs. Column labeled Plot consists of Plot numbers. All other columns represent species relative abundances per plot.
Regional.Community.Data.txt is a site by species matrix of relative abundances. Column labeled site consists of site IDs. All other columns represent species relative abundances per site.
Site.Attributes.txt is a matrix of site attributes. Column labeled SITE consists of site IDs. Column labeled Long represents longitude in decimal degrees. Column labeled Lat represents latitude in decimal degrees. Column labeled Richness represents species richness of sites calculated from Regional Community Data. Column labeled NAT_COMP_REST represents designation as a randomly selected natural wetland (NAT), compensation wetland (COMP) or reference quality natural wetland (REF).
Column labeled HQ_LQ_COMP represents designation as high quality (HQ), low quality (LQ) or compensation wetland (COMP). Column labeled SAMPLING_YEAR_INTERNAL represents year data used for analysis of internal β-diversity was gathered. Column labeled SAMPLING_YEAR_REGIONAL represents year data used for analysis of regional β-diversity was gathered. Column labeled TRANSECT_LENGTH represents length in meters of initial sampling transect. INAI_GRADE represents Illinois Natural Areas Inventory grades assigned to each site. Grades range from A for highest quality natural areas to E for lowest quality natural areas.
Year.Of.Final.Bio.Monitoring.txt is a table representing years of final monitoring of compensation wetlands as mandated by the US Army Corps of Engineers. Column labeled Site consists of site IDs. Column labeled YR_FIN_BIO_MON consists of years of final monitoring. Entries of N/A represent dates that were unable to be located.
More information about this dataset: Interested parties can request data from the Critical Trends Assessment Program, which was the source for data on naturally occurring wetlands in this study. More information on the program and data requests can be obtained by visiting the program webpage. Critical Trends Assessment Program, Illinois Natural History Survey. http://wwx.inhs.illinois.edu/research/ctap/
keywords:
biodiversity; wetlands; wetland mitigation; biotic homogenization; beta diversity
published:
2018-12-04
Wang, Yang; Dietrich, Christopher; Zhang, Yalin
(2018)
The text file contains the original data used in the phylogenetic analyses of Wang et al. (2017: Scientific Reports 7:45387). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first six lines of the file identify the file as NEXUS, indicate that the file contains data for 81 taxa (species) and 2905 characters, indicate that the first 2805 characters are DNA sequence and the last 100 are morphological, that the data may be interleaved (with data for one species on multiple rows), that gaps inserted into the DNA sequence alignment are indicated by a dash, and that missing data are indicated by a question mark. The file contains aligned nucleotide sequence data for 5 gene regions and 100 morphological characters. The identity and positions of data partitions are indicated in the mrbayes block of commands for the phylogenetic program MrBayes at the end of the file. The mrbayes block also contains instructions for MrBayes on various non-default settings for that program. These are explained in the original publication. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supplementary file.
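The header fields described above can be read programmatically. The following is a minimal sketch, assuming the header follows the conventional MrBayes mixed-datatype layout. The HEADER string is an illustrative reconstruction from the description, not a copy of the actual file, and the function name is hypothetical.

```python
import re

# Illustrative header reconstructed from the description (an assumption).
HEADER = """#NEXUS
begin data;
dimensions ntax=81 nchar=2905;
format datatype=mixed(DNA:1-2805,Standard:2806-2905) interleave=yes gap=- missing=?;
matrix
"""


def _find(pattern, text):
    """Return the first captured group, or raise if the field is absent."""
    match = re.search(pattern, text, re.IGNORECASE)
    if match is None:
        raise ValueError("field not found: %s" % pattern)
    return match.group(1)


def parse_nexus_header(text):
    """Extract taxon count, character count, partitions, and symbols."""
    partitions = [
        (name, int(first), int(last))
        for name, first, last in re.findall(r"(\w+):(\d+)-(\d+)", text)
    ]
    return {
        "ntax": int(_find(r"ntax\s*=\s*(\d+)", text)),
        "nchar": int(_find(r"nchar\s*=\s*(\d+)", text)),
        "partitions": partitions,
        "gap": _find(r"gap\s*=\s*(\S)", text),
        "missing": _find(r"missing\s*=\s*([^\s;])", text),
    }


info = parse_nexus_header(HEADER)
for name, first, last in info["partitions"]:
    print(name, last - first + 1)  # number of characters in each block
```

For real analyses, an established NEXUS parser is a safer choice than regular expressions. This sketch only shows how the counts in the description fit together.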
keywords:
phylogeny; DNA sequence; morphology; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; wingless; histone H3; cytochrome oxidase I; bayesian analysis
published:
2025-10-30
Cao, Dang Viet; Luo, Guangbin; Korynta, Shelby; Liu, Hui; Liang, Yuanxue; Shanklin, John; Altpeter, Fredy
(2025)
Metabolic engineering for hyperaccumulation of lipids in vegetative tissues is a novel strategy for enhancing energy density and biofuel production from biomass crops. Energycane is a prime feedstock for this approach due to its high biomass production and resilience under marginal conditions. DIACYLGLYCEROL ACYLTRANSFERASE (DGAT) catalyzes the last and only committed step in the biosynthesis of triacylglycerol (TAG) and can be a rate-limiting enzyme for the production of TAG. In this study, we explored the effect of intron-mediated enhancement (IME) on the expression of DGAT1 and resulting accumulation of TAG and total fatty acid (TFA) in leaf and stem tissues of energycane. To maximize lipid accumulation these evaluations were carried out by co-expressing the lipogenic transcription factor WRINKLED1 (WRI1) and the TAG protect factor oleosin (OLE1). Including an intron in the codon-optimized TmDGAT1 elevated the accumulation of its transcript in leaves by seven times on average based on 5 transgenic lines for each construct. Plants with WRI1 (W), DGAT1 with intron (Di), and OLE1 (O) expression (WDiO) accumulated TAG up to 3.85% of leaf dry weight (DW), a 192-fold increase compared to non-modified energycane (WT) and a 3.8-fold increase compared to the highest accumulation under the intron-less gene combination (WDO). This corresponded to TFA accumulation of up to 8.4% of leaf dry weight, a 2.8-fold or 6.1-fold increase compared to WDO or WT, respectively. Co-expression of WDiO resulted in stem accumulations of TAG up to 1.14% of DW or TFA up to 2.08% of DW that exceeded WT by 57-fold or 12-fold and WDO more than twofold, respectively. Constitutive expression of these lipogenic “push pull and protect” factors correlated with biomass reduction. Intron-mediated enhancement (IME) of the expression of DGAT resulted in a step change in lipid accumulation of energycane and confirmed that under our experimental conditions it is rate limiting for lipid accumulation.
IME should be applied to other lipogenic factors and metabolic engineering strategies. The findings from this study may be valuable in developing a high biomass feedstock for commercial production of lipids and advanced biofuels.
keywords:
Feedstock Production; Lipidomics; Metabolomics
published:
2025-11-19
Salesse-Smith, Coralie; Adar, Noga; Kannan, Baskaran; Nguyen, Thaibinhduong; Wei, Wei; Guo, Minghao; Ge, Zhengxiang; Altpeter, Fredy; Clemente, Tom; Long, Stephen
(2025)
This repository includes data sets and R scripts that were used to perform analysis and produce figures for the following publication: Salesse-Smith, C. E. et al. “Adapting C4 photosynthesis to atmospheric change and increasing productivity by elevating Rubisco content in sorghum and sugarcane.” Proceedings of the National Academy of Sciences 122, e2419943122 (2025) doi:10.1073/pnas.2419943122.
keywords:
Feedstock Production; Biomass Analytics; Sorghum; Sugarcane
published:
2017-12-22
Scheidler, Andrew; Kinnett-Hopkins, Dominique; Learmonth, Yvonne; Motl, Robert; Lopez-Ortiz, Citlali
(2017)
TBP assessment raw data files of pre- and post-intervention motion capture velocity and center of pressure force plate data. Labels are self-explanatory. The .mat files contain data exported from the force plate for the time-to-stabilization assessments, while the .txt files contain the data collected for the smoothness-of-gait assessments. These files do not relate to one another and are from separate assessments. The version 2 files are the result of running the Python script Data_Bank_Cleaner.py on the version 1 files. Please find more information in READ_ME_databank.txt.
keywords:
Multiple Sclerosis; Rehabilitation; Balance; Ataxia; Ballet; Dance; Targeted Ballet Program
published:
2018-04-23
Provides links to Author-ity 2009, including records from principal investigators (on NIH and NSF grants), inventors on USPTO patents, and students/advisors on ProQuest dissertations.
Note that NIH and NSF differ in the type of fields they record and the standards used (e.g., institution names). Typically an NSF grant spanning multiple years is associated with one record, while an NIH grant appears in multiple records: one for each fiscal year and for each sub-project or supplement, possibly with different principal investigators.
The prior probability of match (i.e., that the author exists in Author-ity 2009) varies dramatically across NIH grants, NSF grants, and USPTO patents. The great majority of NIH principal investigators have one or more papers in PubMed but a minority of NSF principal investigators (except in biology) have papers in PubMed, and even fewer USPTO inventors do. This prior probability has been built into the calculation of match probabilities.
The NIH data were downloaded from NIH exporter and the older NIH CRISP files. The dataset has 2,353,387 records, only includes ones with match probability > 0.5, and has the following 12 fields:
1 app_id
2 nih_full_proj_nbr
3 nih_subproj_nbr
4 fiscal_year
5 pi_position
6 nih_pi_names
7 org_name
8 org_city_name
9 org_bodypolitic_code
10 age: number of years since their first paper
11 prob: the match probability to au_id
12 au_id: Author-ity 2009 author ID
The NSF dataset has 262,452 records, only includes ones with match probability > 0.5, and the following 10 fields:
1 AwardId
2 fiscal_year
3 pi_position
4 PrincipalInvestigators
5 Institution
6 InstitutionCity
7 InstitutionState
8 age: number of years since their first paper
9 prob: the match probability to au_id
10 au_id: Author-ity 2009 author ID
There are two files for USPTO because here we linked disambiguated authors in PubMed (from Author-ity 2009) with disambiguated inventors.
The USPTO linking dataset has 309,720 records, only includes ones with match probability > 0.5, and has the following 3 fields:
1 au_id: Author-ity 2009 author ID
2 inv_id: USPTO inventor ID
3 prob: the match probability of au_id vs inv_id
The disambiguated inventors file (uiuc_uspto.tsv) has 2,736,306 records and has the following 7 fields:
1 inv_id: USPTO inventor ID
2 is_lower
3 is_upper
4 fullnames
5 patents: patent IDs separated by '|'
6 first_app_yr
7 last_app_yr
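As an illustration of how the two USPTO files fit together, the following Python sketch joins the linking records to the disambiguated inventors on inv_id and splits the '|'-separated patents field. It is a minimal sketch under stated assumptions, not part of the dataset: it assumes both files are tab-separated, have no header row, and list their columns in the order given above. The function names (read_tsv, patents_by_author) and the min_prob parameter are hypothetical.

```python
import csv
import io

# Column order follows the field lists above. The lack of a header row and
# the tab delimiter for the linking file are assumptions.
LINK_FIELDS = ["au_id", "inv_id", "prob"]
INVENTOR_FIELDS = ["inv_id", "is_lower", "is_upper", "fullnames",
                   "patents", "first_app_yr", "last_app_yr"]


def read_tsv(handle, fieldnames):
    """Yield one dict per tab-separated row, keyed by the given field names."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield dict(zip(fieldnames, row))


def patents_by_author(link_handle, inventor_handle, min_prob=0.5):
    """Map each au_id to the sorted patent IDs of its linked inventors."""
    # Index the inventors file: inv_id -> list of patent IDs.
    patents_of = {}
    for rec in read_tsv(inventor_handle, INVENTOR_FIELDS):
        patents_of[rec["inv_id"]] = [p for p in rec["patents"].split("|") if p]

    # Follow each link whose match probability exceeds the threshold.
    result = {}
    for rec in read_tsv(link_handle, LINK_FIELDS):
        if float(rec["prob"]) > min_prob:
            result.setdefault(rec["au_id"], set()).update(
                patents_of.get(rec["inv_id"], []))
    return {au_id: sorted(ids) for au_id, ids in result.items()}


if __name__ == "__main__":
    # Invented sample rows for illustration only; not taken from the dataset.
    links = io.StringIO("A1\tI1\t0.9\nA2\tI2\t0.8\n")
    inventors = io.StringIO(
        "I1\tx\tX\tJane Doe\tP1|P2\t1990\t1995\n"
        "I2\ty\tY\tJ Doe\tP3\t2000\t2000\n")
    print(patents_by_author(links, inventors))
```

With the real files, the StringIO objects would be replaced by open file handles. Raising min_prob above 0.5 keeps only higher-confidence links, since the released linking file already excludes matches at or below 0.5.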
keywords:
PubMed; USPTO; Principal investigator; Name disambiguation