Dataset Search

Displaying 76 - 100 of 1004 in total

Illinois Data Bank Dataset Search Results

Results

published: 2024-11-13

Nanoscale Stacking Fault Engineering and Mapping in Spinel Oxides for Reversible Multivalent Ion Insertion

Tang, Zhichu; Chen, Wenxiang; Yin, Kaijun; Busch, Robert; Hou, Hanyu; Lin, Oliver; Lyu, Zhiheng; Zhang, Cheng; Yang, Hong; Zuo, Jian-Min; Chen, Qian (2024)

These datasets are from the four-dimensional scanning transmission electron microscopy (4D-STEM) and electron energy loss spectroscopy (EELS) experiments on cathode nanoparticles at different states. The raw 4D-STEM datasets were collected by TEM image & analysis software (FEI) and saved as SER files. They can be opened and viewed in MATLAB using our analysis software package imToolBox, available at https://github.com/flysteven/imToolBox. The raw EELS datasets were collected by DigitalMicrograph software and saved as DM4 files. They can be opened and viewed in DigitalMicrograph software or using our analysis codes available at https://github.com/chenlabUIUC/OrientedPhaseDomain. All the datasets are from the work "Nanoscale Stacking Fault Engineering and Mapping in Spinel Oxides for Reversible Multivalent Ion Insertion" (2024).

The 4D-STEM experiment data include four example datasets for cathode nanoparticles collected at pristine and discharged states. Each dataset contains a stack of diffraction patterns collected at different probe positions scanned across the cathode nanoparticle.

1. Pristine untreated nanoparticle: "Pristine U-NP.ser"
2. Pristine 200°C heated nanoparticle: "Pristine H200-NP.ser"
3. Untreated nanoparticle after first discharge in Zn-ion batteries: "Discharged U-NP.ser"
4. 200°C heated nanoparticle after first discharge in Zn-ion batteries: "Discharged H200-NP.ser"

The EELS experiment data include six example datasets for cathode nanoparticles collected at different states (in "EELS datasets.zip") as described below. Each EELS dataset contains the zero-loss and core-loss EELS spectra collected at different probe positions scanned across the cathode nanoparticle.

1. Pristine untreated nanoparticle: "Pristine U-NP EELS.zip"
2. Pristine 200°C heated nanoparticle: "Prisitne H200-NP EELS.zip"
3. Untreated nanoparticle after first discharge in Zn-ion batteries: "Discharged U-NP EELS.zip"
4. Untreated nanoparticle after first charge in Zn-ion batteries: "Charged U-NP EELS.zip"
5. 200°C heated nanoparticle after first discharge in Zn-ion batteries: "Discharged H200-NP EELS.zip"
6. 200°C heated nanoparticle after first charge in Zn-ion batteries: "Charged H200-NP EELS.zip"

The details of the software package and codes that can be used to analyze the 4D-STEM datasets and EELS datasets are available at: https://github.com/chenlabUIUC/OrientedPhaseDomain. Once our paper is formally published, we will update the relationship of these datasets with our paper.

keywords: 4D-STEM; EELS; defects; strain; cathode; nanoparticle; energy storage

published: 2025-06-26

Data for Urban Electric Vehicle Infrastructure: Strategic Planning for Curbside Charging

Zhang, Ruolin; Kontou, Eleftheria (2025)

This dataset supports the analysis presented in the study on curbside electric vehicle (EV) charging infrastructure planning in San Francisco and the published paper titled "Urban electric vehicle infrastructure: Strategic planning for curbside charging." It includes spatial data layers and tabular data used to evaluate location suitability under multiple criteria, such as demand, accessibility, and environmental benefits. This dataset can be used to replicate the multi-criteria decision-making framework, perform additional spatial analyses, or inform policy decisions related to EV infrastructure siting in urban environments. The paper's DOI is https://doi.org/10.1016/j.jtrangeo.2025.104328.

keywords: Electric Vehicles; Curbside Charging Stations; Multi-Criteria Decision-Making; Suitability Analysis; Urban Infrastructure

published: 2025-09-08

Data for "Field-scale Evaluation of Ecosystem Service Benefits of Bioenergy Switchgrass"

Lee, DoKyoung; Heaton, Emily; Umar, Muhammad; Jang, Chunhwa; Namoi, Nictor (2025)

Purpose-grown perennial herbaceous species are nonfood crops specifically cultivated for bioenergy production and have the potential to secure bioenergy feedstock resources while enhancing ecosystem services. This study assessed soil greenhouse gas emissions (CO2 and N2O), nitrate (NO3-N) leaching reduction potential, evapotranspiration (ET), and water-use efficiency (WUE) of bioenergy switchgrass (Panicum virgatum L.) in comparison to corn (Zea mays L.). The study was conducted on field-scale plots in Urbana, IL, during the 2020–2022 growing seasons. Switchgrass was established in 2020 and urea-fertilized at 56 kg N ha−1 year−1. Corn management followed best management practices for the US Midwest, including no-till and 202 kg N ha−1 year−1 fertilization, applied as urea–ammonium nitrate (32%). Our results showed lower direct N2O emissions in switchgrass compared to corn. Although soil CO2 emissions did not differ significantly during the establishment year, emissions in subsequent years were over 50% higher in switchgrass than in corn, likely due to increased belowground biomass, which was over five times higher in switchgrass. Nitrate-N leaching decreased as the switchgrass stand matured, reaching 80% lower than in corn by the third year. Differences in ET and WUE between corn and switchgrass were not significant; however, results indicate a trend toward reduced WUE in switchgrass under drought, driven by lower aboveground biomass production. Our study demonstrates that switchgrass can be implemented at a commercial scale without negatively impacting the hydrological cycle, while potentially reducing N losses through nitrate-N leaching and soil N2O emissions, and enhancing belowground C storage.

keywords: field data; perennial bioenergy grasses; soil; switchgrass

published: 2025-12-23

study of liquid suction cup detachment mechanism

Aly, Abdallah; A. Saif, M. Taher (2025)

The uploaded data is part of the paper titled: Self-Modifying Percolation Governs Detachment in Soft Suction Wet Adhesion, which shows the detachment mechanism of liquid suction-based adhesion.

published: 2020-11-18

Dataset for: "A Dual-Frequency Radar Retrieval of Snowfall Properties Using a Neural Network"

Chase, Randy (2020)

This is the dataset that accompanies the paper titled "A Dual-Frequency Radar Retrieval of Snowfall Properties Using a Neural Network", submitted for peer review in August 2020. Please see the GitHub repository for the most up-to-date data after the revision process: https://github.com/dopplerchase/Chase_et_al_2021_NN

Authors: Randy J. Chase, Stephen W. Nesbitt and Greg M. McFarquhar
Corresponding author: Randy J. Chase (randyjc2@illinois.edu)

Here we have the data used in the manuscript. Please email me if you have specific questions about units etc.

1) DDA/GMM database of scattering properties: base_df_DDA.csv
This is the combined dataset from the following papers: Leinonen & Moisseev, 2015; Leinonen & Szyrmer, 2015; Lu et al., 2016; Kuo et al., 2016; Eriksson et al., 2018. The column names are D: maximum dimension in meters, M: particle mass in grams kg, sigma_ku: backscatter cross-section at Ku in m^2, sigma_ka: backscatter cross-section at Ka in m^2, sigma_w: backscatter cross-section at W in m^2. The first column is just an index column.

2) Synthetic data used to train and test the neural network: Unrimed_simulation_wholespecturm_train_V2.nc, Unrimed_simulation_wholespecturm_test_V2.nc
This was the result of combining the PSDs and DDA/GMM particles randomly to build the training and test dataset.

3) Notebook for training the network using the synthetic database and Google Colab (tensorflow): Train_Neural_Network_Chase2020.ipynb
This is the notebook used to train the neural network.

4) Trained tensorflow neural network: NN_6by8.h5
This is the hdf5 tensorflow model that resulted from the training. You will need this to run the retrieval.

5) Scalers needed to apply the neural network: scaler_X_V2.pkl, scaler_y_V2.pkl
These are the sklearn scalers used in training the neural network. You will need these to scale your data if you wish to run the retrieval.

6) New in this version: Example notebook of how to run the trained neural network on Ku- and Ka-band observations: Run_Chase2021_NN.ipynb
We showed this with the 3rd case in the paper.

7) New in this version: APR data used to show how to run the neural network retrieval: Chase_2021_NN_APR03Dec2015.nc

The data for the analysis on the observations are not provided here because of the size of the radar data. Please see the GHRC website (https://ghrc.nsstc.nasa.gov/home/) if you wish to download the radar and in-situ data, or contact me. We can coordinate transferring the exact datafiles used. The GPM-DPR data are available here: http://dx.doi.org/10.5067/GPM/DPR/GPM/2A/05
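As a rough illustration of the column layout described for base_df_DDA.csv, the sketch below reads a table with an unnamed leading index column followed by D, M, sigma_ku, sigma_ka and sigma_w. The header row (in particular the empty name for the index column) and the sample values are assumptions made for illustration; they are not taken from the actual file, and `read_scattering_table` is a hypothetical helper.

```python
import csv
import io

# Column names as listed in the dataset description.
COLUMNS = ("D", "M", "sigma_ku", "sigma_ka", "sigma_w")


def read_scattering_table(handle):
    """Return one dict of floats per particle, ignoring the leading index column."""
    reader = csv.DictReader(handle)
    rows = []
    for record in reader:
        rows.append({name: float(record[name]) for name in COLUMNS})
    return rows


# Placeholder rows with made-up numbers, standing in for base_df_DDA.csv.
# The empty first header cell is an assumption about how the index was written.
sample = io.StringIO(
    ",D,M,sigma_ku,sigma_ka,sigma_w\n"
    "0,0.001,1e-06,2e-10,3e-09,4e-09\n"
    "1,0.002,5e-06,8e-10,9e-09,7e-09\n"
)
particles = read_scattering_table(sample)
```

To try this on the real file, one would pass `open("base_df_DDA.csv", newline="")` instead of the in-memory sample, after checking that the header matches the assumption above.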

published: 2021-03-14

Spatial accessibility of COVID-19 healthcare resources in Illinois, USA

Kang, Jeon-Young; Michels, Alexander; Lyu, Fangzheng; Wang, Shaohua; Agbodo, Nelson; Freeman, Vincent L; Wang, Shaowen; Anand, Padmanabhan (2021)

This dataset contains all the code, notebooks, and datasets used in the study conducted to measure the spatial accessibility of COVID-19 healthcare resources, with a particular focus on Illinois, USA. Specifically, the dataset measures spatial access for people to hospitals and ICU beds in Illinois. The spatial accessibility is measured using an enhanced two-step floating catchment area (E2FCA) method (Luo & Qi, 2009), which is an outcome of interactions between demand (i.e., number of potential patients; people) and supply (i.e., number of beds or physicians). The result is a map of spatial accessibility to hospital beds. It identifies which regions need more healthcare resources, such as ICU beds and ventilators. This notebook serves as a guideline of which areas need more beds in the fight against COVID-19.

## What's Inside

A quick explanation of the components of the zip file:

* `COVID-19Acc.ipynb` is a notebook for calculating spatial accessibility and `COVID-19Acc.html` is an export of the notebook as HTML.
* `Data` contains all of the data necessary for calculations:
    * `Chicago_Network.graphml`/`Illinois_Network.graphml` are GraphML files of the OSMNX street networks for Chicago and Illinois respectively.
    * `GridFile/` has hexagonal gridfiles for Chicago and Illinois
    * `HospitalData/` has shapefiles for the hospitals in Chicago and Illinois
    * `IL_zip_covid19/COVIDZip.json` has a JSON file which contains COVID cases by zip code from IDPH
    * `PopData/` contains population data for Chicago and Illinois by census tract and zip code.
    * `Result/` is where we write out the results of the spatial accessibility measures
    * `SVI/` contains data about the Social Vulnerability Index (SVI)
* `img/` contains some images and HTML maps of the hospitals (the notebook generates the maps)
* `README.md` is the document you're currently reading!
* `requirements.txt` is a list of Python packages necessary to use the notebook (besides Jupyter/IPython). You can install the packages with `python3 -m pip install -r requirements.txt`
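The two steps of an E2FCA-style calculation can be sketched on toy data as follows. This is a minimal illustration, not the notebook's code: the function names, the dictionary layout, the travel-time zone bounds and the zone weights are all assumptions chosen so the arithmetic is easy to follow.

```python
def zone_weight(travel_minutes, zones):
    """Return the weight of the first zone whose upper bound covers the travel time, else 0."""
    for upper_bound, weight in zones:
        if travel_minutes <= upper_bound:
            return weight
    return 0.0


def e2fca(population, supply, travel_time, zones):
    """Compute an accessibility score per demand location.

    population:  {demand_id: number of people}
    supply:      {facility_id: capacity, e.g. beds}
    travel_time: {demand_id: {facility_id: minutes}}
    zones:       [(upper_bound_minutes, weight), ...] in increasing order
    """
    # Step 1: supply-to-demand ratio for each facility, with demand
    # discounted by the travel-time zone it falls in.
    ratios = {}
    for facility, capacity in supply.items():
        weighted_demand = sum(
            people * zone_weight(travel_time[place][facility], zones)
            for place, people in population.items()
        )
        ratios[facility] = capacity / weighted_demand if weighted_demand > 0 else 0.0

    # Step 2: for each demand location, sum the ratios of reachable
    # facilities, discounted by the same zone weights.
    access = {}
    for place in population:
        access[place] = sum(
            ratio * zone_weight(travel_time[place][facility], zones)
            for facility, ratio in ratios.items()
        )
    return access


# Illustrative inputs only (hypothetical places, one hospital, made-up weights).
zones = [(10, 1.0), (20, 0.5), (30, 0.25)]
population = {"a": 100, "b": 200}
supply = {"h1": 10}
travel_time = {"a": {"h1": 5}, "b": {"h1": 15}}
scores = e2fca(population, supply, travel_time, zones)
```

In this toy case the hospital's weighted demand is 100 × 1.0 + 200 × 0.5 = 200, so its ratio is 10 / 200 = 0.05; location "a" scores 0.05 and location "b" scores 0.025.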

keywords: COVID-19; spatial accessibility; CyberGISX

published: 2023-01-12

Processing and Pearson Correlation Scripts for the C&RL Article on the Relationships between Publication, Citation, and Usage Metrics at the University of Illinois at Urbana-Champaign Library

Mischo, William; Schlembach, Mary C. (2023)

These processing and Pearson correlational scripts were developed to support the study that examined the correlational relationships between local journal authorship, local and external citation counts, full-text downloads, link-resolver clicks, and four global journal impact factor indices within an all-disciplines journal collection of 12,200 titles and six subject subsets at the University of Illinois at Urbana-Champaign (UIUC) Library. This study shows strong correlations in the all-disciplines set and most subject subsets. Special processing scripts and web site dashboards were created, including Pearson correlational analysis scripts for reading values from relational databases and displaying tabular results. The raw data used in this analysis, in the form of relational database tables with multiple columns, is available at https://doi.org/10.13012/B2IDB-6810203_V1.
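For readers unfamiliar with the statistic, a Pearson correlation coefficient between two per-journal metrics can be computed as in the sketch below. This is a generic illustration, not the authors' scripts (which read values from relational databases); the function name and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    square_x = sum((x - mean_x) ** 2 for x in xs)
    square_y = sum((y - mean_y) ** 2 for y in ys)
    if square_x == 0 or square_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cross / math.sqrt(square_x * square_y)


# Made-up values for four journals: full-text downloads and local citations.
downloads = [10, 20, 30, 40]
citations = [1, 2, 3, 4]
r_value = pearson_r(downloads, citations)
```

Because the made-up citations are an exact linear function of the downloads, `r_value` comes out at 1 (up to floating-point rounding); real metrics would give values strictly between -1 and 1.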

keywords: Pearson Correlation Analysis Scripts; Journal Publication; Citation and Usage Data; University of Illinois at Urbana-Champaign Scholarly Communication

published: 2025-08-07

Table data and supplementary info for "Historical land management alters new soil carbon inputs by annual and perennial bioenergy crops"

Keiser, Ashley D.; Heaton, Emily; VanLoocke, Andrew; Studt, Jacob; McDaniel, Marshall D. (2025)

Bioenergy and bioproduct markets are expanding to meet demand for climate friendly goods and services. Perennial biomass crops are particularly well suited for this goal because of their high yields, low input requirements, and potential to increase soil carbon (C). However, it is unclear how much C is allocated into belowground pools by perennial bioenergy crops and whether the belowground benefits vary with nitrogen (N) fertilizer inputs. Using in situ 13C pulse-chase labeling, we tested whether the sterile perennial grass Miscanthus × giganteus (miscanthus) or annual maize transfers more photosynthetic C to belowground pools. The experiment took place at two sites in Central and Northwest (NW) Iowa with different management histories and two nitrogen (N) fertilizer rates (0 and 224 kg N ha-1 yr-1) to determine if the fate of plant-derived soil C depends on soil fertility and crop type (perennial or annual). Maize allocated a greater percentage of total new 13C to roots than miscanthus, but miscanthus had greater new 13C in total and belowground plant biomass. We found strong interactions between site and most soil measurements – including new 13C in mineral and particulate soil organic matter (SOM) pools –which appear to be driven by differences in historical fertilizer management. The NW Iowa site, with a history of manure inputs, had greater plant-available nutrients (phosphorus, potassium, and ammonium) in soils, and resulted in less 13C from miscanthus in SOM pools compared to maize (approximately 64% less in POM and 70% less in MAOM). In more nutrient-limited soils (Central site), miscanthus transferred 4.5 times more 13C than maize to the more stable mineral-associated SOM pool. Our results suggest that past management, including historical manure inputs that affect a site’s soil fertility, can influence the net C benefits of bioenergy crops. Dataset includes tables/figures from article and supplementary info. Dryad contains raw data.

keywords: land management; carbon; miscanthus; maize

published: 2025-10-07

Metabolic Engineering of the Oleaginous Yeast Yarrowia lipolytica PO1f for Production of Erythritol from Glycerol

Jagtap, Sujit Sadashiv; Bedekar, Ashwini Ashok; Singh, Vijay; Jin, Yong-Su; Rao, Christopher V. (2025)

Yarrowia lipolytica was found natively to produce erythritol, mannitol, and arabitol during growth on glucose, fructose, mannose, and glycerol. Osmotic stress is known to increase sugar alcohol production, and was found to significantly increase erythritol production during growth on glycerol. To better understand erythritol production from glycerol, since it was the most promising sugar alcohol, we measured the expression of key genes and intracellular metabolites. Osmotic stress increased the expression of several key genes in the glycerol catabolic pathway and the pentose phosphate pathway. Analysis of intracellular metabolites revealed that amino acids, sugar alcohols, and polyamines are produced at higher levels in response to osmotic stress. Heterologous overexpression of the sugar alcohol phosphatase increased erythritol production and glycerol utilization in Y. lipolytica. We further increased erythritol production by increasing the expression of native glycerol kinase (GK), and transketolase (TKL). These data show the growth and titers produced.

keywords: Conversion; Genome Engineering

published: 2025-11-13

BEPAM code and results for the publication 'Supplementing Biofuel Mandates with a Carbon Mitigation Policy Can Lead to Water Quality Co-benefits'

Fan, Xinxin; Khanna, Madhu; Hartman, Theodore; VanLoocke, Andy (2025)

The dataset consists of: (1) The replication codes and data for the BEPAM model are contained in the "BEPAM_Supplementary Environment Policy Analysis.zip" (2) Simulation results from the BEPAM model are contained in "ModelOutputs.zip" under the "BEPAM_Supplementary Environment Policy Analysis.zip"

published: 2025-09-08

Data for "Integrated Green Biorefinery for the Production of Anthocyanins, Fermentable Sugars, and High Pure Lignin from Miscanthus × giganteus"

Singh, Vijay; Raj, Tirath (2025)

Miscanthus x giganteus (Mxg) is a promising perennial crop for producing natural colorants, renewable fuels, and bioproducts. However, natural recalcitrance and high pretreatment cost are major barriers to their complete conversion. In this study, a green processing method has been investigated for efficient recovery of natural pigments (anthocyanins), fermentable sugars, and pure lignin from Mxg genotypes using choline chloride-based natural deep eutectic solvents (NADES) systems. Interestingly, choline chloride: lactic acid (ChCl: LA) NADES-processed biomass resulted in 67.8 ± 2.1 μg g−1 of anthocyanins from dry biomass. A maximum of 87.4%–94.1% glucose yield was achieved after enzymatic saccharification. The effective extraction of lignin with high purity with higher β-aryl ether (βO4) bonds from advanced crops is crucial for lignin valorization. Notably, highly pure lignin (≈93.4% ± 1.4%) is achieved after low-temperature NADES pretreatment while retaining lignin’s native structure. 31P nuclear magnetic resonance demonstrated that total phenolics for ChCl: LA-lignin resulted in 1.20 mmol g−1 hydroxyls. The relative monolignol composition of syringyl (S), guaiacyl (G), and p-hydroxyphenyl (H) is 19.0, 65.7, and 14.3%, respectively, as evidenced by heteronuclear single quantum coherence analysis. This study provides a novel approach for obtaining high-purity lignin for catalytic depolymerization for oligomers and bifunctional monoaromatics production and leverages current cellulosic biorefinery technologies.

keywords: biomass analytics; feedstock bioprocessing; inter-brc; miscanthus

published: 2017-09-08

Data from: Magnetic response of brickwork artificial spin ice

Park, Jungsik; Le, Brian; Sklenar, Joseph; Chern, Gia-wei; Watts, Justin; Schiffer, Peter (2017)

Transport and MFM data of brickwork artificial spin ice composed of permalloy are included; they reproduce the data in the article titled "Magnetic response of brickwork artificial spin ice". Transport data represent the magnetic response of connected brickwork artificial spin ice, and MFM data represent how both connected and disconnected brickwork artificial spin ice react to external magnetic fields. SEM images of typical samples are included, where each individual nanowire leg (island) is approximately 660 nm long and 140 nm wide with a 40 nm thickness. For the transport, each sample was measured in a longitudinal and a transverse geometry. Red curves are the 2500 Oe to -2500 Oe sweeps and blue curves are the -2500 Oe to 2500 Oe sweeps. Transport measurements were taken using a standard 4-wire technique. Each plot was saved in PDF format.

keywords: Magnetotransport

published: 2023-12-20

Integrative Multiscale Biochemical Mapping of the Brain via Deep-Learning-Enhanced High-Throughput Mass Spectrometry

Xie, Yuxuan Richard; Castro, Daniel C.; Rubakhin, Stanislav S.; Trinklein, Timothy J.; Sweedler, Jonathan V.; Fan, Lam (2023)

Important Note: the raw transient files need to be downloaded through this separate link: https://uofi.box.com/s/oagdxhea1wi8tvfij4robj0z0w8wq7j4. Once downloaded, place the file within the .d folder in the unzipped 20210930_ShortTransient_S3_5 folder to perform the reconstruction step. This dataset contains the minimal datasets needed to run the computational pipeline MEISTER, introduced in the manuscript titled "Integrative Multiscale Biochemical Mapping of the Brain via Deep-Learning-Enhanced High-Throughput Mass Spectrometry". The key steps of our computational pipeline include (1) tissue mass spectrometry imaging (MSI) reconstruction; (2) multimodal image registration and 3D reconstruction; (3) regional analysis; and (4) single-cell and tissue data integration. Detailed protocols to reproduce the results in the manuscript are provided, with an example data set shared for learning the protocols. Our computational processing codes are implemented mostly in Python, as well as MATLAB (for image registration).

keywords: deep learning; mass spectrometry; single cells

published: 2023-08-03

Data for Zombie leaves: novel repurposing of senescent fronds in the tree fern Cyathea rojasiana for nutrient uptake in a tropical montane forest

Dalling, James William (2023)

This file contains the delta 15N values for leaf material collected from Cyathea rojasiana tree ferns before and after fertilization using ammonium-15N chloride solution to determine whether 15N uptake is possible from senescent leaves. Details of the experiment are provided in the online supplement to the published paper. Briefly, in February 2022 we selected three mature C. rojasiana individuals 1-1.5 m in height that had leaves rooted in the soil and one new developing (but unexpanded) leaf. For each fern, two plastic pots (10 x 10 x 12 cm) were filled with a 50:50 mixture of washed river sand and soil from the Chorro watershed. For each pot, one senescent leaf that was rooted in the soil was carefully excavated and its roots transplanted into the pot. Pots were then fertilized by adding 30 ml of a 0.02 M solution of ammonium-15N chloride (98% 15N; Sigma-Aldrich 299251; St Louis, MO) to yield a target concentration of 2 µg 15N cm-3 of soil. After fertilization, pots were carefully enclosed within thick plastic bags and sealed around the senescent leaf rachis to prevent leaching of any 15N from the pot to the surrounding soil. At the time of N fertilization, pinnae of the youngest fully expanded leaf were collected from each fern. One pinna was collected from the base of the leaf and one from the distal end of the leaf. In March 2022, after 28 days, the roots were removed from the pots and two additional leaf pinnae were sampled from each fern: one from the base and one from the distal end of the youngest (now fully expanded) leaf. Leaf samples were dried for 72 hours at 60 °C and then leaf lamina tissue was finely ground with a bead beater. The delta 15N for each leaf sample was determined at the University of Illinois, Urbana-Champaign using a Thermo Delta V Advantage IRMS run in combination with a Costech 4010 Elemental Analyzer. Samples were run in continuous flow relative to laboratory standards that were calibrated with USGS 40, 41, and NBS 19 reference materials.

keywords: 15N; Cyathea rojasiana; N fertilization; montane forest

published: 2025-11-18

RNA-seq analysis of the effects of stimuli and sex on the amygdala

Rodriguez-Zas, Sandra (2025)

The data set corresponds to gene expression measurements from an RNA-seq experiment profiling the amygdala of pigs representing 3 stimuli and 2 sexes. The experiment was approved by IACUC. Information on ~12,000 genes (rows) across 36 samples (36 columns) and a column for gene identification are included in the dataset. A readme, a metadata file, and a license file are uploaded with the compressed data file.

keywords: RNA-seq; stimuli; sex; amygdala

published: 2025-09-15

Data from Highly Efficient Single-Pot Scarless Golden Gate Assembly

HamediRad, Mohammad; Weisberg, Scott; Chao, Ran; Lian, Jiazhang; Zhao, Huimin (2025)

Golden Gate assembly is one of the most widely used DNA assembly methods due to its robustness and modularity. However, despite its popularity, the need for BsaI-free parts, the introduction of scars between junctions, as well as the lack of a comprehensive study on the linkers hinders its more widespread use. Here, we first developed a novel sequencing scheme to test the efficiency and specificity of 96 linkers of 4-bp length and experimentally verified these linkers and their effects on Golden Gate assembly efficiency and specificity. We then used this sequencing data to generate 200 distinct linker sets that can be used by the community to perform efficient Golden Gate assemblies of different sizes and complexity. We also present a single-pot scarless Golden Gate assembly and BsaI removal scheme and its accompanying assembly design software to perform point mutations and Golden Gate assembly. This assembly scheme enables scarless assembly without compromising efficiency by choosing optimized linkers near assembly junctions.

keywords: Conversion; Genome Engineering; Genomics

published: 2025-09-15

Data from Development of a CRISPR/Cas9 System for High-Efficiency Multiplexed Gene Deletion in Rhodosporidium toruloides

Schultz, J. Carl; Cao, Mingfeng; Zhao, Huimin (2025)

The oleaginous yeast Rhodosporidium toruloides is considered a promising candidate for production of chemicals and biofuels thanks to its ability to grow on lignocellulosic biomass, and its high production of lipids and carotenoids. However, efforts to engineer this organism are hindered by a lack of suitable genetic tools. Here we report the development of a CRISPR/Cas9 system for genome editing in R. toruloides based on a fusion 5S rRNA–tRNA promoter for guide RNA (gRNA) expression, capable of greater than 95% gene knockout for various genetic targets. Additionally, multiplexed double‐gene knockout mutants were obtained using this method with an efficiency of 78%. This tool can be used to accelerate future metabolic engineering work in this yeast.

keywords: Conversion; Genome Engineering; Genomics; Transcriptomics

published: 2023-08-04

Genetic, demographic, and spatial information for a study of Phlox pilosa ssp. sangamonensis, and congeners

Zinnen, Jack; Matthews, Jeffrey W.; Zaya, David N. (2023)

Data are provided that are relevant to the rare plant Phlox pilosa ssp. sangamonensis, or Sangamon phlox, and other members of the genus that occur in its native range. Sangamon phlox is a state-endangered subspecies that is only known to occur in two Illinois counties. Data provided come from all known Sangamon phlox populations, which we estimate as 10 separate populations. Data include genetic data from DNA microsatellite loci (allele sizes and basic summaries), flowering population size estimates, rates of fruit set, and rates of seed set. Additionally, genetic data (from microsatellites) are provided for Phlox divaricata ssp. laphamii (three populations), Phlox pilosa ssp. pilosa (two populations), and Phlox pilosa ssp. fulgida (two populations).

keywords: Phlox; conservation genetics; microsatellites; endemism; rare plants

published: 2025-09-11

Data from pH Selectively Regulates Citric Acid and Lipid Production in Yarrowia lipolytica W29 During Nitrogen-limited Growth on Glucose

Zhang, Shuyan; Jagtap, Sujit; Deewan, Anshu; Rao, Christopher V. (2025)

Yarrowia lipolytica has been used to produce both citric acid and lipid-based bioproducts at high titers. In this study, we found that pH differentially affects citric acid and lipid production in Y. lipolytica W29, with citric acid production enhanced at more neutral pH values and lipid production enhanced at more acidic pH values. To determine the mechanism governing this pH-dependent switch between citric acid and lipid production, we profiled gene expression at different pH values and found that the relative expression of multiple transporters is increased at neutral pH. These results suggest that this pH-dependent switch is mediated at the level of citric acid transport rather than by changes in the expression of the enzymes involved in citric acid and lipid metabolism. In further support of this mechanism, thermodynamic calculations suggest that citric acid secretion is more energetically favorable at neutral pH values, assuming the fully protonated acid is the substrate for secretion. Collectively, these results provide new insights regarding citric acid and lipid production in Y. lipolytica and may offer new strategies for metabolic engineering and process design.

keywords: Conversion; RNA Sequencing; Transcriptomics

published: 2025-11-24

Data for Controlling Circuitry Underlies the Growth Optimization of Saccharomyces cerevisiae

Nguyen, Viviana; Xue, Pu; Li, Yifei; Zhao, Huimin; Lu, Ting (2025)

Microbial growth emerges from coordinated synthesis of various cellular components from limited resources. In Saccharomyces cerevisiae, cyclic AMP (cAMP)-mediated signaling is shown to orchestrate cellular metabolism; however, it remains unclear quantitatively how the controlling circuit drives resource partition and subsequently shapes biomass growth. Here we combined experiment with mathematical modeling to dissect the signaling-mediated growth optimization of S. cerevisiae. We showed that, through cAMP-mediated control, the organism achieves maximal or nearly maximal steady-state growth during the utilization of multiple tested substrates as well as under perturbations impairing glucose uptake. However, the optimal cAMP concentration varies across cases, suggesting that different modes of resource allocation are adopted for varied conditions. Under settings with nutrient alterations, S. cerevisiae tunes its cAMP level to dynamically reprogram itself to realize rapid adaptation. Moreover, to achieve growth maximization, cells employ additional regulatory systems such as the GCN2-mediated amino acid control. This study establishes a systematic understanding of global resource allocation in S. cerevisiae, providing insights into quantitative yeast physiology as well as metabolic strain engineering for biotechnological applications.

keywords: Conversion; Metabolomics; Modeling

published: 2025-10-15

Radar analyzed quasi-linear convective system mesovortices during the Propagation, Evolution, and Rotation in Linear Storms (PERiLS) Project

Blind-Doskocil, Leanne; Trapp, Robert J.; Nesbitt, Stephen W. (2025)

This is a collection of 31 quasi-linear convective system (QLCS) mesovortices (MVs) that were first manually identified and analyzed using the lowest elevation scan of the nearest relevant Weather Surveillance Radar–1988 Doppler (WSR-88D) during the two years (springs of 2022 and 2023) of the Propagation, Evolution, and Rotation in Linear Storms (PERiLS) field campaign. This analysis was completed using the Gibson Ridge radar-viewing software (GR2Analyst). Throughout the two years of PERiLS, a total of nine intensive observing periods (IOPs) occurred (see https://catalog.eol.ucar.edu/perils_2022/missions and https://catalog.eol.ucar.edu/perils_2023/missions for exact IOP dates/times). However, only six of these IOPs (specifically, IOPs 2, 3, and 4 from both years) are included in this dataset. The inclusion criteria were based on the presence of strictly QLCS MVs that from a cursory analysis were within the C-band On Wheels (COW) domain, one of the research radars deployed in the field for the PERiLS project. The 31 QLCS MVs identified using WSR-88D data were also examined using data from the COW radar (using Solo3 software). The lowest elevation angle was not always useable in the COW data, and sometimes the second lowest elevation angle was used. Further details on how MVs were identified are provided below, and a very detailed methodology is published in Blind-Doskocil et al. (2025). Each MV had to be produced by a QLCS, defined as a continuous area of 35 dBZ radar reflectivity over at least 100 km when viewed from the lowest elevation scan. The MVs analyzed also had to pass through/near the COW’s domain at some point during their lifetimes to allow for additional analysis using the COW data. Tornadic (TOR), wind-damaging (WD), and non-damaging (ND) MVs were analyzed over their entire lifetime and subsequently during the pretornadic, predamaging (wind damage), and prewarning phase (classified altogether as the prephase) of each MV. 
The prephase MVs were classified based on the first damage report or lack thereof associated with them. ND MVs were ones that usually had a tornado warning placed on them (all but one case) but did not produce any damage and persisted for five or more radar scans; this was done to target the strongest MVs that forecasters thought could be tornadic. The QLCS MVs were identified using objective criteria, which included the existence of a circulation with a maximum differential velocity (dV; i.e., the difference between the maximum outbound and minimum inbound velocities at a constant range) of at least 20 kt over a distance ≤ 7 km. The following radar-based characteristics were catalogued for each QLCS MV at the lowest elevation angle of the nearest WSR-88D: latitude and longitude locations of the MV, the genesis to decay time of the MV, the maximum dV across the MV, the maximum rotational velocity (Vrot; i.e., dV divided by two), diameter of the MV, the range from the radar of the MV center, and the height above radar level of the MV center. In the Excel workbook titled “nexrad_analyzed_mvs_perils_illinois_data_bank”, there are a total of 36 sheets. 31 of the 36 sheets are for each MV that was examined. The 31 MV sheets that were used to calculate MV statistics are labeled following the convention 'mv#_iop#_qlcs'. ‘mv#’ is the unique number that was assigned to each MV for clear identification, 'iop#' is the IOP in which the MV occurred, 'qlcs' denotes that the MV was produced by a QLCS, and the 2023 IOPs are denoted by ‘_2023’ after ‘qlcs’ in the sheet name. In these sheets, there are notes on what was visually seen in the radar data, damage associated with each MV (using the National Centers for Environmental Information (NCEI) database), and the characteristics of the MV at each time step of its lifetime. The yellow rows in each of the sheets indicate the last row of data included in the prephase statistics. 
The orange boxes in the notes column indicate any reports that were in NCEI but not in GR2Analyst. There are also sheets that examine pretornadic and predamaging diameter trends; box and whisker plot statistics of the overall characteristics of the different types of MVs; and the overall characteristics of each MV, with one Excel sheet (‘combined_qlcs_mvs’) examining the characteristics of each MV over its entire lifetime and one Excel sheet (‘combined_qlcs_mvs_before_report’) examining the characteristics of each MV before it first produced damage or had a tornado warning placed on it. In the Excel workbook titled “cow_analyzed_mvs_perils_illinois_data_bank”, there are a total of 33 sheets. 31 of the 33 sheets are for each MV that was examined, with a similar naming convention to those analyzed using WSR-88D data. The data documented in each sheet is also similar to that in the WSR-88D sheets. Due to the very tedious and time-consuming nature of analyzing radar data manually, we mainly focused on cataloging only the times when the MVs were detectable in the COW data during the prephase. In the WSR-88D data, we examined the MVs over their entire lifetimes and during their prephases. Not all the MVs analyzed in the WSR-88D data ended up being detectable in the COW data, and we focused on comparing the prephase MVs in the COW data and WSR-88D data. Therefore, some sheets are missing values and note that the MV was not in the COW’s domain, was not detectable during the prephase, or that only the prephase was cataloged. There are also sheets that examine characteristics of each MV during the prephase (‘combined_qlcs_mvs_before_report’) and box and whisker plot statistics of the prephase characteristics of the MVs (‘box_whisker_stats’).
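The sheet-naming convention and the objective identification criteria described above can be sketched in Python. This is a minimal illustration rather than the authors' workflow: the function names, the regular expression, and the example values are hypothetical. The sketch assumes that sheet names follow the stated pattern literally (for example 'mv3_iop2_qlcs' or 'mv3_iop2_qlcs_2023'), that sheets without the '_2023' suffix belong to the 2022 season, and that inbound velocities are stored as negative numbers.

```python
import re

# Assumed literal form of the convention 'mv#_iop#_qlcs',
# with an optional '_2023' suffix for the 2023 IOPs.
SHEET_PATTERN = re.compile(r"^mv(\d+)_iop(\d+)_qlcs(_2023)?$")


def parse_sheet_name(name):
    """Return (mv_number, iop_number, year), or None for non-MV sheets."""
    match = SHEET_PATTERN.match(name)
    if match is None:
        return None
    mv, iop, suffix = match.groups()
    # Assumption: sheets without the suffix are from the 2022 season.
    year = 2023 if suffix else 2022
    return int(mv), int(iop), year


def differential_velocity(max_outbound_kt, min_inbound_kt):
    """dV: maximum outbound minus minimum inbound velocity (inbound negative)."""
    return max_outbound_kt - min_inbound_kt


def rotational_velocity(dv_kt):
    """Vrot is dV divided by two."""
    return dv_kt / 2.0


def meets_mv_criteria(dv_kt, distance_km):
    """dV of at least 20 kt over a distance of 7 km or less."""
    return dv_kt >= 20.0 and distance_km <= 7.0
```

For example, `parse_sheet_name("mv7_iop3_qlcs_2023")` returns `(7, 3, 2023)`, while a summary sheet such as `combined_qlcs_mvs` returns `None` and can be skipped when collecting per-MV sheets.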

keywords: quasi-linear convective system; QLCS; tornado; radar; mesovortex; PERiLS; low-level rotation; tornadic; nontornadic; wind-damaging; Propagation, Evolution, and Rotation in Linear Storms; tornado warning; C-band On Wheels

published: 2018-07-29

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

Molloy, Erin K.; Warnow, Tandy (2018)

This repository includes scripts, datasets, and supplementary materials for the study, "NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees", presented at RECOMB-CG 2018. The supplementary figures and tables referenced in the main paper can be found in njmerge-supplementary-materials.pdf. The latest version of NJMerge can be downloaded from GitHub: https://github.com/ekmolloy/njmerge.
***When downloading datasets, please note the following errors.***
In README.txt, lines 37 and 38 should read:
+ fasttree-exon.tre contains lines 1-25, 1-100, or 1-1000 of fasttree-total.tre
+ fasttree-intron.tre contains lines 26-50, 101-200, or 1001-2000 of fasttree-total.tre
Note that the file names (fasttree-exon.tre and fasttree-intron.tre) are swapped.
In tools.zip, the compare_trees.py and the compare_tree_lists.py scripts incorrectly refer to the "symmetric difference error rate" as the "Robinson-Foulds error rate". Because the normalized symmetric difference and the normalized Robinson-Foulds distance are equal for binary trees, this does not impact the species tree error rates reported in the study. This could impact the gene tree error rates reported in the study (see data-gene-trees.csv in data.zip), as FastTree-2 returns trees with polytomies whenever 3 or more sequences in the input alignment are identical. Note that the normalized symmetric difference is always greater than or equal to the normalized Robinson-Foulds distance, so the gene tree error rates reported in the study are more conservative.
In njmerge-supplementary-materials.pdf, the alpha parameter shown in Supplementary Table S2 is actually the divisor D, which is used to compute alpha for each gene as follows.
1. For each gene, a random value X between 0 and 1 is drawn from a uniform distribution.
2. Alpha is computed as -log(X) / D, where D is 4.2 for exons, 1.0 for UCEs, and 0.4 for introns (as stated in Table S2).
Note that because the mean of the uniform distribution (between 0 and 1) is 0.5, the mean alpha value is -log(0.5) / 4.2 = 0.16 for exons, -log(0.5) / 1.0 = 0.69 for UCEs, and -log(0.5) / 0.4 = 1.73 for introns.
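The alpha computation described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the NJMerge repository: the function and variable names are hypothetical, and the logarithm is taken to be the natural logarithm, which is consistent with the value 0.69 quoted for D = 1.0.

```python
import math
import random

# Divisor D per locus type, as listed in the description (Table S2).
DIVISORS = {"exon": 4.2, "UCE": 1.0, "intron": 0.4}


def alpha_from_x(x, divisor):
    """alpha = -log(X) / D, using the natural logarithm (assumed)."""
    return -math.log(x) / divisor


def draw_alpha(divisor, rng=random):
    """Draw X uniformly and convert it to alpha for one gene."""
    # 1 - random() lies in (0, 1], which avoids taking log(0).
    x = 1.0 - rng.random()
    return alpha_from_x(x, divisor)


# Evaluate the formula at X = 0.5 for each locus type.
for locus, d in DIVISORS.items():
    print(locus, round(alpha_from_x(0.5, d), 3))
```

Evaluating the formula at X = 0.5 gives roughly 0.165 for exons, 0.693 for UCEs, and 1.733 for introns, in line with the figures quoted in the description.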

keywords: phylogenomics; species trees; incomplete lineage sorting; divide-and-conquer

published: 2025-11-19

Data for Redefining the Product Portfolio of Oilcane Bagasse Biorefinery: Recovering Natural Colorants, Vegetative Lipids and Sugars

Banerjee, Shivali; Beraja, Galit; Eilts, Kristen; Singh, Vijay (2025)

Bioenergy crops have been known for their ability to produce biofuels and bioproducts. In this study, the product portfolio of recently developed transgenic sugarcane (oilcane) bagasse has been redefined for recovering natural pigments (anthocyanins), sugars, and vegetative lipids. The total anthocyanin content in oilcane bagasse has been estimated as 92.9 ± 18.9 µg/g of dried bagasse with cyanidin-3-glucoside (13.5 ± 18.9 µg per g of dried bagasse) as the most prominent anthocyanin present. More than 85 % (w/w) of the total anthocyanins were recovered from oilcane bagasse at a pretreatment temperature of 150 °C for 15 min. These conditions for the hydrothermal pretreatment also led to a 2-fold increase in the glucose yield upon the enzymatic saccharification of the pretreated bagasse. Further, a 1.5-fold enrichment of the vegetative lipids was demonstrated in the pretreated residue. Re-defining green biorefineries with multiple high-value products in a zero-waste approach is the need of the hour for attaining sustainability.

keywords: Conversion; Biomass Analytics; Bioproducts; Biorefinery; Oilcane

published: 2025-09-30

Data from Technical and Economic Feasibility of an Integrated Ethanol and Anthocyanin Coproduction Process Using Purple Corn Stover

Kurambhatti, Chinmay V.; Kumar, Deepak; Singh, Vijay (2025)

The coproduction of high-value anthocyanin extract in the cellulosic ethanol process would diversify the co-product market, increase revenue, and potentially improve the economics of the process. The high anthocyanin concentration in the cob and structural carbohydrates in residual stover make purple corn stover an attractive source for anthocyanin and ethanol coproduction. This study aimed to develop simulation models for processes integrating ethanol production and anthocyanin extraction using purple corn stover, to evaluate their techno-economic feasibility, and to compare their performance with the conventional ethanol production process using corn stover. The annual ethanol production for plants processing 2000 MT dry feedstock / day was 148.6 million L/year for the integrated processes compared with 222.6 million L/year for the conventional process. Anthocyanin production in the modified processes using dilute acid-based and water-based anthocyanin extraction processes was 1779 and 1099 MT/year, respectively. Capital investments for the integrated processes ($448.1 to $443.8 million) were higher than the conventional process ($371.9 million). Due to high revenue from anthocyanin extract, the ethanol production cost for the integrated process using acid-based anthocyanin extraction ($0.36/L) was 34.5% lower than conventional ethanol production ($0.55/L). The ethanol production cost for the integrated process using water-based anthocyanin extraction ($0.68/L) was higher than conventional ethanol production due to low ethanol and anthocyanin yields. The minimum ethanol selling price for the integrated process using acid-based anthocyanin extraction ($0.65/L) was also lower than the conventional process ($0.72/L), indicating an improvement in economic performance.
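The 34.5% cost reduction quoted above follows directly from the two production costs. The short Python check below reproduces it; the helper name is hypothetical and the input values are taken from the description.

```python
def percent_lower(value, baseline):
    """Percentage by which value falls below baseline."""
    return (baseline - value) / baseline * 100.0


# Ethanol production costs in $/L, taken from the description above.
conventional_cost = 0.55
acid_integrated_cost = 0.36

reduction = percent_lower(acid_integrated_cost, conventional_cost)
print(round(reduction, 1))
```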

keywords: Conversion; Economics; Feedstock Bioprocessing; Modeling

published: 2022-02-11

Time-lapse Fluorescence Microscopy Images and Gene Expression Data of Single T-Cells Infected with a Minimal HIV Feedback Circuit under 1,806 Drug Treatments

Lu, Yiyang; Bohn-Wippert, Kathrin; Pazerunas, Patrick J.; Moy, Jennifer M.; Singh, Harpal; Dar, Roy D. (2022)

Upon treatment removal, spontaneous and random reactivation of latently infected T cells remains a major barrier toward curing HIV. Due to its stochastic nature, fluctuations in gene expression (or “noise”) can bias HIV reactivation from latency, and conventional drug screens for mean gene expression neglect compounds that modulate noise. Here we present a time-lapse fluorescence microscopy image set obtained from a Jurkat T-cell line, infected with a minimal HIV gene circuit, treated with 1,806 small molecule compounds, and imaged for 48 hours. In addition, the single-cell time-dependent reporter dynamics (single-cell gene expression intensity and noise trajectories) extracted from the image dataset are included. Based on this dataset, a total of 5 latency-promoting agents of HIV were found through further experimentation in Lu et al., PNAS 2021 (doi: 10.1073/pnas.2012191118). For a detailed description of the dataset, please refer to the readme file.

keywords: HIV; latency; drug screen; fluorescence microscopy; time-lapse; microscopy; single-cell data; noise; gene expression fluctuation