Publicly Available Bladder Cancer Dataset

The project aims to evaluate publicly available single‑cell sequencing data from the SRA database using the `scanpy` library. It covers the full data‑analysis workflow from discovery and retrieval to final evaluation with `scanpy`.

Updated 5/6/2024
github

Description

Dataset Overview

Dataset Description

The dataset is named "Single Cell Sequencing Data Analysis with Scanpy" and is intended to evaluate publicly available single‑cell sequencing data obtained from the Sequence Read Archive (SRA) database. The project employs the scanpy library for analysis, covering the complete workflow from data discovery and retrieval to final evaluation.

Data Sources

Data are sourced from the SRA database, a public repository maintained by the National Center for Biotechnology Information (NCBI) that contains a large collection of sequencing data.

Data Analysis Workflow

  1. Data Discovery and Retrieval: Use the Entrez Direct tool to search the SRA database with queries such as "human bladder cancer samples", then download the matching runs with the SRA Toolkit.
  2. Quality Control: Perform quality control on downloaded FASTQ files using FastQC.
  3. Read Processing and Alignment: Align reads and quantify single‑cell RNA transcripts with Cell Ranger.
  4. Technical Artifact Removal: Remove background noise caused by extracellular RNA fragments using CellBender.
  5. Data Analysis: Use Scanpy to load and preprocess the data, predict doublets, normalize counts, reduce dimensionality with PCA, and integrate the dataset.
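
The normalization part of step 5 can be illustrated without any dependencies. The sketch below shows, in plain Python, the arithmetic behind library-size normalization followed by a log transform, which is what Scanpy's `sc.pp.normalize_total` and `sc.pp.log1p` are commonly used for. The source does not state which parameters the project uses, so the target sum of 10,000 and the toy count matrix are assumptions made for illustration only.

```python
import math


def normalize_total(counts, target_sum=1e4):
    """Scale each cell (row) so that its counts add up to target_sum.

    target_sum=1e4 is an assumed, commonly used value; the project's
    actual setting is not given in the source.
    """
    normalized = []
    for row in counts:
        total = sum(row)
        # Cells with no counts stay all-zero to avoid dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([value * scale for value in row])
    return normalized


def log1p_transform(matrix):
    """Apply log(1 + x) to every entry, compressing large values."""
    return [[math.log1p(value) for value in row] for row in matrix]


# Toy cells-by-genes count matrix (illustrative values, not real data).
toy_counts = [
    [1, 3, 0],    # shallowly sequenced cell
    [10, 30, 0],  # same expression profile, 10x deeper
    [0, 0, 0],    # empty cell
]

normalized = normalize_total(toy_counts)
logged = log1p_transform(normalized)
```

After normalization the first two toy cells become identical, which is the point of the step: differences in sequencing depth no longer look like differences in expression.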

Tools and Libraries

  • Entrez Direct: Retrieves data from the SRA database.
  • SRA Toolkit: Downloads SRA data.
  • FastQC: Performs quality control.
  • Cell Ranger: Handles read processing and alignment.
  • CellBender: Removes technical artifacts.
  • Scanpy: Analyzes single‑cell sequencing data.
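
As a rough sketch of how the command-line tools listed above might be chained, the helper below only builds the shell commands as strings and does not run them. The function name, the run accession `SRR0000000`, the sample ID, and all directory and file names are hypothetical placeholders; the project's real invocations are not shown in the source and may use different options.

```python
import shlex


def build_pipeline_commands(query, run_accession, sample_id,
                            transcriptome_dir, fastq_dir="fastq",
                            qc_dir="fastqc_out"):
    """Return shell command strings for the retrieval-to-CellBender steps.

    All paths and identifiers are placeholders supplied by the caller.
    """
    # Assumed FASTQ names produced by fasterq-dump --split-files.
    fastq_files = [f"{fastq_dir}/{run_accession}_{i}.fastq" for i in (1, 2)]
    return [
        # Entrez Direct: search SRA and fetch run metadata as a CSV table.
        shlex.join(["esearch", "-db", "sra", "-query", query])
        + " | " + shlex.join(["efetch", "-format", "runinfo"]),
        # SRA Toolkit: download the run, then convert it to FASTQ.
        shlex.join(["prefetch", run_accession]),
        shlex.join(["fasterq-dump", run_accession, "--split-files",
                    "--outdir", fastq_dir]),
        # FastQC: write quality reports to qc_dir.
        shlex.join(["fastqc", "-o", qc_dir, *fastq_files]),
        # Cell Ranger: align and count. FASTQ files usually need renaming
        # to Cell Ranger's naming scheme first, and recent versions may
        # require additional flags.
        shlex.join(["cellranger", "count", f"--id={sample_id}",
                    f"--transcriptome={transcriptome_dir}",
                    f"--fastqs={fastq_dir}", f"--sample={sample_id}"]),
        # CellBender: remove ambient RNA from the raw (unfiltered) matrix.
        shlex.join(["cellbender", "remove-background",
                    "--input", f"{sample_id}/outs/raw_feature_bc_matrix.h5",
                    "--output", f"{sample_id}_cellbender.h5"]),
    ]


commands = build_pipeline_commands(
    query="human bladder cancer",
    run_accession="SRR0000000",   # hypothetical accession
    sample_id="sample1",          # hypothetical sample name
    transcriptome_dir="refdata",  # hypothetical reference directory
)
for command in commands:
    print(command)
```

Building the commands as data rather than executing them directly makes it easy to review or log each step before committing to long-running jobs.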

Data Formats

Raw data are primarily in FASTQ format. After processing with Cell Ranger, the output includes the filtered_feature_bc_matrix and raw_feature_bc_matrix directories, each containing files such as matrix.mtx, features.tsv, and barcodes.tsv.
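
To show how the three files in a matrix directory fit together, here is a minimal standard-library reader. In practice Scanpy's `sc.read_10x_mtx` is the usual way to load such a directory; this sketch only illustrates the layout. It assumes the standard Matrix Market coordinate format with genes as rows and barcodes as columns, and it also accepts gzipped files because newer Cell Ranger versions compress them. The helper names and the toy barcodes and genes are hypothetical.

```python
import gzip
import os
import tempfile


def open_text(directory, name):
    """Open `name`, falling back to `name.gz` if the plain file is absent."""
    plain = os.path.join(directory, name)
    if os.path.exists(plain):
        return open(plain, "rt")
    return gzip.open(plain + ".gz", "rt")


def read_10x_directory(directory):
    """Return (barcodes, genes, counts) from a feature-barcode matrix dir.

    counts maps (barcode, gene_name) -> integer count for nonzero entries.
    """
    with open_text(directory, "barcodes.tsv") as handle:
        barcodes = [line.strip() for line in handle if line.strip()]
    with open_text(directory, "features.tsv") as handle:
        # Columns: feature ID, feature name, feature type.
        genes = [line.rstrip("\n").split("\t")[1]
                 for line in handle if line.strip()]
    counts = {}
    with open_text(directory, "matrix.mtx") as handle:
        # Skip the header and comment lines, which start with '%'.
        data_lines = (line for line in handle if not line.startswith("%"))
        n_genes, n_cells, _ = map(int, next(data_lines).split())
        if n_genes != len(genes) or n_cells != len(barcodes):
            raise ValueError("matrix shape does not match features/barcodes")
        for line in data_lines:
            if not line.strip():
                continue
            gene_idx, cell_idx, value = line.split()
            # Matrix Market indices are 1-based.
            key = (barcodes[int(cell_idx) - 1], genes[int(gene_idx) - 1])
            counts[key] = int(value)
    return barcodes, genes, counts


# Build a tiny toy directory (made-up barcodes and genes) and read it back.
with tempfile.TemporaryDirectory() as demo_dir:
    toy_files = {
        "barcodes.tsv": "AAAC-1\nAAAG-1\n",
        "features.tsv": ("ENSG1\tGENE_A\tGene Expression\n"
                         "ENSG2\tGENE_B\tGene Expression\n"),
        "matrix.mtx": ("%%MatrixMarket matrix coordinate integer general\n"
                       "2 2 3\n1 1 5\n2 1 1\n2 2 7\n"),
    }
    for name, text in toy_files.items():
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write(text)
    barcodes, genes, counts = read_10x_directory(demo_dir)
```

Only nonzero entries are stored, mirroring the sparse nature of single-cell count data.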

Data Applications

The dataset is suitable for single‑cell sequencing analysis, especially in studies of human bladder cancer samples, and can be used for gene expression analysis, cell‑type identification, and other biological investigations.

Topics

Bladder Cancer Research
Single‑Cell Sequencing

Source

Organization: GitHub

Created: 7/28/2023
