Dataset asset · Open Source Community · Bladder Cancer Research · Single‑Cell Sequencing

Publicly Available Bladder Cancer Dataset

The project aims to evaluate publicly available single‑cell sequencing data from the SRA database using the `scanpy` library. It covers the full data‑analysis workflow from discovery and retrieval to final evaluation with `scanpy`.

Source: GitHub
Created: Jul 28, 2023
Updated: May 6, 2024
Signals: 278 views
Availability: Linked source ready

Dataset Overview

Dataset Description

The dataset, named "Single Cell Sequencing Data Analysis with Scanpy", supports the evaluation of publicly available single‑cell sequencing data obtained from the Sequence Read Archive (SRA) database. The project uses the Scanpy library for analysis and covers the complete workflow from data discovery and retrieval to final evaluation.

Data Sources

Data are sourced from the SRA database, a public repository maintained by the National Center for Biotechnology Information (NCBI) that contains a large collection of sequencing data.

Data Analysis Workflow

  1. Data Discovery and Retrieval: Use the Entrez Direct tool to search the SRA database with keywords such as "human bladder cancer samples", then download the matching runs with the SRA Toolkit.
  2. Quality Control: Perform quality control on downloaded FASTQ files using FastQC.
  3. Read Processing and Alignment: Align and quantify single‑cell RNA transcripts with Cell Ranger.
  4. Technical Artifact Removal: Remove background noise caused by extracellular RNA fragments using CellBender.
  5. Data Analysis: Use Scanpy to load and preprocess the data, predict doublets, normalize counts, reduce dimensionality with PCA, and integrate the dataset.
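
Steps 1 to 4 above are command-line stages, and one way to keep them reproducible is to assemble the commands in a small script. The sketch below only builds command strings and runs nothing. The function name, the accession `SRR0000000`, the sample ID, and all paths are hypothetical placeholders, and the flags shown are a commonly used subset rather than the project's confirmed settings.

```python
import shlex


def build_pipeline_commands(query, run_id, sample_id, transcriptome_dir,
                            fastq_dir="fastq", qc_dir="fastqc_reports"):
    """Return shell commands for steps 1-4 as strings; nothing is executed."""
    q = shlex.quote
    return [
        # Step 1a: search SRA and save run metadata as CSV (Entrez Direct).
        f"esearch -db sra -query {q(query)} | efetch -format runinfo > runinfo.csv",
        # Step 1b: download one run and convert it to FASTQ (SRA Toolkit).
        f"prefetch {q(run_id)}",
        f"fasterq-dump {q(run_id)} --split-files --outdir {q(fastq_dir)}",
        # Step 2: quality reports; FastQC expects the output directory to exist.
        f"fastqc {q(fastq_dir)}/*.fastq --outdir {q(qc_dir)}",
        # Step 3: alignment and counting. Cell Ranger expects Illumina-style
        # FASTQ file names, so files from fasterq-dump usually need renaming.
        # Cell Ranger 8 and later also require a --create-bam option.
        f"cellranger count --id={q(sample_id)} "
        f"--transcriptome={q(transcriptome_dir)} --fastqs={q(fastq_dir)}",
        # Step 4: remove ambient RNA from the unfiltered count matrix.
        f"cellbender remove-background "
        f"--input {q(sample_id)}/outs/raw_feature_bc_matrix.h5 "
        f"--output {q(sample_id)}_cellbender.h5",
    ]


# Placeholder arguments, for illustration only.
for command in build_pipeline_commands(
        "human bladder cancer", "SRR0000000", "sample1", "/refs/GRCh38"):
    print(command)
```

Printing the commands first makes it easy to review them before passing each one to a shell or a workflow manager.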

Tools and Libraries

  • Entrez Direct: Searches the SRA database and retrieves run metadata.
  • SRA Toolkit: Downloads SRA data.
  • FastQC: Performs quality control.
  • Cell Ranger: Handles read processing and alignment.
  • CellBender: Removes technical artifacts.
  • Scanpy: Analyzes single‑cell sequencing data.
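
For the Scanpy stage, a minimal sketch of a per-sample analysis might look like the following. The function name and all thresholds (`min_genes`, `min_cells`, `n_top_genes`, `n_pcs`) are illustrative assumptions, not the project's settings. Doublet prediction is shown with Scanpy's Scrublet wrapper, `sc.pp.scrublet`, which is available from Scanpy 1.10 and needs the `scrublet` package; the source does not say which doublet method the project uses.

```python
def analyze_sample(matrix_dir, min_genes=200, min_cells=3,
                   n_top_genes=2000, n_pcs=50):
    """Load one Cell Ranger matrix directory and run a basic Scanpy workflow."""
    import scanpy as sc  # imported here so the module loads without Scanpy

    # Load matrix, features, and barcodes files from the directory.
    adata = sc.read_10x_mtx(matrix_dir, var_names="gene_symbols")
    adata.var_names_make_unique()

    # Preprocess: drop near-empty barcodes and rarely detected genes.
    sc.pp.filter_cells(adata, min_genes=min_genes)
    sc.pp.filter_genes(adata, min_cells=min_cells)

    # Predict doublets on raw counts, then keep predicted singlets only.
    sc.pp.scrublet(adata)
    adata = adata[~adata.obs["predicted_doublet"].astype(bool)].copy()

    # Normalize to a common library size and log-transform.
    adata.layers["counts"] = adata.X.copy()
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)

    # Reduce dimensionality: PCA on highly variable genes.
    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes)
    sc.pp.pca(adata, n_comps=n_pcs)
    return adata
```

Integration across samples is left out of this sketch. Scanpy exposes several options for it, for example `sc.external.pp.harmony_integrate` or `sc.external.pp.bbknn`, and which one suits this dataset is an open choice.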

Data Formats

The raw data are in FASTQ format. After processing with Cell Ranger, the output includes the filtered_feature_bc_matrix and raw_feature_bc_matrix directories, each containing matrix.mtx, features.tsv, and barcodes.tsv (gzip‑compressed, with a .gz suffix, in Cell Ranger 3.0 and later).
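
To make the layout of a matrix directory concrete, here is a small standard-library sketch that reads the three files into plain Python structures. It assumes uncompressed files with the names given above (recent Cell Ranger versions gzip them, in which case `gzip.open` in text mode would be needed) and integer counts. The function name is hypothetical, and in practice `scanpy.read_10x_mtx` does this work.

```python
import csv
from pathlib import Path


def read_feature_bc_matrix(matrix_dir):
    """Parse an uncompressed matrix directory into (feature_ids, barcodes, counts)."""
    matrix_dir = Path(matrix_dir)

    # features.tsv columns: feature ID, feature name, feature type.
    with open(matrix_dir / "features.tsv", newline="") as handle:
        feature_ids = [row[0] for row in csv.reader(handle, delimiter="\t")]

    # barcodes.tsv holds one cell barcode per line.
    barcodes = (matrix_dir / "barcodes.tsv").read_text().split()

    # matrix.mtx is Matrix Market coordinate format: '%' comment lines, one
    # "rows cols entries" line, then 1-based "row col value" triplets where
    # rows index features and columns index barcodes.
    counts = {}  # (feature ID, barcode) -> count; zero entries are not stored
    with open(matrix_dir / "matrix.mtx") as handle:
        data_lines = (line for line in handle
                      if line.strip() and not line.startswith("%"))
        n_rows, n_cols, _n_entries = map(int, next(data_lines).split())
        if (n_rows, n_cols) != (len(feature_ids), len(barcodes)):
            raise ValueError("matrix shape does not match features/barcodes")
        for line in data_lines:
            row, col, value = line.split()
            counts[(feature_ids[int(row) - 1], barcodes[int(col) - 1])] = int(value)

    return feature_ids, barcodes, counts
```

A dictionary keyed by (feature ID, barcode) is used only to keep the example dependency-free; real analyses hold the same data in a sparse matrix.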

Data Applications

The dataset is suitable for single‑cell sequencing analysis, especially in studies of human bladder cancer samples, and can be used for gene expression analysis, cell‑type identification, and other biological investigations.
