
DOTA v1.5

The DOTA v1.5 dataset is designed for object detection in aerial and satellite imagery and supports rotated bounding boxes. It is compatible with YOLOv9 models.

Source: GitHub
Created: Jul 14, 2024
Updated: Jul 15, 2024
Overview

Dataset description and usage context

Aerial‑YOLO‑DOTA: Advanced Aerial Image Object Detection

Dataset

We prepared an easy‑to‑use, comprehensive DOTA v1.5 dataset, including labels converted to a YOLOv9‑compatible format. The dataset can be downloaded from our Google Drive.
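A label conversion of this kind can be sketched roughly as below. Note this is an illustrative sketch, not the repository's actual script: the DOTA line layout ("x1 y1 … x4 y4 category difficulty"), the normalized output format, the `CLASS_IDS` mapping, and the function name are all assumptions.

```python
# Sketch: convert one DOTA-style annotation line into a normalized
# oriented-bounding-box label line. The input layout ("x1 y1 ... x4 y4
# category difficulty") and output format are assumptions; CLASS_IDS is
# a hypothetical mapping covering a few classes for illustration.

CLASS_IDS = {"plane": 0, "ship": 1, "tennis-court": 2}

def dota_to_yolo_obb(line: str, img_w: int, img_h: int) -> str:
    parts = line.split()
    coords = [float(v) for v in parts[:8]]   # four corner points, in pixels
    category = parts[8]                      # class name string
    # normalize x coordinates by image width, y coordinates by height
    norm = [v / (img_w if i % 2 == 0 else img_h) for i, v in enumerate(coords)]
    return " ".join([str(CLASS_IDS[category])] + [f"{v:.6f}" for v in norm])
```

Applied per line of each DOTA annotation file, this yields one label file per image with class indices and coordinates in the [0, 1] range.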

Experiments and Analysis

100e‑16b‑0.01lr (Best‑performing model: 100 epochs, batch size 16, learning rate 0.01)

Normalized Confusion Matrix

The normalized confusion matrix of the best model reveals several interesting patterns:

  1. Strong diagonal performance: Many classes exhibit high true‑positive rates, evident from deep‑blue diagonal cells. Tennis court (0.89), airplane (0.86) and basketball court (0.69) are especially well classified.
  2. Consistent mis‑classifications: Off‑diagonal errors make intuitive sense in terms of visual and contextual similarity:
    • Ships are occasionally mis‑classified as ports (0.20), reflecting their co‑occurrence.
    • Small vehicles are sometimes labeled as large vehicles (0.17), likely due to scale ambiguity in aerial views.
    • Some confusion between airplanes and helicopters (helicopter → airplane 0.29) stems from shared aerial features.
  3. Sports‑facility confusion: Similar sports venues are sometimes mixed, e.g., football fields being labeled as other field types.
  4. Background interactions: The “background” class strongly interacts with bridge (0.84) and container crane (1.00), indicating difficulty separating these structures from background in certain scenes.
  5. No severe errors: Notably, there are no extreme mis‑classifications such as airplanes being labeled as ships, indicating the model has learned discriminative features.
  6. Challenging classes: Certain categories like “roundabout” (0.33) and “small vehicle” (0.26) show lower true‑positive rates, suggesting they could benefit from additional training data or feature engineering.

Overall, the confusion matrix reflects a model that has learned to differentiate categories in a context‑aware and semantically sensible manner, with most errors occurring between visually or functionally similar objects.
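The per‑class rates discussed above come from row normalization: each row of raw counts is divided by its total so the diagonal reads as a true‑positive rate. A minimal sketch, with toy 3×3 counts rather than the model's actual matrix:

```python
# Sketch: row-normalize a raw confusion matrix so each row (true class)
# sums to 1, which is how per-class true-positive rates like 0.86 arise.
# The 3x3 counts below are toy values for illustration.

def normalize_rows(matrix):
    out = []
    for row in matrix:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out

counts = [
    [86, 10, 4],   # true class A: 0.86 lands on the diagonal
    [5, 89, 6],    # true class B
    [20, 11, 69],  # true class C
]
normed = normalize_rows(counts)
```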

F1‑Score Curve

The F1‑score curve displays model performance across confidence thresholds, revealing key insights into its behavior and effectiveness. The overall F1‑score peaks at 0.54 when the confidence threshold is 0.202, indicating an optimal balance between precision and recall.
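Finding such a peak amounts to evaluating F1 = 2PR/(P+R) at each sampled threshold and taking the maximum. A minimal sketch with illustrative (threshold, precision, recall) samples, not the model's measured curve:

```python
# Sketch: locate the confidence threshold that maximizes F1 from
# (threshold, precision, recall) samples. The sample values below are
# illustrative only.

def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if (p + r) else 0.0

curve = [(0.10, 0.40, 0.70), (0.202, 0.50, 0.59), (0.50, 0.70, 0.35)]
best_t, best_f1 = max(((t, f1(p, r)) for t, p, r in curve), key=lambda x: x[1])
```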

Precision Curve

The precision curve shows that precision rises with the confidence threshold for all categories, as expected: higher‑confidence predictions are more accurate across the board.

Precision‑Recall Curve

The precision‑recall curve illustrates the trade‑off between precision and recall at varying thresholds, highlighting model performance on aerial image detection tasks. Key findings include:

  1. Overall performance: The model achieves a mean average precision (mAP) of 0.512 across all classes, indicating moderate overall performance.
  2. Class‑wise differences:
    • Tennis court (0.940) and airplane (0.909) display excellent performance, maintaining high precision even at high recall.
    • Basketball court (0.758) and port (0.716) also perform well.
    • Bridge (0.195), football field (0.301) and small vehicle (0.364) perform poorly, with precision dropping sharply as recall increases.
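Each of these per‑class scores is an average precision (AP): the area under that class's precision‑recall curve, and mAP is their mean. A minimal all‑points‑interpolation sketch, with illustrative inputs (1 = true positive, 0 = false positive, ordered by descending confidence, against `n_gt` ground truths):

```python
# Sketch: average precision (area under the precision-recall curve) for
# one class, using the all-points precision envelope. Inputs are
# illustrative, not the model's detections.

def average_precision(hits, n_gt):
    tp = fp = 0
    points = []
    for h in hits:
        tp += h
        fp += 1 - h
        points.append((tp / n_gt, tp / (tp + fp)))  # (recall, precision)
    ap, prev_r = 0.0, 0.0
    for i, (r, _) in enumerate(points):
        # envelope: best precision achievable at recall >= r
        p_max = max(p for _, p in points[i:])
        ap += (r - prev_r) * p_max
        prev_r = r
    return ap
```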

Recall Curve

The recall curve shows how the model’s ability to detect objects (recall) changes as we adjust the confidence threshold. Key observations include:

  1. Overall performance: The model attains a mean recall of 0.53 across all classes, indicating moderate detection capability.
  2. Class‑wise differences:
    • Tennis court, airplane and port achieve strong recall even at high confidence thresholds.
    • Football field, bridge and container crane show poor recall, with recall decreasing quickly as confidence rises.

Overall Result Summary

The overall result summary provides a comprehensive view of model performance, highlighting key metrics and their implications.

  1. Loss reduction: All three loss components (box_loss, cls_loss, dfl_loss) consistently decrease over training epochs, indicating effective learning.
  2. Convergence: Losses appear to stabilize toward the end of training, suggesting the model converges after 100 epochs.
  3. Validation performance: Validation loss closely follows training loss, indicating good generalization without significant over‑fitting.
  4. Precision and recall: Both metrics steadily improve over time, with recall showing a slightly more pronounced early‑stage improvement.
  5. mAP performance: Mean average precision improves consistently under both mAP50 (IoU = 0.5) and mAP50‑95 (averaged over IoU thresholds from 0.5 to 0.95), reaching approximately 0.52 for mAP50 and 0.35 for mAP50‑95.
  6. Learning dynamics: Most metrics improve rapidly at first and then plateau, suggesting the model quickly learns major features before fine‑tuning.
  7. Stability: The metrics/precision and metrics/recall plots exhibit minor fluctuations, which are normal, while the overall trend is upward.
  8. Improvement space: Although performance is solid, the final mAP values indicate room for improvement, potentially via longer training or architectural refinements.

These results demonstrate a successful training process with good generalization and consistent improvements across multiple metrics, providing a robust foundation for aerial image object detection.
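As a point of reference for how mAP50 and mAP50‑95 relate, the COCO‑style averaging over IoU thresholds can be sketched as follows; the per‑threshold AP values are toy numbers, not this model's results:

```python
# Sketch: mAP50-95 averages AP over the ten IoU thresholds
# 0.50, 0.55, ..., 0.95 (COCO-style), so it is always at most mAP50.
# The per-threshold AP values below are toy numbers for illustration.

thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
ap_at_iou = {t: round(0.52 - 0.4 * (t - 0.50), 4) for t in thresholds}  # toy APs
map50 = ap_at_iou[0.50]
map50_95 = sum(ap_at_iou.values()) / len(ap_at_iou)
```

Because AP shrinks as the IoU requirement tightens, a gap like 0.52 vs. 0.35 between the two metrics is typical rather than alarming.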
