Yang Zhan 👋

I am currently pursuing a Ph.D. degree at the School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an, China.

🏆 My research interests

Vision and Language, Large Language Models, Multimodal Machine Learning, AI for Remote Sensing, and Data Mining.

💬 Projects

📢 News

🔥 [……]:

🔥 [2024]: The remote sensing multimodal large language model is an ongoing project; we will keep improving it.

🔥 [2024/1]: SkyEyeGPT is now available on arXiv.

  • This work explores a remote sensing multimodal large language model (vision-language). We meticulously curate a high-quality remote sensing multimodal instruction-tuning dataset, SkyEye-968k, which contains both single-task and multi-task conversation instructions. We develop SkyEyeGPT, which unifies remote sensing vision-language tasks and breaks new ground in the unified modeling of remote sensing vision and LLMs. Experiments on 8 datasets for remote sensing vision-language tasks demonstrate SkyEyeGPT’s superiority in both image-level and region-level tasks. Notably, it shows encouraging results compared with GPT-4V in some tests.
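
As a rough illustration of what an instruction-tuning sample in a SkyEye-968k-style dataset could look like, here is a minimal sketch; the field names (image, task, instruction, response) and the example contents are assumptions for illustration, not the released schema.

```python
import json

# Hypothetical single-task instruction sample (field names are illustrative
# assumptions, not the actual SkyEye-968k schema).
sample = {
    "image": "rs_images/airport_001.jpg",
    "task": "image_captioning",
    "instruction": "Describe the aerial scene in one sentence.",
    "response": "Several airplanes are parked beside the terminal of an airport.",
}

# Hypothetical multi-task conversation: several instruction turns over one image.
conversation = {
    "image": "rs_images/harbor_017.jpg",
    "turns": [
        {"instruction": "How many ships are visible?",
         "response": "There are three ships docked at the harbor."},
        {"instruction": "Give the bounding box of the largest ship.",
         "response": "[120, 88, 210, 176]"},
    ],
}

print(json.dumps(sample, indent=2))
```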

🔥 [2024/1]: A curated list of Remote Sensing Multimodal Large Language Models (Vision-Language) has been created.

🔥 [2023/12]: Propose the Mono3DVG task and construct the Mono3DRefer dataset (accepted by AAAI 2024)!

  • For intelligent systems and robots, understanding objects in real 3D scenes from language expressions is an important capability for human-machine interaction. However, existing 2D visual grounding cannot capture the true 3D extent of the referred objects, and 3D visual grounding requires LiDAR or RGB-D sensors, whose high cost and device limitations greatly restrict its application scenarios. Monocular 3D object detection is low-cost and broadly applicable, but it cannot localize a specific object described in language. We therefore introduce a novel task: 3D visual grounding in monocular RGB images using language descriptions with appearance and geometry information. We create Mono3DRefer, the first dataset for this task that leverages ChatGPT to generate descriptions. We believe Mono3DVG can be widely applied since it does not require strict conditions such as RGB-D sensors, LiDAR, or industrial cameras; application scenarios include drones, surveillance systems, intelligent vehicles, robots, and other camera-equipped devices.
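
To make the task input/output concrete, here is a sketch of what a Mono3DRefer-style sample might pair together; the field names and the (center, dimensions, yaw) box parameterization are assumptions for illustration, not the dataset's actual format.

```python
# Hypothetical Mono3DVG sample: one monocular RGB image, a language description
# carrying appearance and geometry cues, and the referred object's 3D box.
# Field names and the box parameterization are illustrative assumptions.
annotation = {
    "image": "images/000042.png",
    "description": "the silver car about ten meters ahead, slightly to the left",
    "box_3d": {
        "center": [-1.8, 1.6, 10.4],    # x, y, z in camera coordinates (meters)
        "dimensions": [1.5, 1.7, 4.2],  # height, width, length (meters)
        "yaw": 0.03,                    # rotation around the vertical axis (radians)
    },
}
```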

🔥 [2023/08]: Propose a novel PE-RSITR task and provide empirical studies (accepted by T-GRS)!

  • This work explores parameter-efficient transfer learning for remote sensing image-text retrieval. Our proposed MRS-Adapter reduces the fine-tuned parameters by 98.9%, and its performance exceeds traditional methods by 7%–13%.
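
The MRS-Adapter itself is described in the paper; purely as a generic illustration of the parameter-efficient idea, here is a minimal bottleneck-adapter sketch in PyTorch, assuming a frozen pretrained backbone and a small trainable adapter. The module, dimensions, and reduction ratio are assumptions, not the actual MRS-Adapter design.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: down-project, nonlinearity, up-project, residual.
    Illustrative only; not the actual MRS-Adapter architecture."""
    def __init__(self, dim: int, reduction: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, dim // reduction)
        self.up = nn.Linear(dim // reduction, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# Parameter-efficient fine-tuning: freeze the pretrained encoder, train only adapters.
encoder = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
for p in encoder.parameters():
    p.requires_grad = False

adapter = BottleneckAdapter(dim=512)
features = encoder(torch.randn(2, 49, 512))  # frozen image/text token features
adapted = adapter(features)                  # only adapter weights would be updated
```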

🔥 [2023/02]: Propose the RSVG task and construct the DIOR-RSVG dataset (accepted by T-GRS)!

  • This work explores visual grounding for the remote sensing domain. DIOR-RSVG uses the DIOR dataset as its data source and is built with an automatic generation algorithm followed by manual verification. A novel transformer-based MGVLF model is devised to address the cluttered backgrounds and scale variation of RS images.
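
The exact MGVLF architecture is given in the paper; the following is only a generic sketch of transformer-based visual grounding, assuming visual and text tokens are fused by a transformer encoder and a learnable token regresses one box. All module names and dimensions are assumptions, not the MGVLF implementation.

```python
import torch
import torch.nn as nn

class GroundingSketch(nn.Module):
    """Generic transformer grounding head (illustrative, not the actual MGVLF):
    fuse visual and text tokens, regress one box from a learnable [REG] token."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.reg_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=6)
        self.bbox_head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 4))

    def forward(self, visual_tokens: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        reg = self.reg_token.expand(visual_tokens.size(0), -1, -1)
        fused = self.fusion(torch.cat([reg, visual_tokens, text_tokens], dim=1))
        return self.bbox_head(fused[:, 0]).sigmoid()  # normalized (cx, cy, w, h)

boxes = GroundingSketch()(torch.randn(2, 400, 256), torch.randn(2, 20, 256))
```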

🔥 [2022/08]: Propose STMGCN for vessel traffic flow prediction (accepted by T-ITS)!

  • This work explores a multi-graph convolutional network for vessel traffic flow prediction. Because water traffic differs from land traffic, we propose a big-data-driven maritime traffic network extraction algorithm to construct a "road network" for waterways. We then design STMGCN to make full use of multiple maritime graphs (distance, interaction, and correlation graphs) through multi-graph learning.
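
As a toy illustration of the multi-graph idea, the sketch below runs one graph-convolution branch per maritime graph (distance, interaction, correlation) and fuses the branches with learnable weights. This is a generic assumption-based example, not the actual STMGCN implementation, and it omits the temporal modeling.

```python
import torch
import torch.nn as nn

class MultiGraphConv(nn.Module):
    """Generic multi-graph convolution (illustrative, not the actual STMGCN):
    one linear graph-convolution branch per graph, fused by learnable weights."""
    def __init__(self, in_dim: int, out_dim: int, num_graphs: int = 3):
        super().__init__()
        self.branches = nn.ModuleList(nn.Linear(in_dim, out_dim) for _ in range(num_graphs))
        self.fusion = nn.Parameter(torch.ones(num_graphs))

    def forward(self, x, adjs):
        # x: (num_nodes, in_dim) node features; adjs: list of normalized adjacency
        # matrices, e.g. distance, interaction, and correlation graphs.
        outs = [adj @ branch(x) for adj, branch in zip(adjs, self.branches)]
        gate = torch.softmax(self.fusion, dim=0)
        return torch.relu(sum(g * o for g, o in zip(gate, outs)))

num_nodes = 50
x = torch.randn(num_nodes, 16)                        # traffic-flow features per node
adjs = [torch.rand(num_nodes, num_nodes) for _ in range(3)]
out = MultiGraphConv(16, 32)(x, adjs)
```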

🔥 [2021/08]: Propose MVFFNet for imbalanced ship classification (accepted by PRLetters)!

🌱 Academic Services

  • Journal Reviewer:
    • IEEE Transactions on Geoscience and Remote Sensing (T-GRS)
    • Neural Networks (NEUNET)
    • Pattern Recognition Letters (PRLETTERS)
    • IEEE Geoscience and Remote Sensing Letters (IEEE GRSL)
    • Computers and Electrical Engineering (COMPELECENG)

📫 Contact

Email: zhanyangnwpu@gmail.com

⚡ Fact

Popular repositories

  1. RSVG-pytorch (Python, 103 stars, 4 forks)
     RSVG: Exploring Data and Model for Visual Grounding on Remote Sensing Data, 2022

  2. Awesome-Remote-Sensing-Multimodal-Large-Language-Model (102 stars, 3 forks)
     Multimodal Large Language Models for Remote Sensing (RS-MLLMs): A Survey

  3. SkyEyeGPT (50 stars, 2 forks)
     SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model

  4. Mono3DVG (Python, 22 stars, 1 fork)
     [AAAI 2024] Mono3DVG: 3D Visual Grounding in Monocular Images

  5. PE-RSITR (Python, 9 stars)
     Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval, 2023

  6. ZhanYang-nwpu