Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to deploy yolov5 instance segmentation in android #12869

Closed
1 task done
gh4ni404 opened this issue Apr 1, 2024 · 3 comments
Closed
1 task done

How to deploy yolov5 instance segmentation in android #12869

gh4ni404 opened this issue Apr 1, 2024 · 3 comments
Labels
question Further information is requested Stale

Comments

@gh4ni404
Copy link

gh4ni404 commented Apr 1, 2024

Search before asking

Question

hi, i want to asking about how to deploy yolov5 instance segmentation in android using TFLite model, i still not able to detect the mask, this is the output i got from exported yolov5-seg into TFLite model (1,25200,41) and (1,32,160,160), i'm detecting 4 class in a single image, this is my code

`class DetectAndSegmentModel(
private val context: Context,
private val modelPath: String,
private val detectAndSegmentListener: DetectAndSegmentListener
) {

private var interpreter: Interpreter? = null
private var associatedAxisLabels: List<String>? = null

private var fullTimeExecutionTime = 0L
private var preprocessTime = 0L
private var imageSegmentationTime = 0L


fun setup() {
    try {
        val model: ByteBuffer = FileUtil.loadMappedFile(context, modelPath)
        val modelOptions = Interpreter.Options()
        modelOptions.setNumThreads(numThreads)

        interpreter = Interpreter(model, modelOptions)
        Log.i(TAG, "Berhasil Membaca Model: $modelPath")
        associatedAxisLabels = FileUtil.loadLabels(context, MODEL_LABELS)
        Log.i(TAG, "Berhasil Membaca Label $associatedAxisLabels")
    } catch (e: IOException) {
        Log.e(TAG, "Gagal Membuka Model atau label ", e)
        Toast.makeText(
            context,
            "Akses Model gagal: " + e.message,
            Toast.LENGTH_LONG
        ).show()
    }

}

fun executeModel(image: Bitmap?) {
    if (image == null) {
        detectAndSegmentListener.onEmptyDetect()
    }
    try {
        val width = image?.width
        val height = image?.height
        Log.d(TAG, "Original Image Size :$width x $height")

        interpreter?.let { checkInputInfo(TAG, it) }
        interpreter?.let { checkOutputInfo(TAG, it) }

        fullTimeExecutionTime = SystemClock.uptimeMillis()
        preprocessTime = SystemClock.uptimeMillis()

        Log.i(TAG, "Gambar sedang di Pre-processed ...")
        // Lakukan Pre-processing gambar input
        // Dan Persiapkan inputBuffer
        val resizedImage = preprocessInput(image!!)
        Log.i(TAG, "Gambar telah di Preprocessd: $resizedImage")

        preprocessTime = SystemClock.uptimeMillis() - preprocessTime

        // Persiapkan Output Tensor terlebih dahulu
        val detectObjectBuffer = TensorBuffer.createFixedSize(
            OUTPUT_SIZE_0,
            DataType.FLOAT32
        )

        val segmentMasksBuffer = TensorBuffer.createFixedSize(
            OUTPUT_SIZE_1,
            DataType.FLOAT32
        )

        imageSegmentationTime = SystemClock.uptimeMillis()

        val outputBuffers = mapOf<Int, Any>(
            0 to detectObjectBuffer.buffer.rewind(),
            1 to segmentMasksBuffer.buffer.rewind()
        )

        // Lakukan proses Inferensi menggunakan interpreter
        interpreter?.runForMultipleInputsOutputs(
            arrayOf(resizedImage.buffer),
            outputBuffers
        )

        Log.d(TAG, "OUTPUT RUN MODEL: ${interpreter?.outputTensorCount}")

        val coordinates = detectObjectBuffer.floatArray
        val masks = segmentMasksBuffer.floatArray

        val bestBoxes = processOutput(coordinates, masks, image)

        detectAndSegmentListener.onDetectAndSegment(bestBoxes, image)

    } catch (e: Exception) {
        val exceptionLog = "Ada yang Salah nih: ${e.message}"
        Log.d(TAG, exceptionLog, e)
    }
}

private fun processOutput(
    coordinates: FloatArray,
    masks: FloatArray,
    oriBitmap: Bitmap
): ArrayList<Recognition> {
    val allRecognitions = ArrayList<Recognition>()

    // OUTPUT_SIZE_0[1] = 25200
    // OUTPUT_SIZE_0[2] = 41
    for (i in 0 until OUTPUT_SIZE_0[1]) {
        // set index for determine boundingbox and mask
        val startIndex = i * OUTPUT_SIZE_0[2]

        // Extract bounding box coordinates
        val x = coordinates[0 + startIndex] * INPUT_SIZE.width
        val y = coordinates[1 + startIndex] * INPUT_SIZE.height
        val w = coordinates[2 + startIndex] * INPUT_SIZE.width
        val h = coordinates[3 + startIndex] * INPUT_SIZE.height

        // Calculate bounding box coordinates
        val xmin = max(0.0, x - w / 2.0).toInt()
        val ymin = max(0.0, y - h / 2.0).toInt()
        val xmax = min(INPUT_SIZE.width.toDouble(), x + w / 2.0).toInt()
        val ymax = min(INPUT_SIZE.height.toDouble(), y + h / 2.0).toInt()

        // Extract confidences and class scores
        val confidences = coordinates[4 + startIndex]
        val classScores = coordinates.copyOfRange(5 + startIndex, 9 + startIndex)

        // Extract mask weights for this bounding box
        val maskWeight = mutableListOf<Float>()
        for (index in 0 until 32) {
            maskWeight.add(coordinates[i + OUTPUT_SIZE_0[1] * (index+5)])
        }

        var labelId = 0
        var maxLabelScores = 0f
        for (j in classScores.indices) {
            if (classScores[j] > maxLabelScores) {
                maxLabelScores = classScores[j]
                labelId = j
            }
        }

        val r = Recognition(
            labelId,
            "",
            maxLabelScores,
            confidences,
            RectF(
                xmin.toFloat(),
                ymin.toFloat(),
                xmax.toFloat(),
                ymax.toFloat()
            ),
            maskWeight,
            null
        )
        allRecognitions.add(r)
    }

    val nmsRecognition = applyNMS(allRecognitions)

    val nmsFilterDuplBox = applyNMSAllClass(nmsRecognition)

    for (recognition in nmsFilterDuplBox) {
        val labelId = recognition.labelId
        val labelName = associatedAxisLabels!![labelId]

        Log.d(TAG, "Label Name: $labelName")
        recognition.labelName = labelName

    }
    val output1 = reshapeOutput1(masks)

    val multiply = mutableListOf<Mat>()
    for (index in 0 until 32) {
        val multiplyMask = output1[index].multiplyDouble(allRecognitions[index].getMaskWeights()[index].toDouble())
        multiply.add(multiplyMask)
    }

    val final = multiply[0].clone()
    for (i in 1 until multiply.size) {
        Core.add(final, multiply[i], final)
    }

    val mask = Mat()
    Core.compare(final, Scalar(0.0), mask, Core.CMP_GT)

    for (rec in allRecognitions) {
        rec.setBitmapMasks(mask.toBitmap())
    }

    Log.d(TAG, "processOutput: Multiply = ${multiply.size}")

    return nmsFilterDuplBox
}

private fun reshapeOutput1(masks: FloatArray): List<Mat> {
    val all = mutableListOf<Mat>()
    for (mask in 0 until 32) {
        val mat = Mat(160,160, CvType.CV_32F)
        for (x in 0 until 160) {
            for (y in 0 until 160) {
                val pixelValue = masks[32*160*y+32*x+mask].toDouble()
                mat.put(y,x,pixelValue)
            }
        }
        all.add(mat)
    }
    return all
}

private fun Mat.multiplyDouble(double: Double): Mat {
    val result = Mat()
    Core.multiply(this, Scalar(double), result)
    return result
}

private fun Mat.toBitmap(): Bitmap {
    val outputBitmap = Bitmap.createBitmap(
        this.width(),
        this.height(),
        Bitmap.Config.ARGB_8888
    )
    Utils.matToBitmap(this, outputBitmap)
    return outputBitmap
}

private fun applyNMS(recog: ArrayList<Recognition>): ArrayList<Recognition> {
    val selectedBoxes = ArrayList<Recognition>()

    for (i in 0 until OUTPUT_SIZE_0[2] - 5) {

        // Make a queue for each category here,
        // and put the ones with higher labelScore at the front.
        val pq: PriorityQueue<Recognition> = PriorityQueue<Recognition>(
            25200
        ) { l, r -> // Intentionally reversed to put high confidence at the head of the queue.
            r.confidence!!.compareTo(l.confidence!!)
        }

        // Filter out the same category,
        // and obj must be greater than the set threshold
        for (j in recog.indices) {
            if (recog[j].labelId == i && recog[j].confidence!! > DETECT_THRES) {
                pq.add(recog[j])
            }
        }

        // nms loop traversal
        while (pq.size > 0) {
            // Take out the one with the highest probability first
            val a = arrayOfNulls<Recognition>(pq.size)
            val detections = pq.toArray<Recognition>(a)
            val max: Recognition = detections[0]
            selectedBoxes.add(max)
            pq.clear()
            for (k in 1 until detections.size) {
                val detection: Recognition = detections[k]
                if (calculateIoU(max.getLocation(), detection.getLocation()) < IOU_THRESHOLD) {
                    pq.add(detection)
                }
            }
        }
    }
    return selectedBoxes
}

private fun applyNMSAllClass(allRecognitions: ArrayList<Recognition>): ArrayList<Recognition> {
    val nmsRecognitions = ArrayList<Recognition>()
    val pq = PriorityQueue<Recognition>(
        100
    ) { l, r ->
        r.confidence!!.compareTo(l.confidence!!)
    }

    // Filter out the same category,
    // and obj must be greater than the set threshold
    for (j in allRecognitions.indices) {
        if (allRecognitions[j].confidence!! > DETECT_THRES) {
            pq.add(allRecognitions[j])
        }
    }

    while (pq.size > 0) {
        // Take out the one with the highest probability first
        val a = arrayOfNulls<Recognition>(pq.size)
        val detections = pq.toArray<Recognition>(a)
        val max = detections[0]
        nmsRecognitions.add(max)
        pq.clear()

        for (k in 1 until detections.size) {
            val detection = detections[k]
            if (calculateIoU(
                    max.getLocation(),
                    detection.getLocation()
                ) < IOU_CLASS_DUPL
            ) {
                pq.add(detection)
            }
        }
    }
    return nmsRecognitions
}

private fun calculateIoU(b1: RectF, b2: RectF): Float {

// val x1 = maxOf(b1.cx - (b1.w / 2F), b2.cx - (b2.w / 2F))
// val y1 = maxOf(b1.cy - (b1.h / 2F), b2.cy - (b2.h / 2F))
// val x2 = minOf(b1.cx + (b1.w / 2F), b2.cx + (b2.w / 2F))
// val y2 = minOf(b1.cy + (b1.h / 2F), b2.cy + (b2.h / 2F))
//
// val intersectionArea = maxOf(0F, x2 - x1) * maxOf(0F, y2 - y1)
// val box1Area = b1.w * b1.h
// val box2Area = b2.w * b2.h
val intersection = boxIntersection(b1, b2)
val union = boxUnion(b1, b2)
return if (union <= 0) 1f else intersection / union
}

private fun boxIntersection(a: RectF, b: RectF): Float {
    val maxLeft = if (a.left > b.left) a.left else b.left
    val maxTop = if (a.top > b.top) a.top else b.top
    val minRight = if (a.right < b.right) a.right else b.right
    val minBottom = if (a.bottom < b.bottom) a.bottom else b.bottom

    val w = minRight - maxLeft
    val h = minBottom - maxTop

    return if (w < 0f || h < 0f) 0f else w * h
}

private fun boxUnion(a: RectF, b: RectF): Float {
    val i = boxIntersection(a, b)
    val aRightLeft = a.right - a.left
    val aBottomTop = a.bottom - a.top
    val bRightLeft = b.right - b.left
    val bBottomTop = b.bottom - b.top

    val multiplyA = aRightLeft * aBottomTop
    val multiplyB = bRightLeft * bBottomTop

    val plusAB = multiplyA + multiplyB

    return plusAB.minus(i)
}

private fun preprocessInput(image: Bitmap): TensorImage {
    // Buat ImageProcessor untuk pra-process gambar
    val imageProcessor = ImageProcessor.Builder()
        .add(
            ResizeOp(
                INPUT_SIZE.height,
                INPUT_SIZE.width,
                ResizeOp.ResizeMethod.BILINEAR
            )
        )
        .add(
            NormalizeOp(
                0f, 255f
            )
        )
        .build()

    val inputTensor = TensorImage(DataType.FLOAT32)
    inputTensor.load(image)
    return imageProcessor.process(inputTensor)
}

interface DetectAndSegmentListener {
    fun onEmptyDetect()
    fun onDetectAndSegment(boxAndSegment: ArrayList<Recognition>, bitmap: Bitmap)
}

companion object {
    private val INPUT_SIZE = Size(640, 640)
    private val OUTPUT_SIZE_0 = intArrayOf(1, 25200, 41)
    private val OUTPUT_SIZE_1 = intArrayOf(1, 32, 160, 160)

    private const val numThreads: Int = 4

    private const val MODEL_LABELS = "best-fp16_labels.txt"

    private const val IOU_THRESHOLD = 0.45f
    private const val IOU_CLASS_DUPL = 0.7f
    private const val DETECT_THRES = 0.25f

    private const val TAG = "DetectAndSegmentModel"
}

}`

i'm using Recognition class like this
`class Recognition(
/** Display name for the recognition. /
@JvmField var labelId: Int, var labelName: String?, var labelScore: Float,
/
*
* A sortable score for how good the recognition is relative to others. Higher should be better.
/
@JvmField var confidence: Float?,
/
* Optional location within the source image for the location of the recognized object. */
private var location: RectF?,
private var maskWeights: List,
private var maskBitmap: Bitmap?
) {

fun getLocation(): RectF {
    return RectF(location)
}

fun setLocation(location: RectF?) {
    this.location = location
}

fun getMaskWeights(): List<Float> {
    return maskWeights
}

fun getBitmapMasks(): Bitmap? {
    return maskBitmap
}

fun setBitmapMasks(bitmap: Bitmap) {
    maskBitmap = bitmap
}

override fun toString(): String {
    var resultString = ""
    resultString += "$labelId "
    if (labelName != null) {
        resultString += "$labelName "
    }
    if (confidence != null) {
        resultString += String.format("(%.1f%%) ", confidence!! * 100.0f)
    }
    if (location != null) {
        resultString += location.toString() + " "
    }

// if (maskWeights != null) {
// resultString += maskWeights.toString() + " "
// }
return resultString.trim { it <= ' ' }
}
}`

please help me correct this code, thank you

Additional

No response

@gh4ni404 gh4ni404 added the question Further information is requested label Apr 1, 2024
Copy link
Contributor

github-actions bot commented Apr 1, 2024

👋 Hello @gh4ni404, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

@glenn-jocher
Copy link
Member

@gh4ni404 hi there! Thank you for reaching out with your detailed question. Deploying YOLOv5 for instance segmentation on Android using a TFLite model, especially with mask detection, involves a nuanced approach. It seems you have a good base, but there are a few key points to consider:

  1. Model Conversion: Ensure the model is correctly converted to TFLite format, preserving its instance segmentation capabilities. Many times, operations needed for instance segmentation might not be fully supported by TFLite, affecting the model's performance or functionality.

  2. Output Interpretation: The output dimensions (1,25200,41) and (1,32,160,160) indicate the model predicts bounding boxes with associated classes and segmentation masks, respectively. It's crucial to correctly interpret these outputs within your Android application. Ensure you're decoding the bounding boxes and masks accurately - this often involves applying a sigmoid on the outputs to get the respective probabilities and then applying a threshold to filter detections.

  3. Input Preprocessing: Ensure the input images to the TFLite model are preprocessed correctly, including resizing to the expected dimensions (e.g., 640x640) and normalization as the model was trained with. Mismatch in preprocessing steps can significantly affect model performance.

  4. Performance Optimizations: If you're experiencing performance issues, remember to leverage GPU acceleration (if available on your device) by setting modelOptions.useDelegate(GpuDelegate()) for the TFLite interpreter.

Unfortunately, without a more detailed look at how each part of your code interprets the model outputs, it's challenging to provide specific code corrections. However, here are a few steps you might find helpful:

  • Review the TFLite model conversion process to ensure instance segmentation capabilities are retained.
  • Double-check the logic for decoding the model's output into bounding boxes and masks, ensuring proper application of sigmoid functions and thresholds.
  • Verify the preprocessing steps match those expected by the model.
  • Consider utilizing Android's ML Kit or other libraries to simplify some of the image processing tasks.

Lastly, for any detailed support or if your project is leaning towards commercial usage, please consider acquiring an Ultralytics Enterprise License which grants you direct support from our team. This ensures compliance with our licensing model and supports the continued development of YOLOv5. For more details, you can refer to our documentation at https://docs.ultralytics.com/yolov5/.

Please keep in mind the complexity of deploying deep learning models on mobile devices and the importance of adhering to the guidelines of the model's license. Good luck with your project!

Copy link
Contributor

github-actions bot commented May 2, 2024

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

@github-actions github-actions bot added the Stale label May 2, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested Stale
Projects
None yet
Development

No branches or pull requests

2 participants