index.html

<!doctype html>
<html>
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <script src="https://cdn.tailwindcss.com"></script>
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css" integrity="sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0" crossorigin="anonymous">
  <script defer src="https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.js" integrity="sha384-PwRUT/YqbnEjkZO0zZxNqcxACrXe+j766U2amXcgMg5457rve2Y7I6ZJSm2A0mS4" crossorigin="anonymous"></script>
  <script defer src="https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/contrib/auto-render.min.js" integrity="sha384-+VBxd3r6XgURycqtZ117nYw44OOcIax56Z4dCRWbxyPt0Koah1uHoK0o4+/RRE05" crossorigin="anonymous"></script>
  <script defer src="https://cdn.jsdelivr.net/npm/@alpinejs/collapse@3.x.x/dist/cdn.min.js"></script>
  <script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
</head>
<body>
  <div class="relative mx-auto h-full max-w-2xl text-md">
    <table class="table-auto">
      <tbody>
        <tr>
          <td></td>
          <td>
            <h1 class="text-4xl pt-4 font-bold"><span class="underline">Vincent's</span> Arxiv FrontPage</h1>
            <br>
            <p>Generated on 2024-09-17.</p><br/>
            <p class="text-sm text-gray-500 pt-2">This frontpage is made by scraping arxiv and by running a sentence-model that detects if the abstract describes a paper about a topic of interest. One cool feature: it all pretty much runs via Github Actions. </p>
            <br>
          </td>
        </tr><tr>
          <td></td>
          <td>
            <h2 class="text-2xl tracking-tight pt-4 font-bold">New Datasets</h2>
          </td>
        </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Point2Graph: An End-to-end Point Cloud-based 3D Open-Vocabulary Scene Graph for Robot Navigation
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Current open-vocabulary scene graph generation algorithms highly rely on both 3D scene point cloud data and posed RGB-D images and thus have limited applications in scenarios where RGB-D images or camera poses are not readily available.<span class='px-1 mx-1 bg-yellow-200'>To solve this problem, we propose Point2Graph, a novel end-to-end point cloud-based 3D open-vocabulary scene graph generation framework in which the requirement of posed RGB-D image series is eliminated. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.753</span></span>This hierarchical framework contains room and object detection/segmentation and open-vocabulary classification.For the room layer, we leverage the advantage of merging the geometry-based border detection algorithm with the learning-based region detection to segment rooms and create a "Snap-Lookup" framework for open-vocabulary room classification.In addition, we create an end-to-end pipeline for the object layer to detect and classify 3D objects based solely on 3D point cloud data.Our evaluation results show that our framework can outperform the current state-of-the-art (SOTA) open-vocabulary object and room segmentation and classification algorithm on widely used real-scene datasets.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10350v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation?
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Co-speech gestures are fundamental for communication.The advent of recent deep learning techniques has facilitated the creation of lifelike, synchronous co-speech gestures for Embodied Conversational Agents.<span class='px-1 mx-1 bg-yellow-200'>"In-the-wild" datasets, aggregating video content from platforms like YouTube via human pose detection technologies, provide a feasible solution by offering 2D skeletal sequences aligned with speech. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.76</span></span>Concurrent developments in lifting models enable the conversion of these 2D sequences into 3D gesture databases.However, it is important to note that the 3D poses estimated from the 2D extracted poses are, in essence, approximations of the ground-truth, which remains in the 2D domain.This distinction raises questions about the impact of gesture representation dimensionality on the quality of generated motions - a topic that, to our knowledge, remains largely unexplored.Our study examines the effect of using either 2D or 3D joint coordinates as training data on the performance of speech-to-gesture deep generative models.We employ a lifting model for converting generated 2D pose sequences into 3D and assess how gestures created directly in 3D stack up against those initially generated in 2D and then converted to 3D.We perform an objective evaluation using widely used metrics in the gesture generation field as well as a user study to qualitatively evaluate the different approaches.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10357v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                A Large-Scale Privacy Assessment of Android Third-Party SDKs
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Third-party Software Development Kits (SDKs) are widely adopted in Android app development, to effortlessly accelerate development pipelines and enhance app functionality.However, this convenience raises substantial concerns about unauthorized access to users' privacy-sensitive information, which could be further abused for illegitimate purposes like user tracking or monetization.Our study offers a targeted analysis of user privacy protection among Android third-party SDKs, filling a critical gap in the Android software supply chain.It focuses on two aspects of their privacy practices, including data exfiltration and behavior-policy compliance (or privacy compliance), utilizing techniques of taint analysis and large language models.<span class='px-1 mx-1 bg-yellow-200'>It covers 158 widely-used SDKs from two key SDK release platforms, the official one and a large alternative one. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.833</span></span>From them, we identified 338 instances of privacy data exfiltration.On the privacy compliance, our study reveals that more than 30% of the examined SDKs fail to provide a privacy policy to disclose their data handling practices.Among those that provide privacy policies, 37% of them over-collect user data, and 88% falsely claim access to sensitive data.We revisit the latest versions of the SDKs after 12 months.Our analysis demonstrates a persistent lack of improvement in these concerning trends.Based on our findings, we propose three actionable recommendations to mitigate the privacy leakage risks and enhance privacy protection for Android users.Our research not only serves as an urgent call for industry attention but also provides crucial insights for future regulatory interventions.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10411v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Charting EDA: Characterizing Interactive Visualization Use in Computational Notebooks with a Mixed-Methods Formalism
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Interactive visualizations are powerful tools for Exploratory Data Analysis (EDA), but how do they affect the observations analysts make about their data?<span class='px-1 mx-1 bg-yellow-200'>We conducted a qualitative experiment with 13 professional data scientists analyzing two datasets with Jupyter notebooks, collecting a rich dataset of interaction traces and think-aloud utterances. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.825</span></span>By qualitatively coding participant utterances, we introduce a formalism that describes EDA as a sequence of analysis states, where each state is comprised of either a representation an analyst constructs (e.g., the output of a data frame, an interactive visualization, etc.) or an observation the analyst makes (e.g., about missing data, the relationship between variables, etc.).By applying our formalism to our dataset, we identify that interactive visualizations, on average, lead to earlier and more complex insights about relationships between dataset attributes compared to static visualizations.Moreover, by calculating metrics such as revisit count and representational diversity, we uncover that some representations serve more as "planning aids" during EDA rather than tools strictly for hypothesis-answering.We show how these measures help identify other patterns of analysis behavior, such as the "80-20 rule", where a small subset of representations drove the majority of observations.Based on these findings, we offer design guidelines for interactive exploratory analysis tooling and reflect on future directions for studying the role that visualizations play in EDA.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10450v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Do Pre-trained Vision-Language Models Encode Object States?
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>For a vision-language model (VLM) to understand the physical world, such as cause and effect, a first step is to capture the temporal dynamics of the visual world, for example how the physical states of objects evolve over time (e.g. a whole apple into a sliced apple).Our paper aims to investigate if VLMs pre-trained on web-scale data learn to encode object states, which can be extracted with zero-shot text prompts.We curate an object state recognition dataset ChangeIt-Frames, and evaluate nine open-source VLMs, including models trained with contrastive and generative objectives.We observe that while these state-of-the-art vision-language models can reliably perform object recognition, they consistently fail to accurately distinguish the objects' physical states.Through extensive experiments, we identify three areas for improvements for VLMs to better encode object states, namely the quality of object localization, the architecture to bind concepts to objects, and the objective to learn discriminative visual and language encoders on object states.<span class='px-1 mx-1 bg-yellow-200'>Data and code are released. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.735</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10488v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Pennsieve - A Collaborative Platform for Translational Neuroscience and Beyond
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The exponential growth of neuroscientific data necessitates platforms that facilitate data management and multidisciplinary collaboration.<span class='px-1 mx-1 bg-yellow-200'>In this paper, we introduce Pennsieve - an open-source, cloud-based scientific data management platform built to meet these needs. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.727</span></span><span class='px-1 mx-1 bg-yellow-200'>Pennsieve supports complex multimodal datasets and provides tools for data visualization and analyses. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.768</span></span>It takes a comprehensive approach to data integration, enabling researchers to define custom metadata schemas and utilize advanced tools to filter and query their data.Pennsieve's modular architecture allows external applications to extend its capabilities, and collaborative workspaces with peer-reviewed data publishing mechanisms promote high-quality datasets optimized for downstream analysis, both in the cloud and on-premises.   Pennsieve forms the core for major neuroscience research programs including the NIH SPARC Initiative, NIH HEAL Initiative's PRECISION Human Pain Network, and NIH HEAL RE-JOIN Initiative.It serves more than 80 research groups worldwide, along with several large-scale, inter-institutional projects at clinical sites through the University of Pennsylvania.Underpinning the SPARC.Science, Epilepsy.<span class='px-1 mx-1 bg-yellow-200'>Science, and Pennsieve Discover portals, Pennsieve stores over 125 TB of scientific data, with 35 TB of data publicly available across more than 350 high-impact datasets. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.899</span></span>It adheres to the findable, accessible, interoperable, and reusable (FAIR) principles of data sharing and is recognized as one of the NIH-approved Data Repositories.By facilitating scientific data management, discovery, and analysis, Pennsieve fosters a robust and collaborative research ecosystem for neuroscience and beyond.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10509v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Enhancing Video Transmission with Machine Learning based Routing in Software-Defined Networks
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Our study uses the centralized, flexible, dynamic, and programmable structure of Software-Defined networks (SDN) to overcome the problems.Although SDN effectively addresses the challenges present in traditional networks, it still requires further enhancements to achieve a more optimized network architecture.The Floodlight controller utilized in this study employs metrics such as hop count, which provides limited information for routing.In scenarios such as video transmission, this situation is insufficient and the need for optimization arises.For this purpose, an artificial intelligence (AI) based routing algorithm is proposed between the server and the client in the scenario based on NSFNET topology.The topology designed with the Floodlight controller in the Mininet simulation environment includes a client, a server, and 14 switches.A realistic network environment is provided by adding different receivers and creating TCP traffic between these receivers using the iperf3 tool.In three scenarios, video streaming is performed using the FFmpeg tool, and 49 path metrics such as RTT, throughput, and loss are recorded.In these scenarios, PSNR and SSIM calculations are made to observe the differences between the transmitted and the original video in congested and uncongested environments.<span class='px-1 mx-1 bg-yellow-200'>Due to the lack of a dataset suitable for the proposed network environment in the literature, a new dataset consisting of 876 records is created using continuously transmitted video traffic. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.881</span></span>Low and high traffic levels are created within the dataset, and different machine learning techniques such as KNN, Random Forest, SVM, AdaBoost, Logistic Regression and XGBoost are applied using the features that affect the traffic levels.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10512v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>This paper explores the intersection of technological innovation and access to justice by developing a benchmark for predicting case outcomes in the UK Employment Tribunal (UKET).To address the challenge of extensive manual annotation, the study employs a large language model (LLM) for automatic annotation, resulting in the creation of the CLC-UKET dataset.<span class='px-1 mx-1 bg-yellow-200'>The dataset consists of approximately 19,000 UKET cases and their metadata. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.961</span></span>Comprehensive legal annotations cover facts, claims, precedent references, statutory references, case outcomes, reasons and jurisdiction codes.Facilitated by the CLC-UKET data, we examine a multi-class case outcome prediction task in the UKET.Human predictions are collected to establish a performance reference for model comparison.Empirical results from baseline models indicate that finetuned transformer models outperform zero-shot and few-shot LLMs on the UKET prediction task.The performance of zero-shot LLMs can be enhanced by integrating task-related information into few-shot examples.We hope that the CLC-UKET dataset, along with human annotations and empirical findings, can serve as a valuable benchmark for employment-related dispute resolution.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08098v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Enhancing Canine Musculoskeletal Diagnoses: Leveraging Synthetic Image Data for Pre-Training AI-Models on Visual Documentations
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The examination of the musculoskeletal system in dogs is a challenging task in veterinary practice.In this work, a novel method has been developed that enables efficient documentation of a dog's condition through a visual representation.However, since the visual documentation is new, there is no existing training data.The objective of this work is therefore to mitigate the impact of data scarcity in order to develop an AI-based diagnostic support system.To this end, the potential of synthetic data that mimics realistic visual documentations of diseases for pre-training AI models is investigated.We propose a method for generating synthetic image data that mimics realistic visual documentations.<span class='px-1 mx-1 bg-yellow-200'>Initially, a basic dataset containing three distinct classes is generated, followed by the creation of a more sophisticated dataset containing 36 different classes. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.715</span></span>Both datasets are used for the pre-training of an AI model.<span class='px-1 mx-1 bg-yellow-200'>Subsequently, an evaluation dataset is created, consisting of 250 manually created visual documentations for five different diseases. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.757</span></span><span class='px-1 mx-1 bg-yellow-200'>This dataset, along with a subset containing 25 examples. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.964</span></span>The obtained results on the evaluation dataset containing 25 examples demonstrate a significant enhancement of approximately 10% in diagnosis accuracy when utilizing generated synthetic images that mimic real-world visual documentations.However, these results do not hold true for the larger evaluation dataset containing 250 examples, indicating that the advantages of using synthetic data for pre-training an AI model emerge primarily when dealing with few examples of visual documentations for a given disease.Overall, this work provides valuable insights into mitigating the limitations imposed by limited training data through the strategic use of generated synthetic data, presenting an approach applicable beyond the canine musculoskeletal assessment domain.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08181v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                AudioBERT: Audio Knowledge Augmented Language Model
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Recent studies have identified that language models, pretrained on text-only datasets, often lack elementary visual knowledge, \textit{e.g.,} colors of everyday objects.Motivated by this observation, we ask whether a similar shortcoming exists in terms of the \textit{auditory} knowledge.To answer this question, we construct a new dataset called AuditoryBench, which consists of two novel tasks for evaluating auditory knowledge.Based on our analysis using the benchmark, we find that language models also suffer from a severe lack of auditory knowledge.To address this limitation, we propose AudioBERT, a novel method to augment the auditory knowledge of BERT through a retrieval-based approach.First, we detect auditory knowledge spans in prompts to query our retrieval model efficiently.Then, we inject audio knowledge into BERT and switch on low-rank adaptation for effective adaptation when audio knowledge is required.Our experiments demonstrate that AudioBERT is quite effective, achieving superior performance on the AuditoryBench.<span class='px-1 mx-1 bg-yellow-200'>The dataset and code are available at \bulurl{https://github.com/HJ-Ok/AudioBERT}. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.766</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08199v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Touch2Touch: Cross-Modal Tactile Generation for Object Manipulation
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Today's touch sensors come in many shapes and sizes.This has made it challenging to develop general-purpose touch processing methods since models are generally tied to one specific sensor design.We address this problem by performing cross-modal prediction between touch sensors: given the tactile signal from one sensor, we use a generative model to estimate how the same physical contact would be perceived by another sensor.This allows us to apply sensor-specific methods to the generated signal.We implement this idea by training a diffusion model to translate between the popular GelSlim and Soft Bubble sensors.As a downstream task, we perform in-hand object pose estimation using GelSlim sensors while using an algorithm that operates only on Soft Bubble signals.<span class='px-1 mx-1 bg-yellow-200'>The dataset, the code, and additional details can be found at https://www.mmintlab.com/research/touch2touch/. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.86</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08269v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>We present DreamHOI, a novel method for zero-shot synthesis of human-object interactions (HOIs), enabling a 3D human model to realistically interact with any given object based on a textual description.<span class='px-1 mx-1 bg-yellow-200'>This task is complicated by the varying categories and geometries of real-world objects and the scarcity of datasets encompassing diverse HOIs. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.715</span></span>To circumvent the need for extensive data, we leverage text-to-image diffusion models trained on billions of image-caption pairs.We optimize the articulation of a skinned human mesh using Score Distillation Sampling (SDS) gradients obtained from these models, which predict image-space edits.However, directly backpropagating image-space gradients into complex articulation parameters is ineffective due to the local nature of such gradients.To overcome this, we introduce a dual implicit-explicit representation of a skinned mesh, combining (implicit) neural radiance fields (NeRFs) with (explicit) skeleton-driven mesh articulation.During optimization, we transition between implicit and explicit forms, grounding the NeRF generation while refining the mesh articulation.We validate our approach through extensive experiments, demonstrating its effectiveness in generating realistic HOIs.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08278v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>In this study, we introduce ManaTTS, the most extensive publicly accessible single-speaker Persian corpus, and a comprehensive framework for collecting transcribed speech datasets for the Persian language. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.785</span></span>ManaTTS, released under the open CC-0 license, comprises approximately 86 hours of audio with a sampling rate of 44.1 kHz.Alongside ManaTTS, we also generated the VirgoolInformal dataset to evaluate Persian speech recognition models used for forced alignment, extending over 5 hours of audio.The datasets are supported by a fully transparent, MIT-licensed pipeline, a testament to innovation in the field.It includes unique tools for sentence tokenization, bounded audio segmentation, and a novel forced alignment method.This alignment technique is specifically designed for low-resource languages, addressing a crucial need in the field.With this dataset, we trained a Tacotron2-based TTS model, achieving a Mean Opinion Score (MOS) of 3.76, which is remarkably close to the MOS of 3.86 for the utterances generated by the same vocoder and natural spectrogram, and the MOS of 4.01 for the natural waveform, demonstrating the exceptional quality and effectiveness of the corpus.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07259v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Visual Compositional Data Analytics for Spatial Transcriptomics
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>For the Bio+Med-Vis Challenge 2024, we propose a visual analytics system as a redesign for the scatter pie chart visualization of cell type proportions of spatial transcriptomics data. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.729</span></span>Our design uses three linked views: a view of the histological image of the tissue, a stacked bar chart showing cell type proportions of the spots, and a scatter plot showing a dimensionality reduction of the multivariate proportions.Furthermore, we apply a compositional data analysis framework, the Aitchison geometry, to the proportions for dimensionality reduction and $k$-means clustering.Leveraging brushing and linking, the system allows one to explore and uncover patterns in the cell type mixtures and relate them to their spatial locations on the cellular tissue.This redesign shifts the pattern recognition workload from the human visual system to computational methods commonly used in visual analytics.<span class='px-1 mx-1 bg-yellow-200'>We provide the code and setup instructions of our visual analytics system on GitHub (https://github.com/UniStuttgart-VISUS/va-for-spatial-transcriptomics). <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.734</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07306v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Benchmarking 2D Egocentric Hand Pose Datasets
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Hand pose estimation from egocentric video has broad implications across various domains, including human-computer interaction, assistive technologies, activity recognition, and robotics, making it a topic of significant research interest.The efficacy of modern machine learning models depends on the quality of data used for their training.<span class='px-1 mx-1 bg-yellow-200'>Thus, this work is devoted to the analysis of state-of-the-art egocentric datasets suitable for 2D hand pose estimation. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.736</span></span><span class='px-1 mx-1 bg-yellow-200'>We propose a novel protocol for dataset evaluation, which encompasses not only the analysis of stated dataset characteristics and assessment of data quality, but also the identification of dataset shortcomings through the evaluation of state-of-the-art hand pose estimation models. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.788</span></span>Our study reveals that despite the availability of numerous egocentric databases intended for 2D hand pose estimation, the majority are tailored for specific use cases.<span class='px-1 mx-1 bg-yellow-200'>There is no ideal benchmark dataset yet; however, H2O and GANerated Hands datasets emerge as the most promising real and synthetic datasets, respectively. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.808</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07337v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Large Language Models (LLMs) have shown great promise in vulnerability identification.As C/C++ comprises half of the Open-Source Software (OSS) vulnerabilities over the past decade and updates in OSS mainly occur through commits, enhancing LLMs' ability to identify C/C++ Vulnerability-Contributing Commits (VCCs) is essential.However, current studies primarily focus on further pre-training LLMs on massive code datasets, which is resource-intensive and poses efficiency challenges.In this paper, we enhance the ability of BERT-based LLMs to identify C/C++ VCCs in a lightweight manner.We propose CodeLinguaNexus (CLNX) as a bridge facilitating communication between C/C++ programs and LLMs.Based on commits, CLNX efficiently converts the source code into a more natural representation while preserving key details.Specifically, CLNX first applies structure-level naturalization to decompose complex programs, followed by token-level naturalization to interpret complex symbols.<span class='px-1 mx-1 bg-yellow-200'>We evaluate CLNX on public datasets of 25,872 C/C++ functions with their commits. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.748</span></span>The results show that CLNX significantly enhances the performance of LLMs on identifying C/C++ VCCs.Moreover, CLNX-equipped CodeBERT achieves new state-of-the-art and identifies 38 OSS vulnerabilities in the real world.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07407v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>This paper presents a novel framework for converting 2D videos to immersive stereoscopic 3D, addressing the growing demand for 3D content in immersive experience.Leveraging foundation models as priors, our approach overcomes the limitations of traditional methods and boosts the performance to ensure the high-fidelity generation required by the display devices.The proposed system consists of two main steps: depth-based video splatting for warping and extracting occlusion mask, and stereo video inpainting.We utilize pre-trained stable video diffusion as the backbone and introduce a fine-tuning protocol for the stereo video inpainting task.To handle input video with varying lengths and resolutions, we explore auto-regressive strategies and tiled processing.<span class='px-1 mx-1 bg-yellow-200'>Finally, a sophisticated data processing pipeline has been developed to reconstruct a large-scale and high-quality dataset to support our training. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.792</span></span>Our framework demonstrates significant improvements in 2D-to-3D video conversion, offering a practical solution for creating immersive content for 3D devices like Apple Vision Pro and 3D displays.In summary, this work contributes to the field by presenting an effective method for generating high-quality stereoscopic videos from monocular input, potentially transforming how we experience digital media.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07447v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>We present a framework for learning to generate background music from video inputs.Unlike existing works that rely on symbolic musical annotations, which are limited in quantity and diversity, our method leverages large-scale web videos accompanied by background music.This enables our model to learn to generate realistic and diverse music.To accomplish this goal, we develop a generative video-music Transformer with a novel semantic video-music alignment scheme.Our model uses a joint autoregressive and contrastive learning objective, which encourages the generation of music aligned with high-level video content.We also introduce a novel video-beat alignment scheme to match the generated music beats with the low-level motions in the video.Lastly, to capture fine-grained visual cues in a video needed for realistic background music generation, we introduce a new temporal video encoder architecture, allowing us to efficiently process videos consisting of many densely sampled frames.<span class='px-1 mx-1 bg-yellow-200'>We train our framework on our newly curated DISCO-MV dataset, consisting of 2.2M video-music samples, which is orders of magnitude larger than any prior datasets used for video music generation. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.735</span></span>Our method outperforms existing approaches on the DISCO-MV and MusicCaps datasets according to various music generation evaluation metrics, including human evaluation.Results are available at https://genjib.github.io/project_page/VMAs/index.html</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07450v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Despite having tremendous progress in image-to-3D generation, existing methods still struggle to produce multi-view consistent images with high-resolution textures in detail, especially in the paradigm of 2D diffusion that lacks 3D awareness.In this work, we present High-resolution Image-to-3D model (Hi3D), a new video diffusion based paradigm that redefines a single image to multi-view images as 3D-aware sequential image generation (i.e., orbital video generation).This methodology delves into the underlying temporal consistency knowledge in video diffusion model that generalizes well to geometry consistency across multiple views in 3D generation.Technically, Hi3D first empowers the pre-trained video diffusion model with 3D-aware prior (camera pose condition), yielding multi-view images with low-resolution texture details.A 3D-aware video-to-video refiner is learnt to further scale up the multi-view images with high-resolution texture details.Such high-resolution multi-view images are further augmented with novel views through 3D Gaussian Splatting, which are finally leveraged to obtain high-fidelity meshes via 3D reconstruction.Extensive experiments on both novel view synthesis and single view reconstruction demonstrate that our Hi3D manages to produce superior multi-view consistency images with highly-detailed textures.<span class='px-1 mx-1 bg-yellow-200'>Source code and data are available at \url{https://github.com/yanghb22-fdu/Hi3D-Official}. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.733</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07452v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>In this paper, our goal is to investigate to what degree multilingual pretrained language models capture cross-linguistically valid abstract linguistic representations.<span class='px-1 mx-1 bg-yellow-200'>We take the approach of developing curated synthetic data on a large scale, with specific properties, and using them to study sentence representations built using pretrained language models. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.806</span></span>We use a new multiple-choice task and datasets, Blackbird Language Matrices (BLMs), to focus on a specific grammatical structural phenomenon -- subject-verb agreement across a variety of sentence structures -- in several languages.Finding a solution to this task requires a system detecting complex linguistic patterns and paradigms in text representations.Using a two-level architecture that solves the problem in two steps -- detect syntactic objects and their properties in individual sentences, and find patterns across an input sequence of sentences -- we show that despite having been trained on multilingual texts in a consistent manner, multilingual pretrained language models have language-specific differences, and syntactic structure is not shared, even across closely related languages.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06567v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                LLaMA-Omni: Seamless Speech Interaction with Large Language Models
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Models like GPT-4o enable real-time interaction with large language models (LLMs) through speech, significantly enhancing user experience compared to traditional text-based interaction.However, there is still a lack of exploration on how to build speech interaction models based on open-source LLMs.To address this, we propose LLaMA-Omni, a novel model architecture designed for low-latency and high-quality speech interaction with LLMs.LLaMA-Omni integrates a pretrained speech encoder, a speech adaptor, an LLM, and a streaming speech decoder.It eliminates the need for speech transcription, and can simultaneously generate text and speech responses directly from speech instructions with extremely low latency.We build our model based on the latest Llama-3.1-8B-Instruct model.<span class='px-1 mx-1 bg-yellow-200'>To align the model with speech interaction scenarios, we construct a dataset named InstructS2S-200K, which includes 200K speech instructions and corresponding speech responses. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.85</span></span>Experimental results show that compared to previous speech-language models, LLaMA-Omni provides better responses in both content and style, with a response latency as low as 226ms.Additionally, training LLaMA-Omni takes less than 3 days on just 4 GPUs, paving the way for the efficient development of speech-language models in the future.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06666v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Benchmarking Sub-Genre Classification For Mainstage Dance Music
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Music classification, with a wide range of applications, is one of the most prominent tasks in music information retrieval.To address the absence of comprehensive datasets and high-performing methods in the classification of mainstage dance music, this work introduces a novel benchmark comprising a new dataset and a baseline.<span class='px-1 mx-1 bg-yellow-200'>Our dataset extends the number of sub-genres to cover most recent mainstage live sets by top DJs worldwide in music festivals. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.85</span></span>A continuous soft labeling approach is employed to account for tracks that span multiple sub-genres, preserving the inherent sophistication.For the baseline, we developed deep learning models that outperform current state-of-the-art multimodel language models, which struggle to identify house music sub-genres, emphasizing the need for specialized models trained on fine-grained datasets.Our benchmark is applicable to serve for application scenarios such as music recommendation, DJ set curation, and interactive multimedia, where we also provide video demos.Our code is on \url{https://anonymous.4open.science/r/Mainstage-EDM-Benchmark/}.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06690v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-09</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                AnomalyCD: A benchmark for Earth anomaly change detection with high-resolution and time-series observations
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Various Earth anomalies have destroyed the stable, balanced state, resulting in fatalities and serious destruction of property.With the advantages of large-scale and precise observation, high-resolution remote sensing images have been widely used for anomaly monitoring and localization.Powered by the deep representation, the existing methods have achieved remarkable advances, primarily in classification and change detection techniques.However, labeled samples are difficult to acquire due to the low probability of anomaly occurrence, and the trained models are limited to fixed anomaly categories, which hinders the application for anomalies with few samples or unknown anomalies.In this paper, to tackle this problem, we propose the anomaly change detection (AnomalyCD) technique, which accepts time-series observations and learns to identify anomalous changes by learning from the historical normal change pattern.Compared to the existing techniques, AnomalyCD processes an unfixed number of time steps and can localize the various anomalies in a unified manner, without human supervision.<span class='px-1 mx-1 bg-yellow-200'>To benchmark AnomalyCD, we constructed a high-resolution dataset with time-series images dedicated to various Earth anomalies (the AnomalyCDD dataset). <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.878</span></span>AnomalyCDD contains high-resolution (from 0.15 to 2.39 m/pixel), time-series (from 3 to 7 time steps), and large-scale images (1927.93 km2 in total) collected globally Furthermore, we developed a zero-shot baseline model (AnomalyCDM), which implements the AnomalyCD technique by extracting a general representation from the segment anything model (SAM) and conducting temporal comparison to distinguish the anomalous changes from normal changes.AnomalyCDM is designed as a two-stage workflow to enhance the efficiency, and has the ability to process the unseen images directly, without retraining for each scene.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.05679v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-09</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                LayeredFlow: A Real-World Benchmark for Non-Lambertian Multi-Layer Optical Flow
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Achieving 3D understanding of non-Lambertian objects is an important task with many useful applications, but most existing algorithms struggle to deal with such objects.One major obstacle towards progress in this field is the lack of holistic non-Lambertian benchmarks -- most benchmarks have low scene and object diversity, and none provide multi-layer 3D annotations for objects occluded by transparent surfaces.In this paper, we introduce LayeredFlow, a real world benchmark containing multi-layer ground truth annotation for optical flow of non-Lambertian objects.Compared to previous benchmarks, our benchmark exhibits greater scene and object diversity, with 150k high quality optical flow and stereo pairs taken over 185 indoor and outdoor scenes and 360 unique objects.Using LayeredFlow as evaluation data, we propose a new task called multi-layer optical flow.<span class='px-1 mx-1 bg-yellow-200'>To provide training data for this task, we introduce a large-scale densely-annotated synthetic dataset containing 60k images within 30 scenes tailored for non-Lambertian objects. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.91</span></span>Training on our synthetic dataset enables model to predict multi-layer optical flow, while fine-tuning existing optical flow methods on the dataset notably boosts their performance on non-Lambertian objects without compromising the performance on diffuse objects.Data is available at https://layeredflow.cs.princeton.edu.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.05688v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-09</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Extracting the U.S. building types from OpenStreetMap data
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Building type information is crucial for population estimation, traffic planning, urban planning, and emergency response applications.Although essential, such data is often not readily available.<span class='px-1 mx-1 bg-yellow-200'>To alleviate this problem, this work creates a comprehensive dataset by providing residential/non-residential building classification covering the entire United States. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.783</span></span>We propose and utilize an unsupervised machine learning method to classify building types based on building footprints and available OpenStreetMap information.The classification result is validated using authoritative ground truth data for select counties in the U.S.The validation shows a high precision for non-residential building classification and a high recall for residential buildings.We identified various approaches to improving the quality of the classification, such as removing sheds and garages from the dataset.Furthermore, analyzing the misclassifications revealed that they are mainly due to missing and scarce metadata in OSM.<span class='px-1 mx-1 bg-yellow-200'>A major result of this work is the resulting dataset of classifying 67,705,475 buildings. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.89</span></span>We hope that this data is of value to the scientific community, including urban and transportation planners.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.05692v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-09</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Open-source, multilingual medical large language models (LLMs) have the potential to serve linguistically diverse populations across different regions.Adapting generic LLMs for healthcare often requires continual pretraining, but this approach is computationally expensive and sometimes impractical.Instruction fine-tuning on a specific task may not always guarantee optimal performance due to the lack of broader domain knowledge that the model needs to understand and reason effectively in diverse scenarios.<span class='px-1 mx-1 bg-yellow-200'>To address these challenges, we introduce two multilingual instruction fine-tuning datasets, MMed-IFT and MMed-IFT-MC, containing over 200k high-quality medical samples in six languages. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.775</span></span>We propose a two-stage training paradigm: the first stage injects general medical knowledge using MMed-IFT, while the second stage fine-tunes task-specific multiple-choice questions with MMed-IFT-MC.Our method achieves competitive results on both English and multilingual benchmarks, striking a balance between computational efficiency and performance.We plan to make our dataset and model weights public at \url{https://github.com/SpassMed/Med-Llama3} in the future.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.05732v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-09</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Are Heterophily-Specific GNNs and Homophily Metrics Really Effective? Evaluation Pitfalls and New Benchmarks
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Over the past decade, Graph Neural Networks (GNNs) have achieved great success on machine learning tasks with relational data.However, recent studies have found that heterophily can cause significant performance degradation of GNNs, especially on node-level tasks.Numerous heterophilic benchmark datasets have been put forward to validate the efficacy of heterophily-specific GNNs and various homophily metrics have been designed to help people recognize these malignant datasets.Nevertheless, there still exist multiple pitfalls that severely hinder the proper evaluation of new models and metrics.In this paper, we point out three most serious pitfalls: 1) a lack of hyperparameter tuning; 2) insufficient model evaluation on the real challenging heterophilic datasets; 3) missing quantitative evaluation benchmark for homophily metrics on synthetic graphs.<span class='px-1 mx-1 bg-yellow-200'>To overcome these challenges, we first train and fine-tune baseline models on $27$ most widely used benchmark datasets, categorize them into three distinct groups: malignant, benign and ambiguous heterophilic datasets, and identify the real challenging subsets of tasks. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.71</span></span>To our best knowledge, we are the first to propose such taxonomy.Then, we re-evaluate $10$ heterophily-specific state-of-the-arts (SOTA) GNNs with fine-tuned hyperparameters on different groups of heterophilic datasets.Based on the model performance, we reassess their effectiveness on addressing heterophily challenge.At last, we evaluate $11$ popular homophily metrics on synthetic graphs with three different generation approaches.To compare the metrics strictly, we propose the first quantitative evaluation method based on Fr\'echet distance.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.05755v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-09</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Benchmarking Chinese Knowledge Rectification in Large Language Models
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>While Large Language Models (LLMs) exhibit remarkable generative capabilities, they are not without flaws, particularly in the form of hallucinations.This issue is even more pronounced when LLMs are applied to specific languages and domains.For example, LLMs may generate nonsense information when handling Chinese ancient poetry, proverbs, or idioms, owing to the lack of specific knowledge.To this end, this paper introduces a benchmark for rectifying Chinese knowledge in LLMs via knowledge editing.<span class='px-1 mx-1 bg-yellow-200'>Specifically, we introduce a new Chinese dataset, CKnowEdit, by collecting seven type of knowledge from various sources, including classical texts, idioms, and content from Baidu Tieba Ruozhiba, thereby accounting for the unique polyphony, antithesis, and logical constructs inherent in the Chinese language. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.823</span></span>Through the analysis of this dataset, we uncover the challenges faced by current LLMs in mastering Chinese.Furthermore, our evaluation of state-of-the-art knowledge editing techniques on this dataset unveil the substantial scope for advancement in the rectification of Chinese knowledge.<span class='px-1 mx-1 bg-yellow-200'>Code and dataset are available at https://github.com/zjunlp/EasyEdit. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.842</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.05806v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-09</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Evaluating Multiview Object Consistency in Humans and Image Models
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>We introduce a benchmark to directly evaluate the alignment between human observers and vision models on a 3D shape inference task.We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape: given a set of images, participants identify which contain the same/different objects, despite considerable viewpoint variation.We draw from a diverse range of images that include common objects (e.g., chairs) as well as abstract shapes (i.e., procedurally generated `nonsense' objects).<span class='px-1 mx-1 bg-yellow-200'>After constructing over 2000 unique image sets, we administer these tasks to human participants, collecting 35K trials of behavioral data from over 500 participants. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.725</span></span>This includes explicit choice behaviors as well as intermediate measures, such as reaction time and gaze data.We then evaluate the performance of common vision models (e.g., DINOv2, MAE, CLIP).We find that humans outperform all models by a wide margin.Using a multi-scale evaluation approach, we identify underlying similarities and differences between models and humans: while human-model performance is correlated, humans allocate more time/processing on challenging trials.All images, data, and code can be accessed via our project page.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.05862v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-09</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Promptable Closed-loop Traffic Simulation
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Simulation stands as a cornerstone for safe and efficient autonomous driving development.At its core a simulation system ought to produce realistic, reactive, and controllable traffic patterns.In this paper, we propose ProSim, a multimodal promptable closed-loop traffic simulation framework.ProSim allows the user to give a complex set of numerical, categorical or textual prompts to instruct each agent's behavior and intention.ProSim then rolls out a traffic scenario in a closed-loop manner, modeling each agent's interaction with other traffic participants.Our experiments show that ProSim achieves high prompt controllability given different user prompts, while reaching competitive performance on the Waymo Sim Agents Challenge when no prompt is given.<span class='px-1 mx-1 bg-yellow-200'>To support research on promptable traffic simulation, we create ProSim-Instruct-520k, a multimodal prompt-scenario paired driving dataset with over 10M text prompts for over 520k real-world driving scenarios. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.744</span></span><span class='px-1 mx-1 bg-yellow-200'>We will release code of ProSim as well as data and labeling tools of ProSim-Instruct-520k at https://ariostgx.github.io/ProSim. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.768</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.05863v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-05</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                TCDiff: Triple Condition Diffusion Model with 3D Constraints for Stylizing Synthetic Faces
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>A robust face recognition model must be trained using datasets that include a large number of subjects and numerous samples per subject under varying conditions (such as pose, expression, age, noise, and occlusion).Due to ethical and privacy concerns, large-scale real face datasets have been discontinued, such as MS1MV3, and synthetic face generators have been proposed, utilizing GANs and Diffusion Models, such as SYNFace, SFace, DigiFace-1M, IDiff-Face, DCFace, and GANDiffFace, aiming to supply this demand.Some of these methods can produce high-fidelity realistic faces, but with low intra-class variance, while others generate high-variance faces with low identity consistency.In this paper, we propose a Triple Condition Diffusion Model (TCDiff) to improve face style transfer from real to synthetic faces through 2D and 3D facial constraints, enhancing face identity consistency while keeping the necessary high intra-class variance.<span class='px-1 mx-1 bg-yellow-200'>Face recognition experiments using 1k, 2k, and 5k classes of our new dataset for training outperform state-of-the-art synthetic datasets in real face benchmarks such as LFW, CFP-FP, AgeDB, and BUPT. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.776</span></span>Our source code is available at: https://github.com/BOVIFOCR/tcdiff.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.03600v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-05</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Reimagining Data Visualization to Address Sustainability Goals
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Information visualization holds significant potential to support sustainability goals such as environmental stewardship, and climate resilience by transforming complex data into accessible visual formats that enhance public understanding of complex climate change data and drive actionable insights.While the field has predominantly focused on analytical orientation of visualization, challenging traditional visualization techniques and goals, through critical visualization research expands existing assumptions and conventions in the field.In this paper, I explore how reimagining overlooked aspects of data visualization, such as engagement, emotional resonance, communication, and community empowerment, can contribute to achieving sustainability objectives.I argue that by focusing on inclusive data visualization that promotes clarity, understandability, and public participation, we can make complex data more relatable and actionable, fostering broader connections and mobilizing collective action on critical issues like climate change.<span class='px-1 mx-1 bg-yellow-200'>Moreover, I discuss the role of emotional receptivity in environmental data communication, stressing the need for visualizations that respect diverse cultural perspectives and emotional responses to achieve impactful outcomes. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.708</span></span>Drawing on insights from a decade of research in public participation and community engagement, I aim to highlight how data visualization can democratize data access and increase public involvement in order to contribute to a more sustainable and resilient future.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.03611v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-05</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Practical Forecasting of Cryptocoins Timeseries using Correlation Patterns
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Cryptocoins (i.e., Bitcoin, Ether, Litecoin) are tradable digital assets.Ownerships of cryptocoins are registered on distributed ledgers (i.e., blockchains).Secure encryption techniques guarantee the security of the transactions (transfers of coins among owners), registered into the ledger.Cryptocoins are exchanged for specific trading prices.The extreme volatility of such trading prices across all different sets of crypto-assets remains undisputed.However, the relations between the trading prices across different cryptocoins remains largely unexplored.Major coin exchanges indicate trend correlation to advise for sells or buys.However, price correlations remain largely unexplored.We shed some light on the trend correlations across a large variety of cryptocoins, by investigating their coin/price correlation trends over the past two years.We study the causality between the trends, and exploit the derived correlations to understand the accuracy of state-of-the-art forecasting techniques for time series modeling (e.g., GBMs, LSTM and GRU) of correlated cryptocoins.Our evaluation shows (i) strong correlation patterns between the most traded coins (e.g., Bitcoin and Ether) and other types of cryptocurrencies, and (ii) state-of-the-art time series forecasting algorithms can be used to forecast cryptocoins price trends.<span class='px-1 mx-1 bg-yellow-200'>We released datasets and code to reproduce our analysis to the research community. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.847</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.03674v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-05</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Classification and Prediction of Heart Diseases using Machine Learning Algorithms
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Heart disease is a serious worldwide health issue because it claims the lives of many people who might have been treated if the disease had been identified earlier.The leading cause of death in the world is cardiovascular disease, usually referred to as heart disease.Creating reliable, effective, and precise predictions for these diseases is one of the biggest issues facing the medical world today.Although there are tools for predicting heart diseases, they are either expensive or challenging to apply for determining a patient's risk.The best classifier for foretelling and spotting heart disease was the aim of this research.This experiment examined a range of machine learning approaches, including Logistic Regression, K-Nearest Neighbor, Support Vector Machine, and Artificial Neural Networks, to determine which machine learning algorithm was most effective at predicting heart diseases.<span class='px-1 mx-1 bg-yellow-200'>One of the most often utilized data sets for this purpose, the UCI heart disease repository provided the data set for this study. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.795</span></span>The K-Nearest Neighbor technique was shown to be the most effective machine learning algorithm for determining whether a patient has heart disease.It will be beneficial to conduct further studies on the application of additional machine learning algorithms for heart disease prediction.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.03697v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-05</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                ArtiFade: Learning to Generate High-quality Subject from Blemished Images
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Subject-driven text-to-image generation has witnessed remarkable advancements in its ability to learn and capture characteristics of a subject using only a limited number of images.However, existing methods commonly rely on high-quality images for training and may struggle to generate reasonable images when the input images are blemished by artifacts.This is primarily attributed to the inadequate capability of current techniques in distinguishing subject-related features from disruptive artifacts.<span class='px-1 mx-1 bg-yellow-200'>In this paper, we introduce ArtiFade to tackle this issue and successfully generate high-quality artifact-free images from blemished datasets. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.84</span></span>Specifically, ArtiFade exploits fine-tuning of a pre-trained text-to-image model, aiming to remove artifacts.The elimination of artifacts is achieved by utilizing a specialized dataset that encompasses both unblemished images and their corresponding blemished counterparts during fine-tuning.ArtiFade also ensures the preservation of the original generative capabilities inherent within the diffusion model, thereby enhancing the overall performance of subject-driven methods in generating high-quality and artifact-free images.We further devise evaluation benchmarks tailored for this task.Through extensive qualitative and quantitative experiments, we demonstrate the generalizability of ArtiFade in effective artifact removal under both in-distribution and out-of-distribution scenarios.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.03745v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-05</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The increasing availability of real-world conversation data offers exciting opportunities for researchers to study user-chatbot interactions.However, the sheer volume of this data makes manually examining individual conversations impractical.<span class='px-1 mx-1 bg-yellow-200'>To overcome this challenge, we introduce WildVis, an interactive tool that enables fast, versatile, and large-scale conversation analysis. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.725</span></span>WildVis provides search and visualization capabilities in the text and embedding spaces based on a list of criteria.To manage million-scale datasets, we implemented optimizations including search index construction, embedding precomputation and compression, and caching to ensure responsive user interactions within seconds.We demonstrate WildVis's utility through three case studies: facilitating chatbot misuse research, visualizing and comparing topic distributions across datasets, and characterizing user-specific conversation patterns.WildVis is open-source and designed to be extendable, supporting additional datasets and customized search and visualization functionalities.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.03753v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-05</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Foundation Model or Finetune? Evaluation of few-shot semantic segmentation for river pollution
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Foundation models (FMs) are a popular topic of research in AI.Their ability to generalize to new tasks and datasets without retraining or needing an abundance of data makes them an appealing candidate for applications on specialist datasets.In this work, we compare the performance of FMs to finetuned pre-trained supervised models in the task of semantic segmentation on an entirely new dataset.We see that finetuned models consistently outperform the FMs tested, even in cases were data is scarce.<span class='px-1 mx-1 bg-yellow-200'>We release the code and dataset for this work on GitHub. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.837</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.03754v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-04</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                The Impact of Balancing Real and Synthetic Data on Accuracy and Fairness in Face Recognition
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Over the recent years, the advancements in deep face recognition have fueled an increasing demand for large and diverse datasets.Nevertheless, the authentic data acquired to create those datasets is typically sourced from the web, which, in many cases, can lead to significant privacy issues due to the lack of explicit user consent.Furthermore, obtaining a demographically balanced, large dataset is even more difficult because of the natural imbalance in the distribution of images from different demographic groups.In this paper, we investigate the impact of demographically balanced authentic and synthetic data, both individually and in combination, on the accuracy and fairness of face recognition models.Initially, several generative methods were used to balance the demographic representations of the corresponding synthetic datasets.Then a state-of-the-art face encoder was trained and evaluated using (combinations of) synthetic and authentic images.Our findings emphasized two main points: (i) the increased effectiveness of training data generated by diffusion-based models in enhancing accuracy, whether used alone or combined with subsets of authentic data, and (ii) the minimal impact of incorporating balanced data from pre-trained generative methods on fairness (in nearly all tested scenarios using combined datasets, fairness scores remained either unchanged or worsened, even when compared to unbalanced authentic datasets).<span class='px-1 mx-1 bg-yellow-200'>Source code and data are available at \url{https://cutt.ly/AeQy1K5G} for reproducibility. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.74</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.02867v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-04</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Large Vision-Language Models (LVLMs) have recently garnered significant attention, with many efforts aimed at harnessing their general knowledge to enhance the interpretability and robustness of autonomous driving models.However, LVLMs typically rely on large, general-purpose datasets and lack the specialized expertise required for professional and safe driving.Existing vision-language driving datasets focus primarily on scene understanding and decision-making, without providing explicit guidance on traffic rules and driving skills, which are critical aspects directly related to driving safety.<span class='px-1 mx-1 bg-yellow-200'>To bridge this gap, we propose IDKB, a large-scale dataset containing over one million data items collected from various countries, including driving handbooks, theory test data, and simulated road test data. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.916</span></span>Much like the process of obtaining a driver's license, IDKB encompasses nearly all the explicit knowledge needed for driving from theory to practice.In particular, we conducted comprehensive tests on 15 LVLMs using IDKB to assess their reliability in the context of autonomous driving and provided extensive analysis.We also fine-tuned popular models, achieving notable performance improvements, which further validate the significance of our dataset.The project page can be found at: \url{https://4dvlab.github.io/project_page/idkb.html}</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.02914v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-04</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Effective collaboration of dual-arm robots and their tool use capabilities are increasingly important areas in the advancement of robotics.These skills play a significant role in expanding robots' ability to operate in diverse real-world environments.However, progress is impeded by the scarcity of specialized training data.<span class='px-1 mx-1 bg-yellow-200'>This paper introduces RoboTwin, a novel benchmark dataset combining real-world teleoperated data with synthetic data from digital twins, designed for dual-arm robotic scenarios. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.717</span></span>Using the COBOT Magic platform, we have collected diverse data on tool usage and human-robot interaction.We present a innovative approach to creating digital twins using AI-generated content, transforming 2D images into detailed 3D models.Furthermore, we utilize large language models to generate expert-level training data and task-specific pose sequences oriented toward functionality.Our key contributions are: 1) the RoboTwin benchmark dataset, 2) an efficient real-to-simulation pipeline, and 3) the use of language models for automatic expert-level data generation.These advancements are designed to address the shortage of robotic training data, potentially accelerating the development of more capable and versatile robotic systems for a wide range of real-world applications.The project page is available at https://robotwin-benchmark.github.io/early-version/</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.02920v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td></td>
          <td>
            <h2 class="text-2xl tracking-tight pt-4 font-bold">Data Quality</h2>
          </td>
        </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Detecting Sexism in German Online Newspaper Comments with Open-Source Text Embeddings (Team GDA, GermEval2024 Shared Task 1: GerMS-Detect, Subtasks 1 and 2, Closed Track)
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Sexism in online media comments is a pervasive challenge that often manifests subtly, complicating moderation efforts as interpretations of what constitutes sexism can vary among individuals.We study monolingual and multilingual open-source text embeddings to reliably detect sexism and misogyny in German-language online comments from an Austrian newspaper.<span class='px-1 mx-1 bg-yellow-200'>We observed classifiers trained on text embeddings to mimic closely the individual judgements of human annotators. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.628</span></span>Our method showed robust performance in the GermEval 2024 GerMS-Detect Subtask 1 challenge, achieving an average macro F1 score of 0.597 (4th place, as reported on Codabench).It also accurately predicted the distribution of human annotations in GerMS-Detect Subtask 2, with an average Jensen-Shannon distance of 0.301 (2nd place).The computational efficiency of our approach suggests potential for scalable applications across various languages and linguistic contexts.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10341v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Efficiently Crowdsourcing Visual Importance with Punch-Hole Annotation
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>We introduce a novel crowdsourcing method for identifying important areas in graphical images through punch-hole labeling.Traditional methods, such as gaze trackers and mouse-based annotations, which generate continuous data, can be impractical in crowdsourcing scenarios.They require many participants, and the outcome data can be noisy.In contrast, our method first segments the graphical image with a grid and drops a portion of the patches (punch holes).<span class='px-1 mx-1 bg-yellow-200'>Then, we iteratively ask the labeler to validate each annotation with holes, narrowing down the annotation only having the most important area. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.724</span></span><span class='px-1 mx-1 bg-yellow-200'>This approach aims to reduce annotation noise in crowdsourcing by standardizing the annotations while enhancing labeling efficiency and reliability. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.771</span></span>Preliminary findings from fundamental charts demonstrate that punch-hole labeling can effectively pinpoint critical regions.This also highlights its potential for broader application in visualization research, particularly in studying large-scale users' graphical perception.Our future work aims to enhance the algorithm to achieve faster labeling speed and prove its utility through large-scale experiments.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10459v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Predicting high-dimensional or extreme multilabels, such as in medical coding, requires both accuracy and interpretability.<span class='px-1 mx-1 bg-yellow-200'>Existing works often rely on local interpretability methods, failing to provide comprehensive explanations of the overall mechanism behind each label prediction within a multilabel set. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.636</span></span>We propose a mechanistic interpretability module called DIctionary Label Attention (\method) that disentangles uninterpretable dense embeddings into a sparse embedding space, where each nonzero element (a dictionary feature) represents a globally learned medical concept.Through human evaluations, we show that our sparse embeddings are more human understandable than its dense counterparts by at least 50 percent.Our automated dictionary feature identification pipeline, leveraging large language models (LLMs), uncovers thousands of learned medical concepts by examining and summarizing the highest activating tokens for each dictionary feature.We represent the relationships between dictionary features and medical codes through a sparse interpretable matrix, enhancing the mechanistic and global understanding of the model's predictions while maintaining competitive performance and scalability without extensive human annotation.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10504v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-09</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Robust Loss Functions for Object Grasping under Limited Ground Truth
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Object grasping is a crucial technology enabling robots to perceive and interact with the environment sufficiently.However, in practical applications, researchers are faced with missing or noisy ground truth while training the convolutional neural network, which decreases the accuracy of the model.Therefore, different loss functions are proposed to deal with these problems to improve the accuracy of the neural network.For missing ground truth, a new predicted category probability method is defined for unlabeled samples, which works effectively in conjunction with the pseudo-labeling method.<span class='px-1 mx-1 bg-yellow-200'>Furthermore, for noisy ground truth, a symmetric loss function is introduced to resist the corruption of label noises. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.645</span></span>The proposed loss functions are powerful, robust, and easy to use.Experimental results based on the typical grasping neural network show that our method can improve performance by 2 to 13 percent.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.05742v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-08-29</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Eigen-Cluster VIS: Improving Weakly-supervised Video Instance Segmentation by Leveraging Spatio-temporal Consistency
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The performance of Video Instance Segmentation (VIS) methods has improved significantly with the advent of transformer networks.However, these networks often face challenges in training due to the high annotation cost.<span class='px-1 mx-1 bg-yellow-200'>To address this, unsupervised and weakly-supervised methods have been developed to reduce the dependency on annotations. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.613</span></span>This work introduces a novel weakly-supervised method called Eigen-cluster VIS that, without requiring any mask annotations, achieves competitive accuracy compared to other VIS approaches.This method is based on two key innovations: a Temporal Eigenvalue Loss (TEL) and a clip-level Quality Cluster Coefficient (QCC).The TEL ensures temporal coherence by leveraging the eigenvalues of the Laplacian matrix derived from graph adjacency matrices.By minimizing the mean absolute error (MAE) between the eigenvalues of adjacent frames, this loss function promotes smooth transitions and stable segmentation boundaries over time, reducing temporal discontinuities and improving overall segmentation quality.The QCC employs the K-means method to ensure the quality of spatio-temporal clusters without relying on ground truth masks.Using the Davies-Bouldin score, the QCC provides an unsupervised measure of feature discrimination, allowing the model to self-evaluate and adapt to varying object distributions, enhancing robustness during the testing phase.These enhancements are computationally efficient and straightforward, offering significant performance gains without additional annotated data.The proposed Eigen-Cluster VIS method is evaluated on the YouTube-VIS 2019/2021 and OVIS datasets, demonstrating that it effectively narrows the performance gap between the fully-supervised and weakly-supervised VIS approaches.The code is available on: https://github.com/farnooshar/EigenClusterVIS</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2408.16661v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-08-26</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                An Embedding is Worth a Thousand Noisy Labels
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The performance of deep neural networks scales with dataset size and label quality, rendering the efficient mitigation of low-quality data annotations crucial for building robust and cost-effective systems.<span class='px-1 mx-1 bg-yellow-200'>Existing strategies to address label noise exhibit severe limitations due to computational complexity and application dependency. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.71</span></span>In this work, we propose WANN, a Weighted Adaptive Nearest Neighbor approach that builds on self-supervised feature representations obtained from foundation models.To guide the weighted voting scheme, we introduce a reliability score, which measures the likelihood of a data label being correct.WANN outperforms reference methods, including a linear layer trained with robust loss functions, on diverse datasets of varying size and under various noise types and severities.WANN also exhibits superior generalization on imbalanced data compared to both Adaptive-NNs (ANN) and fixed k-NNs.Furthermore, the proposed weighting scheme enhances supervised dimensionality reduction under noisy labels.This yields a significant boost in classification performance with 10x and 100x smaller image embeddings, minimizing latency and storage requirements.Our approach, emphasizing efficiency and explainability, emerges as a simple, robust solution to overcome the inherent limitations of deep neural network training.The code is available at https://github.com/francescodisalvo05/wann-noisy-labels .</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2408.14358v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td></td>
          <td>
            <h2 class="text-2xl tracking-tight pt-4 font-bold">Benchmarks</h2>
          </td>
        </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Escaping Local Minima: Hybrid Artificial Potential Field with Wall-Follower for Decentralized Multi-Robot Navigation
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>We tackle the challenges of decentralized multi-robot navigation in environments with nonconvex obstacles, where complete environmental knowledge is unavailable.While reactive methods like Artificial Potential Field (APF) offer simplicity and efficiency, they suffer from local minima, causing robots to become trapped due to their lack of global environmental awareness.Other existing solutions either rely on inter-robot communication, are limited to single-robot scenarios, or struggle to overcome nonconvex obstacles effectively.   Our proposed methods enable collision-free navigation using only local sensor and state information without a map.By incorporating a wall-following (WF) behavior into the APF approach, our method allows robots to escape local minima, even in the presence of nonconvex and dynamic obstacles including other robots.We introduce two algorithms for switching between APF and WF: a rule-based system and an encoder network trained on expert demonstrations.<span class='px-1 mx-1 bg-yellow-200'>Experimental results show that our approach achieves substantially higher success rates compared to state-of-the-art methods, highlighting its ability to overcome the limitations of local minima in complex environments <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.625</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10332v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Hyperedge Modeling in Hypergraph Neural Networks by using Densest Overlapping Subgraphs
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Hypergraphs tackle the limitations of traditional graphs by introducing {\em hyperedges}.While graph edges connect only two nodes, hyperedges connect an arbitrary number of nodes along their edges.Also, the underlying message-passing mechanisms in Hypergraph Neural Networks (HGNNs) are in the form of vertex-hyperedge-vertex, which let HGNNs capture and utilize richer and more complex structural information than traditional Graph Neural Networks (GNNs).More recently, the idea of overlapping subgraphs has emerged.These subgraphs can capture more information about subgroups of vertices without limiting one vertex belonging to just one group, allowing vertices to belong to multiple groups or subgraphs.In addition, one of the most important problems in graph clustering is to find densest overlapping subgraphs (DOS).In this paper, we propose a solution to the DOS problem via Agglomerative Greedy Enumeration (DOSAGE) algorithm as a novel approach to enhance the process of generating the densest overlapping subgraphs and, hence, a robust construction of the hypergraphs.<span class='px-1 mx-1 bg-yellow-200'>Experiments on standard benchmarks show that the DOSAGE algorithm significantly outperforms the HGNNs and six other methods on the node classification task. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.618</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10340v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Implicit feedback, often used to build recommender systems, unavoidably confronts noise due to factors such as misclicks and position bias.Previous studies have attempted to alleviate this by identifying noisy samples based on their diverged patterns, such as higher loss values, and mitigating the noise through sample dropping or reweighting.Despite the progress, we observe existing approaches struggle to distinguish hard samples and noise samples, as they often exhibit similar patterns, thereby limiting their effectiveness in denoising recommendations.To address this challenge, we propose a Large Language Model Enhanced Hard Sample Denoising (LLMHD) framework.Specifically, we construct an LLM-based scorer to evaluate the semantic consistency of items with the user preference, which is quantified based on summarized historical user interactions.The resulting scores are used to assess the hardness of samples for the pointwise or pairwise training objectives.<span class='px-1 mx-1 bg-yellow-200'>To ensure efficiency, we introduce a variance-based sample pruning strategy to filter potential hard samples before scoring. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.615</span></span>Besides, we propose an iterative preference update module designed to continuously refine summarized user preference, which may be biased due to false-positive user-item interactions.Extensive experiments on three real-world datasets and four backbone recommenders demonstrate the effectiveness of our approach.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10343v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>We present a novel frequency-based Self-Supervised Learning (SSL) approach that significantly enhances its efficacy for pre-training.Prior work in this direction masks out pre-defined frequencies in the input image and employs a reconstruction loss to pre-train the model.<span class='px-1 mx-1 bg-yellow-200'>While achieving promising results, such an implementation has two fundamental limitations as identified in our paper. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.619</span></span>First, using pre-defined frequencies overlooks the variability of image frequency responses.Second, pre-trained with frequency-filtered images, the resulting model needs relatively more data to adapt to naturally looking images during fine-tuning.To address these drawbacks, we propose FOurier transform compression with seLf-Knowledge distillation (FOLK), integrating two dedicated ideas.First, inspired by image compression, we adaptively select the masked-out frequencies based on image frequency responses, creating more suitable SSL tasks for pre-training.Second, we employ a two-branch framework empowered by knowledge distillation, enabling the model to take both the filtered and original images as input, largely reducing the burden of downstream tasks.Our experimental results demonstrate the effectiveness of FOLK in achieving competitive performance to many state-of-the-art SSL methods across various downstream tasks, including image classification, few-shot learning, and semantic segmentation.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10362v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                A Knowledge-Enhanced Disease Diagnosis Method Based on Prompt Learning and BERT Integration
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>This paper proposes a knowledge-enhanced disease diagnosis method based on a prompt learning framework.The method retrieves structured knowledge from external knowledge graphs related to clinical cases, encodes it, and injects it into the prompt templates to enhance the language model's understanding and reasoning capabilities for the task.We conducted experiments on three public datasets: CHIP-CTC, IMCS-V2-NER, and KUAKE-QTR.The results show that the proposed method significantly outperforms existing models across multiple evaluation metrics, with an F1 score improvement of 2.4% on the CHIP-CTC dataset, 3.1% on the IMCS-V2-NER dataset,and 4.2% on the KUAKE-QTR dataset.Additionally,ablation studies confirmed the critical role of the knowledge injection module,as the removal of this module resulted in a significant drop in F1 score.<span class='px-1 mx-1 bg-yellow-200'>The experimental results demonstrate that the proposed method not only effectively improves the accuracy of disease diagnosis but also enhances the interpretability of the predictions, providing more reliable support and evidence for clinical diagnosis. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.605</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10403v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Deep-Wide Learning Assistance for Insect Pest Classification
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Accurate insect pest recognition plays a critical role in agriculture.It is a challenging problem due to the intricate characteristics of insects.In this paper, we present DeWi, novel learning assistance for insect pest classification.With a one-stage and alternating training strategy, DeWi simultaneously improves several Convolutional Neural Networks in two perspectives: discrimination (by optimizing a triplet margin loss in a supervised training manner) and generalization (via data augmentation).From that, DeWi can learn discriminative and in-depth features of insect pests (deep) yet still generalize well to a large number of insect categories (wide).<span class='px-1 mx-1 bg-yellow-200'>Experimental results show that DeWi achieves the highest performances on two insect pest classification benchmarks (76.44\% accuracy on the IP102 dataset and 99.79\% accuracy on the D0 dataset, respectively). <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.628</span></span>In addition, extensive evaluations and ablation studies are conducted to thoroughly investigate our DeWi and demonstrate its superiority.Our source code is available at https://github.com/toannguyen1904/DeWi.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10445v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Incorporating Classifier-Free Guidance in Diffusion Model-Based Recommendation
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>This paper presents a diffusion-based recommender system that incorporates classifier-free guidance.Most current recommender systems provide recommendations using conventional methods such as collaborative or content-based filtering.Diffusion is a new approach to generative AI that improves on previous generative AI approaches such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs).We incorporate diffusion in a recommender system that mirrors the sequence users take when browsing and rating items.Although a few current recommender systems incorporate diffusion, they do not incorporate classifier-free guidance, a new innovation in diffusion models as a whole.In this paper, we present a diffusion recommender system that augments the underlying recommender system model for improved performance and also incorporates classifier-free guidance.<span class='px-1 mx-1 bg-yellow-200'>Our findings show improvements over state-of-the-art recommender systems for most metrics for several recommendation tasks on a variety of datasets. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.602</span></span>In particular, our approach demonstrates the potential to provide better recommendations when data is sparse.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10494v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Ranked Enumeration for Database Queries
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Ranked enumeration is a query-answering paradigm where the query answers are returned incrementally in order of importance (instead of returning all answers at once).Importance is defined by a ranking function that can be specific to the application, but typically involves either a lexicographic order (e.g., "ORDER BY R.A, S.B" in SQL) or a weighted sum of attributes (e.g., "ORDER BY 3*R.A + 2*S.B").We recently introduced any-k algorithms for (multi-way) join queries, which push ranking into joins and avoid materializing intermediate results until necessary.<span class='px-1 mx-1 bg-yellow-200'>The top-ranked answers are returned asymptotically faster than the common join-then-rank approach of database systems, resulting in orders-of-magnitude speedup in practice.    <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.634</span></span>In addition to their practical usefulness, our techniques complement a long line of theoretical research on unranked enumeration, where answers are also returned incrementally, but with no explicit ordering requirement.For a broad class of ranking functions with certain monotonicity properties, including lexicographic orders and sum-based rankings, the ordering requirement surprisingly does not increase the asymptotic time or space complexity, apart from logarithmic factors.   A key insight of our work is the connection between ranked enumeration for database queries and the fundamental task of computing the kth-shortest path in a graph.Uncovering these connections allowed us to ground our approach in the rich literature of that problem and connect ideas that had been explored in isolation before.In this article, we adopt a pragmatic approach and present a slightly simplified version of the algorithm without the shortest-path interpretation.We believe that this will benefit practitioners looking to implement and optimize any-k approaches.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08142v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Machine Learning for Two-Sample Testing under Right-Censored Data: A Simulation Study
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The focus of this study is to evaluate the effectiveness of Machine Learning (ML) methods for two-sample testing with right-censored observations.<span class='px-1 mx-1 bg-yellow-200'>To achieve this, we develop several ML-based methods with varying architectures and implement them as two-sample tests. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.621</span></span>Each method is an ensemble (stacking) that combines predictions from classical two-sample tests.This paper presents the results of training the proposed ML methods, examines their statistical power compared to classical two-sample tests, analyzes the distribution of test statistics for the proposed methods when the null hypothesis is true, and evaluates the significance of the features incorporated into the proposed methods.All results from numerical experiments were obtained from a synthetic dataset generated using the Smirnov transform (Inverse Transform Sampling) and replicated multiple times through Monte Carlo simulation.To test the two-sample problem with right-censored observations, one can use the proposed two-sample methods.All necessary materials (source code, example scripts, dataset, and samples) are available on GitHub and Hugging Face.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08201v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Graph Laplacian-based Bayesian Multi-fidelity Modeling
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>We present a novel probabilistic approach for generating multi-fidelity data while accounting for errors inherent in both low- and high-fidelity data.In this approach a graph Laplacian constructed from the low-fidelity data is used to define a multivariate Gaussian prior density for the coordinates of the true data points.In addition, few high-fidelity data points are used to construct a conjugate likelihood term.Thereafter, Bayes rule is applied to derive an explicit expression for the posterior density which is also multivariate Gaussian.The maximum \textit{a posteriori} (MAP) estimate of this density is selected to be the optimal multi-fidelity estimate.It is shown that the MAP estimate and the covariance of the posterior density can be determined through the solution of linear systems of equations.<span class='px-1 mx-1 bg-yellow-200'>Thereafter, two methods, one based on spectral truncation and another based on a low-rank approximation, are developed to solve these equations efficiently. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.61</span></span>The multi-fidelity approach is tested on a variety of problems in solid and fluid mechanics with data that represents vectors of quantities of interest and discretized spatial fields in one and two dimensions.The results demonstrate that by utilizing a small fraction of high-fidelity data, the multi-fidelity approach can significantly improve the accuracy of a large collection of low-fidelity data points.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08211v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                CliquePH: Higher-Order Information for Graph Neural Networks through Persistent Homology on Clique Graphs
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Graph neural networks have become the default choice by practitioners for graph learning tasks such as graph classification and node classification.Nevertheless, popular graph neural network models still struggle to capture higher-order information, i.e., information that goes \emph{beyond} pairwise interactions.Recent work has shown that persistent homology, a tool from topological data analysis, can enrich graph neural networks with topological information that they otherwise could not capture.Calculating such features is efficient for dimension 0 (connected components) and dimension 1 (cycles).However, when it comes to higher-order structures, it does not scale well, with a complexity of $O(n^d)$, where $n$ is the number of nodes and $d$ is the order of the structures.In this work, we introduce a novel method that extracts information about higher-order structures in the graph while still using the efficient low-dimensional persistent homology algorithm.<span class='px-1 mx-1 bg-yellow-200'>On standard benchmark datasets, we show that our method can lead to up to $31\%$ improvements in test accuracy. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.758</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08217v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Tweezers: A Framework for Security Event Detection via Event Attribution-centric Tweet Embedding
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Twitter is recognized as a crucial platform for the dissemination and gathering of Cyber Threat Intelligence (CTI).Its capability to provide real-time, actionable intelligence makes it an indispensable tool for detecting security events, helping security professionals cope with ever-growing threats.However, the large volume of tweets and inherent noises of human-crafted tweets pose significant challenges in accurately identifying security events.While many studies tried to filter out event-related tweets based on keywords, they are not effective due to their limitation in understanding the semantics of tweets.Another challenge in security event detection from Twitter is the comprehensive coverage of security events.Previous studies emphasized the importance of early detection of security events, but they overlooked the importance of event coverage.To cope with these challenges, in our study, we introduce a novel event attribution-centric tweet embedding method to enable the high precision and coverage of events.Our experiment result shows that the proposed method outperforms existing text and graph-based tweet embedding methods in identifying security events.Leveraging this novel embedding approach, we have developed and implemented a framework, Tweezers, that is applicable to security event detection from Twitter for CTI gathering.<span class='px-1 mx-1 bg-yellow-200'>This framework has demonstrated its effectiveness, detecting twice as many events compared to established baselines. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.608</span></span>Additionally, we have showcased two applications, built on Tweezers for the integration and inspection of security events, i.e., security event trend analysis and informative security user identification.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08221v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Recent breakthroughs in text-to-image models have opened up promising research avenues in personalized image generation, enabling users to create diverse images of a specific subject using natural language prompts.<span class='px-1 mx-1 bg-yellow-200'>However, existing methods often suffer from performance degradation when given only a single reference image. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.606</span></span>They tend to overfit the input, producing highly similar outputs regardless of the text prompt.This paper addresses the challenge of one-shot personalization by mitigating overfitting, enabling the creation of controllable images through text prompts.Specifically, we propose a selective fine-tuning strategy that focuses on the text encoder.Furthermore, we introduce three key techniques to enhance personalization performance: (1) augmentation tokens to encourage feature disentanglement and alleviate overfitting, (2) a knowledge-preservation loss to reduce language drift and promote generalizability across diverse prompts, and (3) SNR-weighted sampling for efficient training.Extensive experiments demonstrate that our approach efficiently generates high-quality, diverse images using only a single reference image while significantly reducing memory and storage requirements.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08248v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Panoptic narrative grounding (PNG), whose core target is fine-grained image-text alignment, requires a panoptic segmentation of referred objects given a narrative caption.Previous discriminative methods achieve only weak or coarse-grained alignment by panoptic segmentation pretraining or CLIP model adaptation.Given the recent progress of text-to-image Diffusion models, several works have shown their capability to achieve fine-grained image-text alignment through cross-attention maps and improved general segmentation performance.However, the direct use of phrase features as static prompts to apply frozen Diffusion models to the PNG task still suffers from a large task gap and insufficient vision-language interaction, yielding inferior performance.Therefore, we propose an Extractive-Injective Phrase Adapter (EIPA) bypass within the Diffusion UNet to dynamically update phrase prompts with image features and inject the multimodal cues back, which leverages the fine-grained image-text alignment capability of Diffusion models more sufficiently.In addition, we also design a Multi-Level Mutual Aggregation (MLMA) module to reciprocally fuse multi-level image and phrase features for segmentation refinement.<span class='px-1 mx-1 bg-yellow-200'>Extensive experiments on the PNG benchmark show that our method achieves new state-of-the-art performance. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.621</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08251v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Large language models (LLMs) show remarkable potential to act as computer agents, enhancing human productivity and software accessibility in multi-modal tasks that require planning and reasoning.However, measuring agent performance in realistic environments remains a challenge since: (i) most benchmarks are limited to specific modalities or domains (e.g. text-only, web navigation, Q&A, coding) and (ii) full benchmark evaluations are slow (on order of magnitude of days) given the multi-step sequential nature of tasks.To address these challenges, we introduce the Windows Agent Arena: a reproducible, general environment focusing exclusively on the Windows operating system (OS) where agents can operate freely within a real Windows OS and use the same wide range of applications, tools, and web browsers available to human users when solving tasks.We adapt the OSWorld framework (Xie et al., 2024) to create 150+ diverse Windows tasks across representative domains that require agent abilities in planning, screen understanding, and tool usage.Our benchmark is scalable and can be seamlessly parallelized in Azure for a full benchmark evaluation in as little as 20 minutes.To demonstrate Windows Agent Arena's capabilities, we also introduce a new multi-modal agent, Navi.Our agent achieves a success rate of 19.5% in the Windows domain, compared to 74.5% performance of an unassisted human.<span class='px-1 mx-1 bg-yellow-200'>Navi also demonstrates strong performance on another popular web-based benchmark, Mind2Web. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.626</span></span>We offer extensive quantitative and qualitative analysis of Navi's performance, and provide insights into the opportunities for future research in agent development and data generation using Windows Agent Arena.   Webpage: https://microsoft.github.io/WindowsAgentArena   Code: https://github.com/microsoft/WindowsAgentArena</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08264v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>This study addresses the challenge of accurately segmenting 3D Gaussian Splatting from 2D masks.Conventional methods often rely on iterative gradient descent to assign each Gaussian a unique label, leading to lengthy optimization and sub-optimal solutions.Instead, we propose a straightforward yet globally optimal solver for 3D-GS segmentation.The core insight of our method is that, with a reconstructed 3D-GS scene, the rendering of the 2D masks is essentially a linear function with respect to the labels of each Gaussian.As such, the optimal label assignment can be solved via linear programming in closed form.This solution capitalizes on the alpha blending characteristic of the splatting process for single step optimization.By incorporating the background bias in our objective function, our method shows superior robustness in 3D segmentation against noises.<span class='px-1 mx-1 bg-yellow-200'>Remarkably, our optimization completes within 30 seconds, about 50$\times$ faster than the best existing methods. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.675</span></span>Extensive experiments demonstrate the efficiency and robustness of our method in segmenting various scenes, and its superior performance in downstream tasks such as object removal and inpainting.Demos and code will be available at https://github.com/florinshen/FlashSplat.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08270v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                RePlay: a Recommendation Framework for Experimentation and Production Use
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Using a single tool to build and compare recommender systems significantly reduces the time to market for new models.<span class='px-1 mx-1 bg-yellow-200'>In addition, the comparison results when using such tools look more consistent. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.724</span></span>This is why many different tools and libraries for researchers in the field of recommendations have recently appeared.Unfortunately, most of these frameworks are aimed primarily at researchers and require modification for use in production due to the inability to work on large datasets or an inappropriate architecture.In this demo, we present our open-source toolkit RePlay - a framework containing an end-to-end pipeline for building recommender systems, which is ready for production use.RePlay also allows you to use a suitable stack for the pipeline on each stage: Pandas, Polars, or Spark.This allows the library to scale computations and deploy to a cluster.Thus, RePlay allows data scientists to easily move from research mode to production mode using the same interfaces.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07272v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Tuning-Free Online Robust Principal Component Analysis through Implicit Regularization
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The performance of the standard Online Robust Principal Component Analysis (OR-PCA) technique depends on the optimum tuning of the explicit regularizers and this tuning is dataset sensitive.We aim to remove the dependency on these tuning parameters by using implicit regularization.We propose to use the implicit regularization effect of various modified gradient descents to make OR-PCA tuning free.<span class='px-1 mx-1 bg-yellow-200'>Our method incorporates three different versions of modified gradient descent that separately but naturally encourage sparsity and low-rank structures in the data. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.633</span></span><span class='px-1 mx-1 bg-yellow-200'>The proposed method performs comparable or better than the tuned OR-PCA for both simulated and real-world datasets. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.603</span></span>Tuning-free ORPCA makes it more scalable for large datasets since we do not require dataset-dependent parameter tuning.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07275v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                List-based Optimization of Proximal Decoding for LDPC Codes
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>In this paper, the proximal decoding algorithm is considered within the context of additive white Gaussian noise (AWGN) channels.An analysis of the convergence behavior of the algorithm shows that proximal decoding inherently enters an oscillating behavior of the estimate after a certain number of iterations.Due to this oscillation, frame errors arising during decoding can often be attributed to only a few remaining wrongly decoded bit positions.In this letter, an improvement of the proximal decoding algorithm is proposed by establishing an additional step, in which these erroneous positions are attempted to be corrected.<span class='px-1 mx-1 bg-yellow-200'>We suggest an empirical rule with which the components most likely needing correction can be determined. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.602</span></span>Using this insight and performing a subsequent ``ML-in-the-list'' decoding, a gain of up to 1 dB is achieved compared to conventional proximal decoding, depending on the decoder parameters and the code.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07278v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                General Methods for Evaluating Collision Probability of Different Types of Theta-phi Positioners
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>In many modern astronomical facilities, multi-object telescopes are crucial instruments.Most of these telescopes have thousands of robotic fiber positioners(RFPs) installed on their focal plane, sharing an overlapping workspace.Collisions between RFPs during their movement can result in some targets becoming unreachable and cause structural damage.Therefore, it is necessary to reasonably assess and evaluate the collision probability of the RFPs.In this study, we propose a mathematical models of collision probability and validate its results using Monte Carlo simulations.<span class='px-1 mx-1 bg-yellow-200'>In addition, a new collision calculation method is proposed for faster calculation(nearly 0.15% of original time). <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.628</span></span>Simulation experiments have verified that our method can evaluate the collision probability between RFPs with both equal and unequal arm lengths.Additionally, we found that adopting a target distribution based on a Poisson distribution can reduce the collision probability by approximately 2.6% on average.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07288v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                A Unified Contrastive Loss for Self-Training
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Self-training methods have proven to be effective in exploiting abundant unlabeled data in semi-supervised learning, particularly when labeled data is scarce.While many of these approaches rely on a cross-entropy loss function (CE), recent advances have shown that the supervised contrastive loss function (SupCon) can be more effective.Additionally, unsupervised contrastive learning approaches have also been shown to capture high quality data representations in the unsupervised setting.To benefit from these advantages in a semi-supervised setting, we propose a general framework to enhance self-training methods, which replaces all instances of CE losses with a unique contrastive loss.By using class prototypes, which are a set of class-wise trainable parameters, we recover the probability distributions of the CE setting and show a theoretical equivalence with it.Our framework, when applied to popular self-training methods, results in significant performance improvements across three different datasets with a limited number of labeled data.<span class='px-1 mx-1 bg-yellow-200'>Additionally, we demonstrate further improvements in convergence speed, transfer ability, and hyperparameter stability. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.617</span></span>The code is available at \url{https://github.com/AurelienGauffre/semisupcon/}.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07292v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Statistically Valid Information Bottleneck via Multiple Hypothesis Testing
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The information bottleneck (IB) problem is a widely studied framework in machine learning for extracting compressed features that are informative for downstream tasks.However, current approaches to solving the IB problem rely on a heuristic tuning of hyperparameters, offering no guarantees that the learned features satisfy information-theoretic constraints.In this work, we introduce a statistically valid solution to this problem, referred to as IB via multiple hypothesis testing (IB-MHT), which ensures that the learned features meet the IB constraints with high probability, regardless of the size of the available dataset.The proposed methodology builds on Pareto testing and learn-then-test (LTT), and it wraps around existing IB solvers to provide statistical guarantees on the IB constraints.<span class='px-1 mx-1 bg-yellow-200'>We demonstrate the performance of IB-MHT on classical and deterministic IB formulations, validating the effectiveness of IB-MHT in outperforming conventional methods in terms of statistical robustness and reliability. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.617</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07325v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                A Contrastive Symmetric Forward-Forward Algorithm (SFFA) for Continual Learning Tasks
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The so-called Forward-Forward Algorithm (FFA) has recently gained momentum as an alternative to the conventional back-propagation algorithm for neural network learning, yielding competitive performance across various modeling tasks.By replacing the backward pass of gradient back-propagation with two contrastive forward passes, the FFA avoids several shortcomings undergone by its predecessor (e.g., vanishing/exploding gradient) by enabling layer-wise training heuristics.In classification tasks, this contrastive method has been proven to effectively create a latent sparse representation of the input data, ultimately favoring discriminability.However, FFA exhibits an inherent asymmetric gradient behavior due to an imbalanced loss function between positive and negative data, adversely impacting on the model's generalization capabilities and leading to an accuracy degradation.To address this issue, this work proposes the Symmetric Forward-Forward Algorithm (SFFA), a novel modification of the original FFA which partitions each layer into positive and negative neurons.This allows the local fitness function to be defined as the ratio between the activation of positive neurons and the overall layer activity, resulting in a symmetric loss landscape during the training phase.<span class='px-1 mx-1 bg-yellow-200'>To evaluate the enhanced convergence of our method, we conduct several experiments using multiple image classification benchmarks, comparing the accuracy of models trained with SFFA to those trained with its FFA counterpart. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.623</span></span>As a byproduct of this reformulation, we explore the advantages of using a layer-wise training algorithm for Continual Learning (CL) tasks.The specialization of neurons and the sparsity of their activations induced by layer-wise training algorithms enable efficient CL strategies that incorporate new knowledge (classes) into the neural network, while preventing catastrophic forgetting of previously...</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07387v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                A Scalable Algorithm for Active Learning
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>FIRAL is a recently proposed deterministic active learning algorithm for multiclass classification using logistic regression.<span class='px-1 mx-1 bg-yellow-200'>It was shown to outperform the state-of-the-art in terms of accuracy and robustness and comes with theoretical performance guarantees. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.673</span></span>However, its scalability suffers when dealing with datasets featuring a large number of points $n$, dimensions $d$, and classes $c$, due to its $\mathcal{O}(c^2d^2+nc^2d)$ storage and $\mathcal{O}(c^3(nd^2 + bd^3 + bn))$ computational complexity where $b$ is the number of points to select in active learning.To address these challenges, we propose an approximate algorithm with storage requirements reduced to $\mathcal{O}(n(d+c)+cd^2)$ and a computational complexity of $\mathcal{O}(bncd^2)$. Additionally, we present a parallel implementation on GPUs.We demonstrate the accuracy and scalability of our approach using MNIST, CIFAR-10, Caltech101, and ImageNet.The accuracy tests reveal no deterioration in accuracy compared to FIRAL.We report strong and weak scaling tests on up to 12 GPUs, for three million point synthetic dataset.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07392v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Revisiting Static Feature-Based Android Malware Detection
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The increasing reliance on machine learning (ML) in computer security, particularly for malware classification, has driven significant advancements.<span class='px-1 mx-1 bg-yellow-200'>However, the replicability and reproducibility of these results are often overlooked, leading to challenges in verifying research findings. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.633</span></span>This paper highlights critical pitfalls that undermine the validity of ML research in Android malware detection, focusing on dataset and methodological issues.We comprehensively analyze Android malware detection using two datasets and assess offline and continual learning settings with six widely used ML models.<span class='px-1 mx-1 bg-yellow-200'>Our study reveals that when properly tuned, simpler baseline methods can often outperform more complex models. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.654</span></span>To address reproducibility challenges, we propose solutions for improving datasets and methodological practices, enabling fairer model comparisons.Additionally, we open-source our code to facilitate malware analysis, making it extensible for new models and datasets.Our paper aims to support future research in Android malware detection and other security domains, enhancing the reliability and reproducibility of published results.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07397v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Modern listwise recommendation systems need to consider both long-term user perceptions and short-term interest shifts.Reinforcement learning can be applied on recommendation to study such a problem but is also subject to large search space, sparse user feedback and long interactive latency.Motivated by recent progress in hierarchical reinforcement learning, we propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation.Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy by modeling the process as a sequential decision-making problem.We argue that such framework has a well-defined decomposition of the outra-session context and the intra-session context, which are encoded by the high-level and low-level agents, respectively.To verify this argument, we implement both a simulator-based environment and an industrial dataset-based experiment.<span class='px-1 mx-1 bg-yellow-200'>Results observe significant performance improvement by our method, compared with several well-known baselines. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.841</span></span>Data and codes have been made public.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07416v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Enhancing adversarial robustness in Natural Language Inference using explanations
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The surge of state-of-the-art Transformer-based models has undoubtedly pushed the limits of NLP model performance, excelling in a variety of tasks.We cast the spotlight on the underexplored task of Natural Language Inference (NLI), since models trained on popular well-suited datasets are susceptible to adversarial attacks, allowing subtle input interventions to mislead the model.In this work, we validate the usage of natural language explanation as a model-agnostic defence strategy through extensive experimentation: only by fine-tuning a classifier on the explanation rather than premise-hypothesis inputs, robustness under various adversarial attacks is achieved in comparison to explanation-free baselines.Moreover, since there is no standard strategy of testing the semantic validity of the generated explanations, we research the correlation of widely used language generation metrics with human perception, in order for them to serve as a proxy towards robust NLI models.<span class='px-1 mx-1 bg-yellow-200'>Our approach is resource-efficient and reproducible without significant computational limitations. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.601</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07423v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                A Suite for Acoustic Language Model Evaluation
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Speech language models have recently demonstrated great potential as universal speech processing systems.Such models have the ability to model the rich acoustic information existing in audio signals, beyond spoken content, such as emotion, background noise, etc.Despite this, evaluation benchmarks which evaluate awareness to a wide range of acoustic aspects, are lacking.To help bridge this gap, we introduce SALMon, a novel evaluation suite encompassing background noise, emotion, speaker identity and room impulse response.<span class='px-1 mx-1 bg-yellow-200'>The proposed benchmarks both evaluate the consistency of the inspected element and how much it matches the spoken text. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.68</span></span>We follow a modelling based approach, measuring whether a model gives correct samples higher scores than incorrect ones.<span class='px-1 mx-1 bg-yellow-200'>This approach makes the benchmark fast to compute even for large models. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.648</span></span>We evaluated several speech language models on SALMon, thus highlighting the strengths and weaknesses of each evaluated method.Code and data are publicly available at https://pages.cs.huji.ac.il/adiyoss-lab/salmon/ .</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07437v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Adaptive Adapter Routing for Long-Tailed Class-Incremental Learning
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>In our ever-evolving world, new data exhibits a long-tailed distribution, such as e-commerce platform reviews.This necessitates continuous model learning imbalanced data without forgetting, addressing the challenge of long-tailed class-incremental learning (LTCIL).Existing methods often rely on retraining linear classifiers with former data, which is impractical in real-world settings.In this paper, we harness the potent representation capabilities of pre-trained models and introduce AdaPtive Adapter RouTing (APART) as an exemplar-free solution for LTCIL.To counteract forgetting, we train inserted adapters with frozen pre-trained weights for deeper adaptation and maintain a pool of adapters for selection during sequential model updates.Additionally, we present an auxiliary adapter pool designed for effective generalization, especially on minority classes.Adaptive instance routing across these pools captures crucial correlations, facilitating a comprehensive representation of all classes.Consequently, APART tackles the imbalance problem as well as catastrophic forgetting in a unified framework.<span class='px-1 mx-1 bg-yellow-200'>Extensive benchmark experiments validate the effectiveness of APART. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.674</span></span>Code is available at: https://github.com/vita-qzh/APART</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07446v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Neural Laplacian Operator for 3D Point Clouds
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The discrete Laplacian operator holds a crucial role in 3D geometry processing, yet it is still challenging to define it on point clouds.Previous works mainly focused on constructing a local triangulation around each point to approximate the underlying manifold for defining the Laplacian operator, which may not be robust or accurate.In contrast, we simply use the K-nearest neighbors (KNN) graph constructed from the input point cloud and learn the Laplacian operator on the KNN graph with graph neural networks (GNNs).However, the ground-truth Laplacian operator is defined on a manifold mesh with a different connectivity from the KNN graph and thus cannot be directly used for training.To train the GNN, we propose a novel training scheme by imitating the behavior of the ground-truth Laplacian operator on a set of probe functions so that the learned Laplacian operator behaves similarly to the ground-truth Laplacian operator.We train our network on a subset of ShapeNet and evaluate it across a variety of point clouds.<span class='px-1 mx-1 bg-yellow-200'>Compared with previous methods, our method reduces the error by an order of magnitude and excels in handling sparse point clouds with thin structures or sharp features. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.6</span></span>Our method also demonstrates a strong generalization ability to unseen shapes.With our learned Laplacian operator, we further apply a series of Laplacian-based geometry processing algorithms directly to point clouds and achieve accurate results, enabling many exciting possibilities for geometry processing on point clouds.The code and trained models are available at https://github.com/IntelligentGeometry/NeLo.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06506v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Multi-robot Task Allocation and Path Planning with Maximum Range Constraints
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>This letter presents a novel multi-robot task allocation and path planning method that considers robots' maximum range constraints in large-sized workspaces, enabling robots to complete the assigned tasks within their range limits.Firstly, we developed a fast path planner to solve global paths efficiently.Subsequently, we propose an innovative auction-based approach that integrates our path planner into the auction phase for reward computation while considering the robots' range limits.This method accounts for extra obstacle-avoiding travel distances rather than ideal straight-line distances, resolving the coupling between task allocation and path planning.Additionally, to avoid redundant computations during iterations, we implemented a lazy auction strategy to speed up the convergence of the task allocation.<span class='px-1 mx-1 bg-yellow-200'>Finally, we validated the proposed method's effectiveness and application potential through extensive simulation and real-world experiments. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.699</span></span>The implementation code for our method will be available at https://github.com/wuuya1/RangeTAP.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06531v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                MAPS: Energy-Reliability Tradeoff Management in Autonomous Vehicles Through LLMs Penetrated Science
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>As autonomous vehicles become more prevalent, highly accurate and efficient systems are increasingly critical to improve safety, performance, and energy consumption.Efficient management of energy-reliability tradeoffs in these systems demands the ability to predict various conditions during vehicle operations.With the promising improvement of Large Language Models (LLMs) and the emergence of well-known models like ChatGPT, unique opportunities for autonomous vehicle-related predictions have been provided in recent years.This paper proposed MAPS using LLMs as map reader co-drivers to predict the vital parameters to set during the autonomous vehicle operation to balance the energy-reliability tradeoff.<span class='px-1 mx-1 bg-yellow-200'>The MAPS method demonstrates a 20% improvement in navigation accuracy compared to the best baseline method. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.605</span></span>MAPS also shows 11% energy savings in computational units and up to 54% in both mechanical and computational units.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06558v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Learn2Aggregate: Supervised Generation of Chvátal-Gomory Cuts Using Graph Neural Networks
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>We present $\textit{Learn2Aggregate}$, a machine learning (ML) framework for optimizing the generation of Chv\'atal-Gomory (CG) cuts in mixed integer linear programming (MILP).The framework trains a graph neural network to classify useful constraints for aggregation in CG cut generation.The ML-driven CG separator selectively focuses on a small set of impactful constraints, improving runtimes without compromising the strength of the generated cuts.Key to our approach is the formulation of a constraint classification task which favours sparse aggregation of constraints, consistent with empirical findings.This, in conjunction with a careful constraint labeling scheme and a hybrid of deep learning and feature engineering, results in enhanced CG cut generation across five diverse MILP benchmarks.<span class='px-1 mx-1 bg-yellow-200'>On the largest test sets, our method closes roughly $\textit{twice}$ as much of the integrality gap as the standard CG method while running 40$% faster. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.688</span></span><span class='px-1 mx-1 bg-yellow-200'>This performance improvement is due to our method eliminating 75% of the constraints prior to aggregation. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.625</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06559v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Retrieval-Augmented Generation (RAG) has emerged as a common paradigm to use Large Language Models (LLMs) alongside private and up-to-date knowledge bases.In this work, we address the challenges of using LLM-as-a-Judge when evaluating grounded answers generated by RAG systems.To assess the calibration and discrimination capabilities of judge models, we identify 7 generator failure modes and introduce GroUSE (Grounded QA Unitary Scoring of Evaluators), a meta-evaluation benchmark of 144 unit tests.<span class='px-1 mx-1 bg-yellow-200'>This benchmark reveals that existing automated RAG evaluation frameworks often overlook important failure modes, even when using GPT-4 as a judge.    <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.617</span></span>To improve on the current design of automated RAG evaluation frameworks, we propose a novel pipeline and find that while closed models perform well on GroUSE, state-of-the-art open-source judges do not generalize to our proposed criteria, despite strong correlation with GPT-4's judgement.Our findings suggest that correlation with GPT-4 is an incomplete proxy for the practical performance of judge models and should be supplemented with evaluations on unit tests for precise failure mode detection.   We further show that finetuning Llama-3 on GPT-4's reasoning traces significantly boosts its evaluation capabilities, improving upon both correlation with GPT-4's evaluations and calibration on reference situations.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06595v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                When to Extract ReID Features: A Selective Approach for Improved Multiple Object Tracking
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Extracting and matching Re-Identification (ReID) features is used by many state-of-the-art (SOTA)Multiple Object Tracking (MOT) methods, particularly effective against frequent and long-term occlusions.While end-to-end object detection and tracking have been the main focus of recent research, they have yet to outperform traditional methods in benchmarks like MOT17 and MOT20.Thus, from an application standpoint, methods with separate detection and embedding remain the best option for accuracy, modularity, and ease of implementation, though they are impractical for edge devices due to the overhead involved.In this paper, we investigate a selective approach to minimize the overhead of feature extraction while preserving accuracy, modularity, and ease of implementation.This approach can be integrated into various SOTA methods.<span class='px-1 mx-1 bg-yellow-200'>We demonstrate its effectiveness by applying it to StrongSORT and Deep OC-SORT. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.615</span></span>Experiments on MOT17, MOT20, and DanceTrack datasets show that our mechanism retains the advantages of feature extraction during occlusions while significantly reducing runtime.Additionally, it improves accuracy by preventing confusion in the feature-matching stage, particularly in cases of deformation and appearance similarity, which are common in DanceTrack.https://github.com/emirhanbayar/Fast-StrongSORT, https://github.com/emirhanbayar/Fast-Deep-OC-SORT</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06617v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                MVGaussian: High-Fidelity text-to-3D Content Generation with Multi-View Guidance and Surface Densification
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The field of text-to-3D content generation has made significant progress in generating realistic 3D objects, with existing methodologies like Score Distillation Sampling (SDS) offering promising guidance.However, these methods often encounter the "Janus" problem-multi-face ambiguities due to imprecise guidance.Additionally, while recent advancements in 3D gaussian splitting have shown its efficacy in representing 3D volumes, optimization of this representation remains largely unexplored.This paper introduces a unified framework for text-to-3D content generation that addresses these critical gaps.Our approach utilizes multi-view guidance to iteratively form the structure of the 3D model, progressively enhancing detail and accuracy.We also introduce a novel densification algorithm that aligns gaussians close to the surface, optimizing the structural integrity and fidelity of the generated models.Extensive experiments validate our approach, demonstrating that it produces high-quality visual outputs with minimal time cost.<span class='px-1 mx-1 bg-yellow-200'>Notably, our method achieves high-quality results within half an hour of training, offering a substantial efficiency gain over most existing methods, which require hours of training time to achieve comparable results. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.626</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06620v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                TeXBLEU: Automatic Metric for Evaluate LaTeX Format
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>LaTeX is highly suited to creating documents with special formatting, particularly in the fields of science, technology, mathematics, and computer science.Despite the increasing use of mathematical expressions in LaTeX format with language models, there are no evaluation metrics for evaluating them.In this study, we propose TeXBLEU, an evaluation metric tailored for mathematical expressions in LaTeX format, based on the n-gram-based BLEU metric that is widely used for translation tasks.The proposed TeXBLEU includes a predefined tokenizer trained on the arXiv paper dataset and a finetuned embedding model.It also considers the positional embedding of tokens.<span class='px-1 mx-1 bg-yellow-200'>Simultaneously, TeXBLEU compares tokens based on n-grams and computes the score using exponentiation of a logarithmic sum, similar to the original BLEU. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.627</span></span>Experimental results show that TeXBLEU outperformed traditional evaluation metrics such as BLEU, Rouge, CER, and WER when compared to human evaluation data on the test dataset of the MathBridge dataset, which contains 1,000 data points.<span class='px-1 mx-1 bg-yellow-200'>The average correlation coefficient with human evaluation was 0.71, which is an improvement of 87% compared with BLEU, which had the highest correlation with human evaluation data among the existing metrics. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.614</span></span>The code is available at https://github.com/KyuDan1/TeXBLEU.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06639v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Optimal Workload Placement on Multi-Instance GPUs
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>There is an urgent and pressing need to optimize usage of Graphical Processing Units (GPUs), which have arguably become one of the most expensive and sought after IT resources.To help with this goal, several of the current generation of GPUs support a partitioning feature, called Multi-Instance GPU (MIG) to allow multiple workloads to share a GPU, albeit with some constraints.In this paper we investigate how to optimize the placement of Large Language Model (LLM)-based AI Inferencing workloads on GPUs.We first identify and present several use cases that are encountered in practice that require workloads to be efficiently placed or migrated to other GPUs to make room for incoming workloads.The overarching goal is to use as few GPUs as possible and to further minimize memory and compute wastage on GPUs that are utilized.We have developed two approaches to address this problem: an optimization method and a heuristic method.<span class='px-1 mx-1 bg-yellow-200'>We benchmark these with two workload scheduling heuristics for multiple use cases. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.626</span></span>Our results show up to 2.85x improvement in the number of GPUs used and up to 70% reduction in GPU wastage over baseline heuristics.We plan to enable the SRE community to leverage our proposed method in production environments.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06646v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Image Vectorization with Depth: convexified shape layers with depth ordering
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Image vectorization is a process to convert a raster image into a scalable vector graphic format.Objective is to effectively remove the pixelization effect while representing boundaries of image by scaleable parameterized curves.We propose new image vectorization with depth which considers depth ordering among shapes and use curvature-based inpainting for convexifying shapes in vectorization process.From a given color quantized raster image, we first define each connected component of the same color as a shape layer, and construct depth ordering among them using a newly proposed depth ordering energy.Global depth ordering among all shapes is described by a directed graph, and we propose an energy to remove cycle within the graph.After constructing depth ordering of shapes, we convexify occluded regions by Euler's elastica curvature-based variational inpainting, and leverage on the stability of Modica-Mortola double-well potential energy to inpaint large regions.This is following human vision perception that boundaries of shapes extend smoothly, and we assume shapes are likely to be convex.Finally, we fit B\'{e}zier curves to the boundaries and save vectorization as a SVG file which allows superposition of curvature-based inpainted shapes following the depth ordering.This is a new way to vectorize images, by decomposing an image into scalable shape layers with computed depth ordering.This approach makes editing shapes and images more natural and intuitive.We also consider grouping shape layers for semantic vectorization.<span class='px-1 mx-1 bg-yellow-200'>We present various numerical results and comparisons against recent layer-based vectorization methods to validate the proposed model. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.639</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06648v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Benchmarking Sub-Genre Classification For Mainstage Dance Music
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Music classification, with a wide range of applications, is one of the most prominent tasks in music information retrieval.To address the absence of comprehensive datasets and high-performing methods in the classification of mainstage dance music, this work introduces a novel benchmark comprising a new dataset and a baseline.Our dataset extends the number of sub-genres to cover most recent mainstage live sets by top DJs worldwide in music festivals.A continuous soft labeling approach is employed to account for tracks that span multiple sub-genres, preserving the inherent sophistication.For the baseline, we developed deep learning models that outperform current state-of-the-art multimodel language models, which struggle to identify house music sub-genres, emphasizing the need for specialized models trained on fine-grained datasets.<span class='px-1 mx-1 bg-yellow-200'>Our benchmark is applicable to serve for application scenarios such as music recommendation, DJ set curation, and interactive multimedia, where we also provide video demos. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.614</span></span><span class='px-1 mx-1 bg-yellow-200'>Our code is on \url{https://anonymous.4open.science/r/Mainstage-EDM-Benchmark/}. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.619</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06690v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-09</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Are Heterophily-Specific GNNs and Homophily Metrics Really Effective? Evaluation Pitfalls and New Benchmarks
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Over the past decade, Graph Neural Networks (GNNs) have achieved great success on machine learning tasks with relational data.However, recent studies have found that heterophily can cause significant performance degradation of GNNs, especially on node-level tasks.Numerous heterophilic benchmark datasets have been put forward to validate the efficacy of heterophily-specific GNNs and various homophily metrics have been designed to help people recognize these malignant datasets.Nevertheless, there still exist multiple pitfalls that severely hinder the proper evaluation of new models and metrics.In this paper, we point out three most serious pitfalls: 1) a lack of hyperparameter tuning; 2) insufficient model evaluation on the real challenging heterophilic datasets; 3) missing quantitative evaluation benchmark for homophily metrics on synthetic graphs.To overcome these challenges, we first train and fine-tune baseline models on $27$ most widely used benchmark datasets, categorize them into three distinct groups: malignant, benign and ambiguous heterophilic datasets, and identify the real challenging subsets of tasks.To our best knowledge, we are the first to propose such taxonomy.Then, we re-evaluate $10$ heterophily-specific state-of-the-arts (SOTA) GNNs with fine-tuned hyperparameters on different groups of heterophilic datasets.Based on the model performance, we reassess their effectiveness on addressing heterophily challenge.At last, we evaluate $11$ popular homophily metrics on synthetic graphs with three different generation approaches.<span class='px-1 mx-1 bg-yellow-200'>To compare the metrics strictly, we propose the first quantitative evaluation method based on Fr\'echet distance. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.706</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.05755v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-09</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Optimizing Vehicular Users Association in Urban Mobile Networks
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>This study aims to optimize vehicular user association to base stations in a mobile network.We propose an efficient heuristic solution that considers the base station average handover frequency, the channel quality indicator, and bandwidth capacity.We evaluate this solution using real-world base station locations from S\~ao Paulo, Brazil, and the SUMO mobility simulator.We compare our approach against a state of the art solution which uses route prediction, maintaining or surpassing the provided quality of service with the same number of handover operations.<span class='px-1 mx-1 bg-yellow-200'>Additionally, the proposed solution reduces the execution time by more than 80\% compared to an exact method, while achieving optimal solutions. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.712</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.05845v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-09</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>State-of-the-art techniques for 3D reconstruction are largely based on volumetric scene representations, which require sampling multiple points to compute the color arriving along a ray.Using these representations for more general inverse rendering -- reconstructing geometry, materials, and lighting from observed images -- is challenging because recursively path-tracing such volumetric representations is expensive.Recent works alleviate this issue through the use of radiance caches: data structures that store the steady-state, infinite-bounce radiance arriving at any point from any direction.However, these solutions rely on approximations that introduce bias into the renderings and, more importantly, into the gradients used for optimization.<span class='px-1 mx-1 bg-yellow-200'>We present a method that avoids these approximations while remaining computationally efficient. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.69</span></span>In particular, we leverage two techniques to reduce variance for unbiased estimators of the rendering equation: (1) an occlusion-aware importance sampler for incoming illumination and (2) a fast cache architecture that can be used as a control variate for the radiance from a high-quality, but more expensive, volumetric cache.We show that by removing these biases our approach improves the generality of radiance cache based inverse rendering, as well as increasing quality in the presence of challenging light transport effects such as specular reflections.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.05867v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-09</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                NeurLZ: On Systematically Enhancing Lossy Compression Performance for Scientific Data based on Neural Learning with Error Control
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Large-scale scientific simulations generate massive datasets that pose significant challenges for storage and I/O.While traditional lossy compression techniques can improve performance, balancing compression ratio, data quality, and throughput remains difficult.To address this, we propose NeurLZ, a novel cross-field learning-based and error-controlled compression framework for scientific data.By integrating skipping DNN models, cross-field learning, and error control, our framework aims to substantially enhance lossy compression performance.Our contributions are three-fold: (1) We design a lightweight skipping model to provide high-fidelity detail retention, further improving prediction accuracy.(2) We adopt a cross-field learning approach to significantly improve data prediction accuracy, resulting in a substantially improved compression ratio.(3) We develop an error control approach to provide strict error bounds according to user requirements.We evaluated NeurLZ on several real-world HPC application datasets, including Nyx (cosmological simulation), Miranda (large turbulence simulation), and Hurricane (weather simulation).<span class='px-1 mx-1 bg-yellow-200'>Experiments demonstrate that our framework achieves up to a 90% relative reduction in bit rate under the same data distortion, compared to the best existing approach. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.617</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.05785v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td></td>
          <td>
            <h2 class="text-2xl tracking-tight pt-4 font-bold">LLMs</h2>
          </td>
        </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Rate-Splitting Multiple Access for Coexistence of Semantic and Bit Communications
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>In the sixth generation (6G) of cellular networks, the demands for capacity and connectivity will increase dramatically to meet the requirements of emerging services for both humans and machines.Semantic communication has shown great potential because of its efficiency, and suitability for users who only care about the semantic meaning.<span class='px-1 mx-1 bg-yellow-200'>But bit communication is still needed for users requiring original messages. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.622</span></span>Therefore, there will be a coexistence of semantic and bit communications in future networks.This motivates us to explore how to allocate resources in such a coexistence scenario.We investigate different uplink multiple access (MA) schemes for the coexistence of semantic users and a bit user, namely orthogonal multiple access (OMA), non-orthogonal multiple access (NOMA) and rate-splitting multiple access (RSMA).We characterize the rate regions achieved by those MA schemes.The simulation results show that RSMA always outperforms NOMA and has better performance in high semantic rate regimes compared to OMA.We find that RSMA scheme design, rate region, and power allocation are quite different in the coexistence scenario compared to the bit-only communication, primarily due to the need to consider the understandability in semantic communications.Interestingly, in contrast to bit-only communications where RSMA is capacity achieving without any need for time sharing, in the coexistence scenario, time sharing helps enlarging RSMA rate region.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10314v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                The 20 questions game to distinguish large language models
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>In a parallel with the 20 questions game, we present a method to determine whether two large language models (LLMs), placed in a black-box context, are the same or not. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.641</span></span>The goal is to use a small set of (benign) binary questions, typically under 20.We formalize the problem and first establish a baseline using a random selection of questions from known benchmark datasets, achieving an accuracy of nearly 100% within 20 questions.<span class='px-1 mx-1 bg-yellow-200'>After showing optimal bounds for this problem, we introduce two effective questioning heuristics able to discriminate 22 LLMs by using half as many questions for the same task. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.697</span></span>These methods offer significant advantages in terms of stealth and are thus of interest to auditors or copyright owners facing suspicions of model leaks.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10338v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Implicit feedback, often used to build recommender systems, unavoidably confronts noise due to factors such as misclicks and position bias.Previous studies have attempted to alleviate this by identifying noisy samples based on their diverged patterns, such as higher loss values, and mitigating the noise through sample dropping or reweighting.Despite the progress, we observe existing approaches struggle to distinguish hard samples and noise samples, as they often exhibit similar patterns, thereby limiting their effectiveness in denoising recommendations.To address this challenge, we propose a Large Language Model Enhanced Hard Sample Denoising (LLMHD) framework.<span class='px-1 mx-1 bg-yellow-200'>Specifically, we construct an LLM-based scorer to evaluate the semantic consistency of items with the user preference, which is quantified based on summarized historical user interactions. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.626</span></span>The resulting scores are used to assess the hardness of samples for the pointwise or pairwise training objectives.To ensure efficiency, we introduce a variance-based sample pruning strategy to filter potential hard samples before scoring.Besides, we propose an iterative preference update module designed to continuously refine summarized user preference, which may be biased due to false-positive user-item interactions.Extensive experiments on three real-world datasets and four backbone recommenders demonstrate the effectiveness of our approach.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10343v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>Large Language Models (LLMs) are widely used in healthcare, but limitations like hallucinations, incomplete information, and bias hinder their reliability. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.688</span></span><span class='px-1 mx-1 bg-yellow-200'>To address these, researchers released the Build Your Own expert Bot (BYOeB) platform, enabling developers to create LLM-powered chatbots with integrated expert verification. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.662</span></span>CataractBot, its first implementation, provides expert-verified responses to cataract surgery questions.A pilot evaluation showed its potential; however the study had a small sample size and was primarily qualitative.In this work, we conducted a large-scale 24-week deployment of CataractBot involving 318 patients and attendants who sent 1,992 messages, with 91.71\% of responses verified by seven experts.Analysis of interaction logs revealed that medical questions significantly outnumbered logistical ones, hallucinations were negligible, and experts rated 84.52\% of medical answers as accurate.As the knowledge base expanded with expert corrections, system performance improved by 19.02\%, reducing expert workload.<span class='px-1 mx-1 bg-yellow-200'>These insights guide the design of future LLM-powered chatbots. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.742</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10354v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Instigating Cooperation among LLM Agents Using Adaptive Information Modulation
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>This paper introduces a novel framework combining LLM agents as proxies for human strategic behavior with reinforcement learning (RL) to engage these agents in evolving strategic interactions within team environments. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.652</span></span>Our approach extends traditional agent-based simulations by using strategic LLM agents (SLA) and introducing dynamic and adaptive governance through a pro-social promoting RL agent (PPA) that modulates information access across agents in a network, optimizing social welfare and promoting pro-social behavior.Through validation in iterative games, including the prisoner dilemma, we demonstrate that SLA agents exhibit nuanced strategic adaptations.The PPA agent effectively learns to adjust information transparency, resulting in enhanced cooperation rates.This framework offers significant insights into AI-mediated social dynamics, contributing to the deployment of AI in real-world team settings.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10372v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Robotic assembly tasks are open challenges due to the long task horizon and complex part relations.Behavior trees (BTs) are increasingly used in robot task planning for their modularity and flexibility, but manually designing them can be effort-intensive.<span class='px-1 mx-1 bg-yellow-200'>Large language models (LLMs) have recently been applied in robotic task planning for generating action sequences, but their ability to generate BTs has not been fully investigated. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.612</span></span><span class='px-1 mx-1 bg-yellow-200'>To this end, We propose LLM as BT-planner, a novel framework to leverage LLMs for BT generation in robotic assembly task planning and execution. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.7</span></span><span class='px-1 mx-1 bg-yellow-200'>Four in-context learning methods are introduced to utilize the natural language processing and inference capabilities of LLMs to produce task plans in BT format, reducing manual effort and ensuring robustness and comprehensibility. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.633</span></span><span class='px-1 mx-1 bg-yellow-200'>We also evaluate the performance of fine-tuned, fewer-parameter LLMs on the same tasks. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.665</span></span><span class='px-1 mx-1 bg-yellow-200'>Experiments in simulated and real-world settings show that our framework enhances LLMs' performance in BT generation, improving success rates in BT generation through in-context learning and supervised fine-tuning. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.657</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10444v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Schrodinger's Memory: Large Language Models
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>Memory is the foundation of LLMs' functionality, yet past research has lacked an in-depth exploration of their memory capabilities and underlying theory. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.78</span></span><span class='px-1 mx-1 bg-yellow-200'>In this paper, we apply UAT theory to explain the memory mechanism of LLMs and propose a new approach for evaluating LLM performance by comparing the memory capacities of different models. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.732</span></span><span class='px-1 mx-1 bg-yellow-200'>Through extensive experiments, we validate our theory and the memory abilities of LLMs. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.745</span></span><span class='px-1 mx-1 bg-yellow-200'>Finally, we compare the capabilities of the human brain and LLMs, highlighting both their similarities and differences in terms of working mechanisms. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.69</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10482v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The growing trend of vulnerability issues in software development as a result of a large dependence on open-source projects has received considerable attention recently.<span class='px-1 mx-1 bg-yellow-200'>This paper investigates the effectiveness of Large Language Models (LLMs) in identifying vulnerabilities within codebases, with a focus on the latest advancements in LLM technology. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.64</span></span><span class='px-1 mx-1 bg-yellow-200'>Through a comparative analysis, we assess the performance of emerging LLMs, specifically Llama, CodeLlama, Gemma, and CodeGemma, alongside established state-of-the-art models such as BERT, RoBERTa, and GPT-3. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.692</span></span><span class='px-1 mx-1 bg-yellow-200'>Our study aims to shed light on the capabilities of LLMs in vulnerability detection, contributing to the enhancement of software security practices across diverse open-source repositories. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.753</span></span>We observe that CodeGemma achieves the highest F1-score of 58\ and a Recall of 87\, amongst the recent additions of large language models to detect software security vulnerabilities.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10490v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>Causal language modeling using the Transformer architecture has yielded remarkable capabilities in Large Language Models (LLMs) over the last few years. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.631</span></span><span class='px-1 mx-1 bg-yellow-200'>However, the extent to which fundamental search and reasoning capabilities emerged within LLMs remains a topic of ongoing debate. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.747</span></span>In this work, we study if causal language modeling can learn a complex task such as solving Sudoku puzzles.To solve a Sudoku, the model is first required to search over all empty cells of the puzzle to decide on a cell to fill and then apply an appropriate strategy to fill the decided cell.Sometimes, the application of a strategy only results in thinning down the possible values in a cell rather than concluding the exact value of the cell.In such cases, multiple strategies are applied one after the other to fill a single cell.We observe that Transformer models trained on this synthetic task can indeed learn to solve Sudokus (our model solves $94.21\%$ of the puzzles fully correctly) when trained on a logical sequence of steps taken by a solver.We find that training Transformers with the logical sequence of steps is necessary and without such training, they fail to learn Sudoku.We also extend our analysis to Zebra puzzles (known as Einstein puzzles) and show that the model solves $92.04 \%$ of the puzzles fully correctly.In addition, we study the internal representations of the trained Transformer and find that through linear probing, we can decode information about the set of possible values in any given cell from them, pointing to the presence of a strong reasoning engine implicit in the Transformer weights.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10502v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>There is strong motivation to translate C code into Rust code due to the continuing threat of memory safety vulnerabilities in existing C programs and the significant attention paid to Rust as an alternative to the C language.<span class='px-1 mx-1 bg-yellow-200'>While large language models (LLMs) show promise for automating this translation by generating more natural and safer code than rule-based methods, previous studies have shown that LLM-generated Rust code often fails to compile, even for relatively small C programs, due to significant differences between the two languages and context window limitations. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.696</span></span><span class='px-1 mx-1 bg-yellow-200'>We propose an LLM-based translation scheme that improves the success rate of translating large-scale C code into compilable Rust code. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.669</span></span><span class='px-1 mx-1 bg-yellow-200'>Our approach involves three key techniques: (1) pre-processing the C code to better align its structure and expressions with Rust, (2) segmenting the code into optimally sized translation units to avoid exceeding the LLM's context window limits, and (3) iteratively compiling and repairing errors while maintaining consistency between translation units using context-supplementing prompts. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.643</span></span>Compilation success is an essential first step in achieving functional equivalence, as only compilable code can be further tested.In experiments with 20 benchmark C programs, including those exceeding 4 kilo lines of code, we successfully translated all programs into compilable Rust code without losing corresponding parts of the original code.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10506v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-16</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>Transformer-based large Language Models (LLMs) become increasingly important in various domains. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.685</span></span>However, the quadratic time complexity of attention operation poses a significant challenge for scaling to longer contexts due to the extremely high inference latency and GPU memory consumption for caching key-value (KV) vectors.This paper proposes RetrievalAttention, a training-free approach to accelerate attention computation.To leverage the dynamic sparse property of attention, RetrievalAttention builds approximate nearest neighbor search (ANNS) indexes upon KV vectors in CPU memory and retrieves the most relevant ones via vector search during generation.Due to the out-of-distribution (OOD) between query vectors and key vectors, off-the-shelf ANNS indexes still need to scan O(N) (usually 30% of all keys) data for accurate retrieval, which fails to exploit the high sparsity.RetrievalAttention first identifies the OOD challenge of ANNS-based attention, and addresses it via an attention-aware vector search algorithm that can adapt to queries and only access 1--3% of data, thus achieving a sub-linear time complexity.<span class='px-1 mx-1 bg-yellow-200'>RetrievalAttention greatly reduces the inference cost of long-context LLM with much lower GPU memory requirements while maintaining the model accuracy. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.678</span></span><span class='px-1 mx-1 bg-yellow-200'>Especially, RetrievalAttention only needs 16GB GPU memory for serving 128K tokens in LLMs with 8B parameters, which is capable of generating one token in 0.188 seconds on a single NVIDIA RTX4090 (24GB). <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.714</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.10516v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Securing Large Language Models: Addressing Bias, Misinformation, and Prompt Attacks
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>Large Language Models (LLMs) demonstrate impressive capabilities across various fields, yet their increasing use raises critical security concerns. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.726</span></span><span class='px-1 mx-1 bg-yellow-200'>This article reviews recent literature addressing key issues in LLM security, with a focus on accuracy, bias, content detection, and vulnerability to attacks. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.748</span></span><span class='px-1 mx-1 bg-yellow-200'>Issues related to inaccurate or misleading outputs from LLMs is discussed, with emphasis on the implementation from fact-checking methodologies to enhance response reliability. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.721</span></span><span class='px-1 mx-1 bg-yellow-200'>Inherent biases within LLMs are critically examined through diverse evaluation techniques, including controlled input studies and red teaming exercises. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.74</span></span>A comprehensive analysis of bias mitigation strategies is presented, including approaches from pre-processing interventions to in-training adjustments and post-processing refinements.<span class='px-1 mx-1 bg-yellow-200'>The article also probes the complexity of distinguishing LLM-generated content from human-produced text, introducing detection mechanisms like DetectGPT and watermarking techniques while noting the limitations of machine learning enabled classifiers under intricate circumstances. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.679</span></span><span class='px-1 mx-1 bg-yellow-200'>Moreover, LLM vulnerabilities, including jailbreak attacks and prompt injection exploits, are analyzed by looking into different case studies and large-scale competitions like HackAPrompt. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.676</span></span><span class='px-1 mx-1 bg-yellow-200'>This review is concluded by retrospecting defense mechanisms to safeguard LLMs, accentuating the need for more extensive research into the LLM security field. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.81</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08087v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>This paper explores the intersection of technological innovation and access to justice by developing a benchmark for predicting case outcomes in the UK Employment Tribunal (UKET).<span class='px-1 mx-1 bg-yellow-200'>To address the challenge of extensive manual annotation, the study employs a large language model (LLM) for automatic annotation, resulting in the creation of the CLC-UKET dataset. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.652</span></span>The dataset consists of approximately 19,000 UKET cases and their metadata.Comprehensive legal annotations cover facts, claims, precedent references, statutory references, case outcomes, reasons and jurisdiction codes.Facilitated by the CLC-UKET data, we examine a multi-class case outcome prediction task in the UKET.Human predictions are collected to establish a performance reference for model comparison.Empirical results from baseline models indicate that finetuned transformer models outperform zero-shot and few-shot LLMs on the UKET prediction task.<span class='px-1 mx-1 bg-yellow-200'>The performance of zero-shot LLMs can be enhanced by integrating task-related information into few-shot examples. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.699</span></span>We hope that the CLC-UKET dataset, along with human annotations and empirical findings, can serve as a valuable benchmark for employment-related dispute resolution.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08098v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Rethinking Programmed I/O for Fast Devices, Cheap Cores, and Coherent Interconnects
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>Conventional wisdom holds that an efficient interface between an OS running on a CPU and a high-bandwidth I/O device should be based on Direct Memory Access (DMA), descriptor rings, and interrupts: DMA offloads transfers from the CPU, descriptor rings provide buffering and queuing, and interrupts facilitate asynchronous interaction between cores and device with a lightweight notification mechanism. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.626</span></span>In this paper we question this wisdom in the light of modern hardware and workloads, particularly in cloud servers.We argue that the assumptions that led to this model are obsolete, and in many use-cases use of programmed I/O, where the CPU explicitly transfers data and control information to and from a device via loads and stores, actually results in a more efficient system.We quantitatively demonstrate these advantages using three use-cases: fine-grained RPC-style invocation of functions on an accelerator, offloading of operators in a streaming dataflow engine, and a network interface targeting for serverless functions.Moreover, we show that while these advantages are significant over a modern PCIe peripheral bus, a truly cache-coherent interconnect offers significant additional efficiency gains.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08141v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                LLM-POTUS Score: A Framework of Analyzing Presidential Debates with Large Language Models
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Large language models have demonstrated remarkable capabilities in natural language processing, yet their application to political discourse analysis remains underexplored.<span class='px-1 mx-1 bg-yellow-200'>This paper introduces a novel approach to evaluating presidential debate performances using LLMs, addressing the longstanding challenge of objectively assessing debate outcomes. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.718</span></span>We propose a framework that analyzes candidates' "Policies, Persona, and Perspective" (3P) and how they resonate with the "Interests, Ideologies, and Identity" (3I) of four key audience groups: voters, businesses, donors, and politicians.<span class='px-1 mx-1 bg-yellow-200'>Our method employs large language models to generate the LLM-POTUS Score, a quantitative measure of debate performance based on the alignment between 3P and 3I. We apply this framework to analyze transcripts from recent U.S. presidential debates, demonstrating its ability to provide nuanced, multi-dimensional assessments of candidate performances. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.631</span></span>Our results reveal insights into the effectiveness of different debating strategies and their impact on various audience segments.<span class='px-1 mx-1 bg-yellow-200'>This study not only offers a new tool for political analysis but also explores the potential and limitations of using LLMs as impartial judges in complex social contexts. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.724</span></span>In addition, this framework provides individual citizens with an independent tool to evaluate presidential debate performances, which enhances democratic engagement and reduces reliance on potentially biased media interpretations and institutional influence, thereby strengthening the foundation of informed civic participation.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08147v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Cross-Attention Based Influence Model for Manual and Nonmanual Sign Language Analysis
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>Both manual (relating to the use of hands) and non-manual markers (NMM), such as facial expressions or mouthing cues, are important for providing the complete meaning of phrases in American Sign Language (ASL). <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.633</span></span>Efforts have been made in advancing sign language to spoken/written language understanding, but most of these have primarily focused on manual features only.In this work, using advanced neural machine translation methods, we examine and report on the extent to which facial expressions contribute to understanding sign language phrases.We present a sign language translation architecture consisting of two-stream encoders, with one encoder handling the face and the other handling the upper body (with hands).We propose a new parallel cross-attention decoding mechanism that is useful for quantifying the influence of each input modality on the output.The two streams from the encoder are directed simultaneously to different attention stacks in the decoder.Examining the properties of the parallel cross-attention weights allows us to analyze the importance of facial markers compared to body and hand features during a translating task.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08162v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Fine-tuning Large Language Models for Entity Matching
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>Generative large language models (LLMs) are a promising alternative to pre-trained language models for entity matching due to their high zero-shot performance and their ability to generalize to unseen entities. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.665</span></span><span class='px-1 mx-1 bg-yellow-200'>Existing research on using LLMs for entity matching has focused on prompt engineering and in-context learning. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.755</span></span><span class='px-1 mx-1 bg-yellow-200'>This paper explores the potential of fine-tuning LLMs for entity matching. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.758</span></span><span class='px-1 mx-1 bg-yellow-200'>We analyze fine-tuning along two dimensions: 1) The representation of training examples, where we experiment with adding different types of LLM-generated explanations to the training set, and 2) the selection and generation of training examples using LLMs. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.618</span></span>In addition to the matching performance on the source dataset, we investigate how fine-tuning affects the model's ability to generalize to other in-domain datasets as well as across topical domains.Our experiments show that fine-tuning significantly improves the performance of the smaller models while the results for the larger models are mixed.Fine-tuning also improves the generalization to in-domain datasets while hurting cross-domain transfer.<span class='px-1 mx-1 bg-yellow-200'>We show that adding structured explanations to the training set has a positive impact on the performance of three out of four LLMs, while the proposed example selection and generation methods only improve the performance of Llama 3.1 8B while decreasing the performance of GPT-4o Mini. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.658</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08185v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>The rapid evolution of cyber threats necessitates innovative solutions for detecting and analyzing malicious activity.Honeypots, which are decoy systems designed to lure and interact with attackers, have emerged as a critical component in cybersecurity.In this paper, we present a novel approach to creating realistic and interactive honeypot systems using Large Language Models (LLMs).By fine-tuning a pre-trained open-source language model on a diverse dataset of attacker-generated commands and responses, we developed a honeypot capable of sophisticated engagement with attackers.Our methodology involved several key steps: data collection and processing, prompt engineering, model selection, and supervised fine-tuning to optimize the model's performance.Evaluation through similarity metrics and live deployment demonstrated that our approach effectively generates accurate and informative responses.<span class='px-1 mx-1 bg-yellow-200'>The results highlight the potential of LLMs to revolutionize honeypot technology, providing cybersecurity professionals with a powerful tool to detect and analyze malicious activity, thereby enhancing overall security infrastructure. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.742</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08234v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Large Language Models still struggle in challenging scenarios that leverage structured data, complex reasoning, or tool usage.<span class='px-1 mx-1 bg-yellow-200'>In this paper, we propose Source2Synth: a new method that can be used for teaching LLMs new skills without relying on costly human annotations. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.751</span></span>Source2Synth takes as input a custom data source and produces synthetic data points with intermediate reasoning steps grounded in real-world sources.Source2Synth improves the dataset quality by discarding low-quality generations based on their answerability.We demonstrate the generality of this approach by applying it to two challenging domains: we test reasoning abilities in multi-hop question answering (MHQA), and tool usage in tabular question answering (TQA).Our method improves performance by 25.51% for TQA on WikiSQL and 22.57% for MHQA on HotPotQA compared to the fine-tuned baselines.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08239v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>People often capture memories through photos, screenshots, and videos.While existing AI-based tools enable querying this data using natural language, they mostly only support retrieving individual pieces of information like certain objects in photos and struggle with answering more complex queries that involve interpreting interconnected memories like event sequences.<span class='px-1 mx-1 bg-yellow-200'>We conducted a one-month diary study to collect realistic user queries and generated a taxonomy of necessary contextual information for integrating with captured memories. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.601</span></span>We then introduce OmniQuery, a novel system that is able to answer complex personal memory-related questions that require extracting and inferring contextual information.<span class='px-1 mx-1 bg-yellow-200'>OmniQuery augments single captured memories through integrating scattered contextual information from multiple interconnected memories, retrieves relevant memories, and uses a large language model (LLM) to comprehensive answers. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.689</span></span>In human evaluations, we show the effectiveness of OmniQuery with an accuracy of 71.5%, and it outperformed a conventional RAG system, winning or tying in 74.5% of the time.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08250v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-12</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>Large language models (LLMs) show remarkable potential to act as computer agents, enhancing human productivity and software accessibility in multi-modal tasks that require planning and reasoning. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.679</span></span>However, measuring agent performance in realistic environments remains a challenge since: (i) most benchmarks are limited to specific modalities or domains (e.g. text-only, web navigation, Q&A, coding) and (ii) full benchmark evaluations are slow (on order of magnitude of days) given the multi-step sequential nature of tasks.To address these challenges, we introduce the Windows Agent Arena: a reproducible, general environment focusing exclusively on the Windows operating system (OS) where agents can operate freely within a real Windows OS and use the same wide range of applications, tools, and web browsers available to human users when solving tasks.We adapt the OSWorld framework (Xie et al., 2024) to create 150+ diverse Windows tasks across representative domains that require agent abilities in planning, screen understanding, and tool usage.Our benchmark is scalable and can be seamlessly parallelized in Azure for a full benchmark evaluation in as little as 20 minutes.To demonstrate Windows Agent Arena's capabilities, we also introduce a new multi-modal agent, Navi.Our agent achieves a success rate of 19.5% in the Windows domain, compared to 74.5% performance of an unassisted human.Navi also demonstrates strong performance on another popular web-based benchmark, Mind2Web.We offer extensive quantitative and qualitative analysis of Navi's performance, and provide insights into the opportunities for future research in agent development and data generation using Windows Agent Arena.   Webpage: https://microsoft.github.io/WindowsAgentArena   Code: https://github.com/microsoft/WindowsAgentArena</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.08264v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td></td>
          <td>
            <h2 class="text-2xl tracking-tight pt-4 font-bold">Developer Research</h2>
          </td>
        </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-11</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Regulatory Requirements Engineering in Large Enterprises: An Interview Study on the European Accessibility Act
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Context: Regulations, such as the European Accessibility Act (EAA), impact the engineering of software products and services.<span class='px-1 mx-1 bg-yellow-200'>Managing that impact while providing meaningful inputs to development teams is one of the emerging requirements engineering (RE) challenges.    <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.647</span></span>Problem: Enterprises conduct Regulatory Impact Analysis (RIA) to consider the effects of regulations on software products offered and formulate requirements at an enterprise level.Despite its practical relevance, we are unaware of any studies on this large-scale regulatory RE process.   Methodology:We conducted an exploratory interview study of RIA in three large enterprises.We focused on how they conduct RIA, emphasizing cross-functional interactions, and using the EAA as an example.   Results: RIA, as a regulatory RE process, is conducted to address the needs of executive management and central functions.It involves coordination between different functions and levels of enterprise hierarchy.Enterprises use artifacts to support interpretation and communication of the results of RIA.Challenges to RIA are mainly related to the execution of such coordination and managing the knowledge involved.   Conclusion: RIA in large enterprises demands close coordination of multiple stakeholders and roles.Applying interpretation and compliance artifacts is one approach to support such coordination.However, there are no established practices for creating and managing such artifacts.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.07313v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-10</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Think-on-Process: Dynamic Process Generation for Collaborative Development of Multi-Agent System
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>Software development is a collaborative endeavor that requires individuals from different departments to work together in order to collectively develop a high-quality software system. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.719</span></span>In this context, people have begun to explore a method that leverages multi-agent systems based on LLMs to carry out software development.<span class='px-1 mx-1 bg-yellow-200'>However, existing research tends to rigidly fix the software development process in a framework in code form, thus failing to dynamically adjust the software development process in real-time to meet the more flexible and variable software environment. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.634</span></span>In this paper, we propose a dynamic process generation framework, named ToP (Think-on-Process).The core idea of ToP is to leverage experiential knowledge (i.e., process models) to guide LLMs in generating software development processes (i.e., instances).These instances will guide multi-agent in software development and employ a compiler to provide feedback on the development outcomes.Subsequently, we utilize heuristic algorithms to filter the instances and apply process mining algorithms to derive process model.Finally, the process model will be converted into text, formatted as prompts, to enhance the ability of LLMs to generate other instances.Experiments demonstrate that our framework ToP significantly enhances the dynamic process generation capability of the GPT-3.5 and GPT-4 for five categories of software development tasks.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.06568v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-09-09</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                A Novel Idea Generation Tool using a Structured Conversational AI (CAI) System
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p><span class='px-1 mx-1 bg-yellow-200'>This paper presents a novel conversational AI-enabled active ideation interface as a creative idea-generation tool to assist novice designers in mitigating the initial latency and ideation bottlenecks that are commonly observed. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.626</span></span>It is a dynamic, interactive, and contextually responsive approach, actively involving a large language model (LLM) from the domain of natural language processing (NLP) in artificial intelligence (AI) to produce multiple statements of potential ideas for different design problems.Integrating such AI models with ideation creates what we refer to as an Active Ideation scenario, which helps foster continuous dialogue-based interaction, context-sensitive conversation, and prolific idea generation.A pilot study was conducted with thirty novice designers to generate ideas for given problems using traditional methods and the new CAI-based interface.The key parameters of fluency, novelty, and variety were used to compare the outcomes qualitatively by a panel of experts.The findings demonstrated the effectiveness of the proposed tool for generating prolific, diverse and novel ideas.The interface was enhanced by incorporating a prompt-engineered structured dialogue style for each ideation stage to make it uniform and more convenient for the designers.The resulting responses of such a structured CAI interface were found to be more succinct and aligned towards the subsequent design stage, namely conceptualization.The paper thus established the rich potential of using Generative AI (Gen-AI) for the early ill-structured phase of the creative product design process.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2409.05747v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-08-28</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Software Solutions for Newcomers' Onboarding in Software Projects: A Systematic Literature Review
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>[Context] Newcomers joining an unfamiliar software project face numerous barriers; therefore, effective onboarding is essential to help them engage with the team and develop the behaviors, attitudes, and skills needed to excel in their roles.However, onboarding can be a lengthy, costly, and error-prone process.Software solutions can help mitigate these barriers and streamline the process without overloading senior members.[Objective] This study aims to identify the state-of-the-art software solutions for onboarding newcomers.[Method] We conducted a systematic literature review (SLR) to answer six research questions.[Results] We analyzed 32 studies about software solutions for onboarding newcomers and yielded several key findings: (1) a range of strategies exists, with recommendation systems being the most prevalent; (2) most solutions are web-based; (3) solutions target a variety of onboarding aspects, with a focus on process; (4) many onboarding barriers remain unaddressed by existing solutions; (5) laboratory experiments are the most commonly used method for evaluating these solutions; and (6) diversity and inclusion aspects primarily address experience level.[Conclusion] We shed light on current technological support and identify research opportunities to develop more inclusive software solutions for onboarding.<span class='px-1 mx-1 bg-yellow-200'>These insights may also guide practitioners in refining existing platforms and onboarding programs to promote smoother integration of newcomers into software projects. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.625</span></span></p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2408.15989v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-08-26</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Trust, but Verify: Evaluating Developer Behavior in Mitigating Security Vulnerabilities in Open-Source Software Projects
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>This study investigates vulnerabilities in dependencies of sampled open-source software (OSS) projects, the relationship between these and overall project security, and how developers' behaviors and practices influence their mitigation.Through analysis of OSS projects, we have identified common issues in outdated or unmaintained dependencies, including pointer dereferences and array bounds violations, that pose significant security risks.<span class='px-1 mx-1 bg-yellow-200'>We have also examined developer responses to formal verifier reports, noting a tendency to dismiss potential issues as false positives, which can lead to overlooked vulnerabilities. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.645</span></span>Our results suggest that reducing the number of direct dependencies and prioritizing well-established libraries with strong security records are effective strategies for enhancing the software security landscape.Notably, four vulnerabilities were fixed as a result of this study, demonstrating the effectiveness of our mitigation strategies.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2408.14273v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td class="inline-block">
            <p class='font-bold text-black px-2 mx-1 text-xs w-24'>2024-08-21</p>
          </td>
          <td>
            <div x-data="{open: false}">
              <span @click="open = ! open" class="hover:underline cursor-pointer decoration-2 decoration-green-600 text-gray-800 text-sm">
                Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests
              </span>
              <div x-show="open" x-collapse.duration.500ms class="text-sm text-gray-500 pt-2">
                <div class="text-center pt-2"></div>
                <p class="pt-2">
                  <p>Automated unit test generators, particularly search-based software testing tools like EvoSuite, are capable of generating tests with high coverage.Although these generators alleviate the burden of writing unit tests, they often pose challenges for software engineers in terms of understanding the generated tests.To address this, we introduce UTGen, which combines search-based software testing and large language models to enhance the understandability of automatically generated test cases.We achieve this enhancement through contextualizing test data, improving identifier naming, and adding descriptive comments.<span class='px-1 mx-1 bg-yellow-200'>Through a controlled experiment with 32 participants from both academia and industry, we investigate how the understandability of unit tests affects a software engineer's ability to perform bug-fixing tasks. <span style='font-size: 0.65rem;' class='text-purple-500 font-bold'>0.644</span></span>We selected bug-fixing to simulate a real-world scenario that emphasizes the importance of understandable test cases.We observe that participants working on assignments with UTGen test cases fix up to 33% more bugs and use up to 20% less time when compared to baseline test cases.From the post-test questionnaire, we gathered that participants found that enhanced test names, test data, and variable names improved their bug-fixing process.</p>
                </p>
              <p class="pb-2 pt-2 text-center">
                <a class="underline decoration-2 text-green-600 text-md pt-2" href='http://arxiv.org/abs/2408.11710v1' target="_blank">
                  link
                </a>
              </p>
            </div>
          </div>
        </td>
      </tr><tr>
          <td></td>
          <td>
            <h2 class="text-2xl tracking-tight pt-4 font-bold">Data Annotation Techniques</h2>
          </td>
        </tr></tbody>
  </table>
  <br><br>
</div>
</div>
<script>
  document.addEventListener("DOMContentLoaded", function() {
    renderMathInElement(document.body, {
      // customised options
      // • auto-render specific keys, e.g.:
      delimiters: [
      {left: '$$', right: '$$', display: true},
      {left: '$', right: '$', display: false},
      {left: '\\(', right: '\\)', display: false},
      {left: '\\[', right: '\\]', display: true}
      ],
      // • rendering keys, e.g.:
      throwOnError : false
    });
  });
</script>
</body>
</html>