Log-Structured Merge-tree-based Key-Value Stores (LSM-KVSs) are important data storage building blocks in modern IT infrastructure. However, tuning their performance involves configuring over 100 parameters, a task typically done manually or by auto-tuning mechanisms that cover only a limited set of parameters. This paper explores and answers the following question: can we leverage an LLM’s understanding of the system and of LSM-KVS components to tune an LSM-KVS over an unrestricted parameter pool? LLMs are trained on readily available LSM-KVS source code, research papers, and open materials, which gives them a human-like understanding of these systems. We investigate integrating Large Language Models (LLMs) into an automated tuning framework for LSM-KVS to enhance its tuning capability and interactivity. Our framework uses LLMs to recommend tailored configurations with calibrated prompts based on hardware, system, and workload information. Initial results demonstrate up to 3X throughput improvement and up to 9X reduction in p99 latency across various hardware and workloads compared to the out-of-the-box configuration of the LSM-KVS.
SIGMOD
CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated Infrastructure
Qiaolin Yu, Chang Guo, Jay Zhuang, and 3 more authors
Optimizing LSM-based Key-Value Stores (LSM-KVS) for disaggregated storage is essential to achieve better resource utilization, performance, and flexibility. Most existing studies focus on offloading compaction to the storage nodes to mitigate the performance penalties caused by heavy network traffic between compute and storage. However, several critical issues remain unaddressed, including the strong dependency between offloaded compaction and the LSM-KVS, resource load-balancing, compaction scheduling, and complex transient errors. To address these issues and limitations, we propose CaaS-LSM, a novel disaggregated LSM-KVS built on the new idea of Compaction-as-a-Service. CaaS-LSM brings three key contributions. First, CaaS-LSM decouples compaction from the LSM-KVS and achieves stateless execution to ensure high flexibility and avoid coordination overhead with the LSM-KVS. Second, CaaS-LSM introduces a performance- and resource-optimized control plane to guarantee better performance and resource utilization via an adaptive run-time scheduling and management strategy. Third, CaaS-LSM addresses different levels of transient and execution errors via sophisticated error-handling logic. We implement the prototype of CaaS-LSM based on RocksDB and evaluate it with different LSM-based distributed databases (Kvrocks and Nebula). In the storage-disaggregated setup, CaaS-LSM achieves up to 8X throughput improvement and reduces P99 latency by up to 98% compared with the conventional LSM-KVS, and achieves up to 61% improvement compared with the state-of-the-art LSM-KVS optimized for disaggregated storage.
2023
ASU
Optimizing Consistency and Performance Trade-off in Distributed Log-Structured Merge-Tree-based Key-Value Stores
It is said that every adversity presents the opportunity to grow. The current pandemic is a lesson to all healthcare infrastructure stakeholders to look at existing setups with an open mind. This chapter’s proposed solution offers technology assistance to manage patient data effectively and extends the hospital data management system’s capability to predict the upcoming need for healthcare resources. Further, the authors intend to supplement the proposed solution with crowdsourcing to meet hospital demand and supply for unprecedented medical emergencies. The proposed approach would demonstrate its need in the current pandemic scenario and prepare the healthcare infrastructure with a more streamlined and cooperative approach than before.
Springer
Recognizing child unsafe apps through user reviews on the google play store
Ashwini Dalvi, Irfan Siddavatam, Viraj Thakkar, and 2 more authors
In Advanced Computing and Intelligent Technologies, May 2022
Google Play Store serves as a platform to host, download, and review Android applications. Many researchers have explored the user review section and proposed approaches that give developers a more effective feedback pipeline on application issues and praised features, which demonstrates how much information the section holds. This work uses the same data for a novel use case: identifying child-unsafe apps on Google Play Store. User reviews are collected using a crawler and categorized by selected keywords relating to children, media, and India. Since Google Play Store does not provide a definitive number of downloads, this work mitigates that limitation by calculating the user density of an application instead. The user density helps establish how engaged users are with an application. It is calculated as the difference between the timestamps of the most recent and least recent reviews, divided by the sum of the total number of reviews and their upvotes for the application. To validate the proposed concept, 60,620 reviews from 1,600 applications were extracted. The concept has proved effective in recognizing applications that present child-unsafe content, while also offering a novel way of calculating user density.
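The user density calculation described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `user_density`, the choice of days as the time unit, and the sample data are all assumptions made here.

```python
from datetime import datetime


def user_density(review_times, upvotes):
    """Sketch of the user density ratio described in the abstract.

    Numerator: time between the least recent and most recent review.
    Denominator: number of reviews plus the total upvotes on them.
    Expressing the time span in days is an assumption of this sketch.
    """
    if not review_times:
        raise ValueError("at least one review is required")
    span = max(review_times) - min(review_times)
    span_days = span.total_seconds() / 86400
    engagement = len(review_times) + sum(upvotes)
    return span_days / engagement


# Hypothetical data: three reviews over 20 days with 7 upvotes in total.
times = [datetime(2021, 1, 1), datetime(2021, 1, 11), datetime(2021, 1, 21)]
density = user_density(times, upvotes=[2, 0, 5])  # 20 days / (3 + 7) = 2.0
```

Under this reading, a smaller value means more reviews and upvotes packed into the same period.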
2021
Elsevier
Explainability using decision trees and monte carlo simulations
Irfan Siddavatam, Ashwini Dalvi, Viraj Thakkar, and 3 more authors
The prominence of AI algorithms has reached new heights in today's world. Algorithms are implemented in every aspect of our lives with the primary intention of improving them. However, the workings of most of these AI systems are not entirely understood, and this is the problem. Because the models are not understood, researchers do not know how to improve them manually, which creates an artificial ceiling. Understanding these algorithms is the key to realizing how these machines think, so that we can not only learn from them but also teach them to understand better. The primary idea put forward in this paper is to explain a black-box model using a mimicking simulation rather than the usual calculative approaches: a decision tree is designed to mimic the AI, and randomization through Monte Carlo simulations makes the simulation of the black box more precise.
IEEE
Link Harvesting on the Dark Web
Ashwini Dalvi, Irfan Siddavatam, Viraj Thakkar, and 3 more authors
In 2021 IEEE Bombay Section Signature Conference (IBSSC), Nov 2021
In this information age, web crawling is a prime source for data collection. With the surface web already dominated by giants like Google and Microsoft, much attention has turned to the Dark Web. While research on crawling approaches is generally available, a considerable gap remains for URL extraction on the Dark Web. Most of the literature uses regular expressions or built-in parsers, and these methods generate a higher number of false positives on the Dark Web, which makes the crawler less efficient. This paper proposes a dedicated-parser methodology for extracting URLs from the Dark Web, which in comparison proves to be better than the regular-expression methodology. The paper also discusses the factors that make link harvesting on the Dark Web a challenge.
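The contrast between regular-expression extraction and a parser-based approach can be illustrated with a small sketch. It is not the dedicated parser from the paper, whose design is not described in this abstract. The class `AnchorCollector`, the pattern `NAIVE_URL`, and the sample page are hypothetical, and the sketch relies only on Python's standard `html.parser` and `urllib.parse` modules.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# A naive pattern that matches anything URL-shaped anywhere in the page.
NAIVE_URL = re.compile(r"https?://[^\s\"'<>]+")


class AnchorCollector(HTMLParser):
    """Collects href values of <a> tags whose host ends in .onion."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                host = urlsplit(value).hostname or ""
                if host.endswith(".onion"):
                    self.links.append(value)


def harvest(html):
    """Return .onion links found in anchor tags of the given HTML."""
    collector = AnchorCollector()
    collector.feed(html)
    return collector.links


# Hypothetical page: one real anchor, one URL in body text, one in a comment.
page = (
    "<p>See http://example.onion.fake/page in text</p>"
    '<a href="http://abcdefghij234567.onion/index">x</a>'
    "<!-- http://commented.onion/ -->"
)

regex_hits = NAIVE_URL.findall(page)  # also picks up the text and comment URLs
parsed_links = harvest(page)          # only the anchor's href
```

On this sample the pattern returns three candidates while the parser returns only the anchor link, which is the kind of false-positive gap the abstract refers to.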