Viraj Thakkar | publications

2024

HotStorage
Can Modern LLMs Tune and Configure LSM-based Key-Value Stores?

Viraj Thakkar, Madhumitha Sukumar, Jiaxin Dai, Kaushiki Singh, and Zhichao Cao

In Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, Santa Clara, CA, USA, Jun 2024

Log-Structured-Merge tree-based Key-Value Stores (LSM-KVSs) are important data storage building blocks in modern IT infrastructure. However, tuning their performance involves configuring over 100 parameters, a task typically done manually or with limited parameters in auto-tuning mechanisms. This paper explores and answers the following question: can we leverage LLMs' understanding of the system and LSM-KVS components for unrestricted parameter-pool tuning of LSM-KVS? LLMs are trained on readily available LSM-KVS source code, research papers, and open materials, enabling the machines to have human-like understanding. We investigate integrating Large Language Models (LLMs) into an automated tuning framework for LSM-KVS to enhance the tuning capability and interactivity. Our framework utilizes LLMs to recommend tailored configurations with calibrated prompts based on hardware, system, and workload information. Initial results demonstrate up to 3X throughput improvements and up to a 9X reduction in p99 latency across various hardware and workloads compared to the out-of-box configuration for the LSM-KVS.
@inproceedings{10.1145/3655038.3665954,
  author = {Thakkar, Viraj and Sukumar, Madhumitha and Dai, Jiaxin and Singh, Kaushiki and Cao, Zhichao},
  title = {Can Modern LLMs Tune and Configure LSM-based Key-Value Stores?},
  year = {2024},
  month = jun,
  isbn = {9798400706301},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3655038.3665954},
  doi = {10.1145/3655038.3665954},
  booktitle = {Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems},
  pages = {116--123},
  numpages = {8},
  keywords = {Automatic Tuning and Configuration, LSM-KVS, Large Language Models},
  location = {Santa Clara, CA, USA},
  series = {HotStorage '24},
}
SIGMOD
CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated Infrastructure

Qiaolin Yu, Chang Guo, Jay Zhuang, Viraj Thakkar, Jianguo Wang, and Zhichao Cao

Proc. ACM Manag. Data, May 2024

Optimizing LSM-based Key-Value Stores (LSM-KVS) for disaggregated storage is essential to achieve better resource utilization, performance, and flexibility. Most of the existing studies focus on offloading the compaction to the storage nodes to mitigate the performance penalties caused by heavy network traffic between computing and storage. However, several critical issues are not addressed, including the strong dependency between offloaded compaction and LSM-KVS, resource load-balancing, compaction scheduling, and complex transient errors. To address the aforementioned issues and limitations, in this paper, we propose CaaS-LSM, a novel disaggregated LSM-KVS with a new idea of Compaction-as-a-Service. CaaS-LSM brings three key contributions. First, CaaS-LSM decouples the compaction from LSM-KVS and achieves stateless execution to ensure high flexibility and avoid coordination overhead with LSM-KVS. Second, CaaS-LSM introduces a performance- and resource-optimized control plane to guarantee better performance and resource utilization via an adaptive run-time scheduling and management strategy. Third, CaaS-LSM addresses different levels of transient and execution errors via sophisticated error-handling logic. We implement the prototype of CaaS-LSM based on RocksDB and evaluate it with different LSM-based distributed databases (Kvrocks and Nebula). In the storage disaggregated setup, CaaS-LSM achieves up to 8X throughput improvement and reduces the P99 latency by up to 98% compared with the conventional LSM-KVS, and achieves up to 61% improvement compared with state-of-the-art LSM-KVS optimized for disaggregated storage.
@article{10.1145/3654927,
  author = {Yu, Qiaolin and Guo, Chang and Zhuang, Jay and Thakkar, Viraj and Wang, Jianguo and Cao, Zhichao},
  title = {CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated Infrastructure},
  year = {2024},
  issue_date = {June 2024},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {2},
  number = {3},
  url = {https://doi.org/10.1145/3654927},
  doi = {10.1145/3654927},
  journal = {Proc. ACM Manag. Data},
  month = may,
  articleno = {124},
  numpages = {28},
  keywords = {LSM-tree, disaggregated infrastructure, key-value store},
}

2023

ASU

Optimizing Consistency and Performance Trade-off in Distributed Log-Structured Merge-Tree-based Key-Value Stores

Viraj Deven Thakkar

May 2023

@techreport{thakkar2023optimizing,
  title = {Optimizing Consistency and Performance Trade-off in Distributed Log-Structured Merge-Tree-based Key-Value Stores},
  author = {Thakkar, Viraj Deven},
  year = {2023},
  institution = {Arizona State University},
}

Wiley

FLASH: Web-Form’s Logical Analysis & Session Handling Automatic Form Classification and Filling on Surface and Dark Web

Ashwini Dalvi, Viraj Thakkar, Smit Moradiya, Aditya Vedpathak, Irfan Siddavatam, Faruk Kazi, and SG Bhirud

Robotic Process Automation, May 2023

@article{dalvi2023flash,
  title = {FLASH: Web-Form's Logical Analysis \& Session Handling Automatic Form Classification and Filling on Surface and Dark Web},
  author = {Dalvi, Ashwini and Thakkar, Viraj and Moradiya, Smit and Vedpathak, Aditya and Siddavatam, Irfan and Kazi, Faruk and Bhirud, SG},
  journal = {Robotic Process Automation},
  pages = {61--100},
  year = {2023},
  publisher = {Wiley Online Library},
}

2022

IGI
ML-enabled informed intervention for crowdsourcing-based optimization of medical resources

Irfan Siddavatam, Ashwini Dalvi, Abhishek Patel, Aditya Panchal, Aditya S. Vedpathak, and Viraj Thakkar

May 2022

It is said that every adversity presents the opportunity to grow. The current pandemic is a lesson to all healthcare infrastructure stakeholders to look at existing setups with an open mind. This chapter’s proposed solution offers technology assistance to manage patient data effectively and extends the hospital data management system’s capability to predict the upcoming need for healthcare resources. Further, the authors intend to supplement the proposed solution with crowdsourcing to meet hospital demand and supply for unprecedented medical emergencies. The proposed approach would demonstrate its need in the current pandemic scenario and prepare the healthcare infrastructure with a more streamlined and cooperative approach than before.
@incollection{siddavatam_ml-enabled_2022,
  type = {chapter},
  title = {ML-enabled informed intervention for crowdsourcing-based optimization of medical resources},
  copyright = {Access limited to members},
  url = {https://www.igi-global.com/chapter/ml-enabled-informed-intervention-for-crowdsourcing-based-optimization-of-medical-resources/288821},
  language = {en},
  urldate = {2021-11-17},
  booktitle = {Handbook of Research on Applied Intelligence for Health and Clinical Informatics},
  author = {Siddavatam, Irfan and Dalvi, Ashwini and Patel, Abhishek and Panchal, Aditya and Vedpathak, Aditya S. and Thakkar, Viraj},
  year = {2022},
  doi = {10.4018/978-1-7998-7709-7.ch017},
}
Springer
Recognizing child unsafe apps through user reviews on the google play store

Ashwini Dalvi, Irfan Siddavatam, Viraj Thakkar, Aditya Vedpathak, and Abhishek Patel

In Advanced Computing and Intelligent Technologies, May 2022

Google Play Store serves as a platform to host, download, and review Android applications. Many researchers have explored the user review section and worked on approaches that give developers a more effective pipeline for feedback on application issues and praised features, demonstrating the section's abundance of information. This work uses the same data to attempt a novel use case: determining child-unsafe apps on Google Play Store. User reviews are collected using a crawler and categorized for selected keywords relating to child, media, and India. Since Google Play Store does not provide a definitive number of downloads, this work attempts to mitigate this challenge by instead calculating the user density for an application. The user density helps establish the engagement users have with an application and is calculated as the difference in the timestamps of the most and least recent reviews divided by the sum of the total reviews and their upvotes for an application. 60,620 reviews from 1,600 applications were extracted to validate the proposed concept. This concept has proved effective in recognizing applications that present child-unsafe content while also offering a novel concept of calculating user density.
@inproceedings{dalvi_recognizing_2022,
  address = {Singapore},
  series = {Lecture {Notes} in {Networks} and {Systems}},
  title = {Recognizing child unsafe apps through user reviews on the google play store},
  isbn = {9789811621642},
  doi = {10.1007/978-981-16-2164-2_9},
  language = {en},
  booktitle = {Advanced {Computing} and {Intelligent} {Technologies}},
  publisher = {Springer},
  author = {Dalvi, Ashwini and Siddavatam, Irfan and Thakkar, Viraj and Vedpathak, Aditya and Patel, Abhishek},
  editor = {Bianchini, Monica and Piuri, Vincenzo and Das, Sanjoy and Shaw, Rabindra Nath},
  year = {2022},
  keywords = {User review, Google play store, Content classification},
  pages = {111--120},
}

2021

Elsevier
Explainability using decision trees and monte carlo simulations

Irfan Siddavatam, Ashwini Dalvi, Viraj Thakkar, Aditya Vedpathak, Smit Moradiya, and Apoorva Jain

May 2021

The prominence of AI algorithms has reached new heights in today's world. Algorithms are implemented in every aspect of our lives with the primary intention of improving them. However, the workings of most of these AI systems are not entirely understood, and this causes a problem: since the models are not understood, researchers do not know how to manually improve them, creating an artificial ceiling. Understanding these algorithms is the key to realizing how these machines think, and to not only learning from them but also teaching them to understand better. The primary idea put forward in this paper is to explain a black-box model using a mimicking simulation, rather than the usual calculative approaches: a decision tree is designed to mimic the AI so that its behavior can be understood. The paper proposes a decision-tree-based approach along with randomization using Monte Carlo simulations for a more precise simulation of the black box.
@techreport{siddavatam_explainability_2021,
  address = {Rochester, NY},
  type = {{SSRN} {Scholarly} {Paper}},
  title = {Explainability using decision trees and monte carlo simulations},
  url = {https://papers.ssrn.com/abstract=3868707},
  language = {en},
  number = {ID 3868707},
  urldate = {2021-11-17},
  institution = {Social Science Research Network},
  author = {Siddavatam, Irfan and Dalvi, Ashwini and Thakkar, Viraj and Vedpathak, Aditya and Moradiya, Smit and Jain, Apoorva},
  month = may,
  year = {2021},
  keywords = {Artificial Intelligence, Decision Tree, XAI, Monte Carlo Simulation},
}
IEEE
Link Harvesting on the Dark Web

Ashwini Dalvi, Irfan Siddavatam, Viraj Thakkar, Apoorva Jain, Faruk Kazi, and Sunil Bhirud

In 2021 IEEE Bombay Section Signature Conference (IBSSC), Nov 2021

In this information age, web crawling on the internet is a prime source for data collection. With the surface web already dominated by giants like Google and Microsoft, much attention has been on the Dark Web. While research on crawling approaches is generally available, a considerable gap is present for URL extraction on the dark web. Most literature uses the regular-expression methodology or built-in parsers; the problem with these methods is the higher number of false positives generated on the Dark Web, which makes the crawler less efficient. This paper proposes a dedicated-parser methodology for extracting URLs from the dark web, which, when compared, proves to be better than the regular-expression methodology. Factors that make link harvesting on the Dark Web a challenge are discussed in the paper.
@inproceedings{dalvi_link_2021,
  title = {Link {Harvesting} on the {Dark} {Web}},
  doi = {10.1109/IBSSC53889.2021.9673428},
  booktitle = {2021 {IEEE} {Bombay} {Section} {Signature} {Conference} ({IBSSC})},
  author = {Dalvi, Ashwini and Siddavatam, Irfan and Thakkar, Viraj and Jain, Apoorva and Kazi, Faruk and Bhirud, Sunil},
  month = nov,
  year = {2021},
  keywords = {Uniform resource locators, Text recognition, IEEE Sections, Crawlers, Web pages, Data collection, Information age, Hyperlink Extraction, Dark Web, Web Scraping, Link Harvesting},
  pages = {1--5},
}