Performance Analysis and Optimization of High Performance Computing/Storage Systems

The gap between CPU speeds and storage-system speeds continues to widen exponentially. Data-intensive computations may require hundreds of disks per CPU to fully utilize modern processors, and even workloads that were once CPU-bound are now becoming I/O-bound. We focus on finding and fixing I/O performance problems on high-performance computing systems by building a suite of tools to benchmark, trace, profile, analyze, and visualize file and storage systems.

This project includes several storage deduplication efforts.

Journal Articles:

# | Title | Formats | Published In | Date | Comments
1 | Cluster and Single-Node Analysis of Long-Term Deduplication Patterns | PDF, BibTeX | ACM Transactions on Storage (TOS) | May 2018 |
2 | vNFS: Maximizing NFS Performance with Compounds and Vectorized I/O | PDF, BibTeX | ACM Transactions on Storage (TOS) | Sep 2017 |
3 | Is NFSv4.1 Ready for Prime Time? | PDF, BibTeX | ;login: The USENIX Magazine | Jun 2015 |
4 | Don't Thrash: How to Cache Your Hash on Flash | PDF, BibTeX | The Proceedings of the VLDB Endowment (PVLDB) | Aug 2012 |

Conference and Workshop Papers:

# | Title | Formats | Published In | Date | Comments
1 | vNFS: Maximizing NFS Performance with Compounds and Vectorized I/O | PDF, BibTeX | 15th USENIX Conference on File and Storage Technologies (FAST 2017) | Feb 2017 | Nominated for best paper award
2 | A Long-Term User-Centric Analysis of Deduplication Patterns | PDF, BibTeX | 32nd IEEE Conference on Mass Storage Systems and Technologies (MSST 2016) | May 2016 |
3 | Using Hints to Improve Inline Block-Layer Deduplication | PDF, BibTeX | 14th USENIX Conference on File and Storage Technologies (FAST 2016) | Feb 2016 |
4 | Newer Is Sometimes Better: An Evaluation of NFSv4.1 | PDF, BibTeX | International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2015) | Jun 2015 |
5 | Dmdedup: Device-mapper Deduplication Target | PDF, BibTeX | 2014 Ottawa Linux Symposium | Jul 2014 |
6 | Don't Thrash: How to Cache Your Hash on Flash | PDF, BibTeX | 38th International Conference on Very Large Data Bases (VLDB '12) | Aug 2012 |
7 | Generating Realistic Datasets for Deduplication Analysis | PS, PDF, BibTeX | 2012 USENIX Annual Technical Conference (ATC 2012) | Jun 2012 |
8 | Extracting Flexible, Replayable Models from Large Block Traces | PS, PDF, BibTeX | 10th USENIX Conference on File and Storage Technologies (FAST 2012) | Feb 2012 |
9 | Don't Thrash: How to Cache Your Hash on Flash | PS, PDF, BibTeX | 3rd USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 2011) | Jun 2011 |
10 | Benchmarking File System Benchmarking: It *IS* Rocket Science | PS, PDF, BibTeX | 13th USENIX Workshop on Hot Topics in Operating Systems (HotOS XIII) | May 2011 |
11 | DARC: Dynamic Analysis of Root Causes of Latency Distributions | PS, PDF, BibTeX | International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2008) | Jun 2008 | Source code and benchmark information

Technical Reports:

# | Title | Formats | Published In | Date | Comments
1 | Design and Implementation of an Open-Source Deduplication Platform for Research | PDF, BibTeX | Stony Brook U. CS TechReport FSL-15-03 | Dec 2015 | Ph.D. Research Proficiency Exam (RPE)
2 | Linux NFSv4.1 Performance Under a Microscope | PDF, BibTeX | Stony Brook U. CS TechReport FSL-14-02 | Aug 2014 |
3 | A Context Aware Block Layer: The Case for Block Layer Deduplication | PDF, BibTeX | Stony Brook U. CS TechReport FSL-12-04 | May 2012 | M.S. Thesis

Current Students:

# | Name | Program | Member Since
1 | Umit Ibrahim Akgun | PhD | Sep 2017
2 | Tyler Estro | PhD | May 2018
3 | Wei Su | PhD | May 2019

Past Students:

# | Name | Program | Period | Current Location
1 | Zhen Cao | PhD | May 2014 - Jan 2019 | Software Engineer, Google, Network Infrastructure (Sunnyvale, CA)
2 | Ming Chen | PhD | May 2012 - Apr 2017 | Software Engineer, Google Cloud Platform (New York, NY)
3 | Nikolai Joukov | PhD | Jan 2004 - Dec 2006 | Research Staff Member, Storage and Data Services Research group, IBM T. J. Watson Research Center (Hawthorne, NY)
4 | Sonam Mandal | PhD | Jun 2013 - Dec 2016 |
5 | Vasily Tarasov | PhD | Jan 2008 - Nov 2013 | Research Staff Member, Scale-out Storage Software, IBM Research - Almaden (San Jose, CA)
6 | Avishay Traeger | PhD | Sep 2003 - Aug 2008 | R&D, Stratoscale (Herzeliya, Israel)
7 | Sun (Jason) Zhen | PhD | Nov 2014 - Dec 2016 | PhD candidate, National University of Defense Technology
8 | Ankit Aggarwal | MS | Jan 2018 - Dec 2018 | Software Engineer II, Self Driving, Uber (San Francisco, CA)
9 | Aashray Arora | MS | Sep 2014 - Dec 2015 | Member of Technical Staff, Core Data Path, Nutanix (San Jose, CA)
10 | Akshay Aurora | MS | Jan 2019 - Dec 2019 |
11 | Geetika Babu Bangera | MS | Jan 2017 - Dec 2017 | Member of Technical Staff, Software, NetApp, Inc. (Sunnyvale, CA)
12 | Akhilesh Chaganti | MS | Jan 2014 - May 2015 | Member of Technical Staff, Disaster Recovery Group, Nutanix (Seattle, WA)
13 | Arvind Chaudhary | MS | Sep 2014 - Dec 2015 | Member of Technical Staff, CNA group, VMware Inc. (Palo Alto, CA)
14 | Abhishek Gupta | MS | May 2014 - Dec 2015 | Member of Technical Staff, vSAN, VMware Inc. (Palo Alto, CA)
15 | Pragesh Jagnani | MS | Jan 2019 - Dec 2019 | Software Development Engineer, Amazon Selection and Catalog Systems, Amazon (Seattle, WA)
16 | Deepak Jain | MS | Sep 2012 - Dec 2013 | Member of Technical Staff, Project FVP - Engineering, PernixData Inc. (San Jose, CA)
17 | Mehul Jain | MS | Jan 2019 - Dec 2019 |
18 | Tushar Jain | MS | Jan 2017 - Dec 2017 | Software Development Engineer, Amazon Web Services (Seattle, WA)
19 | Farhaan Jalia | MS | Jan 2017 - Dec 2017 | Member of Technical Staff II, Cloud Native Group, VMware Inc. (Bellevue, WA)
20 | Aneesh Joshi | MS | Aug 2019 - May 2020 | Member of Technical Staff, Core Data Path, Nutanix, Inc. (San Jose, CA)
21 | Shobhit Khandelwal | MS | Jan 2019 - Dec 2019 |
22 | Koundinya Santhosh Kumar | MS | Sep 2010 - Dec 2011 | Senior Development Software Engineer, Advanced Software Development and Performance, SanDisk (Milpitas, CA)
23 | Noopur Anil Maheshwari | MS | Aug 2017 - Dec 2018 | Software Engineer, HPE (Nimble Storage) (Sunnyvale, CA)
24 | Amar Mudrankit | MS | Jan 2011 - May 2012 | Software Engineer, Advanced Development Group at Fusion-IO (San Jose, CA)
25 | Vithiya Muthukumar | MS | Jan 2016 - Dec 2016 | Software Engineer, Cisco Systems (San Jose, CA)
26 | Ritika Nevatia | MS | Sep 2018 - Dec 2019 | Software Engineer, iCloud, Apple Inc. (Seattle, WA)
27 | Dongju Ok | MS | Sep 2014 - May 2016 | Software Engineer, Application Team, Commvault Systems Inc. (Tinton Falls, NJ)
28 | Karthikeyani Palanisami | MS | May 2012 - Jun 2013 | Member of Technical Staff, Project MARS - Engineering, NetApp Inc. (Sunnyvale, CA)
29 | Nidhi Panpalia | MS | Jan 2017 - Dec 2017 | Development Engineer, AWS Lambda, Amazon (Seattle, WA)
30 | Dhanashri Patil | MS | Jan 2018 - Dec 2018 | Senior Software Engineer, Dell Technologies (Isilon) (Seattle, WA)
31 | Deepika Peringanji | MS | Jan 2016 - Dec 2016 | SDE 2, vSAN team, VMware Inc. (Palo Alto, CA)
32 | Dhivahar Perumal | MS | Sep 2018 - May 2019 | Software Engineer, Data Services Team (CASL), Nimble Storage - HPE (San Jose, CA)
33 | Vinothkumar Raja | MS | Sep 2016 - Dec 2017 | Software Engineer, Pure Storage Inc. (Mountain View, CA)
34 | Venkatakrishnan Rajagopalan | MS | Jan 2016 - Dec 2016 | Member of the Technical Staff, VMware Inc. (Palo Alto, CA)
35 | Arun Ramachandran | MS | Aug 2016 - Dec 2017 | Member of Technical Staff, Core Infrastructure, Nutanix (San Jose, CA)
36 | Hari Prasath Raman | MS | Jan 2016 - Dec 2016 | Software Engineer, Bloomberg (New York, NY)
37 | Vineeth Ramesh | MS | Jan 2018 - Dec 2018 | Software Engineer, Dialpad (San Francisco, CA)
38 | Rahul Rane | MS | Jan 2018 - Dec 2018 | Software Engineer, HPE (Nimble Storage) (Sunnyvale, CA)
39 | Prateek Roy | MS | Jan 2018 - Dec 2018 | Software Engineer, NVIDIA (Santa Clara, CA)
40 | Saish Sali | MS | Jan 2018 - Dec 2018 | Software Development Engineer II, Amazon (Sunnyvale, CA)
41 | Kunal Shah | MS | Jan 2017 - Dec 2017 | Member of Technical Staff, Cloud Operating Systems Business Unit - Lightwave, VMware Inc. (Bellevue, WA)
42 | Krapi Ravindra Shah | MS | Jan 2019 - Dec 2019 | Assistant VP, Data Platforms, Tradeweb Markets LLC. (Jersey City, NJ)
43 | Rushabh Shah | MS | Jan 2017 - Dec 2017 | Software Engineer, Facebook Inc. (Menlo Park, CA)
44 | Sagar Shah | MS | Jan 2017 - Dec 2017 | Associate Senior Engineer, Petuum OS, Petuum (Pittsburgh, PA)
45 | Mukul Sharma | MS | Aug 2016 - Dec 2017 | Member of the Technical Staff, Core Data Path, Nutanix (San Jose, CA)
46 | Varun Shastry | MS | Sep 2014 - Dec 2015 | Member of Technical Staff, Disaster Recovery Team, Nutanix Inc. (San Jose, CA)
47 | Siddesh Shinde | MS | Jan 2018 - Dec 2018 | Member of Technical Staff, Core Data Path, Cohesity Inc. (San Jose, CA)
48 | Gyumin Sim | MS | Jan 2010 - Dec 2010 | Software Engineer, Data Center Power Team, Google (Mountain View, CA)
49 | Swaminathan Sivaraman | MS | Jan 2017 - Dec 2017 | Software Development Engineer, Google Assistant, Google (Mountain View, CA)
50 | Nilesh Somani | MS | May 2018 - Dec 2019 |
51 | Jatin Sood | MS | Jan 2019 - Dec 2019 |
52 | Kumar Sourav | MS | May 2014 - Dec 2015 | Member of Technical Staff 2, UPIT (Next gen. snapshot technology) group, VMware Inc. (Palo Alto, CA)
53 | Aayush Sureka | MS | Sep 2018 - Dec 2019 |
54 | Ivan Deras Tabora | MS | Jan 2007 - Dec 2007 | Teacher, Computer Science, Universidad Tecnologica Centroamericana (San Pedro Sula, Cortes, Honduras)
55 | Sachin Tiwari | MS | Aug 2016 - Dec 2017 | Member of Technical Staff, Cloud Platform Infrastructure, VMware (Palo Alto, CA)
56 | Vivek Tiwari | MS | Sep 2015 - Dec 2015 | Software Engineer, LinkedIn (Sunnyvale, CA)
57 | Sagar Trehan | MS | Sep 2012 - Dec 2013 | Member of Technical Staff, CASL Performance Group - Engineering, Nimble Storage Inc. (San Jose, CA)
58 | Bharath Kumar Reddy Vangoor | MS | Aug 2015 - Dec 2016 | Software Engineer, Azure Storage, Microsoft (Pittsburgh, PA)
59 | Amrith Arunachalam | BS | May 2018 - Dec 2018 |
60 | Abraham Spitalny | BS | Jul 2019 - Dec 2019 |
61 | Yinuo Zhang | BS | Aug 2019 - May 2020 |
62 | Henry Nelson | HS | Sep 2015 - Aug 2017 | CS undergraduate at CMU

Sponsors:

# | Sponsor | Amount | Period | Role | Title
1 | NSF Computer and Network Systems (CNS) Core | $823,142 | 2019-2023 | PI | CNS Core: III: Medium: Collaborative Research: Optimizing and Understanding Large Parameter Spaces in Storage Systems
2 | NSF Formal Methods in the Field (FMitF) | $748,300 | 2019-2022 | PI | FMitF: Track I: NLP-Assisted Formal Verification of the NFS Distributed File System Protocol
3 | NSF CISE Research Infrastructure (CRI) | $129,867 | 2017-2020 | PI | Collaborative Research: CI-SUSTAIN: National File System Trace Repository
4 | Microsoft Corporation | $20,000 | 2016-2017 | Sole PI | Microsoft Azure Cloud Credits
5 | NSF Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) | $444,267 | 2013-2016 | Lead PI | BIGDATA: Small: DCM: Collaborative Research: An efficient, versatile, scalable, and portable storage system for scientific data containers
6 | NetApp Advanced Technology Group | $40,000 | 2011 | Sole PI | Dedup Workload Modeling, Synthetic Datasets, and Scalable Benchmarking
7 | NSF HECURA | $760,253 | 2006-2009 | Lead PI | File System Tracing, Replaying, Profiling, and Analysis on HEC Systems


(Last updated: Thu Sep 3 10:25:54 EDT 2020)