The amount of data available to researchers and practitioners is growing exponentially, and nowhere is this more apparent than in network data: massive graphs such as web crawl graphs, social networks, and biological networks are readily available for analysis. This data is incredibly valuable to companies, computational sociologists, criminologists, security specialists, and computational biologists, who need to compute small but interesting features to analyze these networks and derive meaning from them. For truly massive networks, which cannot be stored on a single machine, efficient algorithms are critical. However, existing software for their analysis is still in the early stages of development. To date, only simple techniques have been used to design algorithms that run across many machines, and an interesting open question is how to increase efficiency in this setting. An important tool used to jumpstart these analyses is graph partitioning, which can help evenly divide data (or computation) across many machines. However, little work has been done to investigate high-quality graph partitioning of networks that do not fit on a single machine.
Together, we will design and build state-of-the-art software to partition truly massive networks—networks that cannot fit into memory on a single machine. We will first investigate standard techniques such as recursive bipartitioning, label propagation, and local search, as well as lesser-known techniques such as graph reductions. The software that we write together will be released as an open-source package that you can include in your portfolio and show potential employers.
COSC 101, COSC 102; Intermediate proficiency in C/C++ or Java programming languages.
Familiarity with algorithms; preference for students who have taken COSC 302.