DIMACS is very proud of the achievements of the 2019 DIMACS REU students during their summer research experience. This page includes the abstracts of the final papers the students submitted to the program, summarizing their summer research.
Abstract: Traditionally, coding has been associated with error correction of data transmitted across a noisy channel. Recently, coding has been applied to increase fault tolerance in distributed storage systems. Data storage is extremely important to companies such as Amazon and Google, who wish to deliver content efficiently and in a timely manner. However, an aspect that is often neglected is understanding how many users the system can support. This is the goal of the service capacity problem. The problem has been studied in a continuous setting; our goal, however, is to study some of the connections of the service capacity problem to problems in graph theory and combinatorial optimization. Overall, these connections may give us insight into a code's load-balancing properties in distributed storage systems.
Abstract: Ephemeral messaging is messaging to a public phone number or email account, where all messages are publicly displayed on a website but disappear after a short period of time. We have collected and analyzed over 200,000 text and email messages to explore the uses of ephemeral messages and the security and privacy issues that they present. Unlike private and permanent messages, very few ephemeral messages are part of personal conversations, and most of them contain codes to verify accounts. From these messages, we find that many companies may have a bug or bias in their random number generators, leading to the reuse of verification codes. We also find that many messages contain very specific and personal information that strangers may use to conduct social engineering or access personal, private accounts.
Abstract: Graphs are everywhere and are growing increasingly dense with data. Understanding the data they contain requires a summarization process to distill the information down to a human-digestible amount. When summarizing text, translating how humans build connections and grasp the semantic meaning of phrases into something a computer can do is far from trivial. Some methods compute scores for each word based on how often it occurs in a given text; others compute word vectors to determine similarity between words. However, none of these processes works well on its own to accurately summarize text. My results show that combining techniques can produce promising summarized representations of information, and they demonstrate the potential of finding the right combination of techniques. I anticipate that my process will be a starting point from which more sophisticated combinations of methods can develop.
Abstract: During pancreatic tumor progression, tumor cells typically disseminate from the primary site to other organs, including the liver. These disseminated tumor cells may remain dormant for months, years, or decades before entering metastasis. Single-cell research on dormant tumor cells and tumor cells can uncover how gene expression and copy numbers vary within these cell populations. Comparing single cells from both pancreatic tumor and dormant liver cells, we distinguished two distinct clusters/cell types. One cluster consisted of tumor cells with some dormant tumor cells mixed in; these dormant tumor cells are most likely tumor-like cells. The other cluster consisted only of dormant tumor cells, which means it may be a group of normal cells or a subpopulation of tumor cells. In addition, a copy number analysis was run for each sample and showed that, in comparison to a reference genome, the tumor and dormant cells had few differences in copy number. These results demonstrate that the tumor and dormant tumor cells are very similar in gene expression, and additional analysis is needed on the small changes in gene expression that account for their differences. Furthermore, if specific expressed genes are responsible for the dormancy of a tumor, anti-cancer drugs could be developed that regulate the gene expression of a tumor cell to resemble that of a dormant cell.
Abstract: Ribosomes are present in all cells, and are the site of protein synthesis. These molecules are therefore crucial to all biological functions in which proteins play a role. The structures of ribosomes from many organisms have been solved to a high resolution, giving a near complete description of their complex geometry. However, the functions of many regions of the ribosome remain unknown, as does the relationship between structure and function of certain regions. We study the geometry of a motif found near the point at which peptides exit the ribosome. In particular, we study the writhing number of this motif, and we show that this region of the ribosome differs between prokaryotes and eukaryotes, as well as between different species.
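The writhing number mentioned above can be approximated numerically by discretizing the Gauss double integral over a polygonal model of the curve. The sketch below is a minimal illustration of that computation, not the analysis pipeline used in the project; the curve data and discretization choices here are hypothetical.

```python
import numpy as np

def writhe(points):
    """Approximate the writhing number of a closed polygonal curve in 3D.

    Discretizes the Gauss double integral over pairs of edges, evaluating
    the integrand at edge midpoints; `points` is an (n, 3) array of
    vertices, with the curve closed from the last vertex back to the first.
    """
    n = len(points)
    segs = np.roll(points, -1, axis=0) - points   # edge vectors
    mids = points + 0.5 * segs                    # edge midpoints
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = mids[i] - mids[j]
            d = np.linalg.norm(r)
            if d < 1e-12:
                continue
            total += np.dot(r, np.cross(segs[i], segs[j])) / d**3
    return total / (4.0 * np.pi)

# Sanity check on a planar curve: any curve lying in a plane has writhe 0,
# since each cross product is normal to the plane while r lies in it.
theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta), np.zeros(40)], axis=1)
flat_wr = writhe(circle)
```

For a real ribosomal motif one would feed in the atomic backbone coordinates of the region in question; the midpoint rule above is the simplest discretization and converges as the polygonal model is refined.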
Abstract: Sea level change is an important issue that will impact millions of people living on United States coastlines by the end of this century (Sweet et al., 2017). This project seeks to develop a model of sea level variability that makes more appropriate predictions with respect to its different components. Using data collected from tide gauge stations located along the United States coastlines, we created a model that predicts monthly local sea level variability. This was accomplished using Gaussian Process Modeling, a machine learning technique that allows us to capture the different components of sea level change: the trend, the seasonal cycle, and interannual variability. This method of modeling also allows us to create more accurate predictions, which leads to a more compelling visualization of sea level change than simple linear regression. Analysis with our model allows for comparisons of different aspects of sea level variability among the stations.
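The component decomposition described above is commonly expressed as a sum of kernels in a Gaussian process. The sketch below illustrates that idea on synthetic monthly data: a smooth RBF kernel for the long-term trend plus a periodic kernel for the seasonal cycle. The data, kernel choices, and hyperparameters are all hypothetical, not those of the project's model.

```python
import numpy as np

def rbf(t1, t2, amp, length):
    """Squared-exponential kernel: smooth long-term trend component."""
    d = t1[:, None] - t2[None, :]
    return amp**2 * np.exp(-0.5 * (d / length) ** 2)

def periodic(t1, t2, amp, length, period):
    """Exp-sine-squared kernel: seasonal cycle with the given period."""
    d = np.abs(t1[:, None] - t2[None, :])
    return amp**2 * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length**2)

def gp_posterior_mean(t_train, y, t_test, noise=0.05):
    """Posterior mean of a GP whose kernel is a sum of the components above."""
    k = lambda a, b: rbf(a, b, 1.0, 40.0) + periodic(a, b, 0.5, 1.0, 12.0)
    K = k(t_train, t_train) + noise**2 * np.eye(len(t_train))
    return k(t_test, t_train) @ np.linalg.solve(K, y)

# Synthetic 10-year monthly record: linear trend + annual cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(120.0)
y = 0.003 * t + 0.5 * np.sin(2 * np.pi * t / 12) + 0.05 * rng.standard_normal(120)
mu = gp_posterior_mean(t, y, t)   # in-sample posterior mean
```

Because the kernel is a sum, the fitted function can be decomposed back into its trend and seasonal parts by evaluating each kernel component separately against the same weight vector, which is what makes this approach attractive for visualizing sea level components.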
Abstract: Digital forensics is a branch of forensic science involving the recovery and investigation of data from digital devices ("Law Technology" 2018). The increase in cyberattacks and the widespread use of technology across the nation has made digital forensics an important component of the criminal justice system. To help avoid challenges when recovering data, digital forensics professionals should be properly trained and receive relevant certifications. The purpose of this project was to analyze digital forensics training and certification requirements for the Department of Homeland Security investigative units and State and Local Law Enforcement, work with the Federal Law Enforcement Training Center (FLETC) to identify opportunities and gaps in digital forensics training, and recommend digital forensics training and certification pathways to standardize training and certification across all of Homeland Security. Two analyses were completed regarding the cybersecurity, digital forensics, cyber forensics, and computer forensics fields to address this problem. The first was a labor analysis focusing on the number of job openings, job titles, certifications, specialized skills, and National Institute of Standards and Technology (NIST) skills. The second was a course training analysis, which includes a comparison of the cost, proficiency level, duration, investigation step applicability, forensic category, and delivery method of each course. This is an ongoing project, but the discoveries from these analyses may contribute to the Criminal Investigations Network Analysis (CINA) project team's research on certification and training requirements for the Department of Homeland Security, State and Local Law Enforcement, and FLETC.
Abstract: Our work focused on a variant of the circuit minimization problem (MCSP), denoted MKTP, which studies resource-bounded Kolmogorov complexity in place of circuit size. These problems have gained attention as promising candidates for NP-intermediate problems. We show that MKTP and Graph Isomorphism (GI) are hard for the class DET via nonuniform projections. Previous results proved hardness under logspace reductions and nonuniform NC0 reductions; our paper strengthens these results.
Abstract: We studied a new method for simulating T(n)-time-bounded Turing machines by unbounded circuits with low depth and size polynomial in T. Our construction offers lower depth circuits than the ones by Ryan Williams in the case that the total number of reversals of every tape head is less than O(T/log T). This continues a research direction of Borodin to study the connections between simultaneous Turing machine time and space complexity and simultaneous circuit size and depth complexity, which is heavily connected to the research of Hopcroft, Paul, and Valiant on time and space. Finally, reducing the depth of circuits for Turing machines has applications to parallelizing sequential computations, as discussed in Ryan Williams' aforementioned paper.
Abstract: We studied the Glauber Dynamics for the ferromagnetic Potts Model. We attempt to prove that the initial coloring of all blue has the highest probability of ending all blue for any fixed number of updates. For two colors, we provide a solution. For three colors, we prove the result for a restricted class of graphs, and disprove the existence of a monotone coupling for many other graphs.
Abstract: Phylogenetic trees have been used in biology to graphically represent hierarchical relationships between genes. Many methods used to construct these phylogenetic trees produce trees that are too large for easy visualization or statistical analysis. Hence, reducing the dimensionality of large phylogenetic trees is key to better understanding large data sets. We study which methods are best suited for tree dimensionality reduction and how we can use those methods to reconstruct trees that suit our data set of influenza from 1993-2017. Through analyzing the topology of reduced trees, we concluded that mutation rate and reassortment reduce the effectiveness of vaccines, resulting in individuals being unprotected against the new strains in circulation. Thus, as vaccine effectiveness increases, branched evolution in influenza decreases. Most importantly, the methods we used are beneficial in visualizing and analyzing any type of clonal genomic data.
Abstract: Our work focuses on building preparedness and resilience within vulnerable communities, specifically religious communities and houses of worship, against violent extremism. In today's society, the threat of mass casualty attacks and targeted violence is a global issue, with a notable increase in anti-Islamic and anti-Semitic attacks, as well as an increase in attacks by white nationalists. Many houses of worship have been targets of these attacks with numerous lives lost and many others left physically and emotionally impaired. With anti-Semitic and anti-Islamic incidents occurring globally, it is important for houses of worship to have the tools and guidance to prepare for and prevent further damage; however, there is no widely available, comprehensive resource of best practices or methods for religious communities. The R.E.S.I.L.I.E.N.C.E. model was created as a basis for our study. We researched case studies that highlighted best practices that can educate vulnerable communities about resources and methods to better protect themselves. We have found many strategies to do so, such as building connections between the public and enlisted guardians, sharing information through constant communications, implementing the best practices and plans for a situation, providing sufficient resources for said plans, and neutralizing negative mindsets through encouragement and empowerment. Previously, these strategies were not compiled into a single resource that would be available to the public and vulnerable communities, but with this research, that can change. There are many roles and responsibilities community members can assume to build readiness. While analyzing these roles and responsibilities, we also found many exemplary ways to utilize the aforementioned strategies and build resiliency against any form of violence within any vulnerable community.
Abstract: During the 3D printing process, there is a risk of defects such as pores being formed in the part body. These defects weaken the part, so there is a need for methods to predict the presence of these pores quickly during the printing process to take rapid corrective action. We have developed a deep learning model for this purpose, which uses thermal images of the melt pool and part body to generate a prediction. Two separate convolutional neural network classifiers take in input from nearby time steps and produce two outputs, which are then synthesized via an adaptive decision-level fusion process. The combined program offers an initial accuracy between 90% and 93% with minimal computational expense, allowing it to be used as a heuristic to determine problem cases for further observation via interpolative boundary-finding methods.
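Decision-level fusion of two classifiers, as described above, can take many forms; one simple adaptive variant weights each classifier's class-probability vector by its own confidence. The sketch below is purely illustrative and is not necessarily the fusion rule used in the project.

```python
import numpy as np

def adaptive_fuse(p1, p2):
    """Confidence-weighted decision-level fusion of two classifiers.

    Each classifier contributes its class-probability vector, weighted by
    its own confidence (its maximum predicted probability); the weighted
    sum is renormalized to a probability distribution. More confident
    classifiers thus dominate the fused decision.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    w1, w2 = p1.max(), p2.max()          # per-classifier confidence weights
    fused = (w1 * p1 + w2 * p2) / (w1 + w2)
    return fused / fused.sum()
```

In a porosity-detection setting, `p1` and `p2` would be the softmax outputs of the two convolutional networks (melt pool and part body) at nearby time steps, and the fused vector's argmax would be the defect/no-defect call.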
Abstract: The use of technology has spread across many industries, including the law enforcement sector. An example of how technology applies in the Department of Homeland Security (DHS) and local law enforcement services is the use of digital forensics to collect and present evidence in court proceedings. Such use of technology has improved the storage and accessibility of critical data in the law enforcement sector. However, digital forensics also has some downsides, such as keeping pace with changing laws and technology. As a result, homeland security and law enforcement personnel require continuous training and certification to enhance their skills in collecting digital evidence and protecting the public.
Abstract: We consider the problem of maintaining a maximum weight matching in the online setting with small recourse. The servers are known in advance, and clients come in one at a time with all of their weighted edges. We consider doing this both exactly and approximately. In the exact case, we show an upper bound that differs from the lower bound by only logarithmic factors. The upper bound is proved by a reduction to the problem in the unweighted setting. We also address solving the problem approximately and show that we can use O(1) amortized recourse to maintain a (2 + ε)-approximate solution.
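To convey the flavor of recourse in online matching, the sketch below implements a simple greedy swap rule: an arriving client takes a server only if its edge weight beats the currently matched edge by a constant factor, evicting that edge. This is an illustrative toy, not the algorithm analyzed in the paper, and the swap threshold and data layout are hypothetical.

```python
def online_matching(arrivals, factor=2.0):
    """Greedy online matching with a constant-factor swap rule (sketch).

    `arrivals` is a list of (client, [(server, weight), ...]) pairs, one
    per arriving client. A client claims the heaviest server edge whose
    weight exceeds `factor` times the weight of the edge currently
    occupying that server, evicting the previous edge. Each swap is one
    unit of recourse: a change to an earlier matching decision.
    """
    matched = {}  # server -> (client, weight)
    for client, edges in arrivals:
        best = None
        for server, w in edges:
            current_w = matched.get(server, (None, 0.0))[1]
            if w > factor * current_w and (best is None or w > best[1]):
                best = (server, w)
        if best is not None:
            matched[best[0]] = (client, best[1])
    return matched
```

For example, with servers A and B, a client holding a weight-3 edge to A will evict an earlier weight-1 edge (since 3 > 2·1), while a weight-1.5 edge would be rejected; the factor controls the trade-off between matching weight and the number of swaps.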
Abstract: The field of neuroscience has long been challenged in quantifying the nature of the neurological disorder of schizophrenia. With the recent introduction of computer modeling, researchers have been able to more formally examine the mechanisms which have been proposed to explain the symptoms and behaviors found in those with the disorder. A variety of biophysical models, often centered around certain neurotransmitter systems such as the dopamine or glutamate systems, have been produced and studied over the past few decades. While these models do often align with certain symptoms found in individuals with schizophrenia, their predictions often contradict certain other observed symptoms or behaviors. This offers a challenge to researchers to either find a different holistic representation of the entire disorder or focus on a particular behavior found frequently in the disorder and explain it in a way that corroborates the relevant preexisting scientific literature. In this work, we opt for the latter by examining the proposed mechanism of apical amplification and its potential role in schizophrenia through the use of a dendritic compartment model architecture. We demonstrate that a neuron model with dendritic compartments is able to mimic the behavior of a neurotypical population in a visual classification task involving distinguishing a contour made up of specifically oriented gratings from a surround of randomly oriented gratings. This neuron model utilizes apical amplification to learn and replicate when individuals in the target population are able to distinguish the contour from the surround and when they are not. This suggests that apical amplification is a plausible mechanism for understanding the underlying neuronal mechanisms used in visual classification tasks. 
We hope to further this hypothesis in later research by perturbing the network parameters trained on the neurotypical population to then mimic the behavior of the population of individuals with schizophrenia. The new parameters of the model and their distinct configuration in contrast to that of the neurotypical population should provide a biologically plausible insight into whether aberrant functioning of the apical compartment is a reasonable hypothesis for explaining the impairments observed in individuals with schizophrenia in visual classification tasks.
Abstract: We reviewed the work of Kiessling, Lienert and Tahvildar-Zadeh on relativistic interacting dynamics of a two-body quantum-mechanical system consisting of one electron and one photon in one space dimension. We conducted numerical experiments to explore the role of various parameters in the theory and their influence on the trajectories of the two particles, and we studied the possibility of applying this theory to the analysis of Compton scattering.
Abstract: In Razborov's 1995 paper "Unprovability of Lower Bounds on Circuit Size in Certain Fragments of Bounded Arithmetic", we are told that a certain connection exists between communication complexity and circuit complexity: if we have a DAG-protocol of size S for the monotone mKW game of f, then there exists a monotone fanout circuit of size S for f, and vice versa. In this work, we begin by discussing a restriction we must place on DAG-protocols for the first direction of this claim to be true; we call this restriction being dead leaf free. We then explore protocols that have what we call reducibility at merges, and we explore what a protocol having reducibility at merges can tell us about the communication problem underlying the protocol. We then consider the relationship between protocols that have reducibility at merges and protocols that are dead leaf free, and we introduce a few open problems about this relationship, some computational in flavor, some combinatorial. Finally, we introduce a new kind of protocol called tie-breaking protocols, and we show that tie-breaking protocols that are dead leaf free and satisfy what we call fairness characterize monotone comparator circuits. We end by discussing how we anticipate this protocol can be useful in finding a separation between monotone formulas and monotone comparator circuits.
Abstract: We surveyed recent clinical results to compare the reproducibility of results given by p-values with that of other statistical summaries. The aim was to study the stability of p-values relative to their corresponding parameter estimates. We found that the variability among p-values was larger than the variability among the estimates, supporting skepticism about the reliance on p-values in statistical inference.