DIMACS
DIMACS REU 2025

General Information

me
Student: Rohit Bhagat
Office: CoRE 446
School: Rutgers University
E-mail: rohit.bhagat@rutgers.edu
Project: Learning, Random Walks, and Error Correction over Deletion Channels
Mentor: Salim El Rouayheb

Project Description

This project aims to develop efficient codes for communicating over the deletion channel. The deletion channel remains an open problem. The goal of this project is to explore redundancies in the English language to create a code that is resilient to deletions in the adversarial case. This will involve the use of graph structures and random walks as well as LLMs.


Weekly Log

Week 1:
I moved in to Rutgers this week and familiarized myself with the program through orientations. My mentor and I had a preliminary meeting to discuss the general direction we would like to take the project in. I began some background reading on the topic, including Hamming codes and VT codes. I also started reading chapters of Cover's Elements of Information Theory. Aside from that, this week was mostly about getting situated and meeting all the great people here at the DIMACS REU program. I truly enjoy being around like-minded people and learning about their interests and project topics. We also run our own social events, and we've definitely started forming a nice community.
Week 2:
I continued covering more background material, such as reading Chapters 2 and 3 of Cover's Elements of Information Theory and Cover's Information Theory lectures. I also prepared and delivered Presentation 1. The main work for this week was experimenting with the capabilities of ChatGPT to recover text with deleted characters. I wrote a Python workflow that would input a recent news article and delete each character with a fixed probabiility. That would be fed into ChatGPT, and ChatGPT would try to recover the original text. I kept increasing the deletion probability until ChatGPT could no longer accurately reconstruct the original text. This shows that there is some inherent redundancy in the English language that can be exploited to develop codes that can be used for the deletion channel.
Week 3:
I wrote a web scraper to easily obtain texts from the Internet (recent news articles) to feed into ChatGPT. I also optimized the workflow to make it easier to feed prompts into ChatGPT, and I finished reading Chapter 3 of Cover and began Chapter 7. I also came up with a basic (though not very efficient) code for the deletion channel, which can probably be optimized. Since the idea is to map binary to English characters and transmit the English over the channel, I thought about mapping every sequence of 16 bits to a 4 letter English word. Also, we had Culture Day on Friday, for which I gave two presentations.
Week 4:
I continued scraping news articles and feeding deleted texts into ChatGPT with different deletion probabilities to analyze ChatGPT's probability of error. I've started looking for a phase shift—a critical deletion probability such that the probabilility of error is 0 (or around 10-6) for all deletion probabilities below it and the probability of error is non-zero for all deletion probabilities above it. This requires a massive text on the order of millions of characters, and the text must be "clean" of grammatical errors and specific unpredictable information such as numbers. I've begun using AI to generate such clean texts. I also continued reaching Chapter 7 of Cover.

My mentor and I also discussed possible means of encoding binary to English. One such possibility is mapping sequences of bits into words by indexing the English dictionary. This will be our first mapping attempt, and hopefully it performs better than benchmark methods from coding theory such as repeition code. We also discussed the possibility of removing all spaces from a text because the text is still decipherable. This would be a good way to increase efficiency.

Presentations