Amanda Wang's REU 2024 Web Page

About Me



Name: Amanda Wang
Email: aw4309@princeton.edu
Home Institution: Princeton University
Project: Truth Learning in a Social and Adversarial Setting
Mentor: Professor Jie Gao

About My Project

Please click here to view my final presentation slides.

Research Log

Week 1: May 28 - May 31

I did some background reading to refresh my memory on sequential social learning models, information cascades, and voting rules. I also worked on a presentation to introduce my project to the rest of the REU participants the following week.
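As a warm-up, here's a minimal simulation of the classic sequential model (a BHW-style cascade with symmetric binary signals; the accuracy p and the convention of breaking ties with one's own signal are my choices for illustration):

```python
import random

def simulate_cascade(n_agents=50, p=0.7, truth=1, seed=0):
    """BHW-style sequential learning: each agent sees all earlier actions
    plus a private signal matching the ground truth with probability p,
    and plays the Bayesian best response (ties broken by own signal)."""
    rng = random.Random(seed)
    actions = []
    for _ in range(n_agents):
        signal = truth if rng.random() < p else 1 - truth
        # With symmetric signals, the Bayesian decision reduces to a vote
        # count: a lead of 2+ outweighs any single signal, so the agent
        # ignores its own signal and a cascade begins.
        d = sum(1 if a == 1 else -1 for a in actions)
        if d >= 2:
            actions.append(1)
        elif d <= -2:
            actions.append(0)
        else:
            actions.append(signal)
    return actions

print(simulate_cascade())  # once some lead reaches 2, everyone herds on it
```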

Week 2: June 3 - 7

I read some more recent papers on several possible directions to explore for the summer. I mostly focused on a paper proving NP-hardness of computing full Bayesian updates to agent beliefs in a slightly relaxed setting. That paper actually proved a stronger claim (PSPACE-hardness) and used a reduction from 3-SAT that was a little convoluted. Filip found an earlier paper by the same authors proving NP-hardness for the same problem using a reduction from VERTEX-COVER, which was much neater. We hope to use a similar reduction to show NP-hardness of determining whether a given network permits truth learning under some (equivalently, a best-case) decision ordering.

Week 3: June 10 - 14

This week I worked on constructing a reduction to show NP-hardness. Early in the week, Rhett suggested a revised statement of the decision problem, which omitted the family of graphs used for the asymptotic part of the definition and instead just asked whether the learning rate could be lower bounded by some fixed threshold. This proved to be the right claim to formulate the reduction with, and later in the week, Filip proposed a reduction from 3-SAT. His reduction constructed a digraph that encoded variable assignments by placing double edges between nodes representing variables and their negations. This meant the best-case ordering on the digraph would result in either the variable having the better chance of correctly predicting the ground truth (x = 1) or its negation having the higher probability (x = 0). Each clause was then represented by a single clause node, with incoming edges from each of the nodes corresponding to the literals in the clause. The idea was that unsatisfied clause nodes would have a lower probability of predicting the ground truth, which would hopefully translate into a lower network learning rate whenever the formula wasn't satisfiable.
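To keep the construction straight in my own head, here's a rough sketch of the gadget in code. The DIMACS-style input format, node names, and the exact orientation/multiplicity of the double edges are my shorthand here, not necessarily the final construction:

```python
def build_reduction_digraph(clauses):
    """Sketch of Filip's gadget for a 3-CNF formula, given as a list of
    clauses of nonzero ints (+i for x_i, -i for its negation).
    Returns a list of directed edges, with repeats for double edges."""
    edges = []
    variables = {abs(lit) for clause in clauses for lit in clause}
    for i in variables:
        pos, neg = f"x{i}", f"~x{i}"
        # Double edges between a variable and its negation: the best-case
        # ordering then favors one of the pair, encoding the assignment.
        edges += [(pos, neg), (pos, neg), (neg, pos), (neg, pos)]
    for j, clause in enumerate(clauses):
        c = f"C{j}"
        for lit in clause:
            literal = f"x{lit}" if lit > 0 else f"~x{-lit}"
            edges.append((literal, c))  # clause node listens to its literals
    return edges

# (x1 v x2 v ~x3) ^ (~x1 v x3 v x2)
print(build_reduction_digraph([(1, 2, -3), (-1, 3, 2)]))
```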

Filip and I then spent a morning on the algebra. The construction turned out not to work: clauses satisfied by all 3 true literals resulted in clause nodes with a substantially higher probability of predicting correctly than clauses satisfied by just 1 true literal. Because of this gap, the best-case non-satisfiable formula did not necessarily have a lower learning rate than the worst-case satisfiable formula, especially as the number of clauses increased.
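To illustrate the problem with toy numbers (assuming, purely for illustration, that a true literal's node is correct with probability p, a false literal's with probability 1 - p, and that the clause node just follows the majority of its three in-neighbors, which is not our actual update rule):

```python
from itertools import product

def clause_correct_prob(probs):
    """P(the majority of three independent in-neighbors is correct),
    where probs[i] is the i-th literal node's chance of being correct."""
    total = 0.0
    for outcome in product([0, 1], repeat=3):  # 1 = neighbor is correct
        if sum(outcome) >= 2:
            pr = 1.0
            for bit, p in zip(outcome, probs):
                pr *= p if bit else 1 - p
            total += pr
    return total

p = 0.7  # assumed accuracy of a node whose literal is true
print(clause_correct_prob((p, p, p)))          # clause with 3 true literals
print(clause_correct_prob((p, 1 - p, 1 - p)))  # clause with 1 true literal
```

With p = 0.7 this gives about 0.78 versus 0.38, so the two "satisfied" cases were nowhere near each other.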

I then spent a day thinking about reductions from other NP-hard problems, like MAX-INDEPENDENT-SET or VERTEX-COVER. But I felt that neither of these problems was actually easier to work with. As far as I know, evaluating the learning rate under the best ordering can't just be boiled down to finding some characterizing property of the network (like the size of the largest independent set/clique), and the structure of the VERTEX-COVER reduction in the earlier paper didn't seem varied enough to give separable learning rates.

Toward the end of the week, I thought about adjusting Filip's reduction to separate the satisfiable and non-satisfiable learning rates. After a little trial and error, I came up with the idea of adding edges between all literals in a clause, as a means of letting the boosted probability of success for any satisfied literal propagate to the other two literals, while letting the increased 'depth' of the graph reduce the gap between the all-true and one-true cases. It wasn't clear that this worked, and the probabilities at this point were pretty unmanageable, so I went ahead and had Mathematica grind it out for me. Based on the plots produced, this seemed like it would work for formulas with an arbitrarily large number of clauses, although there were still some loose ends to tie up. Overall, I was pretty optimistic that the adjustments I needed to make wouldn't affect the final result too much, and that we'd have a working reduction fairly soon!
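For a sense of what "grinding it out" looks like, here's a crude Monte Carlo stand-in for the symbolic computation. The majority-with-tie-breaking update rule here is a simplifying assumption rather than our actual model:

```python
import random

def estimate_learning_rate(edges, order, p=0.7, trials=20000, seed=0):
    """Monte Carlo estimate of the expected fraction of nodes matching
    the ground truth when nodes decide in `order`. Assumed update rule:
    follow the majority of in-neighbors that have already decided,
    breaking ties with one's own private signal (correct w.p. p)."""
    rng = random.Random(seed)
    in_nbrs = {v: [] for v in order}
    for u, v in edges:
        in_nbrs[v].append(u)
    truth, correct = 1, 0
    for _ in range(trials):
        action = {}
        for v in order:
            signal = truth if rng.random() < p else 1 - truth
            d = sum(1 if action[u] == 1 else -1
                    for u in in_nbrs[v] if u in action)
            action[v] = 1 if d > 0 else 0 if d < 0 else signal
        correct += sum(1 for v in order if action[v] == truth)
    return correct / (trials * len(order))

# A single clause gadget with the new literal-to-literal edges added:
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("a", "C"), ("b", "C"), ("c", "C")]
print(estimate_learning_rate(edges, ["a", "b", "c", "C"]))
```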

Week 4: June 17 - 21

I produced a preliminary informal writeup to try to communicate the reduction more clearly, with some loose ends resolved. Then, Filip and I began writing it up more formally. This took us the whole week, partly because the construction was difficult to express clearly and the probabilities were incredibly messy. We tried to make things cleaner by using bounds where possible, and we also fixed some errors I had made when initially computing the probabilities.

Meanwhile, I thought a little about proving that Bayesian aggregation always does at least as well as majority dynamics for achieving truth learning, but it seemed difficult to approach without a stronger technique than basic induction. It felt like something coupling-like could be useful, but I didn't really have any ideas, especially given that whatever I did would have to apply to any graph.
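To at least poke at the claim empirically, here's a toy harness for the simplest case, where every agent observes all earlier actions (there the exact Bayesian play reduces to the classic cascade count rule; which variant of majority dynamics to use is itself a modelling choice):

```python
import random

def truth_rate(rule, n=50, p=0.7, trials=20000, seed=0):
    """Average fraction of agents matching the ground truth when each
    agent sees all earlier actions plus a private signal (correct w.p. p)."""
    rng = random.Random(seed)
    truth, correct = 1, 0
    for _ in range(trials):
        actions = []
        for _ in range(n):
            signal = truth if rng.random() < p else 1 - truth
            actions.append(rule(actions, signal))
        correct += sum(1 for a in actions if a == truth)
    return correct / (trials * n)

def bayesian(actions, signal):
    # Fully Bayesian play in this complete-observation setting: a lead
    # of 2+ among earlier actions outweighs any one signal (BHW).
    d = sum(1 if a == 1 else -1 for a in actions)
    return 1 if d >= 2 else 0 if d <= -2 else signal

def majority(actions, signal):
    # One variant of majority dynamics: follow the majority of earlier
    # actions, using one's own signal only to break ties.
    d = sum(1 if a == 1 else -1 for a in actions)
    return 1 if d > 0 else 0 if d < 0 else signal

print(truth_rate(bayesian), truth_rate(majority))
```

In this toy case the majority rule herds on the very first agent's action, while the Bayesian rule waits for a lead of two, so Bayes comes out ahead; but that's one rule variant on one observation structure, nothing like a proof for all graphs.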

Week 5: June 24 - June 28

At the start of this week, I completed the writeup for the reduction, and Filip and I (tentatively) "finalized" it together. During our Monday meeting, Prof. Gao brought up another paper she'd found about learning from crowds in which the private signals of participants are biased against the ground truth, rather than towards it (as we'd previously been considering). This setting is actually fairly reasonable, since we may still want to learn from crowds in scenarios where misconceptions are widespread. We discussed adapting this setting to network learning.
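A quick sanity check on why this setting is interesting (my own toy calculation, not from the paper): simple majority aggregation amplifies whichever side of 1/2 the signal accuracy sits on, so a crowd with widespread misconceptions confidently converges on the wrong answer.

```python
from math import comb

def majority_correct(n, p):
    """P(the simple majority of n independent private signals is correct),
    where each signal matches the ground truth with probability p.
    Assumes n is odd so there are no ties."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for p in (0.6, 0.4):  # signals biased toward vs. against the truth
    print(p, majority_correct(101, p))  # ~0.98 vs ~0.02
```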

I continued thinking a bit about coupling and Bayesian vs. majority dynamics. I found some literature on using coupling to analyze interacting particle systems, which felt somewhat similar to our setting, albeit much more structured and with simpler update strategies. At this point, I'm not sure a coupling argument would be feasible, since our update strategies are pretty complex and are only probabilistic through the current agent's private signal. So even modelling our network learning as a Markov process doesn't seem straightforward.

Week 6: July 1 - 5

Week 7: July 8 - 12

Week 8: July 15 - 19

Week 9: July 22 - 26

Week 10: July 29 - August 2

Funding and Acknowledgements

Thank you to Prof. Jie Gao, Dr. Lazaros Gallos, Omar Garcia, and Lawrence Frolov for their mentorship and support throughout the program. Thank you also to NSF grant CNS-2150186, Charles University in Prague, and the DIMACS REU as a whole for their funding and accommodations. This was truly an unforgettable experience! :)