Theory Group Seminar Notes

Rishit Dagli

October 2022

Contents

 Introduction
 1 Lower Bounds for Locally Decodable Codes from Semirandom CSP Refutation
  1.1 Abstract
  1.2 Locally Decodable Codes
  1.3 How to prove the Theorem
  1.4 Normally Decodable Codes
  1.5 Proof: Going from LDC to XOR
  1.6 Proof: Existing q-LDC lower bound for q even
  1.7 Proof: k³ lower bound
  1.8 Conclusion
 2 Algorithms for the ferromagnetic Potts model on expanders
  2.1 Abstract
  2.2 The Ferromagnetic Potts Model
  2.3 The Problem
  2.4 Antiferromagnetic Potts model
  2.5 Main Results
  2.6 Potts Distribution
  2.7 Results
  2.8 Polymer Methods
 3 Statistical Learning using Compression
  3.1 Abstract
  3.2 Some background
  3.3 Density Estimation
  3.4 Binary Classification (with adv. perturbations)
  3.5 Sample Compression
  3.6 Gaussian Mixture Models
  3.7 Compression Framework
  3.8 Conclusion

Introduction

These are my notes for the seminars that happen in the Theory Group at The University of Toronto. Many thanks to Professor Allan Borodin for allowing me to attend the Theory Group seminars and helping out.

A PDF of these notes is available at https://rishit-dagli.github.io/cs-theory-notes/main.pdf. An online version of these notes is available at https://rishit-dagli.github.io/cs-theory-notes.

The Theory Group focuses on theory of computation. The group is interested in using mathematical techniques to understand the nature of computation and to design and analyze algorithms for important and fundamental problems.

The members of the theory group are all interested, in one way or another, in the limitations of computation: What problems are not feasible to solve on a computer? How can the infeasibility of a problem be used to rigorously construct secure cryptographic protocols? What problems cannot be solved faster using more machines? What are the limits to how fast a particular problem can be solved or how much space is needed to solve it? How do randomness, parallelism, the operations that are allowed, and the need for fault tolerance or security affect this?

1 Lower Bounds for Locally Decodable Codes from Semirandom CSP Refutation

7th October 2022

The related paper: A near-cubic lower bound for 3-query locally decodable codes from semirandom CSP refutation by Alrabiah et al. [1]. Seminar by Peter Manohar. See also [2], [3].

1.1 Abstract

A code C is a q-locally decodable code (q-LDC) if one can recover any chosen bit bi of the k-bit message b with good confidence by randomly querying the n-bit encoding x on at most q coordinates. Existing constructions of 2-LDCs achieve blocklength n = exp(O(k)), and lower bounds show that this is in fact tight. However, when q = 3, far less is known: the best constructions have n = subexp(k), while the best known lower bounds, that have stood for nearly two decades, only show a quadratic lower bound of n ≥ Ω(k²) on the blocklength.

In this talk, we will survey a new approach to prove lower bounds for LDCs using recent advances in refuting semirandom instances of constraint satisfaction problems. These new tools yield, in the 3-query case, a near-cubic lower bound of n ≥ Ω̃(k³), improving on prior work by a polynomial factor in k.

1.2 Locally Decodable Codes

Consider codes Enc : {0,1}^k → {0,1}^n, mapping a message b ∈ {0,1}^k to a codeword x ∈ {0,1}^n.

The (possibly corrupted) codeword x is read by a decoder which, given i ∈ [k], outputs a guess b̂_i ∈ {0,1}.

Definition 1. C is (q,δ,ε)-locally decodable if the decoder makes at most q queries to x and, for any x with Δ(x, Enc(b)) ≤ δn, Dec^x(i) = b_i with probability ≥ 1/2 + ε for every i ∈ [k].

The natural question: given q, what is the best possible rate for a q-LDC, i.e., how small can the blocklength n be as a function of k?

q              Lower Bound              Upper Bound
2              n ≥ 2^Ω(k)               n ≤ 2^k
3              n ≥ k²                   n ≤ exp(k^o(1))
O(1), even q   n ≥ k^(q/(q−2))          n ≤ exp(k^o(1))
O(1), odd q    n ≥ k^((q+1)/(q−1))      n ≤ exp(k^o(1))

Focusing on the case q = 3, the known bounds have improved over time:

k ≲ n ≤ 2^k   (1)
k² ≲ n ≤ exp(exp(O(√(log k · log log k))))

In [1], they show a better lower bound than the existing ones for q = 3:

k³ ≲ n   (2)

The main result is that:

Theorem 1. Let C be a (3,δ,ε)-locally decodable code. Then n ≥ Ω̃_{δ,ε}(k³).

Semi-random CSP refutation comes to our aid to prove this! The intuitive way to put this theorem is that proving a q-LDC lower bound is the same as refuting "LDC" q-XOR instances.

1.3 How to prove the Theorem

The idea:

1.4 Normally Decodable Codes

The decoder can be arbitrary, but WLOG we may assume there are q-uniform hypergraphs H₁, H₂, …, H_k where every H_i satisfies:

H_i ⊆ ([n] choose q), i.e., each H_i is a collection of q-element subsets of [n].

We can also assume that:

each H_i is a matching with |H_i| ≥ δn,

and Dec(i) picks a uniformly random C ∈ H_i and outputs ⊕_{j∈C} x_j.

One such example is the Hadamard code:

b ∈ {0,1}^k ↦ f = (⟨b,v⟩)_{v∈{0,1}^k}   (3)
b_i = f(e_i) = f(v) + f(v + e_i)

One can think of this as the decoder querying the matched pair v and v + e_i.
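To make the Hadamard example concrete, here is a minimal Python sketch (my own toy illustration, with hypothetical parameters k = 4 and a small amount of corruption) of the encoding f = (⟨b,v⟩ mod 2)_{v∈{0,1}^k} and of the 2-query local decoder b_i = f(v) + f(v + e_i), amplified by a majority vote.

import itertools, random

k = 4
b = [1, 0, 1, 1]                                   # message bits

# Hadamard encoding: one bit <b, v> mod 2 for every v in {0,1}^k.
vs = list(itertools.product([0, 1], repeat=k))
codeword = {v: sum(bi * vi for bi, vi in zip(b, v)) % 2 for v in vs}

# Corrupt a small fraction of positions.
x = dict(codeword)
for v in random.sample(vs, len(vs) // 10):
    x[v] ^= 1

def decode_bit(x, i, k):
    """2-query local decoder: b_i = f(v) + f(v + e_i) for a random v."""
    v = tuple(random.randint(0, 1) for _ in range(k))
    v_plus_ei = tuple(vj ^ (1 if j == i else 0) for j, vj in enumerate(v))
    return (x[v] + x[v_plus_ei]) % 2

# Each query pair is individually uniform, so a single decode succeeds with
# good probability; repeat and take a majority vote.
for i in range(k):
    votes = [decode_bit(x, i, k) for _ in range(101)]
    print(i, max(set(votes), key=votes.count), "(true:", b[i], ")")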

Matching vector codes are another example, and they give the best known subexponential constructions for q = 3.

1.5 Proof: Going from LDC to XOR

We suppose that our code is linear and that there exist q-uniform hypergraphs H₁, H₂, …, H_k.

We also know that:

Each H_i is a matching with |H_i| ≥ δn,

and Dec(i) picks C ∈ H_i and outputs ⊕_{j∈C} x_j.

So, we start by considering the q-XOR instance ψ_b:

Variables: {x_j}_{j∈[n]}.  Equations: ⊕_{j∈C} x_j = b_i for every i ∈ [k] and C ∈ H_i.

The maximum fraction of simultaneously satisfiable constraints is val(ψ_b) = 1 for any b ∈ {0,1}^k, since x = Enc(b) satisfies every equation.

It is therefore sufficient to argue that ψ_b is unsatisfiable with high probability for a random b when n ≪ k^(q/(q−2)).
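As a sanity check on this reduction, the following Python sketch (a tiny hand-built 3-query example of my own, not a construction from the talk) builds ψ_b from matchings H_1, H_2 and brute-forces val(ψ_b), confirming that val(ψ_b) = 1 for every b because x = Enc(b) satisfies all the equations.

import itertools

# A tiny 3-query LDC-style example: k = 2 message bits, n = 6 codeword bits,
# x_j = <a_j, b> mod 2.  Each triple in H_i sums (mod 2) to e_i, and each H_i
# is a matching -- exactly the normal form described above.
a = [(1, 0), (0, 1), (1, 1), (1, 0), (0, 1), (1, 1)]
H = [[(0, 1, 4), (2, 3, 5)],    # triples whose a_j sum to e_1
     [(0, 1, 3), (2, 4, 5)]]    # triples whose a_j sum to e_2
n, k = len(a), len(H)

def enc(b):
    return tuple(sum(ai * bi for ai, bi in zip(a[j], b)) % 2 for j in range(n))

def val(b):
    """val(psi_b): max fraction of equations xor_{j in C} x_j = b_i satisfiable."""
    cons = [(C, b[i]) for i in range(k) for C in H[i]]
    return max(sum(sum(x[j] for j in C) % 2 == rhs for C, rhs in cons) / len(cons)
               for x in itertools.product([0, 1], repeat=n))

for b in itertools.product([0, 1], repeat=k):
    x = enc(b)
    ok = all(sum(x[j] for j in C) % 2 == b[i] for i in range(k) for C in H[i])
    print("b =", b, " enc(b) satisfies all equations:", ok, " val(psi_b) =", val(b))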

Now we need to refute XOR. There are many ways to argue unsatisfiability of an XOR instance; one reason why we cannot use standard probabilistic arguments here is that ψ_b only has k bits of randomness.

One way to make progress here is to use a refutation algorithm:

ψ → A → algval(ψ)

with the guarantee that val(ψ) ≤ algval(ψ); in other words, if algval(ψ) < 1 then A refutes ψ. The ideal goal would be to refute a random ψ with m constraints with high probability.

However, we instead look at semi-random XOR. The refutation algorithm and its guarantee stay the same:

ψ → A → algval(ψ)

with the guarantee that val(ψ) ≤ algval(ψ).

So, now we generate a semi-random ψ with m constraints: the constraint hypergraph is arbitrary (worst case), and each equation is

⊕_{j∈C} x_j = b_C   (4)

with an independent uniformly random bit b_C on the right-hand side.

We also already know that ψ_b has equations ⊕_{j∈C} x_j = b_i for i ∈ [k] and C ∈ H_i: the right-hand sides are random, but the same bit b_i is shared by all constraints coming from H_i.

So ψ_b is almost semi-random.

Thus, we have completed Part 1 of the proof.

1.6 Proof: Existing q-LDC lower bound for q even

Recall that the q-LDC XOR instance ψ_b has variables {x_j}_{j∈[n]} and equations ⊕_{j∈C} x_j = b_i for i ∈ [k], C ∈ H_i.

Our goal is to argue that ψ_b is unsatisfiable with high probability for random b when n ≪ k^(q/(q−2)).

The fraction of constraints satisfied by x ∈ {±1}^n is 1/2 + f(x)/2, where

f(x) = (1/m) Σ_i b_i Σ_{C∈H_i} Π_{j∈C} x_j   (5)

and m = kδn is the number of constraints (with b_i ∈ {±1} as well).

This makes our goal to certify, with high probability, that

max_{x∈{±1}^n} f(x) < 1  when n ≪ k^(q/(q−2)).   (6)

We will now try to refute ψ_b. By Equation 5 and Equation 6, refuting ψ_b amounts to showing that, w.h.p.,

max_{x∈{±1}^n} f(x) < 1,  where f(x) = (1/m) Σ_i b_i Σ_{C∈H_i} Π_{j∈C} x_j,   (7)

when n ≪ k^(q/(q−2)).

The idea is to design a matrix A ∈ ℝ^(N×N) so that

f(x) ≤ ‖A‖_{∞→1} (up to scaling), where ‖A‖_{∞→1} := max_{z,w∈{±1}^N} z^T A w.

As shown by Wein et al. [4], the matrix A can be indexed by sets

S ∈ ([n] choose ℓ).

To x we assign a vector y such that y^T A y ∝ f(x), namely

y_S := Π_{j∈S} x_j,

which is simply a (flattened) tensor power of x.
 

We now need to decide how to set A(S,T):

y^T A y = Σ_{S,T} y_S y_T A(S,T) = Σ_{S,T} A(S,T) Π_{j∈S⊕T} x_j,   (8)

which shows that what actually matters is the symmetric difference S ⊕ T.

If S ⊕ T = C ∈ H_i then, for an x satisfying ψ_b, Π_{j∈S⊕T} x_j = b_i. So we set

A(S,T) = b_i if S ⊕ T = C ∈ H_i (and 0 otherwise).

Then

y^T A y = Σ_{i=1}^k b_i Σ_{C∈H_i} Σ_{S⊕T=C} Π_{j∈C} x_j = D·m·f(x),   (9)

where D is the number of pairs (S,T) with S ⊕ T = C.

Equivalently, define A_C(S,T) = 1 if S ⊕ T = C and 0 otherwise, so that A_i = Σ_{C∈H_i} A_C and A = Σ_{i=1}^k b_i A_i.

Setting y_S := Π_{j∈S} x_j gives y^T A y = D·m·f(x), and hence D·m·f(x) ≤ ‖A‖_{∞→1}.

Note that, the way we defined it, D only depends on |C| = q:

D = (q choose q/2) · (n−q choose ℓ−q/2).

Also, each A_C ∈ {0,1}^(N×N) with N = (n choose ℓ).

We have already shown that ‖A‖_{∞→1} ≥ D·m·max_x f(x); in particular, if ψ_b were satisfiable then ‖A‖_{∞→1} ≥ D·m = D·δn·k.

It is also useful to note that ‖A‖_{∞→1} ≤ N·‖A‖_2, so it remains to show that, with high probability, ‖A‖_2 (and hence ‖A‖_{∞→1}) is not too large.

Matrix Bernstein: with high probability over the b_i, ‖A‖_2 ≲ √(k·ℓ·Δ) (ignoring logarithmic factors), where Δ is the maximum number of 1's in a row of any A_i.

The expected number of 1's per row is δn·D/N ≈ δ·n·(ℓ/n)^(q/2).

We can optimistically suppose that Δ ≈ n·(ℓ/n)^(q/2); however, this also needs ℓ ≥ n^(1−2/q).

If ψ_b were satisfiable we would then have D·δn·k ≤ ‖A‖_{∞→1} ≲ N·√(kℓΔ).

To get a contradiction (i.e., a refutation) we need k·Δ ≳ ℓ, using Δ ≈ δn·D/N.

Now take ℓ = n^(1−2/q): the refutation goes through whenever k^(q/(q−2)) ≳ n.

In the worst case, however, we only know Δ ≤ 2ℓ/q (each element of S lies in at most one C ∈ H_i).

Because the H_i are matchings, a random row will have only ≈ δn·D/N ones.

The idea now is to prune off all the bad rows and columns of A to get a matrix B such that

‖A‖_{∞→1} ≤ ‖B‖_{∞→1} + o(N),

and every row of B has at most ≈ δn·(ℓ/n)^(q/2) ones.

We can now run the same argument with B instead of A, which proves the q-LDC lower bound for even q.
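The following Python sketch (toy parameters and hypothetical matchings of my own, with logarithmic factors and the pruning step ignored) builds the Kikuchi-style matrix A = Σ_i b_i A_i for a small 4-XOR instance and prints the certified bound 1/2 + N‖A‖₂/(2Dm) on the satisfiable fraction. At this tiny scale the certificate will not drop below 1; the code only illustrates the construction.

from itertools import combinations
from math import comb
import numpy as np

# Toy parameters: n variables, q-XOR with q even, Kikuchi level ell.
n, q, ell = 8, 4, 2
rng = np.random.default_rng(0)

# Hypothetical 4-uniform matchings H_1..H_k (illustration only).
H = [[(0, 1, 2, 3), (4, 5, 6, 7)],
     [(0, 2, 4, 6), (1, 3, 5, 7)],
     [(0, 1, 4, 5), (2, 3, 6, 7)]]
k = len(H)
b = rng.choice([-1, 1], size=k)              # right-hand sides in +-1 form

subsets = list(combinations(range(n), ell))  # rows/columns = ell-subsets of [n]
index = {S: r for r, S in enumerate(subsets)}
N = len(subsets)

# A(S, T) = b_i whenever the symmetric difference S xor T is some C in H_i.
A = np.zeros((N, N))
for i, Hi in enumerate(H):
    for C in Hi:
        for S in subsets:
            T = tuple(sorted(set(S) ^ set(C)))
            if len(T) == ell:
                A[index[S], index[T]] += b[i]

m = sum(len(Hi) for Hi in H)                       # number of XOR constraints
D = comb(q, q // 2) * comb(n - q, ell - q // 2)    # pairs (S, T) with S xor T = C

# D*m*max_x f(x) <= ||A||_{inf->1} <= N*||A||_2, so N*||A||_2 / (D*m) certifies
# an upper bound on max_x f(x), hence on the satisfiable fraction 1/2 + f(x)/2.
spectral = np.linalg.norm(A, 2)
print("certified bound on val(psi_b):", min(1.0, 0.5 + N * spectral / (2 * D * m)))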

1.7 Proof: k³ lower bound

Recall that the q-LDC XOR instance ψ_b has variables {x_j}_{j∈[n]} and equations ⊕_{j∈C} x_j = b_i for i ∈ [k], C ∈ H_i.

The goal is to argue that ψ_b is unsatisfiable with high probability for random b, and the idea is again to design a matrix A ∈ ℝ^(N×N) so that

f(x) ≤ ‖A‖_{∞→1} (up to scaling), where ‖A‖_{∞→1} = max_{z,w∈{±1}^N} z^T A w.

The previous approach fails because the A from before requires q to be even.

One attempt is to index the rows by sets S with |S| = ℓ and the columns by sets T with |T| = ℓ + 1. However, this only gets us to k ≲ n.

We need to derive more constraints: combine pairs of constraints C_i ∈ H_i and C_j ∈ H_j that share a variable. There are ≈ nk original constraints and each variable appears in ≈ k of them (one per matching), so we obtain ≈ nk² new constraints, each with right-hand side b_i·b_j (a sketch of this step is given below).
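A minimal Python sketch of this derivation step, using hypothetical 3-uniform matchings of my own choosing: every pair of constraints from different matchings that share exactly one variable is XOR-ed into a 4-XOR constraint on the symmetric difference, with right-hand side b_i·b_j.

from itertools import combinations

# Hypothetical 3-uniform matchings H_1..H_k (for illustration only).
H = [[(0, 1, 2), (3, 4, 5)],
     [(0, 3, 6), (1, 4, 7)],
     [(2, 4, 6), (0, 5, 7)]]

def derived_constraints(H):
    """Pairs (C, C') from different matchings sharing exactly one variable give
    4-XOR constraints on the symmetric difference, with RHS b_i * b_j."""
    out = []
    for (i, Hi), (j, Hj) in combinations(list(enumerate(H)), 2):
        for C in Hi:
            for Cp in Hj:
                if len(set(C) & set(Cp)) == 1:
                    out.append((tuple(sorted(set(C) ^ set(Cp))), (i, j)))
    return out

for clause, (i, j) in derived_constraints(H):
    print(f"xor of x over {clause} = b_{i} * b_{j}")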

The matrix A is again indexed by ℓ-subsets S, with A(S,T) = b_i·b_j whenever S ⊕ T is a derived constraint coming from H_i and H_j. The calculation is now:

n·k²·D ≲ ‖A‖_{∞→1} ≲ N·√(kℓΔ)

Optimistically, Δ ≈ nk·D/N = nk·(ℓ/n)²,

for which we need ℓ ≥ √(n/k).

Taking ℓ ≈ √(n/k), the calculation goes through exactly when k³ ≳ n.

The row-pruning trick still works, provided that any pair {u,v} of variables appears in at most polylog(n) of the derived constraints.

1.8 Conclusion

This proof for q = 3 does not generalize to all odd q, and neither does a reduction to the 2-LDC case; this is in particular because of the row-pruning step.

2 Algorithms for the ferromagnetic Potts model on expanders

14th October 2022

The related paper: Algorithms for the ferromagnetic Potts model on expanders by Carlson et al. [5]. Seminar by Aditya Potukuchi.

2.1 Abstract

The ferromagnetic Potts model is a canonical example of a Markov random field from statistical physics that is of great probabilistic and algorithmic interest. This is a distribution over all q-colorings of the vertices of a graph where monochromatic edges are favored. The algorithmic problem of efficiently sampling approximately from this model is known to be #BIS-hard, and has seen a lot of recent interest. I will outline some recently developed algorithms for approximately sampling from the ferromagnetic Potts model on d-regular weakly expanding graphs. This is achieved by a significantly sharper analysis of standard "polymer methods" using extremal graph theory and applications of Karger's algorithm to count cuts that may be of independent interest. I will give an introduction to all the topics that are relevant to the results.

2.2 The Ferromagnetic Potts Model

Figure 1: A sample graph

We start by defining some basic notation: G = (V,E) is a graph, q ≥ 2 is the number of colours, a colouring is a map χ : V → [q], m(χ) is the number of monochromatic edges of χ, and β is a real parameter (the inverse temperature). The model assigns probability p(χ) ∝ exp(β·m(χ)).

Notice that β < 0 gives the antiferromagnetic case; here we mostly talk about β > 0, the ferromagnetic case.

This model has quite a few applications.

2.3 The Problem

We know p(χ) ∝ exp(β·m(χ)).

For β = 0 this is a uniformly random q-colouring of V.

For β = −∞ we get a uniformly random proper colouring of G.

What we need to do is given G and β, efficiently sample a coloring from this distribution.

p(χ) = exp(β·m(χ)) / Σ_{χ'} exp(β·m(χ'))   (10)

The normalizing factor here is

Normalizing factor = Σ_χ exp(β·m(χ)).

Now we can also write

Σ_χ exp(β·m(χ)) =: Z_G(q,β)   (11)

The partition function of the model/distribution is central from this point of view.

Our problem: given G and β, efficiently sample a colouring from this distribution. Two facts:

  1. It is enough to (approximately) compute Z_G(q,β).
  2. Computing Z_G(q,β) exactly is #P-hard.

We now modify the problem: given G and β, efficiently sample approximately a colouring from this distribution.

Sampling ε-approximately means sampling from a law μ with ‖p − μ‖_TV ≤ ε, where

‖p − μ‖_TV := (1/2) Σ_χ |p(χ) − μ(χ)|.   (12)

So our problem template becomes: given G and β, efficiently sample ε-approximately a colouring from this distribution.

A Fully Polynomial Almost Uniform Sampler (FPAUS) allows us to sample ε-approximately in time poly(|G|, 1/ε).

A Fully Polynomial Time Approximation Scheme (FPTAS) instead computes a (1 ± ε)-factor approximation of Z_G(q,β) in time poly(|G|, 1/ε).

One can show that, for this problem, the two notions are essentially equivalent: FPTAS ⟺ FPAUS.
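A minimal Python sketch (brute force on a tiny graph of my choosing) of the objects just defined: the partition function Z_G(q,β) from (11) and exact sampling from the Potts distribution (10). Exact computation is only feasible for very small graphs, which is exactly why approximate sampling (an FPAUS) is the real goal.

import itertools, math, random

# Tiny example: a 4-cycle, q = 3 colors, ferromagnetic beta > 0.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]
q, beta = 3, 0.8

def mono(chi):
    """m(chi): number of monochromatic edges."""
    return sum(1 for u, v in E if chi[u] == chi[v])

colorings = list(itertools.product(range(q), repeat=len(V)))
weights = [math.exp(beta * mono(chi)) for chi in colorings]
Z = sum(weights)                                    # Z_G(q, beta), eq. (11)
print("Z_G(q, beta) =", Z)

# Exact sampling from p(chi) = exp(beta*m(chi)) / Z, eq. (10).
chi = random.choices(colorings, weights=weights, k=1)[0]
print("sample:", chi, "monochromatic edges:", mono(chi))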

2.4 Antiferromagnetic Potts model

The Antiferromagnetic Potts model:

p(χ) ∝ exp(β·m(χ))   (13)

where β < 0.

Given G and β < 0, we want to be able to give an FPAUS for this distribution. It is then equivalent to instead work on the problem: given G and β < 0, give an FPTAS for its partition function ZG(q,β).

From some previous work, we know that there exists a βc such that:

We can say that this problem is #BIS-hard (BIS = counting independent sets in bipartite graphs). Thus it is at least as hard as giving an FPTAS for the number of independent sets in a bipartite graph. Without the bipartiteness restriction, the problem becomes NP-hard.

For now, let's consider the problem: given a bipartite graph G, design an FPTAS for the number of independent sets in G. This accurately captures the difficulty of counting the number of proper q-colorings of a bipartite graph for q ≥ 3, the number of stable matchings, and the number of antichains in posets.

2.5 Main Results

For our purposes we assume that G is always a d-regular graph on n vertices. For a set S ⊆ V, we define its edge boundary as:

∇(S) := #{uv ∈ E(G) : u ∈ S, v ∉ S}

Now, G is an η-expander if for every S ⊆ V of size at most n/2 we have |∇(S)| ≥ η|S|. For example, take the discrete cube Q_d with vertex set {0,1}^d, where uv is an edge if u and v differ in exactly one coordinate.

Using a simplification of Harper's theorem, one can show that Q_d is a 1-expander [7].
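A quick brute-force Python check of this claim for the smallest interesting case d = 3 (feasible only because Q_3 has 8 vertices): every S with |S| ≤ n/2 satisfies |∇(S)| ≥ |S|.

import itertools

d = 3
V = list(itertools.product([0, 1], repeat=d))
E = [(u, v) for u in V for v in V
     if u < v and sum(a != b for a, b in zip(u, v)) == 1]
n = len(V)

def boundary(S):
    """|grad(S)|: number of edges with exactly one endpoint in S."""
    return sum(1 for u, v in E if (u in S) != (v in S))

worst = min(boundary(set(S)) / len(S)
            for r in range(1, n // 2 + 1)
            for S in itertools.combinations(V, r))
print("min |grad(S)| / |S| over |S| <= n/2:", worst)   # >= 1, so Q_3 is a 1-expander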

Theorem 2. For each ε > 0 there are d = d(ε) and q = q(ε) such that there is an FPTAS for Z_G(q,β) whenever G is a d-regular 2-expander, provided the following conditions hold:

The main result shown was that

Theorem 3. For each ε > 0 and d large enough, there is an FPTAS for Z_G(q,β) for the class of d-regular triangle-free 1-expander graphs, provided the following conditions hold:

This was previously known for:

Something to note here is that q ≥ poly(d) should not be a necessary condition.

Likewise, the case β ≤ (1 − ε)β₀ requires neither expansion nor q ≥ poly(d).

2.6 Potts Distribution

We first write down the order-disorder threshold of the ferromagnetic Potts model:

β₀ := ln( (q − 2) / ((q − 1)^(1−2/d) − 1) ),   β₀ = (2 ln q / d)·(1 + O(1/q))   (14)

We want to understand how the Potts distribution looks for β < (1 − ε)β₀ and for β > (1 + ε)β₀.

Figure 2: Rough picture of the Potts model

2.7 Results

Another result we have is:

Theorem 4. For each ε > 0, let d be large enough, q ≥ poly(d), and let G be a d-regular 2-expander graph on n vertices. Then:

Our strategy to prove the theorem for β < (1 − ε)β₀:

2.8 Polymer Methods

The motivating idea is to visualize the state for large β (i.e., at low temperature) as a ground state plus defects:

Typical Colouring = Ground State + Defects

Polymer methods are quite useful in such cases. They originated in statistical physics, and their algorithmic use was developed in [8]. We take 𝒢 to be our defect graph, where each node represents a defect.

We now apply polymer methods to this defect graph 𝒢 (see the schematic in Figure 3).

Figure 3: Proof Schematic

The idea is that Z_G(q,β) ≈ Z_red + Z_blue + ⋯, where Z_red ≈ e^(βnd/2) (the all-red ground state makes all nd/2 edges monochromatic).

Moreover, Z_red / e^(βnd/2) = Σ_I Π_{γ∈I} w_γ, where the sum is over independent sets I of the defect (polymer) graph and w_γ is the weight of polymer γ.

We now move towards the cluster expansion: a multivariate Taylor expansion, in the variables w_γ, of

ln( Σ_I Π_{γ∈I} w_γ ).

This is an infinite sum, so convergence is not guaranteed; however, convergence can be established by verifying the Kotecký-Preiss criterion.
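A minimal Python sketch (hypothetical polymer weights and a hypothetical incompatibility graph, chosen only for illustration) of the polymer-model partition function Σ_I Π_{γ∈I} w_γ, computed by brute force over independent sets; the cluster expansion is a series for the logarithm of exactly this quantity.

import itertools, math

# Hypothetical polymers 0..4 with weights w_gamma, and an incompatibility
# graph: an edge means the two polymers overlap and cannot appear together.
w = [0.10, 0.05, 0.08, 0.02, 0.07]
incompatible = {(0, 1), (1, 2), (2, 3), (3, 4)}

def independent(I):
    return all((a, b) not in incompatible and (b, a) not in incompatible
               for a, b in itertools.combinations(I, 2))

# Xi = sum over independent sets I of prod_{gamma in I} w_gamma (empty set -> 1).
Xi = sum(math.prod(w[g] for g in I)
         for r in range(len(w) + 1)
         for I in itertools.combinations(range(len(w)), r)
         if independent(I))
print("polymer partition function Xi =", Xi, " log Xi =", math.log(Xi))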

We also want to answer: how many connected subsets with a given edge boundary are there in an η-expander?

A heuristic is to count the number of such subsets that contain a given vertex u: a typical connected subgraph of size a is tree-like, i.e., has edge boundary ≈ a·d.

Working backwards, a typical connected subgraph with edge boundary of size b has O(b/d) vertices. The number of such subgraphs is at most the number of connected subgraphs of size O(b/d) containing u, which is at most the number of trees rooted at u with O(b/d) vertices and maximum degree at most d, which is d^(O(b/d)). Thus,

Theorem 5. In an η-expander, at most d^(O((1+1/η)·b/d)) connected subsets containing u have edge boundary of size at most b.

Another question to ask is: how many q-colourings of an η-expander induce at most k non-monochromatic edges?

The easiest way to create k non-monochromatic edges is to colour all but k/d randomly chosen vertices with the same colour. For k small, these k/d vertices likely form an independent set; we then colour them arbitrarily. There are

(n choose k/d) · q^(k/d + 1)

ways to do this.

Theorem 6. For η-expanders and q ≥ poly(d), there are at most n⁴·q^(O(k/d)) such colourings.

We also know the maximum value of Z_G(q,β) over all graphs G with n vertices, m edges, and maximum degree d: it is always attained when G is a disjoint union of copies of K_{d+1} and K_1.

3 Statistical Learning using Compression

18th October 2022

The related paper: Adversarially Robust Learning with Tolerance by Ashtiani et al. [9]. Seminar by Hassan Ashtiani.

3.1 Abstract

Characterizing the sample complexity of different machine learning tasks is one of the central questions in statistical learning theory. For example, the classic Vapnik-Chervonenkis theory characterizes the sample complexity of binary classification. Despite this early progress, the sample complexity of many important learning tasks — including density estimation and learning under adversarial perturbations — are not yet resolved. In this talk, we review the less conventional approach of using compression schemes for proving sample complexity upper bounds, with specific applications in learning under adversarial perturbations and learning Gaussian mixture models.

3.2 Some background

We start by defining some notation: Z is the domain of examples and D_Z is an unknown distribution over Z; H is a class of models (hypotheses); l(h,·) is a loss function and L(D_Z,h) is the corresponding population loss; A_{Z,H} is a learner that receives a sample S ∼ D_Z^m and outputs a model; and OPT = inf_{h∈H} L(D_Z,h).

3.3 Density Estimation

Our goal is that, for every D_Z, the output A_{Z,H}(S) has loss comparable to OPT with high probability.

We take the example of density estimation: in this case L(D_Z,h) = d_TV(D_Z,h). We say A_{Z,H} probably approximately correctly (PAC) learns H with m(ε,δ) samples if for all D_Z and for all ε, δ ∈ (0,1): if S ∼ D_Z^m(ε,δ) then

Pr_S[ L(D_Z, A_{Z,H}(S)) > ε + C·OPT ] < δ   (15)

Now if we take the example of C = 2 and let H be the set of all Gaussians in ℝ^d, then

m(ε,δ) = O( (d² + log(1/δ)) / ε² )

We will now simplify the above definition by fixing δ. A_{Z,H} PAC learns H with m(ε) samples if for all D_Z and all ε ∈ (0,1): if S ∼ D_Z^m(ε) then

Pr_S[ L(D_Z, A_{Z,H}(S)) > ε + C·OPT ] < 0.01   (16)

For the example of C = 2, with H the set of all Gaussians in ℝ^d, this gives

m(ε) = O( d² / ε² )

3.4 Binary Classification (with adv. perturbations)

For binary classification we have Z = X × {0,1}, and h is a model mapping h : X → {0,1}.

The loss is l(h,x,y) = 1{h(x) ≠ y}, and the population loss is L(D_Z,h) = E_{(x,y)∼D_Z} l(h,x,y).

As before, A_{Z,H} PAC learns H with m(ε) samples if for all D_Z and all ε ∈ (0,1): if S ∼ D_Z^m(ε) then

Pr_S[ L(D_Z, A_{Z,H}(S)) > ε + C·OPT ] < 0.01   (17)

If H is the set of all halfspaces in ℝ^d, then

m(ε) = O( d / ε² )

For binary classification with adversarial perturbations we again have Z = X × {0,1} and h : X → {0,1}.

The loss is now the adversarial loss l_U(h,x,y) = sup_{x̄∈U(x)} 1{h(x̄) ≠ y}, where U(x) is the set of allowed perturbations of x, and L_U(D_Z,h) = E_{(x,y)∼D_Z} l_U(h,x,y).

A_{Z,H} PAC learns H with m(ε) samples if for all D_Z and all ε ∈ (0,1): if S ∼ D_Z^m(ε) then

Pr_S[ L_U(D_Z, A_{Z,H}(S)) > ε + OPT ] < 0.01   (18)

Now, consider the following H1 and H2:

Figure 4: Trade Offs

Here H₂ is richer, which can mean it contains better models but is also harder to learn. We can ask how to characterize the sample complexity of learning H for binary classification, for binary classification with adversarial perturbations, and for density estimation.

In the case of binary classification the VC dimension quantifies the complexity:

m(ε) = Θ( VC(H) / ε² )

The upper bound here is achieved using simple ERM:

L(S,h) = (1/|S|) Σ_{(x,y)∈S} l(h,x,y),   ĥ = argmin_{h∈H} L(S,h),

together with uniform convergence:

sup_{h∈H} |L(S,h) − L(D_Z,h)| = O( √(VC(H)/|S|) )  w.p. > 0.99   (19)
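A minimal Python sketch of the ERM rule above on synthetic data of my own: H is the class of 1-D threshold classifiers (VC dimension 1), and ĥ is the threshold minimizing the empirical 0-1 loss.

import random

random.seed(0)
# Synthetic 1-D data: label = 1 iff x > 0.3, with 10% label noise.
S = []
for _ in range(200):
    x = random.random()
    y = int(x > 0.3)
    if random.random() < 0.1:
        y = 1 - y
    S.append((x, y))

# Hypothesis class H: thresholds h_t(x) = 1{x > t}  (VC dimension 1).
def emp_loss(t, S):
    """L(S, h_t) = empirical 0-1 loss of the threshold t."""
    return sum(int(x > t) != y for x, y in S) / len(S)

candidates = sorted(x for x, _ in S)          # it suffices to check data points
t_hat = min(candidates, key=lambda t: emp_loss(t, S))
print("ERM threshold:", round(t_hat, 3), " empirical loss:", emp_loss(t_hat, S))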

We now introduce sample compression as an alternative.

3.5 Sample Compression

The idea is to try and answer how should we go about compressing a given training set? In classic information theory, we would compress it into a few bits. In the case of sample compression, we want to try to compress it into a few samples.

Even in the simple example of linear classification, the number of required bits is unbounded (it depends on the sample).

It has already been shown by Littlestone and Warmuth [10] that compressibility implies learnability: a compression scheme of size k gives

m(ε) = Õ( k / ε² )
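To make "compress into a few samples" concrete, here is a toy example of my own (not from the talk): a sample compression scheme of size k = 1 for 1-D threshold classifiers on realizable samples. The encoder keeps only the leftmost positive example; the decoder rebuilds a classifier consistent with the whole sample from that single point.

# Toy sample compression scheme (size 1) for thresholds h_t(x) = 1{x >= t}.

def encode(S):
    """Keep only the leftmost positive example (or nothing if none exist)."""
    positives = [x for x, y in S if y == 1]
    return [(min(positives), 1)] if positives else []

def decode(compressed):
    """Rebuild a classifier from the retained example(s)."""
    t = compressed[0][0] if compressed else float("inf")
    return lambda x: int(x >= t)

# A sample realizable by the threshold t = 0.4.
S = [(0.1, 0), (0.35, 0), (0.4, 1), (0.7, 1), (0.95, 1)]
h = decode(encode(S))
print("errors on S:", sum(h(x) != y for x, y in S))   # 0: consistent with S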

It has also been shown by Moran and Yehudayoff [11] that compressibility is equivalent to learnability, with

k = 2^O(VC)

Conjecture 1 (Compression Conjecture). k = Θ(VC)

Sample Compression can be very helpful by:

Typical classifiers are often:

In the case of classification with adversarial perturbations we had l_01(h,x,y) = 1{h(x) ≠ y} and l_U(h,x,y) = sup_{x̄∈U(x)} l_01(h,x̄,y),

and then L_U(D_Z,h) = E_{(x,y)∼D_Z} l_U(h,x,y).

Again, A_{Z,H} PAC learns H with m(ε) samples if for all D_Z and all ε ∈ (0,1): if S ∼ D_Z^m(ε) then

Pr_S[ L_U(D_Z, A_{Z,H}(S)) > ε + OPT ] < 0.01   (20)

However, one question is whether robust ERM,

L_U(S,h) = (1/|S|) Σ_{(x,y)∈S} l_U(h,x,y),   ĥ = argmin_{h∈H} L_U(S,h),

works for all H. It does not: uniform convergence can fail, and

sup_{h∈H} |L_U(S,h) − L_U(D_Z,h)|

can be unbounded.

In fact, any "proper learner" (one that outputs a hypothesis from H) can fail.

In a compression-based method the decoder should recover the labels of the training set and of their neighbours, i.e., we compress the inflated set. This gives k = 2^O(VC), so

m_U(ε) = O( 2^O(VC(H)) / ε² )   (21)

There is an exponential dependence on V C(H).

Ashtiani et al. [9] introduced tolerant adversarial learning: A_{Z,H} PAC learns H with m(ε) samples if for all D_Z and ε ∈ (0,1), if S ∼ D_Z^m(ε) then

Pr_S[ L_U(D_Z, A_{Z,H}(S)) > ε + inf_{h∈H} L_V(D_Z, h) ] < 0.01,   (22)

where V(x) ⊇ U(x) is a slightly inflated perturbation set (the "tolerance"). And,

m_{U,V}(ε) = Õ( VC(H) · d · log(1 + 1/γ) / ε² ),   (23)

where γ is the tolerance parameter.

The trick is to avoid compressing an infinite set: the new goal is that the decoder only needs to recover the labels of points in U(x).

To do so we define a noisy empirical distribution (using V(x)) and use boosting to achieve a very small error with respect to this distribution. We then encode the classifier using the samples used to train the weak learners, and the decoder smooths out the hypotheses.

It is interesting to ask why we need tolerance; there do exist other ways to relax the problem and avoid the 2^O(VC) dependence.

This is also observable in the density estimation example.

3.6 Gaussian Mixture Models

Gaussian mixture models are very popular in practice and are one of the most basic universal density approximators. They are also building blocks for more sophisticated density classes, and one can think of them as multi-modal versions of Gaussians:

f(x) = w₁·N(x|μ₁,Σ₁) + w₂·N(x|μ₂,Σ₂) + w₃·N(x|μ₃,Σ₃)
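A minimal Python sketch of this mixture density, with illustrative parameters of my own in ℝ²: f(x) = Σ_i w_i·N(x | μ_i, Σ_i).

import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at x."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def gmm_pdf(x, weights, mus, Sigmas):
    """f(x) = sum_i w_i * N(x | mu_i, Sigma_i)."""
    return sum(w * gaussian_pdf(x, mu, S) for w, mu, S in zip(weights, mus, Sigmas))

# Illustrative 3-component mixture in R^2.
weights = [0.5, 0.3, 0.2]
mus = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2), np.diag([1.0, 2.0])]

print(gmm_pdf(np.array([1.0, 1.0]), weights, mus, Sigmas))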

Let F be the class of Gaussian mixture models with k components in ℝ^d. We want to ask how many samples are needed to recover f ∈ F within L₁ error ε.

Call this number of samples m(d,k,ε).

To learn a single Gaussian in ℝ^d,

O( d² / ε² ) = O( #params / ε² )

samples are sufficient (and necessary).

Now, for mixtures of k Gaussians in ℝ^d, we want to know whether

O( k·d² / ε² ) = O( #params / ε² )

samples are sufficient.

There have been several results on learning Gaussian mixture models.

Consider the example in Figure 5 and, for a moment, look at it as a binary classification problem. The decision boundary between the two Gaussians has a simple quadratic form!

Figure 5: A sample problem

The corresponding class of quadratic decision boundaries has VC dim = O(d²).

Here “Sample compression” does npt make sense as there are no “labels”.

3.7 Compression Framework

We have F, a class of distributions (e.g. Gaussians), and a sample of m points drawn from some D ∈ F. If an encoder A sends t of the m points (plus a few bits) to a decoder B, and B approximates D from them, then we say F admits (t,m)-compression.

Theorem 7. If F has a compression scheme of size (t,m) then the sample complexity of learning F is

Õ( t/ε² + m )

Õ() hides polylog factors.

Small compression schemes imply sample-efficient algorithms.

Theorem 8. If F has a compression scheme of size (t,m), then the class of k-mixtures of F admits (kt, km)-compression.

Distribution compression schemes extend to mixture classes automatically! So for GMMs in ℝ^d it is enough to come up with a good compression scheme for a single Gaussian.

For learning mixtures of Gaussians, encoding the center and the axes of the ellipsoid is sufficient to recover N(μ,Σ). This admits Õ(d², 1/ε) compression! The technical challenge is encoding the d eigenvectors "accurately" using only d² points.

Another technical challenge is that the ratio σ_max/σ_min can be large.
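A minimal Python sketch of why the center and ellipsoid axes suffice (my own illustration, ignoring the discretization to actual sample points that the real scheme needs): from μ and the d points p_i = μ + √λ_i·v_i one can rebuild Σ = Σ_i λ_i·v_i·v_iᵀ exactly.

import numpy as np

rng = np.random.default_rng(1)
d = 3
mu = rng.normal(size=d)
M = rng.normal(size=(d, d))
Sigma = M @ M.T + np.eye(d)              # a random positive-definite covariance

# "Encode": the center plus one point per principal axis of the ellipsoid.
lam, V = np.linalg.eigh(Sigma)           # Sigma = sum_i lam_i v_i v_i^T
axis_points = [mu + np.sqrt(lam[i]) * V[:, i] for i in range(d)]

# "Decode": rebuild the covariance from those d + 1 points.
Sigma_rec = sum(np.outer(p - mu, p - mu) for p in axis_points)

print("max reconstruction error:", np.abs(Sigma_rec - Sigma).max())   # ~1e-15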

3.8 Conclusion

References

[1]    Omar Alrabiah, Venkatesan Guruswami, Pravesh Kothari, and Peter Manohar. A near-cubic lower bound for 3-query locally decodable codes from semirandom CSP refutation. Technical Report TR22-101, Electronic Colloquium on Computational Complexity (ECCC), July 2022.

[2]    Arnab Bhattacharyya, L. Sunil Chandran, and Suprovat Ghoshal. Combinatorial lower bounds for 3-query LDCs, 2019. URL https://arxiv.org/abs/1911.10698.

[3]    Venkatesan Guruswami, Pravesh K. Kothari, and Peter Manohar. Algorithms and certificates for Boolean CSP refutation: "Smoothed is no harder than random", 2021. URL https://arxiv.org/abs/2109.04415.

[4]    Alexander S. Wein, Ahmed El Alaoui, and Cristopher Moore. The Kikuchi hierarchy and tensor PCA, 2019. URL https://arxiv.org/abs/1904.03858.

[5]    Charlie Carlson, Ewan Davies, Nicolas Fraiman, Alexandra Kolla, Aditya Potukuchi, and Corrine Yap. Algorithms for the ferromagnetic Potts model on expanders, 2022. URL https://arxiv.org/abs/2204.01923.

[6]    Matthew Coulson, Ewan Davies, Alexandra Kolla, Viresh Patel, and Guus Regts. Statistical Physics Approaches to Unique Games. In Shubhangi Saraf, editor, 35th Computational Complexity Conference (CCC 2020), volume 169 of Leibniz International Proceedings in Informatics (LIPIcs), pages 13:1–13:27, Dagstuhl, Germany, 2020. Schloss Dagstuhl–Leibniz-Zentrum für Informatik. ISBN 978-3-95977-156-6. doi: 10.4230/LIPIcs.CCC.2020.13. URL https://drops.dagstuhl.de/opus/volltexte/2020/12565.

[7]    Peter Frankl and Zoltán Füredi. A short proof for a theorem of Harper about Hamming spheres. Discrete Mathematics, 34(3):311–313, 1981.

[8]    Tyler Helmuth, Will Perkins, and Guus Regts. Algorithmic Pirogov–Sinai theory. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, pages 1009–1020, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450367059. doi: 10.1145/3313276.3316305. URL https://doi.org/10.1145/3313276.3316305.

[9]    Hassan Ashtiani, Vinayak Pathak, and Ruth Urner. Adversarially robust learning with tolerance, 2022. URL https://arxiv.org/abs/2203.00849.

[10]    Nick Littlestone and Manfred Warmuth. Relating data compression and learnability. 1986.

[11]    Shay Moran and Amir Yehudayoff. Sample compression schemes for VC classes. Journal of the ACM (JACM), 63(3):1–10, 2016.