Q3. Under Attack (35 points + 15 bonus points) You work for a cybersecurity company, HyperSec, to monitor online forums for fake news. Recently, you’ve noticed an increasing number of curious sentences that look like this: “NewHeaven scisetints fnid evenidce taht the erath is falt” These sentences are perfectly readable by humans, but when you feed them into your machine learning models, they are totally confused and make wildly incorrect predictions. Your automatic fake news detection system is under attack! a. (15 points – bonus) As a first s tep, y ou w ould l ike t o c lassify a s entence x a s e ither a dversarial ( y = 1 ) or not (y = −1). Your boss doesn’t want you to use the hinge loss because she’s worried that the attacker might be able to more easily reverse engineer the system. So you decide to investigate alternative loss functions. For each of the five loss functions Loss(x, y, w) below: • Determine whether Loss(x, y, w) is usable for classification if we minimize the training loss with stochastic gradient descent (SGD). • If the answer is yes, compute its gradient ∇wLoss(x, y, w). • If the answer is no, explain in one sentence why it is not usable. (i) [3 points] Loss(x, y, w) = max{1 − b(w · φ(x))yc, 0}, where bac returns a rounded down to the nearest integer. (ii) [3 points] Loss(x, y, w) = max{(w · φ(x))y − 1, 0}. 13 (iii) [3 points] Loss(x, y, w) = 1 − 2(w · φ(x))y if (w · φ(x))y ≤ 0 (1 − (w · φ(x))y) 2 if 0 < (w · φ(x))y ≤ 1 0 if (w · φ(x))y > 1 (iv) [3 points] Loss(x, y, w) = ( max(1 − (w · φ(x)), 0) + 10 if y = +1 max(1 + (w · φ(x)), 0) − 10 if y = −1 . (v) [3 points] Loss(x, y, w) = σ(−(w · φ(x))y), where σ(z) = (1 + e −z ) −1 is the logistic function. 14 b. (10 points) Your next job is to decide which features to use in order to solve the classification problem. Assume you have a set D of real English words. (i) [5 points] Suppose you have a sentence x which is a string; e.g., x = “erath is falt”. Write the Python code for taking a sentence x and producing a dict representing the following two feature templates: 1. x contains word 2. number of words in x that are not in D Assume that words in x are separated by spaces; e.g., the words in x above are “erath”, “is”, “falt”. def featureExtractor(x, D): phi = defaultdict(float) return phi 15 (ii) [2 points] If k is the number of unique words that occur in the training set and |D| is the number of words in the given set of real English words, what is the number of features that the linear classifier will have? (iii) [3 points] Suppose that an insider leaks HyperSec’s classification strategy to the attackers. The classifier itself was not leaked, just the classification strategy behind it, which reveals that HyperSec is using a dataset of adversarial sentences to train a classifier with the features defined in part (i). The attackers then use this information to try modifying any fixed sentence (e.g., “climate change is a hoax”) into something readable by humans (e.g., “clmaite canhge is a haox”) but classified (incorrectly) as non-adversarial by HyperSec. How can the attackers achieve this? Explain the adversarial approach and its steps. 16 c. (10 points) Having built a supervised classifier, you find it extremely hard to collect enough examples of adversarial sentences. On the other hand, you have a lot of non-adversarial text lying around. (i) [3 points] Suppose you have a total of 100,000 training examples that consists of 100 adversarial sentences and 99,900 non-adversarial sentences. You train a classifier and get 99.9% accuracy. Is this a good, meaningful result? Explain why or why not. (ii) [3 points] You decide to fit a g enerative m odel t o t he n on-adversarial t ext, w hich is a distribution p(x) that assigns a probability to each sentence x. For simplicity, let’s use a unigram model (see: https://en.wikipedia.org/wiki/N-gram ): p(x) = Yn i=1 pu(wi), where w1, w2, …, wn are the words in sentence x, and pu(w) is a probabilitiy distribution over possible w’s. Suppose you are given a single sentence “the cat in the hat” as training data. Compute the maximum likelihood estimate of the unigram model pu: w pu(w) 17 (iv) [4 points] Given an unseen sentence, your goal is to be able to predict whether that sentence is adversarial or not. You have a labeled dataset Dtrain = {(x1, y1), . . . ,(xn, yn)} and would like to use the unigram model to train your predictor. How could you use p(x) (from the previous problem) and Dtrain to obtain a predictor f(x) that outputs whether a sentence x is adversarial (y = 1) or not (y = −1)? Be precise in defining f(x). Hint: define a feature vector φ(x). f(x) = 18 d. (15 points) You notice that the adversarial words are often close to real English words. For example, you might see “erath” or “eatrh” as misspellings of “earth”. Furthermore, the actual number of adversarial words is rather small (it seems like the attacker just wants to reinforce the same messages). This makes you think of another unsupervised approach to try. Let D be the set of real English words as before and a1, . . . , an be the list of adversarial words you’ve found, and let dist(a, e) be the number of edits to transform some adversarial word a to the English word e (how exactly distance is defined is unimportant). We wish to choose K English words e1, . . . , eK ∈ D and assign each adversarial word ai to one of the chosen English words (zi ∈ {1, . . . , K}). Each English word e ∈ D incurs a cost c(e) if we choose it as one of the K words. Our goal is to minimize the total cost of choosing e1, . . . , eK plus the total number of edits from adversarial words a1, . . . , an to their assigned English words ez1 , . . . , ezn . As an example, let D = {“earth”, “flat”, “scientists”} with c(“earth”) = 1, c(“flat”) = 1, c(“scientists”) = 2, and a1 = “erath”, a2 = “falt”, a3 = “eatrh”. Then with K = 2, one possible assignment (presumably the best one) is e1 = “earth”, e2 = “flat”, z1 = 1, z2 = 2, z3 = 1. (i) [3 points] Define a loss function that captures the optimization problem above: Loss(e1, . . . , eK, z1, . . . , zn) = 19 (ii) [5 points] Derive an alternating minimization algorithm for optimizing the above objective. We alternate between two steps. In step 1, we optimize z1, . . . , zn. Formally write down this update rule as an equation for each zi where 1 ≤ i ≤ n. What is the runtime? You should specify runtime with big-Oh notation in terms of n, K and/or |D|. (iii) [5 points] In step 2, we optimize e1, . . . , eK. Formally write down this update rule as an equation for each ej where 1 ≤ j ≤ K. What is the runtime? You should specify runtime with big-Oh notation in terms of n, K and/or |D|. (iv) [2 points] Is the above procedure guaranteed to converge to the minimum cost solution? Explain why or why not. If not, what method what algorithm could you use with such guarantees? 20