Acceptability judgment collection in the field of generative syntax has generally proceeded informally, that is, without the formal methods familiar from experimental psychology. Two types of arguments have been proposed for the adoption of formal experimental techniques in generative syntax: (i) that formal experiments provide a potentially more sensitive measurement tool, and (ii) that informal techniques are in fact an unreliable measurement tool. While the first is relatively widely accepted, the second has become a matter of considerable debate because it suggests that the data that were used to construct current versions of generative theories are in fact faulty. In order to investigate this claim empirically, we tested all 469 data points in a popular generative syntax textbook (Adger, 2003) using 440 naïve participants, the magnitude estimation and yes-no tasks, and three different types of statistical analyses (traditional frequentist tests, linear mixed effects models, and Bayes factor analyses). This study suggests that the maximum replication failure rate for the informally reported results is 2%, or, put another way, that the empirical foundation of generative syntactic theory is at least 98% replicable with formal experiments. These results suggest that (i) the extensive use of informally collected judgments in generative syntax has not led to theories constructed upon faulty data, and (ii) though there are several reasons for generative syntacticians to adopt formal experimental methods for data collection, the putative inadequacy of the empirical foundation of generative syntactic theories is not one of them.