This qualitative study is intended to illuminate factors that affect the generalizability of portfolio assessments of beginning teachers. By generalizability, we mean the extent to which the portfolio assessment supports generalizations from the particular evidence in the portfolio to the conception of competent teaching embodied in the standards on which the assessment is based. Or, more practically, “The key question is, ‘How likely is it that this finding would be reversed or substantially altered if a second, independent assessment of the same kind were made?’” (Cronbach, Linn, Brennan, and Haertel, 1997, p. 1). In addressing this question, we draw on two kinds of evidence that are rarely available: (a) comparisons of two different portfolios completed by the same teacher in the same year, and (b) comparisons between a portfolio and a multi-day case study (observation and interview conducted shortly after portfolio submission) designed to parallel the evidence called for in the portfolio assessment. Our formative goal is to identify issues that assessment developers and users can take into account when designing assessment systems and appropriately limiting score interpretations.