A large hypoxic zone forms every summer on the Texas-Louisiana Shelf in the northern Gulf of Mexico due to nutrient and freshwater inputs from the Mississippi/Atchafalaya River System. Efforts are underway to reduce the extent of hypoxic conditions by reducing river nutrient inputs, but the response of hypoxia to such load reductions is difficult to predict because biological responses are confounded by variability in physical processes. The objective of this study is to identify the aspects of the physical model that matter most for hypoxia simulation and prediction. To do so, we compare three circulation models (ROMS, FVCOM, and NCOM) implemented for the northern Gulf of Mexico, all coupled to the same simple oxygen model, with observations and with each other. By using a highly simplified oxygen model, we eliminate the potentially confounding effects of a full biogeochemical model and can isolate the effects of physical features. In a systematic assessment, we found that (1) model-to-model differences in bottom water temperature result in differences in simulated hypoxia because temperature influences the rate of oxygen uptake by the sediments (an important oxygen sink in this system), (2) vertical stratification does not explain model-to-model differences in hypoxic conditions in a straightforward way, and (3) the thickness of the bottom boundary layer, which sets the thickness of the hypoxic layer in all three models, is key to determining how likely a model is to generate hypoxic conditions. These results imply that hypoxic area, the metric commonly used in the northern Gulf, which ignores hypoxic layer thickness, is insufficient for assessing a model's ability to simulate hypoxia accurately, and that hypoxic volume needs to be considered as well.
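
The distinction between hypoxic area and hypoxic volume can be illustrated with a minimal sketch. It is not the diagnostic code used in this study. It assumes a hypothetical gridded oxygen field with the bottom layer at vertical index 0, uniform toy grid dimensions, and a hypoxia threshold of 2 mg/L, a value that is not stated in the text above. The function name `hypoxic_area_and_volume` and all array names are illustrative.

```python
import numpy as np

# Assumed hypoxia threshold in mg/L (an assumption, not given in the text).
HYPOXIA_THRESHOLD = 2.0


def hypoxic_area_and_volume(oxygen, layer_thickness, cell_area):
    """Return (hypoxic area in m^2, hypoxic volume in m^3).

    oxygen          : (nz, ny, nx) oxygen concentration, index 0 = bottom layer
    layer_thickness : (nz, ny, nx) layer thickness in m
    cell_area       : (ny, nx) horizontal cell area in m^2
    """
    hypoxic = oxygen < HYPOXIA_THRESHOLD
    # Area counts a horizontal cell once if its bottom layer is hypoxic.
    area = float(np.sum(cell_area[hypoxic[0]]))
    # Volume sums the volume of every hypoxic grid cell in the water column.
    cell_volume = layer_thickness * cell_area  # cell_area broadcasts over nz
    volume = float(np.sum(cell_volume[hypoxic]))
    return area, volume


# Toy grid: 3 layers, 2 x 2 horizontal cells of 1 km^2, each layer 1 m thick.
thickness = np.ones((3, 2, 2))
cell_area = np.full((2, 2), 1.0e6)

# Hypothetical model A: only the bottom layer is hypoxic.
oxygen_a = np.full((3, 2, 2), 5.0)
oxygen_a[0] = 1.0

# Hypothetical model B: the two lowest layers are hypoxic (thicker layer).
oxygen_b = np.full((3, 2, 2), 5.0)
oxygen_b[:2] = 1.0

area_a, volume_a = hypoxic_area_and_volume(oxygen_a, thickness, cell_area)
area_b, volume_b = hypoxic_area_and_volume(oxygen_b, thickness, cell_area)
print(area_a, volume_a)
print(area_b, volume_b)
```

In this toy case the two hypothetical models report the same hypoxic area, while the hypoxic volume of the second is twice that of the first. A comparison based on area alone would therefore treat them as equivalent.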