We measured how long intrinsic and extrinsic visual cues take to influence the perceptual upright. The perceptual upright was assessed with the Oriented Character Recognition Test, in which the identity of a probe character depends on its perceived orientation. A visual background that filled the field of view and contained both intrinsic and extrinsic cues was presented at a range of orientations for 50 to 500 ms, followed by a mask. The two classes of cue were separated by exploiting their different degrees of ambiguity. Intrinsic cues include scene structure (e.g., the walls, floor and ceiling of an indoor scene), which indicates four potential up directions, and the horizon, which indicates two. Extrinsic cues, which rely on information not contained in the image, such as a surface acting as a support for an object, signal the direction of up unambiguously. The contribution of each class of cue could therefore be identified from the number of cycles its effect completed as the background was presented at orientations around the full circle: four cycles for scene structure, two for the horizon and one for extrinsic cues. Although the higher-level extrinsic cues exerted a larger influence on the perceptual upright than the intrinsic cues, the magnitude of each cue's effect increased with presentation time at approximately the same rate, with a time constant of about 60 ms. This finding poses a challenge for bottom-up theories of scene perception and suggests that low-level and high-level information are processed in parallel, at least insofar as they indicate orientation.
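The cycle-counting logic can be illustrated with a short sketch, offered as one plausible analysis rather than the study's actual method. It assumes that the perceptual-upright shift measured at each background orientation is well described by a constant plus sine and cosine terms at one, two and four cycles per revolution. Under that assumption, the amplitude at each harmonic indexes extrinsic cues, the horizon and scene structure respectively. This only holds if each cue's effect is roughly sinusoidal, because a non-sinusoidal two-cycle effect would leak into the four-cycle term. The sketch also assumes that each amplitude grows with presentation time as A(1 - exp(-t/tau)). The helper names `harmonic_amplitudes`, `fit_time_constant` and `simulate_and_fit`, the orientation and duration sampling, and all numbers in the demo are hypothetical and generated synthetically. They are not the study's data or results.

```python
import numpy as np


def harmonic_amplitudes(theta_deg, shift_deg, harmonics=(1, 2, 4)):
    """Hypothetical helper: least-squares fit of
    shift(theta) = c0 + sum_k [s_k sin(k theta) + c_k cos(k theta)]
    and return the amplitude sqrt(s_k^2 + c_k^2) for each harmonic k."""
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    y = np.asarray(shift_deg, dtype=float)
    columns = [np.ones_like(theta)]  # constant offset term
    for k in harmonics:
        columns.append(np.sin(k * theta))
        columns.append(np.cos(k * theta))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {
        k: float(np.hypot(coef[1 + 2 * i], coef[2 + 2 * i]))
        for i, k in enumerate(harmonics)
    }


def fit_time_constant(t_ms, amplitude, taus=None):
    """Hypothetical helper: fit amplitude(t) = A * (1 - exp(-t / tau)).
    Grid-searches tau; for each candidate tau, A has a closed-form
    least-squares solution, so only tau needs searching."""
    if taus is None:
        taus = np.arange(5.0, 300.0, 0.5)  # candidate time constants, ms
    t = np.asarray(t_ms, dtype=float)
    y = np.asarray(amplitude, dtype=float)
    best_A, best_tau, best_sse = 0.0, float("nan"), np.inf
    for tau in taus:
        g = 1.0 - np.exp(-t / tau)  # assumed exponential growth curve
        A = float(g @ y / (g @ g))  # least-squares asymptote for this tau
        sse = float(np.sum((y - A * g) ** 2))
        if sse < best_sse:
            best_A, best_tau, best_sse = A, float(tau), sse
    return best_A, best_tau


def simulate_and_fit(tau_true=60.0, true_amps=((1, 8.0), (2, 3.0), (4, 2.0))):
    """Generate noiseless SYNTHETIC data (not the study's data), then
    recover each harmonic's asymptotic amplitude and time constant."""
    orientations = np.arange(0.0, 360.0, 22.5)  # 16 hypothetical orientations
    durations = np.array([50.0, 100.0, 200.0, 500.0])  # presentation times, ms
    theta = np.deg2rad(orientations)
    harmonics = tuple(k for k, _ in true_amps)
    series = {k: [] for k in harmonics}
    for t in durations:
        growth = 1.0 - np.exp(-t / tau_true)
        shift = growth * sum(a * np.sin(k * theta) for k, a in true_amps)
        amps = harmonic_amplitudes(orientations, shift, harmonics)
        for k in harmonics:
            series[k].append(amps[k])
    return {k: fit_time_constant(durations, series[k]) for k in harmonics}


if __name__ == "__main__":
    for k, (A, tau) in simulate_and_fit().items():
        print(f"{k}-cycle component: A = {A:.2f}, tau = {tau:.1f} ms")
```

Because the harmonic terms are orthogonal over equally spaced orientations spanning the full circle, each cue class's amplitude can be read off independently at every presentation time. The time constant is then fitted separately per harmonic, which is one way to compare growth rates across cue classes. Real data would add noise and call for confidence intervals on tau.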