Recent years have witnessed the widespread use of mobile phones and other portable electronic devices, which display a wide variety of screen content photos. Unlike natural scene photos, screen content photos contain more lines, fewer colors, and distinctive text. Traditional image enhancement techniques are designed for natural scenes, so they rarely achieve a satisfactory enhancement effect on screen content photos. In this paper, we develop a novel enhancement model for screen content photos that treats text and pictures separately. Specifically, we first use a fully convolutional network to divide a screen content photo into three independent parts: the picture region, the foreground text region, and the background. Second, an optimal histogram modification is used to automatically enhance the contrast of the picture region, and a guided image filter is used to enhance the foreground text region. Third, the enhanced picture region, the enhanced foreground text region, and the background are fused to obtain the final enhanced image. Experimental results show that our model produces less noise and achieves a better enhancement effect than popular enhancement techniques. © 2021, Springer Nature Singapore Pte Ltd.
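
The three-stage pipeline described above can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper. The segmentation is assumed to be already available as a label map (the fully convolutional network is not reproduced), plain histogram equalization stands in for the optimal histogram modification, and the text region is sharpened by boosting the detail layer of a self-guided filter, which is only one plausible way to use a guided image filter here. All function names, label ids, and parameter values (`r`, `eps`, `gain`) are hypothetical.

```python
import numpy as np

# Hypothetical label ids for the three regions produced by the segmentation step.
PICTURE, TEXT, BACKGROUND = 0, 1, 2


def equalize(gray, mask):
    """Histogram-equalize only the pixels selected by mask.

    Stand-in for the paper's optimal histogram modification (assumption).
    """
    out = gray.copy()
    vals = gray[mask]
    if vals.size == 0:
        return out
    hist = np.bincount(vals, minlength=256)
    cdf = np.cumsum(hist) / vals.size
    lut = np.round(255 * cdf).astype(np.uint8)
    out[mask] = lut[vals]
    return out


def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, using edge padding and an integral image."""
    k = 2 * r + 1
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    s = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return s / (k * k)


def sharpen_guided(gray, r=2, eps=1e-2, gain=2.0):
    """Boost the detail layer left over by a self-guided filter (assumed usage)."""
    I = gray.astype(np.float64) / 255.0
    mean_I = box_mean(I, r)
    var_I = box_mean(I * I, r) - mean_I ** 2
    a = var_I / (var_I + eps)          # close to 1 near edges, close to 0 in flat areas
    b = (1.0 - a) * mean_I
    q = box_mean(a, r) * I + box_mean(b, r)   # edge-preserving smoothed base layer
    enhanced = np.clip(q + gain * (I - q), 0.0, 1.0)
    return np.round(enhanced * 255).astype(np.uint8)


def enhance(gray, labels, r=2, eps=1e-2, gain=2.0):
    """Enhance picture and text regions separately, then fuse with the untouched background."""
    pic = equalize(gray, labels == PICTURE)
    text = sharpen_guided(gray, r, eps, gain)
    out = gray.copy()                  # background pixels are kept as-is
    out[labels == PICTURE] = pic[labels == PICTURE]
    out[labels == TEXT] = text[labels == TEXT]
    return out
```

Here `gray` is expected to be a 2-D `uint8` array and `labels` an integer array of the same shape. Fusing by direct mask assignment keeps the regions strictly independent; a real system might blend across region boundaries instead.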