Found from DenseDepth, each layer can be set to trainable in the encoder, then the outputs of the model and the required layers for skip connections can be used directly. Ends up being much cleaner
Functional models are way easier to work with, and I don't need any advanced features that would require model subclassing