I received a report from a colleague today that he was getting timeout errors after clicking the save button on a list item edit form. Initial testing of web head performance showed no issues and the ULS log only reported timing out during the save.
Taking a deeper look, we established that the list in question had a custom 2013 workflow attached – aha! At the same time, OOTB publish workflows were taking longer than usual to complete.
Next, we checked the Workflow Manager log in Event Viewer on each machine in our Workflow Manager farm. Lo and behold, we found a critical error about connecting to the Service Bus on one of the servers. It turned out that all three Service Bus services on that server were stopped. When we checked the other two servers in the Workflow Manager quorum, they too showed stopped Service Bus services.
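If you would rather not log on to every server to repeat that check, something like the following could do the rounds for you. It is only a rough sketch, not what we actually ran: it shells out to the built-in Windows `sc` command and reads the STATE line from its output. The server names and service names are placeholders I made up for illustration, so substitute your own (the real service names are listed in services.msc), and note that it assumes English-language `sc` output and an account with rights to query the remote machines.

```python
import re
import subprocess

# Placeholder values (assumptions): replace with your own Workflow Manager
# servers and the Service Bus service names shown in services.msc.
SERVERS = ["<wfm-server-1>", "<wfm-server-2>", "<wfm-server-3>"]
SERVICES = ["<service-name-1>", "<service-name-2>", "<service-name-3>"]


def parse_state(sc_output):
    """Return the state word (e.g. 'RUNNING', 'STOPPED') from sc query output, or None."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def query_state(server, service):
    """Query one service on one remote server via sc.exe and return its state."""
    result = subprocess.run(
        ["sc", rf"\\{server}", "query", service],
        capture_output=True,
        text=True,
    )
    return parse_state(result.stdout)


def stopped_services(states):
    """Return the (server, service) keys whose state is anything other than RUNNING."""
    return [key for key, state in states.items() if state != "RUNNING"]


if __name__ == "__main__":
    # Collect the state of every service on every server, then report problems.
    states = {
        (server, service): query_state(server, service)
        for server in SERVERS
        for service in SERVICES
    }
    for server, service in stopped_services(states):
        print(f"{server}: {service} is {states[(server, service)]}")
```

A state of `None` means the STATE line could not be read at all (wrong service name, unreachable server, or access denied), so the sketch reports it along with the stopped services rather than hiding it.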
My guess is that IT had rolled out a patch for Service Bus and had not checked that the services restarted on each affected server. I believe Microsoft recently released a patch for Service Bus, which may or may not require a server reboot. If the patch stopped the services and no reboot followed, that could account for them never restarting, since they would normally be expected to start again after a reboot.
So there you have it: if you come across these symptoms, check that the Service Bus services are running on your Workflow Manager servers.