Is there a reason why this is returning as ‘inactive’?
The Inactive state on a REST Hook indicates that we have attempted to deliver a payload to the target service several times with exponential backoff and have not received a successful response to any attempt. This can happen during an outage, or if the target is extremely unresponsive (>5m) to requests.
If you are sure that the target is correct and will return a correct response, resubmitting a request for verification will trigger a return to Verified status.
https://developer.infusionsoft.com/docs/rest/#tag/REST-Hooks/operation/verify_a_hook_subscription
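For anyone who wants to script that re-verification step, here is a minimal stdlib sketch. The base URL, path shape, and Bearer-token header are assumptions drawn from the linked docs; `hook_key` and `access_token` are placeholders you would supply.

```python
import json
import urllib.request

# Assumed REST base URL per the linked API reference.
API_BASE = "https://api.infusionsoft.com/crm/rest/v1"

def verify_url(hook_key: str) -> str:
    """Build the URL for the verify-a-hook-subscription endpoint."""
    return f"{API_BASE}/hooks/{hook_key}/verify"

def verify_hook(hook_key: str, access_token: str) -> dict:
    """POST to the verify endpoint; on success the hook should
    return to Verified status."""
    req = urllib.request.Request(
        verify_url(hook_key),
        method="POST",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```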
Hi Tom,
We are also experiencing this issue with a handful of our customers. Their webhooks are being put into an Inactive status, which prevents any future posts to us. When one is put into that status, we don’t know when or why it happened; there is no notification, at least none that I’m aware of. Is there any additional logging or troubleshooting we can do to figure out why a specific hook went into that status, so we can solve the underlying issue?
-Nate
CustomerHub
Good afternoon Nate!
As I stated above, REST hooks are marked Inactive after several sequential failed retries to deliver a payload. Specifically, we retry the request after a minimum of 30 seconds, 5 minutes, and 30 minutes. If after the final retry we fail to receive a positive response from the endpoint, we cease sending payloads until the endpoint is re-verified, so that we neither DDoS a non-responsive server nor waste bandwidth sending requests that will no longer be processed.
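To make that schedule concrete, the retry behavior described above can be sketched like this. The delay values come from the schedule quoted; the function and its injectable `sleep` parameter are purely illustrative, not our actual implementation.

```python
import time

# Minimum wait before each retry, per the schedule above:
# 30 seconds, then 5 minutes, then 30 minutes.
RETRY_DELAYS_SECONDS = [30, 5 * 60, 30 * 60]

def deliver_with_retries(send_payload, sleep=time.sleep) -> bool:
    """Attempt delivery once, then retry after each backoff delay.
    `send_payload` should return True on a successful (2xx) response.
    Returns False once every attempt fails, i.e. when the hook would
    be marked Inactive."""
    if send_payload():
        return True
    for delay in RETRY_DELAYS_SECONDS:
        sleep(delay)
        if send_payload():
            return True
    return False
```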
If you are seeing REST hooks marked Inactive regularly, the best solution is to ensure that we always receive an immediate 200 OK response from your endpoint. I’ve seen problems before when clients use the REST hook call as a trigger for a script that then synchronously calls back into the API to perform an action or other logic; depending on the number of objects in the payload, that return loop may exceed the acknowledgement time limit.
Personally, I’d take the route of catching the request (most likely with a Cloud Function endpoint) and immediately pushing it asynchronously into a Google Cloud Tasks queue for processing by a different server at my leisure, ensuring that no amount of request density could delay the response.
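As an illustration of that acknowledge-then-process pattern using nothing but the standard library (a thread-safe in-process queue stands in for Cloud Tasks here; a real deployment would use a durable external queue), something like:

```python
import queue
import threading

# Thread-safe buffer standing in for a durable task queue (e.g. Cloud Tasks).
work_queue: "queue.Queue" = queue.Queue()

def handle_resthook(payload: dict) -> int:
    """Endpoint handler: enqueue the payload and acknowledge immediately,
    so response time never depends on how heavy the payload is."""
    work_queue.put(payload)
    return 200  # immediate 200 OK back to the hook sender

def process(payload: dict) -> None:
    ...  # the slow work: API callbacks, database writes, etc.

def worker() -> None:
    """Background consumer that drains the queue at its own pace."""
    while True:
        payload = work_queue.get()
        if payload is None:  # sentinel to stop the worker
            break
        process(payload)
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
```

The handler returns before any processing happens, which is the whole point: the sender only ever measures the enqueue time.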
As for additional logging: no, we do not persist outgoing call logs for our services, due to the sheer volume we handle routinely, so there is nothing I could retrieve for you there. Given that the only thing I could tell you from this end, even if we did persist them, would be something like “Your endpoint gave a 500” or “Your endpoint timed out,” such logging would be of little use regardless.
I should also mention that if you are running cron tasks, it would be simple to make a GET /hooks call, loop through the subscriptions for a given authorization, find any that need verification, and resubmit those as a daily cleanup. That’s more of a patch than a fix, but if you are having persistent problems it should enable you to get by while you clean up the more significant structural problems.
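A sketch of that daily cleanup pass, assuming the GET /hooks response is a JSON array of subscriptions with `key` and `status` fields (those field names, the base URL, and the helper names are assumptions; stdlib only):

```python
import json
import urllib.request

API_BASE = "https://api.infusionsoft.com/crm/rest/v1"

def needs_verification(hooks: list) -> list:
    """Pick out subscriptions that are no longer Verified."""
    return [h for h in hooks if h.get("status") != "Verified"]

def _request(url: str, token: str, method: str = "GET"):
    req = urllib.request.Request(
        url, method=method,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def daily_cleanup(token: str) -> None:
    """List all hook subscriptions and re-verify any stale ones."""
    hooks = _request(f"{API_BASE}/hooks", token)
    for hook in needs_verification(hooks):
        _request(f"{API_BASE}/hooks/{hook['key']}/verify", token,
                 method="POST")
```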
Thanks for the response Tom. We do hand processing of the hook off to a background job to avoid additional response delays. I will add some additional logging on our end, and will probably need to implement a script to check for webhooks that may be set to Inactive. It would also be great if we could have some additional insight like you mentioned; a reason or description with the date would be very helpful.
Edit: I was also wondering if there are any potential size limits on the webhook data requests. Most customers that experience this issue are high-use customers with hundreds of thousands of contacts and a lot of tags (hundreds). So my question is also whether there are any other potential limits or reasons that a hook may go into that state besides the ones you listed? Because the webhooks are queued on your end, I can see some of the requests being pretty sizeable.
There shouldn’t be a meaningful size limitation on our side; we aggregate records to send in batches that are still measured in bytes, so it would take something like a hundred thousand records being added within a five-minute window to even begin causing problems with payload size. While the difference between two triggers and two hundred doesn’t affect our architecture meaningfully, any action taken on the receiving end based on the individual records in the payload (such as looping over the array of ids to retrieve them from the API) may multiplicatively increase the processing time required to complete those operations for a single batch.
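That multiplicative effect is easy to see with a toy cost model. The payload shape here (an `object_keys` array of record references) is an assumption for illustration, not a documented schema:

```python
def processing_cost(payload: dict, per_record_seconds: float) -> float:
    """Rough receive-side processing time when each record in a single
    batched payload triggers its own follow-up work (e.g. one API call
    per id). Cost scales linearly with batch size, so a 200-record batch
    takes 100x as long as a 2-record batch."""
    return len(payload.get("object_keys", [])) * per_record_seconds
```

If each follow-up call takes half a second, a three-record batch costs 1.5 seconds of synchronous work, which is exactly the kind of delay that should happen off the acknowledgement path.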