New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Intermittent delays when performing search queries #1475
Comments
Hi mikezopa, Sorry to hear this, I'm inclined to believe this is something that happens in the bowels of WebRequest itself. googling webrequest and 21 seconds yields numerous results such as this one WebRequest notoriously measures timeouts from time to send first byte to last byte read. Anything that needs to happen prior (DNS setup, enqueueing connections) uses the default timeout of 20 seconds. So even if you set the Timeout to 1second on the request, a slow DNS lookup can cause the overal request to fail only after 20 seconds. Are you able to tell us more about your connectionsettings setup, the exception that you see when this intermittent spikes happen? AFAIK we do properly dispose of all the request streams we start and have extensive unit tests of all the code paths to make sure we close the responsestreams. We also do extensive Wireshark sessions every now and then to make sure we are reusing TCP connections. To hear you still face these sporadic spikes is saddening but I would love to find out the root cause of this. |
Hi @Mpdreamz, Sorry for my delayed response, I hadn't noticed that the issue had been updated! Thank you for the fast initial response. With regards to the var settings =
new ConnectionSettings(new SniffingConnectionPool(nodeUris), this.indexName)
.ThrowOnElasticsearchServerExceptions();
this.client = new ElasticClient(settings); Thanks for the information regarding the WebRequest. We are making a larger number of requests which we have not been doing before. We used to be making a request once every few seconds, but during times of loading, one I don't believe there are any exceptions thrown when this error occurs, the request completes successfully, just very delayed. Unfortunately, the caller of this micro-service gives up after 5 seconds (as we can't wait longer than that for various reasons). Sometimes we see multiples of this delay on our micro-service, so we see spikes in lower counts at 42, 63 seconds etc. We might have a go at implementing some of these suggestions and let you know what we find! |
You can new We try to follow most of the suggestions in that thread already :) If you are running on Azure we typically recommend setting A little involved but the get a bit more clarity you could run your microservice with trace enabled on system.net: http://mikehadlow.blogspot.nl/2012/07/tracing-systemnet-to-debug-http-clients.html This might give some insights into peeks. |
With regards to some more investigation work, I have been unable to reproduce the problem by creating various integration tests which hammer requests to elasticsearch, even when using multiple threads (run via the NUnit test runner). This is leading me to suspect it might be something to do with hosting within IIS and how connections are managed there. I was hoping to reproduce on my local machine allowing me to use Wireshark and various tracing tools a bit more easily then when located on a server. I will see if I can reliably reproduce using load testing against the micro-service when its running under IIS. None of the suggestions in the thread helped us unfortunately! :) |
Hi @mikezopa closing this due to inactivity but please feel free to report back! This issue for us is never really closed (due to its heisenbug nature) and something we revisit often. |
I know I'm kicking a dead horse here, but I was having a similar issue with the first query in an automated test taking ~20 seconds. Turns out, in my case, it was probably a DNS issue. Switching from localhost to 127.0.0.1 solved the problem. If anyone else happens upon this, maybe that info will be helpful for them. |
We have a strange problem whereby most queries take a very short amount of time to execute (<50ms), but intermittently they will take almost exactly 21 seconds.
While trying to find a solution to the problem, I stumbled across an unanswered stack overflow question describing almost identical behaviour (and use case).
Our use case
We have a small micro service API (using NancyFx) which performs relatively simple queries and filters. We instantiate our
ElasticClient
at start up, and inject it where required.We enabled the elasticsearch slow query log on the index, and this shows that queries never take more than ~130ms to execute (and that is seldom). It also doesn't align with these 21s "blips" that we see in the logs.
We have 2 nodes in our production cluster, and the
ElasticClient
uses aSniffingConnectionPool
to utilise both nodes.At the moment I wonder if it is caused by the client connections to the production cluster being dropped, however, the really odd thing is that this delay is 21 seconds. I would expect the time taken to reconnect to be more random. We have a bizarre spike if we graph the request times. 95% < 1s, 97.5% < 2s, and then nothing until this spike of the last 2.5% of requests at 21s.
Any ideas on things to try would be appreciated.
The text was updated successfully, but these errors were encountered: