I recently encountered a strange issue where all three Front End servers in the same pool ran out of disk space at the same time. When I investigated, both sqlservr.exe processes (RTCLOCAL and LYNCLOCAL) were consuming the whole page file, which, because the page file was system managed, meant all of the available space on C:. In Resource Monitor I could see that the commit for both sqlservr.exe processes was extremely high. Apologies for the poor images; I was using iMessage to send them to a colleague and didn’t take screenshots:
As you can see, Available memory is 26.6GB but Committed is 67.4GB/67.9GB, and Resource Monitor shows sqlservr.exe consuming it all. Restarting the server caused the page file to grow rapidly and fill the disk again. I set the page file size manually to stop it consuming disk space, but the fixed-size page file also filled immediately after a restart, leaving the server unresponsive.
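If you want to watch for the same symptom without relying on Resource Monitor, the sketch below lists the private bytes (what Resource Monitor shows as Commit) for each running SQL Server instance process, plus the system-wide commit charge and limit. It assumes Windows PowerShell 3.0 or later and English performance counter names; the function names are hypothetical.

```powershell
# Sketch: per-instance SQL Server commit plus system commit charge.
# Assumes PowerShell 3.0+ and English counter names (\Memory\...).

function Get-SqlInstanceCommit {
    # Each running MSSQL$<instance> service (e.g. MSSQL$RTCLOCAL) maps to one sqlservr.exe
    Get-CimInstance -ClassName Win32_Service |
        Where-Object { $_.Name -like 'MSSQL$*' -and $_.ProcessId -gt 0 } |
        ForEach-Object {
            $proc = Get-Process -Id $_.ProcessId
            [pscustomobject]@{
                Service   = $_.Name
                ProcessId = $_.ProcessId
                # Private bytes in KB, comparable to Resource Monitor's Commit (KB) column
                CommitKB  = [math]::Round($proc.PrivateMemorySize64 / 1KB)
            }
        }
}

function Get-SystemCommit {
    $samples = (Get-Counter -Counter '\Memory\Committed Bytes', '\Memory\Commit Limit').CounterSamples
    # -like is case-insensitive, so this matches the lower-case counter paths returned
    $committed = ($samples | Where-Object { $_.Path -like '*\committed bytes' }).CookedValue
    $limit     = ($samples | Where-Object { $_.Path -like '*\commit limit' }).CookedValue
    [pscustomobject]@{
        CommittedGB = [math]::Round($committed / 1GB, 1)
        LimitGB     = [math]::Round($limit / 1GB, 1)
    }
}
```

Running `Get-SqlInstanceCommit | Format-Table` and `Get-SystemCommit` on an affected Front End should show the two sqlservr.exe processes accounting for most of the committed total.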
To resolve the issue, I updated the local SQL Express instances to Service Pack 2; they were still running RTM. Lync was at CU10 and Windows Server 2012 R2 was patched up to March 2015.
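To confirm which patch level each local instance is on before (and after) updating, one option is to read it from the registry, which works even while the SQL services are stopped. This is a sketch that assumes the default SQL Server 2012 registry layout (an `Instance Names\SQL` key mapping each instance name to an ID such as `MSSQL11.RTCLOCAL`, with `Edition` and `PatchLevel` values under its `Setup` key); the function name is hypothetical.

```powershell
# Sketch: report edition and patch level for the Lync local SQL instances.
# Assumes the default SQL Server 2012 registry layout described above.

function Get-LocalSqlPatchLevel {
    param([string[]]$InstanceName = @('RTCLOCAL', 'LYNCLOCAL'))

    $root = 'HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server'
    # Value names are instance names; data is the instance ID (e.g. MSSQL11.RTCLOCAL)
    $ids = Get-ItemProperty -Path "$root\Instance Names\SQL"

    foreach ($name in $InstanceName) {
        $id = $ids.$name
        if (-not $id) {
            Write-Warning "Instance $name not found in the registry"
            continue
        }
        $setup = Get-ItemProperty -Path "$root\$id\Setup"
        [pscustomobject]@{
            Instance   = $name
            InstanceId = $id
            Edition    = $setup.Edition
            PatchLevel = $setup.PatchLevel
        }
    }
}
```

If the values are not where this expects, regedit will show the actual layout on your build.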
The following post explains how to update the Lync Front Ends to SQL Server 2012 Express SP1, and the same process works for SP2: http://www.shudnow.net/2013/06/04/updating-sql-2012-express-to-sp1-on-lync-2013-servers/. I found that I needed to restart after updating each instance, so for each Front End:
- Stop both SQL services to stop them consuming the page file
- SQLEXPR_x64_ENU.exe /ACTION=Patch /INSTANCENAME=RTCLOCAL /QS /HIDECONSOLE /IAcceptSQLServerLicenseTerms
- Restart Server
- SQLEXPR_x64_ENU.exe /ACTION=Patch /INSTANCENAME=LYNCLOCAL /QS /HIDECONSOLE /IAcceptSQLServerLicenseTerms
- Restart Server
SQL Express SP2 can be downloaded from here: http://www.microsoft.com/en-gb/download/details.aspx?id=43351.
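The per-server steps above can be wrapped in a small function that patches one instance per run, so you call it once for RTCLOCAL, let the server restart, then call it again for LYNCLOCAL. This is a sketch, not a tested deployment script: the installer path `C:\Install\SQLEXPR_x64_ENU.exe` and the function names are hypothetical, and it assumes the default service names `MSSQL$RTCLOCAL` and `MSSQL$LYNCLOCAL`. Exit code 3010 is the standard Windows "success, reboot required" code.

```powershell
# Sketch: patch one Lync local SQL Express instance, then restart the server.
# Installer path and function names are hypothetical; adjust to your environment.

function Get-SqlPatchArgument {
    param([Parameter(Mandatory)][string]$InstanceName)
    # Same switches as the manual command lines in the post
    "/ACTION=Patch /INSTANCENAME=$InstanceName /QS /HIDECONSOLE /IAcceptSQLServerLicenseTerms"
}

function Update-LyncLocalSql {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('RTCLOCAL', 'LYNCLOCAL')]
        [string]$InstanceName,
        [string]$InstallerPath = 'C:\Install\SQLEXPR_x64_ENU.exe'
    )
    # Stop both local instances so they stop growing the page file
    Stop-Service -Name 'MSSQL$RTCLOCAL', 'MSSQL$LYNCLOCAL' -Force

    $setup = Start-Process -FilePath $InstallerPath `
        -ArgumentList (Get-SqlPatchArgument -InstanceName $InstanceName) -Wait -PassThru

    # 0 = success; 3010 = success but a reboot is required
    if ($setup.ExitCode -notin 0, 3010) {
        throw "Patching $InstanceName failed with exit code $($setup.ExitCode); check the SQL Setup Bootstrap logs."
    }
    # Restart after each instance, as in the manual steps
    Restart-Computer -Force
}
```

Usage on each Front End, one server at a time: `Update-LyncLocalSql -InstanceName RTCLOCAL`, wait for the restart, then `Update-LyncLocalSql -InstanceName LYNCLOCAL`.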
After each SQL Express instance was updated, the commit for the SQL processes was much lower (around 300,000 KB). However, once all the Lync FEs were back online, the Lync Front End service would not start. Get-CsPoolFabricState and the Lync event logs indicated that I needed to run Reset-CsPoolRegistrarState; details here: https://technet.microsoft.com/en-gb/library/jj619172.aspx?f=255&MSPPError=-2147217396
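For reference, the recovery sequence looks roughly like the sketch below, run from the Lync Server Management Shell. I have not recorded which reset type applied here, so `-ResetType` is left mandatory: the documented values are `ServiceReset`, `QuorumLossRecovery` and `FullReset`, and the TechNet article above explains when to use each. The wrapper function name and the pool FQDN in the usage line are hypothetical.

```powershell
# Sketch: show pool fabric state, then reset registrar state for a Lync 2013 pool.
# Run in the Lync Server Management Shell. Function name is hypothetical.

function Reset-PoolRegistrar {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)][string]$PoolFqdn,
        # Choose the type the event logs / fabric state point to
        [Parameter(Mandatory)]
        [ValidateSet('ServiceReset', 'QuorumLossRecovery', 'FullReset')]
        [string]$ResetType
    )
    # Show the Windows Fabric state first so you can see which services are unhealthy
    Get-CsPoolFabricState -PoolFqdn $PoolFqdn

    if ($PSCmdlet.ShouldProcess($PoolFqdn, "Reset-CsPoolRegistrarState -ResetType $ResetType")) {
        Reset-CsPoolRegistrarState -PoolFqdn $PoolFqdn -ResetType $ResetType
    }
}
```

Try it with `-WhatIf` first, e.g. `Reset-PoolRegistrar -PoolFqdn 'pool01.contoso.com' -ResetType QuorumLossRecovery -WhatIf`.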
The SCCM 2012 R2 CU4 client had been pushed out to all machines at the same time as the problem started. I suspect this was the root cause, although I could not find anything concrete.
In summary:
- Don’t push out updates to all FEs (or to all servers in any Lync role) at the same time, including SCCM, SCOM and similar agents
- Ensure that you update the local SQL Express instances on your Lync servers
Technical Architect at Symity