Ok, basically xyberviri seems right with a very basic idea. I'll try to explain it differently.
Server side
Fact : Currently, your server is too slow to handle the huge number of commands it receives (and there are a ton of commands) when "too many" players are logged on.
Fact : After a server reboot, the first players to log in get about "10 minutes" of lag-free play before too many commands hit your server.
Fact : When you ask players to log off, some do, and your server removes their commands from the queue : the ones who stay online can then play about "4 minutes" lag free.
Fact : You are currently trying to pinpoint the commands that are too slow or cause memory leaks on the server, to decrease global lag and process the command queue faster.
Fact : Your command queue seems "infinite", because some commands are answered to the client minutes later, and none are discarded. (I experienced a 5-minute lag once ...)
Client side
Fact : Your command communication protocol is asynchronous, so the client doesn't wait for the server's answer after the player issues a command; it continues running and prompting for more commands.
Fact : Displayed lag is the "connectivity" lag, not the "command" lag (the time you wait between clicking to gather an herb and the herb showing up in your inventory).
Fact : Some players are stupid or impatient, and will spam a button when it does "nothing" in-game.
Fact : We all play with lag and your queue system. So we press one button, another, another, another, and then the first command's answer comes back from the server, then the second, then the third ...
Fact : Some players are macroing and they don't use a timer. Their client issues a ton of commands without any wait, increasing your total command queue.
Idea 1 : Implement a client-side "no retry yet" limit to prevent button spamming on all commands and dumb macro spamming : the client waits for the server's answer (or for a cooldown, if the server's answer can't be tracked) before letting the same command through again.
Idea 1 : Implement a client-side "no retry yet" soft time-out, so the player can reissue his command if it was lost on the wire or server side.
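To make Idea 1 concrete, here is a rough sketch in Python of what the client could do. It is only an illustration: the class name `RetryGate`, the `gather_herb` command name and the 5 second soft time-out are all made up by me, not taken from your client.

```python
import time


class RetryGate:
    """Blocks re-sending a command until it is answered or a soft time-out passes."""

    def __init__(self, soft_timeout=5.0, clock=time.monotonic):
        self.soft_timeout = soft_timeout  # seconds; assumed value, tune per command
        self.clock = clock                # injectable so the gate can be tested
        self.pending = {}                 # command name -> time it was sent

    def try_send(self, command):
        """Return True if the command may go to the server now."""
        now = self.clock()
        sent_at = self.pending.get(command)
        if sent_at is not None and now - sent_at < self.soft_timeout:
            return False  # still waiting for the answer: swallow the click
        self.pending[command] = now  # first send, or retry after the time-out
        return True

    def on_answer(self, command):
        """Call when the server answers; the command becomes usable again."""
        self.pending.pop(command, None)


if __name__ == "__main__":
    gate = RetryGate(soft_timeout=5.0)
    print(gate.try_send("gather_herb"))  # first click goes through
    print(gate.try_send("gather_herb"))  # spam click is swallowed client-side
    gate.on_answer("gather_herb")
    print(gate.try_send("gather_herb"))  # allowed again after the answer
```

The spammed clicks never leave the client, so they never reach the server queue, and the soft time-out still lets the player retry if a command really got lost.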
Idea 2 : Implement a client-side "not answered by the server yet" counter that counts the commands in progress, to track the client "command" lag, the one that matters.
Idea 2 : Implement a client-side "not answered by the server yet" maximum limit, to prevent the server queue from exploding and turn the game into something playable by all.
Idea 2 : Implement a client-side "not answered by the server yet" time-out, so the player can reissue his command if it was lost on the wire or server side.
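Idea 2 could look something like the Python sketch below. Again, this is a guess at the shape of it, not your actual client code: `InFlightTracker`, the limit of 3 unanswered commands and the 10 second time-out are assumptions I picked for the example, and I assume each command carries some id the server echoes back.

```python
import time


class InFlightTracker:
    """Counts commands sent but not yet answered, with a cap and a time-out."""

    def __init__(self, max_in_flight=3, timeout=10.0, clock=time.monotonic):
        self.max_in_flight = max_in_flight  # assumed cap on unanswered commands
        self.timeout = timeout              # assumed seconds before giving up
        self.clock = clock
        self.sent = {}                      # command id -> send time
        self.last_lag = None                # most recent measured "command" lag

    def _expire(self):
        # Forget commands that waited too long, so the player can reissue them.
        now = self.clock()
        for cmd_id in [c for c, t in self.sent.items() if now - t >= self.timeout]:
            del self.sent[cmd_id]

    def count(self):
        """Number of commands still waiting for a server answer."""
        self._expire()
        return len(self.sent)

    def try_send(self, cmd_id):
        """Return True if the command may be sent, False if the cap is reached."""
        if self.count() >= self.max_in_flight:
            return False
        self.sent[cmd_id] = self.clock()
        return True

    def on_answer(self, cmd_id):
        """Record the answer and return the measured "command" lag in seconds."""
        sent_at = self.sent.pop(cmd_id, None)
        if sent_at is None:
            return None  # unknown id, or the command already timed out
        self.last_lag = self.clock() - sent_at
        return self.last_lag
```

With a cap like this, each client can only ever add a handful of commands to the server queue, no matter how hard the player (or his macro) hammers the buttons, and `last_lag` is the number you would want to show on screen.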
More ideas : Display "command" lag instead of / alongside "connectivity" lag. Gather statistics on the various commands if you have multiple queues.
Another idea : Implement a time-limited priority system for commands server-side to help with lag, like in critical-systems processing. Terraforming should be low priority in this system, for example. It should help smooth out lag spikes.
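One possible reading of the "time-limited priority system", sketched in Python: each command gets a priority by type, and a command that has waited longer than some maximum age is dropped instead of processed. The priority table, the command type names and the 30 second limit are hypothetical; I have no idea how your server actually classifies commands.

```python
import heapq
import itertools

# Hypothetical priorities: lower number = served first.
PRIORITY = {"combat": 0, "gather": 1, "terraform": 2}


class CommandQueue:
    """Priority queue that discards commands older than max_age."""

    def __init__(self, max_age=30.0):
        self.max_age = max_age          # seconds a command may wait; assumed value
        self.heap = []
        self.order = itertools.count()  # tie-breaker keeps arrival order

    def push(self, kind, payload, now):
        """Queue a command of the given kind, stamped with its arrival time."""
        entry = (PRIORITY[kind], next(self.order), now, kind, payload)
        heapq.heappush(self.heap, entry)

    def pop(self, now):
        """Return the next (kind, payload) still worth processing, or None."""
        while self.heap:
            _, _, queued_at, kind, payload = heapq.heappop(self.heap)
            if now - queued_at <= self.max_age:
                return kind, payload
            # Too old: the player has given up by now, so drop it.
        return None


if __name__ == "__main__":
    queue = CommandQueue(max_age=30.0)
    queue.push("terraform", "flatten tile", now=0.0)
    queue.push("combat", "attack wolf", now=1.0)
    print(queue.pop(now=2.0))  # the combat command jumps ahead of terraforming
```

The point of the age limit is that the queue stops being "infinite": during a lag spike the server throws away work nobody is waiting for any more, instead of answering it minutes later.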
Note : stomper said you didn't do a stress test because there was no need for one. Bad. We are currently doing one for you. But we think these ideas could help with your server lag, not by processing faster but by processing smarter : only the one command we are waiting for, not the hundreds "before" it.
Conclusion :
With these limits hard-coded client side, your server should stay lag free for as long as you need to pinpoint the slow or bugged queries, without costing you your playerbase's goodwill.
We may be wrong; you may have several parallel queues for different types of commands or so, but you should, at least temporarily, limit the growth of the command queue client-side. That should prevent the lag easily.
Rewards :
Please credit xyberviri and Cyrus if this suggestion works. Blame them if it doesn't.