#432: SIP errors with CW on a Flash-HDD (Fixed)
| Reported by: | | Release: | 1.2 |
|---|---|---|---|
| Priority: | Normal | Milestone: | |
| Component: | | Assigned to: | |
When I install CW on a Flash-HDD (in my case a 1 GB IDE flash module from Transcend, with Debian Lenny as the OS), I get these SIP errors when I cancel a ringing call:
```
Jun 30 22:46:53 ERROR3044608912: chan_sip.c:14516 sipsock_read: We could NOT get the channel lock for SIP/10-812a - Call ID b3c8e0424b094405@192.168.116.102!
Jun 30 22:46:53 ERROR3044608912: chan_sip.c:14517 sipsock_read: SIP MESSAGE JUST IGNORED: ACK
```
For testing, I also installed an old Asterisk 1.2.13 (from the Debian source) on the same system, and with it I don't get these errors.
I then saw that sipsock_read retries 100 times (`unsigned int lockretry = 100;`) before it gives the error, so I simply set lockretry to 100000, and then it worked. I added a counter and can see that after ~20000-40000 retries (about 40 ms) the check

```c
if (p->owner && cw_mutex_trylock(&p->owner->lock))
```

in sipsock_read stops matching, because after ~40 ms p->owner is finally NULL. But it should already be NULL on the first attempt, because on a normal HDD it is. So the question is: why does it take ~40 ms for p->owner to become NULL when we cancel a ringing call on CW running from a Flash-HDD?
Changelog:
Tested all changes: no. Then I found that it must be one of the Debian configure settings, because I always build debs with dpkg-buildpackage, and when I do a plain install I don't get the errors. Now I am testing each of the Debian configure settings.
After testing all the configure options from the rules file in /debian step by step, I found out that even one simple configure option for a path, like

`./configure '--prefix=/usr/'`

brings the error back. Only a plain `./configure` gives me a CallWeaver without these errors. Crazy! I also just tested passing some additional CFLAGS to configure, like:
./configure CHOST='i686-pc-linux-gnu' CFLAGS='
That is not a problem. So only the single path option gives the error.
The error is in libcallweaver.so: when I simply replace libcallweaver.so with one compiled with a plain ./configure, it works without the error.
After long, long debugging (but I really learned a lot about the internal structure of CW), I found that the error occurs because the cw_hangup function in channel.c takes too long to (re)close the CDR. That causes a race condition in which the check `if (p->owner && cw_mutex_trylock(&p->owner->lock))` in sipsock_read (chan_sip) keeps matching, retries 100 times and then gives us the error.
I still don't know why closing the CDR takes longer with CW on a Flash-HDD, but this is where the delay starts.
OK, it's cdr_sqlite3! When the cw_hangup function in channel.c tells the CDR backends it is closing time, cdr_sqlite3 takes too long to write to cdr.db on a Flash-HDD. That's it. So I just noload it. It should not be a hard job to code a lock/wait until cdr_sqlite3 or the other CDR backends are ready, but that job can be done, um, well… later :)
More details: cw_hangup tells all CDR backends to close via post_cdr in cdr.c, and res_sqlite is also told to close there. Both res_sqlite and cdr_sqlite3 write slowly on a Flash-HDD, so the whole cw_hangup process takes an unusually long time (up to 40 ms).
For me that's not a problem. I don't need either of them, so I just unload them. But perhaps it would be an idea to set a mark that cw_hangup is in progress; then sipsock_read could check whether the channel owner is being hung up, skip locking it, and not throw error messages at us.
The sqlite backends are probably doing fsyncs as part of their commit handling (they should be anyway). Not all filesystems, block device drivers and hard disks do proper write flushing and SSDs so far tend to have poor write performance, especially when they get small writes flushed separately. So the time to flush CDRs can vary a lot.
Then there’s the way the legacy * code (ab)uses locks. That isn’t going to change in a hurry and almost certainly not in the stable 1.2 branch.
What you can try is to use “batch = yes” in cdr.conf. That causes CDRs to be queued and written out in a separate background thread. I think the original intention was to batch updates to optimize database updates but a useful side-effect is that channel handling is no longer at the mercy of whatever the CDR backends are doing.
Thanks! It is working with batch = yes.
I have also found another solution, which was to call `cw_cdr_detach(chan->cdr)` a little later in cw_hangup, after:

```c
if (chan->tech->hangup)
    res = chan->tech->hangup(chan);
```

That also worked for me.
- Status: changed from Open to Fixed

I tested this with the old RC5… no error with RC5. Any ideas what the relevant change could be?