Compacting CNFS buffers
Nigel Reed
2024-05-05 02:07:46 UTC
Has anyone investigated the feasibility of compacting or compressing
the cnfs buffer files?

Here are a couple of scenarios to consider, keeping in mind that
articles are generally not expired.

1. You are sent a bunch of articles but discover you've left some
binary newsgroups in your active file. You put these groups in your
expire list and remove them with rmgroup, but you're left with a lot of
empty space, never to be used again unless the buffer recycles.

2. You receive a bunch of Google Groups spam articles that are deleted
via NoCeM, but because there are so many of them, that leaves a lot of
unused space.


If you can find where an expired article is on disk and then find the
next article, you can just move it on disk and update the pointers to
the file. This could be a process that you just kick off or,
preferably, something that runs when innd isn't fully occupied using
spare cycles or something.
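
A minimal sketch of that idea, assuming a toy model in which a buffer is a flat byte array and each article is described by a record giving its offset, length and whether it is still live. The names Slot and compact are hypothetical and this is not the CNFS on-disk format. Live articles are slid down over the holes, and the function returns the new offsets so that whatever points at the articles can be updated.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Slot:
    key: str      # hypothetical article identifier
    offset: int   # byte offset of the article in the buffer
    length: int   # article size in bytes
    live: bool    # False once the article is expired or cancelled


def compact(buf: bytearray, slots: list[Slot]) -> tuple[int, dict[str, int]]:
    """Slide live articles toward the start of buf.

    Returns (bytes_in_use, {key: new_offset}) for the live articles.
    """
    write = 0
    moved: dict[str, int] = {}
    for slot in sorted(slots, key=lambda s: s.offset):
        if not slot.live:
            continue  # skip the hole; later articles will overwrite it
        if slot.offset != write:
            # The right-hand slice is copied first, so overlap is safe.
            buf[write:write + slot.length] = buf[slot.offset:slot.offset + slot.length]
        moved[slot.key] = write
        write += slot.length
    return write, moved
```

In this model everything after the returned bytes_in_use is reclaimable. The returned offset map is the "update the pointers" step: every reference to a moved article has to be rewritten with its new offset.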


I know disk space is cheap these days but some people may be limited.
It would be good not to waste space.
--
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23
Julien ÉLIE
2024-05-07 10:14:26 UTC
Hi Nigel,
Post by Nigel Reed
Has anyone investigated the feasibility of compacting or compressing
the cnfs buffer files?
Some people use ZFS to compress CNFS buffers (cancelled articles are
still present though). I am not aware of a compaction feature like the
one you want.
Post by Nigel Reed
If you can find where an expired article is on disk and then find
the next article, you can just move it on disk and update the
pointers to the file. This could be a process that you just kick off
or, preferably, something that runs when innd isn't fully occupied
using spare cycles or something.
I understand your point; I can add it to the wish list.

FWIW, though technically this is not what you are asking for, some
suggestions:
Post by Nigel Reed
1. You are sent a bunch of articles but discover you've left some
binary newsgroups in your active file. You put these groups in your
expire list and remove them with rmgroup, but you're left with a lot of empty
space, never to be used again unless the buffer recycles.
You may want to configure Cleanfeed to reject binaries (including in
binary groups) so as not to store them and waste space. For the past
few weeks, NoCeM notices have also been sent for misplaced binaries (in
non-binary groups).
Post by Nigel Reed
2. You receive a bunch of googlegroup spam articles that are deleted
via NOCEM, however considering there are so many, that leaves a lot of
unused space.
Christoph Biedl implemented a new feature for INN 2.7.2 to store
articles by their Path header field. It is a new "path" option in
storage.conf. A typical use case is to store articles from a spammy
site in a small CNFS buffer to avoid overall retention impacts.

There's also the delayer program (in the contrib directory before INN
2.7.2) that you can use to delay articles, and give cancel control
articles and NoCeM messages time to arrive. For instance, you can have
a frontend instance of innd receive the articles from all your peers,
and another local instance of innd fed by your frontend with a delay
except for cancels and NoCeM articles. The CNFS buffers of that second
instance will be spam free.
https://www.eyrie.org/~eagle/software/inn/docs/delayer.html
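
To illustrate why the delay helps, here is a toy model of the net effect, not the delayer program itself. It assumes a hypothetical function deliver that takes a list of (time, kind, message-ID) events, holds each article for a fixed number of seconds, lets cancels through at once, and returns the message-IDs the second instance would end up storing.

```python
from __future__ import annotations
import heapq


def deliver(events: list[tuple[int, str, str]], delay: int) -> list[str]:
    """Simulate a delayed feed.

    events: (time, kind, msgid) with kind "article" or "cancel".
    Articles are held for `delay` seconds; cancels are applied immediately.
    Returns the msgids stored downstream, in release order.
    """
    cancelled: set[str] = set()
    pending: list[tuple[int, int, str]] = []  # min-heap of (release_time, seq, msgid)
    stored: list[str] = []

    def release(until: float) -> None:
        # Hand over every held article whose delay has elapsed.
        while pending and pending[0][0] <= until:
            _, _, msgid = heapq.heappop(pending)
            if msgid not in cancelled:
                stored.append(msgid)

    for seq, (t, kind, msgid) in enumerate(sorted(events)):
        release(t)
        if kind == "cancel":
            cancelled.add(msgid)
        else:
            heapq.heappush(pending, (t + delay, seq, msgid))
    release(float("inf"))  # flush whatever is still held
    return stored
```

With a delay longer than the time a cancel takes to arrive, the cancelled article never reaches the downstream storage; with a shorter delay it is stored before the cancel is seen.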
--
Julien ÉLIE

« Aequum est ut cuius participauit lucrum, participet et damnum. »
Nigel Reed
2024-05-08 05:43:09 UTC
On Tue, 7 May 2024 12:14:26 +0200
Post by Julien ÉLIE
Hi Nigel,
Post by Nigel Reed
Has anyone investigated the feasibility of compacting or compressing
the cnfs buffer files?
Some people use ZFS to compress CNFS buffers (cancelled articles are
still present though). I am not aware of a compaction feature like
the one you want.
I am using ZFS with CNFS and it does a good job. I also want to use the
server for other purposes so reclaiming any space would be extremely
useful.
Post by Julien ÉLIE
Post by Nigel Reed
If you can find where an expired article is on disk and then find
the next article, you can just move it on disk and update the
pointers to the file. This could be a process that you just kick off
or, preferably, something that runs when innd isn't fully occupied
using spare cycles or something.
I understand your point; I can add it to the wish list.
That would be good.
Post by Julien ÉLIE
Post by Nigel Reed
1. You are sent a bunch of articles but discover you've left some
binary newsgroups in your active file. You put these groups in your
expire list and remove them with rmgroup, but you're left with a lot of empty
space, never to be used again unless the buffer recycles.
You may want to configure Cleanfeed to reject binaries (including in
binary groups) so as not to store them and waste space. For the past
few weeks, NoCeM notices have also been sent for misplaced binaries (in
non-binary groups).
Unfortunately the articles are already in the CNFS buffers. My bad for
forgetting to remove some binary groups from the active file. I did not
have Cleanfeed running when importing since it's advised to turn off
Perl and Python filtering.
Post by Julien ÉLIE
Post by Nigel Reed
2. You receive a bunch of googlegroup spam articles that are deleted
via NOCEM, however considering there are so many, that leaves a lot
of unused space.
Christoph Biedl implemented a new feature for INN 2.7.2 to store
articles by their Path header field. It is a new "path" option in
storage.conf. A typical use case is to store articles from a spammy
site in a small CNFS buffer to avoid overall retention impacts.
I'll look into it, but again, the damage is already done.
Post by Julien ÉLIE
There's also the delayer program (in the contrib directory before INN
2.7.2) that you can use to delay articles, and give cancel control
articles and NoCeM messages time to arrive. For instance, by having
a frontend instance of innd receiving the articles from all your
peers and another local instance of innd fed by your frontend with a
delay except for cancels and NoCeM articles. The CNFS buffers of
that second instance will be spam free.
https://www.eyrie.org/~eagle/software/inn/docs/delayer.html
Sounds interesting but, again, I already have a lot of binary articles.
I'm not sure I want to set up a second server. I have a hard enough
time with one :)

I'll hold out hope someone with more knowledge than I also sees the
issue and decides to look into compacting CNFS buffers.

Thanks,
Nigel
--
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23
Julien ÉLIE
2024-05-13 20:56:50 UTC
Hi Nigel,
Post by Nigel Reed
I'll hold out hope someone with more knowledge than I also sees the
issue and decides to look into compacting CNFS buffers.
It may as well be a new type of storage method, mixing the best of cnfs
and timecaf.
As far as I understand, the use case is to have large compacted buffers
without wrapping (articles do not expire but cancelled articles should
not be kept). It would correspond to timecaf except that a new CAF file
is created when it is full instead of every 256 seconds. Expiring CAF
files just compacts them if articles have been cancelled, releasing disk
space.
The feature may be implemented as an evolution of the current timecaf
method with options to parameterize it in storage.conf (like cnfs has
options). For instance with a maxart and a maxtime option to specify
the number of articles per CAF file (currently hard-coded to 262144) and
the number of seconds before creating a new CAF file (currently
hard-coded to 256 seconds but it may easily be a multiple of 256 seconds
so as to keep the current file naming). With maxtime set to 0, a new
file is created when maxart is reached.

Naturally, though it is more work, a totally new storage method could
also be created as timecaf is inherently linked to time and suffers from
the limitation that you cannot store more than maxart articles received
during maxtime seconds. They will just be dropped until a new CAF file
is created. It is not what you expect from the storage method you're
asking for. And re-using CNFS buffers may be tricky (to find and refill
holes, or to totally rewrite them - changing the storage tokens of all
articles).
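
A rough sketch of the rollover rule discussed above, to make the two modes concrete. CafRoller, maxart and maxtime are hypothetical names for options that do not exist in INN today, and the file ids are just a counter, not the real timecaf file naming. With maxtime greater than 0 a new file starts with each time window and articles beyond maxart in that window are dropped; with maxtime set to 0 a new file starts as soon as maxart is reached.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class CafRoller:
    maxart: int = 262144  # articles per CAF file (value cited in the post)
    maxtime: int = 256    # seconds per CAF file; 0 means roll on count only
    file_id: int = 0      # illustrative counter, not a real file name
    count: int = 0        # articles written to the current file
    bucket: int = -1      # current time window, unused when maxtime == 0

    def store(self, arrival: int) -> Optional[int]:
        """Return the id of the file receiving the article, or None if dropped."""
        if self.maxtime > 0:
            bucket = arrival // self.maxtime
            if bucket != self.bucket:
                # A new time window opens a new file.
                self.bucket = bucket
                self.file_id += 1
                self.count = 0
            elif self.count >= self.maxart:
                # File is full and the window is still open: article is dropped.
                return None
        elif self.count >= self.maxart:
            # Count-only mode: a full file simply triggers the next one.
            self.file_id += 1
            self.count = 0
        self.count += 1
        return self.file_id
```

For example, CafRoller(maxart=2, maxtime=256) drops a third article arriving in the same 256-second window, while CafRoller(maxart=2, maxtime=0) puts it in the next file.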
--
Julien ÉLIE

« Vinum bonum laetificat cor hominis. »