Hardware failure (20230118) UPDATE
A temporary solution has been set up until the new switch arrives.
Some computations will restart soon.
Hardware failure (20230118)
The cluster switch died unexpectedly. All processing is interrupted until further notice.
Network infrastructure switching (20230110) Improved instructions
If you need to continue to use the service:
1) Remove the current cable.
2) Connect to the old internet network.
3) Identify the port you are connected to.
4) Contact me to activate the port.
In case you cannot identify the port:
Plug the cable and contact me.
Note that these operations must be done one person at a time, so that each connection can be identified.
Network infrastructure switching (20230110)
Starting January 10th, all connections to the internal network (users.squares.lan) will move from the "flying wires" I originally drew to the infrastructure installed by Gérald Houart.
This means that:
- I'll shut down the old switch and close all the existing connections
- almost all offices will be potentially reachable
- bandwidth should theoretically improve
End of HVS2 reception (20230103)
HVS-2 service in horizontal polarization ended yesterday to continue in vertical polarization.
Since no interesting traffic was ever present on this service, no effort was made to accommodate the change.
Should an important change occur, investment in a new LNB and cable would be considered.
New hardware (20221207) UPDATE
The three new nodes have been installed, increasing our processing capacity to a normal level.
New server is up. Services transfer will start soon in a progressive schedule.
PRIAM (20221117) UPDATE
Server is now placed in its new location.
Service resumed now...
Shutdown for maintenance.
Outage duration is undetermined, but should not extend further than Nov. 22nd.
New hardware (20221114)
Oldest computing nodes (82xx) have been decommissioned. Farewell old friends.
Three new nodes will replace them soon.
Another server has also been ordered, to transfer various services onto new hardware.
All these machines were ordered on October 18, and should arrive before the end of the month.
During installation, some outages could occur due to the reorganisation of some server cases in the racks.
NO MAIL TODAY (20221027) RESOLVED
After contacting IT support, arguing, and waiting for hypothetical answers, I was eventually allowed to continue using the system, which had been up for 14 years.
This was possible because I was apparently not the only one in this situation.
I now have about 200 delayed mails to handle, so some patience will be required before everything is up-to-date.
NO MAIL TODAY (20221013)
Since the morning of Thursday October 13 2022, I've been rejected from the Micro$oft email system used at ULB.
So the only way to contact me will be phone (Tue -> Thu) and surface mail.
This also means that no SO2 alert will be sent until this is solved.
I've tried to open a ticket (twice), but without any effect so far (20221021).
AJAX disk failure (20221012) UPDATE
Disk replaced, rebuild on its way.
AJAX disk failure (20221011)
A new disk has been ordered. An outage of about 2 hours will occur, without prior notice, upon reception for replacement.
Black-Out (20221007) UPDATE
All services resumed correctly.
Black-Out (20221006) UPDATE
PELEE ended scrubbing, and is now back online.
PATROCLE still busy.
An unexpected power outage occurred last Saturday.
Recovery is in progress.
Filesystems are scrubbing, and will be made available as soon as that is completed.
Power outage (20220927) UPDATE
Servers have restarted gracefully. Processing was initiated.
Back processing should start as soon as possible to recover lost time.
Power outage (20220927) UPDATE
Services shutdown postponed to Friday 23rd.
Networks renaming (20220917)
Due to naming conflicts in domain names, local "cpm.ulb.ab.be" domain has been renamed to "users.squares.lan"
In case of problems, please allow some delay for the information to propagate through all resolvers before filing a bug report.
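Before filing a report, propagation can be checked from the client side. A minimal sketch (the hostname below is illustrative, not a guaranteed entry in the new domain):

```python
import socket

def resolves(hostname: str) -> bool:
    """Return True if this machine's resolver can already map the name."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

# e.g. resolves("ajax.users.squares.lan")  # illustrative host name
```

If the new name does not resolve yet while the old one still does, the caches simply have not expired.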
TELAMON down (20220906) UPDATE
Telamon is up again. UPS batteries are at end-of-life; new ones are being ordered.
TELAMON down (20220905)
Telamon unexpectedly stopped working yesterday.
SQL operations were disrupted, breaking the automatic data processing. SQL is now operational on a secondary server, and processing has restarted.
Investigations are scheduled.
Airco repair (2022xxxx)
Leak has been found and the evaporator has to be replaced.
During these works, in order to maintain a decent temperature for the sensitive hardware (He-filled disks), all computing activities will be interrupted, and only reception will be maintained.
Date to be announced when known, but probably in September.
Power outage (20220927)
Due to the replacement of the high-voltage cabin, there will be a power outage on Tuesday September 27th.
Since I will be absent the preceding days, ALL systems will be down from Thursday 22nd in the afternoon until some time on Wednesday 28th.
Reception will be left on as long as possible to avoid data losses as much as possible.
Airco maintenance (20220824)
Systems will be in minimal activity mode from 23rd to 25th of August.
Hector maintenance (20220728)
No mail received or sent (personal and alerts) until Tuesday 2nd of August.
Airco weakness and heat wave (20220717) UPDATE
16:28 UTC : Systems are now in minimal activity mode.
Airco weakness and heat wave (20220716)
During the heat wave of next week, due to the current status of the air conditioning system and
in order to avoid any damage to some sensitive parts, services will be reduced to minimal activity
from Sunday 17th afternoon till Wednesday 20th (somewhere in the day after manual restart).
Only essential servers will be maintained alive, and all processing and data sharing will be shut-down.
Data reception will be preserved as much as possible. Unprocessed data will be recovered later, when conditions are more clement.
Airco leak (20220714)
The airco maintaining the computer room at a decent temperature has been diagnosed to leak.
An urgent leak detection intervention will occur some time this summer. During that intervention cooling has to be shut off, and hence all computers will have to be powered down. This includes processing, data access, and probably reception. I won't be reachable by mail during this time either.
This process should take a whole day, and computers will be down from the day before until the next working day after. Although all efforts will be made to warn users beforehand, the interruption could occur without any notification, so don't rely on any viable services in the coming weeks, until the end of summer.
Ajax failure (20220601) UPDATE
Reconstruction terminated, all services resumed.
Clytemnestre failure (20220601) UPDATE
Filesystem was not accessible anymore.
All services restarted.
Clytemnestre failure (20220530)
The hub server died unexpectedly yesterday. No data are available until it can be restarted. This means internal SAMBA shares and external RSYNC are affected.
Due to the train strike on Tuesday, the first physical inspection will be on Wednesday only.
Ajax failure (20220529)
A second failing disk will be replaced on June 1st.
Operational processing and data access will be suspended during reconstruction, which is expected to last about 6 hours.
LODI Version 20220522 (20220527)
LODI will be updated to accommodate trend corrections and ensure homogeneity between the different platforms.
Back processing will be launched as soon as technically possible.
Ajax failures (20220517): UPDATE
Reconstruction terminated successfully. Processing restarted.
Ajax failures (20220517)
New attempt replacing the failing disk, with this time more careful detection of the hardware!
Rebuilding for about 6 hours (ETA: 1500Z)
Ajax failures (20220512): UPDATE
Reconstruction terminated successfully. Processing restarted.
Ajax failures (20220512): UPDATE
Data partially available. Only 2022 is missing until further notice.
Processing still interrupted. Resumption forecast for later this afternoon (~ 1600Z)
Failing disk replacement, postponed to Tuesday 17th.
Ajax failures (20220512)
Disks are delivered.
Processing, sharing is interrupted.
Unfortunately I removed the wrong disk from the array. A rebuild is in progress.
Data will remain unavailable until it is finished. Hopefully the correct drive can be replaced today...
Ajax failures (20220507)
One disk is near end of life, and a second one is showing worrying signs. Replacement disks have been ordered.
All back processing of test data are interrupted.
Since I've no idea when I'll be able to replace these disks, an emergency preservation copy has been started in case of a definitive crash in my absence.
In that fatal event, all 2022 data would be lost, and all systems would be heavily disturbed without prior notification.
The website has now been switched to full encryption.
HTTP/2 has been enabled.
Since it was not used anymore, MediaWiki has been removed.
Brescia and Chianti back-processing finished (20220421)
Back processing is now complete for operational PDU's.
All test PDU's are still to be processed for all products. This will be done after a well deserved pause and spring cleanup.
Certificates bis (20220421)
Fully signed certificates are only valid for outside access.
Internal access is encoded using self-signed certificates, which could raise a warning.
Certificate signing was broken due to the interruption of plain HTTP.
New certificates are now deployed, and all services are operational.
From now on, only secured HTTPS connections are accepted.
In the near future, all connections will require HTTP/2.
Nodes loss (20220407)
Two more second generation nodes are defunct.
Second disk installed and reconstruction started.
Services are estimated to resume Tuesday 12th.
Patrocle: Disks just arrived.
First disk installed and reconstruction started.
Patrocle: two more disks are failing.
Data (2007 -> 201810) are unavailable until replacement, back-processing is suspended.
Brescia and Chianti back-processing started (20220322)
Back processing fully started (some were already done during the intermission).
Combined plots are processed manually on a random basis.
Maintenance (20220322) UPDATE
Patrocle and Pelee: Back online
Mail migration ULB.AC.BE -> ULB.BE (20220317) UPDATE
Mails are erratic! Even when sent, they sometimes get lost!
PS: Don't forget to give a subject, otherwise mails are directly trashed!
Maintenance (20220317) UPDATE
Ajax: up again, processing restarted.
Patrocle and Pelee: Data still unavailable; probably back online Tuesday 22
Nodes: 1 dead, 1 sick
Ajax: to be upgraded; no data at all, no processing during this time.
Patrocle: new disk has arrived, filesystems are reconstructing.
Pelee: updated, offline for filesystem scrubbing.
Working nodes: Upgraded, 1 definitively lost
Brescia and Chianti back-processing delayed (20220311)
Due to a failing disk in one of the servers, operations are delayed.
Filesystem has been isolated. (201105 -> 201808)
Disk should be replaced soon (I hope -- quotation, ordering, delivery,...).
Forli back-processing ended (20220311)
After 784 days of continuous work, all Forli operations have finally ended.
This is likely the last reprocessing of such magnitude.
Now, after a quick update to Synology servers, reprocessing of Brescia and missing Chianti will start on Tuesday.
Mail migration ULB.AC.BE -> ULB.BE (20220222) UPDATE
No support for all this crap from ULB!
A dirty work-around has been applied, until next vexation.
This hack changes the originating sender name from "Automated SO2 Alert (on behalf of D. Hurtmans)..." to "HURTMANS Daniel"...
Mail migration ULB.AC.BE -> ULB.BE (20220210)
ULB will abandon usage of normal protocols to read and send e-mail, by migrating completely to Micro$oft's system.
Since all my systems are Micro$oft agnostic (and allergic), there is a high probability of mail disruption (no reading, no sending, no alerts, ...).
Should this occur, surface mail will be the only way of contacting me, welcome to 21st century.
Back-processing... (20220210) UPDATES
Next estimation puts the end of Forli reprocessing on March 17th 2022.
After that, Chianti reprocessing campaign will start, followed by (or concurrently with -- still TBD) the reprocessing of Brescia data to accommodate for the plume altitude corrections.
Once all the operational data (_o) have been processed, a period of maintenance will be scheduled.
Back processing of test data (_t) will take place at a later time.
Back-processing... (20211120) UPDATES
We have reached a time period with more L2 pixels, slowing down the processing.
New estimations put the end of this back-processing campaign (if all goes well) on March 28th 2022.
Back-processing... (20211018) UPDATES
Due to the disappearance of Metop-A, back-processing is now benefiting from more horsepower.
New estimations put the end of this back-processing campaign (if all goes well) on February 24th 2022.
Brescia 20211010 (20211018)
Brescia has been updated to a new version which corrects a bug in SO2 altitude determination.
Roughly speaking all altitudes are 1 km lower than with previous version.
This is the end... (20211015)
At 18:00 UTC IASI-A mission will end definitively...
Metop-A will be deactivated by December 1st.
IASI-A reduced swath... (20210916)
Don't use or distribute any Forli results: something is wrong (probably with the angles), because molecular amounts vary along the swath much more than with IASI-B or -C.
CLYTEMNESTRE and others (20210818) MORE UPDATE
This was a bad idea. Overlayfs could not cope with changing data in one of the aggregated filesystems, leading to
stale file handles.
Reverting to unionfs for most of the systems.
CLYTEMNESTRE will continue using it until a better alternative is found. The only remaining option seems to be mergerfs...
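For anyone hitting the issue, the stale-handle condition can be detected explicitly rather than guessed at from failing scripts. A minimal probe sketch, assuming a POSIX client; the mount point below is illustrative:

```python
import errno
import os

def is_stale(path: str) -> bool:
    """Return True only for the ESTALE condition overlayfs was producing;
    other errors (missing path, permissions) are not stale handles."""
    try:
        os.stat(path)
        return False
    except OSError as e:
        return e.errno == errno.ESTALE

# e.g. is_stale("/mnt/clytemnestre/2021")  # illustrative mount point
```

A cron job running this probe over the aggregated shares would flag a recurrence before users do.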
CLYTEMNESTRE and others (20210818) UPDATED
After reading pro's and con's, all unionfs were replaced by overlay on all systems. Cross fingers!!!
CLYTEMNESTRE (20210817) UPDATED
A temporary solution has been set up using overlay.
More thorough evaluation of performance should be done before making it permanent.
The Samba server is partly broken.
An obscure interaction with unionfs prevents directories built from multiple remote filesystems from being readable (which means nearly all data).
I'm currently clueless on how to solve this.
Back-processing... (20200117) UPDATES
Last estimations put the end of this back-processing campaign (if all goes well) on March 17th 2022.
DIOSCURES SQL database in split brain (20210731)
SQL database of DIOSCURES machines was in split brain mode.
All http services were interrupted for an hour in order to solve this issue.
AJAX servicing (2021 WEEK 30) UPDATE
Update complete (after a potential disaster!)
Services have resumed.
Potential risk of failures of some scripts in the next few days. Untimely reboots are not impossible.
AJAX servicing (2021 WEEK 30)
Ajax server OS will be upgraded. No service (processing, distribution, ...) will be available during this time.
Since this operation will require some later adjustments, some perturbations are expected for a few days after restart.
All data from 2021 will also be unavailable.
PATROCLE railing (20210715) COMPLETED
All services and operations should be up again.
PELEE translation (20210714) DONE
The Pelee server will be moved to its definitive place.
During these operations, server will remain off. This means data from September 2018 till 2020 will remain unavailable.
PATROCLE railing (20210713)
Mechanical operations are done. Server restarted; the filesystem is now being tested.
It will be back online as soon as testing is completed, probably on July 15th.
Dish antenna intervention (20210708) DONE
The feed horn has been replaced, with only a minor signal loss of 0.4 dB.
Daily plots LODI, SO2, and BRESCIA (20210708) SOLVED
Should be operational now.
PATROCLE railing (2021 WEEK 28)
Patrocle server units will be fitted with racking rails.
During these operations, server will remain off. This means data from 2007 till August 2018 will remain unavailable.
Interruption should not last more than a week.
Daily plots LODI, SO2, and BRESCIA (20210706)
Due to work in progress in order to solve the mail migration problem, all plots for LODI and BRESCIA are unavailable
until further notice.
This includes the end-of-the-day summary of SO2 alerts.
Dish antenna intervention (20210708)
A small intervention is forecast on the antenna around 0800UTC.
Reception will be interrupted during the operations. Interruption should not exceed 1 hour.
PELEE is dead, long life to PELEE (20210630)
The new PELEE server is now operational. All data should normally be available again.
As of today, all included, 157 TB of storage space is free and 187 TB are used! This should leave some margin.
Mail migration ULB.AC.BE -> ULB.BE (20210618) THOUGHTS
The problem seems to be coming from the old mail sender address being encoded in the executable. Changing this is possible.
However, the new executable is incompatible with the current installation on the nodes. An upgrade could solve this.
Upgrading nodes will break plot generation because the new version of GMT makes scripts incompatible.
Conclusion: a long work to update plotting scripts; a long work to update nodes; a long time before problem could be solved.
Mail migration ULB.AC.BE -> ULB.BE (20210617) CONFIRMATION
Due to the new ULB (actually Micro$oft!) mail "security features", the SO2 alert system is broken.
Some computing nodes dead (20210616)
They are refusing to restart. Probably a good time to replace them (they were already the oldest ones in the bunch).
Reception main server is dead (20210616) UPDATE
Restarted. UPS requested shutdown.
Unexpected halt of TELAMON (20210616)
Restarted. UPS requested shutdown. Still under recovery.
Mail migration ULB.AC.BE -> ULB.BE (20210424) FINAL
Migration completed. Only SO2 alert system is not 100% guaranteed.
Unexpected halt of TELAMON (20210408)
Data from 201809 -> 201906 are unavailable until further notice.
All processing queues are stopped.
Processing resumed; 4 nodes lost in reboot process.
The situation will remain the same as long as teleworking remains the rule.
Reception main server is dead (20210327) UPDATE
Transfer to the HVS-2 server seems only partly working. For some reason, files are retrieved only randomly.
Syncing from Paris has helped recovery so far...
The situation will remain the same as long as teleworking remains the rule.
Reception main server is dead (20210327) UPDATE
Reception has been transferred to the HVS-2 server in a minimalist version, until the original server is accessible again...
Reception main server is dead (20210326)
No Tellicast BAS and HVS-1 services are available, due to the crash of the server.
This implies no IASI reception, and hence no real-time processing (e.g. Forli and Brescia)!
Since the authorities have forbidden access to our facilities, repairs will be made when return is possible...
PELEE is dead (20201021) UPDATE
All accessible files were copied. They will soon be made accessible again.
Please report any suspicious file (eg: unreadable or partially corrupt) in the range 201809-201907.
PELEE is dead (20201015)
After restarting the device, I started a reconstruction, which was interrupted by the "loss" of the drive.
The whole device is now considered lost.
I'm trying an emergency recovery of the files. This will be a long process involving weeks of testing after copy.
PELEE is down for a while (20201004)
A severe failure occurred. Two HDD crashed simultaneously. This is unfortunately an unrecoverable error for a RAID5.
First investigation shows a total loss of data for 2019 Jan-Jul and 2018 Sep-Dec (~15 TB of data). There is almost no chance of recovering any part of it.
Since this is the second severe alert in a few weeks for this server, I'll investigate the feasibility of replacing all remaining HDDs (14 × 3 TB), or the full machine.
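For context on why a double failure is unrecoverable: RAID5 keeps a single XOR parity block per stripe, so it can rebuild exactly one missing member per stripe. A toy illustration of the arithmetic, not the server's actual layout:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together -- RAID5's parity operation."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# one stripe across three data disks, plus its parity block
data = [b"\x01\x02", b"\x0f\x00", b"\xaa\x55"]
parity = xor_blocks(data)

# lose any single disk: rebuild it from the survivors and the parity
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]  # recovered

# lose two disks at once: one parity equation, two unknowns --
# no reconstruction is possible, which is what happened here
```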
Another HDD failure (20200918)
While transferring to PELEE, a disk crashed completely. Data are now offline and will be released when a new drive is installed.
Reception failures (20200908) UPDATE
A new LNB head has been installed. Reception seems correct. Further monitoring is needed before claiming victory.
PRIAM unavailable (20200908) UPDATE
Misconfiguration after disk insertion has been corrected.
Server has been restarted and is working fine.
HDD failures (20200908) UPDATE
Disk has been replaced and server is working again.
HDD failures (20200829)
Due to another failing disk in PATROCLE, all data from 2007 to 2017 included are offline until disk replacement.
PRIAM unavailable (20200821)
PRIAM seems stuck in limbo. I will try to unlock it on my next visit.
AJAX crash (20200818)
AJAX's UPS batteries died unexpectedly. Processing will restart as soon as possible.
HDD failures (20200801) UPDATE
PATROCLE's and PELEE's failing disks have been replaced. FS are back online, and back-processing has resumed.
PRIAM and CHRYSIPPE still need an intervention, whose date will be announced later.
HDD failures (20200716) UPDATE
Replacement disks have been ordered today.
The following machines will be down for repair as soon as the new disks arrive, and for as long as the RAID rebuilds are in progress.
HDD failures (20200714)
Due to failing disks in PATROCLE, all data from 2007 to 2017 included are offline until disk replacement.
Reception failures (20200630)
Reception restarted after shifting the carriers frequencies by about -5MHz!
This drift is another symptom of the LNB failure.
Data are still received erratically.
Reception failures (20200629)
Reception is now completely dead. LNB should be replaced as soon as I can come back and access the antenna.
Reception failures (20200418)
Reception has been chaotic for more than a week. No solution is available until a physical inspection of the antenna can be performed.
NH3 plots (20200326) [SOLVED]
Plots for NH3 seem wrong. Investigations are ongoing.
Data were improperly filtered. The affected figures are now being regenerated.
Back processing restarted (20200324)
Missing data were processed, and "normal" back-processing has resumed.
AJAX lack of free data space (20200309)
AJAX data storage was full. Neither processed nor received data could be saved.
Some months of 2018 have now been transferred to another server, freeing space for a few weeks.
AJAX down again (20200120)
Reboot, and wait...
Removed all NFS mounts to Tellicast (even if not used they were mounted...)
... is now started. Expect an 8 (day platform)/day computation rate.
Please keep in mind that combined plots are still requested manually.
Server crash (20200109) - Updated
Same problem again...
Cause : hanging transfer of files from reception computer.
Solution : ????
A temporary patch using RSYNC has been implemented, in the hope of stabilizing the situation while a permanent solution is investigated.
Server crash (20200109) - Update - Temporary solution?
The server is still unstable. Working on it...
All systems have been put in "safe" mode, and all services are down.
The NFS system seems to be somehow incompatible between the server and the Tellicast reception machine since the December update. The Tellicast receiver should probably be updated too, with all the potential side effects!!!
Server crash (20200107) - Update
Ajax has restarted, and processing has been resumed. Back processing should start soon.
Unfortunately log files were erased preventing any forensic analysis.
Ajax crashed on Dec 22. Restart operations are on their way.
Missing LODI files(20191217)
Apparently recovery of missing LODI files is not working. Script is under investigation.
Forli Updates (20191213) - FINAL 20191216
Processing has restarted normally. Back-processing expected to begin in 2020.
Power outage (20191213) - Update 3
Operations are taking too much time. Processing restart is delayed until Monday.
Power outage (20191213) - Update 2
All nodes have been upgraded without too much trouble.
Forli upgrades are now ongoing...
PDU's database overflow (20191213)
Orbit numbers for IASI-A passed the barrier of 65536!!! Storage for the orbit number was 16 bits only (fully sufficient for a 5-year mission).
Storage has been increased to 32 bits, enough for 2147483647 orbits, i.e. about 458751 years.
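The failure mode can be reproduced with fixed-width packing. A sketch of the overflow only, not of the actual database schema:

```python
import struct

def as_uint16(orbit: int) -> int:
    """Store an orbit number in an unsigned 16-bit field and read it back."""
    return struct.unpack("<H", struct.pack("<H", orbit & 0xFFFF))[0]

def as_int32(orbit: int) -> int:
    """Same with the widened (signed) 32-bit field, good up to 2147483647."""
    return struct.unpack("<i", struct.pack("<i", orbit))[0]

print(as_uint16(65535))  # 65535 -- the last orbit number that fits
print(as_uint16(65536))  # 0     -- wraps around silently
print(as_int32(65536))   # 65536 -- safe for a few hundred millennia
```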
Power outage (20191213) - Update 1
Servers have restarted without apparent damage. Services will be restarted as soon as possible.
Power outage (20191212) - REMINDER
A power outage is forecast during the night of 12th to 13th December, in order to certify the electrical
installations of the whole building.
No processing will occur between 12th 14:00UTC and 13th 11:00UTC. Reception will remain on, as long as possible, using the UPS.
This interruption will be a good time to update some of the machines to a fresher OS version, and to make the new Forli version operational.
Forli Updates (20191213) - UPDATE 20191205
Since I've received no comments so far, I take the changes for granted. The new version is now frozen and should, as expected, become the default operational version on 20191213.
Priam down (20191105) - SOLVED
Due to the failure of a fan cooling HDD's, Priam and Chrysippe are down until further notice.
Forli Updates (20191213)
It has been a while (4 years!!!) since Forli processing was updated; it has been running version 20151001.
This new release, aka 20191122, will bring up some improvements, corrections, and changes:
- Hitran database update to the latest available version, with largely corrected CO line intensities and positions and also updates for HNO3
- MT_CKD update and the related use of Line-Mixing for CO2 lines.
- Correction in the computation of absorbance look-up tables.
- Other corrections partly implemented during the last BUFR update (May 2019): altitude computation; correct usage of humidity; ...
Demo data will be temporarily available here; files are however still marked 20151001, for reading convenience.
Once fully operational, back processing will be launched (provisionally around mid-January, or early February at the latest).
Node failure (20191102) - SOLVED
The hard-drive of a node broke down causing the instability of the queue manager. NRT computations were halted.
Services have resumed now. Investigations on the opportunity of repairing the node will be made Monday.
Disks from a dead node were recycled. Node is up again.
Processing issue (20191018) - SOLVED
Processing was halted on Oct 16th due to a full storage on AJAX.
Excess data are being transferred on long term storage and processing resumes slowly.
Processing issue (20190909) - SOLVED
No titles are visible on the new plots, making them a bit awkward. Investigations are in progress.
Incompatibility with latest Ghostscript, which has been downgraded.
Processing issue (20190909) - SOLVED
Due to an incomplete update of GMT, no plots were generated for SO2, Brescia and Lodi.
Update is now complete and plots are in the process of being regenerated.
Processing problems (20190313) - SOLVED
Since the last upgrade, all processing results are wrong. Don't trust anything from 20190311 onward.
The problem is being investigated and all processing has been cancelled.
Sorry for the inconvenience.
Math library was incompatible with the new kernel/glibc. Library has been updated, and processing has resumed.
Back-processing will start as soon as systems are stabilized.
Power outage (20190310) - SOLVED
A power line just gave up. All reception and RSYNC services are down until reset (Monday).
Mars Attacks! Again (20190309)
Servers are currently undergoing severe attacks.
Attempts to break the report server have reached a peak of 7000 requests per hour. Ban rules have been implemented.
SSH attacks are also permanent, but at a more sustainable rate. This is also true for all exposed machines on the public network.
I encourage everyone to secure their OS as much as possible, change passwords frequently, and NEVER work with an administrative account.
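The ban rules themselves aren't described above; the underlying idea, flagging any client that exceeds a per-hour request budget, can be sketched as follows (the 7000 threshold matches the observed peak; the log format and IPs are illustrative):

```python
from collections import Counter

def ips_to_ban(requests, threshold=7000):
    """requests: iterable of (ip, hour) pairs parsed from an access log.
    Return the IPs that exceeded `threshold` requests in any single hour."""
    per_ip_hour = Counter(requests)
    return sorted({ip for (ip, _hour), n in per_ip_hour.items() if n > threshold})

# synthetic log: one aggressive client and one normal one, during hour 14
log = [("203.0.113.7", 14)] * 7500 + [("198.51.100.2", 14)] * 30
print(ips_to_ban(log))  # ['203.0.113.7']
```

In practice a tool such as fail2ban applies the same thresholding continuously and feeds the result to the firewall.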
New nodes 20180308 (Update)
The 3 new computing nodes have been delivered today. Installation should take place in the next days.
Data Access 20180408 (Final)
The server is "operational" and has been migrated to the rail kit in the proper rack.
IP survey 201807
During summer I'll make a survey on the internal IP usage (access to Clytemnestre, ...) in order to remove all unneeded entries and rationalize the DHCP and DNS.
Data Access 20180406 (Update)
External access has been migrated. However, some instability is observed (probably due to too-strict filtering).
Sporadic interruptions are still expected.
Data Access 20180404 (Update)
Operations for internal data access seem to be going smoothly. No perturbations are therefore forecast for tomorrow.
New nodes 20180224
New computing nodes are on their way (2 weeks delay), to replace the old "Work820x" nodes, which tend to fail.
They should take on the new IASI-C data.
Tellicast station 20180224
We have also received a new reception computer to replace the HVS-2 (very old system).
However, a second network interface is missing and the computer cannot be set up.
Since no sensitive data are yet received on this channel the impact on normal operations should be negligible.
Data Access 20180222
Starting week 10, "Clytemnestre" and "Hesione" will go on retirement. A new server, which will replace both of them
in one unit, is currently in the installation phase.
Data access could be unavailable or unstable (due to multiple reboot) during this time. This is valid for both internal and external access.
Expected phasing is: samba services: Monday-Tuesday, external RSYNC: Wednesday-Friday
Sorry for the inconvenience (if any)...
Power Supply Dead 20181112
A power supply died unexpectedly. All filesystems were stuck, and data processing severely disturbed.
A new PSU has been ordered.
Power Outage 20181025
Due to work on water pipes above the main power lines, electricity will be shutdown for about an hour around 10:30.
Some services are already down.
Reception failure 20180927
The hard drive of the reception computer was full of unexpected CrIS data that arrived through HVS, preventing further reception.
After fully trashing those data, reception restarted normally.
Power Outage 20180904
Processing is interrupted sine die.
Water damage 20180808
Water leaked onto the main power lines causing a massive electrical spark. All systems went down (ungracefully).
Services will restart as soon as possible.
Retirement plan 20180712
One of the older nodes has definitively refused to boot.
These computers were running 24/7 for about 10 years, which sounds like a good life full of work for these machines.
Time has come to think of a renewal, goodbye old chaps :-)
Power Outage 20180616
Once again!!! A major problem has been discovered in the high power supply line. Faulty pieces have been replaced.
Services are partly restarted, and will be fully operational when I am back at work.
Power Outage 20180602
Yes, another one... and one dead node.
Power Outage 20180530
An unexpected power outage put all systems down this morning. A 60Amps fuse was broken and has been replaced.
Systems are recovering slowly and back-processing should start soon.
Brescia 20180401 (2018052127)
Short update: Back-processing has now successfully started. As the daily summary plots are time-consuming, they used to clog the queue, delaying the alert plots.
These are now produced on the node executing Brescia. While this speeds up the alerts, daily summaries are still heavily delayed.
Work on power supply 20180514-20180915
Due to a change in power delivery to our building, electrical cabins will be replaced, and one or
several power outages could occur for a day or more during the above mentioned time period. So far
no precise schedule is available, and it will probably be short notice.
During these outages and, depending on the timing, probably a few hours before and after, all services will be stopped.
Brescia 20180401 (20180427)
During week 19, I'll start switching the Brescia processing to its new version.
This version will include all recent improvements made on SO2 altitude retrieval, and update dBT processing to HRI methods.
The update implies a change in the plots available through the ULB MeTop/IASI website for SO2 alerts and for other species. These plots will be slightly delayed relative to the processing.
In the first days, (possible) SO2 alerts sent by e-mail will lack their corresponding plots. They will be produced later.
Once all operational processing is running smoothly, back-processing will be launched to obtain a homogeneous view.
BUFR V6.x (20180416)
Latest BUFR extractors are available here.
Outage (20180415) -- SOLVED for now
A circuit breaker tripped again this morning around 0300UTC.
Some computers and switches were abruptly shut down.
Restart is forecast for Monday 0800UTC.
I have no clue about the source of this recurring problem. Technical support from the electricians will be requested.
HDD failure (20180409) - Update (20180410)
The faulty disk has been replaced. RAID is rebuilding.
Services will restart gradually today.
AJAX (the main server) has lost one of its hard drives. A ticket has been opened with our hardware provider.
No services will be available until repair.
Service interruption (20180405)
A maintenance interruption is forecast on HESIONE. This will cause interruption of RSYNC services with IPSL.
Duration is not yet determined, but should not exceed two working days.
A circuit breaker tripped, causing an outage of some services. All servers have restarted.
Further investigations will be needed to find the source of this now-recurring problem.
IASI_L2_v6.4 (20180307) updated
V6.4 has been deployed, and a bug in reading the CLP files made all the retrievals worthless. Patches were applied.
Back-processing is running.
A new version of L2 will soon be deployed at Eumetsat. It will include a patch to account for the CO2
evolution over time.
This will affect mainly temperature profiles and accordingly Forli (and possibly Brescia) products.
Some preview tests files were processed for 20180107 -> 20180109 on IASI-B.
New year starts with inventory...
All L2 distributed by Eumetcast are now part of the inventory. This makes the display a bit awkward, but since it's for internal purposes...
SO2 data have been added to the database, and a provisional HNO3 (NIT) placeholder inserted.
Some missing data have been recovered from IPSL, and back processed as needed.
New preview interface (20171026) updated
Interface has been also enabled for Brescia. New plots will be generated in the new format starting from 20171031.
Older plots will be regenerated in due time (e.g., after Forli).
A new preview interface has been set up for Forli results. The design is theoretically "smart"phone friendly. Please report any odd behaviors.
Interpolation and scatter plots are now shown separately. A checkbox in the selection box allows to switch interpolation on or off.
All plots are now being regenerated, but this will take a while.
Some plots are now publicly available in a separate page.
Mars Attacks! (20171010)
Servers are currently undergoing severe attacks.
Countermeasures are being evaluated, but this will probably result in unexpected temporary services shutdowns.
Data Access (20171009)
Reports were given about data being inaccessible.
Filesystems were not correctly remounted at the last reboot. The situation is now back to normal.
EumetCast reception (20171002) updated
A software upgrade should be performed this day. Reception will probably be shut down and data won't be received during this period.
As with all major software updates, unexpected problems could arise, therefore no duration can be provided.
Update has been performed without any problem. Reception and processing have resumed smoothly.
EumetCast reception (20170913) updated
An intervention (extension) will be performed on our Eumetcast reception system around 08:00 UTC.
During this time, no data will be received. Downtime should be relatively short, although no real estimation could
be provided so far.
The intervention lasted about 1 hour and successfully allowed the installation of a new modem for receiving the second transponder. The presence of a splitter on the single cable arriving from the LNB reduced the signal level on transponder 1 by about 4 dB (−41 dBm to −45 dBm), which corresponds to a new power level at 40% of the preceding one (10 points less than the 50/50 of an ideal splitter). More analysis on transponder 2 later next week.
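The 40% figure follows directly from the definition of the decibel; a minimal sketch of the arithmetic, using the levels quoted above:

```python
# Quick check of the ~40% figure: a 4 dB drop in signal level
# corresponds to a power ratio of 10^(-4/10).
def db_to_power_ratio(delta_db: float) -> float:
    """Power ratio corresponding to a level difference in dB."""
    return 10.0 ** (delta_db / 10.0)

before_dbm = -41.0  # transponder 1 level before the splitter was added
after_dbm = -45.0   # level measured after adding the splitter
ratio = db_to_power_ratio(after_dbm - before_dbm)
print(f"Remaining power: {ratio:.0%}")  # prints "Remaining power: 40%"
```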
SO2 Alerts (20170610)
A recent update introduced a bug in the mail alerting system, leading to an incorrect link to the alert picture.
Normally this should be corrected for future alerts.
All HDD have been upgraded to better disks. This took about 2 months!
Now the system seems stable and data are being transferred from other storages for the sake of better unification and readiness for the long awaited L1/L2 reprocessing.
Data are now all available again. Please report any suspicious behavior or file.
Power Outages (20170425)
Two micro outages occurred last night. All systems are down until further notice.
Brescia (20170303) -- SOLVED (20170304)
Since Feb 22, Brescia has been crashing. The patch applied to use the corrected version of the TWT files introduced a
severe "protection fault" issue.
So far no solution has been found.
The system is still unavailable and negotiations are ongoing with Western Digital to possibly replace the 36 drives,
which could misbehave with the Synology hardware.
On Feb 16th, Eumetsat changed the encoding of surface pressure without any prior warning. This broke the processing
of all products.
The software is now patched accordingly and correct processing is ongoing. Back-processing of the corrupted data will be launched as soon as possible.
Plots (20170216) -- SOLVED
The plot processing queue is broken. For an unknown reason, jobs remain stuck in the queue and must be launched manually.
This could cause the daily and alert plots to be delivered late.
The server was overloaded by dead-looping processes generated by the Eumetsat update of the 16th.
Another recovery for nothing: another crash occurred while storing data on it. No L1 or L2 are available from 2007 to April 2013. Forli/Brescia results are also partly unavailable.
Just after the recovery from the incident of Jan 23, another disk gave up. Interactions with Synology resumed.
No L1 or L2 are available from 2007 to April 2013. Forli/Brescia results are also partly unavailable.
Three disks simultaneously disappeared from the controller, leading to a crash of the RAID structure and a total
loss of about 32 TB of data.
This means that no L1 or L2 are available from 2007 to April 2013.
A ticket has been opened with Synology, with the hope to recover at least partly the lost data. Otherwise, a full download from Ether will have to be done, which will last around 40 to 50 days.
Mail server (20161028) Update
Apparently someone deliberately cut the server's power without any permission.
The server had difficulty restarting after this particularly brutal event. Things now seem to be resuming slowly.
My mail server went down... I will not answer any mail, and no alert services will be available until Thursday, November 3.
We've just received the new data server (116TB!). Set-up is ongoing.
We're facing mechanical problems to integrate it in the 19" rack.
Disk has been replaced and RAID partition is rebuilding. Access should be available tomorrow.
One disk of the RAID is failing. All services are down until a new disk is plugged in.
The timeline service has eventually resumed, with relatively elementary behavior.
Power Outage follow-up (20160704)
HESIONE has been fixed, with old spare components. This set-up is not guaranteed to work in the long term.
The PRIAM error came from a misconfigured switch, which lost its configuration at shutdown.
Power Outage results
HESIONE does not reboot anymore. This is unfortunately a definitive failure. This means no external services until
further notice (i.e.: RSYNC and reports).
PRIAM network interface seems damaged. Test will be performed next week to add a new interface if possible.
Power Outage (20160630-20160701)
There will be a power interruption from June 30th to July 1st. No operation will be available during that time.
Services will stop from 1400 UTC to around 0830 UTC, sorry for any inconvenience.
Processing and data access are down. Problem will be investigated Monday.
Processing will restart Tuesday after the last viability checks.
Processing restarted on Tuesday at around 10:00 Zulu. Missing data will be reprocessed once normal operation reaches a steady state.
An unexpected maintenance has to be performed on AJAX. Outage should be short.
Operations have now resumed. (14:21Z)
Back Processing (20160106)
Back-processing is now running in a steady regime. The latest estimate of processing speed is 8 platform-days of data per day.
This means that about 15 months of uninterrupted computation remain from today (provisional end date: March 2017)!!!!
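Interpreting the rate as eight platform-days of data processed per wall-clock day (an assumption about the original "8 days platform /day" wording), the quoted figures can be cross-checked with a quick calculation:

```python
# What the quoted figures imply: at 8 platform-days of data processed per
# wall-clock day, 15 months of uninterrupted computing corresponds to
# roughly 10 years of platform data. The 30.44 is the average month length.
rate = 8                                  # platform-days processed per wall-clock day
months_remaining = 15
wallclock_days = months_remaining * 30.44
platform_days = wallclock_days * rate
print(f"{platform_days:.0f} platform-days ≈ {platform_days / 365:.1f} years of data")
```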
Network servers were successfully updated. Apparently most of the services are operational. So far only small problems are visible with some web pages due to the deprecation of PHP 5 in favor of PHP 7.
Forli version 20151001 is now operational. Back-processing has started and should last about 417 days (finishing
around Jan 2017)!!!
Server outage (20150927)
New server is almost operational.
Processing has restarted, and data are available.
Back-processing will start as soon as possible.
Server outage (20150923)
New server was delivered yesterday.
Installation has started.
Provisional restart date is Thursday Oct 1.
Server outage (20150902)
A new server order has been placed. Expect a delivery delay of about 3 weeks.
Installation should last about one week.
Provisional restart date is Thursday Oct 1.
Server outage (20150824)
Main server is DEAD (hardware failure). No processing, reception, or data access until replacement.
(20150825) An offer has been requested. Order will be placed ASAP. Expect a delay of 4 to 5 weeks between order and reception.
Server outage update (20150823)
Main server will be offline for investigations (as well as all other local services) Monday Aug 24th from 08:00 UTC
until further notice.
Due to another crash, the date has been advanced.
Server crash (20150820)
Main server crashed again. NFS daemon generates a "general protection fault" leading to a kernel panic.
In order to investigate the problem a maintenance shutdown will be performed next week.
SO2 alerts (20150504) Update
Mailing service has now resumed.
Outage (20150415) Updated
Main server unexpectedly died (root partition was full).
All processes have now recovered... No apparent loss.
SO2 alerts (20150317) Update
The mailing service has been broken since February (due to a security patch in glibc).
Since the compiler + library combination is currently unable to compile Brescia, I have no idea when the service could resume.
Plots are still available on the usual webpage.
NPP-CrIS data from 2013 were purged to gain space on storage.
New https (20150125)
Changed server certificates (to more secure ones) and removed SSL to keep only TLS.
New design (20150119)
A new design for the website has been implemented. Don't hesitate to send your comments and any bug reports.
Back-processing to version 20140922 has resumed.
Migration from DVB-S to DVB-S2 is now complete. All receptions parameters seem correct.
Only a few PDUs were permanently lost during the migration.
New BUFR extractions are available here.
This is a preliminary version to be tested. As the new TWT files are on a 110-point grid instead of 90, an interpolation is performed to fit the amp file structure.
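The regridding step can be sketched with a simple linear interpolation; the grid bounds and placeholder data below are assumptions for illustration, since the real TWT axes and the amp file layout are not described here:

```python
import numpy as np

# Sketch of the 110 -> 90 regridding described above: values on the new
# TWT grid (110 points) are linearly interpolated onto the coarser grid
# expected by the amp file structure (90 points). The 0..1 grid bounds
# and the sine placeholder data are illustrative assumptions only.
src_grid = np.linspace(0.0, 1.0, 110)      # new TWT grid (110 points)
dst_grid = np.linspace(0.0, 1.0, 90)       # grid expected by the amp files
twt_values = np.sin(2 * np.pi * src_grid)  # placeholder data

regridded = np.interp(dst_grid, src_grid, twt_values)
assert regridded.shape == (90,)
```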
Data skimming <sticky>
New plot selection criteria. Based upon a statistical analysis of the residuals, recommended values are used to avoid partly cloudy scenes:
CO      −0.15/0.25 ×10⁻⁹    2.7 ×10⁻⁹
HNO3    −0.60/0.40 ×10⁻⁹    3.0 ×10⁻⁸
O3      −0.75/1.25 ×10⁻⁹    3.5 ×10⁻⁸
(* insufficient statistics.)
Ozone is the most affected, as the standard flags are normally sufficient for CO and HNO3.
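A minimal sketch of how such criteria could be applied per retrieval; the reading of the two table columns as a residual-bias window and a residual-RMS cap, and the function and names below, are assumptions for illustration:

```python
# Hypothetical application of the skimming criteria above: keep a scene
# only if the residual bias lies in the recommended window and the
# residual RMS stays below the cap. Interpreting the two table columns
# as (bias window, RMS cap) is an assumption, not the documented meaning.
CRITERIA = {
    # species: ((bias_min, bias_max), rms_max)
    "CO":   ((-0.15e-9, 0.25e-9), 2.7e-9),
    "HNO3": ((-0.60e-9, 0.40e-9), 3.0e-8),
    "O3":   ((-0.75e-9, 1.25e-9), 3.5e-8),
}

def keep_scene(species: str, bias: float, rms: float) -> bool:
    """True if a retrieval passes the (assumed) skimming thresholds."""
    (lo, hi), rms_max = CRITERIA[species]
    return lo <= bias <= hi and rms <= rms_max

print(keep_scene("O3", 0.5e-9, 2.0e-8))  # prints True: within both limits
print(keep_scene("CO", 0.5e-9, 1.0e-9))  # prints False: bias outside window
```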