AJAX down again (20200120)
Reboot, and wait...
Removed all NFS mounts to Tellicast (even if not used, they were mounted...)
... is now started. Expect a computation rate of 8 platform-days per day.
Please keep in mind that combined plots are still requested manually.
Server crash (20200109) - Updated
Same problem again...
Cause : hanging transfer of files from reception computer.
Solution : ????
A temporary patch using rsync has been implemented in the hope of stabilizing the situation while a possible permanent solution is investigated.
Server crash (20200109) - Update - Temporary solution?
The server is still unstable. Working on it...
All systems have been put in "safe" mode, and all services are down.
Since the December update, NFS seems to be somehow incompatible between the server and the Tellicast reception machine. The Tellicast reception machine should probably be updated too, with all the potential side effects!!!
Server crash (20200107) - Update
Ajax has restarted, and processing has been resumed. Back processing should start soon.
Unfortunately, log files were erased, preventing any forensic analysis.
Ajax crashed on Dec 22. Restart operations are under way.
Missing LODI files (20191217)
Apparently recovery of missing LODI files is not working. Script is under investigation.
Forli Updates (20191213) - FINAL 20191216
Processing has restarted normally. Back-processing expected to begin in 2020.
Power outage (20191213) - Update 3
Operations are taking too much time. Processing restart is delayed until Monday.
Power outage (20191213) - Update 2
All nodes have been upgraded without too much trouble.
Forli upgrades are now ongoing...
PDU's database overflow (20191213)
Orbit numbers for IASI-A passed the barrier of 65536!!! Storage for the orbit number was only 16 bits (fully sufficient for a 5-year mission).
Storage has been increased to 32 bits, enough for 2147483647 orbits, i.e. about 458751 years.
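The overflow described above can be illustrated with a minimal sketch. The real PDU database schema is not shown in this log, so the function names, the silent wrap-around behaviour of the 16-bit field, and the "about 14 years to reach 65536 orbits" figure used for the year estimate are all assumptions made for illustration.

```python
# Illustrative sketch only: the actual database field types are assumed.
INT32_MAX = 2**31 - 1  # 2147483647, largest value of a signed 32-bit field


def store_uint16(orbit: int) -> int:
    """Mimic storing an orbit number in a 16-bit unsigned field.

    Assumption: the field wraps silently instead of raising an error.
    """
    return orbit & 0xFFFF


def fits_int32(orbit: int) -> bool:
    """Check that an orbit number fits in a signed 32-bit field."""
    return 0 <= orbit <= INT32_MAX


# Assumption: roughly 14 years were needed to accumulate 65536 orbits.
years_of_headroom = INT32_MAX / 65536 * 14

print(store_uint16(65535))   # last orbit number a 16-bit field can hold
print(store_uint16(65536))   # first orbit number that no longer fits
print(fits_int32(65536))
print(int(years_of_headroom))
```

With a 32-bit field the same orbit number is stored unchanged, which is why widening the column is sufficient here.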
Power outage (20191213) - Update 1
Servers have restarted without apparent damage. Services will be restarted as soon as possible.
Power outage (20191212) - REMINDER
A power outage is forecast during the night of 12th to 13th December, in order to certify the electrical installations of the whole building.
No processing will occur between 12th 14:00UTC and 13th 11:00UTC. Reception will remain on, as long as possible, using the UPS.
This interruption will be a good time to update some of the machines to a fresher OS version, and to make the new Forli version operational.
Forli Updates (20191213) - UPDATE 20191205
Since I've received no comments so far, I take the changes as accepted. The new version is now frozen and should, as expected, become the default operational version on 20191213.
Priam down (20191105) - SOLVED
Due to the failure of a fan cooling the HDDs, priam and chrysippe are down until further notice.
Forli Updates (20191213)
It has been a while (4 years!!!) since Forli processing was last updated, with version 20151001.
This new release, aka 20191122, brings some improvements, corrections, and changes:
- Hitran database update to the latest available version, with largely corrected CO line intensities and positions and also updates for HNO3
- MT_CKD update and the related use of Line-Mixing for CO2 lines.
- Correction in the computation of absorbance look-up tables.
- Other corrections, partly implemented during the last BUFR update (May 2019): altitude computation; correct usage of humidity; ...
Demo data will be temporarily available here; files are however still marked 20151001 for ease of reading.
Once the new version is fully operational, back processing will start (provisionally around mid-January, or early February at the latest).
Node failure (20191102) - SOLVED
The hard-drive of a node broke down causing the instability of the queue manager. NRT computations were halted.
Services have now resumed. Whether the node is worth repairing will be investigated on Monday.
Disks from a dead node were recycled. Node is up again.
Processing issue (20191018) - SOLVED
Processing was halted on Oct 16th due to a full storage on AJAX.
Excess data are being transferred to long-term storage and processing is slowly resuming.
Processing issue (20190909) - SOLVED
No titles are visible on the new plots, making them a bit awkward. Investigations are in progress.
Incompatibility with the latest Ghostscript, which has been downgraded.
Processing issue (20190909) - SOLVED
Due to an incomplete update of GMT, no plots were generated for SO2, Brescia and Lodi.
Update is now complete and plots are in the process of being regenerated.
Processing problems (20190313) - SOLVED
Since the last upgrade all processing results are wrong. Don't trust them starting 20190311.
Problem is being investigated and all processing has been cancelled.
Sorry for the inconvenience.
Math library was incompatible with the new kernel/glibc. Library has been updated, and processing has resumed.
Backprocessing will start as soon as systems are stabilized.
Power outage (20190310) - SOLVED
A power line just gave up. All reception and rsync services are down until reset (Monday).
Mars Attacks! Again (20190309)
Servers are currently undergoing severe attacks.
Attempts to break the report server have reached a peak of 7000 requests per hour. Ban rules have been implemented.
SSH attacks are also continuous, but at a more sustainable rate. This is also true for all exposed machines on the public network.
I encourage everyone to secure their OS as much as possible, change passwords frequently, and NEVER work with an administrative account.
New nodes 20180308 (Update)
The 3 new computing nodes have been delivered today. Installation should take place in the coming days.
Data Access 20180408 (Final)
The server is "operational" and has been migrated to the rail kit in the proper rack.
IP survey 201807
During summer I'll make a survey on the internal IP usage (access to clytemnestre, ...) in order to remove all unneeded entries and rationalize the dhcp and dns.
Data Access 20180406 (Update)
External access has been migrated. However, some instability is observed (probably due to overly strict filtering rules).
Sporadic interruptions are still expected.
Data Access 20180404 (Update)
Operations for internal data access seem to be going smoothly. No perturbations are therefore forecast for tomorrow.
New nodes 20180224
New computing nodes are on their way (2 weeks delay) in order to replace the old "work820x" nodes which tend to be unstable.
They should take on the new IASI-C data.
Tellicast station 20180224
We have also received a new reception computer to replace the HVS-2 (very old system).
However, a second network interface is missing and the computer cannot be set up.
Since no sensitive data are yet received on this channel the impact on normal operations should be negligible.
Data Access 20180222
Starting week 10, "clytemnestre" and "hesione" will be retired. A new server, which will replace both of them in one unit, is currently in the installation phase.
Data access could be unavailable or unstable (due to multiple reboots) during this time. This is valid for both internal and external access.
Expected phasing is: samba services: Monday-Tuesday, external rsync: Wednesday-Friday
Sorry for the inconvenience (if any)...
Power Supply Dead 20181112
A power supply died unexpectedly. All filesystems were stuck, and data processing severely disturbed.
A new PSU has been ordered.
Power Outage 20181025
Due to work on water pipes above the main power lines, electricity will be shut down for about an hour around 10:30.
Some services are already down.
Reception failure 20180927
The hard drive of the reception computer was filled with unexpected CrIS data that arrived through HVS, preventing reception of new data.
After those data were fully deleted, reception restarted normally.
Power Outage 20180904
Processing is interrupted sine die.
Water damage 20180808
Water leaked onto the main power lines, causing a massive electrical spark. All systems went down (ungracefully).
Services will restart as soon as possible.
Retirement plan 20180712
One of the older nodes has definitively refused to boot.
These computers were running 24/7 for about 10 years, which sounds like a good life full of work for these machines.
Time has come to think of a renewal, goodbye old chaps :-)
Power Outage 20180616
Once again!!! A major problem has been discovered in the high power supply line. Faulty pieces have been replaced.
Services are partly restarted, and will be fully operational when I'm back at work.
Power Outage 20180602
Yes, another one... and one dead node.
Power Outage 20180530
An unexpected power outage put all systems down this morning. A 60 A fuse had blown and has been replaced.
Systems are recovering slowly and backprocessing should start soon.
Brescia 20180401 (2018052127)
Short update: Backprocessing has now successfully started. As plotting daily summary plots is time
consuming, they used to clog the queue, delaying the alert plots.
These are now produced on the node executing Brescia. While this speeds up the alerts, daily summaries are still heavily delayed.
Work on power supply 20180514-20180915
Due to a change in power delivery to our building, electrical cabins will be replaced, and one or
several power outages could occur for a day or more during the above-mentioned period. So far
no precise schedule is available, and it will probably be at short notice.
During these outages and, depending on the timing, probably a few hours before and after, all services will be stopped.
Brescia 20180401 (20180427)
During week 19, I'll start switching the Brescia processing to its new version.
This version will include all recent improvements made on SO2 altitude retrieval, and update dBT processing to HRI methods.
The update will imply a change in the plots available through the ULB MeTop/IASI website for SO2 alerts and for other species. These updates will be slightly delayed relative to the processing.
In the first days, (possible) SO2 alerts sent by e-mail will lack their corresponding plots. They will be produced later.
Once all operational processing is running smoothly, back processing will be launched to get a homogeneous view.
BUFR V6.x (20180416)
Latest BUFR extractors are available here.
Outage (20180415) -- SOLVED for now
A circuit breaker tripped again this morning around 0300UTC.
Some computers and switches were abruptly shut down.
Restart is forecast for Monday 0800UTC.
I have no clue as to the source of this recurring problem. Technical support from the electricians will be requested.
HDD failure (20180409) - Update (20180410)
The faulty disk has been replaced. RAID is rebuilding.
Services will restart gradually today.
AJAX (the main server) has lost one of its hard drives. A ticket has been opened with our hardware provider.
No services will be available until repair.
Service interruption (20180405)
A maintenance interruption is forecast on HESIONE. This will cause interruption of rsync services with IPSL.
Duration is not yet determined, but should not exceed two working days.
A circuit breaker tripped, causing an outage of some services. All servers have restarted.
Further investigations will be needed to find the source of this now recurring problem.
IASI_L2_v6.4 (20180307) updated
V6.4 has been deployed, and a bug in reading the CLP files made all the retrievals worthless. Patches were successfully applied.
Backprocessing is running.
A new version of L2 will soon be deployed at Eumetsat. It will include a patch to account for the CO2 evolution over time.
This will affect mainly temperature profiles and accordingly Forli (and possibly Brescia) products.
Some preview test files were processed for 20180107 -> 20180109 on IASI-B.
New year starts with inventory...
All L2 distributed by Eumetcast are now part of the inventory. This makes the display a bit awkward, but since it's for internal purposes...
SO2 data have been added to the database, and provisional HNO3 (NIT) placeholder inserted.
Some missing data have been recovered from IPSL, and back processed as needed.
New preview interface (20171026) updated
The interface has also been enabled for Brescia. New plots will be generated in the new format starting from 20171031. Older plots will be regenerated in due time (e.g. after Forli).
A new preview interface has been set up for Forli results. The design is theoretically "smart"phone friendly. Please report any odd behaviours.
Interpolation and scatter plots are now shown separately. A checkbox in the selection box allows switching interpolation on or off.
All plots are now being regenerated, but this will take a while.
Some plots are now publicly available on a separate page.
Mars Attacks! (20171010)
Servers are currently undergoing severe attacks.
Countermeasures are being evaluated, but this will probably result in unexpected temporary service shutdowns.
Data Access (20171009)
Reports were given about data being inaccessible.
Filesystems were not correctly remounted at last reboot. Situation is now back to normal.
EumetCast reception (20171002) updated
A software upgrade should be performed this day. Reception will probably be shut down and data won't be received during this period.
As with all major software updates, unexpected problems could arise, therefore no duration can be provided.
Update has been performed without any problem. Reception and processing have resumed smoothly.
EumetCast reception (20170913) updated
An intervention (extension) will be performed on our Eumetcast reception system around 08:00 UTC.
During this time, no data will be received. Downtime should be relatively short, although no real estimate can be provided so far.
The intervention lasted about 1 hour and allowed the successful installation of a new modem for receiving the second transponder. The presence of a splitter on the single cable arriving from the LNB reduced the signal level on transponder 1 by about 4 dB (from -41 dBm to -45 dBm), which corresponds to a new power level at about 40% of the preceding one (10 percentage points less than the 50/50 of an ideal splitter). More analysis on transponder 2 later next week.
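The 40% figure follows from the standard decibel-to-power relation, ratio = 10^(dB/10). A minimal sketch of that arithmetic, using only the two levels quoted above (the function and variable names are illustrative):

```python
import math


def db_to_power_ratio(delta_db: float) -> float:
    """Convert a level change in dB into a linear power ratio."""
    return 10 ** (delta_db / 10)


level_before_dbm = -41.0  # transponder 1 level before the splitter was added
level_after_dbm = -45.0   # level measured with the splitter in place

# A difference between two dBm values is expressed in dB.
change_db = level_after_dbm - level_before_dbm
measured_ratio = db_to_power_ratio(change_db)

# An ideal, lossless two-way splitter sends half the power to each output.
ideal_ratio = 0.5
ideal_loss_db = -10 * math.log10(ideal_ratio)

print(f"measured power ratio: {measured_ratio:.1%}")
print(f"ideal splitter loss:  {ideal_loss_db:.2f} dB")
print(f"excess loss:          {-change_db - ideal_loss_db:.2f} dB")
```

The excess loss of roughly 1 dB over the ideal case is what the "10% less than 50/50" remark refers to.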
SO2 Alerts (20170610)
A recent update introduced a bug in the mail alerting system, leading to an incorrect link to the alert picture.
Normally this should be corrected for future alerts.
All HDDs have been upgraded to better disks. This took about 2 months!
Now the system seems stable, and data are being transferred from other storage systems for the sake of better unification and readiness for the long-awaited L1/L2 reprocessing.
Data are now all available again. Please report any suspicious behaviour or file.
Power Outages (20170425)
Two micro-outages occurred last night. All systems are down until further notice.
Brescia (20170303) -- SOLVED (20170304)
Since Feb 22, Brescia is crashing. The patch applied to use the corrected version of TWT files had introduced a severe "protection fault" issue.
So far no solution has been found.
System is still unavailable, and talks are ongoing with Western Digital to possibly replace the 36 drives, which could misbehave with the Synology hardware.
On Feb 16th, Eumetsat changed without any prior warning the encoding of surface pressure. This broke the processing of all products.
Software has now been patched accordingly and correct processing is ongoing. Back processing of rotten data will be launched as soon as possible.
Plots (20170216) -- SOLVED
Plot processing queue is broken. For an unknown reason, the jobs remain stuck in the queue and must be launched manually.
This could cause the daily and alert plots to be delivered late.
The server was overloaded by processes stuck in endless loops, triggered by the Eumetsat update of the 16th.
Another recovery for nothing: another crash occurred while data were being stored on it. No L1 or L2 are available from 2007 to April 2013. Forli/Brescia results are also partly unavailable.
Just after recovery from the incident that occurred on Jan 23, another disk gave up. Interactions with Synology have resumed.
No L1 or L2 are available from 2007 to April 2013. Forli/Brescia results are also partly unavailable.
Three disks simultaneously disappeared from the controller, leading to a crash of the RAID structure and a total loss of about 32TB of data.
This means that no L1 or L2 are available from 2007 to April 2013.
A ticket has been opened with Synology, in the hope of recovering at least part of the lost data. Otherwise, a full download from Ether will have to be done, which will take around 40 to 50 days.
Mail server (20161028) Update
Apparently someone deliberately cut the power to the server without any permission.
The server had difficulty restarting after this particularly brutal event. Now things seem to be resuming slowly.
My mail server went down... I will not answer any mail, and no alert services will be available till Thursday November 3.
We've just received the new data server (116TB!). Set-up is ongoing.
We're facing mechanical problems integrating it into the 19" rack.
Disk has been replaced and the RAID partition is rebuilding. Access should be available tomorrow.
One disk of the RAID is failing. All services are down until a new disk is plugged in.
Timeline service has finally resumed, with relatively basic functionality.
Power Outage follow-up (20160704)
HESIONE has been fixed, with old spare components. This set-up is not guaranteed to work in the long term.
PRIAM error came from a misconfigured switch, which lost its configuration at shutdown.
Power Outage results
HESIONE does not reboot anymore. This is unfortunately a definitive failure. This means no external services until further notice (i.e.: rsync and reports).
PRIAM network interface seems damaged. Tests will be performed next week to add a new interface if possible.
Power Outage (20160630-20160701)
There will be a power interruption from June 30th to July 1st. No operation will be available during that time.
Services will stop from 1400 UTC to around 0830 UTC, sorry for any inconvenience.
Processing and data access are down. Problem will be investigated Monday.
Processing will restart Tuesday after the last viability checks. Processing restarted on Tuesday at around 10:00 Zulu. Missing data will be reprocessed once normal operation has reached steady state.
An unexpected maintenance has to be performed on AJAX. Outage should be short.
Operations have now resumed. (14:21Z)
Back Processing (20160106)
Back processing is now running in steady state. The latest estimate of the processing rate is 8 platform-days per day.
This means that, from today, about 15 months of uninterrupted computation remain (provisional end date: March 2017) !!!!
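The relation between processing rate and remaining time can be sketched as follows. Only the rate (8 platform-days per day) and the entry date come from this log; the backlog size is a hypothetical value chosen for illustration, and the function name is not part of any existing tool.

```python
import math
from datetime import date, timedelta


def remaining_wall_days(backlog_platform_days: float, rate: float = 8.0) -> float:
    """Days of uninterrupted computation needed to clear a backlog.

    rate is the number of platform-days processed per calendar day.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return backlog_platform_days / rate


# Hypothetical backlog: NOT a figure from the log, for illustration only.
backlog = 3500.0
days_needed = remaining_wall_days(backlog)

start = date(2016, 1, 6)  # date of this log entry
provisional_end = start + timedelta(days=math.ceil(days_needed))

print(days_needed, provisional_end)
```

Any interruption (outage, node failure) shifts the end date by at least its own duration, which is why the estimate above is stated for uninterrupted computation.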
Network servers were successfully updated. Apparently most of the services are operational. So far only small problems are visible with some web pages due to the deprecation of PHP 5 in favor of PHP 7.
V20151001 selection tools have been updated here and are now able to read previous file format.
Forli version 20151001 is now operational. Back-processing has started and should last about 417 days (finishing around Jan 2017)!!!
New selection tools are available here.
Server outage (20150927)
New server is almost operational.
Processing has restarted, and data are available.
Back-processing will start as soon as possible.
Server outage (20150923)
New server was delivered yesterday.
Installation has started.
Provisional restart date is Thursday Oct 1.
Server outage (20150902)
The new server has been ordered. Expect a delivery delay of about 3 weeks.
Installation should last about one week.
Provisional restart date is Thursday Oct 1.
Server outage (20150824)
Main server is DEAD (hardware failure). No processing, reception, or data will be available until replacement.
(20150825) An offer has been requested. Order will be placed ASAP. Expect a delay of 4 to 5 weeks between order and reception.
Server outage update (20150823)
Main server will be offline for investigations (as well as all other local services) Monday Aug 24th from 08:00 UTC until further notice.
Due to another crash, the date has been brought forward.
Server crash (20150820)
Main server crashed again. NFS daemon generates a "general protection fault" leading to a kernel panic.
In order to investigate the problem a maintenance shutdown will be performed next week.
SO2 alerts (20150504) Update
Mailing service has now resumed.
Outage (20150415) Updated
Main server unexpectedly died (root partition was full).
All processes have recovered now... No apparent loss.
SO2 alerts (20150317) Update
Mailing service has been broken since February (due to a security patch in glibc).
Since the current compiler and library combination cannot compile Brescia successfully for the time being, I have no idea when the service could resume.
Plots are still available on usual webpage.
NPP-CrIS data from 2013 were purged to gain space on storage.
New https (20150125)
Changed server certificates (more secure ones) and removed SSL to keep only TLS.
New design (20150119)
A new design for the website has been implemented. Don't hesitate to send your comments and any bug reports.
Backprocessing to version 20140922 has resumed.
Migration from DVB-S to DVB-S2 is now complete. All reception parameters seem correct.
Only a few PDUs were definitively lost during migration.
New BUFR extractors are available here.
This is a preliminary version to be tested. As the new TWT are on a 110 grid instead of 90, an interpolation is performed to fit in the amp file structure.
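The log does not say how the interpolation is done. As a rough sketch only, assuming a one-dimensional grid with evenly spaced points covering the same range on both sides, a linear resampling from 110 to 90 points could look like this (the function name and the test data are hypothetical, not taken from the extraction tools):

```python
def resample_linear(values, n_out):
    """Linearly resample evenly spaced values onto n_out evenly spaced points.

    Assumption: input and output grids share the same end points.
    """
    n_in = len(values)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two points on each grid")
    out = []
    for i in range(n_out):
        # Fractional position of output point i on the input grid.
        pos = i * (n_in - 1) / (n_out - 1)
        lo = min(int(pos), n_in - 2)  # index of the left neighbour
        frac = pos - lo               # weight of the right neighbour
        out.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return out


# Hypothetical profile on a 110-point grid, reduced to 90 points.
profile_110 = [float(k) for k in range(110)]
profile_90 = resample_linear(profile_110, 90)
print(len(profile_90), profile_90[0], profile_90[-1])
```

Linear interpolation keeps the end points and cannot overshoot the input values; whether that is adequate depends on how smooth the TWT fields are, which this log does not state.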
New selection tools are available here for FORLI.
Preliminary COX select tool is available here. Functionalities are almost identical to usual select tools.
Data skimming <sticky>
New plot selection criteria. Based upon statistical analysis of the residuals, recommended values are used to avoid partly cloudy scenes:
|CO||-0.15/0.25 10^-9||2.7 10^-9|
|HNO3||-0.60/0.40 10^-9||3.0 10^-8|
|O3||-0.75/1.25 10^-9||3.5 10^-8|
|* insufficient statistics.|
Ozone is the most affected, as the standard flags are normally sufficient for CO and HNO3.