[SOLVED] Networking problems 12/12/2012

We're currently experiencing networking problems that are causing packet loss. We're investigating and will keep you updated via this post.

++++

Update 12/12/12 14:11 by Gerben

We're still looking for the root cause. There appears to be a problem with the two connections between our two sites.

++++

Update 12/12/12 14:30 by Dennis

It appears the issues were caused by corrupted forwarding tables on two of our core switches. We failed over the routing engines to the backup switches and have rebooted the affected switches.

So far everything seems to be stable again. We will continue monitoring our network closely.

++++

Update 12/12/12 15:28 by Gerben

We're seeing some packet loss again and are looking into it.

++++

Update 12/12/12 16:59 by Dennis

The packet forwarding process crashed again. We have opened a case with the Juniper Networks JTAC. The errors we see suggest the crashes might be related to a MAC learning bug. We have made a couple of small changes to our broadcast domains to reduce the number of addresses in the local MAC table, in the hope of stabilizing our network.

++++

Update 12/12/12 17:28 by Dennis

The packet forwarding process crashed again. We are still working with JTAC to find the root cause.

++++

Update 12/12/12 19:59 by Dennis

We haven't seen any further crashes. We're still waiting on JTAC to give us an update. In the meantime we've made a couple of configuration changes. It is still too early to tell whether these changes made any difference.

++++

Update 12/12/12 20:16 by Dennis

Another crash. Our modifications didn't make any difference.

++++

Update 12/12/12 21:29 by Dennis

We got feedback from JTAC that this is a known issue in the JunOS release that we're running (null pointer exception in the packet forwarding engine daemon). They recommend we upgrade the JunOS firmware to a version that includes a fix. We will probably have to schedule some emergency maintenance for tonight.

++++

Update 12/12/12 23:38 by Dennis

We have announced an emergency maintenance window to our clients. We will start with maintenance of our core routers in Haarlem between 00:00 and 01:00 CET, with about 15 to 30 minutes of downtime for clients hosted there. Between 01:00 and 02:00 CET we will do the same for our core routers in Amsterdam. We will keep this entry updated with progress.

++++

Update 12/12/13 00:05 by Dennis

Core routers in Haarlem are currently rebooting and should be back within 15 minutes.

++++

Update 12/12/13 00:15 by Dennis

Core routers in Haarlem have been successfully upgraded. Total downtime was 8 minutes. At 01:00 CET we will do the same for our Amsterdam core routers.

++++

Update 12/12/13 01:05 by Dennis

Core routers in Haarlem are stable so far. We are now rebooting the core routers in Amsterdam. They should be back within 10 minutes.

++++

Update 12/12/13 01:14 by Dennis

Core routers in Amsterdam have been successfully upgraded as well. Downtime was 8 minutes.

++++

Update 12/12/13 14:28 by Dennis

We haven't seen any crashes since the upgrade and the network is stable again.
