Chapter 16. Troubleshooting a service network
Typically, you can create a service network without referencing this troubleshooting guide. However, this guide provides some tips for situations when the service network does not perform as expected.
See Section 16.8, “Resolving common problems” if you have encountered a specific issue using the skupper
CLI.
A typical troubleshooting workflow is to check all the sites and create debug tar files.
16.1. Checking sites
Using the skupper
command-line interface (CLI) provides a simple method to get started with troubleshooting Skupper.
Procedure
Check the site status:
$ skupper status --namespace west Skupper is enabled for namespace "west" in interior mode. It is connected to 2 other sites. It has 1 exposed services.
The output shows:
- A site exists in the specified namespace.
- A link exists to two other sites.
- A service is exposed on the service network and is accessible from this namespace.
Check the service network:
$ skupper network status Sites: ├─ [local] a960b766-20bd-42c8-886d-741f3a9f6aa2(west) │ │ namespace: west │ │ site name: west │ │ version: 1.8.1 │ ╰─ Linked sites: │ ├─ 496ca1de-0c80-4e70-bbb4-d0d6ec2a09c0(east) │ │ direction: outgoing │ ╰─ 484cccc3-401c-4c30-a6ed-73382701b18a() │ direction: incoming ├─ [remote] 496ca1de-0c80-4e70-bbb4-d0d6ec2a09c0(east) │ │ namespace: east │ │ site name: east │ │ version: 1.8.1 │ ╰─ Linked sites: │ ╰─ a960b766-20bd-42c8-886d-741f3a9f6aa2(west) │ direction: incoming ╰─ [remote] 484cccc3-401c-4c30-a6ed-73382701b18a() │ site name: vm-user-c3d98 │ version: 1.8.1 ╰─ Linked sites: ╰─ a960b766-20bd-42c8-886d-741f3a9f6aa2(west) direction: outgoing
NoteIf the output is not what you expected, you might want to check links before proceeding.
The output shows:
-
There are 3 sites on the service network,
vm-user-c3d98
,east
andwest
. - Details for each site, for example the namespace names.
-
There are 3 sites on the service network,
Check the status of services exposed on the service network (
-v
is only available on Kubernetes):$ skupper service status -v Services exposed through Skupper: ╰─ backend:8080 (tcp) ╰─ Sites: ├─ 4d80f485-52fb-4d84-b10b-326b96e723b2(west) │ policy: disabled ╰─ 316fbe31-299b-490b-9391-7b46507d76f1(east) │ policy: disabled ╰─ Targets: ╰─ backend:8080 name=backend-9d84544df-rbzjx
The output shows the
backend
service and the related target of that service.NoteAs part of output each site reports the status of the policy system on that cluster.
List the Skupper events for a site:
$ skupper debug events NAME COUNT AGE GatewayQueryRequest 3 9m12s 3 gateway request 9m12s SiteQueryRequest 3 9m12s 3 site data request 9m12s ServiceControllerEvent 9 10m24s 2 service event for west/frontend 10m24s 1 service event for west/backend 10m26s 1 Checking service for: backend 10m26s 2 Service definitions have changed 10m26s 1 service event for west/skupper-router 11m4s DefinitionMonitorEvent 15 10m24s 2 service event for west/frontend 10m24s 1 service event for west/backend 10m26s 1 Service definitions have changed 10m26s 5 deployment event for west/frontend 10m34s 1 deployment event for west/skupper-service-controller 11m4s ServiceControllerUpdateEvent 1 10m26s 1 Updating skupper-internal 10m26s ServiceSyncEvent 3 10m26s 1 Service interface(s) added backend 10m26s 1 Service sync sender connection to 11m4s amqps://skupper-router-local.west.svc.cluster.local:5671 established 1 Service sync receiver connection to 11m4s amqps://skupper-router-local.west.svc.cluster.local:5671 established IpMappingEvent 5 10m34s 1 172.17.0.7 mapped to frontend-6b4688bf56-rp9hc 10m34s 2 mapped to frontend-6b4688bf56-rp9hc 10m54s 1 172.17.0.4 mapped to 11m4s skupper-service-controller-6c97c5cf5d-6nzph 1 172.17.0.3 mapped to skupper-router-547dffdcbf-l8pdc 11m4s TokenClaimVerification 1 10m59s 1 Claim for efe3a241-3e4f-11ed-95d0-482ae336eb38 succeeded 10m59s
The output shows sites being linked and a service being exposed on a service network. However, this output is most useful when reporting an issue and is included in the Skupper debug tar file.
List the Kubernetes events for a site:
kubectl get events | grep "deployment/skupper-service-controller" 10m Normal ServiceSyncEvent deployment/skupper-service-controller Service sync receiver connection to amqps://skupper-router-local.private1.svc.cluster.local:5671 established 10m Normal ServiceSyncEvent deployment/skupper-service-controller Service sync sender connection to amqps://skupper-router-local.private1.svc.cluster.local:5671 established 10m Normal ServiceControllerCreateEvent deployment/skupper-service-controller Creating service productcatalogservice 7m59s Normal TokenHandler deployment/skupper-service-controller Connecting using token link1 7m54s Normal TokenHandler deployment/skupper-service-controller Connecting using token link2
The output shows events relating to Kubernetes resources.
Additional information
16.2. Checking links
You must link sites before you can expose services on the service network.
By default, tokens expire after 5 minutes and you can only use a token once. Generate a new token if the link is not connected. You can also generate tokens using the -token-type cert
option for permanent reusable tokens.
This section outlines some advanced options for checking links.
Check the link status:
$ skupper link status --namespace east Links created from this site: ------------------------------- Link link1 is connected
A link exists from the specified site to another site, meaning a token from another site was applied to the specified site.
NoteRunning
skupper link status
on a connected site produces output only if a token was used to create a link.If you use this command on a site where you did not create the link, but there is an incoming link to the site:
$ skupper link status --namespace west Links created from this site: ------------------------------- There are no links configured or connected Currently connected links from other sites: ---------------------------------------- A link from the namespace east on site east(536695a9-26dc-4448-b207-519f56e99b71) is connected
Check the verbose link status:
$ skupper link status link1 --verbose --namespace east Cost: 1 Created: 2022-10-24 12:50:33 +0100 IST Name: link1 Namespace: east Site: east-536695a9-26dc-4448-b207-519f56e99b71 Status: Connected
The output shows detail about the link, including a timestamp of when the link was created and the associated relative cost of using the link.
The status of the link must be
Connected
to allow service traffic.
Additional information
16.3. Checking gateways
By default, skupper gateway
creates a service type gateway and these gateways run properly after a machine restart.
However, if you create a docker or podman type gateway, check that the container is running after a machine restart. For example:
Check the status of Skupper gateways:
$ skupper gateway status Gateway Definition: ╰─ machine-user type:podman version:1.8 ╰─ Bindings: ╰─ mydb:3306 tcp mydb:3306 localhost 3306
This shows a podman type gateway.
Check that the container is running:
$ podman ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 4e308ef8ee58 quay.io/skupper/skupper-router:1.8 /home/skrouterd/b... 26 seconds ago Up 27 seconds ago machine-user
This shows the container running.
NoteTo view stopped containers, use
podman ps -a
ordocker ps -a
.Start the container if necessary:
$ podman start machine-user
16.4. Checking policies
As a developer you might not be aware of the Skupper policy applied to your site. Follow this procedure to explore the policies applied to the site.
Procedure
- Log into a namespace where a Skupper site has been initialized.
Check whether incoming links are permitted:
$ kubectl exec deploy/skupper-service-controller -- get policies incominglink ALLOWED POLICY ENABLED ERROR ALLOWED BY false true Policy validation error: incoming links are not allowed
In this example incoming links are not allowed by policy.
Check other policies:
$ kubectl exec deploy/skupper-service-controller -- get policies Validates existing policies Usage: get policies [command] Available Commands: expose Validates if the given resource can be exposed incominglink Validates if incoming links can be created outgoinglink Validates if an outgoing link to the given hostname is allowed service Validates if service can be created or imported
As shown, there are commands to check each policy type by specifying what you want to do, for example, to check if you can expose an nginx deployment:
$ kubectl exec deploy/skupper-service-controller -- get policies expose deployment nginx ALLOWED POLICY ENABLED ERROR ALLOWED BY false true Policy validation error: deployment/nginx cannot be exposed
If you allowed an nginx deployment, the same command shows that the resource is allowed and displays the name of the policy CR that enabled it:
$ kubectl exec deploy/skupper-service-controller -- get policies expose deployment nginx ALLOWED POLICY ENABLED ERROR ALLOWED BY true true allowedexposedresources
16.5. Creating a Skupper debug tar file
The debug tar file contains all the logs from the Skupper components for a site and provides detailed information to help debug issues.
Create the debug tar file:
$ skupper debug dump my-site Skupper dump details written to compressed archive: `my-site.tar.gz`
You can expand the file using the following command:
$ tar -xvf kind-site.tar.gz k8s-versions.txt skupper-versions.txt skupper-router-deployment.yaml skupper-router-867f5ddcd8-plrcg-skstat-g.txt skupper-router-867f5ddcd8-plrcg-skstat-c.txt skupper-router-867f5ddcd8-plrcg-skstat-l.txt skupper-router-867f5ddcd8-plrcg-skstat-n.txt skupper-router-867f5ddcd8-plrcg-skstat-e.txt skupper-router-867f5ddcd8-plrcg-skstat-a.txt skupper-router-867f5ddcd8-plrcg-skstat-m.txt skupper-router-867f5ddcd8-plrcg-skstat-p.txt skupper-router-867f5ddcd8-plrcg-router-logs.txt skupper-router-867f5ddcd8-plrcg-config-sync-logs.txt skupper-service-controller-deployment.yaml skupper-service-controller-7485756984-gvrf6-events.txt skupper-service-controller-7485756984-gvrf6-service-controller-logs.txt skupper-site-configmap.yaml skupper-services-configmap.yaml skupper-internal-configmap.yaml skupper-sasl-config-configmap.yaml
These files can be used to provide support for Skupper, however some items you can check:
- versions
-
See
*versions.txt
for the versions of various components. - ingress
-
See
skupper-site-configmap.yaml
to determine theingress
type for the site. - linking and services
-
See the
skupper-service-controller-*-events.txt
file to view details of token usage and service exposure.
16.6. Understanding Skupper sizing
In September 2023, a number of tests were performed to explore Skupper performance at varying allocations of router CPU. You can view the results in the sizing guide.
The conclusions for router CPU and memory are shown below.
Router CPU
The primary factor to consider when scaling Skupper for your workload is router CPU. (Note that due to the nature of cluster ingress and connection routing, it is important to focus on scaling the router vertically, not horizontally.)
Two CPU cores (2,000 millicores) per router is a good starting point. It includes some headroom and provides low latencies for a large set of workloads.
If the peak throughput required by your workload is low, it is possible to achieve satisfactory latencies with less router CPU.
Some workloads are sensitive to network latency. In these cases, the overhead introduced by the router can limit the achievable throughput. This is when CPU amounts higher than two cores per router may be required.
On the flip side, some workloads are tolerant of network latency. In these cases, one core or less may be sufficient.
These benchmark results are not the last word. They depend on the specifics of our test environment. To get a better idea of how Skupper performs in your environment, you can run these benchmarks yourself.
Router memory
Router memory use scales with the number of open connections. In general, a good starting point is 4G.
Memory | Concurrent open connections | |
512M | 8,192 | |
1G | 16,384 | |
2G | 32,768 | |
4G | 65,536 | |
8G | 131,072 | |
16G | 262,144 | |
32G | 524,288 | |
64G | 104,8576 |
16.7. Improving Skupper router performance
If you encounter Skupper router performance issues, you can scale the Skupper router to address those concerns.
Currently, you must delete and recreate a site to reconfigure the Skupper router.
For example, use this procedure to increase throughput, and if you have many clients, latency.
Delete your site or create a new site in a different namespace.
Note all configuration and delete your existing site:
$ skupper delete
As an alternative, you can create a new namespace and configure a new site with optimized Skupper router performance. After validating the performance improvement, you can delete and recreate your original site.
Create a site with optimal performance CPU settings:
$ skupper init --router-cpu 5
- Recreate your configuration from step 1, recreating links and services.
While you can address availability concerns by scaling the number of routers, typically this is not necessary.
16.8. Resolving common problems
The following issues and workarounds might help you debug simple scenarios when evaluating Skupper.
Cannot initialize skupper
If the skupper init
command fails, consider the following options:
Check the load balancer.
If you are evaluating Skupper on minikube, use the following command to create a load balancer:
$ minikube tunnel
For other Kubernetes flavors, see the documentation from your provider.
Initialize without ingress.
This option prevents other sites from linking to this site, but linking outwards is supported. Once a link is established, traffic can flow in either direction. Enter the following command:
$ skupper init --ingress none
NoteSee the Skupper Podman CLI reference documentation for
skupper init
.
Cannot link sites
To link two sites, one site must be accessible from the other site. For example, if one site is behind a firewall and the other site is on an AWS cluster, you must:
- Create a token on the AWS cluster site.
- Create the link on the site inside the firewall.
By default, a token is valid for only 15 minutes and can only be used once. See Using Skupper tokens for more information on creating different types of tokens.
Cannot access Skupper console
Starting with Skupper release 1.3, the console is not enabled by default. To use the new console, see Using the console.
Use skupper status
to find the console URL.
Use the following command to display the password for the admin
user:doctype: article
$ kubectl get secret/skupper-console-users -o jsonpath={.data.admin} | base64 -d
Cannot create a token for linking clusters
There are several reasons why you might have difficulty creating tokens:
- Site not ready
After creating a site, you might see the following message when creating a token:
Error: Failed to create token: Policy validation error: Skupper is not enabled in namespace
Use
skupper status
to verify the site is working and try to create the token again.- No ingress
You might see the following note after using the
skupper token create
command:Token written to <path> (Note: token will only be valid for local cluster)
This output indicates that the site was deployed without an ingress option. For example
skupper init --ingress none
. You must specify an ingress to allow sites on other clusters to link to your site.You can also use the
skupper token create
command to check if an ingress was specified when the site was created.