Chapter 13. Troubleshooting a service network
Typically, you can create a service network without referencing this troubleshooting guide. However, this guide provides some tips for situations when the service network does not perform as expected.
See Section 13.8, “Resolving common problems” if you have encountered a specific issue using the skupper CLI.
A typical troubleshooting workflow is to check all the sites and create debug tar files.
13.1. Checking sites
The skupper command-line interface (CLI) provides a simple way to start troubleshooting Skupper.
Procedure
Check the site status:
$ skupper status --namespace west
Skupper is enabled for namespace "west" in interior mode. It is connected to 2 other sites. It has 1 exposed services.
The output shows:
- A site exists in the specified namespace.
- A link exists to two other sites.
- A service is exposed on the service network and is accessible from this namespace.
Check the service network:
Note: If the output is not what you expected, you might want to check links before proceeding.
The output shows:
- There are 3 sites on the service network, vm-user-c3d98, east and west.
- Details for each site, for example the namespace names.
Check the status of services exposed on the service network (-v is only available on Kubernetes):
The output shows the backend service and the related target of that service.
Note: As part of the output, each site reports the status of the policy system on that cluster.
List the Skupper events for a site:
The output shows sites being linked and a service being exposed on a service network. However, this output is most useful when reporting an issue and is included in the Skupper debug tar file.
List the Kubernetes events for a site:
The output shows events relating to Kubernetes resources.
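When you check several sites in a row, it can help to pull the number of linked sites out of the status message. The following is a minimal sketch, assuming the status message keeps the wording shown in the example earlier in this procedure. The function name connected_site_count is hypothetical and is not part of the skupper CLI.

```shell
# connected_site_count: read "skupper status" output on stdin and print
# the number of other sites this site is connected to.
# Assumption: the message contains "It is connected to N other site(s)".
# Prints nothing if that phrase is not found.
connected_site_count() {
    sed -n 's/.*It is connected to \([0-9][0-9]*\) other site.*/\1/p'
}

# Possible usage (requires a working skupper site):
#   skupper status --namespace west | connected_site_count
```

Reading from standard input keeps the helper independent of how the status text was obtained, so it also works on saved output.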
13.2. Checking links
You must link sites before you can expose services on the service network.
By default, tokens expire after 15 minutes and you can only use a token once. Generate a new token if the link is not connected. You can also generate tokens using the --token-type cert option for permanent reusable tokens.
This section outlines some advanced options for checking links.
Check the link status:
$ skupper link status --namespace east
Links created from this site:
-------------------------------
Link link1 is connected
A link exists from the specified site to another site, meaning a token from another site was applied to the specified site.
Note: Running skupper link status on a connected site produces output only if a token was used to create a link.
If you use this command on a site where you did not create the link, but there is an incoming link to the site:
Check the verbose link status:
The output shows detail about the link, including a timestamp of when the link was created and the associated relative cost of using the link.
The status of the link must be Connected to allow service traffic.
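Because a link must be connected before service traffic can flow, a script can test for that state before continuing. This is a minimal sketch, assuming the non-verbose output format shown above ("Link <name> is connected"). The function name link_connected is hypothetical, and the link name is used as part of a regular expression, so names containing special characters would need escaping.

```shell
# link_connected NAME: read "skupper link status" output on stdin and
# succeed (exit status 0) only if the named link is reported as connected.
# Assumption: a connected link appears as "Link NAME is connected".
link_connected() {
    grep -Eq "^[[:space:]]*Link $1 is connected"
}

# Possible usage (requires a working skupper site):
#   if skupper link status --namespace east | link_connected link1; then
#       echo "link1 is ready for service traffic"
#   fi
```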
13.3. Checking gateways
By default, skupper gateway creates a service type gateway and these gateways run properly after a machine restart.
However, if you create a docker or podman type gateway, check that the container is running after a machine restart. For example:
Check the status of Skupper gateways:
This shows a podman type gateway.
Check that the container is running:
$ podman ps
CONTAINER ID  IMAGE                               COMMAND               CREATED         STATUS             PORTS  NAMES
4e308ef8ee58  quay.io/skupper/skupper-router:1.5  /home/skrouterd/b...  26 seconds ago  Up 27 seconds ago         machine-user
This shows the container running.
Note: To view stopped containers, use podman ps -a or docker ps -a.
Start the container if necessary:
$ podman start machine-user
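The check-then-start steps above can be combined into one helper, for example to run after a machine restart. This is a sketch under stated assumptions: the function name ensure_container_running is hypothetical, the first argument is the container runtime command (podman or docker), and the container name machine-user is taken from the example output above.

```shell
# ensure_container_running RUNTIME NAME
# Start the named container only if it is not already running.
# RUNTIME is the container CLI to call, for example podman or docker.
ensure_container_running() {
    # "ps" without -a lists running containers only; the Go template
    # prints one container name per line.
    if "$1" ps --format '{{.Names}}' | grep -qxF -- "$2"; then
        echo "$2 is already running"
    else
        "$1" start "$2"
    fi
}

# Possible usage:
#   ensure_container_running podman machine-user
```

Passing the runtime as an argument keeps the same helper usable for both docker and podman type gateways.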
13.4. Checking policies
As a developer you might not be aware of the Skupper policy applied to your site. Follow this procedure to explore the policies applied to the site.
Procedure
- Log into a namespace where a Skupper site has been initialized.
Check whether incoming links are permitted:
$ kubectl exec deploy/skupper-service-controller -- get policies incominglink
ALLOWED POLICY ENABLED ERROR                                                   ALLOWED BY
false   true           Policy validation error: incoming links are not allowed
In this example incoming links are not allowed by policy.
Check other policies:
As shown, there are commands to check each policy type by specifying what you want to do, for example, to check if you can expose an nginx deployment:
$ kubectl exec deploy/skupper-service-controller -- get policies expose deployment nginx
ALLOWED POLICY ENABLED ERROR                                                       ALLOWED BY
false   true           Policy validation error: deployment/nginx cannot be exposed
If you allowed an nginx deployment, the same command shows that the resource is allowed and displays the name of the policy CR that enabled it:
$ kubectl exec deploy/skupper-service-controller -- get policies expose deployment nginx
ALLOWED POLICY ENABLED ERROR ALLOWED BY
true    true                 allowedexposedresources
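In a script, the ALLOWED column of these policy checks can be turned into an exit status. The following is a minimal sketch, assuming the output has a header on the first line and the values on the second line, with the ALLOWED value as the first field, as in the examples above. The function name policy_allows is hypothetical.

```shell
# policy_allows: read the output of a "get policies ..." check on stdin
# and succeed (exit status 0) only if the ALLOWED column is "true".
# Assumption: line 1 is the header and line 2 carries the values.
policy_allows() {
    awk 'NR == 2 && $1 == "true" { ok = 1 } END { exit !ok }'
}

# Possible usage (requires access to the cluster):
#   kubectl exec deploy/skupper-service-controller -- \
#       get policies expose deployment nginx | policy_allows \
#       || echo "nginx cannot be exposed from this site"
```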
13.5. Creating a Skupper debug tar file
The debug tar file contains all the logs from the Skupper components for a site and provides detailed information to help debug issues.
Create the debug tar file:
$ skupper debug dump my-site
Skupper dump details written to compressed archive: `my-site.tar.gz`
You can expand the file using the following command:
These files can be used when requesting support for Skupper. However, there are some items you can check yourself:
- versions: See *versions.txt for the versions of various components.
- ingress: See skupper-site-configmap.yaml to determine the ingress type for the site.
- linking and services: See the skupper-service-controller-*-events.txt file to view details of token usage and service exposure.
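The archive is a gzip-compressed tar file, so standard tar options apply: tar -xzf my-site.tar.gz expands it into the current directory, and tar -tzf lists its contents without extracting. As a sketch, the helper below locates the versions files inside an archive. The function name list_versions_files is hypothetical, and the directory layout inside the archive is not assumed.

```shell
# list_versions_files ARCHIVE
# Print the path of every file in the debug archive whose name ends in
# "versions.txt", without extracting the archive.
list_versions_files() {
    tar -tzf "$1" | grep 'versions\.txt$'
}

# Possible usage:
#   list_versions_files my-site.tar.gz
```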
13.6. Understanding Skupper sizing
In September 2023, a number of tests were performed to explore Skupper performance at varying allocations of router CPU. You can view the results in the sizing guide.
The conclusions for router CPU and memory are shown below.
Router CPU
The primary factor to consider when scaling Skupper for your workload is router CPU. (Note that due to the nature of cluster ingress and connection routing, it is important to focus on scaling the router vertically, not horizontally.)
Two CPU cores (2,000 millicores) per router is a good starting point. It includes some headroom and provides low latencies for a large set of workloads.
If the peak throughput required by your workload is low, it is possible to achieve satisfactory latencies with less router CPU.
Some workloads are sensitive to network latency. In these cases, the overhead introduced by the router can limit the achievable throughput. This is when CPU amounts higher than two cores per router may be required.
On the flip side, some workloads are tolerant of network latency. In these cases, one core or less may be sufficient.
These benchmark results are not the last word. They depend on the specifics of our test environment. To get a better idea of how Skupper performs in your environment, you can run these benchmarks yourself.
Router memory
Router memory use scales with the number of open connections. In general, a good starting point is 4G.
| Memory | Concurrent open connections |
| 512M | 8,192 |
| 1G | 16,384 |
| 2G | 32,768 |
| 4G | 65,536 |
| 8G | 131,072 |
| 16G | 262,144 |
| 32G | 524,288 |
| 64G | 1,048,576 |
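The rows in the table follow a simple pattern: each doubling of memory doubles the connection count, which works out to 16 concurrent connections per MiB (8,192 / 512). The following sketch encodes that pattern for quick estimates. It is an extrapolation from the table rather than a documented formula, the function names are hypothetical, and values outside the 512M to 64G range of the table are untested guesses.

```shell
# connections_for_mib MIB
# Estimate concurrent open connections for a router memory size in MiB,
# using the 16-connections-per-MiB ratio inferred from the table.
connections_for_mib() {
    echo $(( $1 * 16 ))
}

# mib_for_connections COUNT
# Print the smallest power-of-two memory size in MiB, starting at the
# table's first row (512), that covers COUNT concurrent connections.
mib_for_connections() {
    mib=512
    while [ $(( mib * 16 )) -lt "$1" ]; do
        mib=$(( mib * 2 ))
    done
    echo "$mib"
}

# Possible usage:
#   connections_for_mib 4096     # the 4G starting point
#   mib_for_connections 50000    # memory row covering 50,000 connections
```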
13.7. Improving Skupper router performance
If you encounter Skupper router performance issues, you can scale the Skupper router to address those concerns.
Currently, you must delete and recreate a site to reconfigure the Skupper router.
For example, use this procedure to increase throughput and, if you have many clients, to improve latency.
Delete your site or create a new site in a different namespace.
Note all configuration and delete your existing site:
$ skupper delete
As an alternative, you can create a new namespace and configure a new site with optimized Skupper router performance. After validating the performance improvement, you can delete and recreate your original site.
Create a site with optimal performance CPU settings:
$ skupper init --router-cpu 5
- Recreate your configuration from step 1, recreating links and services.
While you can address availability concerns by scaling the number of routers, typically this is not necessary.
13.8. Resolving common problems
The following issues and workarounds might help you debug simple scenarios when evaluating Skupper.
Cannot initialize skupper
If the skupper init command fails, consider the following options:
Check the load balancer.
If you are evaluating Skupper on minikube, use the following command to create a load balancer:
$ minikube tunnel
For other Kubernetes flavors, see the documentation from your provider.
Initialize without ingress.
This option prevents other sites from linking to this site, but linking outwards is supported. Once a link is established, traffic can flow in either direction. Enter the following command:
$ skupper init --ingress none
Note: See the Skupper Podman CLI reference documentation for skupper init.
Cannot link sites
To link two sites, one site must be accessible from the other site. For example, if one site is behind a firewall and the other site is on an AWS cluster, you must:
- Create a token on the AWS cluster site.
- Create the link on the site inside the firewall.
By default, a token is valid for only 15 minutes and can only be used once. See Using Skupper tokens for more information on creating different types of tokens.
Cannot access Skupper console
Starting with Skupper release 1.3, the console is not enabled by default. To use the new console, see Using the console.
Use skupper status to find the console URL.
Use the following command to display the password for the admin user:
$ kubectl get secret/skupper-console-users -o jsonpath={.data.admin} | base64 -d
Cannot create a token for linking clusters
There are several reasons why you might have difficulty creating tokens:
- Site not ready
After creating a site, you might see the following message when creating a token:
Error: Failed to create token: Policy validation error: Skupper is not enabled in namespace
Use skupper status to verify the site is working and try to create the token again.
- No ingress
You might see the following note after using the skupper token create command:
Token written to <path> (Note: token will only be valid for local cluster)
This output indicates that the site was deployed without an ingress option, for example skupper init --ingress none. You must specify an ingress to allow sites on other clusters to link to your site.
You can also use the skupper token create command to check if an ingress was specified when the site was created.
13.9. Deleting services from the service network
This section describes how services can be disabled for a service network.
Prerequisites
- A service network
- An exposed service
Procedure
- Navigate to the context where the service was exposed.
Delete the service:
$ skupper service delete <service-name>
where service-name is the name of the service you want to remove.
Note: TCP connections can remain active for an extended duration. After deleting the service, existing connections continue to communicate until the TCP connection is terminated. For example, if you exposed a database over the service network and then delete the service, new database connections cannot be established. However, deleting the service does not affect existing connections. To terminate an existing connection, manually stop the database connection.
Check that the service is removed.
$ skupper service status
The service should not be listed.
Note: Typically, if the service is still listed, it is because you issued the delete command from the wrong site context. By default, when you expose a service from a site, that service becomes available on all other sites. However, you can delete the service only from the original site context.