What is Searx?

Searx is a search engine that lets you search multiple engines at once (also known as a metasearch engine) and removes personal information from your queries. Searx does a lot of other cool things under the hood that you can read about here. I am writing this guide because there are tons of old, outdated, and just plain wrong articles about searx. To further muddy the waters, there are now two very similar projects: searx and its fork, searxng. This was initially supposed to be my personal guide on how to set searx up in case I ever needed to rebuild it, but I figured it would be helpful to others.

With that being said, this guide will show you how to set up searx on an EC2 instance and use security groups to make it accessible from only one IP address.

Creating an EC2 Instance

Creating an AWS EC2 instance is a great (and cheap) way to host a server without having to deal with other server hosting shenanigans. To create one for free, create an AWS account and head over to the EC2 console. Click “Launch instances” and select “Ubuntu Server 20.04 LTS”.

Fun fact: Ubuntu releases numbered YY.04 from even-numbered years (such as 20.04 and 22.04) are the long-term support (LTS) versions, which are updated and maintained for 5 years.

Anyway, click “Next” and choose the “Free Tier Eligible” t2.micro instance. Click “Review and launch”, then “Launch”. You will then be prompted to create a key pair. A key pair lets you authenticate to a server with a “key file” instead of a password. This is generally more secure because key files are much longer than passwords, and the server can only be accessed by someone in possession of that file. More info on SSH keys here. Select “Create a new key pair” and give it a name (I used searxEc2), then select “Download Key Pair”.

The next steps assume you’re using a Linux host.

  1. Copy the key from ~/Downloads/ to ~/.ssh
cp ~/Downloads/searxEc2.pem ~/.ssh/
  2. Change the key’s permissions so only you can read it (SSH refuses to use a private key that other users can read)
chmod 600 ~/.ssh/searxEc2.pem
  3. SSH into the server using your key file (note: the default user on an Ubuntu instance is “ubuntu”)
ssh -i ~/.ssh/searxEc2.pem ubuntu@your.instance.ip
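Typing the -i flag on every connection gets old. As an optional convenience, here is a minimal bash sketch that appends a host alias to your OpenSSH client config (~/.ssh/config), assuming the key path used in the steps above. The function name add_searx_ssh_host and the alias searx are hypothetical; HostName, User, IdentityFile, and IdentitiesOnly are standard ssh_config options.

```shell
# Hypothetical helper: add a "searx" alias to your OpenSSH client config
# so that `ssh searx` replaces `ssh -i ~/.ssh/searxEc2.pem ubuntu@<ip>`.
add_searx_ssh_host() {
  local ip="$1"                                 # your instance's public IP
  local key="${2:-$HOME/.ssh/searxEc2.pem}"     # key path from the steps above
  local config="$HOME/.ssh/config"

  mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
  # Append (never overwrite) a Host block using standard ssh_config options
  cat >> "$config" <<EOF

Host searx
    HostName $ip
    User ubuntu
    IdentityFile $key
    IdentitiesOnly yes
EOF
  chmod 600 "$config"
}

# Example (run once): add_searx_ssh_host your.instance.ip
# Afterwards:         ssh searx
```

Appending keeps any existing entries intact; if you run it twice you will get a duplicate block, which you can delete by hand.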

Setting up Security groups

Security groups are essentially a firewall: they let you allow traffic only to/from certain ports and limit which IPs can access your EC2 instance. Navigate back to the EC2 dashboard and click “Security Groups” under Network & Security. Click “Create security group”, give it a name (searx), and add the following ports.

Make sure you are editing the INBOUND RULES, not the outbound rules.

  • SSH: 22
  • HTTP: 80
  • HTTPS: 443
  • Web: 8080
  • Filtron Proxy: 4040
  • Filtron API: 4041

Under “Source”, enter your public IPv4 address followed by /32 as the CIDR suffix (a /32 covers just a single host). If you don’t know your public IP, there is a very handy website for that. Every “Source” field should look something like this (but with your public IP): 123.123.123.123/32
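If you’d rather check from the terminal, here is a minimal bash sketch that validates an IPv4 address and appends the /32 suffix. The function name to_cidr32 is hypothetical; checkip.amazonaws.com, used in the example comment, is an AWS endpoint that returns the caller’s public IP as plain text.

```shell
# Hypothetical helper: turn one IPv4 address into a /32 CIDR for the
# security group "Source" field. Prints nothing and fails on bad input.
to_cidr32() {
  local ip="$1" i
  # Require exactly four dot-separated groups of 1-3 digits
  [[ "$ip" =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
  for i in 1 2 3 4; do
    # 10# forces base 10 so an octet like "08" is not parsed as octal
    (( 10#${BASH_REMATCH[i]} <= 255 )) || return 1
  done
  printf '%s/32\n' "$ip"
}

# Example: to_cidr32 "$(curl -s https://checkip.amazonaws.com)"
```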

Save the rules and click Instances. Right-click your instance -> Security -> Change security groups -> select your searx group -> click “Add security group” -> Save

Make sure you click “Add security group” before hitting Save, or your changes will not be applied.
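The console steps above can also be scripted with the AWS CLI. This is a minimal bash sketch, assuming the CLI is installed and configured (aws configure) and that the security group already exists. add_searx_ingress_rules is a hypothetical name, while aws ec2 authorize-security-group-ingress and its --group-id, --protocol, --port, and --cidr options are part of the real CLI.

```shell
# Hypothetical helper: open the inbound ports listed above on an existing
# security group, each restricted to one source CIDR (e.g. 123.123.123.123/32).
add_searx_ingress_rules() {
  local sg_id="$1" cidr="$2" port
  # SSH, HTTP, HTTPS, web, Filtron proxy, Filtron API
  for port in 22 80 443 8080 4040 4041; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$sg_id" \
      --protocol tcp \
      --port "$port" \
      --cidr "$cidr" || return 1
  done
}

# Example (the group ID is a placeholder):
#   add_searx_ingress_rules sg-0123456789abcdef0 123.123.123.123/32
```

To attach the group from the CLI, aws ec2 modify-instance-attribute --instance-id <id> --groups <group ids> works, but --groups replaces the instance’s entire list of security groups, so list every group the instance should keep. This is the CLI equivalent of forgetting to click “Add security group”.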

Great, now reconnect to your EC2 instance with ssh -i ~/.ssh/searxEc2.pem ubuntu@your.instance.ip and verify you can still connect.

Bonus: Verify Ports With Nmap

You can easily verify that your security groups worked by running an nmap scan on your instance IP.

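This command scans all 65,535 TCP ports (-p-) on your instance using nmap’s faster “aggressive” timing template (-T4). It will take some time, but when it is done the only ports that respond should be the ones you allowed above: ports with a service listening show as open, allowed ports with nothing listening show as closed, and everything the security group blocks is reported as filtered.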

➜ $ nmap -T4 <your_instance_ip> -p-
Starting Nmap 7.80 ( https://nmap.org ) at 1970-01-01 00:00 EST
Nmap scan report for <your_instance> (your_instance_ip)
Host is up (0.015s latency).
Not shown: 9996 filtered ports
PORT     STATE  SERVICE
22/tcp   open   ssh
80/tcp   open   http
443/tcp  open   https
4041/tcp closed houston
8080/tcp closed http-proxy

Nmap done: 1 IP address (1 host up) scanned in 420.69 seconds
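Reading a long nmap report by eye is error-prone. Here is a minimal bash/awk sketch that reads nmap’s normal output and prints any port that answered (open or closed) but is not in the allowed list above; empty output means only allowed ports responded. The function name unexpected_ports is hypothetical, and it assumes nmap’s default “PORT STATE SERVICE” table layout.

```shell
# Hypothetical helper: read nmap's normal output on stdin and print every
# TCP port that answered (open or closed) but is not in the allowed list.
unexpected_ports() {
  local allowed="${1:-22 80 443 8080 4040 4041}"
  awk -v allowed="$allowed" '
    BEGIN { n = split(allowed, a, " "); for (i = 1; i <= n; i++) ok[a[i]] = 1 }
    # Port lines look like "22/tcp   open   ssh"; filtered ports are skipped
    $1 ~ /^[0-9]+\/tcp$/ && ($2 == "open" || $2 == "closed") {
      split($1, p, "/")
      if (!(p[1] in ok)) print p[1], $2
    }'
}

# Example: nmap -T4 -p- your.instance.ip | unexpected_ports
```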

Pointing Your Domain to Searx

You’re going to want to point your domain name to your EC2 instance’s public IP address early on in this process, or you will only be able to access your searx instance via IP, and that’s boring! To do this, head over to the provider where you bought your domain (Route 53, Google Domains, Namecheap, etc.) and find the DNS settings where you can add your own records. You only need to add one A record: the host will be @ and the value will be the IP address of your EC2 instance. The @ denotes the current origin (your bare domain, e.g. example.com); more info on that here.

It is important to note that DNS changes can take a long time to propagate (up to 24 hours). If you want to verify your DNS settings have propagated, run dig <your domain name>. If the IP address of your EC2 instance appears as the A record in the answer section, your DNS records have propagated to the resolver used by the machine you ran dig on (for example, your EC2 instance).
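To turn that check into a yes/no answer, here is a minimal bash sketch around dig’s real +short option, which prints only the answer data, one record per line. The function name dns_points_to is hypothetical; the IP in the example comment is a placeholder.

```shell
# Hypothetical helper: succeed only if the domain's A records, as seen by
# this machine's resolver, include the expected IP.
dns_points_to() {
  local domain="$1" expected_ip="$2"
  # +short prints one record per line (a CNAME target may come first);
  # grep -Fx matches the IP as a whole line, not as a substring
  dig +short "$domain" A | grep -Fxq -- "$expected_ip"
}

# Example: dns_points_to example.com 203.0.113.10 && echo propagated || echo "not yet"
```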

Installing docker and searx

Now that you’ve pointed your domain to your EC2 instance and verified you can connect via SSH, you are ready to set up Docker and searx. SSH into your instance using ssh -i ~/.ssh/searxEc2.pem ubuntu@your.instance.ip.

  1. Install Docker and docker-compose (on Ubuntu, the Docker Engine package is docker.io; the package named docker is an unrelated system tray application)
sudo apt update && sudo apt install -y docker.io docker-compose
  2. Elevate to root
sudo su
  3. Change to the /usr/local/ directory
cd /usr/local/
  4. Clone the searx-docker repository
git clone https://github.com/searx/searx-docker.git
  5. Change into the directory you just cloned
cd searx-docker
  6. Use sed to replace the placeholder string “ReplaceWithARealKey!” in the .env file with a random key (33 random bytes from openssl, base64-encoded into 44 characters)
sed -i "s|ReplaceWithARealKey\!|$(openssl rand -base64 33)|g" .env
  7. Edit the .env file: insert your domain name (example.com) after the SEARX_HOSTNAME= variable and add a valid email address to the end of the LETSENCRYPT_EMAIL= variable. This is required for getting a valid Let’s Encrypt certificate.

  8. Run the startup script
./start.sh
  9. Create a copy of the service template for systemd to use
cp searx-docker.service.template searx-docker.service
  10. Enable the service so the searx containers start on system boot
systemctl enable $(pwd)/searx-docker.service
  11. Start the searx containers right now using systemd
systemctl start searx-docker.service
  12. Reboot just to be safe
reboot
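The .env edit in the list above is the only step without a command. If you prefer not to open an editor, a minimal bash sketch using sed could handle it. It assumes the .env file already contains lines starting with SEARX_HOSTNAME= and LETSENCRYPT_EMAIL= and that you run it in /usr/local/searx-docker before ./start.sh; the function name set_env_var is hypothetical.

```shell
# Hypothetical helper: set KEY=VALUE for a key that already exists in an
# env file, replacing whatever currently follows the "=".
set_env_var() {
  local key="$1" value="$2" file="${3:-.env}" escaped
  grep -q "^${key}=" "$file" || return 1   # only touch keys that already exist
  # Escape characters that are special in a sed replacement here: \ & |
  escaped="$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')"
  sed -i "s|^${key}=.*|${key}=${escaped}|" "$file"
}

# Example:
#   set_env_var SEARX_HOSTNAME example.com
#   set_env_var LETSENCRYPT_EMAIL you@example.com
```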

That’s it! You should now be able to navigate to your domain name and be greeted with the Searx page!
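To check from the command line instead of a browser, here is a minimal bash sketch that prints the HTTP status code your domain returns. searx_status is a hypothetical name; the curl options are standard. A 200 suggests searx is being served over HTTPS, while 000 means no HTTP response came back, for example because DNS has not propagated yet or the Let’s Encrypt certificate is not ready.

```shell
# Hypothetical helper: print the HTTP status code returned by your searx domain.
searx_status() {
  local domain="$1"
  # -s: silent, -o /dev/null: discard the page body,
  # -w '%{http_code}': print only the status code (000 if no response),
  # --max-time 10: give up instead of hanging
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://${domain}/"
}

# Example: searx_status example.com
```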

Bonus: Adding searx to Flame

This probably does not pertain to most people, but I figured I would showcase adding searx to Flame. Flame is a self-hostable start page that lets you use a custom search engine for its search bar.

To tell Flame which search engine to use, you need the “search URL” of your searx instance. To get it, head to the “preferences” tab of your searx instance and scroll down to the Search URL. Copy this long string of text, making sure you don’t include anything after the “=” at the end of the string: Flame appends your search terms after that “=”.

Now go to Flame’s search settings, click “Add new search provider”, and paste the long string you copied into the “Query Template” field. (Reminder: the search string should end with an “=”.)
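If you copied a URL that still has a test query after the final “=”, a minimal bash sketch can trim it using plain parameter expansion. The function name to_query_template and the example URL are hypothetical; the exact shape of your instance’s Search URL may differ.

```shell
# Hypothetical helper: keep a search URL up to and including its last "=",
# dropping whatever query text follows, for Flame's "Query Template" field.
to_query_template() {
  local url="$1"
  case "$url" in
    # ${url%=*} strips the last "=" and everything after it; add the "=" back
    *=*) printf '%s=\n' "${url%=*}" ;;
    *)   return 1 ;;   # no "=" at all, so there is nothing to trim to
  esac
}

# Example (hypothetical URL):
#   to_query_template 'https://searx.example.com/search?preferences=abc&q=test'
```

A URL that already ends in “=” comes back unchanged, so running it on an already-correct string is harmless.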

Let’s Have a Chat

Is Searx Actually Private?

To answer this, you need to understand the difference between a public and a private instance. You have two options when it comes to using searx.

Use a public instance

A public instance is an instance of searx hosted by another company or person. It still runs searx, but you don’t know what is happening to your requests. Only use one if you trust the person/company hosting it.

Pros of using a public instance:

  • You get to use searx without having to self-host it.
  • You are not the only one using that instance, so your searches are not directly tied back to your device/IP.

Cons of using a public instance:

  • You don’t know what’s going on in the backend; the admins could be doing anything they want.
  • They may go down or be slow.

Use a private instance

A private instance is a self-hosted one, like the one I created above.

Pros of using a private instance:

  • You know exactly what is going on in the backend and know your data is not being collected, watched, or sold.
  • You have total control over all preferences, availability, etc.

Cons of using a private instance:

  • Searches performed on your instance are tied to your EC2 instance’s IP; if you’re the only one using it, then it was obviously you who issued the search.

So which one is better? Imagine searx is a room with a one-way mirror and a $20 bill sitting in the middle of the room.

A public instance would be akin to 100 people walking into the room and one of them stealing the $20. Lots of people went in, so it’s not clear who took it, but if the owner of the instance was sitting behind the one-way mirror watching (i.e., snooping in the server logs), they would know who took it.

A private instance would be like 1 person (you) walking into the room and stealing the $20 bill. You have reasonable assurance no one is watching from behind the mirror because you built the room, but it was obviously you who took the $20. The only way someone would know is if they secretly had access to the room (your EC2 instance).

Personally, I think self-hosting is the better option because I don’t trust anyone but myself to host searx. I’m simply using searx to put less data out there for companies to sell; if you need additional protection, you should use a VPN or Tor to hide your IP. More info can be found here and in various other blogs.

What is the difference between Searx and Searx-NG?

According to what I could find by looking through GitHub issues, searx was developed by two people, asciimoo and dalf. Dalf wanted to add new features such as stats to searx, but asciimoo didn’t want to add them, so dalf forked it into searx-ng. According to the current maintainer of searx:

The difference of vision between the maintainers is regarding how privacy-respecting searx should be. The original guidelines of searx state that features that risk the privacy of users are not welcomed to the repo. Consider this fork of searx as a shinier version with more features that might expose you to others. Searx is going to be kept minimal with fewer features and less danger to your privacy. Also, I do not think that searxng is going to abandon all privacy-protecting features, but that part should be clarified by @dalf.

It does not mean that searx is not maintained anymore. It just has a higher bar when it comes to privacy. If you would like to get privacy without compromising, I suggest you use searx. If you are willing to trade some of the protections searx provides for more features, you should choose this fork.

What is filtron?

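Filtron is a reverse HTTP proxy that filters requests. Essentially, it’s a WAF (web application firewall) designed for searx. In the searx-docker setup above, it listens on port 4040 and serves its API on port 4041 (the two Filtron ports in the security group list).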

What is morty?

Morty is a sanitizer proxy. It rewrites webpages to

  • Remove malicious HTML
  • Rewrite HTML references
  • Block Javascript
  • Remove referrers
  • Remove ETags

Plus some other cool things.

Conclusion

Hopefully this clears up any questions you had about searx. It’s my preferred search engine, but there is a lot of terrible documentation, outdated guides, and inaccurate information out there. Have questions? Reach out to me on Twitter.

References

https://searx.github.io/searx/

https://en.wikipedia.org/wiki/Searx

https://github.com/searx/searx

https://github.com/searx/searx-docker

https://danten.io/searx-how-to-setup-your-own-search-engine/

https://michaelbovyn.com/?page_id=1237

https://news.ycombinator.com/item?id=28674242