Using Telegram as infinite cloud storage

14 Feb, 2024

A while ago I was backing up my homelab and the total size was around 300 GB. There was no problem storing it locally, but I still needed an offsite backup.

So the journey began: looking for a cloud storage provider worth paying for.

Cloud Storage Providers

Google Drive

I don’t really like storing files in Google Drive, but since the backups are encrypted, there’s no problem storing them there.

Taking a look at the plans, the one that would fit is the Premium one with 2 TB for R$ 39 per month. It’s “ok”, but when you realize that most of the time you won’t be using that much space, it’s not worth it.

OneDrive

Microsoft OneDrive is just stupid: the service sucks and the prices are absurd, R$ 359 per year for 1 TB (and a bunch of trash, aka the rest of the 365 services). Google Drive is R$ 468 per year, but you get 2 TB (and the service sucks less).

The only “benefit” is that OneDrive also comes with the rest of Microsoft 365 services, but I don’t use them and I feel like they are there just to try and justify the price.

Mega

Mega is also expensive, R$ 533,07 per year for 2 TB.

And now? If basically every cloud storage provider isn’t worth it, where am I going to store it?

Teledrive

While doing some research I remembered a project a friend told me about a long time ago called Teledrive, which does exactly that: it gives you a pretty web UI and uploads files to Telegram.

The problem is that the project is basically dead, and I was feeling like doing some tinkering.

How does that work?

Telegram has basically two APIs: one for bots and another for clients. The Bot API is easier to use but has more limitations: a 20 MB download limit, a 50 MB upload limit, and rate limits. The regular client API, on the other hand, has only a 2 GB per-file limit.

With that information I decided to write some code in Rust with grammers as a proof of concept.

Connecting to Telegram

With bots, you can just put in the token and there’s no further interaction needed, but for clients you need to confirm with a code. Fortunately grammers has a way to store the session in a file and load it later.

use grammers_client::{Client, Config, SignInError};
use grammers_session::Session;
use std::env;
use std::io::{self, BufRead, Write};

// From https://github.com/Lonami/grammers/blob/f2ad7a37a2ad466623dcaef014e8075102723a30/lib/grammers-client/examples/downloader.rs#L158
fn prompt(message: &str) -> io::Result<String> {
    let mut stdout = io::stdout().lock();

    stdout.write_all(message.as_bytes())?;
    stdout.flush()?;

    let mut stdin = io::stdin().lock();
    let mut line = String::new();

    stdin.read_line(&mut line)?;

    Ok(line)
}

const SESSION_FILE: &str = "telegram.session";

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let telegram_client = Client::connect(Config {
        session: Session::load_file_or_create(SESSION_FILE)?,
        api_id: env::var("API_ID")
            .expect("API_ID not set")
            .parse()
            .expect("Failed to parse API_ID"),
        api_hash: env::var("API_HASH")
            .expect("API_HASH not set")
            .parse()
            .expect("Failed to parse API_HASH"),
        params: Default::default(),
    })
    .await
    .unwrap();

    if !telegram_client.is_authorized().await.unwrap() {
        println!("Signing in");

        let phone_number = prompt("Enter your phone number: ")?;
        let token = telegram_client
            .request_login_code(&phone_number)
            .await
            .unwrap();
        let code = prompt("Enter the code that you received: ")?;
        let signed_in = telegram_client.sign_in(&token, &code).await;

        match signed_in {
            Err(SignInError::PasswordRequired(pass_token)) => {
                let hint = pass_token.hint().unwrap();
                let prompt_msg = format!("Enter your password (hint {}): ", &hint);
                let password = prompt(prompt_msg.as_str())?;

                telegram_client
                    .check_password(pass_token, password.trim())
                    .await
                    .unwrap();
            }
            Ok(_) => (),
            Err(err) => panic!("{}", err),
        }

        println!("Signed in. Saving session file");
        match telegram_client.session().save_to_file(SESSION_FILE) {
            Ok(_) => println!("Done."),
            Err(err) => {
                println!("Failed to save session. Signing out\n{}", err);

                telegram_client
                    .sign_out_disconnect()
                    .await
                    .expect("Failed to sign out");
            }
        }
    } else {
        println!("Connected.\n");
    }

    Ok(())
}

Encrypting files

Another part of the idea is end-to-end encryption, and that is pretty simple. I’ll be using orion’s aead.

use orion::aead;
use sha2::{Digest, Sha256};
use std::env;

pub fn encrypt(data: &[u8]) -> Vec<u8> {
    let mut hasher = Sha256::new();
    hasher.update(
        env::var("ENCRYPTION_KEY")
            .expect("Missing environment variable ENCRYPTION_KEY")
            .as_bytes(),
    );
    let pass_hash = hasher.finalize();
    let key = aead::SecretKey::from_slice(&pass_hash).unwrap();

    aead::seal(&key, data).unwrap()
}

pub fn decrypt(data: &[u8]) -> Vec<u8> {
    let mut hasher = Sha256::new();
    hasher.update(
        env::var("ENCRYPTION_KEY")
            .expect("Missing environment variable ENCRYPTION_KEY")
            .as_bytes(),
    );
    let pass_hash = hasher.finalize();
    let key = aead::SecretKey::from_slice(&pass_hash).unwrap();

    aead::open(&key, data).unwrap()
}

Is that the best way to do it? I don’t really know, and in the future I’ll probably use openssl and AES.
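
As a rough idea of what that might look like, here is a minimal sketch using the openssl crate with AES-256-CBC. The key derivation and the “IV prepended to the ciphertext” layout are my assumptions, and note that plain CBC drops the built-in authentication that orion’s aead gives you:

use openssl::rand::rand_bytes;
use openssl::symm::{decrypt as aes_decrypt, encrypt as aes_encrypt, Cipher};

// Sketch only: derive the 32-byte key however you prefer,
// e.g. the SHA-256 of ENCRYPTION_KEY like above.
pub fn encrypt_aes(key: &[u8; 32], data: &[u8]) -> Vec<u8> {
    // Random IV, prepended to the ciphertext so decrypt_aes can find it
    let mut iv = [0u8; 16];
    rand_bytes(&mut iv).unwrap();

    let mut out = iv.to_vec();
    out.extend(aes_encrypt(Cipher::aes_256_cbc(), key, Some(&iv), data).unwrap());
    out
}

pub fn decrypt_aes(key: &[u8; 32], data: &[u8]) -> Vec<u8> {
    // Split the IV back off the front
    let (iv, ciphertext) = data.split_at(16);
    aes_decrypt(Cipher::aes_256_cbc(), key, Some(iv), ciphertext).unwrap()
}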

Sending messages

OK, now we know how to connect and how to encrypt, but what about actually sending messages? Again, fortunately grammers makes it pretty simple.

use grammers_client::{
    client::auth::InvocationError,
    types::{Downloadable, Message},
    Client, InputMessage,
};
use grammers_session::{PackedChat, PackedType};
use std::{env, sync::Arc};
use tokio::sync::Mutex;

// This is only required when using it with actix
// or something similar
type TelegramClient = Arc<Mutex<Client>>;

pub fn message_packet() -> PackedChat {
    PackedChat {
        ty: PackedType::Chat,
        id: env::var("GROUP_ID")
            .expect("Missing environment variable GROUP_ID")
            .parse()
            .expect("Failed to parse GROUP_ID to i64"),
        access_hash: None,
    }
}

pub async fn send_message_with_document(
    client: TelegramClient,
    filepath: &String,
) -> Result<Message, InvocationError> {
    let client = client.lock().await;
    let uploaded_file = (*client).upload_file(filepath).await.unwrap();
    (*client)
        .send_message(
            message_packet(),
            // blank message with a file
            InputMessage::text("").document(uploaded_file),
        )
        .await
}

pub async fn download_message_document(
    client: TelegramClient,
    message_id: i32,
    output_path: String,
) {
    let client = client.lock().await;
    let message = &(*client)
        .get_messages_by_id(message_packet(), &[message_id])
        .await
        .unwrap()[0];
    // No need to loop:
    // there will be only one message with this ID

    match message {
        Some(msg) => {
            if let Some(media) = msg.media() {
                (*client)
                    .download_media(&Downloadable::Media(media), output_path)
                    .await
                    .expect("Failed to download file")
            }
        }
        None => {}
    }
}
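
Wiring the pieces together, a usage sketch could look like this. The connect() helper stands in for the sign-in code from earlier, and the file paths are just placeholders:

use std::sync::Arc;
use tokio::sync::Mutex;

async fn upload_example() {
    // connect() is a placeholder for the sign-in code shown earlier
    let client = Arc::new(Mutex::new(connect().await));

    // Encrypt the file to a temporary copy, since upload_file takes a path
    let data = std::fs::read("backup.tar").unwrap();
    let encrypted = encryption::encrypt(&data);
    std::fs::write("backup.tar.enc", &encrypted).unwrap();

    let message = send_message_with_document(client, &"backup.tar.enc".to_string())
        .await
        .unwrap();
    println!("Uploaded as message {}", message.id());
}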

Splitting the files into chunks

If the file is bigger than the Telegram upload limit (2 GB), we will need to split it into smaller chunks. On the CLI you could use the split command, but this time we are implementing it in code.

This is probably due to the way I implemented it, and there’s probably a better way to do it, but we will need to open the file so we can parse and split it. For smaller files this is not a problem, but when the file is bigger than the amount of free RAM you have, the OOM killer will terminate the process. So we need to solve that issue: opening the file for parsing without using all the RAM.

And for that you can use memory-mapped files, which let you read a file without loading it entirely into RAM, at the cost of heavier disk usage. I’ll be using memmap for that.

use memmap::Mmap;
use std::{fs, path::PathBuf};

// ~ 200 MB
pub const TARGET_CHUNK_SIZE: i64 = 209715200;
// 2GB
pub const TELEGRAM_MAX_FILESIZE: u64 = 2000000000;

pub async fn parse(...) {
    let file = fs::File::open(&filepath).unwrap();
    let filesize = file.metadata().unwrap().len();

    // If the file is smaller than 2 GB
    // we can just encrypt and upload
    if filesize < TELEGRAM_MAX_FILESIZE {
        let mmap_file = unsafe { Mmap::map(&file).unwrap() };
        let encrypted = encryption::encrypt(&mmap_file);

        // Send and store metadata about the file
        // in a database
        // ...
    } else {
        // If the file is bigger than 2GB
        // we will need to split the file into smaller chunks.
        let mmap_file = unsafe { Mmap::map(&file).unwrap() };
        let filesize = filesize as i64;
        // Calculate the amount of chunks based on the
        // target chunk size.
        // Smaller TARGET_CHUNK_SIZE equals more chunks
        let num_chunks = (filesize + TARGET_CHUNK_SIZE - 1) / TARGET_CHUNK_SIZE;
        let chunksize = (filesize + num_chunks - 1) / num_chunks;
        // Split the contents in chunks
        let chunks = mmap_file.chunks(chunksize as usize).collect::<Vec<&[u8]>>();

        for (index, chunk) in chunks.iter().enumerate() {
            let encrypted = encryption::encrypt(chunk);

            // Again, send and store metadata about the file
        }
    }
}

This is a pretty bare bones implementation, but it shows how it’s done.

Splitting files into chunks also has other benefits, such as faster uploads/downloads (especially if you have a slow connection) and faster encryption (who would have guessed that encrypting less data at a time is faster).

Mounting the file

Mounting the chunks is easy, since you only need to reverse the process.

This depends on how you stored the metadata about the file. The way I did it was to create two tables, one for files and another for chunks. If the file has one chunk (which means it was not split), you can just call the download_message_document function with the message ID. But if the file was split, you will need to loop over every chunk and download each one.

My tables look like this:

CREATE TABLE Files (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  filename TEXT NOT NULL,
  size INTEGER NOT NULL,
  message_id INTEGER NOT NULL,
  total_chunks INTEGER NOT NULL
);

CREATE TABLE Chunks (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  parent_id INTEGER NOT NULL, -- file id
  message INTEGER NOT NULL -- Telegram message id
);
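
With that schema, the “send and store metadata” step from the upload code could look roughly like this. It’s just a sketch: the connection handle and the way it gets wired into parse are assumptions:

// Sketch: conn is a sqlx SQLite connection; the message IDs come from
// the send_message_with_document calls in the upload path.
async fn store_file_metadata(
    conn: &mut sqlx::SqliteConnection,
    filename: &str,
    size: i64,
    message_id: i64,
    total_chunks: i64,
) -> i64 {
    sqlx::query!(
        "INSERT INTO Files (filename, size, message_id, total_chunks) VALUES (?, ?, ?, ?)",
        filename,
        size,
        message_id,
        total_chunks
    )
    .execute(&mut *conn)
    .await
    .unwrap()
    .last_insert_rowid()
}

async fn store_chunk_metadata(conn: &mut sqlx::SqliteConnection, parent_id: i64, message_id: i64) {
    sqlx::query!(
        "INSERT INTO Chunks (parent_id, message) VALUES (?, ?)",
        parent_id,
        message_id
    )
    .execute(&mut *conn)
    .await
    .unwrap();
}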

Again, here is another bare bones implementation, this time of the mounting side, as an example.

use memmap::Mmap;
use serde::{Deserialize, Serialize};
use std::{fs, io::Write};
use uuid::Uuid;

#[derive(Deserialize, Serialize, Debug)]
pub struct FileQuery {
    id: i64,
    filename: String,
    message_id: i64,
    total_chunks: i64,
}

#[derive(Deserialize, Serialize, Debug)]
pub struct ChunkQuery {
    id: i64,
    parent_id: i64,
    message: i64,
}

pub async fn mount(...) {
    let file_query = sqlx::query_as!(
        FileQuery,
        "SELECT id, filename, message_id, total_chunks FROM Files WHERE id=?",
        id
    )
    .fetch_one(&mut *conn)
    .await;

    match file_query {
        Ok(file) => {
            // The file was split, we will need
            // to fetch all the chunks
            if file.total_chunks > 1 {
                let chunks_query = sqlx::query_as!(
                    ChunkQuery,
                    "SELECT * FROM Chunks WHERE parent_id=?",
                    file.id
                )
                .fetch_all(&mut *conn)
                .await;

                match chunks_query {
                    Ok(chunks) => {
                        // Temp path for storing the chunks
                        let uuid = Uuid::new_v4();
                        let path = format!("./downloads/{}", uuid);

                        fs::create_dir_all(&path).unwrap();

                        let mut outfile = fs::OpenOptions::new()
                            .write(true)
                            .create(true)
                            .append(true)
                            .open(format!("{}/{}", path, file.filename))
                            .unwrap();

                        // Download each chunk and join its contents
                        for (i, chunk) in chunks.iter().enumerate() {
                            let chunk_path = &*format!("{}/{}", path, i);
                            message_utils::download_message_document(
                                // Clone the Arc so the client can be reused on the next iteration
                                telegram_client.clone(),
                                chunk.message as i32,
                                chunk_path.to_string(),
                            )
                            .await;

                            let chunk_file = fs::File::open(chunk_path).unwrap();
                            // There's no real reason to use mmap since the file is small,
                            // but why not?
                            let chunk_contents = unsafe { Mmap::map(&chunk_file).unwrap() };
                            // Decrypt the contents
                            let decrypted = decrypt(&chunk_contents);

                            outfile.write_all(&decrypted).unwrap();
                        }
                    }
                    Err(err) => {}
                }
            } else {
                // One chunk = not split,
                // so we can just download and decrypt
                let uuid = Uuid::new_v4();
                // Temp path
                let path = format!("./downloads/{}", uuid);
                let filename = format!("{}/{}", path, file.filename);

                fs::create_dir_all(path).unwrap();

                message_utils::download_message_document(
                    telegram_client,
                    file.message_id as i32,
                    format!("{}-encrypted", filename),
                )
                .await;

                let mut outfile = fs::OpenOptions::new()
                    .write(true)
                    .create(true)
                    .append(true)
                    .open(&filename)
                    .unwrap();
                let encrypted_file = fs::File::open(format!("{}-encrypted", &filename)).unwrap();
                let encrypted_file_contents = unsafe { Mmap::map(&encrypted_file).unwrap() };
                let decrypted = decrypt(&encrypted_file_contents);

                outfile.write_all(&decrypted).unwrap();
            }
        }
        Err(err) => {}
    }
}

Conclusion

It’s pretty simple to apply that idea and use Telegram as infinite cloud storage, but it’s not perfect. Download and upload speeds are slow for big files, since we need to do a bunch of things before sending, and Telegram itself isn’t fast for that.

The initial idea was to create a clone of Teledrive, but dealing with these things in a web server is pretty annoying, and I think it would be better to make an app that runs locally. From now on I will try to do that and see how it goes.

See ya!