First CF round at Cambridge. Solved A,B,D1 in the round. Dropped from purple to blue...
Still a long way to go...
Solution: CF Round #830 (Div. 2) D1&D2 Balance
Easy Version
Brute-force
Evidently the most brute-force way is to create a set to collect the \(x\) added. Then for all query with \(k,\) check \(k,2k,3k,\cdots\) till the first multiple of \(k\) that is not contained in the set. Output it.
Obviously it is doomed to TLE, especially when you are queried by the same \(k\) multiple times with very large \(k\text{-mex}\).
Becoming Elegant
We try to optimize the brute-force by reducing the time cost if queried by the same \(k.\) As there is no remove, if you are queried by \(k\) and you find the \(k\text{-mex},\) it is obvious that the next time if you are queried by the same \(k,\) the answer must be greater than or equal to the previous one.
Therefore, we can memorize all the "previous answers." If \(k\) that has a memorized answer is queried, we start checking the set from its previous answer instead of from \(1\cdot k.\)
Wait, do we avoid TLE just by this "subtle" optimization?
Calculation of time complexity (not rigorous):
For queries with the same \(k,\) every multiples of \(k\) in the set is checked at most once. So the time complexity is the same as if every query is moved to the end of the operations, and every \(k\) is queried at most once.
Then, the worst case happens (intuitively) when the first \(q/2\) operations fill the set with numbers between \(1\) and \(q/2,\) and the next \(q/2\) operations query for \(k=1,2,\cdots,q/2.\) In this case, the number of times checking the set is \(O(\sum_{k=1}^{q/2} \frac{q}{2k})=O(q\log q)\) by harmonic series. As every check of the set takes \(O(\log q)\) of time, the overall time complexity is \(O(q\log^2 q).\)
Code (795 ms / 12600 KB)
We use a map to memorize the previous answers. The function \(\mathtt{srch(x,fs)}\) takes \(\mathtt x\) as queried \(k\) and \(\mathtt{fs}\) its starting number.
//This program is written by Brian Peng.
#include<bits/stdc++.h>
using namespace std;
#define int long long
#define Rd(a) (a=rd())
#define Gc(a) (a=getchar())
#define Pc(a) putchar(a)
int rd(){
int x;char c(getchar());bool k;
while(!isdigit(c)&&c^'-')if(Gc(c)==EOF)exit(0);
c^'-'?(k=1,x=c&15):k=x=0;
while(isdigit(Gc(c)))x=x*10+(c&15);
return k?x:-x;
}
void wr(int a){
if(a<0)Pc('-'),a=-a;
if(a<=9)Pc(a|'0');
else wr(a/10),Pc((a%10)|'0');
}
signed const INF(0x3f3f3f3f),NINF(0xc3c3c3c3);
long long const LINF(0x3f3f3f3f3f3f3f3fLL),LNINF(0xc3c3c3c3c3c3c3c3LL);
#define Ps Pc(' ')
#define Pe Pc('\n')
#define Frn0(i,a,b) for(int i(a);i<(b);++i)
#define Frn1(i,a,b) for(int i(a);i<=(b);++i)
#define Frn_(i,a,b) for(int i(a);i>=(b);--i)
#define Mst(a,b) memset(a,b,sizeof(a))
#define File(a) freopen(a".in","r",stdin),freopen(a".out","w",stdout)
int q,x;
char opt;
set<int>s;
map<int,int>ans; //Memorization
int srch(int x,int fs);
signed main(){
Rd(q);
while(q--){
cin>>opt,Rd(x);
if(opt=='+')s.insert(x);
else{
if(ans.find(x)!=ans.end())ans[x]=srch(x,ans[x]);
else ans[x]=srch(x,x);
wr(ans[x]),Pe;
}
}
exit(0);
}
int srch(int x,int fs){
while(s.find(fs)!=s.end())fs+=x;
return fs;
}
Hard Version
Now the "remove" operation is added, and we can no longer memorize the previous answers simply.
Maybe we can use something more powerful, which is able to record "removed" \(x\)'s?
The most useful tool to record and query the existence of numbers in a given range is
Segment Tree
For every queried \(k,\) instead of memorizing the previous \(k\text{-mex}\), we build a segment tree of \(k\) recording the checked and not removed multiples of \(k.\) In the following text, we let \(\text{St}_k\) denote the "SegTree of \(k\)", and use \(x\in \text{St}_k\) to denote that \(x\) is recorded in the SegTree of \(k.\)
For a query with \(k\), if \(\text{St}_k\) is not set up yet (i.e. \(\text{St}_k\) is empty), we go through the multiples of \(k\) in the set, which are \(k,2k,3k,\cdots\) till the first multiple of \(k\) (say \(nk\)) that is not in the set. Then, the SegTree of \(k\) is built with the entries from \(1\) to \(n-1\) set as \(1,\) meaning that \(k,2k,\cdots,(n-1)k\in\text{St}_k.\) (As we only insert multiples of \(k\) into \(\text{St}_k,\) we let the \(i\)th entry of \(\text{St}_k\) represent the number \(ik\) to make the tree more compact.)
Thus, if a number is recorded in a SegTree, it is in the set.
Then, for removal of \(x\), we need to remove \(x\) from not only the set, but also from every SegTree that records it. To achieve this, we create a list of \(x\) (say \(\text{Tk}_x\)) that consists of all the \(k\)'s such that \(x\in\text{St}_k.\) In other words, if a certain \(x\) is recorded in the SegTree of \(k\), we add \(k\) into the list \(\text{Tk}_x\) so that when \(x\) is removed from the set, we remove \(x\) from all SegTrees recording it by going through every \(k\) in \(\text{Tk}_x\) and setting the \(x/k\)th entry in \(\text{St}_k\) to be \(0.\) We clear \(\text{Tk}_x\) at the end of the process as \(x\) is no longer recorded in any SegTree.
Now, if a \(k\) is queried a second time, we find the least entry in \(\text{St}_k\) that is \(0.\) (This is why we need to use a SegTree instead of an array, as we may check whether a sub-interval is set all \(1\) by checking if the sum of the interval is equal to its length.) Say this entry is the \(n\)th. If \(nk\) is not in the set, we output \(nk\) as \(k\text{-mex}.\) Otherwise, if \(nk\) is in the set, we update the \(n\)th entry in \(\text{St}_k\) to be \(1,\) add \(k\) into the list \(\text{Tk}_{nk},\) and repeat the process of seeking the least entry in \(\text{St}_k\) that is \(0.\)
Code Implementation
As the range of \(k\) and \(x\) in the input is very large, I use #define int long long
(a wicked trick) for convenience and signed
is used in place of int
if such a large range is not needed.
Lazy Created Segment Tree
We may see that most of the entries in a SegTree are \(0,\) and most of the \(k\)'s even do not have a SegTree if they are never queried. Thus, we need Lazy Created SegTree to reduce time and memory complexity.
The following is the structure of a node in a lazy created SegTree:
struct SgT{signed ls,rs,v;}t[10000000];
Here, \(\mathtt {ls,rs}\) represent the ids of left/right-son respectively, and \(\mathtt v\) represents the sum of the interval the node represents. (The interval is not stored in the nodes explicitly, but they will be clear in functions.)
How lazy creation is achieved
-
We use a counter \(\mathtt{tn}\) (initial value \(0\)) to record the highest id of the SegTree nodes. Then whenever a new node is created, we add \(1\) to \(\mathtt{tn}\) and use it as the id of the new node.
-
Particularly, the node with id \(0\) represents an interval with entries all \(0,\) and at the beginning every SegTree has only node \(0.\) If a son of a node is \(0,\) it means that its half-interval is filled with \(0.\)
-
We use a map
map<int,signed>rt
to store the root of \(\text{St}_k\) (rt[k]
). For every SegTree, we set its root interval be \([1,q]\) as any number greater than or equal \((q+1)k\) can never be \(k\text{-mex}.\) (Why?) -
We also use a map
map<int,list<int>>tk
to store the lists \(\text{Tk}_x\) (tk[x]
).
Note: apart from the SegTree, the use of \(\mathtt{map}\) for roots and lists are also Lazy Creation.
For convenience, we use \(\mathtt u\) as the id of the node we are dealing with in a function, and we use #define
to simplify the id of its two sons:
#define Ls (t[u].ls)
#define Rs (t[u].rs)
How let's look at how these operations are implemented.
Query: Lazy Creation, Updating, and Query in one function
Suppose we are dealing with the SegTree \(\text{St}_k.\) The qry function \(\mathtt{qry(u,k,l,r)}\) returns the least \(n\) in the interval \([l,r]\) such that \(nk\) is not in the set. If the interval is all filled with \(1,\) return \(0\) as the default value.
int qry(signed&u,int k,int l,int r){
if(!u)u=++tn;
//Lazy Creation happen's here. !!!IMPORTANT: Pass u by Reference!!!
if(t[u].v==r-l+1)return 0;
//If the sum is equal to length, then every entry is 1.
if(l==r){
//Check if l*k is in the set.
if(st.find(l*k)!=st.end()){
t[u].v=1,tk[l*k].push_back(k);
//l*k is in the set. Update the SegTree and add k into the list tk[l*k].
return 0;
}
else return l;
//l*k is not in the set, return l (meaning the current k-mex is l*k).
}
int md((l+r)/2),ql(qry(Ls,k,l,md));
//Query the left half-interval first
if(ql){
//Found the k-mex, update the SegTree and return.
t[u].v=t[Ls].v+t[Rs].v;
return ql;
}
//Left half-interval filled with 1. Query the right-interval.
int qr(qry(Rs,k,md+1,r));
t[u].v=t[Ls].v+t[Rs].v;
return qr;
}
Removal
The modification function \(\mathtt{mdf(u,l,r,x)}\) set the \(\mathtt x\)th entry (Note that the \(\mathtt x\)th entry represents the number \(\mathtt xk\) recorded in \(\text{St}_k\)) to be \(0\) in the SegTree with root \(\mathtt u.\) For the SegTree \(\text{St}_k,\) if we want to remove the number \(x,\) we implement mdf(rt[k],1,q,x/k)
.
// When implementing, always set l=1 and r=q.
void mdf(signed u,int l,int r,int x){
while(1){
//Descending from the root to the leaf.
--t[u].v;
if(l==r)return;
int md((l+r)/2);
x<=md?(r=md,u=Ls):(l=md+1,u=Rs);
//Direction chosen by x.
}
}
Time complexity Calculation (Not rigorous)
As every checking of set (\(O(\log q)\)) is accompanied by a SegTree search (\(O(\log q)\) as the SegTree interval is \([1,q]\)) and possibly a SegTree modification (also \(O(\log q)\)) for a "remove" later, the time complexity is the same as the easy version: \(O(q\log^2 q).\)
Code (1200 ms / 162500 KB)
//This program is written by Brian Peng.
#include<bits/stdc++.h>
using namespace std;
#define int long long
#define Rd(a) (a=rd())
#define Gc(a) (a=getchar())
#define Pc(a) putchar(a)
int rd(){
int x;char c(getchar());bool k;
while(!isdigit(c)&&c^'-')if(Gc(c)==EOF)exit(0);
c^'-'?(k=1,x=c&15):k=x=0;
while(isdigit(Gc(c)))x=x*10+(c&15);
return k?x:-x;
}
void wr(int a){
if(a<0)Pc('-'),a=-a;
if(a<=9)Pc(a|'0');
else wr(a/10),Pc((a%10)|'0');
}
signed const INF(0x3f3f3f3f),NINF(0xc3c3c3c3);
long long const LINF(0x3f3f3f3f3f3f3f3fLL),LNINF(0xc3c3c3c3c3c3c3c3LL);
#define Ps Pc(' ')
#define Pe Pc('\n')
#define Frn0(i,a,b) for(int i(a);i<(b);++i)
#define Frn1(i,a,b) for(int i(a);i<=(b);++i)
#define Frn_(i,a,b) for(int i(a);i>=(b);--i)
#define Mst(a,b) memset(a,b,sizeof(a))
#define File(a) freopen(a".in","r",stdin),freopen(a".out","w",stdout)
#define Ls (t[u].ls)
#define Rs (t[u].rs)
int q,x,tn;
char opt;
map<int,signed>rt;
map<int,list<int>>tk;
set<int>st;
struct SgT{signed ls,rs,v;}t[10000000];
int qry(signed&u,int k,int l,int r);
void mdf(signed u,int l,int r,int x);
signed main(){
Rd(q);
Frn1(i,1,q){
cin>>opt,Rd(x);
if(opt=='+')st.insert(x);
else if(opt=='-'){
st.erase(x);
//Remove x from the set, and go through the list tk[x] if exists.
if(tk.find(x)!=tk.end()){
for(int k:tk[x])mdf(rt[k],1,q,x/k);
tk.erase(x);
//Remove the list tk[x].
}
}else{
signed tmp(rt[x]?rt[x]:(rt[x]=++tn));
//As Pass by Reference cannot be used with map,
//we do lazy creation manually
wr(qry(tmp,x,1,q)*x),Pe;
}
}
exit(0);
}
int qry(signed&u,int k,int l,int r){
if(!u)u=++tn;
if(t[u].v==r-l+1)return 0;
if(l==r){
if(st.find(l*k)!=st.end()){
t[u].v=1,tk[l*k].push_back(k);
return 0;
}
else return l;
}
int md((l+r)/2),ql(qry(Ls,k,l,md));
if(ql){
t[u].v=t[Ls].v+t[Rs].v;
return ql;
}
int qr(qry(Rs,k,md+1,r));
t[u].v=t[Ls].v+t[Rs].v;
return qr;
}
void mdf(signed u,int l,int r,int x){
while(1){
--t[u].v;
if(l==r)return;
int md((l+r)/2);
x<=md?(r=md,u=Ls):(l=md+1,u=Rs);
}
}
Conclusion
Why is Lazy Created SegTree effective in the hard version problem? An intuitive explanation:
In the easy version of the problem, there is no remove, so the non-decresing nature of \(k\text{-mex}\) with time for a fixed \(k\) leads us to the idea of storing answers, so that we can "move up" from the previous answer in a later query of the same \(k\).
In the hard problem, the non-decresing nature of \(k\text{-mex}\) is destroyed by the remove operation, and we can no longer record previous answer only. Recording the "checked numbers" in a SegTree, on the other hand, provides us with an efficient way to "move back" to a removed \(x,\) and "jump up" if the removed \(x\) is inserted into the set again.
Last but not least, the idea of Lazy Creation speeds our code up by creating the data structure only when they are to be used. This idea is extremely useful when the data range (\(1\leqslant x,k\leqslant 10^{18}\)) is a lot larger than the number of operations (\(1\leqslant q\leqslant 2\cdot 10^5\)).