标签：Genomics set Cow 斑点 Bovine int ch cows he

[USACO17OPEN] Bovine Genomics G

题目描述

Farmer John owns \(N\) cows with spots and \(N\) cows without spots. Having just completed a course in bovine genetics, he is convinced that the spots on his cows are caused by mutations in the bovine genome.

At great expense, Farmer John sequences the genomes of his cows. Each genome is a string of length \(M\) built from the four characters A, C, G, and T. When he lines up the genomes of his cows, he gets a table like the following, shown here

for \(N=3\) and \(M=8\):

Positions: 1 2 3 4 5 6 7 8

Spotty Cow 1: A A T C C C A T
Spotty Cow 2: A C T T G C A A
Spotty Cow 3: G G T C G C A A

Plain Cow 1: A C T C C C A G
Plain Cow 2: A C T C G C A T
Plain Cow 3: A C T T C C A T

Looking carefully at this table, he surmises that the sequence from position 2 through position 5 is sufficient to explain spottiness. That is, by looking at the characters in just these these positions (that is, positions \(2 \ldots 5\)), Farmer John can predict which of his cows are spotty and which are not. For example, if he sees the characters GTCG in these locations, he knows the cow must be spotty.

Please help FJ find the length of the shortest sequence of positions that can explain spottiness.

农夫约翰拥有N头带斑点的奶牛和N头没有斑点的奶牛。他刚刚完成了牛遗传学课程，他确信奶牛上的斑点是由牛基因组突变引起的。

农夫约翰花了大钱对他奶牛的基因组进行测序。每个基因组都是一串长度为M的字符串，由四个字符A，C，G和T构成。当他排列奶牛的基因组时，他得到一张如下表，如下所示

对于N = 3和M = 8：

Positions：1 2 3 4 5 6 7 8

Positions: 1 2 3 4 5 6 7 8

Spotty Cow 1: A A T C C C A T
Spotty Cow 2: A C T T G C A A
Spotty Cow 3: G G T C G C A A

Plain Cow 1: A C T C C C A G
Plain Cow 2: A C T C G C A T
Plain Cow 3: A C T T C C A T
他仔细查看该表，认为从位置2到位置5的顺序足以解释斑点。也就是说，仅通过查看这些位置（即位置2 ……5）中的字符，农夫约翰就可以预测出他的哪些母牛斑点，哪些不是斑点。例如，如果他在这些位置看到字符GTCG，他就知道那头母牛一定是斑点的。

请帮助FJ找到可以解释斑点的最短位置序列的长度。

FJ有一些有斑点和一些没有斑点的牛，他想搞清楚到底什么基因控制这个牛有没有斑点。

于是他找了n有斑点的牛和n头没有斑点的牛

这些牛的基因长度为m（基因中之包含ATCG四个字母）

求这个序列中的一个子串，可以确定是否有斑点。

子串需要符合要求：有斑点的牛这部分的子串，不能和无斑点的牛的这部分子串相同

求最短子串长度

输入格式

The first line of input contains \(N\) (\(1 \leq N \leq 500\)) and \(M\) (\(3 \leq M \leq 500\)). The next \(N\) lines each contain a string of \(M\) characters; these describe the genomes of the spotty cows.

The final \(N\) lines describe the genomes of the plain cows. No spotty cow has the same exact genome as a plain cow.

输出格式

Please print the length of the shortest sequence of positions that is sufficient to explain spottiness. A sequence of positions explains spottiness if the spottiness trait can be predicted with perfect accuracy among Farmer John's population of cows by looking at just those locations in the genome.

样例 #1

样例输入 #1

3 8
AATCCCAT
ACTTGCAA
GGTCGCAA
ACTCCCAG
ACTCGCAT
ACTTCCAT

样例输出 #1

分析

这本是一个字符串hash题

但是set容器的使用让这道题变得~~轻而易举~~

读题可知

给定n个A串和n个B串，长度均为m，求一个最短的区间[l,r]

使得不存在一个A串a和一个B串b，使得a[l,r]=b[l,r]

求最大的最小

直接二分

上代码

code

Elaina's code

#include<bits/stdc++.h>
using namespace std;
const int N=505;
#define ll long long
//#define int long long
#define inf 0x3f
#define INF 0x3f3f3f3f
#define Elaina 0
inline int read(){
    int x=0,f=1;
    char ch=getchar();
    for(;!isdigit(ch);ch=getchar()) if(ch=='-') f=-1;
    for(;isdigit(ch);ch=getchar()) x=x*10+ch-'0';
    return x*f;
}

int n,m;
string a[N],b[N];
set<string> s;//set容器 -->集合 

bool check(int x,int k){
	s.clear();//清空 
	
	for(int i=1;i<=n;i++){
		s.insert(a[i].substr(x,k));//插入元素 
	}
	for(int i=1;i<=n;i++){
		if(s.count(b[i].substr(x,k))){//count(i)判断容器中是否存在i这个元素 
			return 0;
		}
	}
	return 1;
}

main(){
	n=read(),m=read();
	
	for(int i=1;i<=n;i++){
		cin>>a[i];
	}
	
	for(int i=1;i<=n;i++){
		cin>>b[i];
	}
	
	int l=1,r=m,mid;
	bool flag;
	while(l<r){
		flag=0;
		mid=(l+r)>>1;
		for(int i=0;i<=m-mid;i++){
			if(check(i,mid)){
				flag=1;
				break;
			}
		}
		if(flag){
			r=mid;
		}else{
			l=mid+1;
		}
	}
	
	printf("%d\n",l);
	return Elaina;
}

~~都看到这了，真的不点个赞吗(>ω<*)~~

标签：Genomics,set,Cow,斑点,Bovine,int,ch,cows,he
From： https://www.cnblogs.com/Elaina-0/p/18154586

P3667 [USACO17OPEN] Bovine Genomics G （set容器）